UnAngelo blog

…se io fossi un angelo (…if I were an angel)

Correlation

April 27, 2011 By: doghouse Category: Uncategorized

Every time Will goes to the kitchen, Raf expects to receive a sandwich; however, statistically, he’s more likely to receive a punch in the face.

(Props to Caleb for the slogan.) (So good that apparently we used it twice.)

Job Interview II

March 30, 2011 By: doghouse Category: Uncategorized

Thanks for the slogan, Andy :)

Sounds About Right

March 25, 2011 By: doghouse Category: Uncategorized

Coincidentally, this is also Will’s weekly pizza order.

Machinarium on Mac App Store

March 21, 2011 By: Jakub Dvorsky Category: Uncategorized

You can now also buy Machinarium on the Mac App Store. It’s currently on sale for only $9.99 until the end of March.

Modern Analogies

February 18, 2011 By: doghouse Category: Uncategorized

This is gonna be on the test.

What’s The Point?

February 11, 2011 By: doghouse Category: Uncategorized

Oh no, but things are different this time. I’m gonna put away anything and everything as soon as it is out of place…

Tech Envy

February 09, 2011 By: doghouse Category: Uncategorized

This is the truly true truth about Facebook.

Trochee Chart

February 04, 2011 By: xkcd Category: 1, google, quotes

Here’s something I made as I drew today’s comic.  It’s a chart of Google results for “X Y” (in quotes) where X and Y are words from the first panel of the strip.  The first word is on the top, the second down the side (the opposite of the intuitive way, of course).

"Doctor Doctor" and "Jesus Jesus" are highest. The highest non-repeating combo is "Pirate Captain", followed by "Robot Monkey" and "Penguin Zombie".
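
The chart boils down to a square grid of counts, one cell per ordered word pair. Below is a minimal Python sketch of that bookkeeping. It assumes the counts have already been fetched somehow; the fetching step is deliberately left out, since Eviltwin's tool isn't shown here. The helper names `queries`, `build_grid`, and `top_non_repeating` are hypothetical, and the numbers in the usage lines are toy placeholders, not real Google counts.

```python
from itertools import product


def queries(words):
    """Every quoted two-word query the chart needs, e.g. '"Pirate Captain"'."""
    return [f'"{x} {y}"' for x, y in product(words, repeat=2)]


def build_grid(words, counts):
    """Lay counts out like the chart: the first word X runs across the top
    (one column per X) and the second word Y runs down the side (one row per Y).

    `counts` maps (first_word, second_word) -> result count; missing pairs become 0.
    """
    return [[counts.get((x, y), 0) for x in words] for y in words]


def top_non_repeating(counts, n=3):
    """The n highest-count pairs whose two words differ (skips "Doctor Doctor" etc.)."""
    pairs = [(pair, c) for pair, c in counts.items() if pair[0] != pair[1]]
    pairs.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in pairs[:n]]


# Toy placeholder counts, NOT real Google numbers.
words = ["Pirate", "Captain"]
toy = {("Pirate", "Pirate"): 5, ("Captain", "Pirate"): 1,
       ("Pirate", "Captain"): 3, ("Captain", "Captain"): 2}
print(queries(words))
print(build_grid(words, toy))     # [[5, 1], [3, 2]]
print(top_non_repeating(toy, 1))  # [('Pirate', 'Captain')]
```

Keying the dictionary by ordered pairs keeps "Pirate Captain" and "Captain Pirate" distinct, which matters because the chart is not symmetric.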

I generated this using a Google API variable search tool developed by Eviltwin on #xkcd. (I’m not linking to the tool so as to avoid potentially getting his API key revoked.) Edit: He now offers the source, says it can be run without a key, and is happy to let people use it until Google does something. Not only is the API helpful in making these kinds of charts (which I spend more time doing than I care to admit), it also gives a roughly accurate count of results, in contrast to the Google search page.

The “number of results” count that Google gives when you search is clearly fabricated. This is evident for a few reasons. When Google says this:

[Screenshot: Google’s reported number of results for a search. Caption: “Excellent! That’s a lot!”]

You can tell that it’s wrong first by scrolling to the end of the results.  When you get to page 32, it suddenly becomes:

[Screenshot: the same search on page 32. Caption: “I learned in AP Calculus that 316 is WAY less than 190,000.”]

This doesn’t usually matter, since nobody looks much past the first few pages of results, but it’s annoying if you’re trying to use the number of results as a measure of something. When I was making the Numbers comic, I didn’t use the API, and there were a few graphs I had to throw out, crop, or put on an unnecessary log scale; otherwise, Google’s clumsy number-fudging made the graphs look nonsensical. I can’t find a good example now (perhaps they’ve smoothed it out a bit), but when searching for things like “I was born in <X>”, the results for successive years would look something like this:

… 150 : 200 : 250 : 300 : 350 : 117,000 : 450 : 251,000 : 500 : 550 : 312,000 : 320,000 : 390,000 : 425,000 …

If you scrolled to the last page for each, you’d find that the smaller counts were roughly accurate, but the counts in the hundreds of thousands had no more actual results than their neighbors.
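
One way to spot the fudged values automatically is to flag any count that jumps far above the last count that still looked plausible. The Python sketch below applies that rule to the illustrative sequence above. The heuristic, the function name `flag_jumps`, and the default factor of 10 are all assumptions made for illustration; this is not how Google or the author actually checked anything.

```python
def flag_jumps(counts, factor=10):
    """Return indices of counts that look inflated.

    Hypothetical heuristic: walk the sequence in order and flag any value
    more than `factor` times the last value that was NOT flagged.
    Assumes the first value is plausible and real counts grow gradually.
    """
    if not counts:
        return []
    flagged = []
    last_plausible = counts[0]
    for i, c in enumerate(counts[1:], start=1):
        if c > factor * last_plausible:
            flagged.append(i)       # suspiciously large jump
        else:
            last_plausible = c      # accept and use as the new baseline
    return flagged


# The illustrative "I was born in <X>" sequence from the post.
born_in = [150, 200, 250, 300, 350, 117_000, 450, 251_000,
           500, 550, 312_000, 320_000, 390_000, 425_000]
print(flag_jumps(born_in))  # [5, 7, 10, 11, 12, 13]
```

Comparing against the last *accepted* value rather than the immediate neighbor is deliberate. In the tail of the sequence, 320,000 sits right next to 312,000 and would look fine next to it, so a neighbor-only check would let a run of inflated counts vouch for each other.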

I suppose it’s remotely possible that these numbers are correct, there are no years with an in-between number of hits, and for some reason they’re just not showing you most of the promised pages when you try to flip through them.  But making this even less likely is the fact that the search API (which is apparently being deprecated and replaced right now) doesn’t return these bad numbers—it gives reasonable-looking results which seem to be roughly consistent with the number you come up with by navigating to the last search page.

So it looks like there’s a certain threshold of result volume beyond which Google says “screw it” and throws out a gigantic number. I imagine this is due to incompetence rather than intentional deception; I’m sure it’s hard to generate pages quickly from many sources, and maybe for searches with a lot of results they don’t have time to get it all synced up. So they fudge the numbers. The fact that this makes it look like they have way more results than they do is presumably just an unintended bonus.

All in all, this isn’t a big deal and I don’t think there’s anything particularly evil about it. It does make it hard to use Google hits as an accurate gauge of anything, but I suppose if you’re trying to study something by seriously analyzing Google result counts, you have bigger methodological problems to worry about.

Edit: As Mankoff observes, it looks like the API sometimes *underestimates* the number of results, too.  For example, it still reports 0 results for “narwhal zombie”, when a regular search shows quite a few. Now, I notice, scrolling through them, that most either have some minor character/text in between the two words, or are related to the comic I just posted.  But at least one seems to date back to last year.
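
Since an estimate can be wrong in either direction, a cross-check has to look both ways. Below is a minimal Python sketch that classifies a reported count against a second count, such as the one you get by paging to the last results page. The function name `compare_counts` and the tolerance factor of 2 are arbitrary assumptions for illustration.

```python
def compare_counts(reported, reachable, tolerance=2.0):
    """Classify a reported result count against an independent count
    (e.g. results actually reached by paging to the end).

    `tolerance` is an arbitrary assumption: counts within a factor of
    `tolerance` of each other are treated as consistent.
    """
    if reported == 0 and reachable == 0:
        return "consistent"
    if reachable == 0:
        return "overestimate"
    if reported == 0:
        return "underestimate"   # e.g. the API saying 0 when results exist
    ratio = reported / reachable
    if ratio > tolerance:
        return "overestimate"
    if ratio < 1 / tolerance:
        return "underestimate"
    return "consistent"


# The screenshot numbers from the post: 190,000 reported vs. 316 reachable.
print(compare_counts(190_000, 316))  # overestimate (about 601 times too many)
```

A ratio test fits here because the discrepancies in the post are multiplicative, hundreds of times off, rather than off by a fixed number of results.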

Messing With Customer Service

February 02, 2011 By: doghouse Category: Uncategorized

I only do this when they make you go through a hundred dial options and then wait fifteen minutes.

The Truth About Facebook

January 28, 2011 By: doghouse Category: Uncategorized

And do I really need to know in an unsolicited fashion what my friends are saying to each other?
