In This Battle, Size Does Matter: Google Responds to Yahoo Index Claims

As I posted earlier, Yahoo’s claim of indexing more than 20 billion items ruffled more than a few feathers across the web, and nowhere more distinctly than at Google. I spent an hour or so on the phone with a group of Google folks, and they shared a lot of information about how they measure index size, how they deal with issues of duplicate URLs and documents, and why they are baffled by Yahoo’s claim.

I am still reporting this story, so a longer post is forthcoming, but an update at the end of the day is worth penning.

First of all, I agreed to review some of the Google information on background and not to disclose it without permission. (I agreed to this only on the condition that I could tell you all that I had in fact agreed to it.) I am still digesting what Google had to say, and the information they sent me, but it did leave a distinct set of questions percolating in my mind, questions that I plan to put to Yahoo (Yahoo has agreed to talk as well; we just haven’t had time yet).

In any case, the lead really is this: I asked Google to go on the record with their concerns about Yahoo’s index and whether they believed the news was in fact accurate, and Google agreed. The quote, which I can only attribute at this point to a “Google spokesperson,” is as follows:

“Our scientists are not seeing the increase claimed in the Yahoo! index. The data we have doesn’t support the 19.2 (billion page) claim and we’re confused by that.”

Now, the size of an index is only one part of what makes a good search engine: relevance, speed, UI, and other factors are also critical. But when it comes to comprehensiveness (size), Google has been king pretty much since day one, save a couple of short lapses: one with FAST in 2002 and another in ’03, as I recall, with Yahoo (briefly). The company has always trumpeted its size on its home page, and Yahoo’s announcement had to come as a slap in the face. Down to the presumptive specificity of the pronouncement on their home page since 2000 – “searching 8,168,684,336 web pages” – Google set the tone for all future “size matters” battles.

I plan a longer post on this, as I said, but there are some tantalizing examples (I will add some in the next post) that one might expect would yield significantly different results between Yahoo and Google, given Yahoo’s massive new size, but don’t. The math, in essence, seems not to be adding up. At least, that is what the Google scientists are saying. But then again, I am not a mathematician, and there are always at least two sides to the story. So stay tuned and we’ll see how this one plays out…

(I must say, this calls for a benchmark/standard for measurement that might make all of this moot…)

21 thoughts on “In This Battle, Size Does Matter: Google Responds to Yahoo Index Claims”

  1. Nice to see Google on the run a lot PR-wise of late. Their positioning is laughable. If Google scientists don’t see 20B pages, it can’t be real. Get over yourselves Googlers, you’re not the only smart people out there. Go get’em humbler, less evil Yahoo!

  2. Why can’t Yahoo index 20 billion? Considering Yahoo 360, Web 2.0, all the Flickr tagging and photos, and the recent acquisitions, it seems possible. I agree that their search has improved a lot. research.yahoo.com is typically a good one.

    Only one question: why not?

  3. In response to Jim’s point, it’s Google that is well known for being deliberately misleading with numbers, not Yahoo! Yeah, we have 10,000+ servers, answer 150m queries per day, etc. For years, Google has deliberately misled the community by using out-of-date and inconsistent numbers, hoping that since the numbers were true at some point in time, this was a non-evil dialog with the public. Where’s the integrity, Google? See the 2004 MIT Technology Review article for more.

  4. I couldn’t agree more with Thomas… like when they doubled their index after the MSN search move. (4Bln to 8Bln I believe)

    Although the web 2.0 people know that size doesn’t matter, moms and pops think bigger = better. Yahoo has stepped up their PR monster, is reaping the benefits of the entertainment focus in CA, and is now flexing most of its muscle.

    If nothing else, I believe that the consumer (we) will end up winning in the end.

  5. I hope this doesn’t devolve into a contest like we saw with Intel and AMD. The megahertz race probably slowed the development of processors and the same could happen to search if a focus on McDonalds style stats emerges.

  6. For me the key lies within the post: Google scientists were baffled. This is the real fight between the two R&D camps. Yahoo! was seriously caught out by Google in the last round, to the point at which the PR uprooting was embarrassing. Knowing folks at Yahoo, this press release has an important implication: how much of the new index has come from the deep web, through partnerships that Google doesn’t have? All in all quite a disruptive release by Tim Mayer; expect more of it.
