Sony TV is Bigfooting Jason Kottke for his coverage of Jeopardy and perennial winner Ken Jennings.
Jason, we’re with you on this one. Come on, Sony, wake up to the fact that Jason’s coverage made Jeopardy bigger than it would have been. You should be encouraging folks like Jason. I’m going to lob a few emails to folks I know at Sony, and I encourage any readers with contacts there to do the same.
That’s the best line from a story posted today on ZDNet UK. It’s spoken by Urs Hölzle, Google Fellow, who is currently on a tour of Europe recruiting engineers. ZD “snuck into” one of his talks to potential recruits and has an extensive overview of what he said. The piece includes metrics on Google’s infrastructure, but to my eye they seem understated (i.e., it mentions a 4 billion document index, when Google now claims 8 billion, and 30 clusters of up to 2000 computers, when I’ve got sources saying it’s more than twice that). In any case, it’s very interesting reading.
It is one of the largest computing projects on the planet, arguably employing more computers than any other single, fully managed system (we’re not counting distributed computing projects here), some 200 computer science PhDs, and 600 other computer scientists….
Google replicates servers, sets of servers and entire data centres, added Hölzle, and has not had a complete system failure since February 2000. Back then it had a single data centre, and the main switch failed, shutting the search engine down for an hour. Today the company mirrors everything across multiple independent data centres, and the fault tolerance works across sites, “so if we lose a data centre we can continue elsewhere — and it happens more often than you would think. Stuff happens and you have to deal with it.”
A new data centre can be up and running in under three days. “Our data centre now is like an iMac,” said Schulz. “You have two cables, power and data. All you need is a truck to bring the servers in and the whole burning in, operating system install and configuration is automated.”…
If the index size doubles, then the embarrassingly parallel nature of the problem means that Google could double the number of machines and get the same response time so it can grow linearly with traffic. “In reality (from a business point of view) we would like to grow less than linear to keep costs down,” said Hölzle, “but luckily the hardware keeps getting cheaper.”
So every year as the Web gets bigger and requires more hardware to index, search and return Web pages, hardware gets cheaper so it “more or less evens out” to use Hölzle’s words. …
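The scaling argument above can be put into a few lines of arithmetic. This is only a rough sketch under stated assumptions: the index is split evenly across machines, every machine scans its share at the same rate, and the machine counts, scan rate, and prices are made-up illustration values, not Google figures. Only the 4 billion and 8 billion document index sizes come from the text.

```python
def query_time(index_docs, machines, docs_per_second=1_000_000):
    """Seconds to scan one machine's share of the index.

    Assumes an even split and a hypothetical per-machine scan rate.
    """
    docs_per_machine = index_docs / machines
    return docs_per_machine / docs_per_second


def fleet_cost(machines, price_per_machine):
    """Total hardware spend for a fleet at a given (hypothetical) price."""
    return machines * price_per_machine


# Index doubles, machine count doubles: each machine's share is unchanged,
# so the per-query time is unchanged.
before = query_time(4_000_000_000, machines=1_000)
after = query_time(8_000_000_000, machines=2_000)

# If the price per machine halves over the same period, total spend
# "more or less evens out".
cost_before = fleet_cost(1_000, price_per_machine=2_000)
cost_after = fleet_cost(2_000, price_per_machine=1_000)
```

In this toy model both the response time and the hardware bill stay flat, which is the "evens out" case; if prices fall more slowly than the index grows, the bill rises.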
Google wrote its own spell checker, and maintains that nobody knows as many spelling errors as it does. The amount of computing power available at the company means it can afford to begin teaching the system which words are related – for instance “Imperial”, “College” and “London”. It’s a job that takes many CPU years, and which would not have been possible without these thousands of machines. “When you have tons of data and tons of computation you can make things work that don’t work on smaller systems,” said Hölzle. One goal of the company now is to develop a better conceptual understanding of text, to get from the text string to a concept…..
Even three years ago, he said, the Web had much more of a grass roots feeling to it. “We have thought of having a button saying ‘give me less commercial results’,” but the company has shied away from implementing this yet.
Thanks for the tip, KK.
Will update this post later, been a bit swamped, but Google relaunched Groups today, with an emphasis on letting folks create their own groups. (Recall this was Usenet for a long time). While the company didn’t say it was a driver, Groups will drive more registrations and more content into the core of Google’s services. The interface is similar to Gmail. Some info from Google is in the extended entry.
Update: Google Blog posted and then retracted a Google Groups announcement, but the cats over at Slashdot caught it. Slashdot appears to be tearing Google a new one for not supporting search by date and deep linking, among other things. I will check into the deep linking thing. If they don’t support that, I am sure it’s an oversight. Not supporting deep linking into content that you want as part of the Index is insane. Thanks to reader Brian for the tip.
More on GGroups and all that:
From the WSJ’s Online Journal:
Google News is a great site…(but) If you want to know what the top stories are, you’re better off going to a news site that has an actual human editor (at this point we’d be remiss if we didn’t plug The Wall Street Journal Online), but some of the stuff that makes its way through Google’s algorithms can be a source of high hilarity.
Example: A left-wing site called Axis of Logic published a satirical (though unfunny) article yesterday titled “Canadians Authorities Arrest U.S. President Bush on War Charges,” and it ended up as Google’s top story. Seriously.
Movies rang up a healthy increase in 2003, from $60 billion to $64 billion, and music revenues stayed even. I imagine that fact will be used by both sides in the piracy debate, but my sense is this: if it weren’t for Napster (the old Napster, that is), those revenues would have been far lower, as no one would have sampled the tail, and begun to buy down it.
(Thanks for the tip, Gary)
From SEW. First, the Google lawyers are busy, suing a competitor of their recent acquisition Keyhole. Second, Gary does an appreciation of Eugene Garfield, father of citation analysis, whose spirit was most definitely in the room last night as I spoke with Larry and Sergey (for the book). Larry was quite energized by the portion of our conversation that dealt with annotation and citation analysis. (After all, what is the web but one great big annotation engine, right?!)
Lastly, noting recent reports on the Chinese Google News controversy, Danny furthers an issue dear to my heart, transparency with regard to how results are obtained (my take on the China portion of this issue is here).
From The Standard (I still love being able to say that, even if the site is only running IDG newsservice stuff):
America Online Inc. (AOL) on Tuesday released a preview version of a new Netscape Web browser that is based on the open-source Firefox Web browser, but also supports Microsoft Corp.’s Internet Explorer (IE) browser engine. IE is part of Windows and is used by the great majority of Web users. Many Web sites have been designed specifically to work with the Microsoft browser and may not work correctly in browsers using other engines, including the Gecko engine in Firefox.
Developing: According to MediaPost. Am trying to get confirmation and details. In any case, Overture always had a stronger case than Google, according to folks I’ve spoken with, as they protected trademarks more robustly and did not adopt the blanket “any keyword can be sold” policy that Google did back in the Spring.