As many have already noted, last week at Web 2.0 Peter Norvig, Google director of search quality, demonstrated word clustering, “named entities,” and machine translation technology to the audience. The translation software was impressive but somehow lacked zing – “good enough” translation doesn’t seem like much of a revelation anymore, though that in itself is an extraordinary achievement. Norvig showed translations from Arabic and Chinese, both languages that differ significantly from English. Google already has translation features built into its engine (from a third party), but this hand-rolled stuff was far more powerful, it seemed to me.
In any case, the demos that really got the audience going (and me, to be honest) were the named entities and the clustering technology. Seeing anything behind the veil of Google’s real research and development is of course a revelation, but seeing something so clearly ready for prime time felt rather close to a declaration of where Google is heading, in particular given the recent moves in the personalization and clustering space from Amazon, Ask, Vivisimo, and Yahoo.
“Named entity extraction” is a relatively new project, which Norvig said Google had been working on for about six months. As Norvig explained the concept – essentially identifying semantically important concepts and the meaning wrapped around them – I couldn’t help but think of WebFountain and my wish (near the end of the post) that Google would add a bit of IBM’s semantic peanut butter into its PageRank chocolate.
Norvig also showed an entertaining (and live) demo of clustering, which he claimed drew on the “largest Bayesian database of clusters” extant. Hmmm.
From the eWeek story covering the news:
For example, Norvig said, researchers are looking for ways to break down sentences by looking for a phrase like “such as” and grabbing the names that follow it. The goal is to not only pull out the name but also its clusters, so that a name such as “Java” can be associated both with the computer language and with language in general, Norvig said.
“We want to be able to search and find these [entities] and the relationships between them, rather than you typing in the words specifically,” Norvig said.
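The “such as” trick Norvig describes is a classic lexical pattern for harvesting category–member relationships from raw text. As a rough illustration (my own toy sketch, not Google’s actual system, which would rely on parsing and many more patterns at web scale):

```python
import re

def extract_such_as(sentence):
    """Toy 'X such as A, B and C' extractor: returns (category, [members]).
    A sketch only -- real entity-extraction systems use full parsing and
    dozens of patterns, not a single regex."""
    m = re.search(r"(\w[\w ]*?)\s+such as\s+([\w ,]+?)(?:\.|$)", sentence)
    if not m:
        return None
    category = m.group(1).strip()
    # Split the list of names on commas and the word "and"
    members = [x.strip() for x in re.split(r",|\band\b", m.group(2)) if x.strip()]
    return category, members

print(extract_such_as("languages such as Java, Python and Perl."))
# -> ('languages', ['Java', 'Python', 'Perl'])
```

Run over billions of pages, even a crude pattern like this yields the kind of clusters Norvig described – “Java” showing up under both programming languages and languages in general.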
This has potentially interesting implications in next-generation ranking methodologies, for one, but combined with clustering, it signals that Google is serious about taking what one might call the UI plunge.
What do I mean by that? Well, of all the major engines, only Google has strictly maintained what might be called the C prompt interface to search: put in yer command, get out yer list of results (Google Local is a departure, but it’s still in beta). Yahoo, Ask, A9 and others have begun to twiddle in pretty significant ways with evolved interfaces which – by employing your search history, your personal data, clustering, and other tricks – deliver more filtered and intentional results (though it is still arguable whether they are more relevant). I sense it’s only a matter of time before Google takes this approach as well, and Norvig’s demo certainly points that way. After all, it’s not often that Google decides to give us a glimpse behind the curtain, and coupled with Google board member John Doerr’s semi-announcement the day before (he told the audience that Google would become “the Google that knows you”), I think the UI plunge might come sooner than we all expect.
Update: Lazy linking on my part – the clustering paper is about hardware (though it is really interesting…)