Had a good talk today with Jon Kleinberg, a Cornell professor whom some credit with work that inspired PageRank, though he's far too modest to accept that mantle. He says he's proud that, in academic citations, his work on hubs and authorities appears alongside PageRank as seminal to the current state of web search. Talking to Kleinberg was great for the historical perspective of my book (he was at IBM Almaden in the 1996-97 timeframe, near Stanford, working on very similar stuff), but it was also very interesting to hear his views on where search might be going.
He agrees with the consensus view that search is in its early days. The really hard problems, natural language queries for example, have yet to be solved. "It's kind of interesting to see how far search has gotten without actually understanding what's in the document," he noted. In other words, search has gotten pretty sophisticated using keyword matching and link/pattern analysis, but search technology still has no idea what a document actually *means*, in the human sense.
Kleinberg outlined one of his core frustrations with search engines, one I am sure all readers have experienced: the inverse search. In this scenario, you know there is a core term or phrase that, if typed into Google, would yield exactly the set of pages you're looking for. But you don't know the term, and your attempts to divine it keep bringing up frustrating, irrelevant results. Say you want to know more about a regulation you've heard of, the one that says you have the right to fly – with no additional charge – on a different airline if the one you booked cancels your flight. You want to find out the specifics of that regulation, but how?
You might Google "regulation airline overbooked" or some such. That takes you to a few pages that are relevant – if you're in Europe. So maybe try it again, this time with a "-Europe" (we're already well beyond the query syntax most searchers use, but never mind that). Nope – at least not in the first few pages. Maybe take out all the EC and EU references? Nope, but now you're a little smarter on airline policies as interpreted by the Cato Institute. You get my point.
But if you knew that the regulation was in fact called Rule 240, you'd be in like Flynn. This is the "knowing the definition but not the term" problem, and it's an area Kleinberg thinks could use some improvement. After doing that exercise, and realizing how often I run headlong into this very cul-de-sac, I must agree. (I bet Tara has some useful hacks for getting around this.)
Other areas where Kleinberg expects improvement in the next five to ten years: adding a time axis to search results; local, personalized, and social-network search; and "wordbursting"-based search and analytics, à la Feedster and Technorati. (He has a paper on this, and Scientific American ran it through the Not Born A Total Geek filter.)
In any case, Kleinberg had a lot to say about a lot, and I wish I could put it all down here, but…gotta save it for the book, and all that. I have to say, I got the sense that Kleinberg is really just getting started in his work. He's already been prolific, and he has a long career ahead of him.