Being Jon Kleinberg

By - February 03, 2004

Had a good talk today with Jon Kleinberg, a professor at Cornell whom some credit with work that inspired PageRank, though he’s far too modest to accept that mantle. He says he’s proud that in academic citations, his work on hubs and authorities is cited alongside PageRank as seminal to the current state of web search. While talking to Kleinberg was great for the historical perspective of my book (he was at IBM Almaden in the 96/97 timeframe, near Stanford, working on very similar stuff), it was also very interesting to hear his views on where search might be going.

He agrees with the consensus view that search is in its early days. The really hard problems – natural language queries, for example – have yet to be solved. “It’s kind of interesting to see how far search has gotten without actually understanding what’s in the document,” he noted. In other words, search has gotten pretty sophisticated using keyword matching and link/pattern analysis. But search technology still has no idea what a document actually *means* – in the human sense.

Kleinberg outlined one of his core frustrations with search engines, one I am sure all readers have experienced: the inverse search. In this scenario, you know there is a core term or phrase that, if typed into Google, would yield exactly the set of pages you’re looking for. But you don’t know the term, and your attempts to divine it continually bring up frustrating and non-relevant results. Say, for example, you want to know more about that regulation that you’ve heard about, the one that says you have the right to fly – with no additional charge – on a different airline if the one you are on cancels your flight. You want to find out the specifics of that regulation, but how?

You might Google “regulation airline overbooked” or somesuch. That takes you to a few pages that are relevant – if you’re in Europe. So maybe try it again, this time with a “-Europe” (we’re already way over the heads of normal searchers’ syntax, but never mind that). Nope – at least not in the first few pages. Maybe take out all the EC and EU references? Nope, but now you’re a little smarter on airline policies as interpreted by CATO. You get my point.

But if you knew that the regulation was in fact called FAA Rule 240, you’d be in like Flynn. This is the “knowing the definition but not the term” problem, and it’s an area Kleinberg thinks could use some improvement. After doing that exercise, and realizing how often I do in fact run headlong into this very cul-de-sac, I must agree (I bet Tara has some useful hacks to get around this?).

Other areas where Kleinberg sees improvement in the next five to ten years: the addition of a time axis in search results, local/personalized/social networking search, and “wordbursting”-based search and analytics (a la Feedster/Technorati; he has a paper on this, which was run through the Not Born A Total Geek filter in Scientific American).

In any case, Kleinberg had a lot to say about a lot, and I wish I could put it all down here, but…gotta save it for the book, and all that. I have to say, I got the sense that Kleinberg is really just getting started in his work. He’s been prodigious, and has a long career ahead of him.

Queries Getting Denser

Via DMNews, saw this study from OneStat (a web analytics company) on query trends. It basically said that folks are starting to use more words in their queries. Why? They’re not getting the results they want? They know more words will mean a better result? Little of both? Not much here on that piece of the story.

Slowly, The Battleship Comes About

Verizon’s Yellow Pages is finally getting into the pay-per-click game, Wonk reports via MediaPost. The massive phone co. plans to revamp its website to focus on the local advertising market. Funniest quote: the Verizon guy claiming this is not in response to Yahoo/Google handing them their ass. He does have a pretty funny slap at the leader: “If you want to find out about the history of plumbing, you go to Google. But if your sink’s backed up at 2 a.m., we get you right to what you want to know.”

I dunno, but if my plumbing is backed up at midnight, something tells me turning on the computer is not first on my mind.

This Search Blows

Blowsearch aggregates 20 different engines and claims to be “fast as the wind.” The site also has a toolbar that’s got some buzz round the search community (link via Search Engine Lowdown).

Udell on Scylla and Charybdis

Over at InfoWorld (thanks Matt) Jon Udell is working out what might be a neat hack between the full text approach to search found at most search engines, and the rather utopian approach of the fully structured semantic web. It involves, among other things, converting RSS feeds into XHTML. Not for the faint of heart, but an interesting angle in terms of grokking how useful search may evolve from the feed-o-sphere….
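
For the curious, here is a rough sketch of what turning an RSS feed into XHTML could look like. To be clear, this is not Udell's code and may differ from his approach: the function name `rss_to_xhtml`, the sample feed, and the choice of `div`/`h2`/`p` markup are all made up for illustration, and it assumes a plain RSS 2.0 feed whose items carry `title`, `link`, and `description`.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def rss_to_xhtml(rss_text: str) -> str:
    """Render an RSS 2.0 feed as a simple XHTML page (illustrative only)."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        raise ValueError("no <channel> element found; expected RSS 2.0")

    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = channel.findtext("title", default="")
    body = ET.SubElement(html, "body")

    # One <div class="item"> per feed item: linked headline plus description.
    for item in channel.findall("item"):
        entry = ET.SubElement(body, "div", {"class": "item"})
        heading = ET.SubElement(entry, "h2")
        anchor = ET.SubElement(
            heading, "a", {"href": item.findtext("link", default="")}
        )
        anchor.text = item.findtext("title", default="")
        ET.SubElement(entry, "p").text = item.findtext("description", default="")

    return ET.tostring(html, encoding="unicode")


# Hypothetical sample feed, not taken from any real site.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Demo feed</title>
<item><title>First post</title><link>http://example.com/1</link>
<description>Hello, feed-o-sphere.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    print(rss_to_xhtml(SAMPLE_FEED))
```

The point of the exercise is that once feed items are well-formed XHTML, they can be both indexed as full text and queried by structure, which is the middle ground described above.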

When Gary Price Writes…

…many folks listen. Gary is the Editor of ResourceShelf and a strong voice in cutting edge librarian/geek culture. In this piece, guest written for, Gary lists his top ten gripes about Google. Many of them run along a theme which might best be summed up as failures to nurture the open, geek culture from which Google sprang.


1) Google needs to fix several advanced search problems. Many of them have been known for several months. These are things that should work….

2) Google’s page estimates haven’t been close to accurate for many months. I’ve been told that they’re, “just estimates.” However, can’t estimates be more accurate?

4) The company should clearly state that they don’t show all backlinks when running a link: search.

9) In late August IEEE announced that Google was crawling abstracts of their publication database. According to the news release, the project was to be completed by September. That was five months ago and a very small percentage of IEEE material appears in Google. What happened?

10) In 2001 Google spokesperson David Krane told News.Com, “…we’ve firmly established ourselves as the No. 1 search service on the Internet, and this can be attributed to our laser-like focus on a search-only business model.” It’s obvious that this business model is gone.

His conclusion:

The company now has many constituencies to please and will have even more once they go public. Is Google doing what AltaVista, Excite, and so many others did by trying to become all things to all people?