The Anatomy of a Large-Scale Social Search Engine

The folks at Aardvark have posted an ambitious paper over on the ‘vark blog. Titled after Brin and Page’s original “Anatomy of a Large-Scale Hypertextual Web Search Engine”, the paper presents the Aardvark engine and, in its authors’ words: “describes the fundamental differences between the traditional ‘Library’ paradigm of web search — in which answers are found in existing online content — and the new ‘Village’ paradigm of social search — in which answers arise in conversation with the people in your network.”

I have read most of the paper, which has been accepted at WWW 2010 (it reminded me of all the search papers I read in preparation for writing The Search), and found a lot worthy of interest.

First, the paper’s authors, both of whom have worked at Google, clearly have a sense of potential history here, in that they not only crib Google’s original paper’s title, they also mirror the first line (substituting “Aardvark” for “Google”, of course). Now that’s some b*lls. Of course, when Larry and Sergey first presented Google, they couldn’t even get their paper accepted (it took three tries, if I recall correctly. Someone should write a book about that…).

Second, it’s unusual for a Valley startup to lay out its architecture and technological specs as willingly as Aardvark has. There’s a lot of math in here that I couldn’t parse even if I had the will to try.

Third, we learn some cool things about how Aardvark works. Check this quote out: “…unlike quality scores like PageRank [13], Aardvark’s quality score aims to measure intimacy rather than authority. And unlike the relevance scores in corpus-based search engines, Aardvark’s relevance score aims to measure a user’s potential to answer a query, rather than a document’s existing capability to answer a query.”

Also interesting: “this involves modeling a user as a content-generator, with probabilities indicating the likelihood she will likely respond to questions about given topics. Each topic in a user profile has an associated score, depending upon the confidence appropriate to the source of the topic. In addition, Aardvark learns over time which topics not to send a user questions about…”
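To make those two quotes a bit more concrete, here is a minimal sketch of how a question might be routed under that kind of model. It is not Aardvark's actual formula or code: the names (`Candidate`, `relevance`, `rank`), the 0-to-1 score ranges, and the choice to simply multiply intimacy by relevance are all assumptions made for illustration. The idea it tries to capture is that each candidate answerer gets a relevance score (how likely she is to answer questions on the topics in play) and a quality score (how close she is to the asker), and that topics a user has been learned not to want are skipped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A hypothetical potential answerer, seen from the asker's side."""
    name: str
    intimacy: float  # assumed 0..1 closeness to the asker (the "quality" side)
    topic_scores: dict[str, float] = field(default_factory=dict)  # topic -> likelihood of answering
    muted_topics: set[str] = field(default_factory=set)  # topics learned not to send this user


def relevance(candidate: Candidate, question_topics: dict[str, float]) -> float:
    """Potential to answer: sum of (topic weight in question) * (user's score for topic)."""
    return sum(
        weight * candidate.topic_scores.get(topic, 0.0)
        for topic, weight in question_topics.items()
        if topic not in candidate.muted_topics
    )


def rank(candidates: list[Candidate], question_topics: dict[str, float]) -> list[str]:
    """Order candidates by intimacy * relevance (an assumed combination), dropping zero scores."""
    scored = [(c.intimacy * relevance(c, question_topics), c.name) for c in candidates]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Made-up example data: a question that is mostly about cooking, a little about travel.
question = {"cooking": 0.75, "travel": 0.25}
people = [
    Candidate("ana", intimacy=0.9, topic_scores={"cooking": 0.8}),
    Candidate("bo", intimacy=0.3, topic_scores={"cooking": 0.9, "travel": 0.5}),
    Candidate("cy", intimacy=0.9, topic_scores={"cooking": 0.9}, muted_topics={"cooking"}),
]
print(rank(people, question))
```

In this toy data, a close friend with decent topic knowledge can outrank a distant acquaintance who knows more, which is roughly what "intimacy rather than authority" suggests. A user who has opted out of a topic never gets the question at all.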

There’s a lot more like this in the paper, and it’s worth reading. The authors even did a test of Aardvark results against Google, with the results being something of a push (see the last page for details). Not bad for an upstart service.

Lastly, we learn a lot about the service, thanks to a number of charts, including something about Aardvark’s growth, which I had not really anticipated. It’s up and to the right, as you can see from the chart.

7 thoughts on “The Anatomy of a Large-Scale Social Search Engine”

  1. I think that one of the issues with current search engines that Aardvark did not point out is the proliferation of unstructured data into most of these systems. Until there is a format for structured input, we will live in a tech world where Google owns the game.

    If content creation had a standard structured format, search would become an open part of the internet stack that could be easily expanded and improved.

  2. My group at SRI’s Artificial Intelligence Center published a paper at the 2007 KDD: http://www.ai.sri.com/pubs/files/1523.pdf

    In particular, Basu (then at SRI, now at Google Research) and Banerjee (U Minn) developed a model called the Social Query Model that is a true formal extension of Page Rank to a network that includes social nodes.

    Based on this work (SQM and the iLink routing model), the SRI group developed FAQ and other routing systems (under DARPA funding) that are actively used in various military settings.

  3. This is an interesting post and I am also keen to read the original article. While I am not willing to reveal everything that happens inside TipTop, I think each of you should also spend some time understanding http://FeelTipTop.com to see better what Search will be like a year or two from now.

  4. I used to work with one of the co-authors (Damon Horowitz), and he is a scary-bright guy indeed. But I don’t think he’s ever worked at Google. What makes you think he has?

    Mahlen
