I have read most of the paper, which has been accepted at WWW 2010 (it reminded me of all the search papers I read in preparation for writing The Search), and found a lot worthy of interest.
First, the paper’s authors, both of whom have worked at Google, clearly have a sense of potential history here: they not only crib the title of Google’s original paper, they also mirror its first line (substituting “Aardvark” for “Google”, of course). Now that’s some b*lls. Of course, when Larry and Sergey first presented Google, they couldn’t even get their paper accepted. (It took three tries, if I recall correctly. Someone should write a book about that…)
Second, it’s unusual for a Valley startup to lay out its architecture and technological specs as willingly as Aardvark has. There’s a lot of math in here that I couldn’t parse even if I had the will to try.
Third, we learn some cool things about how Aardvark works. Check this quote out: “…unlike quality scores like PageRank [13], Aardvark’s quality score aims to measure intimacy rather than authority. And unlike the relevance scores in corpus-based search engines, Aardvark’s relevance score aims to measure a user’s potential to answer a query, rather than a document’s existing capability to answer a query.”
Also interesting: “this involves modeling a user as a content-generator, with probabilities indicating the likelihood she will likely respond to questions about given topics. Each topic in a user profile has an associated score, depending upon the confidence appropriate to the source of the topic. In addition, Aardvark learns over time which topics not to send a user questions about…”
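To make those two quotes a bit more concrete, here is a toy sketch in Python of how I read them. It is not Aardvark's code, and it skips all the math in the paper. The names (`UserProfile`, `rank_answerers`), the example numbers, and the choice to multiply an intimacy score by a topic relevance score are my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile: a user modeled as a potential answerer."""
    name: str
    # topic -> confidence score in [0, 1] (assumed scale)
    topics: dict = field(default_factory=dict)
    # topics the system has learned not to route to this user
    muted: set = field(default_factory=set)

    def mute(self, topic):
        self.muted.add(topic)

    def relevance(self, query_topics):
        """Potential to answer: weighted overlap of query topics and profile."""
        return sum(
            weight * self.topics.get(topic, 0.0)
            for topic, weight in query_topics.items()
            if topic not in self.muted
        )


def rank_answerers(intimacy, candidates, query_topics):
    """Order candidates by intimacy * relevance (an assumed combination).

    intimacy maps a candidate's name to how close they are to the asker.
    """
    def score(user):
        return intimacy.get(user.name, 0.0) * user.relevance(query_topics)

    return [u.name for u in sorted(candidates, key=score, reverse=True)]


# Made-up example: Bob knows less about hiking but is closer to the asker.
alice = UserProfile("alice", {"hiking": 0.9, "cooking": 0.4})
bob = UserProfile("bob", {"hiking": 0.5})
closeness = {"alice": 0.2, "bob": 0.8}
print(rank_answerers(closeness, [alice, bob], {"hiking": 1.0}))
```

The point of the sketch is the contrast with document search: the score attaches to a person and to the asker's relationship with that person, so a close friend with modest knowledge can outrank a distant expert, and muting a topic takes someone out of the running for it.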
There’s a lot more like this in the paper, and it’s worth reading. The authors even tested Aardvark’s results against Google’s, with the outcome being something of a push (see the last page for details). Not bad for an upstart service.
Lastly, we learn a lot about the service, thanks to a number of charts, including something about Aardvark’s growth, which I had not really anticipated. It’s up and to the right, as you can see from the chart.