Become Talks On AIR



Announced about a month ago, Become.com is a shopping search engine that its creators claim is vastly superior to its competitors. These guys can put some wood behind that particular arrow: collectively, they were responsible for MySimon (now owned by Cnet) and Wisenut (now owned by LookSmart).

I spoke to Michael Yang, Become's CEO, and Yeogirl Yun, the CTO. The founders have developed a new ranking technology – they call it the “Affinity Ranking Index,” or AIR – which applies a unique combination of math and human editing. Before it does any math, Become puts people into the process of determining relevance for particular shopping-related search topics. A team of editors contextualizes pages based on how they relate to each other, then those pages are crawled, and Become’s AIR algorithm is applied.

I can’t really grok how AIR works, but here is how a draft release describes it: “AIR identifies exceptional web pages by understanding the level of interconnection between valuable sites from within specific fields of interest. AIR evaluates a web page based on what other ‘knowledgeable’ sites in that specific field say about the page, and also evaluates the page based on what the page says about other ‘knowledgeable’ sites in the specific field.”

“Unlike Become.com’s AIR, Google’s PageRank estimates the popularity of a given web page by looking only at links into the page and doing so without any understanding of context. Become.com’s AIR, on the other hand, considers a site to be valuable if 1) it receives links from valuable sites within a similar topic of interest and 2) if it provides links to other valuable sites within a similar topic of interest (while minimizing links to off-topic sites).”
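Become didn't disclose AIR's math, so the sketch below is only a way to make the release's contrast concrete. It sets textbook PageRank (score flows in through in-links, topic ignored) beside a hypothetical topic-restricted score modeled on Kleinberg's hubs-and-authorities idea, which mirrors the two quoted conditions. The function names, the `focus` penalty for off-topic out-links, and the toy graph are my own assumptions, not Become's method.

```python
def pagerank(links, damping=0.85, iters=50):
    """Textbook PageRank via power iteration.

    A page's score comes only from the pages linking *into* it;
    the topic of those pages plays no role.
    """
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # Dangling page (no out-links): spread its rank evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def topic_hub_authority(links, topic, iters=50):
    """HYPOTHETICAL topic-aware score modeled on Kleinberg's HITS.

    Not Become's AIR (undisclosed); it only mirrors the two conditions
    quoted in the release:
      1) authority: receiving links from good on-topic hubs;
      2) hub: linking to good on-topic authorities, scaled by the share
         of the page's out-links that stay on topic (assumed penalty).
    Pages outside `topic` are ignored entirely.
    """
    on = set(topic)
    auth = {p: 1.0 for p in on}
    hub = {p: 1.0 for p in on}
    for _ in range(iters):
        # Condition 1: sum the hub scores of on-topic pages linking in.
        new_auth = {p: 0.0 for p in on}
        for src in on:
            for dst in links.get(src, []):
                if dst in on:
                    new_auth[dst] += hub[src]
        # Condition 2: sum the authority of on-topic targets, scaled down
        # for pages whose links wander off topic.
        new_hub = {}
        for src in on:
            outs = links.get(src, [])
            on_outs = [d for d in outs if d in on]
            focus = len(on_outs) / len(outs) if outs else 0.0
            new_hub[src] = focus * sum(new_auth[d] for d in on_outs)
        # Normalize each vector to sum to 1 so scores stay bounded.
        a_total = sum(new_auth.values()) or 1.0
        h_total = sum(new_hub.values()) or 1.0
        auth = {p: v / a_total for p, v in new_auth.items()}
        hub = {p: v / h_total for p, v in new_hub.items()}
    return auth, hub


if __name__ == "__main__":
    # Toy shopping graph, invented for illustration: an off-topic
    # celebrity-news page also links to camera_a.
    links = {
        "reviews": ["camera_a", "camera_b", "shop"],
        "blog": ["camera_a", "celebrity_news"],
        "shop": ["camera_a"],
        "camera_b": ["camera_a"],
        "celebrity_news": ["camera_a"],
        "camera_a": [],
    }
    topic = {"reviews", "blog", "shop", "camera_a", "camera_b"}
    print(pagerank(links))
    auth, hub = topic_hub_authority(links, topic)
    print(auth)
    print(hub)
```

In the toy graph, PageRank credits camera_a for the in-link from celebrity_news, while the topic-aware score drops that page entirely and scales the blog's hub score by 0.5, since half its out-links leave the topic. The catch is that the topic set has to come from somewhere; in Become's telling, the editors seem to play that role.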

I pressed Yang and Yun for more details – PageRank is published, after all. But they were mum, save adding that their inspiration was Applied Physics and Engineering Dynamics – two fields in which, I must confess, I am not well versed. I chided them a bit – after all, calling your new algorithm AIR, but not publishing it might just open one up to some jokes – but they do have the right to protect trade secrets, after all.

The proof is in the use of the engine itself. It’s in a registration-based beta, so you’ll need to sign up. I used it, although cursorily, and I did like how it seems to understand the intent behind a shopping query. It’s not a product search engine like Froogle; instead, it seems to give you a lot of information that helps you through the buying process. Yang added that a comparison feature is coming.

Yang and Yun hope to take Become and AIR across many vertical search areas – health, people, travel, etc. Given these guys’ backgrounds, it’s worth checking out.

10 thoughts on “Become Talks On AIR”

  1. “2) if it provides links to other valuable sites within a similar topic of interest (while minimizing links to off-topic sites)”

    I had the same idea myself.

  2. Before they take a swipe at the biggest search engine on the ‘net, they should probably brief themselves on everything that’s happened since that famous paper of 1998 (that’s 7 years ago now, Yang and Yun). Suggesting that Google’s ranking is strictly linear and based solely on inbound links is ridiculous. How do you suppose they were able to extrapolate the exact same search algorithm to the Google Mini, Froogle, Google Scholar, Google Local, Google News, Gmail, Google Desktop Search, Google Groups, Google Print, Google Catalogs and anything I missed?

    Hint: They didn’t.

  3. Re the simple phrase: “PageRank is published”. This is true, but misleading, as it’s just one factor in how Google scores pages. I think I read that there are approximately 200 factors in a page’s score; others I’ve heard of are that title matches score higher, as do words in bold or in h1 or h2 tags.

    Great blog BTW.

    thx,
    Dave

  4. Dave, great blog; I enjoy reading your insights.

    It sounds like quite a bit of calculation for each page (both intensive human and computer review). How can it possibly scale without extraordinary cost?

  5. Sounds very much like Authorities & Hubs (Jon Kleinberg ’98, Authoritative Sources in a Hyperlinked Environment) – but perhaps the editors either select the hubs or enable the crawler to backrub from useful authorities.

    So it’s a hybrid, where the context for the crawler is pre-determined, then the algo calculates the score in terms of relevance.

    Well maybe, could be wrong!!

  6. I tried it looking for a print server. I got 10m responses but no way to search by price. Either I don’t understand or …

  7. > after all, calling your new algorithm AIR, but not publishing it might just open one up to some jokes – but they do have the right to protect trade secrets, after all.

    Well, it ain’t quite so, at least not all of it. Check out this page, courtesy of US PTO:

    http://appft1.uspto.gov/netacgi/nph-Parser?d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220030208482%22.PGNR.&OS=DN/20030208482&RS=DN/20030208482

    It has a pretty detailed explanation of how the algorithm works. I’ve known about this link for a long time, but I just didn’t bother to read through it, as I didn’t believe mathematics itself could lead to much user experience improvement. Maybe some curious people can study it and let us know how “superior” it is?

    It’s also kinda amusing to hear that these two guys were trying to hide something that has actually been public for a long time. Oh, did I hear somebody say …”ostrich”?!
