Comments on: WebFountain, the Long Version

By: Dominic

Dominic — Tue, 24 Jul 2007 17:48:05 +0000

Just to wrap up this comment stream: I believe PB was referring to “bisque” as the term for the first firing of pottery. He was probably alluding to the fact that a bisque-fired pot already has fairly full form and function, but without a glaze (which, since around 1900, has been applied in a separate and subsequent “high firing”) it is missing much of the visual detail, color, and waterproofing that come from the final glazing and firing.

On the other hand, I think this technology seems pretty well baked, and I loved the discussion that John gave it. It does seem like the kind of thing that will never see economies of scale; indeed, an increase in web records leads to an exponential increase in the processing power needed to index them. As time goes by, even the most mundane word accumulates more semantic meaning. Bisque, above, is a perfect example. It may originally have meant simply a soup, but humanity is rife with analogy, and soon that same word begins to have validity among sports enthusiasts, and potters as well. A system like IBM’s would have to regenerate its database every time something new like this came up, and since the number of nodes (websites) is changing, and the meaning of the content of those nodes is becoming more multifaceted, there really is an exponential scaling going on here.

Contrast that with user-contributed methods of classification like Flickr and Del.icio.us, and you see how at least one of those multipliers is reduced by the sheer magnitude of continuous processors on the system. I think any system that really works semantically will need to leverage the processing power of people, rather than of a computer, in order to derive the meaning.

By: Doctor

Doctor — Thu, 05 Jan 2006 17:56:35 +0000

Thanks, John.

By: Mike

Mike — Thu, 13 May 2004 05:37:36 +0000

Interesting note on Teoma and Ask Jeeves, Jim.
What’s the URL again?

By: John Battelle

John Battelle — Wed, 24 Mar 2004 20:38:23 +0000

I’ve heard a lot since posting this piece and will be responding once I have a clearer picture…

By: Rick Palmeri

Rick Palmeri — Tue, 23 Mar 2004 20:38:29 +0000

IBM’s PR response is interesting. WebFountain is basically embedding tags into the code of its specialized crawlers. The specialized crawler looking for geographic information decides how to tag a location, and the same is true for people, organizations or whatever. This approach removes the need for everyone to agree on a tag set – IBM will do it for us 🙂

I am also not so sure about WebFountain getting Semagix up and running. I heard a different story – Semagix had to bring their technology to WebFountain projects because WebFountain did not deliver.

Could you check to see what the real scoop is?

Thanks,

By: Jim Lanzone

Jim Lanzone — Thu, 11 Mar 2004 03:38:57 +0000

The article above fails to mention that our Teoma search engine was the first, and remains the only, technology to solve the problem of determining hubs and authorities, now called “subject-specific popularity”. Indeed, the quality of Teoma, which has now scaled to over 2 billion documents and computes hubs and authorities across those documents in real time (something the Clever and Google folks thought couldn’t be done), is largely to thank for the re-birth of Ask Jeeves as a top search property. WebFountain may be taking a new spin on our approach, which of course was inspired by Clever, but as Kleinberg himself pointed out recently in the Wall Street Journal, Teoma is now the technology leader in the space, thanks to its unique approach.

For more about Teoma and the history of Clever, HITS and Kleinberg, I recommend the following paper written last year by search pundit Mike Grehan: http://www.searchguild.com/topic_distillation.pdf
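For readers who have not met the hubs-and-authorities idea, here is a minimal sketch of the iteration described in Kleinberg's HITS work, run on a toy link graph. It is not Teoma's, Clever's or WebFountain's implementation; the function names, the page names and the fixed iteration count are assumptions made only for illustration.

```python
import math


def normalize(scores):
    """Scale scores so their squared values sum to 1 (skip if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hubs, authorities) for a dict mapping page -> pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {page: 1.0 for page in pages}
    auths = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auths = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # A page's hub score is the sum of the authority scores it links to.
        new_hubs = {
            page: sum(new_auths[dst] for dst in links.get(page, []))
            for page in pages
        }

        auths = normalize(new_auths)
        hubs = normalize(new_hubs)

    return hubs, auths


# Hypothetical toy graph: two directory-style pages pointing at two sources.
toy_links = {
    "directory1": ["source_a", "source_b"],
    "directory2": ["source_a", "source_b"],
    "source_a": [],
    "source_b": [],
}
hubs, auths = hits(toy_links)
```

On this toy graph the two directory pages end up with the high hub scores and the two source pages with the high authority scores. Doing the same thing per topic over billions of documents is the hard part, and the sketch says nothing about how that is done.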

And if you haven’t tried Ask Jeeves in a while because you remember us from the “question and answer days”, please give us a try. You’ll be pleasantly surprised at the quality of the results and the overall experience vs. our competitors.

When’s the book due, John?

Jim

Jim Lanzone
VP, Product Management
Ask Jeeves

By: Frank Ruscica

Frank Ruscica — Wed, 10 Mar 2004 14:12:00 +0000

Thanks, John.

FWIW, the class of content I envision being scrupulously author-tagged would derive from a SocNet service providing an intuitive interface through which users maintain Atom-based blogs and link them using FOAF metadata…

This way, search/navigation can be optimized — key for selling blog ads — and users keep control of their personal information…

We’ll see…

Thanks again for the follow-up.

By: John Battelle

John Battelle — Wed, 10 Mar 2004 00:35:41 +0000

Bisque? How so?

bisque

By: pb

pb — Wed, 10 Mar 2004 00:27:43 +0000

Wow, this is a lot of bisque. Wake me up when anything’s actually produced.

By: Dave Buttler

Dave Buttler — Tue, 09 Mar 2004 19:05:21 +0000

Matthew Walker’s point is critical. Much of the deep Web is hidden behind robots.txt files that prohibit crawling. Even if this file is ignored, sites can easily identify a spider by its behavior of grabbing a large amount of content in a short period of time. Choking back the request rate means that the content cannot be retrieved in a reasonable amount of time.
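To put rough numbers on the throttling problem, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content, the bot name, the URLs and the page count are all made up for illustration, and Crawl-delay is a widely used but non-standard directive that not every site or crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Crawl-delay: 10
"""

SECONDS_PER_DAY = 86400


def load_rules(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def days_to_crawl(page_count, delay_seconds):
    """Time needed to fetch page_count pages one at a time at the given delay."""
    return page_count * delay_seconds / SECONDS_PER_DAY


rules = load_rules(SAMPLE_ROBOTS_TXT)
agent = "ExampleBot"  # hypothetical crawler name

blocked = not rules.can_fetch(agent, "http://example.com/search/results")
allowed = rules.can_fetch(agent, "http://example.com/about")
delay = rules.crawl_delay(agent)  # seconds between requests, or None

# One million pages from this single site at the requested delay.
estimate = days_to_crawl(1_000_000, delay)
print(f"blocked={blocked} allowed={allowed} delay={delay}s days={estimate:.1f}")
```

Under these assumed numbers, a polite crawler needs roughly 116 days to pull a million pages from one site, which is the "reasonable amount of time" problem in a nutshell.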