John Battelle's Search Blog

Nutch Update

The recent announcement of Mozdex, which is leveraging the Nutch open source engine, reminded me to ping Doug Cutting and see how things were going with the Nutch project. He replied that while Mozdex only crawls a few million pages, it’s a start, and he was pleased to see folks starting to use Nutch. He also pointed to ObjectsSearch, another site which uses Nutch.

But Doug said that his focus these days with Nutch is not to try to get a major, open source alternative to Google or Yahoo out there, though that remains a long term goal. Instead, he reports:

I’m opting for organic growth: get some users and developers
will follow.

In this vein, I put together a demonstration a few weeks ago for Oregon
State University. They love it. It’s at:

http://devjr.cws.oregonstate.edu:8080/en/search.html

Compare this to their Google appliance at:

http://search.oregonstate.edu/web/

The quality is pretty close, and the price a lot less. It took me about
20 steps to build that demo; I want to reduce that to just a couple, to
put it within the grasp of any campus webmaster. Then I’ll turn it over
to them to operate themselves.

I’m also contracting to build a Nutch-based search engine for the
Creative Commons, searching everything which uses one of their licenses.

Meanwhile, folks at a few universities are starting to use Nutch as a
platform for larger-scale search experiments.

Combined, these efforts should continue to push Nutch’s scalability at
the same time as build an installed base, all without having to first
find a sugar daddy.
