Nutch Update


The recent announcement of Mozdex, which is leveraging the Nutch open source engine, reminded me to ping Doug Cutting and see how things were going with the Nutch project. He replied that while Mozdex only crawls a few million pages, it’s a start, and he was pleased to see folks starting to use Nutch. He also pointed to ObjectsSearch, another site which uses Nutch.

But Doug said that his focus with Nutch these days is not on getting a major open source alternative to Google or Yahoo out there, though that remains a long-term goal. Instead, he reports:

I’m opting for organic growth: get some users and developers
will follow.

In this vein, I put together a demonstration a few weeks ago for Oregon
State University. They love it. It’s at:

http://devjr.cws.oregonstate.edu:8080/en/search.html

Compare this to their Google appliance at:

http://search.oregonstate.edu/web/

The quality is pretty close, and the price a lot less. It took me about
20 steps to build that demo, I want to reduce that to just a couple, to
put it within the grasp of any campus webmaster. Then I’ll turn it over
to them to operate themselves.

I’m also contracting to build a Nutch-based search engine for the
Creative Commons, searching everything which uses one of their licenses.

Meanwhile, folks at a few universities are starting to use Nutch as a
platform for larger-scale search experiments.

Combined, these efforts should continue to push Nutch’s scalability at
the same time as build an installed base, all without having to first
find a sugar daddy.

4 thoughts on “Nutch Update”

  1. I’m also putting together an install of Nutch. I’ve got a few servers dedicated to the experiment. I’m going to go for more of the niche vertical market segment and see what happens. I’m mostly interested in tweaking algorithms for specific niche industries based on the demographics of the typical “searcher” in that segment.

    I’ve said it before: users don’t want a bunch of hoops and complicated steps to jump through in order to get high quality results. They want to punch in a URL, type their keywords and hit search. It’s that simple.

    I’m just thankful that guys like Doug Cutting are working on a project that’s going to save all of us a lot of time and energy. I’ll be contributing as much as time permits to Nutch because I see it as an equalizer for the small guy.

    I’m all about Power to the People!

  2. I am writing regarding any NEW information relative to the present status of Nutch. My searches have only resulted in information ending in the 2004-2005 time frame. Any help relative to current, as in July 2006, information relative to Nutch’s status will be appreciated.

