#5: Nutch Presages a New Kind of Search Engine

Open-source search: in an age where innovation is increasingly siloed in large companies, this is a good idea whose time has come. I didn’t like the “watch out” angle, but it does get a reader’s interest.

THE MESSAGE
Watch Out, Google
Nutch could rewrite the rules of search development — especially with an impressive roster of Internet luminaries now lining up behind it.

By John Battelle, August 08, 2003 (Web Special)

Ask anyone in Silicon Valley what the hottest application on the Internet is today and you can bet their answer will be search. The dealmaking has been nothing short of torrid. Only a year ago there were at least half a dozen major players. Now there are just three: Yahoo (YHOO), which last month bought search giant Overture (OVER) in a $1.6 billion deal; Google, the undisputed king of search; and Microsoft (MSFT), which is busy building a search platform of its own. They’re all fighting to dominate the huge and ballooning market, already worth $2 billion and expected to generate between $6 billion and $8 billion in revenues by 2007.

Search is a game of intellectual property, innovation, and market position. The three combatants all keep jealous watch over their patents (Yahoo, for one, has more than 60), engineering talent (hundreds of Ph.D. holders work at Google), and market advantages (Microsoft — need we say more?). Indeed, search is such a complicated and expensive undertaking that analysts have pegged the cost of market entry at well over $100 million.

All that could change this fall, when a new player strides onto the field.

Meet Nutch, the open-source search engine. Open-source applications are unusual in that the code on which the software runs is not owned by a private, commercial company. Instead, it is bound by a simple license that allows anyone to use, modify, and even profit from it free of charge, as long as they pledge to contribute their own innovations back into the code base. Because of this, anyone will be able to access Nutch’s code and use it for their own ends, without paying licensing fees or hewing to a particular company’s set of rules. Perhaps more important, Google takes a “trust us” approach to search; it says it doesn’t skew its PageRank formula to favor certain sites, but we have no way of knowing for sure. With Nutch, the indexing and page-ranking technologies are all open and visible; you can check them yourself if you have a problem with your page’s ranking. Just as Linux has taken on Windows, Nutch could revolutionize the rules of search-engine development and distribution and pose an enormous threat to Google and other search giants.

“Search is interesting again,” says Doug Cutting, a founder and core project manager at Nutch. Cutting, whose development chops were honed at Xerox (XRX) PARC, Excite and Apple (AAPL), is building Nutch (that’s his toddler’s all-purpose word for “meal”) with a small team of engineers based around the country. But Cutting says they hope that once Nutch is loosed on the world, tinkerers from Romania to China to Palo Alto will help build it into a robust platform, in the spirit of Linux or Apache (which has garnered more than 60 percent of the Web-server software market in just the last couple of years). “Search is the first thing people use on the Web now, and there are fewer and fewer alternatives,” Cutting says. With Nutch, “researchers, university folks, and anyone else can have a test bed to make search better. There are a lot of smart people out there that Google can’t hire.”

Mitch Kapor, who helped found Lotus Development and the Electronic Frontier Foundation and is founder and president of the Open Source Applications Foundation, certainly agrees. He’s thrown his weight behind the project by joining Nutch’s nonprofit board, as has Tim O’Reilly, the CEO of O’Reilly & Associates. Brewster Kahle, the visionary behind the Internet Archive, has also lent his support. Nutch is moving its servers to Kahle’s high-bandwidth location this weekend, a crucial step toward readying the engine for its public debut.

“I love Google,” Kapor says, “but this will push search to places that are not immediately obvious. In terms of research and innovation, there is a clear need for an open platform for search.” Kapor and others imagine new kinds of applications springing from Nutch, ideas that commercially driven companies like Yahoo or Microsoft would never fund. “Search is close to a duopoly,” Kapor points out. “Historically we know there are risks when that happens. It’s too important an application to not be transparent.”

Cutting won’t commit to a specific launch date for the engine, but he says he expects it to go live at Nutch.org sometime early this fall. Because of the move to Kahle’s facility and insufficient hardware (Cutting is looking for additional sponsors), Nutch’s demo, based on an initial crawl of more than 100 million webpages, is not yet open to the public. But Cutting, who together with his development partners has built an impressive résumé in the search field, is confident his latest creation will be a contender once it launches. “It’s fun to go toe-to-toe with market leaders,” he says. “It’s always a challenge to build a better mousetrap.”

John Battelle (john@battellemedia.com) is a visiting professor at the UC Berkeley Graduate School of Journalism, where he directs the business reporting program. He was the founder of the Industry Standard and a co-founding editor of Wired.

Find this article at http://www.business2.com/b2/subscribers/articles/0,17863,515961,00.html

©2003 Business 2.0 Media Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.
