Creative Commons Search, Now In Beta - John Battelle's Search Blog

Creative Commons Search, Now In Beta

By - September 02, 2004

Doug Cutting reminds me that his Nutch open source engine is powering a beta version of Creative Commons search. This is a great example of a domain-specific search application: the engine crawls and indexes all CC-licensed sites and lets you find stuff by how you might want to use it. As Doug points out, there’s no way the Creative Commons could have built an engine like this had it not been for open source. Cool….

4 thoughts on “Creative Commons Search, Now In Beta”

  1. ID:entity says:

    I’m surprised that Nutch hasn’t released a “SETI” type download for additional computing power – an open source engine that’s powered by the computing power of the users using it. I’m sure a member of the open source community could develop one for them.

    Now that’s taking “personalisation” to the next level: helping to power your own queries.

  2. ID:entity, that suggestion is answered in the Nutch FAQ.

    Will Nutch be a distributed, P2P-based search engine?

    We don’t think it is presently possible to build a peer-to-peer search engine that is competitive with existing search engines. It would just be too slow. Returning results in less than a second is important: it lets people rapidly reformulate their queries so that they can more often find what they’re looking for. In short, a fast search engine is a better search engine. I don’t think many people would want to use a search engine that takes ten or more seconds to return results.

    That said, if someone wishes to start a sub-project of Nutch exploring distributed searching, we’d love to host it. We don’t think these techniques are likely to solve the hard problems Nutch needs to solve, but we’d be happy to be proven wrong.

    Will Nutch use a distributed crawler, like Grub?

    Distributed crawling can save download bandwidth, but, in the long run, the savings is not significant. A successful search engine requires more bandwidth to upload query result pages than its crawler needs to download pages, so making the crawler use less bandwidth does not reduce overall bandwidth requirements. The dominant expense of operating a search engine is not crawling, but searching.

  3. ID:entity says:

    Mike, thank you for the update. I should have been more specific with my post also (e.g. the front end query servers… probably not).

    A colleague of mine gave me this little nugget, and I think that we are both coming from the latter perspective…

    Samuel Johnson (1709 – 1784), quoted in Boswell’s Life of Johnson

  4. Pat McDonald says:

    We have made a Creative Commons toolbar for Firefox.
    It links to key areas of Creative Commons and searches all the Google engines as well as Creative Commons.

    Works up to PR1.0