Creative Commons Search, Now In Beta

Doug Cutting reminds me that his Nutch open source engine is powering a beta version of Creative Commons search. This is a great example of a domain-specific search application: the engine crawls and indexes all CC-licensed sites and lets you find stuff by how you might want to use it. As Doug points out, there’s no way Creative Commons could have built an engine like this had it not been for open source. Cool….
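The idea of searching "by how you might want to use it" boils down to filtering results on their Creative Commons license terms. A minimal sketch of that filtering step, in Python, might look like the following; the page records, license tags, and `search_by_use` function are hypothetical illustrations, not Nutch's actual (Java) API:

```python
# Hypothetical sketch of use-based filtering over CC-licensed pages.
# The data and function below are illustrative, not part of Nutch.

pages = [
    {"url": "http://example.org/photo", "license": "by-nc"},
    {"url": "http://example.org/song", "license": "by-sa"},
    {"url": "http://example.org/essay", "license": "by"},
]

def search_by_use(pages, commercial=False, derivatives=False):
    """Return pages whose CC license permits the intended use."""
    results = []
    for page in pages:
        terms = page["license"].split("-")
        if commercial and "nc" in terms:
            continue  # NonCommercial (nc) licenses forbid commercial use
        if derivatives and "nd" in terms:
            continue  # NoDerivs (nd) licenses forbid remixing
        results.append(page)
    return results

# Works that may be used commercially (drops the by-nc photo)
print([p["url"] for p in search_by_use(pages, commercial=True)])
```

The point is that the license becomes a queryable field alongside the page text, which is exactly the kind of vertical twist a general-purpose engine doesn't offer.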

4 thoughts on “Creative Commons Search, Now In Beta”

  1. I’m surprised that Nutch hasn’t released a “SETI”-type download for additional computing power – an open source engine that’s powered by the computing power of the users using it. I’m sure a member of the open source community could develop one for them.

    Now that’s taking “personalisation” to the next level: helping to power your own queries.

  2. ID:entity, that suggestion is answered in the Nutch FAQ. http://www.nutch.org/docs/en/faq.html

    Will Nutch be a distributed, P2P-based search engine?

    We don’t think it is presently possible to build a peer-to-peer search engine that is competitive with existing search engines. It would just be too slow. Returning results in less than a second is important: it lets people rapidly reformulate their queries so that they can more often find what they’re looking for. In short, a fast search engine is a better search engine. I don’t think many people would want to use a search engine that takes ten or more seconds to return results.

    That said, if someone wishes to start a sub-project of Nutch exploring distributed searching, we’d love to host it. We don’t think these techniques are likely to solve the hard problems Nutch needs to solve, but we’d be happy to be proven wrong.

    Will Nutch use a distributed crawler, like Grub?

    Distributed crawling can save download bandwidth, but, in the long run, the savings is not significant. A successful search engine requires more bandwidth to upload query result pages than its crawler needs to download pages, so making the crawler use less bandwidth does not reduce overall bandwidth requirements. The dominant expense of operating a search engine is not crawling, but searching.

  3. Mike, thank you for the update. I should have been more specific in my post as well (e.g. the front-end query servers…probably not).

    A colleague of mine gave me this little nugget, and I think that we are both coming from the latter perspective…

    Samuel Johnson (1709 – 1784), quoted in Boswell’s Life of Johnson
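The FAQ's claim that serving results dominates crawling lends itself to a back-of-envelope check. Every figure below is invented purely for illustration (the FAQ cites no numbers), but the shape of the arithmetic is the argument:

```python
# Back-of-envelope crawl vs. serve bandwidth, per crawl cycle.
# All figures are assumed for illustration, not measurements.

pages_crawled = 100_000_000        # pages fetched per crawl cycle
avg_page_kb = 10                   # average fetched page size (KB)
crawl_gb = pages_crawled * avg_page_kb / 1_000_000

queries_per_day = 10_000_000       # queries served per day
result_page_kb = 20                # average result page size (KB)
days_per_cycle = 30                # length of one crawl cycle (days)
serve_gb = queries_per_day * result_page_kb * days_per_cycle / 1_000_000

print(f"crawl bandwidth: {crawl_gb:,.0f} GB per cycle")
print(f"serve bandwidth: {serve_gb:,.0f} GB per cycle")
```

Under these assumed numbers, serving result pages costs several times the crawl's download bandwidth, which is why distributing only the crawler would not change the engine's overall bandwidth bill much.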
