John Battelle's Search Blog

By - January 29, 2004

(via IP) I am constantly amazed by the business models made possible through the Internet. Turnitin is an anti-plagiarism site – the student’s work is submitted to the site, the site then makes a “digital fingerprint” of the work and compares it to thousands of others. It feels eerie and somehow wrong, and a student at McGill University agreed. CNN reports he won a university review case regarding the use of the system.

Is this a search-related story? I think so. Turnitin has to search the web for papers, apply its algorithm, then compare submitted papers against its database. Newly submitted papers add even more to the database. Given that the site has contracts with 3,000 universities, this database must be massive. Trusting Turnitin’s algorithms to determine what is “unique” without some level of review and transparency strikes me as insane. In my own classes, I’d never force my students to use such a system. But then, perhaps it would have helped the NYT in the Blair case….
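The “digital fingerprint” idea can be sketched in a few lines. This is a hypothetical illustration, not Turnitin’s actual algorithm: it hashes overlapping word n-grams (“shingles”) from each document and measures the Jaccard overlap between the two fingerprint sets.

```python
# Toy document-fingerprint comparison (hypothetical sketch, not
# Turnitin's real method): hash overlapping word n-grams and
# estimate how much two texts share.
import hashlib


def fingerprint(text, n=4):
    """Hash every run of n consecutive words into a set of shingle hashes."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {int(hashlib.sha1(s.encode()).hexdigest(), 16) for s in shingles}


def similarity(a, b, n=4):
    """Jaccard similarity of two fingerprints: 0.0 = disjoint, 1.0 = identical."""
    fa, fb = fingerprint(a, n), fingerprint(b, n)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

A real system would crawl and index millions of papers, but the core comparison step reduces to set intersection like this.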


6 thoughts

  1. I believe that Turnitin returns the submitted document, with sections highlighted that appear to come from other documents. The highlighted sections can then be manually compared to those other documents, by the professor or whomever.

    Btw, I just looked at your robots.txt to see if you were blocking Turnitin, and I see you’ve got no robots.txt file. What gives?

  2. Antonio says:

    It could be used with books and films too. Perhaps some teachers are thinking about using it with their students.

  3. I’ve been blocking Turnitin for quite a while.

  4. [re: why no robots.txt…] Well, I don’t have a robots.txt file because there’s nothing I want to keep robots and spiders away from, and as your classic non-geek blogging newbie, I didn’t read the manual when I started this last year (not that there is one), so I didn’t think I had to do it in the first place. I’m not exactly facile with this stuff…and don’t necessarily want to be…but I don’t want to be a moron either…so educate me on what I’m missing by not having one?

  5. From what others seem to be saying, the whole point of having a robots.txt file (if you’re not disallowing anything) is to avoid lots of 404s in your logs.

    Doing some Webmaster World queries, I’m finding lots of vague answers.

    Since you’re running MT, you may want to disallow crawling of /cgi-bin/mt/mt-comments.cgi and /cgi-bin/mt/mt-tb.cgi , since those are just comments and trackbacks, which are already included in the individual entry archives. It’d make search engines’ lives a bit easier without the duplicates.
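    A minimal robots.txt along those lines (assuming the default MT cgi-bin paths the commenter mentions) might look like:

    ```
    User-agent: *
    Disallow: /cgi-bin/mt/mt-comments.cgi
    Disallow: /cgi-bin/mt/mt-tb.cgi
    ```

    Everything not listed under a Disallow line stays crawlable, so this blocks only the duplicate comment and trackback scripts.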