CenSEARCHip | John Battelle's Search Blog

CenSEARCHip

By - March 16, 2006


A neat project at the Indiana University School of Informatics compares Google's US and China results for the same search. It's fascinating to see the results build on the fly. The site shows a variation on a tag map built from the results, rather than just the results themselves. From the about page:

When you click the “Web Search” button, each side of the display will first show you an estimate of how many English-language results the search engine has for that national version. Our system will then begin downloading the top few pages that are unique to that country’s results. As the pages are downloaded, you’ll see a set of words of varying size in each half of the display.

We get those words by breaking the pages up into individual terms, throwing out some common noise words (“and”, “the”, etc.), and tallying up the results. We then find the 50 words that have the highest relative frequency of use on each side and draw them in a font size proportional to their frequency. For example, if you see that the word violin is very large on the Chinese side of the display, that means that the pages unique to the Chinese search results use the word violin much more often than the pages unique to the United States search results.
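The tallying described above could be sketched roughly like this in Python. This is only an illustration, not the project's actual code: the noise-word list, the function names, and the 48-point maximum size are all made up here, and "relative frequency" is assumed to mean how much more of one side's text a word takes up compared with the other side.

```python
import re
from collections import Counter

# Hypothetical noise-word list; the site's real list is not given in the post.
NOISE_WORDS = {"and", "the", "a", "an", "of", "to", "in", "is", "it", "for"}


def term_frequencies(pages):
    """Return each term's share of all non-noise terms across the given pages."""
    counts = Counter()
    for text in pages:
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in NOISE_WORDS:
                counts[term] += 1
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def tag_cloud(own_pages, other_pages, top_n=50, max_pt=48):
    """Return (term, font_size) pairs for one side of the display."""
    own = term_frequencies(own_pages)
    other = term_frequencies(other_pages)
    # Assumed scoring: how much larger the term's share is on this side.
    scores = {t: f - other.get(t, 0.0) for t, f in own.items()}
    # Keep only terms this side uses more, highest score first (ties by name).
    top = sorted(
        ((t, s) for t, s in scores.items() if s > 0),
        key=lambda ts: (-ts[1], ts[0]),
    )[:top_n]
    if not top:
        return []
    largest = top[0][1]
    # Font size is proportional to the score; the top term gets max_pt.
    return [(t, round(max_pt * s / largest, 1)) for t, s in top]


if __name__ == "__main__":
    us = ["the violin and the piano", "piano piano music"]
    cn = ["violin violin violin music"]
    print("China side:", tag_cloud(cn, us))
    print("US side:", tag_cloud(us, cn))
```

With the toy pages in the example, violin comes out largest on the China side, which matches the about page's own illustration. A ratio of the two frequencies would be another reasonable reading of "relative frequency"; the difference is used here only because it avoids dividing by zero for words that appear on one side alone.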



You can also see image searches. Unfortunately, it does not support inline URL searches, so I can’t link to specific searches, but try Dali Lama, or Falun Gong, or Tiananmen (image results shown below).

Tiananmen

(Thanks, Brent)



3 thoughts on “CenSEARCHip”

  1. Interesting tool. I played with it and decided to use it for Semantic Research; forget about comparing US results to China – the way this tool works I’m not sure it would help me to see censorship in Chinese search results.

    For Semantic Research it could be very useful, as it’s taking Google’s choice of the top 10 URLs for a term, getting rid of all the extraneous info, and leaving you with the most commonly used words for those top 10 results... and the largest words would be those you would want to use on your pages for the most Semantic meaning.

    I posted an example in my blog, http://www.webmetricsguru.com

  2. soreng says:

    Great effort. It no doubt helps that the Dalai Lama’s brother is a retired Indiana University professor who lives in Bloomington. Still, a great way to see the difference. I would guess that MSFT and Yahoo show the same results, so I doubt it is just Google, but for a company that claims not to do evil, it is walking a fine line. Would love to see them explain their philosophy to the Dalai Lama. Who knows, he may actually understand, but I doubt it with these kinds of results.

  3. NM says:

    Dalai* Lama-?