<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Scraping Google To See What Happens</title>
	<atom:link href="http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php/feed" rel="self" type="application/rss+xml" />
	<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=scraping_google_to_see_what_happens</link>
	<description>Thoughts on the intersection of search, media, technology, and more.</description>
	<lastBuildDate>Thu, 23 May 2013 12:55:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
	<item>
		<title>By: Judy Salter</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-29574</link>
		<dc:creator>Judy Salter</dc:creator>
		<pubDate>Fri, 11 May 2012 23:16:00 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-29574</guid>
		<description>Google exists because it scraped everything and everyone without caring if they allow it or not.

Now they are large and don&#039;t want to be scraped; that just sucks imho.
But anyway, it is possible to scrape Google. http://google-rank-checker.squabbel.com is an open source project that can scrape millions of hits without issues. Might be worth adding to the blog.</description>
		<content:encoded><![CDATA[<p>Google exists because it scraped everything and everyone without caring if they allow it or not.</p>
<p>Now they are large and don&#8217;t want to be scraped; that just sucks imho.<br />
But anyway, it is possible to scrape Google. <a href="http://google-rank-checker.squabbel.com" rel="nofollow">http://google-rank-checker.squabbel.com</a> is an open source project that can scrape millions of hits without issues. Might be worth adding to the blog.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MetaSearch</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22071</link>
		<dc:creator>MetaSearch</dc:creator>
		<pubDate>Wed, 22 Nov 2006 20:18:29 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22071</guid>
		<description>&lt;p&gt;How do metasearch engines survive? Don&#039;t they just scrape results?&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>How do metasearch engines survive? Don&#8217;t they just scrape results?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vitaliy</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22070</link>
		<dc:creator>Vitaliy</dc:creator>
		<pubDate>Wed, 25 Oct 2006 15:25:46 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22070</guid>
		<description>&lt;p&gt;I like Google and have the toolbar, but please scraping Google. I can&#039;t find spyware.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>I like Google and have the toolbar, but please scraping Google. I can&#8217;t find spyware.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Crcker</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22069</link>
		<dc:creator>Steve Crcker</dc:creator>
		<pubDate>Fri, 15 Sep 2006 05:27:09 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22069</guid>
		<description>&lt;p&gt;Perhaps in a legalistic sense Google has not &quot;stolen&quot; anything. But I understand Daniel to be saying, essentially, that Google has profited hugely from the free, volunteer and idealistic efforts of others. And has given nothing back to those whose efforts have made Google possible. Now this last point is certainly debatable. One could argue that by providing search capability, access is facilitated to many small sites which would otherwise go unnoticed. But wait, it&#039;s not that simple. The growing commercialization of the search industry, led by Google, has created a situation where noise increasingly drowns out information in search results - at least if you are a user searching for information for its own sake - and not for some buying opportunity. Commercialization of the search function virtually ensures that paid results will tend to dominate and crowd out material put on the Web as a volunteer &quot;labor of love&quot; - the very kind of material which caused many of us to gravitate toward the Internet in the first place.&lt;/p&gt;

&lt;p&gt;Just a thought,&lt;br /&gt;
-Steve&lt;/p&gt;

&lt;p&gt;P.S. No relation to the Steve Crocker who helped invent the Internet. &lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Perhaps in a legalistic sense Google has not &#8220;stolen&#8221; anything. But I understand Daniel to be saying, essentially, that Google has profited hugely from the free, volunteer and idealistic efforts of others. And has given nothing back to those whose efforts have made Google possible. Now this last point is certainly debatable. One could argue that by providing search capability, access is facilitated to many small sites which would otherwise go unnoticed. But wait, it&#8217;s not that simple. The growing commercialization of the search industry, led by Google, has created a situation where noise increasingly drowns out information in search results &#8211; at least if you are a user searching for information for its own sake &#8211; and not for some buying opportunity. Commercialization of the search function virtually ensures that paid results will tend to dominate and crowd out material put on the Web as a volunteer &#8220;labor of love&#8221; &#8211; the very kind of material which caused many of us to gravitate toward the Internet in the first place.</p>
<p>Just a thought,<br />
-Steve</p>
<p>P.S. No relation to the Steve Crocker who helped invent the Internet. </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pb</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22068</link>
		<dc:creator>pb</dc:creator>
		<pubDate>Wed, 12 Jan 2005 17:25:24 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22068</guid>
		<description>&lt;p&gt;Looks like we&#039;ll need to go to Microsoft to scrape search results:&lt;br /&gt;
&lt;a href=&quot;http://blogs.msdn.com/msnsearch/archive/2005/01/11/351064.aspx&quot; rel=&quot;nofollow&quot;&gt;http://blogs.msdn.com/msnsearch/archive/2005/01/11/351064.aspx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google used to offer this simple, RESTful API method but ditched it for its current, cumbersome SOAP-based APIs.&lt;br /&gt;
&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Looks like we&#8217;ll need to go to Microsoft to scrape search results:<br />
<a href="http://blogs.msdn.com/msnsearch/archive/2005/01/11/351064.aspx" rel="nofollow">http://blogs.msdn.com/msnsearch/archive/2005/01/11/351064.aspx</a></p>
<p>Google used to offer this simple, RESTful API method but ditched it for its current, cumbersome SOAP-based APIs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Miles Barr</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22067</link>
		<dc:creator>Miles Barr</dc:creator>
		<pubDate>Wed, 12 Jan 2005 11:10:29 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22067</guid>
		<description>&lt;p&gt;In the UK in addition to regular copyright law we also have the Database Act:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.legislation.hmso.gov.uk/si/si1997/1973032.htm&quot; rel=&quot;nofollow&quot;&gt;http://www.legislation.hmso.gov.uk/si/si1997/1973032.htm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows (among other things) someone to compile a database of public domain works and have IP rights on that database. This would make what Scroogle does illegal in the UK. Is there something similar in the US?&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>In the UK in addition to regular copyright law we also have the Database Act:</p>
<p><a href="http://www.legislation.hmso.gov.uk/si/si1997/1973032.htm" rel="nofollow">http://www.legislation.hmso.gov.uk/si/si1997/1973032.htm</a></p>
<p>This allows (among other things) someone to compile a database of public domain works and have IP rights on that database. This would make what Scroogle does illegal in the UK. Is there something similar in the US?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adam</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22066</link>
		<dc:creator>Adam</dc:creator>
		<pubDate>Wed, 12 Jan 2005 04:54:10 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22066</guid>
		<description>&lt;p&gt;Mr. Brandt is an idiot, and a rude idiot at that.&lt;/p&gt;

&lt;p&gt;1) Don&#039;t want your sites on the Web spidered by Google?  Put up a simple robots.txt file.  Voila.&lt;/p&gt;

&lt;p&gt;2) Don&#039;t like Google?  Don&#039;t use it.  Mr. Brandt&#039;s effort to use Google&#039;s bandwidth, R&amp;D, and processing power while stripping their ads is akin to the selfishness of BugMeNot&#039;s childishly gleeful facilitation of helping people access registration-required news sites while violating the terms of those sites.  Same thing:  Don&#039;t want to register at the New York Times?  Don&#039;t read their articles.&lt;/p&gt;

&lt;p&gt;3) There are many things I don&#039;t like about Google.  Their blog is typically fluffy and uninformative, their hiring practices are tiresome and inefficient, and they&#039;ve done a surprisingly poor job of integrating their various services.  With that said, though, Google&#039;s done a LOT of stuff right... and most of it quite unselfishly and without being evil.  If only the same could be said of Andrew O at The Register (another annoying twit) and Daniel Brandt.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Mr. Brandt is an idiot, and a rude idiot at that.</p>
<p>1) Don&#8217;t want your sites on the Web spidered by Google?  Put up a simple robots.txt file.  Voila.</p>
<p>2) Don&#8217;t like Google?  Don&#8217;t use it.  Mr. Brandt&#8217;s effort to use Google&#8217;s bandwidth, R&#038;D, and processing power while stripping their ads is akin to the selfishness of BugMeNot&#8217;s childishly gleeful facilitation of helping people access registration-required news sites while violating the terms of those sites.  Same thing:  Don&#8217;t want to register at the New York Times?  Don&#8217;t read their articles.</p>
<p>3) There are many things I don&#8217;t like about Google.  Their blog is typically fluffy and uninformative, their hiring practices are tiresome and inefficient, and they&#8217;ve done a surprisingly poor job of integrating their various services.  With that said, though, Google&#8217;s done a LOT of stuff right&#8230; and most of it quite unselfishly and without being evil.  If only the same could be said of Andrew O at The Register (another annoying twit) and Daniel Brandt.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brian</title>
		<link>http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22065</link>
		<dc:creator>brian</dc:creator>
		<pubDate>Wed, 12 Jan 2005 00:10:23 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/01/scraping_google_to_see_what_happens.php#comment-22065</guid>
		<description>&lt;p&gt;But, but...well Google and other search engines by the strictest definition of copyright law (an affirmative right) could be construed to have cached web pages and images of others without the explicit permission of the owners.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>But, but&#8230;well Google and other search engines by the strictest definition of copyright law (an affirmative right) could be construed to have cached web pages and images of others without the explicit permission of the owners.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
