<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Danny: Screw Size</title>
	<atom:link href="http://battellemedia.com/archives/2005/08/danny_screw_size.php/feed" rel="self" type="application/rss+xml" />
	<link>http://battellemedia.com/archives/2005/08/danny_screw_size.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=danny_screw_size</link>
	<description>Thoughts on the intersection of search, media, technology, and more.</description>
	<lastBuildDate>Mon, 20 May 2013 00:38:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
	<item>
		<title>By: James MacAonghus</title>
		<link>http://battellemedia.com/archives/2005/08/danny_screw_size.php#comment-20363</link>
		<dc:creator>James MacAonghus</dc:creator>
		<pubDate>Tue, 16 Aug 2005 21:39:03 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/08/danny_screw_size.php#comment-20363</guid>
		<description>&lt;p&gt;It&#039;s good to have your analytical posts back again, thank you :)&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>It&#8217;s good to have your analytical posts back again, thank you :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Seth Finkelstein</title>
		<link>http://battellemedia.com/archives/2005/08/danny_screw_size.php#comment-20362</link>
		<dc:creator>Seth Finkelstein</dc:creator>
		<pubDate>Tue, 16 Aug 2005 18:32:25 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2005/08/danny_screw_size.php#comment-20362</guid>
		<description>&lt;p&gt;I&#039;ve written a post pointing out&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://sethf.com/infothought/blog/archives/000899.html&quot; rel=&quot;nofollow&quot;&gt;Flaws in NCSA Yahoo/Google study&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I&#039;ve dug into some of the study&#039;s data and written a quick
initial blog post pointing out two serious flaws. The methodology used does
indeed have a selection bias, toward both:&lt;br /&gt;
1) search-engine spam pages, and 2) large wordlists.&lt;/p&gt;

&lt;p&gt;Briefly, searching for random words drawn from a large
wordlist created a tendency to select *large* *wordlists*, and
also gibberish spam pages that happened to contain those words (probably
derived from the same large wordlists). Moreover, this effect applies
(to some extent) to *every* *search* *sample*. In fact, many of the
searches could be repeatedly selecting the *same wordlist file*,
or similar ones. Since either Google had more large wordlists indexed, or
Yahoo eliminated many of them as useless data, this results in an
extremely misleading conclusion about the relative size of their databases.&lt;/p&gt;

&lt;p&gt;In effect, a relatively small number of dubious documents
are being repeatedly sampled, rather than the indexes being examined
in any comprehensive way.
&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>I&#8217;ve written a post pointing out</p>
<p><a href="http://sethf.com/infothought/blog/archives/000899.html" rel="nofollow">Flaws in NCSA Yahoo/Google study</a></p>
<p>I&#8217;ve dug into some of the study&#8217;s data and written a quick
initial blog post pointing out two serious flaws. The methodology used does
indeed have a selection bias, toward both:<br />
1) search-engine spam pages, and 2) large wordlists.</p>
<p>Briefly, searching for random words drawn from a large
wordlist created a tendency to select *large* *wordlists*, and
also gibberish spam pages that happened to contain those words (probably
derived from the same large wordlists). Moreover, this effect applies
(to some extent) to *every* *search* *sample*. In fact, many of the
searches could be repeatedly selecting the *same wordlist file*,
or similar ones. Since either Google had more large wordlists indexed, or
Yahoo eliminated many of them as useless data, this results in an
extremely misleading conclusion about the relative size of their databases.</p>
<p>In effect, a relatively small number of dubious documents
are being repeatedly sampled, rather than the indexes being examined
in any comprehensive way.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
