<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: If This is A Real Google Employee, It&apos;s Fascinating</title>
	<atom:link href="http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php/feed" rel="self" type="application/rss+xml" />
	<link>http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=if_this_is_a_real_google_employee_its_fascinating</link>
	<description>Thoughts on the intersection of search, media, technology, and more.</description>
	<lastBuildDate>Thu, 23 May 2013 12:55:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
	<item>
		<title>By: Miles Barr</title>
		<link>http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14844</link>
		<dc:creator>Miles Barr</dc:creator>
		<pubDate>Wed, 21 Jun 2006 10:12:01 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14844</guid>
		<description>&lt;p&gt;Surely Google News isn&#039;t that complicated. I haven&#039;t read any papers on it, but having looked at it, it&#039;s fairly obvious what they&#039;re doing.&lt;/p&gt;

&lt;p&gt;They crawl news sites (or just subscribe to their RSS feeds) and cluster (http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/) the stories they find, hence identifying which ones are talking about the same thing. You can take advantage of the fact that most news sites will classify each story into a section (Google probably maps each site&#039;s section breakdown to their own) so you can group the different story clusters that way too.&lt;/p&gt;

&lt;p&gt;Of course the implementation will be non-trivial, and there will be fine tuning, but the concept is pretty simple.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Surely Google News isn&#8217;t that complicated. I haven&#8217;t read any papers on it, but having looked at it, it&#8217;s fairly obvious what they&#8217;re doing.</p>
<p>They crawl news sites (or just subscribe to their RSS feeds) and cluster (<a href="http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/" rel="nofollow">http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/</a>) the stories they find, hence identifying which ones are talking about the same thing. You can take advantage of the fact that most news sites will classify each story into a section (Google probably maps each site&#8217;s section breakdown to their own) so you can group the different story clusters that way too.</p>
<p>Of course the implementation will be non-trivial, and there will be fine tuning, but the concept is pretty simple.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: JG</title>
		<link>http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14843</link>
		<dc:creator>JG</dc:creator>
		<pubDate>Tue, 20 Jun 2006 15:51:51 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14843</guid>
		<description>&lt;p&gt;Yes, I know of published academic work that described Google News-like systems &lt;i&gt;at least&lt;/i&gt; 2-3 years before Google News. There is probably even older work that I am not aware of. So if Google News did come out of that 20% time, they must have spent that 20% time reading conference papers...&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Yes, I know of published academic work that described Google News-like systems <i>at least</i> 2-3 years before Google News. There is probably even older work that I am not aware of. So if Google News did come out of that 20% time, they must have spent that 20% time reading conference papers&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brandon Byers</title>
		<link>http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14842</link>
		<dc:creator>Brandon Byers</dc:creator>
		<pubDate>Fri, 16 Jun 2006 14:32:42 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14842</guid>
		<description>&lt;p&gt;&quot;... you can divide internet traffic into five approximate and unequal segments ...&quot;&lt;/p&gt;

&lt;p&gt;Philipp later took this one back, saying he&#039;d been mistaken. Very interesting all the same. A few other noteworthy bits:&lt;/p&gt;

&lt;p&gt;- His Gmail account storage limit is 1 TB. &lt;br /&gt;
- &quot;After being asked what it takes to get fired at Google, Zorba replies that abusing logins is a fireable offense.&quot; Especially if you get anywhere near the log system. Which makes sense ... the &quot;database of intentions&quot; is that log, and there&#039;s a *lot* of information there that Google likes having, doesn&#039;t want getting out, and doesn&#039;t want insiders abusing. &lt;br /&gt;
- Zorba says it was depressing to be able to use Google Maps [before it was launched] but not be able to print out routes to take along, as that might breach confidentiality. (I&#039;d agree, and frankly, if the launch is only a few weeks away, so what if a few people see it?)&lt;/p&gt;

&lt;p&gt;And the anecdote about Larry and Sergey with a laptop + webcam strapped to a remote-control jeep is funny. &lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>&#8220;&#8230; you can divide internet traffic into five approximate and unequal segments &#8230;&#8221;</p>
<p>Philipp later took this one back, saying he&#8217;d been mistaken. Very interesting all the same. A few other noteworthy bits:</p>
<p>- His Gmail account storage limit is 1 TB. <br />
- &#8220;After being asked what it takes to get fired at Google, Zorba replies that abusing logins is a fireable offense.&#8221; Especially if you get anywhere near the log system. Which makes sense &#8230; the &#8220;database of intentions&#8221; is that log, and there&#8217;s a *lot* of information there that Google likes having, doesn&#8217;t want getting out, and doesn&#8217;t want insiders abusing. <br />
- Zorba says it was depressing to be able to use Google Maps [before it was launched] but not be able to print out routes to take along, as that might breach confidentiality. (I&#8217;d agree, and frankly, if the launch is only a few weeks away, so what if a few people see it?)</p>
<p>And the anecdote about Larry and Sergey with a laptop + webcam strapped to a remote-control jeep is funny. </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mihran Shahinian</title>
		<link>http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14841</link>
		<dc:creator>Mihran Shahinian</dc:creator>
		<pubDate>Fri, 16 Jun 2006 13:50:06 +0000</pubDate>
		<guid isPermaLink="false">http://battellemedia.com/archives/2006/06/if_this_is_a_real_google_employee_its_fascinating.php#comment-14841</guid>
		<description>&lt;p&gt;I am not entirely sure his claim that Google News came out of the 20% think tank is accurate. Alltheweb had a News Search implementation before Google News, and Google News was a response to Alltheweb news. (If webmasterworld.com keeps archives, search for &quot;alltheweb oneups google&quot;.)&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>I am not entirely sure his claim that Google News came out of the 20% think tank is accurate. Alltheweb had a News Search implementation before Google News, and Google News was a response to Alltheweb news. (If webmasterworld.com keeps archives, search for &#8220;alltheweb oneups google&#8221;.)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
