<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Battelle&#039;s Search Blog &#187; The Search Papers</title>
	<atom:link href="http://battellemedia.com/archives/category/the-search-papers/feed" rel="self" type="application/rss+xml" />
	<link>http://battellemedia.com</link>
	<description>Thoughts on the intersection of search, media, technology, and more.</description>
	<lastBuildDate>Mon, 20 May 2013 05:14:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>The Anatomy of a Large-Scale Social Search Engine</title>
		<link>http://battellemedia.com/archives/2010/02/the_anatomy_of_a_large-scale_social_search_engine.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the_anatomy_of_a_large-scale_social_search_engine</link>
		<comments>http://battellemedia.com/archives/2010/02/the_anatomy_of_a_large-scale_social_search_engine.php#comments</comments>
		<pubDate>Wed, 03 Feb 2010 02:06:02 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[Future of Search]]></category>
		<category><![CDATA[Of Note in Search Biz]]></category>
		<category><![CDATA[The Search Papers]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2010/02/the_anatomy_of_a_large-scale_social_search_engine.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2010/02/the_anatomy_of_a_large-scale_social_search_engine.php">The Anatomy of a Large-Scale Social Search Engine</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>The folks at Aardvark have posted an ambitious paper over on the &apos;vark blog. Titled after Brin and Page&apos;s original “Anatomy of a Large-Scale Hypertextual Web Search Engine”, the paper presents the Aardvark engine and, in its authors&apos; words: &#34;describes the fundamental differences between the traditional “Library” paradigm of web...</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2010/02/the_anatomy_of_a_large-scale_social_search_engine.php">The Anatomy of a Large-Scale Social Search Engine</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p><img src="http://battellemedia.com/media/images/Screen shot 2010-02-02 at 6.02.56 PM.png" width="364" height="156" alt="Screen shot 2010-02-02 at 6.02.56 PM.png" style="float:left; margin-top:5px; margin-right:5px; margin-bottom:5px;" />The folks at Aardvark have posted an <a href="http://blog.vark.com/?p=352">ambitious paper over on the &#8217;vark blog</a>. Titled after Brin and Page&#8217;s original <a href="http://infolab.stanford.edu/~backrub/google.html">“Anatomy of a Large-Scale Hypertextual Web Search Engine”</a>, the paper presents the Aardvark engine and, in its authors&#8217; words, &#8220;describes the fundamental differences between the traditional “Library” paradigm of web search — in which answers are found in existing online content — and the new “Village” paradigm of social search — in which answers arise in conversation with the people in your network.&#8221;</p>
<p>I have read most of the paper, which has been accepted at <a href="http://www2010.org/www/">WWW 2010</a> (it reminded me of all the search papers I read in preparation for writing <i>The Search</i>), and found much of interest in it.</p>
<p>First, the paper&#8217;s authors, both of whom have worked at Google, clearly sense potential history here: they not only crib the title of Google&#8217;s original paper, they also mirror its first line (substituting &#8220;Aardvark&#8221; for &#8220;Google&#8221;, of course). Now that&#8217;s some b*lls. Then again, when Larry and Sergey first presented Google, they couldn&#8217;t even get their paper accepted (it took three tries, if I recall correctly; someone should write a book about that&#8230;).</p>
<p>Second, it&#8217;s unusual for a Valley startup to lay out its architecture and technological specs as willingly as Aardvark has. There&#8217;s a lot of math in here that I couldn&#8217;t parse even if I had the will to try.</p>
<p>Third, we learn some cool things about how Aardvark works. Check this quote out: &#8220;&#8230;unlike quality scores like PageRank [13], Aardvark’s quality score aims to measure intimacy rather than authority. And unlike the relevance scores in corpus-based search engines, Aardvark’s relevance score aims to measure a user’s <i>potential</i> to answer a query, rather than a document’s existing capability to answer a query.&#8221;</p>
<p><img src="http://battellemedia.com/media/images/Screen shot 2010-02-02 at 5.57.33 PM.png" width="304" height="297" alt="Screen shot 2010-02-02 at 5.57.33 PM.png" style="float:left;" /></p>
<p>Also interesting: &#8220;this involves modeling a user as a content-generator, with probabilities indicating the likelihood she will likely respond to questions about given topics. Each topic in a user profile has an associated score, depending upon the confidence appropriate to the source of the topic. In addition, Aardvark learns over time which topics not to send a user questions about&#8230;&#8221;</p>
<p>There&#8217;s a lot more like this in the paper; it&#8217;s worth reading. The authors even tested Aardvark&#8217;s results against Google&#8217;s, and the outcome was something of a push (see the last page for details). Not bad for an upstart service.</p>
<p>Lastly, we learn a lot about the service from a number of charts, including one on Aardvark&#8217;s growth, which I had not really anticipated. It&#8217;s up and to the right, as you can see from the chart.</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2010/02/the_anatomy_of_a_large-scale_social_search_engine.php/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Of Note: Semantic Search Expert Dr. Rudi Studer</title>
		<link>http://battellemedia.com/archives/2008/12/of_note_semantic_search_expert_dr_rudi_studer.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=of_note_semantic_search_expert_dr_rudi_studer</link>
		<comments>http://battellemedia.com/archives/2008/12/of_note_semantic_search_expert_dr_rudi_studer.php#comments</comments>
		<pubDate>Mon, 29 Dec 2008 17:40:55 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[Of Note in Search Biz]]></category>
		<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2008/12/of_note_semantic_search_expert_dr_rudi_studer.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2008/12/of_note_semantic_search_expert_dr_rudi_studer.php">Of Note: Semantic Search Expert Dr. Rudi Studer</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>From the Yahoo Search blog. Worth a read if you&apos;re into this stuff. I think we&apos;re going to see some breakthroughs in this area thanks to new services like Twitter and others adding a layer of real time data. So far, semantic technologies have been used in commercial products...</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2008/12/of_note_semantic_search_expert_dr_rudi_studer.php">Of Note: Semantic Search Expert Dr. Rudi Studer</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>
<a href="http://ysearchblog.com/2008/12/16/an-interview-with-dr-rudi-studer-on-semantic-search-technologies/">From the Yahoo Search blog.</a> Worth a read if you&#8217;re into this stuff. I think we&#8217;re going to see some breakthroughs in this area thanks to Twitter and other new services that add a layer of real-time data.
</p>
<p>
<em>So far, semantic technologies have been used in commercial products for data integration, enterprise semantic search and content management, etc. I expect this area to grow, but prospectively I see more and more potential for business opportunities in the combination of the social web and semantic technologies as well as in the context of mashups. An area that is also still largely unexplored is the area of advertisements in the context of semantic search.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2008/12/of_note_semantic_search_expert_dr_rudi_studer.php/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Yes, But Now That He&apos;s At Microsoft, Can He Keep Giving It Away For Free?</title>
		<link>http://battellemedia.com/archives/2008/10/yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free</link>
		<comments>http://battellemedia.com/archives/2008/10/yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free.php#comments</comments>
		<pubDate>Mon, 27 Oct 2008 04:54:57 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[Random, But Interesting]]></category>
		<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2008/10/yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2008/10/yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free.php">Yes, But Now That He&apos;s At Microsoft, Can He Keep Giving It Away For Free?</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>Great piece in the Times on a fellow who made his name hacking the wii remote and talking about it on YouTube. Now he&apos;s at Microsoft, after being wooed by nearly everyone. Contrast this with what might have followed from other options Mr. Lee considered for communicating his ideas....</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2008/10/yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free.php">Yes, But Now That He&apos;s At Microsoft, Can He Keep Giving It Away For Free?</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>
<a href="http://battellemedia.com/images/wiiremote.jpg" onclick="window.open('http://battellemedia.com/images/wiiremote.jpg','popup','width=405+20,height=242+20,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://battellemedia.com/media/images/images/wiiremote-tm.jpg" height="83" width="138" align="left" border="1" hspace="6" vspace="4" alt="Wiiremote" title="" longdesc="" /></a><br />
<br /><a href="http://www.nytimes.com/2008/10/26/business/26proto.html?_r=1&amp;oref=slogin">Great piece in the Times</a> on a fellow who made his name hacking the Wii Remote and talking about it on YouTube. Now he&#8217;s at Microsoft, after being wooed by nearly everyone.
</p>
<p>
<em>Contrast this with what might have followed from other options Mr. Lee considered for communicating his ideas. He might have published a paper that only a few dozen specialists would have read. A talk at a conference would have brought a slightly larger audience. In either case, it would have taken months for his ideas to reach others.<br />
</em></p>
<p>
<em>Small wonder, then, that he maintains that posting to YouTube has been an essential part of his success as an inventor. “Sharing an idea the right way is just as important as doing the work itself,” he says. “If you create something but nobody knows, it’s as if it never happened.”</em></p>
<p>But it made me wonder whether he&#8217;s going to be happy there. A very long time ago, I read a ton of search papers (as prep for the book) and noticed that they were all pretty old, and that once academics got hired by Google or its competitors, they sort of stopped innovating out loud.
</p>
<p>
Just a thought.</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2008/10/yes_but_now_that_hes_at_microsoft_can_he_keep_giving_it_away_for_free.php/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Search Paper Fun: Most Cited</title>
		<link>http://battellemedia.com/archives/2004/12/search_paper_fun_most_cited.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=search_paper_fun_most_cited</link>
		<comments>http://battellemedia.com/archives/2004/12/search_paper_fun_most_cited.php#comments</comments>
		<pubDate>Fri, 17 Dec 2004 01:29:17 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[Random, But Interesting]]></category>
		<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2004/12/search_paper_fun_most_cited.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/12/search_paper_fun_most_cited.php">Search Paper Fun: Most Cited</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>I sent a query to Lee Giles, the guru at Penn State behind CiteSeer (with Steve Lawrence, who is now at Google) asking him which search-related papers are the most cited. I was struck by the near parity between Page and Brin&apos;s original paper on Google and Jon Kleinberg&apos;s...</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/12/search_paper_fun_most_cited.php">Search Paper Fun: Most Cited</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>
<a href="http://battellemedia.com/images/scholar_logo.gif" onclick="window.open('http://battellemedia.com/images/scholar_logo.gif','popup','width=276+20,height=110+20,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://battellemedia.com/media/images/images/scholar_logo-tm.jpg" height="66" width="166" align="left" border="1" hspace="6" vspace="4" alt="Scholar Logo" title="" longdesc="" /></a>I sent a query to <a href="http://clgiles.ist.psu.edu/">Lee Giles</a>, the guru at Penn State behind <a href="http://www.neci.nj.nec.com/homepages/lawrence/citeseer.html">CiteSeer</a> (with <a href="http://www.neci.nec.com/~lawrence/bio.html">Steve Lawrence</a>, who is now at Google), asking him which search-related papers are the most cited. I was struck by the near parity in citations between Page and Brin&#8217;s original paper on Google and <a href="http://battellemedia.com/archives/000304.php">Jon Kleinberg&#8217;s</a> paper on Hubs and Authorities. Giles did a bit of fiddling with Google Scholar and responded:
</p>
<p>
For web-related work, these are well cited in Google Scholar using the query &#8220;web&#8221;:
</p>
<p>
[PDF] <a href="http://scholar.google.com/url?q=http://www-personal.si.umich.edu/~rfrost/courses/SI110/readings/In_Out_and_Beyond/Semantic_Web.pdf">The Semantic Web</a><br />
T Berners-Lee, J Hendler, O Lassila &#8211; Cited by 1347<br />
&#8230; May 17, 2001. The Semantic Web. A new form of Web content that is meaningful to<br />
computers will unleash a revolution of new possibilities. &#8230; Web: A Research Agenda. &#8230;<br />
Scientific American, 2001 &#8211; www-personal.si.umich.edu
</p>
<p>
[PDF] <a href="http://scholar.google.com/url?q=http://kulturinformatik.uni-lueneburg.de/veranst/zeitpfeil/material_suchmaschinen/anatomy.pdf">The anatomy of a large-scale hypertextual Web search engine</a><br />
S Brin, L Page &#8211; Cited by 1087<br />
Abstract In this paper, we present Google, a prototype of a large-scale search<br />
engine which makes heavy use of the structure present in hypertext. Google &#8230;<br />
Computer Networks and ISDN Systems, 1998 &#8211; kulturinformatik.uni-lueneburg.de &#8211; firstrate.co.nz &#8211; net.cs.pku.edu.cn &#8211; scalab.uc3m.es &#8211; all 69 versions
</p>
<p>
However, this one can&#8217;t be ignored:
</p>
<p>
[PDF] <a href="http://scholar.google.com/url?q=http://portal.acm.org/ft_gateway.cfm%253Fid%253D324140%2526type%253Dpdf%2526dl%253DGUIDE%2526dl%253DACM%2526CFID%253D11111111%2526CFTOKEN%253D2222222">Authoritative sources in a hyperlinked environment</a><br />
J Kleinberg&#8230; &#8211; Cited by 1059<br />
Abstract. The network structure of a hyperlinked environment can be a rich<br />
source of information about the content of the environment, provided we &#8230;<br />
Journal of the ACM, 1999 &#8211; portal.acm.org &#8211; nan.dhs.org &#8211; cs.cmu.edu &#8211; mathe.tu-freiberg.de &#8211; all 73 versions
</p>
<p>
This book is the first to discuss the web in any detail:
</p>
<p>
[PS] <a href="http://scholar.google.com/url?q=http://www.dcc.ufmg.br/irbook/print/chap10.ps.gz">Modern Information Retrieval</a><br />
R Baeza-Yates, B Ribeiro-Neto &#8211; Cited by 1198<br />
Page 1. Modern Information Retrieval. Ricardo Baeza-Yates. Berthier Ribeiro-Neto.<br />
ACM Press New York. &#8230; 1.1.2 Information Retrieval at the Center of the Stage &#8230;<br />
Addison-Wesley, 1999 &#8211; dcc.ufmg.br &#8211; sunsite.dcc.uchile.cl &#8211; sims.berkeley.edu &#8211; portal.acm.org &#8211; all 7 versions
</p>
<p>
All worthy reads!</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2004/12/search_paper_fun_most_cited.php/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Scholar Launches: A Hint of Things to Come?</title>
		<link>http://battellemedia.com/archives/2004/11/google_scholar_launches_a_hint_of_things_to_come.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=google_scholar_launches_a_hint_of_things_to_come</link>
		<comments>http://battellemedia.com/archives/2004/11/google_scholar_launches_a_hint_of_things_to_come.php#comments</comments>
		<pubDate>Thu, 18 Nov 2004 13:24:04 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[Of Note in Search Biz]]></category>
		<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2004/11/google_scholar_launches_a_hint_of_things_to_come.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/11/google_scholar_launches_a_hint_of_things_to_come.php">Google Scholar Launches: A Hint of Things to Come?</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>Google has, for some time, had a few verticalized, niche search solutions hidden in their Advanced Search areas, notably their &#34;topic specific&#34; search around Linux, the Mac, govt sites, and the like. Today the company launched another, more ambitious vertical search tool called Google Scholar. According to folks I spoke...</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/11/google_scholar_launches_a_hint_of_things_to_come.php">Google Scholar Launches: A Hint of Things to Come?</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p><a href="http://battellemedia.com/images/scholar_logo.gif" onclick="window.open('http://battellemedia.com/images/scholar_logo.gif','popup','width=276,height=110,scrollbars=yes,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://battellemedia.com/media/images/images/scholar_logo-tm.jpg" height="100" width="250" align="left" hspace="6" alt="scholar_logo" /></a>Google has, for some time, had a few verticalized, niche search solutions hidden in its Advanced Search areas, notably its &#8220;topic-specific&#8221; searches for Linux, the Mac, government sites, and the like. Today the company launched another, more ambitious vertical search tool called <a href="http://scholar.google.com/">Google Scholar</a>. According to folks I spoke to last night at Google, the service was built by one engineer in his &#8220;20% time.&#8221; Anurag Acharya, the engineer behind the service, tuned Google&#8217;s crawler for academic papers and worked with universities to make those papers available to others on the web. </p>
<p>The service has the tagline &#8220;Stand on the shoulders of giants.&#8221; It includes a cross-referenced citation link for each paper, which is very cool: citation, as we all know, is the basis of PageRank (and of the WWW) in the first place. Here&#8217;s a search for <a href="http://scholar.google.com/scholar?hl=en&#38;lr=&#38;q=search+vertical+%22OR%22+domain+specific+search&#38;btnG=Search">vertical or domain specific search</a>, for example. </p>
<p>This move marks a trend toward making normally invisible (and useful) information more accessible, one I can imagine spreading to other domains, perhaps more commercial ones. (Scholar does not carry ads, at least for now.) The special ranking algorithm and policies for dealing with a structured document universe such as this one clearly scale to other opportunities, e.g., travel, automotive, business information, and the like. </p>
<p>Here&#8217;s <a href="http://www.resourceshelf.com/2004/11/wow-its-google-scholar.html">Resourceshelf&#8217;s</a> take on this, and <a href="http://searchenginewatch.com/searchday/article.php/3437471">SEW&#8217;s</a>.</p>
<p><a href="http://news.com.com/Google+launches+search+for+scholars/2100-1024_3-5457493.html">Cnet coverage</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2004/11/google_scholar_launches_a_hint_of_things_to_come.php/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Upcoming WWW Conference: Loads O Search</title>
		<link>http://battellemedia.com/archives/2004/03/upcoming_www_conference_loads_o_search.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=upcoming_www_conference_loads_o_search</link>
		<comments>http://battellemedia.com/archives/2004/03/upcoming_www_conference_loads_o_search.php#comments</comments>
		<pubDate>Thu, 25 Mar 2004 17:00:22 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2004/03/upcoming_www_conference_loads_o_search.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/03/upcoming_www_conference_loads_o_search.php">Upcoming WWW Conference: Loads O Search</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>Resourceshelf has culled the upcoming WWW conference for selected references to search. There&apos;s also a whole track on the Semantic Web. The complete list is a Who&apos;s Who of search stars and a telling map of who&apos;s doing interesting research in the area. Included: Intel, University of Washington, IBM, Yahoo...</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/03/upcoming_www_conference_loads_o_search.php">Upcoming WWW Conference: Loads O Search</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p><a href="http://battellemedia.com/images/13th-int.jpg" onclick="window.open('http://battellemedia.com/images/13th-int.jpg','popup','width=246,height=113,scrollbars=yes,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img border="0"  src="http://battellemedia.com/media/images/images/13th-int-tm.jpg" height="50" width="108" hspace="6" align="left" alt="13th-int" /></a>Resourceshelf has combed through the upcoming WWW conference program and pulled out selected <a href="http://www.resourceshelf.com/archives/2004_03_21_resourceshelfextra_archive.html/#108015170507422803">references to search</a>. There&#8217;s also a whole track on the Semantic Web. </p>
<p>The <a href="http://www.resourceshelf.com/archives/2004_03_21_resourceshelfextra_archive.html/#108015170507422803">complete list</a> is a Who&#8217;s Who of search stars and a telling map of who&#8217;s doing interesting research in the area. Included: Intel, University of Washington, IBM, Yahoo (Understanding User Goals in Search), National University of Singapore, MIT, Microsoft. A9&#8217;s Udi Manber (I did meet with him, but can&#8217;t go into our talk quite yet) is giving a keynote. </p>
<p>OK, I think I have to go to this. </p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2004/03/upcoming_www_conference_loads_o_search.php/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Search Papers: Do Web Search Engines Suppress Controversy?</title>
		<link>http://battellemedia.com/archives/2004/01/the_search_papers_do_web_search_engines_suppress_controversy.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the_search_papers_do_web_search_engines_suppress_controversy</link>
		<comments>http://battellemedia.com/archives/2004/01/the_search_papers_do_web_search_engines_suppress_controversy.php#comments</comments>
		<pubDate>Sun, 11 Jan 2004 18:42:16 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2004/01/the_search_papers_do_web_search_engines_suppress_controversy.php</guid>
		<description><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/01/the_search_papers_do_web_search_engines_suppress_controversy.php">The Search Papers: Do Web Search Engines Suppress Controversy?</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>The First Monday peer-reviewed journal recently published &#34;Do Web Search Engines Suppress Controversy?&#34; by Susan Gerhart, a software engineering professor at Embry-Riddle Aeronautical University. Driving the paper is this sentiment: &#34;The dilemma of controversies is that the searcher beginning to explore a topic doesn&#8217;t know the search terms to investigate...</p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2004/01/the_search_papers_do_web_search_engines_suppress_controversy.php">The Search Papers: Do Web Search Engines Suppress Controversy?</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p><img alt="gerhart2.gif" src="http://battellemedia.com/media/images/archives/gerhart2.gif" width="306" height="71" border="0" align="left" hspace="6" />The peer-reviewed journal First Monday recently published <a href="http://firstmonday.org/issues/issue9_1/gerhart/">&#8220;Do Web Search Engines Suppress Controversy?&#8221;</a> by <a href="http://pr.erau.edu/~gerharts/">Susan Gerhart</a>, a software engineering professor at Embry-Riddle Aeronautical University. Driving the paper is this sentiment:</p>
<p><i>&#8220;The dilemma of controversies is that the searcher beginning to explore a topic doesn&#8217;t know the search terms to investigate a controversy unless it is revealed with reasonable visibility, e.g. not item number 879 in search results, nor buried three links away from result number 30.&#8221;</i></p>
<p>In other words, if you are just starting to research a topic, and have no idea if there are any controversies surrounding said topic, how will you ever know if the search engine has a bias toward not revealing those controversies?</p>
<p>The paper explores a hypothesis that Gerhart states as follows: &#8220;A given, well&#8211;known specific controversy will not be revealed in the top search results.&#8221; She then designs an experiment to test this hypothesis by pairing a broad topic with a related, controversial subtopic. One example pairs &#8220;Albert Einstein&#8221; as the broad topic with &#8220;Did Einstein&#8217;s first wife, Mileva Maric, receive appropriate credit for scientific contributions to Einstein&#8217;s early work&#8221; as the subtopic. The question is: do search engines leave out the more controversial bits, the stuff that, taken as a whole, provides texture and context to any searcher&#8217;s understanding of a topic?</p>
<p>Across the many examples she tested, Gerhart found evidence on both sides of the ledger, and I was disappointed that she could not reach a more decisive conclusion. She did note that most search engines performed roughly equally in the experiments. She also offers some interesting thoughts on how controversies are (or are not) integrated into the web at large, along with suggestions for how various actors on the web &#8211; site authors, researchers, search engines &#8211; might better organize themselves to present a more <a href="http://www.firstmonday.org/issues/issue9_1/gerhart/index.html#g8">relevant set of SERPs</a> for any particular query.</p>
<p>All in all, I liked this paper, as it forced me to think about the politics and architecture of search engine results. She introduces the idea of &#8220;sunny&#8221; vs. &#8220;dark&#8221; search results, and concludes that &#8220;sunny&#8221; results &#8211; those that do not include controversies &#8211; tend to float toward the top. Her final conclusion:</p>
<p>&#8220;<i>Web search engines do not conspire to suppress controversy, but their strategies do lead to organizationally dominated search results depriving searchers of a richer experience and, sometimes, of essential decision&#8211;making information. These experiments suggest that bias exists, in one form or another, on the Web and should, in turn, force thinking about content on the Web in a more controversial light.&#8221;</i></p>
<p>The one thing Dr. Gerhart left out entirely is the effect of blogs. As most of us certainly know, when the blogosphere latches onto a controversy (or just a <a href="http://www.google.com/search?hl=en&#38;ie=ISO-8859-1&#38;q=miserable+failure">politically-driven meme</a>), that aspect of a topic usually shoots to the top of the SERPs. As with most good papers, this one left me feeling like there is much work yet to be done. </p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2004/01/the_search_papers_do_web_search_engines_suppress_controversy.php/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Search Papers: Bray on Search</title>
		<link>http://battellemedia.com/archives/2003/12/the_search_papers_bray_on_search.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the_search_papers_bray_on_search</link>
		<comments>http://battellemedia.com/archives/2003/12/the_search_papers_bray_on_search.php#comments</comments>
		<pubDate>Mon, 08 Dec 2003 20:59:29 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2003/12/the_search_papers_bray_on_search.php</guid>
		<description><![CDATA[<p><p>The post <a href="http://battellemedia.com/archives/2003/12/the_search_papers_bray_on_search.php">The Search Papers: Bray on Search</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>Tim Bray has a series called On Search over at his Ongoing blog, and I find it worthy of a read&apos;n&apos;muse. He starts with this backgrounder on himself and search issues as he sees them, and has a ton of entries on any number of subjects, too numerous to go...</p></p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2003/12/the_search_papers_bray_on_search.php">The Search Papers: Bray on Search</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>Tim Bray has a series called <a href="http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC">On Search</a> over at his <a href="http://www.tbray.org/ongoing/">Ongoing</a> blog, and I find it worthy of a read&#8217;n&#8217;muse. He starts with this <a href="http://www.tbray.org/ongoing/When/200x/2003/06/15/OnSearch">backgrounder</a> on himself and search issues as he sees them, and has entries on far too many subjects to go into here. Highlights: he writes on <a href="http://www.tbray.org/ongoing/When/200x/2003/11/16/SearchAPIs">search API issues</a> (warning: not for the faint of geek), on how best <a href="http://www.tbray.org/ongoing/When/200x/2003/11/30/SearchXML">to search XML</a> (answer: we don&#8217;t know yet; recall that he co-authored the XML specification), and on <a href="http://www.tbray.org/ongoing/When/200x/2003/11/13/ResultRanking">result rankings</a>, with a quick refresher on why PageRank works and good advice on paying attention to your own logs. Also worthy: his primer on <a href="http://www.tbray.org/ongoing/When/200x/2003/06/18/HowSearchWorks">how search works</a>; his discussion of the technical search terms <a href="http://www.tbray.org/ongoing/When/200x/2003/06/22/PandR">precision and recall</a> (with an interesting note on the absence of top companies from the research community &#8211; see <a href="http://battellemedia.com/archives/000091.php">my post on this here</a>); and lastly (<i>whew</i>), his <a href="http://www.tbray.org/ongoing/When/200x/2003/06/24/IntelligentSearch">mini-rant on intelligent search</a> and why it&#8217;s a long way off. An excerpt:<br />
&#8220;If we want better search (and we do), we&#8217;d better not count on AI voodoo or linguistic juju or semantic mojo. We need to work with good sound statistical techniques, and be clever about generating and using metadata, and we need to get our APIs right. All of these things are hard, and there is good work being done in all of them.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2003/12/the_search_papers_bray_on_search.php/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Search Papers: Challenges in Web Search Engines (A Google Paper, 2002)</title>
		<link>http://battellemedia.com/archives/2003/12/the_search_papers_challenges_in_web_search_engines_a_google_paper_2002.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the_search_papers_challenges_in_web_search_engines_a_google_paper_2002</link>
		<comments>http://battellemedia.com/archives/2003/12/the_search_papers_challenges_in_web_search_engines_a_google_paper_2002.php#comments</comments>
		<pubDate>Mon, 08 Dec 2003 07:53:45 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2003/12/the_search_papers_challenges_in_web_search_engines_a_google_paper_2002.php</guid>
		<description><![CDATA[<p><p>The post <a href="http://battellemedia.com/archives/2003/12/the_search_papers_challenges_in_web_search_engines_a_google_paper_2002.php">The Search Papers: Challenges in Web Search Engines (A Google Paper, 2002)</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>This paper &#34;presents a high-level discussion of some problems in information retrieval that are unique to web search engines,&#34; according to its abstract in the ACM library. (A reminder as to what this whole &#34;Search Papers&#34; thing is about: read this.) &#34;The goal is to raise awareness and stimulate research...</p></p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2003/12/the_search_papers_challenges_in_web_search_engines_a_google_paper_2002.php">The Search Papers: Challenges in Web Search Engines (A Google Paper, 2002)</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>This <a href="http://portal.acm.org/citation.cfm?id=792553&#38;jmp=cit&#38;dl=ACM&#38;dl=ACM">paper</a> &#8220;presents a high-level discussion of some problems in information retrieval that are unique to web search engines,&#8221; according to its abstract in the ACM library. (A reminder as to what this whole &#8220;Search Papers&#8221; thing is about: read <a href="http://battellemedia.com/archives/000091.php">this</a>.) &#8220;The goal is to raise awareness and stimulate research in these areas,&#8221; it continues. What gives such a lofty call to action its weight? Well, it&#8217;s written by two senior Google employees, <a href="http://www.henzinger.com/monika/">Monika R. Henzinger</a> and <a href="http://www-cs-students.stanford.edu/~csilvers/">Craig Silverstein</a> (I&#8217;ve met Craig: he was employee #1 after Larry and Sergey, and a nice guy to boot), as well as <a href="http://theory.stanford.edu/~rajeev/">Rajeev Motwani</a>, a professor at Stanford (Craig was his graduate student).</p>
<p>The paper is dated September 2002, so it is not a missive from the early, geekier phase of Google&#8217;s life but a more corporate product. The two Google authors knew they bore the weight of &#8220;being Google&#8221; when they wrote it, and that is worth keeping in mind when reading through it.</p>
<p>This is particularly clear in the paper&#8217;s scope and focus. It lays out six challenges for search engines &#8211; and they read like a laundry list of Google&#8217;s headaches. The paper then suggests paths for further research on each topic, which could read as either genuine or a tiny bit patronizing, depending on who you are. (The paper does not tackle a range of other issues it says are already the subject of abundant research &#8211; natural language queries, image/audio search, improving text-based retrieval, language issues, or interface/clustering, for example.)
</p>
<p>]]&gt;<span id="more-113"></span>< ![CDATA[
<p>First among the stated problems is spam &#8211; folks who try to game search engine listings for their own commercial gain (clearly Google&#8217;s biggest problem, and one that takes up a lot of its time). Second and third are content quality and quality evaluation &#8211; how to determine the relative value of the content on a web page, and how to tell whether your algorithms for doing so are working. Fourth is what the authors call &#8220;web conventions&#8221; &#8211; how to build useful search engines given that the web follows loose conventions rather than strict rules. Fifth is the problem of duplicate hosts, meaning two hosts that serve the same content (eliminating these would unclog search results; Google has sometimes been criticized by its competitors for returning too many duplicate pages). And sixth is the wonderfully termed &#8220;vaguely structured data.&#8221; XML is mentioned but dismissed; the authors instead suggest there is value in understanding the conventions of HTML presentation (the way a page looks) and somehow using them to make search better.</p>
<p>So as not to bore the lot of you, I won&#8217;t go into detail on each. Suffice it to say, this paper is interesting and a worthy read if you are a student of the company and/or the field. I have only now begun to read the more recent public papers from Google scientists, so I can&#8217;t compare them as a corpus. A few notes: It&#8217;s not clear who this paper was really written for, as some passages seem aimed at less technical readers (one, for example, explains what a crawler is &#8211; are there really folks in the research community who are not web savvy?). The paper toots Google&#8217;s horn a few times (it says PageRank is not vulnerable to some types of spam), admits where Google has weaknesses, gives props to <a href="http://www.cs.cornell.edu/home/kleinber/">Jon Kleinberg&#8217;s</a> HITS algorithm (upon which some say PageRank is based), and even seems to float some trial balloons to the research community (on how to detect spamming tactics, for example, in section 2.4). I did take issue with some of the editorial assumptions in the &#8220;Content Quality&#8221; section, but I won&#8217;t go into all that here. Drop me a line if you want to discuss. And&#8230;if you are a researcher in this field, or know one, I&#8217;d be interested in what the academic community thinks of this paper, and of any others I post on as well. Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2003/12/the_search_papers_challenges_in_web_search_engines_a_google_paper_2002.php/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Search Papers: Defining Intent</title>
		<link>http://battellemedia.com/archives/2003/11/the_search_papers_defining_intent.php?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the_search_papers_defining_intent</link>
		<comments>http://battellemedia.com/archives/2003/11/the_search_papers_defining_intent.php#comments</comments>
		<pubDate>Fri, 28 Nov 2003 19:48:27 +0000</pubDate>
		<dc:creator>jbat</dc:creator>
				<category><![CDATA[The Search Papers]]></category>

		<guid isPermaLink="false">http://battellemedia.com/archives/2003/11/the_search_papers_defining_intent.php</guid>
		<description><![CDATA[<p><p>The post <a href="http://battellemedia.com/archives/2003/11/the_search_papers_defining_intent.php">The Search Papers: Defining Intent</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>I&apos;ve just finished reading A Taxonomy of Web Search by Andrei Broder, written largely while the author was CTO of Alta Vista (and using AV query data), and published after he moved to IBM Research in 2001. The paper has a trove of references to other papers, which is good...</p></p>]]></description>
			<content:encoded><![CDATA[<p>The post <a href="http://battellemedia.com/archives/2003/11/the_search_papers_defining_intent.php">The Search Papers: Defining Intent</a> appeared first on <a href="http://battellemedia.com">John Battelle&#039;s Search Blog</a>.</p><p>I&#8217;ve just finished reading <a href="http://216.239.57.104/search?q=cache:iVoLDHCm3sAJ:www.acm.org/sigir/forum/F2002/broder.pdf+%22andrei+broder%22+taxonomy&#38;hl=en&#38;ie=UTF-8">A Taxonomy of Web Search</a> by Andrei Broder. It was written largely while the author was CTO of AltaVista (and draws on AltaVista query data), and was published after he moved to IBM Research in 2001.</p>
<p>The paper has a trove of references to other papers, which is good for my work, and it has a single thesis: that not all web searches are equal. Broder sets out to dispel the notion that all searches are &#8220;informational&#8221; in nature, arguing instead that many are &#8220;transactional&#8221; or &#8220;navigational.&#8221; These two seemingly obvious categories are in fact relatively new to the academic field of Information Retrieval (IR), which developed largely in the context of large islands of data (i.e., in the 1970s and 1980s) rather than in the web era.</p>
<p>What I like about this paper is its use of the word &#8220;intent,&#8221; a word I&#8217;ve come to use quite a bit over the years (see my last column on <a href="http://www.business2.com/articles/mag/0,1640,52838,00.html">video advertising over the internet</a>, in which I rant once again about &#8220;intent over content,&#8221; or my post on <a href="http://battellemedia.com/archives/000063.php">The Database of Intentions</a>). Intent is behind every kind of search, Broder says, but &#8220;there is no assumption &#8230; that this intent can be inferred with any certitude from the query.&#8221; Ay, there&#8217;s the rub. To get at that intent, Broder ran a short user survey on AltaVista.</p>
<p>A few fun facts from Broder&#8217;s analysis of the survey responses and related log data:<br />
- nearly 15% of searchers wanted &#8220;a good collection of links on a subject&#8221; as opposed to &#8220;a good document.&#8221;<br />
- 12% of the queries in the log data were sexual in nature.<br />
- nearly 25% of searchers were looking for &#8220;a specific website that I already had in mind.&#8221;<br />
- an estimated 36% of searchers had transactional intent, what Broder calls &#8220;the intent to perform some web-mediated activity.&#8221;</p>
<p>Broder concludes that the next generation of search engines will need to take this new taxonomy of intent into account: transactional and navigational as well as informational. Given that this paper was published in late 2001, it&#8217;s interesting to see how far down that path the major engines already are, with Yahoo&#8217;s focus on <a href="http://search.yahoo.com/search?fr=sfp&#38;p=shopping">shopping</a> being one of the best examples.</p>
]]></content:encoded>
			<wfw:commentRss>http://battellemedia.com/archives/2003/11/the_search_papers_defining_intent.php/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
