Guess what year I’m writing about (for the book) today…
As many have noted, Froogle has begun to aggregate snippets of product reviews from the web at large. This marks the Google News-ification of Froogle. When will the service jump the shark and start making money on vigs from sales? Or will it? Will publishers like Cnet revolt? Wait, here’s Cnet coverage…
The service, which is similar to the company’s aggregated site for news around the Web, highlights Google’s ambition to bring more content to its own site with the use of its “spidering” technology.
Huh. “Bring more content.” That’s an interesting way to put it. Indeed.
Indeed.com is a new service which scrapes jobs from scores of services and then wraps a familiar interface around the entire thing – a search interface. I like the ability to refine searches and the ability to search by region. It’s fun to play around with. Needs to deal with the duplication issue, as this search for “blogger” shows, but it supports RSS and you can sort by date. Neat. The site’s founders have a (nascent) blog as well.
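For the curious, here is a minimal sketch of one way an aggregator could collapse the duplicates that show up when the same job is scraped from several boards. It is not Indeed's method, which I know nothing about. The field names (`title`, `company`, `location`, `source`) and the sample listings are hypothetical, and real listings would need fuzzier matching than this.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedupe_jobs(jobs):
    """Keep the first listing seen for each (title, company, location) key."""
    seen = set()
    unique = []
    for job in jobs:
        key = (
            normalize(job["title"]),
            normalize(job["company"]),
            normalize(job["location"]),
        )
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


# Hypothetical listings scraped from two different job boards.
SAMPLE_JOBS = [
    {"title": "Blogger", "company": "Acme Media", "location": "New York, NY", "source": "board-a"},
    {"title": "blogger ", "company": "ACME Media", "location": "New York NY", "source": "board-b"},
    {"title": "Editor", "company": "Acme Media", "location": "New York, NY", "source": "board-a"},
]

if __name__ == "__main__":
    for job in dedupe_jobs(SAMPLE_JOBS):
        print(job["title"].strip(), "via", job["source"])
```

Exact-key matching like this only catches the easy cases; listings that reword the title or omit the company would slip through.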
Update: Paul Forster, one of the founders, tells me of the business model the site will pursue:
“We’ll start with a contextual advertising system similar to Google Adwords or Overture. We won’t be accepting payments for improved job positioning in our main search results and our paid ads will be clearly distinguished from our main search results”
And reader Otis G. tells me that Indeed is based on Lucene. Cool!
Gary/Search Engine Watch has posted a review of Ask’s new desktop search tool, and reading it reminded me of a conversation I had with Ask’s Jim Lanzone earlier today. Jim was a bit crabby – after all, Ask bought Tukaroo a long time ago and deserves credit for seeing the importance of the space way back then. However, as he pointed out, desktop search is simply one arrow that has to be in every serious search player’s quiver, so it’s not that big a deal that everyone hustled to get on board. He has a point. The torrent of news has all of us atwitter about desktop search, but in the end, it’s simply another necessary building block toward good search services.
And, by the way, I got pinged by the folks at Lycos, who want to remind us all that they were in this game really early….with a HotBot desktop search tool.
Yahoo’s been busy: today it also announced that it is overlaying traffic data on top of its Maps product. The picture shows the route from my house to Yahoo, with traffic highlights.
This is a capability that John Hanke showed with Keyhole at Web 2.0, so I expect we’ll see something similar from Google shortly…Release in extended entry.
So word is out on Yahoo’s video search, and many have noted its similarity to previous incarnations from Yahoo acquisitions alltheweb and AltaVista. A post on Yahoo’s Search Blog clarifies that those sites now have Yahoo’s video search improvements rolled in, so the new product is in fact an improved version. The original post on the video beta release is here.
What I find interesting about this new product is the extensions Yahoo is proposing for RSS – “Media RSS.” With it, Yahoo is attempting to address a major problem with indexing video – that of metadata, or more directly, the lack thereof. From Jeremy’s post:
As Marc Canter has noticed, we could all benefit from a bit more metadata to go with this growing pool of media. Who published this video? What formats are available? How is it licensed?
From our point of view, it means we can build a much better video search. You might want to filter results based on some of that metadata (title, actor, file format, etc). But it also opens up so many more doors. For example, your news aggregator might use your preferences to figure out which videos to download: Windows Media or Quicktime? High bandwidth or low? Heck, we can see entirely new rich media aggregators and tools being built–something like the popular iPodder currently used for podcasting. And when they are, this metadata becomes all the more important.
To get this started, we’re suggesting an optional set of metadata extensions that we’ve been calling “Media RSS” (yes, we’re so creative with names). They’re aimed at publishers who’d like to provide a rich set of metadata about the media being published. Our video search system will also support these Media RSS extensions in addition to video enclosures (see the FAQ and the draft spec).
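To make Jeremy's aggregator example concrete, here is a minimal sketch of a client that reads Media RSS metadata and picks only the videos matching a preferred format and bandwidth. The namespace URI and the `url`, `type`, and `bitrate` attributes on `media:content` follow the Media RSS specification as later published; the draft described in the post may have differed. The sample feed, its URLs, and the `pick_videos` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Media RSS extensions.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical feed: one item offered in two formats and bitrates (kbps).
SAMPLE_FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example video feed</title>
    <item>
      <title>Dancing monkey</title>
      <media:content url="http://example.com/monkey-hi.mov"
                     type="video/quicktime" bitrate="700"/>
      <media:content url="http://example.com/monkey-lo.wmv"
                     type="video/x-ms-wmv" bitrate="100"/>
    </item>
  </channel>
</rss>"""


def pick_videos(feed_xml, preferred_type, max_bitrate):
    """Return (item title, url) pairs matching the MIME type and bitrate cap."""
    root = ET.fromstring(feed_xml)
    picks = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for content in item.findall("media:content", MEDIA_NS):
            bitrate = float(content.get("bitrate", "0"))
            if content.get("type") == preferred_type and bitrate <= max_bitrate:
                picks.append((title, content.get("url")))
    return picks


if __name__ == "__main__":
    # A low-bandwidth Windows Media user would get only the .wmv version.
    print(pick_videos(SAMPLE_FEED, "video/x-ms-wmv", 300))
```

Without the per-file metadata, the client would have to download or probe each enclosure to learn its format, which is exactly the gap the extensions are meant to close.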
Yahoo is using its power as a major distribution player to feed what it hopes will be a major play in video distribution. It may not seem like a big deal now, but as the web increasingly becomes a native environment for video, it may well prove to be one of the most forward-looking things the company has done this year. And by the way, it’s always fun to see what the top search is for “dancing monkey.” Hey, that looks like Steve Ballmer….
I sent a query to Lee Giles, the guru at Penn State behind CiteSeer (with Steve Lawrence, who is now at Google) asking him which search-related papers are the most cited. I was struck by the near parity between Page and Brin’s original paper on Google and Jon Kleinberg’s paper on Hubs and Authorities. Giles did a bit of fiddling with Google Scholar and responded:
For web related work these are well cited in the Google Scholar using the query “web”:
[PDF] The Semantic Web
T Berners-Lee, J Hendler, O Lassila – View as HTML – Cited by 1347
… May 17, 2001. The Semantic Web. A new form of Web content that is meaningful to
computers will unleash a revolution of new possibilities. … Web: A Research Agenda. …
Scientific American, 2001 – www-personal.si.umich.edu
[PDF] The anatomy of a large-scale hypertextual Web search engine
S Brin, L Page – View as HTML – Cited by 1087
Abstract In this paper, we present Google, a prototype of a large-scale search
engine which makes heavy use of the structure present in hypertext. Google …
Computer Networks and ISDN Systems, 1998 – kulturinformatik.uni-lueneburg.de – firstrate.co.nz – net.cs.pku.edu.cn – scalab.uc3m.es – all 69 versions
However, this one can’t be ignored:
[PDF] Authoritative sources in a hyperlinked environment
J Kleinberg… – Cited by 1059
Abstract. The network structure of a hyperlinked environment can be a rich
source of information about the content of the environment, provided we …
Journal of the ACM, 1999 – portal.acm.org – nan.dhs.org – cs.cmu.edu – mathe.tu-freiberg.de – all 73 versions
This book is the first to discuss the web in any detail:
[PS] Modern Information Retrieval
R Baeza-Yates, B Ribeiro-Neto, R Baeza-Yates – View as HTML – Cited by 1198
Page 1. Modern Information Retrieval. Ricardo Baeza-Yates. Berthier Ribeiro-Neto.
ACM Press New York. … 1.1.2 Information Retrieval at the Center of the Stage . . …
Addision Wesley, 1999 – dcc.ufmg.br – sunsite.dcc.uchile.cl – sims.berkeley.edu – portal.acm.org – all 7 versions »
All worthy reads!
Sigh. Again, I find myself in this odd space. I’m under embargo on this information (Yahoo briefed me and others), but a reader just sent me this link out of the blue (my readers are so damn dialed in, first Google Library, now this…). So you guys go look for yourselves, please comment here as to what you think, and I’ll write about this on Thursday, as I have holiday stuff to do tonight and can’t write it up now. Yahoo Video Search.
Reuters just came out with this:
ALEXANDRIA (Reuters) – A federal judge on Wednesday dismissed a key element of insurer GEICO’s trademark infringement case against online search engine Google Inc (GOOG.O).
U.S. District Judge Leonie Brinkema ruled that there was not enough evidence of trademark violation to bar Google from displaying rival insurers when computer users search the word “GEICO.”
From the AP:
Geico claimed that Google’s AdWords program, which displays the rival ads under a “Sponsored Links” heading next to a user’s search results, causes confusion for consumers and illegally exploits Geico’s investment of hundreds of millions of dollars in its brand.
“There is no evidence that that activity alone causes confusion,” Brinkema said, in granting Google’s motion for summary judgment on that issue.
But Brinkema said the case would continue to move forward on one remaining issue, whether ads that pop up and actually use Geico in their text violate trademark law.
More as this develops…
PS – Watch GOOG. It was down before the news broke but is trending back up…