For What It's Worth | John Battelle's Search Blog

For What It's Worth

By - January 10, 2007

From time to time, I am jolted into rereading stuff that I practically memorized while writing the book. Here’s one passage, an appendix to “The Anatomy of a Large-Scale Hypertextual Web Search Engine” – the original paper by Larry and Sergey introducing Google – that felt like it was worth another look, in particular given the tempest over “tips” and ongoing pressures to monetize partnerships like AOL and YouTube, as well as the increasing creep in the number of ads we all see in Google results. I’ve bolded that which I find particularly worthy.

Appendix A: Advertising and Mixed Motives

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is “The Effect of Cellular Phone Use Upon Driver Attention”, a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.



Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who “deserves” to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from “friendly” companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. Furthermore, advertising income often provides an incentive to provide poor quality search results. For example, we noticed a major search engine would not return a large airline’s homepage when the airline’s name was given as a query. It so happened that the airline had placed an expensive ad, linked to the query that was its name. A better search engine would not have required this ad, and possibly resulted in the loss of the revenue from the airline to the search engine. In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines. However, there will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.

28 thoughts on “For What It's Worth”

  1. JG says:

    Amen.

    I think I have said this before on this blog, but back around the turn of the decade, when Google started showing its first ads, I wrote them an email, offering to pay a monthly subscription fee if they would just keep their site ad free. I had agreed with them from the very beginning that search and advertising are mutually incompatible. Even if there is no direct ranking influence, the fact that real estate is being taken up on the SERPs page by ads means that you have no more real estate for integrating known search enhancement techniques into the system. Especially if you want to keep that famous “clean” and “uncluttered” interface. There will necessarily have to be a trade-off.

    I’ve lost my once starry-eyed fawning over the company. I still like Google, but the stars are gone. I don’t care that other companies are worse or have more dubious ads. “Mom, they’re doing it, too” is not a valid excuse. Google has to be better than that, on principle.

    I thought that the Google I fell in love with in 1998 was going to be like the magazine “Consumer Reports”, and not accept advertising at all. Avoid even the appearance of evil. Ah, my naivete.

  2. gio says:

    i actually printed it and read it sometime. its the idea that started it all.

  3. ZF says:

    This hand-wringing sounds plausible enough, but here’s the problem I have with it.

    It has been possible for a couple of years now for anyone to write and circulate a Greasemonkey script which pulls all of the Google advertising, tips and whatever out of their pages in your browser, presenting you with only the ‘pure’ search results. You could even choose from a pick list which elements you wanted to filter out. Google could do basically nothing to counter this.

    And yet there seems to be little demand for such a thing. If there were, even from just a portion of users, Google would take notice and behave differently, but there isn’t.

  4. ________________________________________________________

    It must be stated that Google has every right to sell advertising – and is doing society a service by doing so.

    Let Search Engines WeB explain why:

    Google insists on *Link Popularity and Trust Rank and Popularity of Links* as driving factors in their Algos…..

    Ergo, only established sites or networked sites have any prayer during the first stages of their existence.

    Not everyone can spend HOURS promoting a site, or Links Building or Link Baiting; not everyone can afford the creme de la creme of SEOs, nor can they afford a dedicated SEO.

    Google’s Advertisers are reviewed – so in essence, it is analogous to a temporary directory listing (in a manner of speaking)

    This offers some hope for new Websites and small businesses to get their products and services in a competitive mode along with the established sites.

    Also, the income generated for Google, aids them in not only offering free WEB 2.0 services, but in refining their technology to deliver more relevant ads.

    Imagine, NO ADWORDS at all!!! What hope would there be for new sites to get any attention while they are developing their Search Engine Optimization strategies??????

    Remember, a few years ago, it used to be easier for newbies to SEO their way into a decent ranking. The SERPs were much more varied.

    But as spammers and black-hat SEOs abounded, Google instituted more defenses. Maybe they have gone too far.

    But look at the recent strategies by MSN – and to a lesser degree Yahoo, they are showing the same trend.

    ★ ★ ★ ★ ★ ★ ★ ★
    BTW:

    :LOL Was this sudden interest in the Anatomy of a Search Engines a result of the satirical comment on Cutts Blog?

  5. JG says:

    Search Engines Web: You completely missed the point of everything I just said.

    You write: “Not everyone can spend HOURS promoting a site, or Links Building or Link Baiting; not everyone can afford the creme de la creme of SEOs, nor can they afford a dedicated SEO.”

    Don’t you see that Google has created an artificial scarcity? By only providing a single way of trawling for relevant information (a single, unmalleable ranked list, with no relevance feedback, with no query modification tools), Google creates the need for SEO. Google creates the need for ad buying. There is only one top spot. There is only one first page with 10 results.

    Let’s imagine another world, for a minute. A world in which the ranked list was user malleable. Not by having the user enter a whole new query. No. Keep the same query. But let the user give a thumbs up/down to the top 3-4 items in the ranked list.

    By giving the thumbs up or down, the user is telling the Google engine to re-adjust the ranked list more toward one type of information, and away from the others. With just a couple of thumb clicks, the user totally reshuffles the existing results of his/her initial query.

    As a result, that non-SEOed relevant website that used to be ranked 83rd (where no user would ever see it) is now ranked 3rd.. because all the other less-relevant crap has been filtered out (using the searcher’s feedback). And it took no more user effort, no more clicks than it would have taken for that user to visit a few ads.

    Bingo: The user finds what they want/need. Bingo: The non-SEOed site gets the traffic it needs. Bingo: Not a single ad has had to be clicked, nor a single SEO has been paid. But the relevant users have nonetheless found their way to the relevant information, and non-SEOed websites now have a hope of being found, without having to pay the SEOs.

    Information wants to be free. Free to be found. Free to be remixed and free to be reranked by the user that is looking for it. It does not want to be SEOed.

    By NOT offering relevance feedback, by continuing to uphold the artificial scarcity of the unmalleable ranked list, Google is growing an artificial market filled with white hat SEOs, black hat SEOs, and desperate advertisers. It is doing its users a disservice. I would much rather it ignored its SEOs and instead focused on its users.

    Seven years ago I offered to pay Google to do that, to focus on creating better user interaction instead of accepting advertising. It chose another route.

    Yes, Google can do whatever it wants. But understand that offering ads is not “doing society a service”, as you say. Because it is robbing users of a valuable service (easy, intuitive ranked list reshuffling), in order to do so.

  6. JG says:

    And ZF: You miss half of the point, too. You write: “It has been possible for a couple of years now for anyone to write and circulate a Greasemonkey script which pulls all of the Google advertising, tips and whatever out of their pages in your browser, presenting you with only the ‘pure’ search results. You could even choose from a pick list which elements you wanted to filter out. Google could do basically nothing to counter this.”

    My point isn’t just that the Google SERPs page is filled up with advertising. My point is that one comes at the expense of the other. Ads come at the expense of relevance feedback mechanisms, as I was talking about above.

    Sure, you can Greasemonkey all the ads away if you want; I know that. The point is that there are information search mechanisms that Google could be offering instead of the ads. Until it loses its focus on ads, however, it is never going to offer those mechanisms.

    So no matter how many ads you Greasemonkey away, you will never be able to give the thumbs up/down on any of the SERPs, and have Google automatically re-order the list for you, based on that dynamic feedback. And all because of Google’s focus on ads.

    To me, that is the antithesis of innovation.

  7. ZF says:

    JG, looking at the two halves of your argument separately then:

    (a) My point about filtering out Google’s ads is not that this would by itself change everything, it’s that despite how simple it would be few people seem to be very interested in doing it. This suggests that the ads are not (yet) detracting markedly from the overall user experience.

    (b) The other half of your argument is basically a suggestion that, without the ads, the SERPS from Google would be better due to the conflicts of interest which the ads bring. This is highly speculative. Firstly because it rests on a suggestion that the huge number of very smart people hired by Google to work on search aren’t doing their best. Well maybe they are and maybe they aren’t, cynicism alone doesn’t answer the question. Secondly how would Google have had the resources required to hire all those smart people without the money brought in by the ads? The question answers itself.

    The only way we will really find out if Google is doing a sub-optimal job on the SERPS due to the ads is if somebody new comes in with a better or equally good solution which turned out to be feasible at a lower cost level.

    It is possible that right now it just isn’t possible to do better than Google cheaply, but that as the industry matures and computing power gets less expensive an opening will occur. If so the people most likely to take advantage of that opportunity would be teams leaving Google.

  8. JG says:

    ZF: Fair enough.. I can concede part (a). Perhaps it really is true that ads are not detracting from the user experience. (How you actually know that a lot of people are not using Greasemonkey, I don’t know. But you’re most likely right. First, it involves using Firefox, which most people don’t do. Then it involves installing the tool, which is yet another effort-requiring step. It would be interesting, however, if Microsoft built into IE 7.1 its own “Greasemonkey”-like app. What if the first time it came across a Google ad (whether an adword or an adsense ad), Mr. Clippy popped up and said: “I see that Google is trying to fill your web browsing experience with advertising. Would you like to turn off Google ads?” I wonder what percentage of users would click “yes”. And I wonder how loudly Google would scream “foul”, even though MS wouldn’t automatically be blocking Google ads.. but instead giving users that choice, directly. What do you think, ZF? How many people would do that, if MS actively prompted them? Hmmm? :-)

    So yes, I concede (a). Ads might not take away or detract from the experience. But that brings us to part (b). You have two objections to my suggestion that ads are a countervailing force against “relevance feedback”, and vice versa.

    (i) Your first objection is that Google has obviously hired lots of smart people, and to think that they would be holding back their smarts is just silly. Well, yes, Google has hired lots of smart people. But there were even more smart people working on search for decades before Google even existed. And the near unanimous consensus, as far as I can tell, is that “relevance feedback” works. So I see three possible explanations for this:

    (A) Google’s people are not as smart as 30 years of scientific research, because they haven’t figured that out yet
    (B) Google’s people are smarter than 30 years of scientific research, and have figured out that, despite 30 years of research to the contrary, “relevance feedback” really does NOT work, or
    (C) Google’s people are equally smart as 30 years of scientific research, and know that relevance feedback works, BUT relevance feedback counteracts advertising, so to implement it would be killing the cash cow.

    Now, I think we agree that (A) is not true. Google has smart people. So that leaves (B) or (C). Let’s assume for a moment that it is not (C), therefore it is (B). So if it is (B), then I go back to my earlier suggestion that Google should publish its discoveries and give back to the research communities that spawned it. Knowing that relevance feedback does not work is obviously not a competitive advantage; Google does not lose money by sharing this “secret”. And if this is really true (that relevance feedback does not work) it would rock the foundational pillars of the information retrieval research community. It would be like suddenly announcing that evolution is wrong to a community of biologists. In short, it is a big deal. Google needs to stop sitting on this information and publish!

    However, I again tend to think that this is not the case, (B) is not the explanation. Google hasn’t suddenly discovered that relevance feedback is useless. This is not unfounded speculation. This is borne out of years of scientific experience, both my own and the larger community’s.

    So if, by the process of rational elimination, the explanation is not (A), and it is not (B), that leaves only (C): ads counteract relevance feedback. It may indeed just be speculation, but all other explanations have been eliminated. (What is that Sherlock Holmes quote? “When all else has been considered, whatever remains, no matter how unlikely it might be, must be the solution.”) And, most interestingly, it seems to be the opinion that Google’s own founders once held, in 1998, as witnessed by the excerpts from the paper that Battelle quoted above.

    Oh, and your second objection (ii) was “how would Google have had the resources required to hire all those smart people without the money brought in by the ads? The question answers itself.” Let me say again: they would have had the resources required, because I and hundreds of thousands of others would have been paying them a monthly subscription fee for the past 7 years.

  9. ZF says:

    JG:

    “When all else has been considered, whatever remains, no matter how unlikely it might be, must be the solution.” Actually if you go back and look at the early Google papers you will find another reason the company might think ‘relevance feedback’ is problematic. One of the long term advantages cited in favor of Pagerank was that it was inherently hard to game. Google’s founders recognized early that this would become a bigger issue the more traffic they had; given their current size it’s potentially a more dangerous problem for them than for anyone else. The jury is still out on whether this is a fatal flaw for ‘relevance feedback’.

    As to the idea of you and “hundreds of thousands of others” paying Google “a monthly subscription fee”, I think I’m only slightly exaggerating when I say that you’re on your own there! A competitive threat to Google may emerge, but I don’t think it’s ever likely to be driven by subscription fees. I’m a huge fan for instance of Craigslist, and I love their business model, but what they do from a technical point of view, and in terms of the resources required, is trivial when compared to web-scale text search.

  10. JG says:

    ZF: You’ve kinda lost me when you say: “One of the long term advantages cited in favor of Pagerank was that it was inherently hard to game. Google’s founders recognized early that this would become a bigger issue the more traffic they had; given their current size it’s potentially more dangerous problem for them than for anyone else. The jury is still out on whether this is a fatal flaw for ‘relevance feedback’.”

    I don’t see the connection you are drawing between pagerank and relevance feedback. Are you saying that when I type the query “model trains”, that those pages with the highest links are going to be the most relevant, and there will therefore never be a need for relevance feedback?

    I have to disagree. To drive the point home, let’s assume that, somehow, there was absolutely zero gaming of the system, whatsoever. In that case, I still think it will be true that the highest pageranked pages are not necessarily going to be the ones I am looking for.

    What if I am really looking for a particular type of model train, but really don’t know the words to describe it? I see a couple of pages that contain this train, so I give them the thumbs up. I see other pages that contain model trains, but not ones I like, so I give them the thumbs down. Relevance feedback then automatically selects the best keywords, the ones that describe the pages I like that don’t describe the pages I don’t like. Without me having to figure it out myself. It then readjusts the rest of the ranked list, so that those pages similar to the ones I like pop up to the top, and the ones similar to what I don’t like push down toward the bottom.

    In this way, I can turn a pagerank-based SERP list into one that is custom-tailored, right this very minute, to what I am looking for right this very minute. A page that would have been ranked 83rd using pagerank is now ranked 3rd, given MY particular preferences (relevance feedback) for the “model trains” query.

    So really, this is all fairly orthogonal to pagerank, which is why I am confused by what you’ve said, above.

    Could you send links to the early Google papers, of which you speak, so that I can get a clearer understanding of what you mean?

    Oh, and no comment about the MS IE 7.1 “Would you like to turn off Google ads” feature? You don’t think a majority of people would click yes, when offered the choice?

  11. Brian Turner says:

    Come on, John – show some balls – don’t hide behind other people’s words. Use your own and make the point.

    Google were concerned about advertiser influences – so what do you make of it? Are there any specific concerns about particular influences?

  12. PorNoAdut says:

    I would never ever turn off Google ads, they were informative and useful all the time!

  13. gio says:

    well i still see spammy results with google serps though but its minimal compared to yahoo and msn.

  14. There are certainly naive remarks in the original Google paper, but the embedded advertising issue of the day was not concerned with clearly labelled sponsored results set into margin space. There were popular search engines that actually mixed the paid results in with the organic listings in such a way that users could not (easily) distinguish between the paid and unpaid listings.

    That problem really has gone away among the major search engines thanks in large part to Google, and all the finger pointing about their paid advertising is way off track.

    I don’t like the paid ads either, particularly as they are now the primary funding source for search engine spam, but Google’s results pages are still among the cleanest you’ll find.

  15. Rational Beaver says:

    JG-

    Your thumbs up/thumbs down idea is nice in theory, but impractical in the real world.

    Take your train example. You search for ‘model trains’ and Google brings back some results. Within those results you see a couple of pages that refer to the specific type of train you were looking for. What happens next? You click on one of those pages and are taken to the site. No one is going to stay around to groom Google’s search results. Especially since searches are so fleeting. I mean, you aren’t planning to run that search 5 times a day are you? You just needed that info right now so what good does it do you to reorder a page you’ll never see again? You search, you find, you leave. That’s the point.

  16. JG says:

    Rational Beaver: Well, first of all, the thumbs up/thumbs down idea is not my idea. It is an idea that has been around for decades, and tested for decades, in the information retrieval community.

    Second of all, “you search, you find, you leave” is not the point. Not always.

    You need to take a look at (Yahoo’s) Andrei Broder’s 2002 paper, “A Taxonomy of Web Search”. In it, he talks about 3 kinds of searches, and from an analysis of the query logs breaks down real world queries by these three types:

    Navigational – 20%
    Transactional – 30%
    Informational – 48%
    (Other – 2%)

    Now, let’s look at your objection: “Within those results you see a couple of pages that refer to the specific type of train you were looking for. What happens next? You click on one of those pages and are taken to the site. No one is going to stay around to groom Google’s search results. Especially since searches are so fleeting.”

    Essentially, what you are talking about is a navigational or transactional type of query. The moment you find the home page you are looking for (navigational), you stop looking. The moment you find a place where you can buy the Elmo doll, you stop looking (transactional).

    In those cases, I agree with you, thumbs up/down is not needed.

    But you are dead wrong when you say that it is impractical in the real world, because Broder’s analysis shows that almost half of all web searches are informational, NOT navigational. One out of every two people searching on the web is not trying to navigate to a page, but to find out more information on a topic. And think about it: What is the best result, when you want to find true information on the web, and not just one person’s opinion? The best result is a set of pages, from multiple sources, rather than a single page. Broder even says this in his paper:

    It is interesting to note, that in almost 15% of all searches the desired target is a good collection of links on the subject, rather than a good document. (A good hub, rather than a good authority, in the language of Kleinberg [K98]).

    And the problem is, a single web page containing those links often does not already exist. Especially for the long tail of queries. Instead, classic information retrieval is all about creating that collection of links, on the fly. In other words, it is about filling the top 10 or 20 or 30 results all with relevant documents, rather than just the first or second link.

    And as I said earlier, it has been studied for 30+ years, and relevance feedback (thumbs up/down) has consistently proven to be the best way of filling the top of your ranked list with relevant documents, so that the user can examine all of them.

    You still don’t think that what I am talking about is practical, do you? Well, let me give you a real world example, from my own life. I was in the market this holiday season for a handheld, personal GPS navigation unit. And I wanted to read a large number of reviews on the unit, so that I could decide which of my three choices to buy.

    According to you, I should have just typed the name of my unit in, and then clicked on the first relevant link, and be done with it. Right? Well, that strategy would have gotten me to the GPS company’s home page. Fat load of help that would have been. Or maybe, if I’d gone down 4 more links, I could have found the CNET review. Great..now I have two opinions: one from the manufacturer and one from a faceless technology reviewer. Still hardly enough to help me make my decision.

    It would have been so much nicer to be able to do the thumbs thing, and get rid of all the pages that contained the product, but that didn’t contain reviews of the product. Note that this is very different from assuming that all those other pages are spam. They are not spam. But they are not the sort of page that I want, right now. I want to find pages that will help me make my decision about what to buy. Pages that are selling the unit, or pages that mention the unit, without rating or reviewing it in some manner, might all be high quality, high pagerank pages. But they are not what I am looking for, and get in my way. They are, in that moment, not relevant to my query.

    So since Google did not let me re-filter my results list, using relevance feedback, to better find the SET of links that I needed to make my decision, I had to manually federate my search. I opened up windows to Google, Yahoo, MS, Ask, Quintura, and Vivisimo and entered my query into all of them.

    I got a lot of the same pages back, from the different engines, but I also got different pages, too. Ultimately I was able to assemble a set of about 7-8 unique, good links that all together helped me make my decision. But no more than 2-3 of those links came from a single engine! In fact, the most helpful review, the one that solidified my decision, I found using Quintura. No other search engine had that review, at least not in the top 20. (Disclosure: I do NOT work for Quintura.)

    So if I had been using a single engine, and just reentering my same query each time I need to find more reviews, I would have been sore out of luck. No single engine, Google included, helped me in any way in assembling that list. I had to manually do it myself.

    It should not be like that. Google, by not offering a service to help me find the information I needed, and instead serving me with advertisements, basically wasted my time. Instead of being able to rely on a proven tool (relevance feedback) I had to manually go around to half a dozen other engines, in search of more information.

    THAT is the point of reordering the search results. It has nothing to do with planning to run the search 5 times a day. It has to do with the fact that what is relevant to ME, right now, is not going to be the exact ordering the search engine gives me. By giving feedback on what is relevant to me, the search engine can do a better job RIGHT NOW, for this one query, so that I can find all the information that I need.

    I have heard Marissa Mayer from Google say time and time again that in order for them to offer a service, at least 5% of their users need to find that service useful. Andrei Broder’s query log analyses show that 48% of all queries are informational (a case in which you might not want to rely on a single link). And 15% of queries are from users that explicitly say they are trying to assemble a set of links, rather than a single link, as a result of their search.

    Either way you cut it (either 48% or 15% of all searches), this is well over Google’s 5% requirement. And yet Google still does not offer relevance feedback, after 8+ years.

    What I am asking for is NOT impractical. I am NOT the only one who needs it. (I feel like I am the only one that recognizes that I need it, but I certainly am not the only one who does need it.) And relevance feedback, which is not my idea but an idea that has been studied for 30 years, is a proven way to deliver this search functionality.

    And yet Google continues wasting its time developing calendars, rather than helping searchers better meet their informational needs using proven tools? Are there really >5% of all Google users using calendars?

    Damn, I should not be so passionate about this crap.

  17. nmw says:

    “In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want.”

    *Anything* can be argued.

    See also: http://www.google.com/sponsoredlinks?q=software

    :) nmw

  18. nmw says:

    … and for more *reliable* results you’d probably use something like http://software.net/ and/or http://download.com/ (etc.)

    ;D nmw

  19. gio says:

    I don’t think “sponsored links” have anything to do with being a better search engine.

  20. ” And yet Google still does not offer relevance feedback, after 8+ years. “

    You apparently haven’t looked at the bottom of the results pages lately, or perhaps you should explain what you mean by “relevance feedback”.

  21. Sohbet says:

    i love you battellemedia
    Sohbet
    Chat

  22. JG says:

    Michael, before I write another 5000 words on this topic, please go look up “relevance feedback” yourself. The literature is extensive.

    And I took your suggestion and did a Google search. I found two interesting things at the bottom of the results page. The first was a little blue box in the center with the words:

    “Get organized for the new year with Google Desktop”

    The phrase “Google Desktop” is hyperlinked. So, is this another Google tip? (This time without the tip icon?) Or is it a paid advertisement, because it is in a blue box? I don’t know. I can’t tell. But it certainly is further blurring of the lines. Wonder what Blake Ross would have to say about that?

    Oh, but you probably didn’t mean that, did you? You probably mean the link where it says “search within the results”, right? In other words, where you can type in another query.

    I’m sorry, but that is not “relevance feedback”. It is a nice feature, but it is not “relevance feedback”.

    If you really want me to explain it further, I will. But I feel like I’ve already done enough explaining, above. If you want to know more, go to Google Scholar and look up relevance feedback, yourself.

  23. JG says:

    Michael: Oh who am I kidding? You don’t care enough to go look it up. So I went and found you a good definition of relevance feedback. The first (but not only) one I found is from a 1998 paper by a fellow named Yong Rui. If you want earlier citations, you can go all the way back to Rocchio in 1971 (J. J. Rocchio Jr. Relevance feedback in information retrieval. In Gerard Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313–323. Prentice-Hall, Englewood Cliffs, NJ, USA, 1971.)

    Here is Rui’s definition:

    Relevance feedback is a powerful technique used in traditional text-based Information Retrieval systems. It is the process of automatically adjusting an existing query using the information fed back by the user about the relevance of previously retrieved objects such that the adjusted query is a better approximation to the user’s information need. In the relevance feedback based approach the retrieval process is interactive between the computer and human. Under the assumption that high-level concepts can be captured by low-level features the relevance feedback technique tries to establish the link between high-level concepts and low-level features from the user’s feedback. Furthermore, the burden of specifying the weights is removed from the user. The user only needs to mark which [web pages] he or she thinks are relevant to the query. The weights embedded in the query object are dynamically updated to model the high level concepts and perception subjectivity.

    We could go into this in further detail if you’d like, but the key difference between relevance feedback and the “search within the results” link that Google currently offers comes down to the automated nature of relevance feedback. With Google’s “search within the results”, the user has to manually come up with additional query terms that may (or may not) re-sort the list into a more relevant order. With relevance feedback, the user need only mark a few documents as relevant and/or non-relevant, and the system automatically and dynamically takes care of the query reweighting. That is the main difference.
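
    To make the reweighting step concrete, here is a minimal sketch of the classic Rocchio update mentioned above. It is only an illustration under stated assumptions: queries and documents are represented as term-to-weight dictionaries, the function name `rocchio` is hypothetical, and the `alpha`, `beta`, `gamma` values are commonly used textbook defaults, not anything taken from Rui's paper or from any search engine.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query (term -> weight) from user-marked documents.

    query, and each document in relevant / non_relevant, is a dict mapping
    a term to its weight. The alpha/beta/gamma defaults are assumptions.
    """
    updated = Counter()

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight

    # Move toward the average of the documents marked relevant.
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)

    # Move away from the average of the documents marked non-relevant.
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)

    # Drop terms whose weight is not positive (a common simplification).
    return {term: weight for term, weight in updated.items() if weight > 0}


# Hypothetical example: the user typed one word and marked two results.
new_query = rocchio(
    query={"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "cat": 2.0}],
    non_relevant=[{"jaguar": 1.0, "car": 2.0}],
)
```

    In this toy case the user never types "cat", yet the term enters the new query because it appears in the result marked relevant, while "car" is pushed out because it appears only in the result marked non-relevant.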

    Google has said over and over that users are lazy, that users do not want to do a lot of work when searching. Relevance feedback is perfect for that! Rather than forcing the user to think really hard and come up with a new set of query terms, relevance feedback allows the user simply to say “I like this link” and “I don’t like that link”. That’s it.

    If there ever was a search tool that was custom built for the Google philosophy, relevance feedback is it. And yet they don’t offer it. I want them to offer it. I need them to offer it. I would use it all the time. But if they won’t offer it, I wish they would at least come out publicly and say why. Publish a paper showing that it doesn’t work on web-sized collections or something.

    The longer they go without offering it, and the more they focus on calendars, the more I look elsewhere for other engines to help me find information I need. So, Google, any chance of offering it?

    Oh who am I kidding.. ain’t no one reading this thread anymore.. :-)

  24. nmw says:

    “I dont think “sponsored links” has nothing to do with being a better search engine.”

    [gio, January 14, 2007 01:32 AM]

    gio, did you *mean* the double negative (or are you repeating the negative for emphasis)?

    at any rate, the sponsored results engine is faster, cleaner, and presumably the links there are relevant since people want to pony up the PPC rate to show up there (of course they could do so and try to get you to play the lottery, but that’s just how Google works; I guess that http://software.net/ and/or http://download.com/ would be less willing to dilute their results, since that would tend to drive away users…. then again I can’t figure out why people still use Google — does anyone else have an idea?)

    ;) nmw

  25. After meeting Matt Cutts in person for the first time at a Consumer Reports WebWatch 6/9/05 Conference on “How Failure to Disclose Ad Relationships Threatens to Burst the Search Bubble”, I blogged on “Advertising and Mixed Motives”. It pretty much makes the same points you do, with the addition of a suggestion made at that conference that Matt liked but that was never implemented.
    http://www.brokerblogger.com/brokerblogger/2005/10/advertising_and.html

    The suggestion was to make “Sponsored Links” more bold, and to have it be a link to a full explanation of that term like Yahoo does:
    http://help.yahoo.com/help/us/ysearch/basics/basics-03.html

  26. Trogdor says:

    By JG: I thought that the Google I fell in love with in 1998 was going to be like the magazine “Consumer Reports”, and not accept advertising at all. Avoid even the appearance of evil.

    Truth is, even Consumer Reports sold out, long ago. Not to advertisers, but to the unions / federations related to the testing labs.

    one place to start …

  27. JG says:

    Did it really, Trogdor? Shoot.. I am more naive than I thought :-)

    But do you get the point, at least, that there is so much more Google could be doing around helping us find information, than giving a single, unmalleable ranked list, topped and flanked with advertisements? Relevance feedback is only the most obvious and glaring of possible improvements, simply because it has been studied for decades and is known to work. And yet it is not offered. Its absence is conspicuous.

    How can one but conclude that the reason for its absence is Google’s focus on advertising? Advertisements, while possibly relevant (though I can’t imagine who actually advertises their GPS review site when I’m searching for reviews.. seems mostly to be people just trying to sell GPS devices, sans real, objective reviews), take up page real estate that otherwise could have been dedicated to ways for improving the search experience.