New York

Running around the city, hard to update. If you have news for us to discuss, let us know in the comments…

21 thoughts on “New York”

  1. Well, maybe the following item is news, maybe it isn’t. But it’s something that I find important and revealing, and I would like to open up a discussion about it, if I may.

    In the three years that I’ve been commenting on John’s blog one of my more common themes has been the myopic vision of search that Google espouses.

    There are two main kinds of search, or information retrieval. The first kind is “fact lookup” search, aka known-item search, aka navigational search, aka precision-oriented search.

    The second kind of search is “research” or knowledge synthesis search, aka informational search, aka recall-oriented search.

    In the first kind of search, a single fact or a single web page will satisfy your information need. In the second kind of search, multiple sources, often as many sources as possible, are necessary to satisfy your information need. A single answer, or a single source alone, actually cannot satisfy the information need. You need multiple sources, multiple views or aspects. It’s “find one” vs. “find all”.

    I have long been arguing that Google only does the former type of search, and is completely blind to the latter.

    Well, the “news” that I am talking about is Marissa Mayer’s official Google blog response to her recent gaffe, when she claimed search is 90% solved. What she really meant, apparently, is that the other 10% will take 90% of the effort, so she really meant 10%.

    As evidence of all the ways in which search is not solved, she wrote a lengthy paragraph revealing the multitude of searches that she recently has done, or has wanted to do. I need to quote this paragraph in full:

    Are “fab,” “goy” and “eely” words? (There was a Scrabble game going on.) What time does J.C. Penney open on Saturday? Which school has a team called the Banana Slugs? What is the team mascot for San Jose State? How much power does that hydroelectric dam generate? What do you call a group of turkeys? What time does Tropic Thunder show? What’s the name of that great Irish flute player, first name James? What’s the name of the largest city in Russia after Moscow and St. Petersburg? Which is older, a redwood or a cypress? What’s the oldest living thing and how old is it? Who sings “Queen of Hearts”? What kind of bird is that flying over there? Is the “LF” in San Francisco on Union Square or Union Street? What are the dance steps to the Charleston? What day of the week was The Lawrence Welk Show on? What are the lyrics to “In the Mood”? How does Coumadin differ from aspirin in its blood thinning effects? What was the story behind the naming of the number “googol”?

    I was astounded when I read those information needs. Each need, every single one, is a “fact lookup”, single-answer, known-item information need! Maybe, just maybe, the aspirin question might be more informational. But then later in the post she finds “the answer”. Where is the imagination, the innovation? For someone who has been working on search at Google for 9 years and 3 months, where is even a single information need that isn’t just fact lookup?

    So what am I talking about, with these “informational” searches? Let me give a few examples. Suppose you are writing a research paper on a new eye-tracking computer input device that you have developed. There is of course other work on eye tracking, and other work on non-eye tracking computer input devices. You need to cite relevant prior work. What are all the papers that you should cite? You don’t know what they are, but you know that they are relevant to your paper. And that there is more than one. Search for them. How well does Google support you, in that task?

    Or as I was saying a few days ago, I know that there were many causes, many historical and cultural factors leading up to the Czechoslovak Velvet Revolution of 1989. I’d like to find writings of people’s personal accounts of these factors. Maybe there exists a story out there of some teenager attending a “Plastic People of the Universe” concert in 1982, experiencing an epiphany, and deciding to work for political change. Maybe there is a story out there of someone who sat with Vaclav Havel at a cafe late one night, and said something that sparked an idea that led to one of his plays. You really don’t know what is out there, so you can’t really search for “Vaclav Havel cafe play”, because that story might not exist. But another one might. And what you’re really interested in anyway is not that specific story, but all stories that are instances of the social and cultural milieu that led up to the Velvet Revolution. Search for them. How well does Google support you, in that task?

    You could even take some of Marissa’s “single, simple fact only” queries, and turn them into real, interesting information needs. Instead of just trying to find the school with the Banana Slug mascot, you may be interested in all schools with “non-traditional” mascots. Most schools have bears, lions, cougars, indians, eagles, falcons, horses, and so on as their mascots. But many schools also have leprechauns, frogs, artichokes, and okra. Suppose you want to find not only all those schools with non-traditional mascots, but you also want to discover what it is about the personality or character of these schools that steered them down non-traditional paths. That information is most likely not contained in a single document or web page somewhere. Search for that information. How well does Google support you, in that task?

    Marissa goes on to talk about the mode, media, personalization and language. But those are all still in support of the known item, fact finding, navigational needs. You want to find the time that Tropic Thunder starts, from your mobile and in French, rather than from your computer in English. A noble goal, I do not dismiss it. I only complain that whether it is French, English or Swahili, from your computer, car, mobile, or implanted brain chip, it’s still only a “known item” search.

    The reason I go on at length about this is that I keep wanting to engage with the other readers of this blog about this topic, exchange opinions and ideas. There seem to me to be two possibilities here. (1) Despite working for 9 years and 3 months on search, upper level Google management is simply not aware of, and cannot seem to imagine, any type of information need beyond navigational, known item, “fact answer” needs. Therefore we never see anything offered to help those of us with those types of needs better find all the information that we need. Or, (2) Upper management is well aware of these additional information needs, but realizes that such needs are next to impossible to sell advertising against. And therefore only concentrates on building systems that push users into types of queries against which advertising works well. Look at all the queries that Marissa talks about above. Not all, but a good number of those queries lead to advertising. Interested in the Banana Slug mascot? Here, we’ll find an advertiser to sell you a UCSC sweater. What was the name of that Irish flautist? Here, we’ll sell you a CD. Find movie showtimes? Here, we’ll sell you a ticket. What are the dance steps to the Charleston? Here, sign up with Arthur Murray for dance lessons. Compare that to the information need in which you are trying to find personal essays on Czech and Slovak culture and politics in the 1970s and 1980s. How will you *ever* sell advertising against that? You can’t. So why bother building systems that get users to think more about those types of searches? You would only steer them away from issuing more “advertisable” queries.

    (Note that I’m not saying that Google tries to advertise against every single query. They obviously don’t, and they do, very conscientiously I believe, try to reduce advertising against queries for which it doesn’t make much sense. For example, Marissa’s query about the oldest living thing. Even though it is a “factoid” query, it is not really “advertisable”, and Google does very well in recognizing those sorts of situations. So not every fact-lookup, known item search is advertisable. But overall, “fact” queries are more advertisable than “informational” queries, so the general philosophy of only building a search system that supports more the former rather than the latter is a philosophy that it appears Google might have consciously adopted.)

    Anyone want to engage on this topic? Point out something that I missed? Call me on my b.s.?
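
The “find one” vs. “find all” distinction in the comment above corresponds to the standard information retrieval measures of precision and recall. The following is only a minimal sketch of that arithmetic; the document IDs and the function name are made up for illustration and do not come from the discussion.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against a set of relevant items."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: what fraction of the returned items are relevant?
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what fraction of all relevant items were returned?
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: ten documents are relevant, the engine returns one good hit.
relevant_docs = {f"doc{i}" for i in range(10)}
top_result = ["doc3"]
p, r = precision_recall(top_result, relevant_docs)
print(p, r)
```

A single correct hit is a perfect outcome for a “find one” need (precision is as high as it can be), yet it covers only a tenth of what a “find all” need requires.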

  2. I can’t give a full response now, JG, but I recall thinking earlier (and again now) that what you refer to as an informational query might require something similar to “pattern recognition” (I just recently heard an interview with Ray Kurzweil speaking about such issues with NYTimes).

    One thing about pattern recognition is that it requires some standard, “correct” pattern (in order to know whether a system has performed successfully or not). So, for example: if a machine were trying to understand “spoken English” and it came up with “uhtimsly”, then we would say that it failed (but what if “uhtimsly” was actually spoken and was actually “correct”, being a “brand name”?).

    Likewise, how are we to know that if a machine guesses X was a “cause” that it would be right? or wrong? Is causality even acknowledged as a construct at all?

    Granted: This is just a knee-jerk kind of response, and to do your argument the fairness it deserves, I think a lot more sensitivity to the critical issues you note is warranted.

    BTW: I especially like your depiction of how Google organic results are very “page” oriented (rather than synthesizing information gleaned from various sources). Intriguing concept!

    🙂 nmw

  3. In the second kind of search, multiple sources, often as many sources as possible, are necessary to satisfy your information need. A single answer, or single source alone actually cannot satisfy an information need.

  4. 1. What are all the papers that you should cite? (re: new eye-tracking computer input device)

    2. all stories that are instances of the social and cultural milieu that led up to the Velvet Revolution.

    3. Suppose you want to find not only all those schools with non-traditional mascots, but you also want to discover what it is about the personality or character of these schools that steered them down non-traditional paths.

    My answers:

    1. check all eye-tracking domain names (presently up to about 300, perhaps several thousands in the coming years); if there is an important article, it will appear on one (or more) of these domains (and indeed: Google’s chrome appears to be an attempt to get Google ads into the URL bar)

    2. what makes a story a story? (recall how we recently discussed this related to “emails” — form/genre is still one of the most important and underutilized aspects of information retrieval, but it is also extremely complex). But at any rate, if “Velvet Revolution” is a significant phenomenon, I expect the topic will have a domain name (and so the information related to that topic will be collected there)

    3. I’m sorry — this is just plain & simple going too far. Not everything that happens is documented, and far less can be explained in an objective manner. I think there was a philosopher named Bacon who was one of the first to write about “how to” document observations in a “scientific” manner — maybe try that?

    All in all, I feel you seem to be too focused on “full text” approaches. Not only does Google not do full text search, but it also weights towards link text (leading people to create links all over the place [e.g. wikis, blogs, etc.]) — slowly but surely they are beginning to realize how the “wisdom of the language” works ( http://gaggle.info/miscellaneous/articles/wisdom-of-the-language ). Even though they have declared that they wish to fight against organizing the world’s information with natural language (such as “credit cards” — apparently, the designers of Google wish brand names to show up for such a search, rather than information about credit cards [at least that’s what they said more than 2 years ago]), they will not be able to control / monopolize “natural language” the way they wish — if they keep trying this, then they will simply ultimately fail.

    Full-text search is actually an abysmally poor approach to information retrieval — it is brawn rather than brain. The field of information science, as we have noted before, has a rich history of much more effective approaches — and perhaps one of the most persistent of these are indexing & abstracting services which maintain an “authorized vocabulary” that is specific to the professional community with related expertise. So, to get back to your first example, if there is no such “eye-tracking” community, then perhaps there is a “usability” community, a “GUI” community or a “web design” community, perhaps a “interactive design” community or something like that(?) — and such communities will maintain their own jargon / authorized vocabulary for the purposes of “organizing information” relevant to the topic “eye-tracking” (and/or broader / narrower / related terms). Google continues to have difficulty with measuring relevance — and if the company wishes to restore credibility in its information retrieval product/service, it will ultimately have to recognize such communities of experts in order to avoid becoming some fruity “apples, oranges and bananas” joke.

  5. All in all, I feel you seem to be too focused on “full text” approaches. Not only does Google not do full text search, but it also weights towards link text (leading people to create links all over the place [e.g. wikis, blogs, etc.]) [snip] Full-text search is actually an abysmally poor approach to information retrieval — it is brawn rather than brain. The field of information science, as we have noted before, has a rich history of much more effective approaches — and perhaps one of the most persistent of these are indexing & abstracting services which maintain an “authorized vocabulary” that is specific to the professional community with related expertise.

    nmw, thank you for your reaction and insights. One reaction, myself: Whether or not I am focused on full text approaches vs. link-based vs. expert categories.. to me that is a separate issue. Initially I am not so interested in “how” you solve these things. I am more interested in “what” it is that you are trying to solve. What are the actual user information needs?

    And all I am trying to say is that there is more than one type of user information need. There is more than just “single factoid”, navigational user needs. And yet Google really only does that “factoid” type of search. And I’m trying to figure out why. Is it because they just can’t imagine any other type of user information need, and so are not even aware that different types of systems need to be built? Is it because they don’t know how to do it (it is complex, as you say), and don’t want to call attention to their inability, after 10 years, to make any progress? Or is it because these other types of search don’t lend themselves well to advertising? Or is there yet another reason?

    I don’t have a lot of time now, myself, so I’ll respond to your other points in a bit.

  6. Yes, you’re right — I shouldn’t have assumed that (and I did say “seem”). However: I think that your method of disregarding navigation as a crucial (and often iterated) step in the heuristic information retrieval process is amiss. And Google doesn’t want to acknowledge this either (because then people might wake up and discover that whereas “weather” and “news” mean something to many people, “Google” means nothing to anyone [besides “to use Google”; remember: the number is spelled differently ;]).

    So once people realize how the “wisdom of the language” works ( http://gaggle.info/miscellaneous/articles/wisdom-of-the-language ), Google will simply be off in the left field of meaningless brand names.

    My hunch about factoids is: They follow the “wisdom of the crowds”. In other words: these results are quasi “spammed” into Google results by the masses who repeat the same stupid factoids over and over ad infinitum — and because there is such universal agreement on “who is buried in Grant’s tomb”, all of the links match up, so these results “bubble up” like 2+2=4.

    That’s why I think Google is a really cool website — if you want to reach people who don’t know such factoids (and are searching for them — and also the most popular music videos, the latest “hit” movies, free stuff, etc.). Google works for such “wisdom of the mobs” phenomena, but then again communities such as digg.com probably have a more up-to-date take on the pulse (since by the time Google can actually detect something trendy it’s probably already “over” ;).

  7. I think that your method of disregarding navigation as a crucial (and often iterated) step in the heuristic information retrieval process is amiss.

    I am not disregarding navigation; it is (I agree) often crucial. Just because I advocate additional methods does not mean I want to remove previous methods. A good information retrieval system should ideally have a number of different modes, a number of different arrows in the quiver, and should be able to select the right mode at the right time for the right user.

    What I am objecting to is the fact that this navigational approach is the *only* model for information seeking behaviour that Google implements.

    In fact, this “navigate and iterate” approach, as I am sure you are aware, has a formal name in the information science literature. It is the “berrypicking” model of information seeking, first put forth by Marcia Bates in 1989.

    So again, what I object to is not that Google *does* follow the berrypicking model. It is that Google *only* follows the berrypicking model, to complete and utter disregard of every other type or model of information seeking behaviour.

    To add injury to insult, almost every time I hear a top Googler talk about their system, they like to brag about how they don’t “force” the user into any types of behaviors, how they remain as neutral as possible. And I think to myself.. well.. yes.. that may be true.. if you are a berrypicker! If, however, you are a user that has a different method or style of information seeking, then Google utterly forces you into their berrypicking behavioural design.

  8. My hunch about factoids is: They follow the “wisdom of the crowds”. In other words: these results are quasi “spammed” into Google results by the masses who repeat the same stupid factoids over and over ad infinitum — and because there is such universal agreement on “who is buried in Grant’s tomb”, all of the links match up, so these results “bubble up” like 2+2=4.

    I agree with this hunch. So Google is great for factoids; I think I already said as much.

    It’s that anything that isn’t a factoid completely breaks down. All those stories and personal essays, from the trenches, that led up to the Velvet Revolution, are not “factoid” in nature. So when you try to find them Google.. well.. pardon the pun.. bombs.

    And I’ll respond more to your other, more detailed points, in a bit. You raise some other good issues that still need addressing — sorry I don’t have more time right now.

  9. JG: do you have examples of other information seeking patterns than berrypicking? Are you talking about the amazon recommendation style browsing, or wikipedia article interlinking/categories, or perhaps is there a pointer to a meta discussion of these different search/browsing patterns?

  10. do you have examples of other information seeking patterns than berrypicking?

    Sorry, I don’t have a lot of time right now to pound out my usual 15 paragraphs.. maybe in a few days. 🙂 But in the meantime, take a look at Gary Marchionini’s 1997 book, “Information Seeking in Electronic Environments”. See here for a pre-print version with no figures. Gary covers a lot, though by far not all, of these issues. Chapters 1, 3 and 9 may be good to scan, with Chapter 3 probably the best. The first paragraph of Chapter 3, in fact, already goes well beyond any kind of information need that Marissa Mayer seems to be able to imagine.

    Nick Belkin also has some interesting thoughts around these issues, dating back almost 3 decades now…but just as relevant as ever. Do a (navigational, known item) search for his “Anomalous States of Knowledge as a Basis for Information Retrieval” paper, published in 1980, I think. It’s basically a model for “unknown item” search, rather than “known item” search.

    Lots more available. Unfortunately, how are you going to search to find them? Without me telling you, would you ever have known about a really good essay in a recent book, edited by Amanda Spink, on this topic? Now that you know that this paper exists, you may be able to find it — even though I haven’t told you the paper title or author.. only that it was in a “recent” book and who the editor was. But if I’d not said anything, how would you have otherwise known to look for it? Unknown item search.

    So how well does Google support you, in your new information need or task of discovering all the various possible models of information seeking that researchers have written about over the past few decades? Not very well, eh?

    Deliciously ironic, don’t you find? 😉

  11. Hope JG will be “right back” (and also hope John gets caught up on some of those promised posts he’s been meaning to write 😉

    Now, here’s a word from Dan Savage:

    ;D nmw

  12. nmw, as promised, here are my other reactions:

    1. check all eye-tracking domain names (presently up to about 300, perhaps several thousands in the coming years); if there is an important article, it will appear on one (or more) of these domains (and indeed: Google’s chrome appears to be an attempt to get Google ads into the URL bar)

    Ok, so there are 300 eye-tracking domains. But my example wasn’t just an eye tracker. It was an input device / input actuator. The device wasn’t just for passive monitoring. It was to allow the user to actively input information to the computer.. via the eyes.

    So I probably also need to cite not just eye tracking papers, but also cite HCI papers, input device papers, etc. So there are another 300-600 domains, right? And now that we’re up to almost 1000 domains, I am still going to need help searching through those 1000 domains. I still need a system that will help me “organize the world’s information” in a manner beyond just giving me the top 10 blue links. I need a way of quickly sorting through, summarizing, understanding, clustering, and searching, within those 1000 domains. So Google still is not helping me, very much at all, with that task.

    2. what makes a story a story? (recall how we recently discussed this related to “emails” — form/genre is still one of the most important and underutilized aspects of information retrieval, but it is also extremely complex). But at any rate, if “Velvet Revolution” is a significant phenomenon, I expect the topic will have a domain name (and so the information related to that topic will be collected there)

    Whatever makes a story a story, it is still incredibly difficult for me to find those stories. Yes, there may be some collected stories on a web page somewhere, all dedicated to the velvet revolution. But there might also be accounts in someone’s bio, on their personal web page.

    The velvet revolution was a significant phenomenon, but I’ve checked. There is no http://www.velvetrevolution.com.

    3. I’m sorry — this is just plain & simple going too far. Not everything that happens is documented, and far less can be explained in an objective manner. I think there was a philosopher named Bacon who was one of the first to write about “how to” document observations in a “scientific” manner — maybe try that?

    Yes, not everything that happens is documented. That shouldn’t stop a search engine from better helping me find everything that has been documented. More importantly.. how do I know when to stop searching? How do I know when I’ve found everything, when I’ve reached the limit of the set of documented information?

    The search engines aren’t going to solve this problem any time soon. But my point is still that it doesn’t mean they shouldn’t try. And it feels, to me, like they’re not even trying. Like they’re more interested in calendars and chat clients than in making progress on these issues.

  13. 1. I need a way of quickly sorting through, summarizing, understanding, clustering, and searching, within those 1000 domains.

    Not really: Most of the “wisdom of the language” results are already focused, sorted, etc. When you search at homes.com you are not going to be searching through coffeemakers. Granted, if you search through 300 “homes” domains, you will be able to specify “4 bedrooms” and the limited scope of the search will probably result in highly relevant results — without searching through hundreds of millions of irrelevant domains.

    2. Whatever makes a story a story,

    this is not a minor issue — you have to know what you’re looking for in order to find it

    3. Yes, not everything that happens is documented. That shouldn’t stop a search engine from better helping me find everything that has been documented. More importantly.. how do I know when to stop searching?

    This is known as “the economics of search” (a significant & central focus of the “economics of information”) — and Hal Varian has done a lot of research in this area (some of which I studied about 20 years ago). Very basically: if it costs you more effort to find some information than that information is worth, economists consider that to be the point at which you would stop looking (like much of “pure” economics, this is very academic ;).

  14. 1. I need a way of quickly sorting through, summarizing, understanding, clustering, and searching, within those 1000 domains.
    ..
    Not really: Most of the “wisdom of the language” results are already focused, sorted, etc.

    With all respect, I think you’re still somewhat missing the point. I have got an information need which spans multiple topic areas. I know it spans multiple areas, because it is novel research. The work, the idea, didn’t exist until I created it. So nowhere is this stuff already curated. My information need spans multiple curations. That is very, very different from the “house” situation.

    2. Whatever makes a story a story,
    ..
    this is not a minor issue — you have to know what you’re looking for in order to find it

    Yes, I know what I am looking for. I am looking for stories about the cultural and social milieu that led up to the Velvet Revolution. I just don’t know what *words* will be found in those stories. But I know what the stories will be *about*. So I know exactly what I am looking for, i.e. I will perfectly recognize the relevance of the information when I see it. I just don’t know the keywords to use, to find it.

    This is the classic information retrieval problem. This is what Belkin is talking about, with his ASK model. And there have been many proposed solutions over the years. Some work better than others. But my point remains: Despite all the work in the area, what is Google doing to help solve this? With their 14,000 people working on search and advertising. How many of those people are actually trying to fulfill Google’s mission, by working on this, and similar, problems?

    3. Yes, not everything that happens is documented. That shouldn’t stop a search engine from better helping me find everything that has been documented. More importantly.. how do I know when to stop searching?
    ..
    This is known as “the economics of search” (a significant & central focus of the “economics of information”) — and Hal Varian has done a lot of research in this area (some of which I studied about 20 years ago). Very basically: if it costs you more effort to find some information than that information is worth, economists consider that to be the point at which you would stop looking (like much of “pure” economics, this is very academic ;).

    Ok, maybe I’m not quite asking the question in the right way. Let me try again. If it costs you more effort to find some information than the information is worth, why isn’t Google doing more to *lower* the cost of finding that information? Why do they make it so darn difficult to sort through the 1.2 million results that come back as hits, as a result to one of my queries?

    Because there are two ways to change the slope of the cost/value function. One is to raise the worth of the information you are seeking. The other is to lower the cost of finding the information.

    Google has no control over the value of the information. That is what the user controls. Google controls the cost of finding the information. That’s its reason for existence. Organizing the world’s information. That’s its job. That’s its philosophy.

    So again, just because everything isn’t documented doesn’t mean Google shouldn’t help me better or more easily find the information that does exist. Google should be lowering the costs of finding the information, not stalling for 10 years.
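
The stopping rule discussed in the two comments above (keep looking only while the next piece of information is worth more than the effort of finding it) can be sketched in a few lines. This is a toy model under stated assumptions, not anything the commenters or Hal Varian wrote: the function name, the declining list of values, and the fixed per-result cost are all hypothetical.

```python
def results_worth_examining(marginal_values, cost_per_result):
    """Count how many results a searcher examines before the stopping rule fires.

    The searcher keeps going while the next result's value is at least the
    cost of examining it, and stops at the first result that is not worth it.
    """
    examined = 0
    for value in marginal_values:
        if value < cost_per_result:
            break
        examined += 1
    return examined


# Hypothetical, made-up numbers: each further result is worth a bit less.
values = [10, 8, 6, 4, 2, 1]
high_cost = results_worth_examining(values, cost_per_result=5)
low_cost = results_worth_examining(values, cost_per_result=1.5)
print(high_cost, low_cost)
```

With the value of the information held fixed, lowering the cost per result is the only lever left, and in this toy model it moves the stopping point from three results to five.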

  15. Yes, I think we are getting close to tying up the “loose ends” — I have a hunch that we basically agree, but that our expectations of what a “solution” might look like differ.

    I feel that it is illusory to think that a computer will be able to “solve” such complex issues — so complex that they can hardly be described with mere “words”. Algorithmic information retrieval is a text-based task, and will remain so for a long time to come, regardless of how many calculations per second are possible (such “insight” is not simply a matter of computational speed / capacity). Indeed: all of our “knowledge” (apart from tautologies such as “who is buried in Grant’s tomb?” or “what is 2+2?”) is merely statistical guesswork (I think Einstein’s analogy of the “Elephant in the porcelain store” applies quite well to such limits of “knowledge”).

    What will improve the guesstimate is the involvement of experts. If I have a toothache, I would do better to ask 1, 2 or 3 dentists than 100 car mechanics or electricians. Likewise, if I have car problems I would be ill-advised to ask a dentist simply because he/she has a university degree.

    However: Google applies the same algorithm to all webpages (and this is the caveman-with-a-club “one-size fits-all” approach).

    The “Wisdom of the Language” approach is not simply algorithmic: as the wisdom of the language approach develops and spreads across the Internet, each topic / domain will attract a community of experts related to that topic / domain, and then your questions will not be addressed to a mathematical formula, but rather to an information system which is maintained by a team of experts (and/or enthusiasts in that field).

    We do not need to program computers to understand “natural language” (which they will never do anyways — they will always be a generation or two behind the language that living humans are speaking today [I guess you can tell that I am not a “believer” in AI? ;]). I believe too much energy is wasted on dreamy visions when it would be far more effective to create solutions with “down to earth” reality — and that means human involvement.

    Perhaps the prototypical stepping stone for getting to web 4.6 is what I would refer to as the leading web 3.0 company: Digg.COM. Digg.COM was not the first to involve community involvement in ferreting out “relevant information” (and Digg.COM is also not relevant in every situation — IMHO it’s relevant for / focused on something like “tech news” [and I feel their aspiration to become more “one-size fits-all” is in fact misguided — so I feel they should not go that route, because that would only lead to “mixed up confusion”]). Strangely, “digg” seems to be quite the “well chosen” brand name, in that although it is not a word, it sure as hell sounds like a word that describes “what it’s about” — it could even be considered a “typo” of the “real” word (if there were such things as “real” and “fake” terms). Another good example is Twitter.COM — both of these companies stick very close to a meaning, and both are actually primarily focused on the community rather than any topic. And when these 2 aspects (community + topic) are wedded, then we will be getting pretty damn close to what I consider to be the wisdom of the language.

    Perhaps one of the best examples of this (that I have so far experienced) is Download.COM.

    But I am quite certain that many many more will follow — I think they will start with very general terms (such as “news”, “weather” — or, as in the case of your quest for information, something like “history” [for example, that history.info redirects to hopescience.com will simply not stand the test of time — market forces will move that site towards an equilibrium that has something to do with the idea “history” … and ultimately, it will be a history “search engine” — and perhaps it will include community involvement, allowing you to ask: “what does anyone know about the velvet revolution?” ;]).

  16. nmw: yes, I think we agree on many things, and I think we still disagree on some things. And also yes.. maybe it’s that we agree on the problem, but disagree on solutions. For example, you say:

    I feel that it is illusory to think that a computer will be able to “solve” such complex issues — so complex that they can hardly be described with mere “words”.

    I agree with this. But notice that I didn’t say that a *computer* had to solve these issues. I said that *Google* had to solve these issues. Or, rather, that Google had to lower the cost for the searcher to solve these issues, using Google as an “intelligent” assistant.

    A lot of folks have been kicking around the idea of “Search as a Dialogue” for a while. I am not saying that is the ultimate answer, but it is an example of something where the human, the *single* human (rather than the crowd) can get directly involved in instructing and reinstructing the machine about their information need, so as to be able to find some of this more difficult information. The machine does not have to solve everything. But the machine has to be able to involve more human intelligence, to the point where the combined human-machine system gets the task done.

    An early example of such an interactive solution is relevance feedback. It has been around for decades. And yet it is conspicuously absent from Google. Presumably because most users don’t care about it, but so what? Some (probably more than 5%, even) do, and therefore there should be a search-box-command-line option to turn it on, when I want it. There isn’t.
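
    For anyone who has not run into relevance feedback before, here is a minimal sketch of one classic formulation, the Rocchio update, in Python. The searcher marks a few results as relevant or not, and the query is reweighted toward the terms in the good results and away from the terms in the bad ones. The function name, the toy documents and the dict-of-term-weights representation are all illustrative assumptions; the weights are commonly cited textbook defaults, not anything a particular search engine is known to use.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight a query from user feedback (Rocchio update).

    query, and each document in relevant / non_relevant, is a dict
    mapping a term to its weight. Returns a new term -> weight dict.
    """
    updated = defaultdict(float)

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight

    # Move toward the average of the documents the searcher liked.
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)

    # Move away from the average of the documents the searcher rejected.
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)

    # Terms pushed to zero or below are dropped from the new query.
    return {term: w for term, w in updated.items() if w > 0}


# Toy example: the searcher typed "jaguar" and meant the animal.
query = {"jaguar": 1.0}
liked = [{"jaguar": 1.0, "cat": 1.0}]
disliked = [{"jaguar": 1.0, "car": 1.0}]
new_query = rocchio(query, liked, disliked)
```

    In the toy example the reweighted query picks up “cat” and never admits “car”, which is the kind of one-click guidance from the individual searcher that the paragraph above is asking for.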

    What I feel right now is that Google has no interest in making the task easier to solve, by allowing the user to provide any sort of guidance, direction or interaction in the search process. Except for maybe a little bit of spelling correction. To me, that is an appallingly severe underutilization of human expertise: The searcher’s own human expertise.

    The deal here is that the human might not always know the right questions to ask, but the human recognizes ’em, when they see ’em. Google doesn’t always know what answer to give, but can detect patterns in massive volumes of data, so as to surface information of which the human might not have been aware.. allowing human intelligence to then step in and ask the right question.. or give the right feedback. The computer is capable of detecting these patterns.

    So I think we both agree that the machine shouldn’t solve everything. But your solution is that the wisdom of crowds + wisdom of language is the answer, whereas my solution is that the wisdom of the individual + massive dataset pattern recognition + interactive feedback loops between the machine and the human is the answer.

    Either way, I agree with muhasebe, above: Google controls the cost of finding the information. That’s its reason for existence.

    I’ve long hypothesized that the reason why Google doesn’t provide more interaction and feedback possibilities is that, instead of wanting the users to interact directly with the results, they would rather users just give up looking, get frustrated, and turn to the advertisements (cha-ching) column for their answers.

    If you gave the users better ways of finding the information themselves, there would be fewer ad clicks. So by raising, or at least keeping relatively high, the cost of finding information (as muhasebe says), Google makes money.

    Oh, sure, they change their algorithms here and there. They make 450 little tweaks over the course of a year.. adding +0.02 to some mixture weight, decreasing the variance on some parameterized Gaussian.. adding a little bit more whitespace to the first result so that it pops out at you more. And a few SERPs get better here and there. Whatever. They lower the user cost by tiny, tiny little percentages, essentially leaving it unchanged. But they never fundamentally change the way the user is able to interact with the information, thereby lowering the cost by leaps.

  17. But notice that I didn’t say that a *computer* had to solve these issues. I said that *Google* had to solve these issues.

    I don’t feel that Google *has* to solve these issues — indeed: I feel the company is incapable of doing so, because I feel they lack the necessary expertise.

  18. I don’t feel that Google *has* to solve these issues — indeed: I feel the company is incapable of doing so, because I feel they lack the necessary expertise.

    But *how* could they possibly lack the expertise?! They have publicly stated that 70% of their effort goes toward search and advertising. Assuming a half/half search/ad split, that still leaves 35% of the company’s effort going toward search. And with 20,000 employees, that means that they have 7,000 people working on search. How could they possibly not have the expertise, in those 7,000 people? Are Google’s hiring policies really that poor?

    Seriously. What gives?

  19. Google returns domain names — and domain names are where the reputation and credibility are “fastened”.

    Take another listen to Mr. Battelle’s introductory remarks for the NYC CM summit conference — “it’s not one-size fits all, it’s one-size fits one campaign” (Google represents one-size fits-all, domains represent the topical expertise required to answer complex/detailed questions [i.e., “one campaign”]).

    Many people have not recognized this yet, but apparently Google itself has — which is why they’re trying to find a new product that will keep the company going once people realize that the ppc business is bogus.

    One thing they appear to be trying to do is to lock in people via the Chrome browser, in order to continue to be able to sell advertising before people are able to navigate to the websites where they can get meaningful information.

    Similarly, they are also buying locations that seem to be hot brands — but I think they are barking up the wrong tree there (I think information retrieval is always keyword focused — apparently the Google management still thinks that Google is a cool brand [or something like that], even if they don’t really do information retrieval [indeed: they are now much more of an ad agency than anything else]).
