More on MSFT and Fast | John Battelle's Search Blog

More on MSFT and Fast

By John Battelle - January 10, 2008


Gary has a timeline of FAST deals, showing the company's reach into enterprise search. Recall my earlier posts on how I see this as a very interesting area powering new search UIs for consumers (against structured databases like the NYT articles, for example).



7 thoughts on “More on MSFT and Fast”

  1. Jeff Oberlander says:

    Add Getty Images to that list. Getty launched its new site with FAST as the search engine in '07.

  2. JG says:

    You know what I find extremely interesting? It is the fact that if you are a Google AdWords user (you buy ads), or a Google Checkout user (you use Checkout to sell things), Google offers you a large, fancy, robust set of complicated interfaces to analyse, process, cross-correlate, plot, and otherwise tear apart all the information that is relevant to you within those domains.

    Google’s mission is to organize the world’s information. And when it comes to certain, select Google properties, such as AdWords and Checkout, it does a very nice job of letting you slice and dice and refactor and otherwise get a grip or understanding or organization of that data.

    But when it comes to Search itself: Nothing. When I do a search and get results 1 to 10 of about 1.4 million, there is nothing Google offers for me to slice and dice and refactor and otherwise get a grip or understanding or organization of those 1.4 million pages.

    Google doesn’t let me cluster those 1.4 million pages across semantic topics. Google doesn’t let me easily see all the different domains that are being retrieved by my query, perhaps grouped into clique or near-clique bundles. Google doesn’t let me see the most frequent keywords present within those 1.4 million hits, so that I can understand what types of pages I am retrieving. Google doesn’t do “named entity recognition” to let me visualize persons, places, or things that are common across the 1.4 million pages my query retrieved.
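    One of these affordances, surfacing the most frequent keywords across a result set, can be sketched in a few lines. This is a toy illustration, not anything Google or FAST actually ships; the snippets, stopword list, and function name are all invented:

```python
from collections import Counter
import re

# Toy stand-ins for snippets from a large retrieved result set.
snippets = [
    "FAST enterprise search powers structured archives",
    "Enterprise search vendors target structured data",
    "FAST search technology acquired for enterprise deals",
]

STOPWORDS = {"for", "the", "a", "of"}

def frequent_keywords(texts, top_n=3):
    """Return the most frequent non-stopword terms across result snippets."""
    counts = Counter()
    for text in texts:
        counts.update(
            token for token in re.findall(r"[a-z]+", text.lower())
            if token not in STOPWORDS
        )
    return counts.most_common(top_n)

print(frequent_keywords(snippets))
```

    A real interface would run this over the full retrieved set and feed the output into clustering or entity extraction, but the shape of the computation is the same.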

    I’ve always liked FAST because their technology does let me do many of these types of things. It makes search interesting. It makes search fun. It makes me feel like I actually have control over my search process, instead of feeling at the mercy of whatever 1-3 SERPs happen to turn up above the fold.

    It will be interesting to see whether Microsoft can... or even wants to... bring this technology to the masses, to the web.

    I clicked that “earlier posts” link above, John, and was startled to see one of my earlier comments on that post. Almost two years ago to the day now. And I was complaining about the same thing that I am now: lack of interesting interfaces and information to aid and assist the search process. Goodness... I am just a broken record, eh? :-)

  3. John Battelle says:

    I agree, JG, it’s a shame we can’t do more with our searchstream. I wish some engineer would figure it out in that 20% time!

  4. F.D. Athow says:

    JG: Try looking into what Google has planned in Google Experimental search (google dot com slash experimental); I am sure you will be pleased with what they have in store.

    The sad thing, though, is that people have grown accustomed to Google’s way of doing things and are reluctant to change. A bit like users having to click the Windows Start button in order to shut down their PC.

    Alternative websites like Alltheweb, Northernlight (see waybackmachine for more) or even Ask did present viable search methods, but none managed to gather critical mass.

  5. search.firm.in says:

    I also wonder if Google images and Google video could learn something from Fast (not just Google news).

    I have been very impressed when I have recently chatted with the designers and leaders of Fast — I find the technology to be superb, but what is more: The people at the helm appear to have a deep understanding of how information retrieval works.

    And BTW, yes, they also power a lot of search — far more than I guess a lot of people might realize.

    :) nmw

  6. JG says:

    “I agree, JG, it’s a shame we can’t do more with our searchstream. I wish some engineer would figure it out in that 20% time!”

    John: This needs to be more than 20% time. This needs to be a full-time position. Nay, it needs to be a full-time team. Google has said that search is only 5% solved. If universal search gets us to 7%, we’re not going to get to 9%, much less to 15%, through some random engineer’s 20% time. It takes a full team of dedicated employees to make some of this stuff work. Not only the semantics of this stuff (the algorithmic intelligence behind doing good named entity recognition, for example), but how you scale it to the web.

    How many employees does Google have these days? 16,000 or something like that? And Google follows the 70/20/10 rule, with 70% of its effort on search and ads? So let’s say that half of that 70% is search and half of it is ads.. that means 35% on search. 35% of 16,000 is 5,600 people. You would think out of 5,600 people they could spare 10 or 20 or even 50 people to go full time on these issues.

    Suppose of those 5,600 search people, 500 are general purpose administrative, 500 are core (web) search algorithmics, 500 are (web) search spam fighters. And let’s estimate 250 people apiece on video, blog, image, book, scholar, desktop, enterprise, patent, and product search (for a total of 2,250 people). That’s a total of 3,750 people working on search, in some form or other. So what are the other 1,850 people doing all day?
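    For what it's worth, the back-of-the-envelope arithmetic above checks out. Every number below is an assumption from the comment itself, not actual Google data:

```python
# All figures are the commenter's assumptions, not real headcounts.
total_employees = 16_000
search_staff = int(total_employees * 0.70 / 2)  # half of the 70% core effort

admin = 500
core_search = 500
spam_fighting = 500
# Nine verticals at 250 apiece: video, blog, image, book, scholar,
# desktop, enterprise, patent, product.
verticals = 9 * 250

accounted = admin + core_search + spam_fighting + verticals
unaccounted = search_staff - accounted

print(search_staff, accounted, unaccounted)  # 5600 3750 1850
```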

    I’m sure there is an engineer inside the ‘plex right now, reading this and chuckling at my analysis, rolling her or his eyes at how far off my numbers are, at how skewed my understanding is. But that is my point. From the outside, I hear Google make claims about how many resources they dedicate to the search problem, about how they are not losing focus on that core. And I make reasonable inferences, based on claims like 70/20/10, about how much effort they are putting into the problem. And then I look at the actual products produced, per unit of effort, and it doesn’t quite match up.

    F.D. writes: “Try to look into what Google is planning ahead in Google Experimental search (google dot com slash experimental), I am sure you will be pleased with what they have in store.”

    Well, a few years ago some friends and I spoke with some fairly senior Google management about their search development process. And what we were told, in unequivocal terms, is that any new idea for improving search had a required 5 month lifecycle. You had to go from conception to introducing the idea, live on the main Google search page, in 5 months. I do think that the timeline search on the Google Experimental page is great; it’s a strong positive step in the right direction. But I think I first heard about that over half a year ago. When is it going to go live on the main Google page? When is it going to go from alpha to beta? When is Google going to start sacrificing advertising space on the main SERPs page, in order to show timeline-refactored search results?

    And after that, what else is in the experimental queue? Keyboard navigation of search results? Are you kidding me? Not that keyboard navigation is a bad thing. I am an avid Pine user for my email, and love keyboard navigation. But compared to all the interesting ways that Google Analytics lets you slice and dice and refactor and cross-correlate your advertising data, the next big “organize the world’s information” search improvement is…keyboard navigation? Come on!

    So again, it just makes me wonder what those other 1,850 full-time search-dedicated employees are doing, in their 80% time.

    I hope I do not come off as sounding too critical of Google in all this. I am more than willing to write my concerns off as a lack of understanding on my part, rather than any fault of Google’s. But after almost 10 years of Google search, and so very few ways of slicing your search results data (esp when compared to Google Analytics), I am left scratching my head, wanting someone to explain it to me.

    Oh, and nmw: You also make a good point about applying some of this FAST-style search to image and video search. I agree; that sort of thing would be extremely useful.

  7. semantic.info says:

    JG,

    I’m not even sure that “one-size-fits-all” search is the way to go. I mean: Big might be good for Wal-Mart or Aldi, but when we’re talking about information, then it is not clear that “one-size-fits-all” (aka “universal”) search is a useful approach.

    This is an issue that Google sticks its head in the sand over — they have not “figured out” a simple algorithm that applies across the board. For example: stock-market prices are quite time sensitive, but 2+2=4 is not time sensitive. And when I want to know when a train or plane is leaving or arriving, Google’s cache will not help all that much, either. Simply stated: it may very well be that Google’s corporate management sees no reason to create a search engine that produces anything more than “passable” results.

    To realize how little attention is paid to producing excellent results, consider this: many of the links on the web can be fully discounted as no longer all that meaningful, corrupt or whatever — my gut feeling is that well over half of all links could easily be thrown out as completely meaningless. So a plain and simple link-based algorithm is probably more “wrong” than it is “right”.

    AFAIK, Google’s approach to weighting results is also quite strange — there is a rumor that domains which have been registered for a longer time are treated as more authoritative than more recently registered domains. Well, I also have a 19th century dictionary that clearly defines “spontaneous combustion” (LOL)….

    If they were a little more ingenious, they would have observed by now that economy.com may very well have more authoritative information about the economy than movies.com — and that movies.com might have more authoritative information about movies than economy.com (and perhaps people would do well to note how whitehouse.com is also no longer as dull as it used to be ;). Seems to me that it would take no more than half a dozen mediocre engineers no more than a week to develop a piece of code that treats the domain name as a primary indicator of authority (something like that might even save Google from demise ;). Otherwise, some hacker might do it in a weekend or two.
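    The proposed heuristic is simple enough to sketch. This is a toy scoring rule, purely illustrative of the idea; the URLs, scores, and boost factor are made up, and no real engine is this naive:

```python
from urllib.parse import urlparse

def domain_boost(url, query_terms, base_score, boost=2.0):
    """Boost a result's score when a query term appears in its hostname."""
    host = urlparse(url).hostname or ""
    if any(term in host for term in query_terms):
        return base_score * boost
    return base_score

print(domain_boost("http://economy.com/report", ["economy"], 1.0))  # 2.0
print(domain_boost("http://movies.com/report", ["economy"], 1.0))   # 1.0
```

    A production ranker would of course blend a signal like this with many others rather than treat the domain name as decisive, which is roughly why the "weekend hack" framing undersells the problem.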

    I guess it’s really only a matter of time….

    :) nmw