From Static to Realtime Search | John Battelle's Search Blog

From Static to Realtime Search

By John Battelle - December 04, 2008

My post on the subject, while arguably arguable (yes, I know, I know, but it’s better to just say it than let it stick in your craw) is up on the Looksmart Thought Leadership site (part of an FM program I am participating in). From it (this is just a portion):

I think Search is about to undergo an important evolution. It remains to be seen if this is punctuated equilibrium or a slow, constant process (it sort of feels like both), but the end result strikes me as extremely important: Very soon, we will be able to ask Search a very basic and extraordinarily important question that I can best summarize as this: What are people saying about (my query) right now?

When it first hit critical mass, it seemed Google answered this question. For the first time, you could ask a question in your native tongue, and get an answer. It felt immediate, but save for the speed with which the search results were rendered, it was not. Instead, it was archival – Google was the ultimate interface for stuff that had already been said – a while ago. When you queried Google, you got the popular wisdom – but only after it was uttered, edited into HTML format, published on the web, and then crawled and stored by Google’s technology. True, that has sped up – Google indexes a lot of sites more than once a day now – but as it nears the event horizon, this approach to search won’t scale.

In short, Google represents a remarkable achievement: the ability to query the static web. But it remains to be seen if it can shift into a new phase: querying the realtime web.

It’s inarguable that the web is shifting into a new time axis. Blogging was the first real indication of this, but blogging, while much faster than the traditional HTML-driven web, is, in the end, still the HTML-driven web. To its credit, Technorati saw blogging as the vanguard of a shift to real time, and tried to become the first search engine for “the live web”. It failed to gain critical mass, but I think the main reason was that the web was not yet “alive”.

That is changing, rapidly. Yes, I’m thinking about Twitter, of course, which is quickly gaining critical mass as a conversation hub answering the question “what are you doing?” But I’m also thinking about ambient data more broadly, in particular as described by John Markoff’s article (posted here). All of us are creating fountains of ambient data, from our phones, our web surfing, our offline purchasing, our interactions with tollbooths, you name it. Combine that ambient data (the imprint we leave on the digital world from our actions) with declarative data (what we proactively say we are doing right now) and you’ve got a major, delicious, wonderful, massive search problem, er, opportunity.

And with that search challenge comes an equally exciting monetization opportunity.

21 thoughts on “From Static to Realtime Search”

  1. Blog Expert says:

    I definitely think that real-time search would be awesome. It would be easy to use and definitely less annoying. Static search is really annoying as you always have to wait for the page to load. So I would really look forward to real-time search.

  2. Rupert Goodwins says:

    It’s interesting to think how journalism is evolving in terms of the real-time versus the static web (although I’m not sure exactly how to define either to exclude the other). How much journalism these days is spotting patterns form in the real-time web? How much is mining the static web? (There is another form of journalism, which involves spending time in the real world, but it may be falling out of fashion.)

    Journalism was the original search engine, albeit with a rather baroque query interface. It tends to adopt the most efficient use of people and technology to produce good data, being a notoriously Darwinist entity, and it’s quite good at adapting quickly – hasn’t taken long for blogs to make their mark. So I think it’s a good thing to track if you want to sniff out utility on the Web – after all, journalism is the first draft of history.

    I’m not sure that there’s a huge great wobbly lump of wondermoney sitting at the end of the real-time web search rainbow. And if there is, I wonder if it’s much bigger than the one sitting a day further down the line, where the massive outpouring of us auto-digitising hominids has been filtered by the mechanisms we have, more or less, in place now.

    Google’s big problem isn’t that it can’t be Google a day earlier, it’s that it can’t be cleverer about imparting meaning to what it filters. For now, and until AI gets a lot better, the new worth of the Web is how we humans organise, rank and connect it. The good stuff takes time and thought, and so far nobody’s built an XML-compliant thought accelerator.

  3. Tom Nocera says:

    Great post, John and an even greater post, Rupert. It is more daunting than usual to weigh in on the topic of what I would term “accelerated feedback”.

    When I read what John wrote:
    “Combine that ambient data (the imprint we leave on the digital world from our actions) with declarative data (what we proactively say we are doing right now) and you’ve got a major, delicious, wonderful, massive search problem, er, opportunity” my first thought was (or, perhaps in keeping with the subject, I should say “I’m thinking”) there will also be a goodly sum of what Rupert calls “wondermoney” racing at lightspeed toward the bank account of the company that will best provide the means to protect the privacy of hundreds of millions of people who have absolutely no need nor any desire to see the dots of their every action and comment connected and delivered to “the matrix”.

  4. This is definitely the next big thing in search. Your articulation of it is perfect. I say this because I experienced this same thing over the last several weeks when I created a new Twitter account for our new products and wanted to track what people are saying about the segment we are targeting. A quick Twitter search was the answer and a few replies later I had some conversations going and new followers as well. The realtime web will far outweigh the benefits of the archived web, at least for certain types of information. I call it the Future Search :-)

  5. nmw says:

    I responded @ the looksmart.com site, but my comment hasn’t appeared yet. :/ (I will wait a little longer before I decide whether I should post it elsewhere).

    I also want to mention a new tool I’ve discovered and have implemented at http://Conversative.NET — please come and join the very talkative discussion there!

    ;D nmw

  6. Bobinaj says:

    Journalism was the original search engine, albeit with a rather baroque query interface. It tends to adopt the most efficient use of people and technology to produce good data, being a notoriously Darwinist entity, and it’s quite good at adapting quickly – hasn’t taken long for blogs to make their mark. So I think it’s a good thing to track if you want to sniff out utility on the Web – after all, journalism is the first draft of history.

  7. JG says:

    hey, John and nmw: This does indeed sound exactly like an idea that has been around, I thought for 15 years, and nmw corrected me to 40+ years.

    Here is a description of the problem, from that link I sent earlier. Emphasis mine:

    http://trec.nist.gov/pubs/trec11/papers/OVER.FILTERING.pdf

    A text filtering system sifts through a stream of incoming information to find documents relevant to a set of user needs represented by profiles. Unlike the traditional search query, user profiles are persistent, and tend to reflect a long term information need. With user feedback, the system can learn a better profile, and improve its performance over time. The TREC filtering track tries to simulate on-line time-critical text filtering applications, where the value of a document decays rapidly with time. This means that potentially relevant documents must be presented immediately to the user. There is no time to accumulate and rank a set of documents. Evaluation is based only on the quality of the retrieved set. Filtering differs from search in that documents arrive sequentially over time.

    This overview paper was from 2002, but the TREC track itself goes back to the 90s. And as nmw points out with SDI, the idea goes back even further. In fact, now that I think of it, I remember talking with a friend at Radio Free Europe (anyone else remember that?) in Prague back in 1995, and he was describing a newswire system that they had, that did this online, real-time filtering.

    So maybe there is a shift from static to realtime search in the public, consumer web. But there have been systems (and research) around in other circles that have been doing this for a while.
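
The filtering setup quoted in this comment (a persistent profile, documents arriving one at a time, an immediate deliver-or-drop decision, and feedback that refines the profile) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Profile` class, the additive term-weight update, and the fixed threshold are hypothetical and are not taken from the TREC paper or from any system mentioned in the thread.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class Profile:
    """A persistent user profile: term weights plus a delivery threshold.

    Hypothetical structure, chosen only to illustrate the idea.
    """

    def __init__(self, terms, threshold=1.0):
        self.weights = Counter({term: 1.0 for term in terms})
        self.threshold = threshold

    def score(self, text):
        # Sum the profile weight of every token occurrence in the document.
        # Terms missing from the profile contribute 0.
        return sum(self.weights[tok] for tok in tokenize(text))

    def feedback(self, text, relevant, rate=0.5):
        # Assumed update rule: raise the weight of each distinct term in a
        # document the user marked relevant, lower it otherwise.
        delta = rate if relevant else -rate
        for tok in set(tokenize(text)):
            self.weights[tok] += delta


def filter_stream(profile, stream):
    """Yield each document as it arrives if it clears the threshold.

    The decision is made per document; nothing is buffered or ranked.
    """
    for doc in stream:
        if profile.score(doc) >= profile.threshold:
            yield doc


if __name__ == "__main__":
    profile = Profile(["twitter", "search"])
    stream = [
        "Twitter adds search box",
        "Weather is nice today",
        "Realtime search is coming",
    ]
    for doc in filter_stream(profile, stream):
        print("delivered:", doc)
    # The user marks a delivered document as relevant; the profile now also
    # gives some weight to terms such as "realtime".
    profile.feedback(stream[2], relevant=True)
```

The contrast with archival search is in `filter_stream`: a search engine would collect the documents first and rank them against a one-off query, whereas here the query (the profile) persists and each document is judged the moment it arrives.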

  8. nmw says:

    JG, the idea goes back much further. If you look at my URL for this and/or my last post here, you may note that the link refers to a machine Vannevar Bush (one of the first visionaries of such “automated” information storage & retrieval schemes) wrote about decades before Luhn wrote about SDI.

    But you could go back a couple millennia, too — for example: the ancient Greeks argued whether words were real or ideal, representations or hoaxes for “actual observation” (and such disputation persisted throughout the Middle Ages [Occam's Razor] to this very day [one of the most renowned philosophers of the 20th Century -- Ludwig Wittgenstein -- probably immensely influenced the AI community without their even being "aware" of it ;])

    The issue that “gizmos” such as SDI and/or AI in general cannot deal with is that the world keeps changing: change is the only constant.

    Everything is in flux — always!

    :) nmw

    ps: sure hope my post from yesterday over at looksmart.com will show up soon… ;)

  9. As it always has been, no? The ideas and technology for all search were around way before Alta Vista popularized them, and Google…

  10. JG says:

    Yes, John, true. I guess I just take issue with your terminology: “I think Search is about to undergo an important evolution.”

    I think evolution is the wrong word. Perhaps the right word is “rediscovery”, or “mass public revelation” or “adoption” or something like that. But search is not evolving; what you are speaking of already exists. The future was here 15 to 50 years ago. It just wasn’t (to quote the popular phrase) evenly distributed.

    So maybe all you’re saying is that this particular aspect of search, i.e. routing and filtering, or SDI, or whatever you want to call it, is finally “growing” or “spreading”. But “growing” != “evolving”.

  11. nmw says:

    OK, John — I guess you can’t bring yourself to engage in conversational media? ;P

    Here’s a post with a copy of the reply to your article at looksmart.com that I made yesterday:

    http://gaggle.info/post/119/john-battelle-gets-it-sort-of

    :) nmw

  12. JG says:

    nmw: I agree with you that everything is in flux. Heraclitus and all.

    But while I do not share the technical utopian idealism of most of those in the Web 2.0 community, I also think that things are not so hopeless that our only choice is to throw up our arms and give up. “Gizmo”-wise, technologies such as online relevance feedback were created (and integrated into document filtering) to deal with both shifting user information needs as well as shifts in language itself. And it really does work.

  13. nmw says:

    JG,

    I think we agree more than we disagree.

    My main point is that domain names are the only reliable metatags on the web.

    :) nmw

  14. Hey @nmw your comment is probably in a moderation queue, I don’t run that queue, the folks at the site do (I wrote that for Looksmart’s site). Nothing personal intended at all!

  15. nmw says:

    I see — I guess that might be a frustrating situation to be in.

    Corporate blogs are still somewhat of an anomaly IMHO — I guess you are perhaps in a good position to judge how to integrate old and new media (seeing as you have — unlike myself — a lot of experience in “the other side of the fence”).

    Still, it’s a little surprising that the people at looksmart seem to have such a poor sense of online media.

    Glad to hear you explicitly distance yourself from the oversight.

    :) nmw

  16. People are twittering “what are you doing?” however they seem to be asking themselves that question.

    Yes, marketers would love that ambient data but that is a backwards approach to search. I don’t see the usefulness or appetite for people to query about what their friends are doing – especially when it’s already being delivered to them.

    Search solves a problem for people. Marketers just hop on board. Turning that train around is not a good idea.

  17. @nmw pls repost your comment – if you want, on the LSmart site. For some reason, it was not received, I am told. I think the system must have eaten it. No one on that site was ignoring it!

  18. I answered this on your other post as well about why Google should be worried about Twitter, but you really need to see what’s going on in FriendFeed more to grok the real time nature of the web. Twitter is only a small part of this.

    Look at my realtime feed here for just a small taste — that’s 4,800 hand-picked people being displayed in real time: http://friendfeed.com/scobleizer/friends/realtime

  19. nmw says:

    Hi John,

    I am very much on the fence regarding whether online interaction should be centrally located or distributed (the “power” of the Internet is its distributed nature, right? that’s one thing that GOOG fans don’t seem to understand: Google’s centralization is a big media behavior, going against what the Internet was conceived to be)

    At any rate, in this case I am satisfied having posted my reply on my own website, especially as it appears to be more reliable than looksmart’s.

    In the future, whenever I post on Looksmart’s website, I will try to remember to create a screenshot of whatever happens (in this case it said something like the post would be reviewed by the blog owner).

    Thanks for further investigating this.

    :) nmw

    ps/btw: I actually think that “moderation by blog owner” is a somewhat dated (or outdated) phenomenon — digg has taught us differently… (at least a little bit — in this vein, I am very excited about the prospects for the IntenseDebate cloud app that I have installed @ http://conversative.net/blog :)

  20. TimTipper says:

    YOU NAILED IT!!!!
    I was just looking at Twitscoop tonight and thinking, “what the hell do you do with this application that can be USEFUL?” and came up with the idea of a “realtime” search function. Now the next thing will be to make the “buzzing” the search engine of the future.

    Hey! If any programmer out there figures out how to do it, give me a penny per user, I’ll be happy with that.

  21. brett1211 says:

    great and well-written post. I am particularly interested in the discussion you open at the end about our digital footprint. I am so excited about how the coming LBS revolution will increase our passive data collection abilities “in the field” so to speak. I’ve written about the opportunities

    http://bit.ly/hWi6A

    and challenges

    http://bit.ly/QzAMY

    thanks again,
    b