Caffeine: A Fundamental Rewrite of Google, A Shift to Real Time

Matt Cutts points to a video interview (embedded above) on Google’s Caffeine infrastructure update.

“It’s a pretty fundamentally big change,” Matt says. What I’d like to know is why, and in response to what changes on the web. Of course, the major change in how the web works is clear: Real Time Search.

In this post (and/or this one) I said:

In short, Google represents a remarkable achievement: the ability to query the static web. But it remains to be seen if it can shift into a new phase: querying the realtime web.

It’s inarguable that the web is shifting into a new time axis. Blogging was the first real indication of this, but blogging, while much faster than the traditional HTML-driven web, is, in the end, still the HTML-driven web.

Part and parcel of this shift is the web’s adoption of Flash/Silverlight/Ajax – a shift to assuming the web works in real time, like an application on your desktop. That makes it damn hard to index stuff, because pages are not static; they are created in real time in response to user demand. This is a new framework for how the web works, and if Google doesn’t respond to it, Google will basically be relegated to a card-catalog archive of static HTML pages. No way will Google let that happen…
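
Here’s a quick illustration of the problem (a toy sketch of my own, nothing to do with Google’s actual crawler): a naive indexer that extracts visible text from raw HTML comes back empty-handed on an Ajax-style page, because the content doesn’t exist until a browser executes the JavaScript.

    # Why client-rendered pages defeat a static indexer: the HTML shell
    # carries no article text, so a crawler that never runs JavaScript
    # has nothing to index. (Toy example; the page and endpoint are made up.)
    from html.parser import HTMLParser

    AJAX_SHELL = """
    <html><body>
      <div id="app"></div>
      <script>
        // In a browser, this would fetch the article and inject the text.
        fetch('/api/article').then(r => r.text()).then(t => {
          document.getElementById('app').innerText = t;
        });
      </script>
    </body></html>
    """

    class TextExtractor(HTMLParser):
        """Collects the visible text a naive crawler would see."""
        def __init__(self):
            super().__init__()
            self.in_script = False
            self.chunks = []

        def handle_starttag(self, tag, attrs):
            if tag == "script":
                self.in_script = True

        def handle_endtag(self, tag):
            if tag == "script":
                self.in_script = False

        def handle_data(self, data):
            if not self.in_script and data.strip():
                self.chunks.append(data.strip())

    parser = TextExtractor()
    parser.feed(AJAX_SHELL)
    print(parser.chunks)  # [] -- the "page" is empty until JavaScript runs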

(By the way, one of the reasons I was impressed with Wowd was exactly its ability to track, at scale, a new signal in the web – the signal of what we are actually doing in real time…as opposed to the signal of the link…but more on that later.)

Matt was asked if Caffeine was specifically about Real Time. He wasn’t entirely specific in his answer, but it’s pretty obvious it’s all about this shift.

Oh, and Matt says it’s not because of Bing. In one way, I agree. But let’s be real. Microsoft and Yahoo did this deal because Yahoo alone could never sustain the infrastructure costs associated with indexing and processing the Real Time Web. So in truth, Google did this because it had to, just like Microsoft and Yahoo did what they did because they had to. If you want to play, you have to get the infrastructure right.

Here’s SEL’s take on it.

9 thoughts on “Caffeine: A Fundamental Rewrite of Google, A Shift to Real Time”

  1. Great insights. I couldn’t agree more that they are dead ducks if they don’t move towards real time.
    I still think there is a long-term place for static content, obviously, but if you’re right, then perhaps we are finally getting to a place where clients have no option but to invest in great content.
    The question still remains regarding small businesses that can’t necessarily afford to invest in great content.
    Perhaps the evolution of this medium will eventually make it possible for even the littlest guys to prosper, regardless of their own content.

  2. This confirms my suspicion that Google has already been running this sort of real-time “dynamic” result in its regular index. Perhaps Caffeine will allow them to tweak the dials on many more factors for time-relative SERPs. I have also been seeing a ton of crap/spam in the results as of late – Caffeine will surely provide better filtering mechanisms.

    I also think they have been sitting on this for a while… waiting for the Bing shakeup. As soon as they saw Bing start to take market share, they made the announcement.

  3. Honestly, Matt, I don’t get it. At the risk of saying something incredibly unpopular, why do you care what most of us (“us” meaning the marketing/webmaster/geek community in general) think when it comes to updates such as this?

    1) We can’t accurately compare rankings as such, due to variance across datacenters. As a side note, this also helps to perpetuate the notion of rank checking as a viable concept and you may well be indirectly shooting yourself in the foot as a result. Do you guys really want to have the WebPosition Golds of the world taking this particular post out of context?

    2) We’ll generally only compare rankings for terms we have a vested interest in (i.e. our own keywords and phrases).

    3) Even though you said that the impact in terms of rankings should be minimal since this relates to indexing, people are going to see changes in ranking simply because of past precedent when it pertains to updates. While I’m sure some of these changes do exist, they would likely be minor at best and extreme aberrations would be caught by both us and regular users. In other words, we will see things not because they exist, but because we’re conditioned to think that we’re supposed to.

    4) Not that I’m second-guessing you on this, because I’m not sure anyone would have seen this coming, but people are already building “tools” and spamming your own blog in the perceived interest of “helping the community”. This means that every time you launch an update and tell us, more people are going to create more “tools” that will further send the SEO community down the garden path.

    5) This is going to sound somewhat contradictory until I fully explain it, but without seeing referral traffic, it’s almost impossible to form an opinion about an update one way or the other. I’m not just talking about “did my referral traffic go up for my chosen keywords X, Y, and Z”, but rather “what did I receive referral traffic for? Is it relevant? Did I target it? How strange are the phrases? What type of volume am I getting that way?” etc. In other words, I’d want to see the quantity and, far more importantly, the quality of referral traffic before I make any kind of judgement. I can’t do that with a sandbox.

    You guys are at the point where you have almost no other choice but to launch an update, shut up about it, and see what people notice based on unprompted feedback (e.g. quality control/spam reports). It’s noble that you want to keep us in the loop, don’t get me wrong…I’m just not sure that we’re generally smart and responsible enough to handle that information correctly. If you really want/need to, start building focus groups full of people such as, say, Lenssen who get it, who aren’t biased, and who actually can give quality feedback.

    I know that’s not really the type of answer you were looking for as such, but I don’t think there is a legitimate answer to this question in this case.

  4. Booyah Prefabrik and good onya! I could not agree more. As always, I appreciate Matt’s forthright dialogue with the SEO community and would really appreciate more information about what’s coming rather than what’s been. How about taking us deeper on some of those U.S. Patent filings with your name on them, Matt? 😉

    Google Caffeine is interesting because Google has always rolled out updates and then dribbled information on them to those interested (Big Daddy, Florida). Potentially game-changing updates (Orion, anyone?) remain in the abstracted ether of Mountain View.

    Indexing in real time is very helpful IF the content makes it into the index and IF the algorithms can assess and assign changes in PR fast enough and IF there is not some superior ranking calculation that we don’t know about going on in the background. (The sketch below illustrates why that second “if” – recomputing PR as the graph changes – is the hard part.)

    A lot of “ifs” there, and that is what makes search optimization on all levels so interesting.
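
    To put that “fast enough” point in concrete terms, here is a toy power-iteration PageRank (a textbook sketch, nothing like Google’s production system): adding even one newly crawled page changes the link graph, and a naive implementation has to re-run the whole computation from scratch.

        # Toy power-iteration PageRank (a textbook sketch, not Google's
        # production algorithm). The point: every batch of newly crawled
        # pages changes the link graph, and this naive version has to
        # re-run all of its iterations from scratch.
        def pagerank(links, damping=0.85, iterations=50):
            """links: dict mapping each page to the pages it links to."""
            pages = list(links)
            n = len(pages)
            rank = {p: 1.0 / n for p in pages}
            for _ in range(iterations):
                new_rank = {p: (1.0 - damping) / n for p in pages}
                for page, outlinks in links.items():
                    if not outlinks:  # dangling page: spread its rank evenly
                        for p in pages:
                            new_rank[p] += damping * rank[page] / n
                    else:
                        share = damping * rank[page] / len(outlinks)
                        for target in outlinks:
                            new_rank[target] += share
                rank = new_rank
            return rank

        graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
        print(pagerank(graph))

        # One new real-time page means rebuilding the graph and iterating
        # again -- the recomputation cost the "if" above is about.
        graph["d"] = ["a"]
        graph["c"].append("d")
        print(pagerank(graph))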

  5. Since we’re being real, we have to discount the Microsoft-Yahoo! deal as being connected to the development of infrastructure. Microsoft simply found an elegant way to remove one of its rivals from the field.

    And then there were two.

    But otherwise real-time indexing is where things have been going for a long time.

    After all, we used to have that with a few engines back in the 1990s, when the Web was smaller and real-time was more about finding new pages on existing sites and less about capturing social media conversations.

    The crawling is important, but I think we’ll see the greatest long-term effect from the improved architecture. Based on what I’ve read about Caffeine so far, I believe Google has developed a massive array of servers.

    They have scaled up from processor arrays to server arrays. THAT impresses me.

  6. “But let’s be real. Microsoft and Yahoo did this deal because Yahoo alone could never sustain the infrastructure costs associated with indexing and processing the Real Time Web.”

    John, how do you know that? To be honest, I have been wracking my brains trying to figure out why they did this deal, and don’t really know. But neither do you. This is just speculation.

    Now I guess speculation can be interesting. But why the focus on infrastructure? If you are looking for a good reason, look at human resource costs. I think you and many others underestimate the engineering effort required to maintain and expand a competitive “general-purpose” search engine – the many teams working on small subproblems that need to be solved to keep result quality competitive overall. That is the main barrier to entry into general-purpose (rather than Twitter or blog or video) search at the moment, I would say.

    Infrastructure cost, on the other hand, mostly pays for itself, in that it scales with user population (if you get the monetization right). Yes, there is some cost independent of user load that depends instead on data size, but that cost is dominated by the people, not the hardware. Having x engineers work on search while the other side has 3x is the real challenge, and at some point you have to decide whether to invest more in engineering or give up.
