
Google Responds: No, That’s Not How Facebook Deal Went Down (Oh, And I Say: The Search Paradigm Is Broken)

By - January 13, 2012

I’ve just been sent an official response from Google to the updated version of my story posted yesterday (Compete To Death, or Cooperate to Compete?). In that story, I reported on 2009 negotiations over incorporating Facebook data into Google search. I quoted a source familiar with the negotiations on the Facebook side, who told me: “Senior executives at Google insisted that for technical reasons all information would need to be public and available to all,” and “The only reason Facebook has a Bing integration and not a Google integration is that Bing agreed to terms for protecting user privacy that Google would not.”

I’ve now had conversations with a source familiar with Google’s side of the story, and to say the company disagrees with how Facebook characterized the negotiations is to put it mildly. I’ve also spoken to my Facebook source, who has clarified some nuance as well. To get started, here’s the official, on the record statement, from Rachel Whetstone, SVP Global Communications and Public Affairs:

“We want to set the record straight. In 2009, we were negotiating with Facebook over access to its data, as has been reported. To claim that we couldn’t reach an agreement because Google wanted to make private data publicly available is simply untrue.”

My source familiar with Google’s side of the story goes further, and gave me more detail on why the deal went south, at least from Google’s point of view. According to this source, as part of the deal terms Facebook insisted that Google agree to not use publicly available Facebook information to build out a “social service.” The two sides had already agreed that Google would not use Facebook’s firehose (or private) data to build such a service, my source says.

So what does “publicly available” mean? Well, that’d be Facebook pages that any search engine can crawl – information on Facebook that people *want* search engines to know about. Contrast that with the firehose data, which was the core asset being discussed between the parties. The firehose data is what Google would need in order to surface personal Facebook pages relevant to you in the context of a search query. (So, for example, if you were my friend on Facebook, and you searched for “Battelle soccer” on Google, then with the proposed deal, you’d see pictures of my kids’ soccer games that I had posted to Facebook.)

Apparently, Google believed that Facebook’s demand around public information could be interpreted as applying to how Google’s own search service was delivered, not to mention how it (or other products) might evolve. Interpretation is always where the devil is in these deals. Who’s to say, after all, that Google’s “social search” is not a “social service”? And Google Pages, Maps, etc. – those are arguably social in nature, or will be in the future.

Google balked at this language, and the deal fell apart. My Google source also disputes the claim that Google balked because it couldn’t technically separate public from private data. For its part, my Facebook source counters that the real public-versus-private issue had to do with Google’s refusal to honor changes in privacy settings over time – for example, if I deleted those soccer pictures, they should also be deleted from Google’s index. At some point this all devolves into he said/she said, because the deal never happened, and to be honest, there are larger points to make.

So let’s start with this: If Facebook indeed demanded that Google not use publicly available Facebook data, it’s certainly understandable why Google wouldn’t agree to the deal. It may not seem obvious, but there are an awful lot of publicly available Facebook pages and data out there. Starbucks, for example, is more than happy to let anyone see its Facebook page, whether you’re logged in or not. And then there’s all that Facebook open graph data out on the public web – tons of sites publicly display Facebook status updates, like counts, and so on. In short, asking Google not to leverage that data in anything that might constitute a “social service” is anathema to a company whose stated mission is to crawl all publicly available information, organize it, and make it accessible.

It’s one thing to ask that Google not use Facebook’s own social graph and private data to build new social services – after all, the social graph is Facebook’s crown jewels. But it’s quite another thing to ask Google to ignore other public information completely.

From Google’s point of view, Facebook was crippling future products and services that Google might create – tantamount to an insurance policy that Google wouldn’t become a strong competitor, at least not one that leveraged public information from Facebook. Google balked. And if Facebook’s demand could have been interpreted as also applying to Google’s search results, well, that’s a stone-cold deal killer.

I certainly understand why Facebook might ask for what they did; it’s not crazy. Google might well have responded by narrowing the deal, saying “Fine, you don’t build a search engine, and we won’t build a social network. But we should have the right to create other kinds of social services.” As far as I know, Google didn’t choose to say that. (Microsoft apparently did.) And I think I know why: The two companies realized they were dancing on the head of a pin. Search = social, social = search. They couldn’t figure out a way to tease the two apart. Microsoft has cast its lot with Facebook; Google, not so much.

When high-stakes deals fall apart, both sides usually claim the other is at fault, and that certainly seems to be the case here. The same is true of the Twitter deal, about which I’ve also gotten a fair amount of new information. I hope to dig into that in another post. For now, I want to pull back a second and comment on what I think is really going on here, at least from the perspective of a longer view.

Our Cherished Search Paradigm Is Broken (But We Will Fix It….Eventually)

I think what we have here is a clear indication that the search paradigm we’ve operated under for a decade or so is broken. That paradigm stems from Google’s original letter to shareholders in 2004. Remember this line? “Our search results are the best we know how to produce. They are unbiased and objective, and we do not accept payment for them or for inclusion or more frequent updating.”

In many cases, it’s simply naive to claim Google is unbiased or objective. Google often favors its own properties over others, as Danny points out in Real-Life Examples Of How Google’s “Search Plus” Pushes Google+ Over Relevancy, and as others have also detailed. But there is a reason: if you’re going to show results from every other possible contender, replete with their associated UI and functional bells and whistles (as Google does with its own Maps, Pages, Plus, etc.), it’s nearly impossible to determine which service is the right answer to a particular person’s query. Not to mention that you’d need a deal in place to get all the functionality of each service. Instead, Google has opted, in many cases, to go with its own stuff.

This is not a new idea, by the way. Yahoo’s been doing it this way from the beginning. The contentious issue is that biasing some results toward Google’s own products runs counter to Google’s founding philosophy.

I have a theory as to why all this is happening, and I don’t entirely blame Google. Back when search wasn’t personalized, Google could defensibly say that one service was better than another because it got more traffic, was linked to more (better PageRank), and so on. Back when everyone got the same results and the web was one homogenous glob of HTML, well, you could claim “this is the best result for the general population.” But personalized search has broken that framework – I lamented this back in 2008 with this post: Search Was Our Social Glue. But That Is Dissolving (more here).

With the rise of Facebook and the app economy, the problem of search has become terribly complicated. If you want to have results from Facebook in your search, well, that search service has to do a deal with Facebook. But what if you want results from your running app (I have hundreds of rides and runs logged on AllSportGPS, for example)? Or Instagram? Or Path, for that matter? Do they all have to do deals with Google and Bing? There are so many unconnected pieces of the Internet now (millions of apps, most of our own Facebook experiences, etc. etc.) that what’s a good personal result for one person is not necessarily good for another. If Google is to stay true to its original mission, it needs a new framework and a massive number of new signals – new glue – to put the pieces back together.
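To make that plumbing problem concrete, here’s a minimal sketch of what a federated personal-search layer might look like. Everything in it is hypothetical – the connector names, the interfaces – but it shows why the fan-out is the easy part and the ranking “glue” is the hard part:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-service connectors. Each one (Facebook, a running
# app, Instagram, Path...) would require its own deal, API, and user
# consent before it could answer a query at all.
CONNECTORS = {
    "facebook":   lambda query, user: [],
    "running_app": lambda query, user: [],
    "instagram":  lambda query, user: [],
}

def federated_search(query, user):
    """Fan a query out to every connected silo, then merge the results.

    The fan-out is trivial. The merge is not: these silos share no
    common relevance signals (no links, no PageRank), so there is no
    principled way to rank a run log against a photo against a status
    update. That missing cross-silo signal is the 'new glue'.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, query, user) for fn in CONNECTORS.values()]
        return [result for f in futures for result in f.result()]

print(federated_search("battelle soccer", user="john"))  # [] until deals exist
```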

There are several ways to resolve this, and in another post, I hope to explore them (one of them, of course, is simply that everyone should just go through Facebook. That’s the vision of Open Graph). But for now, I’m just going to say this: The issues raised by this kerfuffle are far larger than Google vs. Facebook, or Google vs. Twitter. We are in the midst of a major search paradigm shift, and there will be far more tears before it gets resolved. But resolve it must, and resolve it will.


Whisperings of the Future Surround Us

By - November 17, 2011

Yesterday I met with Christopher Ahlberg, the PhD who co-founded Recorded Future, a company I noted in these pages back in mid-2010. Ahlberg is one of those rare birds you just know is making stuff that matters – a scientist, an entrepreneur, a tinkerer, and an enthusiast all wrapped into one.

He ran me through Recorded Future’s technology and business model, and I found it impressive. In fact, I’m hoping I can employ it somehow in my book research. And that conditional “hoping” is the main problem I have with Ahlberg’s creation – it’s a rather complicated system to use. Then again, what of worth isn’t, I suppose?

Recorded Future is, at its core,  a semantic search engine that consumes tens of thousands of structured information feeds as its “crawl.” It then parses this corpus for several core assets: Entities, Events, and Time (or Dates). Recorded Future’s algorithms are particularly adept at identifying and isolating these items, then correlating them at scale. If that sounds simple, it ain’t.
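To make the Entities/Events/Time idea concrete, here’s a toy sketch of that kind of parsing. This is emphatically not Recorded Future’s actual pipeline – a real system uses trained statistical models, not word lists – but it shows the shape of the extraction problem:

```python
import re

# Toy gazetteers -- stand-ins for trained entity and event recognizers.
ENTITIES = {"Amazon", "Google", "Motorola"}
EVENTS = {"earnings", "acquisition", "launch"}
DATE_RE = re.compile(r"\b(?:Q[1-4] \d{4}|\d{4}-\d{2}-\d{2}|next \w+)\b")

def parse(feed_item):
    """Extract (entity, event, time) mentions from one feed item."""
    entities = [e for e in ENTITIES if e in feed_item]
    events = [e for e in EVENTS if e in feed_item.lower()]
    dates = DATE_RE.findall(feed_item)
    return entities, events, dates

print(parse("Amazon earnings call expected next Thursday, analysts say."))
# (['Amazon'], ['earnings'], ['next Thursday'])
```

Doing that reliably across tens of thousands of feeds, in endless phrasings, is where the hard science lives.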

The service then employs a relatively complicated query structure that allows you to project the road ahead for your question. For example, you might choose “Amazon” as your entity, and then set your timeframe to cover events involving Amazon over the past two months and into the next two months. Recorded Future will analyze its sources (SEC filings, blogs, news sites, etc.) and create a timeline-like “map” of things that have happened and things predicted to happen with regard to Amazon across that window. You can further refine a search by adding other entities or events (“earnings” or “CEO”, for example).
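As a sketch, the query I just described might be expressed like this – a hypothetical interface of my own invention, not Recorded Future’s actual API:

```python
# Hypothetical query structure -- illustrative only.
query = {
    "entity": "Amazon",
    "refinements": ["earnings", "CEO"],   # optional extra entities/events
    "timeframe": {"past_months": 2, "future_months": 2},
    "sources": ["sec_filings", "blogs", "news"],
}

# The engine answers with timeline buckets: events its sources report
# as having happened, and events they predict will happen.
timeline = [
    {"week": "2011-10-03", "kind": "reported",  "items": ["..."]},
    {"week": "2012-01-02", "kind": "predicted", "items": ["..."]},
]
```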

How does it work? Well, it turns out the Internet is rife with whisperings of the future – you just need to learn how to listen. That’s Recorded Future’s specialty. As you might imagine, Wall Street quants and government spooks just love this stuff. I’d imagine journalists would as well, but most of us are too strapped to afford the company’s services. Embedded below is a new feature of the site, a weekly overview of a news-related entity.

Recorded Future’s engine is not limited to the sources it currently consumes. Not only is Ahlberg adding more every month, his customers can add their own corpuses. Imagine throwing Wikileaks into Recorded Future, for example.

Perhaps the coolest aspect of the service is a visualization of how entities relate to each other over time. Ahlberg showed me a search for mobile patents, then toggled into a relationship graph. Guess what entity broke into the center of the graph, connected to nearly everything else? Yup – Motorola.

Did I mention that Google is an investor in Recorded Future?

As I said, I hope to start using the service soon, and perhaps posting my findings here.

On Location, Brand, and Enterprise

By - September 11, 2011

From time to time I have the honor of contributing to a content series underwritten by one of FM’s marketing partners. It’s been a while since I’ve done it, but I was pleased to be asked by HP to contribute to their Input Output site. I wrote on the impact of location – you know I’ve been on about this topic for nearly two years now. Here’s my piece. From it:

Given the public face of location services as seemingly lightweight consumer applications, it’s easy to dismiss their usefulness to business, in particular large enterprises. Don’t make that mistake. …

Location isn’t just about offering a deal when a customer is near a retail outlet. It’s about understanding the tapestry of data that customers create over time, as they move through space, ask questions of their environment, and engage in any number of ways with your stores, your channel, and your competitors. Thanks to those smartphones in their pockets, your customers are telling you what they want – explicitly and implicitly – and what they expect from you as a brand. Fail to listen (and respond) at your own peril.

More on the Input Output site.

More on Twitter's Great Opportunity/Problem

By - August 10, 2011

In the comments on this previous post, I promised I’d respond with another post, as my commenting system is archaic (something I’m fixing soon). The comments were varied and interesting, and fell into a few buckets. I also have a few more thoughts of my own to toss out there, given what I’ve heard from you all, as well as some thinking I’ve done in the past day or so.

First, a few of my own thoughts. I wrote the post quickly, but have been thinking about the signal to noise problem, and how solving it addresses Twitter’s advertising scale issues, for a long, long time. More than a year, in fact. I’m not sure why I finally got around to writing that piece on Friday, but I’m glad I did.

What I didn’t get into is just how massive solving this problem really is. Twitter is more than the sum of its 200 million daily tweets; it’s also a massive consumer of the web itself. Many of those tweets carry URLs pointing to the “rest of the web” (an old figure put the share at 25 percent; I’d wager it’s higher now). Even if it were just 25%, that’s 50 million URLs a day to process, and growing. It’s a very important signal, but it means that Twitter is, in essence, also a web search engine, a directory, and a massive discovery engine. It’s not trivial to unpack, dedupe, analyze, contextualize, crawl, and digest 50 million URLs a day. But if Twitter is going to really exploit its potential, that’s exactly what it has to do.
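Some quick arithmetic on that scale: 25 percent of 200 million tweets is 50 million URLs, or roughly 580 URLs per second, sustained, around the clock. Here’s a minimal sketch of just the first two stages of such a pipeline – extraction and dedupe – leaving aside the harder work of resolving shorteners, crawling, and content analysis:

```python
import re
from hashlib import sha1

URL_RE = re.compile(r"https?://\S+")
seen = set()  # in production this would be a distributed store, not a set

def extract_urls(tweet_text):
    """Pull URLs out of one tweet; t.co/bit.ly links would still
    need to be resolved to their final destinations."""
    return URL_RE.findall(tweet_text)

def dedupe(urls):
    """Keep only URLs we haven't already processed."""
    fresh = []
    for url in urls:
        key = sha1(url.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh

print(dedupe(extract_urls("Great read http://example.com/post via @ev")))
# ['http://example.com/post']
```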

The same is true of Twitter’s semantic challenge/opportunity. As I said in my last post, tweets express meaning. It’s not enough to “crawl” tweets for keywords and associate them with other related tweets. The point is to associate them based on meaning, intent, semantics, and – this is important – narrative continuity over time. No one that I know of does this at scale, yet. Twitter can and should.
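A quick illustration of why keyword matching falls short: a standard bag-of-words baseline (TF-IDF with cosine similarity, sketched below using scikit-learn) links tweets that share words and misses tweets that share meaning – precisely backwards for this problem:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tweets = [
    "my flight was cancelled, thanks a lot",
    "stuck at the gate again",            # related meaning, no shared words
    "flight deals to Hawaii this week",   # shared word, unrelated intent
]
vectors = TfidfVectorizer().fit_transform(tweets)
print(cosine_similarity(vectors))
# Word overlap ties tweet 0 to tweet 2 and gives tweets 0 and 1 a
# similarity of zero -- the gap semantic analysis would have to close.
```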

Which gets me to all of your comments. I heard from developers working on parts of the problems and opportunities I outlined in my initial post – in the written comments, on Twitter, and in extensive emails offline. And it’s true, there’s really quite a robust ecosystem out there. Trendspottr, OneRiot, Roundtable, Percolate, Evri, InfiniGraph, The Shared Web, Seesmic, Scoopit, Kosmix, Summify, and many others were mentioned to me. I am sure there are many more. But while I am certain Twitter not only benefits from its ecosystem of developers but actually *needs* them, I am not so sure any of them can or should solve this core issue for the company.

Several commenters made the same point Suamil did: “Twitter’s firehose is licensed out to at least publicly disclosed 10 companies (my former employer Kosmix being one of them and Google/Bing being the others) and presumably now more people have their hands on it. Of course, those cos don’t see user passwords but have access to just about every other piece of data and can build, from a systems standpoint, just about everything Twitter can/could. No?”

Well, in fact, I don’t know about that. For one, I’m pretty sure Twitter isn’t going to export the growing database around how its advertising system interacts with the rest of Twitter, right? On “everything else,” I’d like to know for certain, but it strikes me that there’s got to be more data that Twitter holds back from the firehose. Data about the data, for example. I’m not sure, and I’d love a clear answer. Anyone have one? I suppose at this point I could ask the company….I’ll let you know if I find out anything. Let me know the same. And thanks for reading.

Book Review: In The Plex

By - April 20, 2011

Last night I had the pleasure of interviewing Steven Levy, an old colleague from Wired, on the subject of his new book: In The Plex: How Google Thinks, Works, and Shapes Our Lives. The venue was the Commonwealth Club in San Francisco, and I think they’ll have the audio link up soon.

Steven’s interview was a lot like his book – full of previously untold anecdotes and stories that rounded out pieces of Google’s history that many of us only dreamt of knowing about. When I was reporting my book, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, I had limited access to folks at Google, and *really* limited access to Larry Page and Sergey Brin. Levy had the opposite experience, spending more than two years inside the company and seeing any number of things that journalists would have killed to see in years past.

The result is a lively and very detailed piece of reporting about the inner workings of Google. But I was a bit disappointed with the book in that Steven didn’t take all that new knowledge and pull back to give us his own analysis of what it all meant. I asked him about this, and he said he made the conscious decision to not editorialize, but rather lay it all out there and let the reader draw his or her own conclusions. I respect that, but I also know Steven has really informed opinions, and I wish he’d give them to us.

What I took away from In the Plex was a renewed respect for the awesome size and scope of Google’s infrastructure, as well as its ambition. Sometimes we forget that Google is more likely than not the world’s largest manufacturer of computers, and that it runs the largest single instance of computing power anywhere. It’s also one of the largest collectors and analyzers of data on the planet. All of this has drawn serious scrutiny, but I don’t think even the regulators really grok how significant Google’s assets are. They should all read Steven’s book.

Levy only grazes the surface of Google’s social blindness, unfortunately, and due to timing could only mention Page’s ascendancy to CEO in his epilogue. But his reporting on how the China issue played out is captivating, as are the many details he fills out in Google’s early history. If you’re fascinated by Google, you’ve got to add this one to your library.

Google "Head End" Search Results: Ads as Content, Or…Just Ads?

By - March 30, 2011


Today I spoke at Sony HQ in front of some Pretty Important Folks, so I wanted to be smart about Sony’s offerings lest anything obviously uninformed slip out of my mouth. To prepare I did a bunch of Google searches around Sony and its various products.

Many of these searches are what I call “head end” searches – a lot of folks are searching for the terms I put in, and they are doubly important to Google (and its advertising partners) because they are also very commercial in nature (not in my case, but in general). Usually folks searching for “Sony Tablets” have some intent to purchase tablets in the near future, or at the very least are somewhere in what’s called the “purchase funnel.”

I was struck with the results, so much so I took a screen shot of one representative set of results. In traditional print, we used to watch a metric called “Ad Edit Ratio” very closely (as did the government, for reasons of calculating postal rates). Editors at publications lobbied for low ad edit ratios (so they’d get more space to put their content, naturally). Advertising executives lobbied for higher Ad Edit ratios (so they could sell more ads, of course). We usually settled somewhere around 50-50 – half ads, half editorial.

Google is way lower than that on any given search – but not for head end searches. There, the ratio of actual “editorial” (organic search results) to “paid” is pushing toward 35/65 or more, at least when you measure the space “above the fold” on a typical screen.
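For the record, my method here is crude: eyeball the screenshot and compare the vertical space above the fold devoted to paid versus organic results. The arithmetic, with hypothetical pixel counts, looks like this:

```python
def ad_edit_ratio(ad_pixels, organic_pixels):
    """Share of above-the-fold vertical space: ads vs. organic results."""
    total = ad_pixels + organic_pixels
    return ad_pixels / total, organic_pixels / total

# Hypothetical measurements for a head end commercial query:
ads, organic = ad_edit_ratio(ad_pixels=520, organic_pixels=280)
print(f"ads {ads:.0%} / organic {organic:.0%}")  # ads 65% / organic 35%
```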

Then again, in the case of AdWords, one could argue the ads are contextually relevant and useful.

Just felt worth pointing out, if for no other reason than to add a page to the historical record of how the service is evolving. Once “media” AdWords start taking over, this picture may well change again, and it might not be a change that folks like much.

Google, Social, and Facebook: One Ring Does Not Rule Them All

By - February 17, 2011


When I read Google announcements like this one, An update to Google Social Search, I find myself wondering why Google doesn’t just come out and say something like this: “We know social search is important, and we’re working on it. However, we don’t think the solution lies in working only with Facebook, because, to be honest, we think social media is bigger than one company, one platform, or one “social graph.” We’ve got a bigger vision for what social means in the world, and here it is.”

Wouldn’t that be great?

Because honestly, I think the company does have a bigger vision, and I think it’s rooted in the idea of instrumentation and multiples of signals (as in, scores if not thousands of signals understood to be social in nature). In other words, there is not one “ring to rule them all” – there is no one monoculture of what “social” means. For now, it appears that way. Just like it appears that there’s one tablet OS. But the world won’t shake out that way – we’re far too complicated as humans to relegate our identity to a single platform. It will be distributed, nuanced, federated. And it should be instrumented and controlled by the individual. At least, I sure hope it will be.  

Google might as well declare this up front and call it a strategy. In that context, it might even make sense to do further Facebook integration in the near term, as one of many signals, of course. Google already uses some limited Facebook data (scroll down), but clearly has decided to not lean in here (or can’t come to terms with Facebook around usage policies). Clearly the two companies are wary of working together. But it’s my hope that over time, whether or not they do should be a moot issue.

Why? Because I view my Facebook data as, well, mine. Now, that may not really be the case, but if it’s mine, I should be able to tell Google to use it in search, or not. That’s an instrumentation signal I should be able to choose. Just like I can choose to use my Facebook identity to log into this blog, or any number of other sites and services. It should be my choice, not Facebook’s, and not Google’s either.
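As a sketch of what that choice might look like – entirely hypothetical, since no such user-held control point exists today:

```python
# Hypothetical: a user-owned grant declaring who may use which of MY
# data. Today these decisions live in platform-to-platform deals.
my_grants = {
    "facebook.photos":  {"google_search": True,  "bing_search": True},
    "facebook.updates": {"google_search": False, "bing_search": True},
}

def may_index(service, data_source, grants=my_grants):
    """A search engine checks the user's grant, not Facebook's deal terms."""
    return grants.get(data_source, {}).get(service, False)

print(may_index("google_search", "facebook.photos"))   # True
print(may_index("google_search", "facebook.updates"))  # False
```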

Switch the control point to the customer, and this issue sort of goes away. I have a longer post in me about “social clothing” – it came up on a phone call with Doc Searls yesterday – and hopefully when I get to that, this might make a bit more sense….

Does Google Favor Its Own Services?

By - January 19, 2011

Seems so. I’ve written about this a lot, so much that I won’t bother to link to all the stuff I’ve posted. It was the basis of a chapter in the book, where I pointed out that (at the time) Google claimed algorithmic innocence, and Yahoo, on the other hand, was cheerful in its presumption that Yahoo services were the best answer to certain high value searches (like “mail”).

Now comes this study, from Harvard professors no less, which pretty much states the obvious. Check this graph:


It’s clear that in some cases, one might argue that Google services should win (maps, for example). But for “chat”? Or for “mail”? A stretch.

Here’s the paper’s authors’ general conclusion: “Google typically claims that its results are “algorithmically-generated”, “objective”, and “never manipulated.” Google asks the public to believe that algorithms rule, and that no bias results from its partnerships, growth aspirations, or related services. We are skeptical.”

So am I.

Update: Danny has, as always, a more nuanced point of view. Thanks, my always smarter commentators.

blekko Explains Itself: Exclusive Video (Update: Exclusive Invite)

By - August 31, 2010

blekko: how to slash the web from blekko on Vimeo.

Blekko is a new search engine that fundamentally changes a few key assumptions about how search works. It’s not for lazywebbers – you pretty much have to be a motivated search geek to really leverage blekko’s power. But then again, there are literally hundreds of thousands of such folks – the entire SEO/SEM industry, for example. I’ve been watching blekko, and the team behind it, since before launch. They are search veterans, not to be trifled with, and they are exposing data that Google would never dream of exposing (yes, they do pretty much a full crawl of the web that matters). In a way, blekko has opened up the kimono of search data, and I expect the service, once it leaves private beta, will become a favorite of power searchers (and developers) everywhere.

The cool thing is, using blekko’s data and (I hope) robust APIs, one can imagine all sorts of new services popping up. I for one wish blekko well. It’s about time.

And in case you are wondering what the big deal is, besides all the data you can mine, to my mind, it’s the ability to cull the web – to “slash” the stuff you don’t care about out of your search results. Now, not many of us actually will do that. But will services take that and run? I certainly hope so.
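To give a flavor of what slashing looks like in practice, here are a few illustrative queries – my own reconstruction of the syntax, not official blekko documentation:

```
global warming /science       (restrict results to science-vertical sites)
headache remedies /health     (only health sources)
obama /blogs /date            (blog results, sorted by date)
```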

For a quick overview of blekko’s core feature – “slashtags” – check out the new video, above. And to bone up on the various merits of the service, here are a few key links:

Blekko: A Search Engine Which Is Also A Killer SEO Tool (SEL)

TechCrunch Review: The Blekko Search Engine Prepares To Launch (TC)

A new search engine Blekko search: first impressions (Economist)

Blekko’s Tools Give Search Marketers Google Alternative (MediaPost)


Update: First 500 readers get a beta invite! Email battelle@blekko.com to get in on it!