GoogleBase: Structured/Vertical/Domain Search Ain’t The FreeWeb

(More thinking out loud….)

I recently spoke to a reputable person who is a senior exec in a major structured database company (I promised not to name this person or the company/industry). I can’t go into specifics, but the company owns a very large repository of valuable industry listings, and has a web-based interface that allows folks to search those listings. Think Autobytel, or even Craigslist, but this is a vertical player in a very information-intensive business. The data this company owns is locked behind a registration wall, so search engines cannot get to it through normal crawling techniques. It’s part of the paid or “dark” web: not freely available. The company has spent a lot of time and money creating specialized search interfaces that are useful to its target market.

Anyway, a few months ago or so Google approached this person and asked if the company would want to work with Google. In essence, Google asked the company to upload its entire database into Google, which would then be mashed up, perhaps, with Google Maps and Local, and given a structured search front end and a structured database backend. When the GoogleBase story broke, the person put two and two together. This is what Google wanted to do – upload the company’s data into GoogleBase, this person told me. But at the time of their discussion, Google made no mention of that product.

When Google comes calling asking for your entire database, one might reasonably wonder what the company which owns that database might get in return. In this case, and in other cases I’ve heard about, the answer was “give us your data and you’ll get lots of traffic in return.” No discussion of syndication models, or shared revenues.

How does this differ from Oodle? Well, Oodle either asks for permission (as in the case of Monster, etc.) or crawls the publicly available web (as in the case of Craigslist, until they asked Oodle to stop. As to why, Jim Buckmaster, CEO of Craigslist, comments here, and Craig Donato, CEO of Oodle, posts his thoughts here.) But Google came into this company with no business model to compel the target company to share its database other than “we’ll send you traffic,” and precious few details on how Google was going to use the data should the company make it available.

The company representative decided against working with Google given such terms (or lack thereof), and I have to say I certainly understand why. This is not a freeweb company that plays in the tit-for-tat world of web search. This is a domain-specific company that has built a sophisticated vertical search engine which is difficult to replicate. Would the company be willing to work with Google if Google offered a syndication deal, or a split in revenues, I asked? “Yes, we’d at least have kept talking,” the executive said. “But they wanted full control of how they might use our data, without even telling us what the model was,” or even what the product might look like. The Google line, the exec said, was pretty much “We’re Google, we know what’s best for your data, give it to us, stand back, and watch as we make your stuff work better than you can” (my paraphrase here).

This approach to business development does not feel very compelling – it’s a “free web” approach to a “paid web” model. And that mismatch creates a tone that, off the record, is similar to the one publishers described to me with regard to Google Print/Library. I think the main issue here is lack of details and transparency – Google wants your data, but doesn’t want limitations on what it might do with that data in the future. I think this stance, more than any other, is what might stymie the progress of a service like GoogleBase, at least in terms of cracking the major vertical industries which might otherwise make the project extremely valuable. As I’ve said before, I think Google could build a killer meta engine over many vertical search engines (including books) if only it was willing to cut revenue sharing deals. Why, I wonder, is the company allergic to this partnering model?

20 thoughts on “GoogleBase: Structured/Vertical/Domain Search Ain’t The FreeWeb”

  1. >Why, I wonder, is the company allergic to this partnering model?

    Every company’s thinking has limitations. Look at Microsoft. Even 15 years after the Internet started becoming a phenomenon, the company fails to realize that there is computing beyond the desktop and that it needs to shift its focus from operating systems to other things. The company hasn’t come up with one product with the “awe” factor.

    Same thing with Google. It’s a terrific company, but it has its own limitations. And because it’s so successful, it fails to realize this. The good thing or the problem (depends on whose side you are) is that as long as there’s no one else who is as good as they are, or has the money, they should be fine.

  2. I still think this is much more about being a web app platform than wanting everyone’s data. If they’re a good platform the data comes naturally. If they’re just asking for data, then it won’t work:

    Think of Google Base as the core database component for building webapps, and then allowing developers to use Google’s search as well. Combine the database layer with search code with your own data, and you’re pretty close to having a good part of what’s necessary to build a web app. If they had a nice front-end building mechanism too then Google may have just built the webOS.

  3. Well... my take on Google Base (amusingly enough, anything with “base” reminds me of “all your base are…”) is that it is most likely to follow in the steps of Google BlogSearch. Being an open medium, the first to realize its value will be people whose main interest is to drive the surfer to their site, regardless of relevancy. What we saw with splogs (spam blogs) we will likely see with spbases (spam databases), indexed by Google and attracting “innocent surfers”. This is an obstacle Google will somehow have to overcome. And judging by what is going on with BlogSearch, they are not on the right track so far.

  4. John,

    If I’m not wrong, the company you’re talking about is CoStar and the person you probably spoke with was Andrew Florance.

    Google’s approach doesn’t seem unfair to me. I think the obvious writing on the wall that CoStar missed is that Google is going to be competition very soon. Now with Google Base on the horizon it all makes sense. Google will allow people to add real estate listings on Google Base, mash it up with Google Maps and make it easily accessible through Google Search.

    That threatens newspaper classifieds, Craigslist and companies like CoStar that have so far brought the searcher and the advertiser together – a business that Google excels at. With the massive efficiency in search that Google possesses, companies like CoStar obviously can’t compete with Google for long.

    It might seem awfully arrogant of them, but when Google approached CoStar it essentially gave them a chance to be a part of the initiative that would otherwise outcompete them anyway in the long run.

    I think CoStar missed an opportunity there because, as they say: if you can’t beat them, join them.

  5. I tend to agree with Mike & Techdirt on this one… this is GOOG showing their hand at building *the platform*. I’m also inclined to believe that their alliance with SUNW will play a major role here.

  6. Is this google’s approach to creating an application platform? A partner can create a specific app by uploading his ‘labeled’ data into the google semi-structured database. Google might soon come out with an ETL-like interface to upload terabytes of data into their distributed file system.

    The uploader (app partner) then continues the work of labelling and collection of data. The better the labelling of data, the more likely that the data comes up in google searches. This translates to more traffic to the uploader’s site. Traffic will somehow translate to revenue for the uploader?

  7. I think at least part of the reason they can’t tell potential data source partners what their plans are for the data is that THEY DON’T KNOW. It’s hard to tell what the implications of a particular data set are until it’s being correlated with their other datasets. The cultural disconnect that John mentions is very real: Google has so far not been asked to bear the costs of collection and collation for the datasets it uses (GoogleBot being the exception).

    Google has run through the cheap and easily available data sets that could be relevant to providing useful information to broad classes of people, and it will need to pay specialist data providers for useful and relevant datasets. So, as Google begins climbing that value chain, they will face sharply higher costs. I don’t know how that plays against Manu’s point; I think it really depends on whether Google can amass an equally compelling offering (lower quality, greater spread most probably) for a generalist audience.

  8. I worked on a large scale semantic web project which crawled the web, and made a database (actually a knowledge base, rdf, etc) out of what it found. I interviewed for a position at Google in 2002, and asked one of their very senior engineers this question. I said, “I have a lot of trouble getting more than really shallow, low quality data from companies for my semantic search application. What will compel them to allow Google to crawl their data?”

    The response I got was exactly what you describe. They said, “we’re google, of course everyone will upload their data directly to us!” When I pointed out that many companies stake their livelihood on the gathering of this data, using specialized knowledge and experience, they were dismissive and were unwilling to even consider that they may have to enter into business relationships with those companies. I found it very shocking.

  9. Google appears to be learning from its mistakes so I think it will recover from this one.

    Recently it changed its policy of suspending campaigns that were not getting enough click-through. Before the change they felt that campaigns without click-through were not of ‘high-enough quality’ (my interpretation) to be part of their sponsored results.

    I heard that now they just charge more for campaigns with low click-through like all good Capitalist outfits. I might even try AdWords again.


  10. Why does the data hand-off have to be all or nothing? Why can’t the company provide enough low-value data to make the locked content “findable” to anyone searching on Google? To turn that into revenue, all they would have to do is make sure each record contained a link back to the high-value data that required a sign-up to view. Plus, I have to imagine that their paying customers would love to see references to their for-pay database alongside other Google search results. One stop shopping, so to speak. I see Gbase as a great way to drive traffic to sites, not deprive sites of traffic. It just requires a link field.

  11. John –

    this was really the question i was trying to ask Sergey at Web 2.0 — namely, it would be helpful to Google and the rest of the world for them to more clearly define the platform they are building, and where they want others to build on top of it.

    if they continue to be so secretive, many people will fear partnering with them / working with them in the same ways they used to fear working with Microsoft… and no one wants to get partnered to death.

    on the other hand, if Google decided they wanted to build the world’s largest open access hosted database infrastructure, with the promise that everyone (both inside & outside the company) would have equal access to write applications on that platform, i think they’d have solid story to tell the world, and to all the entrepreneurs and developers out there… not to mention a story that would likely make Microsoft very, very nervous.

    while there may be a few walled gardens that remain walled, in the future those folks will become more & more isolated (unless they have some special access to data, and in general that’s not the case). AOL is moving away from that model, and the growth of the structured web will continue to pressure even players like Google to emulate more transparency.

    – dave mcclure

  12. An interesting (and glossed-over?) question, from my vantage point, is where the increasingly rich Google (via Base) goes in terms of user experience. While “entry search” is a user experience that approaches universality, what a person does with the information (the next page applications) gets more valuable with precision and situational context.

    IMHO, Google is well poised to continue to control entry search, but consumers do not intuit Google to be their application/destination. Consumers go to Google to be shown where to go, and the space is wide open for the growth of the next layer of web and mobile applications which incorporate deep data, contextual relevance and tailored user experiences.

    “One stop shopping” rarely defines the way markets evolve.

  13. Well, given the Google Payments “rumors” one would expect Google to embrace paid content and even facilitate the charging for the content in exchange for a revenue share. Has the octopus gotten so big that the strategic cross-project implications are falling between the cracks?
