8 thoughts on “Open and Shut”

  1. I am a bit ignorant here. But could somebody more knowledgeable enlighten me as to whether YouTube freely allows crawlers other than Google’s to index its videos? And what about keeping a copy in a search cache?

  2. There’s probably no deeper example of Google being closed than when it comes to book search… To be fair, Microsoft has also added similar restrictions. But if Google’s on an “open” kick, why not join the Open Content Alliance?

    I agree, Danny. I made this same point five days ago 🙂


    And, by the way, I think it is entirely reasonable that you are trying to be completely fair by pointing out similar Microsoft policies. But Google has a mission statement and a code of conduct in which they aim to be “a different kind of company”, one that does not operate in the usual manner.

    IMHO, a “usual” company will point to its competitor and say “well, they are doing it, so why can’t we?”, whereas a “different kind of company” says to itself, “we’re going to do things this way, just because it is the correct, open, and transparent thing to do, no matter what anyone else is doing”.

    So I agree; if the Google push is toward openness in social networks and in mobile phones, the consistent thing to do would be to join the Open Content Alliance.

  3. Sorry for the burst of comments recently, but I was reminded of one more item relevant to this topic.

    In summary, Google is open about mobile platforms and social networks, but closed about book scanning (even when open options exist) and closed about their index.

    I just wanted to point out one more critical area in which they remain closed: evaluation. I am talking about creating an open alliance, say among Google, Yahoo, MS, and Ask, pooling together thousands of anonymous user queries, pooling the top 10 results returned by each of these search engines, and then getting an independent set of third-party evaluators to assess whether each link is or is not relevant to the user’s query. The evaluators would not know which search engine each result came from; in fact, two search engines could easily return the same result.

    But by pooling the queries and the relevance assessments, you could then do an objective (or at least open, transparent, and standardized) evaluation of the relative quality of each engine by looking at standard metrics such as precision at 1 returned link, precision at 5 returned links, precision at 10, recall, and so on.

    It would also enable users to get a better sense of the types of queries on which each search engine performs relatively better. Does Yahoo work better on geographical queries? Does Google work better on scientific queries? I don’t know. But having an Open Evaluation Alliance would give users a better sense of this.

    I think I’ve said this on this blog before, but I have been at conferences where folks from other search engines have asked Google to open up Evaluation in this manner. And the requests have been refused.

    So even if Google never opens up its index, it should at least be willing to open up its results pages. Not to let people scrape the results or try to reverse engineer the algorithms, but to let a standards body evaluate the results and give some side-by-side performance comparisons between the various search engines.

  4. Is there a difference between platforms and products? Google has been pretty strong on open platforms: they’ve been widening the number of browsers their web apps work on, they’ve been releasing more client stuff for Macs and Linux boxes as well as Windows, sponsoring open source projects with the Summer of Code, pushing for network neutrality, etc. These are all platform things, not product things.

    I’m not sure the analogy to their indexes quite works, though. Companies of all sorts support open standards in their industries without giving away the products that conform to those standards. Auto companies may agree on fuel types and tire sizes without giving away cars…

  5. Wait, wait.. let me understand you correctly here, Hiroko. It sounds like you are saying that Google’s index is “copyrighted” intellectual property that they wish to retain control over. They do not want to allow just anyone to be able to come in and make a full copy of their index, right?

    But what if my argument is: “OK, I’m going to make a full copy of Google’s index, and I am going to build my own search engine on top of it. But I am only going to serve up a few links (let’s say ten at a time) out of the eight billion in the index to any user who comes along and types a query.”

    Basically, I am only going to show snippets of the index.

    So that would be totally fair use, right?! So Google should have no problem at all with anyone copying their index, because as long as they don’t distribute the entire index any further, and only show snippets of it, everything is good, right?

    It sounds like this is the direction your argument is heading.

  6. The index itself isn’t published, so copyright doesn’t apply. Now, if they published the entire index, yes indeed that would be fair use–just like Wikipedia showing up in search results.

    As an analogy, consider Westlaw: all of the court decisions they index are public records–but the index itself (which they compiled at their own expense) has its own value as a compilation. I remember the web before Google–and I’m very comfortable saying that similarly, Google’s index and ranking provide additional value on top of the sites being indexed.

  7. I should be more explicit: I think that since the index is not published, it falls under trade secret law, not copyright. But I am not an intellectual property lawyer.

  8. Hiroko, yes, yes, you are correct. It’s not published, so what I was saying above isn’t completely accurate. I left out a logical step in my argument; let me fill it in now.

    To review, seven days ago I proposed that Google join the Open Content Alliance, because not doing so would be inconsistent with their other recent open proposals.

    Your response was that platforms and products are different: that OpenSocial is a platform and the BookScan index is a product, and that if they opened up the index, it would be like giving away property, the same way car manufacturers would be giving away cars.

    So my response was to the hypothetical, “what if” they did indeed open up their index. Ignoring for a moment the fact that Google would now also gain access to the indices of all the other people doing book scanning (which would have reciprocal value to Google), I was trying to imply that an open index would NOT actually be like a car manufacturer giving away cars.

    Because when a book publisher publishes a book and then lets the library retain an open copy of that book, they are NOT giving away that book. Even though anyone is free to come read it, borrow it, use it, etc., the book, or the intellectual property contained within the book, still belongs to the author/publisher.

    Google seems to agree with that. When Google goes into the library to scan a book that they have not even purchased, they argue that they are NOT stealing anybody’s intellectual property. Even though they keep a full copy of the book in their system, they only serve snippets of the book to outside users.

    So yes, Google has not actually joined the Open Content Alliance, has not actually “published” their book scan index, and so of course no one can copy it. But, by analogy, IF Google joined the Open Content Alliance, it would NOT be giving away their product, like car manufacturers giving away cars. Google would still retain intellectual ownership of their index. And people who copied that index would, as per Google’s arguments, only be making a fair-use copy, since no one would be republishing that index in full, only snippets (10 links out of 8 billion at a time: very small amounts!).

    Therefore, Google should have no qualms with joining the Open Content Alliance. That is all that I am saying.
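The blind, pooled evaluation proposed in comment 3 rests on standard metrics such as precision at k and recall. Below is a minimal sketch of those metrics in Python, assuming that each engine’s ranked results for one query are a list of URLs and that the third-party assessors’ judgments are a set of pooled URLs marked relevant. The engine names, URLs, and function names are hypothetical; this illustrates the metrics, not any search engine’s actual evaluation method.

```python
from typing import Dict, List, Set


def pool_results(runs: Dict[str, List[str]], depth: int = 10) -> Set[str]:
    """Union of every engine's top-`depth` links for one query.

    This pooled set is what the assessors judge, without being told
    which engine (or engines) returned each link.
    """
    pool: Set[str] = set()
    for ranked in runs.values():
        pool.update(ranked[:depth])
    return pool


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k links judged relevant.

    If an engine returned fewer than k links, the missing slots count
    as non-relevant, since we always divide by k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for url in ranked[:k] if url in relevant)
    return hits / k


def pooled_recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of all pooled relevant links that this engine returned."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


# Hypothetical top-5 results for a single query from two anonymized engines.
runs = {
    "engine_a": ["u1", "u2", "u3", "u4", "u5"],
    "engine_b": ["u3", "u6", "u1", "u7", "u8"],
}
pool = pool_results(runs, depth=5)

# Hypothetical blind judgments: the pooled links assessors marked relevant.
judged_relevant = {"u1", "u3", "u6"}
assert judged_relevant <= pool  # judgments only cover pooled links

for engine, ranked in runs.items():
    print(
        engine,
        precision_at_k(ranked, judged_relevant, 1),
        precision_at_k(ranked, judged_relevant, 5),
        round(pooled_recall(ranked, judged_relevant), 3),
    )
# engine_a 1.0 0.4 0.667
# engine_b 1.0 0.6 1.0
```

Because recall here is measured against the pooled relevant set (only links that at least one engine returned), it is relative to the engines being compared rather than an absolute figure. Averaging these per-query numbers over thousands of pooled queries is what would produce the side-by-side comparison the comment asks for.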
