Google Library: Talk About a Long Tail... - John Battelle's Search Blog

Google Library: Talk About a Long Tail…

By - December 13, 2004

The NYT now reports on Google’s program to digitize some of the world’s most important libraries, and it is truly an amazing project. Google was founded at Stanford in partial association with that university’s digital library effort, so this must be a pretty proud day for Stanford, which is a participant, as well as the original Googlers. John Markoff spoke to Larry Page:

Mr. Page said yesterday that the project traced to the roots of Google, which he and Mr. Brin founded in 1998 after taking a leave from a graduate computer science program at Stanford where they worked on a “digital libraries” project. “What we first discussed at Stanford is now becoming practical,” Mr. Page said.

The details: Google is working with Stanford, the University of Michigan, Harvard, Oxford, and the New York Public Library to make millions of books available in its index. For now the project is in pilot phase, but there are hopes and expectations this will go big in the next few years. A source told me the project was originally named Google Library, but for now it will exist under the Google Print moniker. An example of Google Print is here. The screenshot at left is what I was provided by Google for today’s launch.

The implications here are significant. First, the idea that the world’s knowledge, as held through books and libraries, is opening up to all via a web browser cannot be overstated. It’s one thing to have an original copy of The Origin of Species on the shelves, where students and interested parties have to travel to find it. It’s another to have it available to everyone via a search index and your web browser. Second, this move clearly puts Google in the category of innovator when it comes to adding information to its index. But it also raises significant business model questions, ones that are both exciting and unanswered. I brought them up in an earlier post:

A very interesting case will be Google Print. As that program expands, and it’s rumored that it will, dramatically, a number of questions arise. How will Google monetize out-of-copyright books? If it indeed does bring tens of thousands of out-of-print books onto the web and into its index, will it allow others to access and index that new treasure trove, or will it act more like a traditional media company, which would “own” that resource for itself? How will it choose what it brings into the index – those that might sell? Those that somehow are the most “in demand” by some measurable standard? With regard to books that are in print, will it limit itself to being solely an organizational tool supported by AdWords, or will it start to take a vig for books that are sold via the Google Print service (in fact, maybe it does already and I’m simply unaware of it – any publishers out there, let me know!)? And will the print model scale to television and movies or music?

Google Print already monetizes a selection of in-copyright books via advertising, and shares some of those revenues with the publishers. But it’s a very short distance between that and, say, an affiliate link to Amazon or any other bookseller for a cut of an in-copyright sale. It’s also a very short route to the on-demand publishing of an out-of-print and out-of-copyright book with a company that is set up to do such a deal, and I am aware of at least one that is about to launch that will provide just such a service. Of course, if you want an ebook, that can be arranged as well. For out-of-copyright books, the tail is extraordinarily long, and quite possibly very, very profitable. In other words, this could well be a step toward diversifying Google’s revenue streams away from advertising and into direct sales and/or subscriptions – i.e., the content business. As one source who is familiar with the industry tells me, Google is not doing this only out of the kindness of its heart – there is a lot of money to be made in selling books, in particular books with no copyright.

I did ask Adam Smith, a manager of the Print program at Google, how Google will decide which books get scanned first. He said quite forthrightly that he did not have a good answer for me on that yet. I’ve heard from others that for now it’s pretty random, but the question is important. As to whether Google will allow anyone else to index the books it scans, I am pretty sure the answer is no. After all, Amazon is also scanning books, and I am sure they aren’t letting others in on their hard work. I’ll repost if that turns out to be inaccurate. And of course there are other efforts, including Project Gutenberg and the Internet Archive. But now, we have a commercial giant that has both a mission-based reason (organize the world’s information and make it accessible) and a commercially viable reason to bring this information to the world. As David Hayes, a copyright lawyer at Fenwick who worked on this deal and whom I’ve known from my own work with his firm, put it: “This will create a revolutionary new information location tool that should be a benefit to the whole world.” I for one applaud the effort – it’s an example of enlightened capitalism, and I hope it thrives.

More here and here.

Update: I originally posted the wrong image, new image to come.


12 thoughts on “Google Library: Talk About a Long Tail…”

  1. Stephen says:

    [snip]It’s also a very short route to the on demand publishing of an out of print and out of copyright book with a company that is set up to do such a deal, and I am aware of at least one that is about to launch that will provide just such a service.[/snip]

    You mean, a company besides I’m curious! Who? Who?

  2. While the short-term focus on what’s happening with this Google announcement may be on the “legacy” books sitting in the stacks and basements in the libraries, it’s important to consider what the world may look like, going forward. Virtually every book now being written is being written digitally; what will “publishing” look like five years from now? Will one deliver a complete manuscript to the publisher, which will then “MIRV” it into submissions to the printer, to Amazon for “inside the book” search, and to Google? What *more* could be sent to either of the latter, given that they live with fewer format constraints than print?

    There was a fascinating development some ten+ years ago now, in the Intelligence Community, when NSA collection analysts started putting their names and phone numbers on intelligence cables, and we analyst/readers got a “direct line” back to the source… classification “sources and methods” constraints might have meant that we couldn’t be told any more than was contained in the cable, but we could provide direct feedback and guidance to collectors. I could see either Google or Amazon becoming services to enhance writer/reader communication, on beyond the obvious utility they provide to Search.

  3. How long will the book survive? Doesn’t searching within the book for precise passages tear apart the structure of the book? Good readers painstakingly pore over the whole text for that gem. Now they only need to write a few good searches. The book is demystified – ripped apart. Reading will be increasingly selected by the readers’ previously conceived questions.

    The same forces against the structure of the book (as a whole) oppose the print industry. I don’t need a printer to send me the 20-page passage. I can handle that in 30 seconds. Readers will demand this incremental delivery. Publishers will deliver. Writers will groan.

    This isn’t a religious comment. It’s just an observation.

  4. John Beach says:

    I think what Google are undertaking in this program is truly fantastic! To have all that information available through Google will really bring information that would be otherwise unobtainable to the masses.

    I have always been frustrated that information held by institutions and certain libraries was only ever available to us by “invitation” or as in some cases where only one copy exists, by traveling halfway around the globe.

    I for one hope this sort of venture catches on!

  5. David Brake says:

    I have read disquieting rumours that the search results, when you get them, will be text as image and not as plain text – that’s the way it is presented in demos, apparently. If so, it would be terrible. I want to be able to copy and paste the text or download it into my Palm once I find it. It is public domain after all… Like John B, I was a little worried Google would ensure you could only access the text via Google, but it appears that (in the case of the U of Mich) they are going to make the results of their digitisation available directly to the institution as well, which means it should be available via several interfaces and several search engines. Which makes me wonder why they would spend $$$ doing this? It’s an awfully expensive bit of good PR… I suppose it may still be *easier* to get at it via Google than an alternative search engine because they integrate things better and include more metadata…

  6. I was thinking on this whole Google/library thing during a long trip back to Michigan, and it occurred to me that having search entities like Google or its competitors arrange for book searching probably stunts the development of open standards and architectures for “blinded” searching (e.g., allowing one to search against a corpus derived from copyrighted works, and receiving pointers/clips sufficient to lead you on to purchase, or otherwise seek information from them). Each of the major search powers will likely create its own proprietary universe of searchability, where what might be better (e.g., more open to allow for other tools, competition, etc.) would be standards for any publisher to build toward.

  7. Greg says:

    While it may not be legal for people to scan and make available as text, books that are written more recently, it certainly is easy for them to do so. If it becomes routine for people to search books older than a certain date they will probably begin to wonder why they can’t search through books written slightly more recently which are not really recent or contemporary at all. Then we will see large numbers of people posting and downloading books with p2p networks as we are seeing now with music and movies.

    The 70+ year copyright laws are just going to have to go. They are dinosaurs from the period of the tyranny of geography, the scarcity of shelf space and the creation of the special interests themselves.

    I do not argue that there should be no copyrights, only that their duration should be short, as they were originally.

  8. Jim says:

    I think this is a great idea. As I understand it, on copyrighted materials Google will only retain excerpts. It seems to me that, beyond this, some kind of “pay per view” system could be worked out.
    I appreciate the issues of ownership and compensation, and these need to be honored.
    For myself, time always being of the essence, in Bloomington IN paying some outrageous parking fee to march 6 blocks in the hail and snow and spend 3 hours to find out that a book I MAY want is at some other library is a “No Way”. That stuff is for 100 years ago. Getting a fair chance at seeing that it may be what I want, and paying some nominal fee to see more, would be well worth the price. Would it be worth a couple of bucks? You Betcha!
    I think there’s an answer in here, somewhere, and I think everyone would benefit. This is a chance for libraries, which are such an incredibly valuable resource, to make the step into the next era.

  9. I’ve just been hanging out not getting anything done. What can I say? I’ve basically been doing nothing worth mentioning, but pfft. Not that it matters. Pretty much nothing exciting happening to speak of. I haven’t been up to much these days.

  10. I just don’t have anything to say. Not that it matters. Eh. I’ve just been staying at home doing nothing, but I don’t care. That’s how it is.

  11. I haven’t gotten anything done today. I feel like a fog, but what can I say? I’ve just been letting everything wash over me lately, not that it matters. Shrug.

  12. Polin Armsley says:

    begin to wonder why they can’t search through books written slightly more recently which are not really recent or contemporary at all. Then we will see large numbers of people posting and downloading books with p2p networks as we are seeing now with music and movies.