Early in my ponderings around Google Book Search and the library program, I wondered:
First, who is making the money? Second, who owns the rights to leverage this new innovation – the public, the publisher, or … Google? Will Google make the books it scans available for all comers to crawl and index? Certainly the answer seems to be no. Google is doing this so as to make its own index superior, and to gain competitive advantage over others.
Well, the early results are in, and as Tim O’Reilly (a major publisher and a partner of mine) puts it, “Book Search Should Work Like Web Search.” But it doesn’t.
…maybe eventually, Google, and Microsoft, and Amazon, and the Open Content Alliance (OCA), and everyone else scanning books will come to parity, with all books included in all search engines, just as all web search engines with independent spiders converge on a roughly complete search index for the web. But scanning books is slower and more costly than spidering web pages, and in the meantime (and likely for a long time to come), the situation outlined above is likely to prevail.
In other words, book search is broken. The other piece to consider has to do with how book content is ranked (or not). From an old Sblog post:
But all this new Print material, well, it’s never been on the web before. It’s Google who is actively bringing it to us. How, therefore, does Google rank it, make it visible, surface it, and, importantly, monetize it? If a philanthropist were to drop the entire contents of the Library of Congress onto the web, Google would ultimately index it, and as folks linked to the content, that content would rise and fall as a natural extension of everything else on the web. But in this case, Google itself is adding content to the web, and is itself surfacing the content based on keywords we enter. This is a new role – one of active creator, rather than passive indexer.