Some folks have been calling me, and together we’ve been pondering the implications of the Google Print announcement. And one drop-dead obvious thing dawned on me during the conversations.
This is so obvious as to be almost embarrassing to restate, but this program marks a major departure in Google’s overall approach to search. After all, what has been the presumptive model till now? If it’s on the web and publicly available, it’s in the index. That’s why we called it web search, after all. But Gary Price and Chris Sherman, among many others, have reminded us how vast and darkly lit the invisible web is – all that information trapped in the amber of password-protected databases, or crumbling film libraries, or … books.
Now other companies have taken significant steps toward illuminating these dark corners of the world’s knowledge web – Yahoo with its CAP program, Amazon with A9 and Search Inside the Book. And Google has long claimed that its mission was to go beyond the web and crawl the world’s information, wherever it lay.
But Google was, until now, the world’s purest web search engine. What, I wonder, are the implications of tens of millions of book pages entering this once pure space? (Google has announced that the results will be included in the index, not separated out in a vertical book search engine.)
Why am I on about this? Well, it comes down to the essence of what – so far – has made Google Google: the ranking paradigm. Here’s a sketch from the book I am working on:
In essence, academic publishing is a flawed but useful system of peer review incorporating ranking, citation, and annotation as core concepts. Fair enough. So what?
Well, in short, it was Tim Berners-Lee’s attempt to address the drawbacks of this system (through network technology and hypertext) that led to his creation of the World Wide Web (4), and it was Larry Page and Sergey Brin’s attempts to make Berners-Lee’s World Wide Web better that led to Google.
Which brings us back to Page, and his original research work focusing on backlinks. He reasoned that the entire web was loosely based on the premise of citation and annotation – after all, what was a link but a citation, and what was the text describing that link but annotation?
The point I’m making is this: Google was born of, by, and in the web, as an extremely clever algorithm which noticed the relationships between links, and exploited those relationships to create a ranking system which brought order and relevance to the web. Google’s job was not to build the web; its job was to organize it and make it accessible to us.
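For readers who want the link-as-citation idea made concrete, here is a minimal sketch of the textbook form of PageRank, computed by simple repeated iteration. It is an illustration only, not Google's production system: the four page names and the link graph are hypothetical, and the damping factor of 0.85 and the iteration count are assumed defaults.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated iteration (a simplified sketch).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link acts like a citation: the page splits its score
                # evenly among the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web. "book_page" links out, but nothing links to it.
links = {
    "blog": ["library", "news"],
    "news": ["library"],
    "library": ["blog"],
    "book_page": ["library"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.4f}")
```

In this toy graph the most-cited page ("library") ends up on top, while the page nobody links to ("book_page") never rises above the baseline score that every page receives.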
But all this new Print material, well, it’s never been on the web before. It’s Google who is actively bringing it to us. How, therefore, does Google rank it, make it visible, surface it, and, importantly, monetize it? If a philanthropist were to drop the entire contents of the Library of Congress onto the web, Google would ultimately index it, and as folks linked to the content, that content would rise and fall as a natural extension of everything else on the web. But in this case, Google itself is adding content to the web, and is itself surfacing the content based on keywords we enter. This is a new role – one of active creator, rather than passive indexer.
This means, in short, that Google is making editorial decisions about how to surface this new content, decisions it can’t claim are based on the founding principle of its mission – PageRank. Sure, there are straightforward keyword matching techniques, and over time the web will deep link those book pages – each page in Print has a unique URL. But really, the magic of what made Google Google – the existing link structure of the web – is entirely non-existent with these newly surfaced print pages. By extension, the same will be true for any new media brought into the index – be it movies, music, radio, television, photos, you name it. That’s why I’m so interested in what role Google will play in monetizing this content (see here and here) and why I am so fascinated with this media v. technology angle.
I guess the net net of all this is that this move by Google, which I think is monumental, marks a shift in who the company is in the world. It’s no longer simply an indexer of the world’s knowledge web. Google Print is a clear declaration that it’s a builder of it as well.