John Battelle's Search Blog

How Search Drove Generative AI – A Passage from “The Search”

It’s been fun to go back to Berkeley, where I first taught Journalism more than 20 years ago. I’m leading a seminar on how technology impacts journalism, with a particular focus on AI. The class asks students to read a bit of history – it’s hard to understand where we are if we don’t know how we got here. Search is a big part of that history, so I included a chapter of my first book – The Search – as a reading assignment.

As I prepared for class last week, I dug through my archives and unearthed The Search's original manuscript. In the first chapter, "The Database of Intentions," I opine on how search might lead to the development of AI that passes the Turing Test. Written 22 years ago, the passage anticipates the rise of generative AI. I start by drawing a distinction between data that is on our personal machines and data held in the cloud by large technology companies like Google. Then I think out loud a bit about where all that data might take us. Even though the writing is two decades old, it prompts some interesting questions about the moment in which we find ourselves.

When our data is on our desktop, we assume that it is ours. It's my address book that lives in Entourage, my email attachments, and my hard drive inside my PowerBook. When I am looking for a file or a particular email message in my local files (when I am searching my local disk), I presume that my mouse-and-click actions – those of searching, finding, and manipulating data – are not being watched, recorded, and analyzed by a third party for any reason, be it benign or malicious. (In certain workplaces, this is certainly no longer the case, but we'll set that aside for now.)

But when the locus of computing moves to the web, as it clearly is for second-generation applications like social networking, search, e-commerce, and the like, the law is far fuzzier. What of the data that is stored and created through interactions with those applications? Who owns that data? What rights to it do we have? The truth is, at this point, we just don't know.

As we move our data to the servers at amazon.com, hotmail.com, yahoo.com, and gmail.com, we are making an implicit bargain, one that the public at large is either entirely content with, or, more likely, one that most have not taken much to heart.

That bargain is this: we trust you not to do evil things with our information. We trust that you will keep it secure, free from unlawful government or private search and seizure, and under our control at all times. We understand that you might use our data in aggregate to provide us with better and more useful services, but we trust that you will not identify us personally through our data, nor use our personal data in a manner that would violate our own sense of privacy and freedom.

That's a pretty large helping of trust we're asking companies to ladle onto their corporate plates. And I'm not sure either we or they entirely know what to do with the implicit and explicit implications of such a transfer. Just thinking about those implications makes a reasonable person's head hurt.

But imagine the disorientation you might feel if search were to become self-aware – capable of watching you as you interact with it.

Search As Artificial Intelligence?

“I would like to see the search engines become like the computers in Star Trek,” Google employee number one, Craig Silverstein, often quips. “You talk to them and they understand what you’re asking.”

Silverstein, a soft-spoken paragon of Google's geek culture, is hardly kidding. The idea that search will one day morph into a humanlike form pervades nearly all discussion of the application's future. Asked at a conference how he'd best describe his search service, Ask Jeeves executive Paul Gardi replied: "(The android character) Data from Star Trek. We know everything you might need."

But how might we get there? For search to cross into intelligence, it must understand a request – the way you, as a reader, understand this sentence (one hopes). “My problem is not finding something,” said Danny Hillis, a MacArthur-certified genius and computer scientist who now runs a consulting business. “My problem is understanding something.” That, he continued, can only happen if search engines understand what a person is really looking for, and then guide him or her toward understanding that thing, much as experts do when mentoring a student. Search, he continued, “is an obvious place for intelligence to happen, and it is starting to happen.”

So Hillis argues that the future of search will be more about understanding than about simply finding. But can a machine ever understand what you are looking for? Answering that question raises what is perhaps computing's holiest of grails: passing the Turing test.

The Turing test, laid out by British mathematician Alan Turing in a seminal 1950 article, offers a model for judging whether or not a machine can be considered intelligent. While the test and its precepts are subject to intense academic debate, the general idea is this: an interrogator is blindly connected to two entities, one a machine and the other a person, with no idea which is which. The interrogator's task is to determine, by questioning both, which is the human and which is the machine. If a machine manages to "fool" the questioner into believing it is human, it has passed the Turing test and can be considered intelligent.
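A minimal sketch of that protocol, in Python, may make the structure concrete. The `machine_reply`, `human_reply`, and `judge` pieces here are hypothetical stand-ins of my own invention; the point is only the blind, question-driven setup:

```python
import random

# Hypothetical stand-ins -- a real contestant program and a real person
# would fill these roles; they exist here only to make the sketch runnable.
def machine_reply(question: str) -> str:
    return "I'd rather not say."

def human_reply(question: str) -> str:
    return input(f"(human) {question} ")

def imitation_game(questions, judge):
    """Run one round of Turing's imitation game.

    The judge sees two anonymous respondents, A and B -- one human,
    one machine -- and must guess which is the machine from the
    transcript alone. Returns True if the machine fooled the judge.
    """
    # Assign the labels at random so the judge can't rely on position.
    a, b = machine_reply, human_reply
    if random.random() < 0.5:
        a, b = b, a
    respondents = {"A": a, "B": b}

    transcript = [(label, q, reply(q))
                  for q in questions
                  for label, reply in respondents.items()]

    guess = judge(transcript)  # judge returns "A" or "B" as its machine pick
    truth = "A" if respondents["A"] is machine_reply else "B"
    return guess != truth
```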

Turing predicted that by the year 2000 computers would be smart enough to have a serious go at passing the Turing test. He was right about the “serious go” part, but so far, the prize has eluded the best and brightest in the field. In 1990 a wealthy oddball, Hugh Loebner, offered $100,000 to the first computer to pass the test. Every year, AI companies line up to win the honor. Every year, the money remains uncollected.

That may well be because, as with so many things, people are framing the problem in the wrong way. So far, contestants have focused on building singular "robots" that have millions of potential answer sequences coded in, so that for any particular question a plausible answer might be given. Perhaps the most famous of these efforts is CYC (pronounced "psych"), the life's work of AI pioneer Doug Lenat. CYC attempts to conquer AI's "brittleness problem" by coding in hundreds of thousands of "common sense" rules – mountains go up, then down; valleys lie between hills or mountains; and so on – and then building a robust model based on those simple rules. Not surprisingly, a CYC alumna, Srinija Srinivasan, was one of Yahoo's first employees, and has run Yahoo's directory-based search product from nearly day one.
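To give a flavor of that approach, here is a toy sketch – my own construction, not Cyc's actual representation – of hand-coded facts and if-then rules being chained into new conclusions:

```python
# Toy common-sense knowledge base: hand-coded facts plus if-then rules.
# Facts are (subject, relation, object) triples; this illustrates the
# general rule-coding idea, not Cyc's actual knowledge format.
facts = {
    ("mountain", "is-a", "landform"),
    ("valley", "is-a", "landform"),
    ("valley", "lies-between", "mountains"),
}

# Each rule pairs a set of premises with a conclusion to add.
rules = [
    ({("valley", "lies-between", "mountains")},
     ("valley", "adjacent-to", "mountain")),
    ({("valley", "adjacent-to", "mountain"),
      ("mountain", "is-a", "landform")},
     ("valley", "adjacent-to", "landform")),
]

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts appear (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules))
```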

But brute force by one organization has failed so far, and will most likely fail in the future. No, search will more likely become intelligent via the clever application of algorithms that harness and leverage the intelligence already extant on the web – the millions and millions of daily transactions, utterances, behaviors, and links that form the web's foundation – the Database of Intentions. After all, that's how Google got its start, and if any company can claim to have created an "intelligent" search engine, it's Google.
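The canonical example is PageRank, the link-analysis algorithm behind Google's early results. Here's a compact power-iteration sketch on a toy link graph – an illustration of the idea, not Google's production algorithm:

```python
# Compact power-iteration PageRank on a toy link graph. Each page's
# score flows to the pages it links to; the damping factor models a
# surfer who occasionally jumps to a random page instead of a link.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its score evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Page "c" is linked to by both "a" and "b", so it ends up ranked highest.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_web))
```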

“The goal of Google and other search companies is to provide people with information and make it useful to them,” Silverstein told me. “The open question is whether human-level understanding is necessary to fulfill that goal. I would argue that it is.”

What does the world want? Build a company that answers that question in all its shades of meaning, and you’ve unlocked the most intractable riddle of marketing, business, and arguably of human culture itself. And for the past few years, Google seems to have built just that company.

You can follow whatever I’m doing next by signing up for my site newsletter here. Thanks for reading.
