John Battelle's Search Blog
Thoughts on the intersection of tech, business, and society.

On AI: What Should We Regulate?

I’ve been following the story of generative AI a bit too obsessively over the past nine months, and while the story’s cooled a bit, I don’t think it’s any less important. If you’re like me, you’ll want to check out MIT Tech Review’s interview with Mustafa Suleyman, founder and CEO of Inflection AI (makers of the Pi chatbot). (Suleyman previously co-founded DeepMind, which Google purchased for life-changing money back in 2014.)

Inflection is among a platoon of companies chasing the consumer AI pot of gold known as conversational agents – services like ChatGPT, Google’s Bard, Microsoft’s BingChat, Anthropic’s Claude, and so on. Tens of billions of dollars have been poured into these upstarts in the past 18 months, and while it’s been less than a year since ChatGPT launched, the mania over genAI’s potential impact has yet to abate. The conversation seems to have moved from “this is going to change everything” to “how should we regulate it” in record time, but what I’ve found frustrating is how little attention has been paid to the fundamental, if perhaps a bit less exciting, question of what form these generative AI agents might take in our lives. Who will they work for: their corporate owners, or …us? Who controls the data they interact with: the consumer or, as has been the case over the past 20 years, the corporate entity?

The Sites That Never Get Built: Why Today’s Internet Discourages Experimentation

The Dude knows the pitfalls of scattering a loved one’s ashes…

Every so often I get an idea for a new website or service. I imagine you do as well. Thinking about new ideas is exciting – all that promise and potential. Some of my favorite conversations open with “Wouldn’t it be cool if….”

Most of my ideas start as digital services that take advantage of the internet’s ubiquity. It’s rare I imagine something bounded in real space – a new restaurant or a retail store. I’m an internet guy, and even after decades of enshittification, I still think the internet is less than one percent developed. But a recent thought experiment made me question that assumption. As I worked through my latest “wouldn’t it be cool” moment, I realized just how moribund the internet ecosystem has become, and how deadening it is to spontaneous experimentation.

Google Will Become the World’s Largest Subscription Service. Discuss.

Those of you who’ve been reading for a while may have noticed a break in my regular posts – it’s August, and that means vacation. I’ll be back at it after Labor Day, but an interesting story from The Information today is worth a brief note.

Titled “How Google is Planning to Beat OpenAI,” the piece details the progress of Google’s Gemini project, formed four months ago when the company merged its UK-based DeepMind unit with its Google Brain research group. Both groups were working on sophisticated AI projects, including LLMs, but with distinct cultures, leadership, and code bases, they had little else in common. Alphabet CEO Sundar Pichai combined the two groups to speed his company’s time to market in the face of stiff competition from OpenAI and Microsoft.

Digital Is Killing Serendipity

The buildings are the same, but the information landscape has changed, dramatically.

Today I’m going to write about the college course booklet, an artifact of another time. I hope along the way we might learn something about digital technology, information design, and why we keep getting in our own way when it comes to applying the lessons of the past to the possibilities of the future. But to do that, we have to start with a story.

Forty years ago this summer I was a rising freshman at UC Berkeley. Like most 17- or 18-year-olds in the pre-digital era, I wasn’t particularly focused on my academic career, and I wasn’t much of a planner either. As befit the era, my parents, while Berkeley alums, were not the type to hover – it wasn’t their job to ensure I read through the registration materials the university had sent in the mail; that was my job. Those materials included a several-hundred-page university catalog laying out majors, required courses, and descriptions of nearly every class offered by each of the departments. But that was all background – what really mattered, I learned from word of mouth, was the course schedule, which was published as a roughly 100-page booklet a few weeks before classes started.

At Threads, No News Is Good News, For Now. But That’s About To Change.

Threads is a week old today, and in those short seven days, the service has lapped generative AI as the favorite tech story of the mainstream press. And why not? Threads has managed to scale past 100 million users in just five days — far faster than ChatGPT, which broke TikTok’s record just a few months ago. That’s certainly news — and news is what drives the press, after all.

Threads has re-established Meta as a hero in tech’s endless narrative of good and evil — I can’t count the number of posts I’ve seen from influential public figures joking that, thanks to Threads, they actually like Mark Zuckerberg again. And Meta can certainly relish this win — the company has been the scapegoat for the entire tech industry for the better part of a decade.

But were I an executive at Meta responsible for Threads, I’d not be sleeping that well right about now. As they well know, the relationship between the tech industry and the press can shift in an instant. Glowing stories about breaking app download records can just as quickly become hit pieces about how Meta has leveraged its monopoly position in social media to vanquish yet another market, killing free speech and “real news” along the way. So far that story has been confined to the fringes of Elon’s bitter troll army over on whatever remains of Twitter these days, but should Threads lap Twitter as the largest app focused on creating a “public square” — whatever that means — the worm will quickly turn.

Meta has a tiger by the tail here, and so far, they’ve been working hard to tamp down expectations. Both Zuckerberg and Instagram CEO Adam Mosseri have been active on Threads, posting daily with both practiced humility (“gosh this thing is succeeding well beyond our expectations,” “we’re just at the starting line,” “we know we’re over our skis”) and reminders about how Threads isn’t like Twitter. Mosseri, for example, has downplayed the role of news – Twitter’s main differentiation and its endlessly maddening Achilles heel. Zuckerberg’s first Thread defined his new service as “an open and friendly public space,” prompting Musk to fire back that he’d rather be “attacked by strangers on Twitter” than live in the “hide the pain” world of Instagram.

But The News – with all of its complications – is coming for Threads. I left Twitter more than six months ago, and while I sometimes missed feeling connected to the real-time neural net the app had become for me, I almost instantly felt better about both myself and the world. Living on Twitter means navigating an unceasing firehose of toxicity, and Musk’s interventions only worsened the poisonous atmosphere of the place. I joined Threads a half hour after it launched, and indeed, it was a giddy place, its initial users basking in the app’s surprising lack of toxicity.

Other journalists have noticed the same thing. For now, the narrative around Threads centers on its extraordinary growth, but a close second is how “nice” the place feels compared to Twitter. Meta executives would like to keep it that way — combining “what Instagram does best” with “a friendly place for public conversation,” as Zuck put it in his first post.

To that fantasy, I say good luck to you, Mr. Zuckerberg. Keeping Threads “nice” means controlling the conversation in ways that are sure to antagonize just about everyone. No company – not Facebook, not Instagram, not Reddit, and certainly not Twitter – has figured out content moderation at scale. If, as Zuckerberg claimed, the goal with Threads is to create a “town square with more than 1 billion people,” the center of that square will have to contain news. And news, I can tell you from very personal experience, is the front door to a household full of humans screaming at each other.

“Politics and hard news are inevitably going to show up on Threads,” Mosseri told the Hard Fork podcast last week, “but we’re not going to do anything to encourage those verticals.”

I’ll have more to say about that sentiment in another post, but for now, I’ll leave it at this: When Threads hits 300 million active users — roughly the size of Twitter — the love affair between the press and Threads will more than likely come to an end.

—

I’ll be talking to Meta’s head of advertising Nicola Mendelsohn at P&G Signal tomorrow. You can register here for free.

You can follow whatever I’m doing next by signing up for my site newsletter here. Thanks for reading.

Threads: We Don’t Want to “Hang Out With Everybody.” Sometimes, We Want To Leave.

Apparently the open web has finally died, and this in the very same week Meta launched Threads, which, if its first day is any indication, seems to be thriving (10 million sign-ups in its first few hours, likely 50 million by the time this publishes…).

But before Threads’ apparent success, most writers covering tech had decided that the era of free, open-to-the-public, at-scale services like Twitter, Reddit, and even Facebook/Insta was over. I’ll pick on a recent piece from The Verge: “So where are we all supposed to go now?”

The piece argues that the decline of Twitter (Elon’s killing it), Reddit (it’s killing itself), and Instagram (it’s just entertainment now!) has left “an everybody-sized hole in the internet. For all these years, we all hung out together on the internet. And now that’s just gone.”

Umm…no. And not because of Threads (I’ll get to that in a minute). We never did “hang out together on the internet.” Anyone who knows Twitter knows it’s always been a cliquey echo chamber run by public narcissists. Reddit’s always been where a relatively small group of highly disaffected kids make fun of…everyone. And Instagram? Last I checked, it was still growing – even before Threads. Besides, no one ever “hung out” on Insta, I mean, it started as a photo service, remember? Complaining that it’s become an entertainment service is equivalent to moaning that TikTok is unusable because you’re getting old. Oh wait, Verge’s cousin Vox has already done that too.

Sure, you can “hang out” on some random subreddit, or get into endless flame wars with 12 other idiots on Twitter, or join an Instagram Live with a few hundred other voyeurs, but…that’s certainly not “everyone hanging out together on the Internet.” The very idea is ridiculous. We’re not built to “hang out with everyone,” and we never will be. Many of us, me included, are built to hang out with about six people at a time. And they change depending on context.

Trend pieces noting that the web has changed aren’t annoying because they’re wrong (of course the web is changing); they’re annoying because they miss the core problem: Centralization. We’ve been living in a centralized web world for more than a decade now, one where all the data, graphs (social, commercial, etc.), and value are concentrated and managed by large corporations hell-bent on protecting their most precious resource – your attention. To make sure you keep paying attention, corporations have made it very, very difficult to do the one thing all of us want to do from time to time: We want to leave.

The problem with the past ten or so years of Internet history is that we couldn’t leave when we wanted to – at least not without severe penalty. When I left Twitter last November, for example, I instantly lost a social graph I had built over 15 years, tens of thousands of my posts, an audience of nearly 300,000, not to mention my primary real-time news and information source. I couldn’t take any of that with me as I decamped to Twitter imitators like BlueSky or Mastodon. Neither of them had the rich networks of people that Twitter once had, and they were much the poorer for it.

But what they did have was compelling: A decentralized model that promised that, if I wanted to leave again, I could bring the value I helped create anywhere I wanted to. Both BlueSky and Mastodon are built on published protocols – essentially technology specs that other developers and entrepreneurs can leverage to build competing (or complementary) services. One of the most popular of these protocols is called ActivityPub – that’s what powers Mastodon. And in one of the smartest moves I’ve seen out of Meta in ages*, Instagram’s Threads will support ActivityPub.

Threads is built on top of Instagram’s social graph, which means if you’ve created value on that network, you’ll instantly have value on Threads. I have several thousand followers on Insta, an artifact of my early use of the place (I stopped posting regularly years ago). But when I joined Threads last night, I already had thousands of latent connections from Insta, and that network resurfaced almost immediately. People with super active Insta handles saw this effect in a much stronger way – in essence, Meta has created another way to drive engagement across its network, so bully for them.

But if Meta keeps its promise to incorporate ActivityPub, that engagement and the social graphs driving it can be exported to any other service that supports the ActivityPub protocol. This means that if Threads turns into a Twitter-like hellscape in coming years, we can all take our attention, and our data, to a competing service like Mastodon. That kind of competitive threat undermines the web’s current business model of centralized, locked-in attention farming. You know, the very model upon which Facebook built an empire. Before yesterday, you couldn’t take your Instagram social graph and its related data anywhere else on the web. With Threads, once Meta keeps that promise, you will be able to. That’s progress.
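What does “supports the ActivityPub protocol” look like in practice? In the W3C ActivityPub spec, a follow relationship, the basic edge of a social graph, travels between servers as a small JSON document in the ActivityStreams 2.0 vocabulary. The sketch below builds that `Follow` activity; the two servers (`threads.example`, `mastodon.example`), the actor URLs, and the helper `make_follow` are hypothetical, and this illustrates the message format rather than how Meta or Mastodon actually implement it.

```python
import json

# Hypothetical actor URLs on two independent servers (illustrative assumptions).
FOLLOWER = "https://threads.example/users/alice"
FOLLOWED = "https://mastodon.example/users/bob"


def make_follow(actor: str, target: str, activity_id: str) -> dict:
    """Build an ActivityStreams 2.0 'Follow' activity as a plain dict.

    make_follow is a hypothetical helper, not part of any library.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": activity_id,
        "type": "Follow",
        "actor": actor,    # who is doing the following
        "object": target,  # who is being followed
    }


follow = make_follow(FOLLOWER, FOLLOWED, f"{FOLLOWER}/activities/follow-1")

# A real server would deliver this JSON by HTTP POST to the followed actor's
# inbox; here we only serialize it to show the wire format.
print(json.dumps(follow, indent=2))
```

Because any server that speaks the same vocabulary can read this message, the follow edge is, in principle, not locked inside one company's database, which is the property that makes leaving possible.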

For more than a decade I’ve been railing about how we’ll never get a truly open, highly innovative Internet until it becomes possible to build services that share data through standardized, easy-to-use protocols. I called these services “meta services” – services that thrive above the control of any one platform. In one stroke, Meta has capitalized that phrase (in every meaning of the term) and staked out the high ground – declaring itself willing to compete not on its ability to lock your data into a silo, but on its ability to provide a superior service that keeps you engaged regardless of your ability to leave. This will prove extremely valuable for public dialog – a use case that has suffered massively thanks to the terrible incentives created by the attention economy. And for that, I tip my cap to Meta. Never thought that day would come, but here it is.

*Two other smart moves from Meta recently: Open sourcing its LLM, and naming Threads based on Twitter terminology.

Come With Me on a Spin Through the Hellscape of AI-Generated News Sites

Welcome to the hellscape of “Made for Advertising” sites

This past Monday NewsGuard, a journalism rating platform that also analyzes and identifies AI-driven misinformation, announced it had identified hundreds of junk news sites powered by generative AI. The focus of NewsGuard’s release was how major brands were funding these spam sites through the indifference of programmatic advertising, but what I found interesting was how low that number was – 250 or so sites. I’d have guessed they’d find tens of thousands of these bottom feeders – but maybe I’m just too cynical about the state of news on the open web. I have a hunch my cynicism will be rewarded in due time, once the costs of AI decline and the inevitable economic incentives that have always driven hucksters kick in.

Given 250 is a manageable number for a mere mortal, I decided to ask the good folks at NewsGuard, where I’m an advisor, for a copy of their listings. Nothing like a tour through the post-apocalyptic hellscape of our AI future, right?

What I found was…disappointing. Most of the sites were beyond shoddy – barely literate, obviously automated, full of errors and content warnings, and utterly devoid of any sense of organizational structure. The most common message, upon clicking on a story link, was some variation of an OpenAI violation:

Not exactly a compelling headline. The next most common experience was this:

This, of course, is evidence that the scammers are rotating URLs to avoid blacklisting, unburdened by any concern about building audience loyalty. Beyond OpenAI warnings and 404s, there are the browser warnings that the site you’re about to visit is, well, seedy:

When you do get an actual news experience, it becomes clear that these publishers have little interest in passing as “real news sites,” i.e., publications a sane person might intend to visit. They are instead built as SEO chum in the hopes that Google’s indexes might favor them with some low-quality traffic, or worse, as destinations for bot traffic destined for arbitrage inside the darker regions of the programmatic ad universe. The editorial decisions on the various home pages I visited were, well, hilariously inchoate:

Perhaps that’s what we should expect with the first phase of this particular genre, but I found their general awfulness depressing: Most reporters will look at these sites and dismiss them. But they shouldn’t.

Traditional “made for advertising” sites already control 21 percent of all programmatic advertising revenues, and these sites tend to dominate Google search results, enshittifying the open web with low-calorie crap that, one would hope, actually good AI might help us avoid. But the relatively low volume of AI sites indicates, at least anecdotally, that so far the economics of replacing human-built content with AI-driven drivel have yet to kick in. Put simply, it’s still too expensive to replace sites like Geeky Post or Explore Reference with AI. For now.

But when costs come down, I expect made for advertising sites will pivot to AI almost overnight. And I wonder if that’s a bad thing. Once the web’s worst sites all shift to AI-driven output, perhaps they’ll find themselves in a positive spiral of competition for actual human attention. If these sites start to create reasonably high quality content, and search and social start to reward them with real traffic that converts to revenue, perhaps we can simply automate away the shitshow that the open web has become.

One can dream.

Asking The Stupid Questions of GenAI

I recently caught up with a pal who happens to be working at the center of the AI storm. This person is one of the very few folks in this industry whose point of view I explicitly trust: They’ve been working in the space for decades, and possess both a seasoned eye for product as well as the extraordinary gift of interpretation.

This gave me a chance to ask one of my biggest “stupid questions” about how we all might use chatbots. When I first grokked LLM-driven tools like ChatGPT, it struck me that one of their most valuable uses would be to focus their abilities on a bounded data set. For example, I’d love to ask a chatbot like Google Bard to ingest the entire corpus of Searchblog posts, then answer questions I might have about, say, the topics I’ve written about the most. (I’ve been writing here for 20 years, and I’ve forgotten more of it than I care to admit.) This of course only scratches the surface of what I’d want from a tool like Bard when combined with a data set like the Searchblog archives, but it’s a start.

My friend explained that my wish is not possible now, despite what Bard confidently told me when I asked it directly:

Well, no. Bard hallucinated all manner of bullshit in its answer. Yes, I write about technology, but not the Internet of Things. I guess I write about society, but mainly in the context of policy and consumer data, not “education, healthcare, and the environment.” Culture? When’s the last time you’ve seen me write about movies?! And if I ever start writing about “personal development,” please put one between my eyes.

Bard’s list of supposed articles was even funnier – it reads like an eighth-grade book report culled from poorly constructed LinkedIn clickbait. Bard is a confident simpleton, despite its claim to be able to query specific domains (in this case, battellemedia.com). I responded to Bard with this new prompt: “This is not right. That site does not cover music, movies. Nor does it do motivation, well being, productivity. Why did you answer that way?” Bard’s answer was … pretty much the same, though it did clumsily incorporate my corrections in its response:

Gah. My next prompt was an attempt to clarify where Bard was getting its answers, since it was clearly not using the battellemedia.com domain. “Are you actually referring to content on the site to do these answers?”

Bard’s answer:

Ok, then, at least we’re getting some honesty. I decided to try one last time:

Now this was quite the freshly whipped bullshit: Actual percentages of how the content on my site breaks down! Unbeknownst to me, more than one in ten of my posts are about cybersecurity – a topic I’ve rarely if ever written about here.

Ok, enough beating up on poor Bard. My well-placed friend explained that while it’s currently out of scope for a standard chatbot like Bard or ChatGPT to do what I’m asking of it, “domain-specific” queries were a hot area of development for all LLMs. So when will it happen? My friend didn’t commit to an answer on that, but I did get the sense it’s coming soon. The ability to apply LLM-level intelligence to large data sets is just too big an opportunity – in both B2C and B2B/enterprise markets.

A big reason this is taking more time than I’d like is cost. Noted AI investor Andreessen Horowitz recently posted a long explainer on the state of LLMs, but it all comes down to this money quote: “Today, even linear scaling (the best theoretical outcome) would be cost-prohibitive for many applications. A single GPT-4 query over 10,000 pages would cost hundreds of dollars at current API rates.” By my estimates, this cost would need to come down at least four orders of magnitude – from hundreds of dollars per query to pennies – to unlock the kind of magic that I’ve been dreaming about over the past few months. Not to mention all the technological machinations related to prompt handling, vector database management, orchestration frameworks, and other stuff that makes my brain hurt. But the good news, despite my rather pessimistic post from earlier this week, is that the good shit’s coming – we just need to be a bit more patient.
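The “four orders of magnitude” figure is easy to sanity-check. A minimal sketch, assuming “hundreds of dollars” means roughly $100 to $500 per query and “pennies” means one cent (both illustrative values, not numbers from the Andreessen Horowitz post); `orders_of_magnitude` is a hypothetical helper, and working in whole cents keeps the ratio exact:

```python
import math


def orders_of_magnitude(current_cents: int, target_cents: int) -> float:
    """Return how many powers of ten separate two per-query costs."""
    return math.log10(current_cents / target_cents)


# Illustrative assumptions: $100 and $500 per query today, one cent as the goal.
for dollars in (100, 500):
    gap = orders_of_magnitude(dollars * 100, 1)
    print(f"${dollars} -> $0.01: {gap:.1f} orders of magnitude")
```

With these inputs the gap prints as 4.0 and 4.7, which is why “at least four” is the floor of the estimate.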

It Takes Time, And It Ain’t For Sure.

Not since the iPhone, in the mid-aughts. No, not since the rise of the browser and the original web, in the early nineties. No, not since the introduction of the PC, in the 1980s. Ah hell, honestly, not since the Gutenberg printing press in the 15th century – or, fuck it, let’s just go there: Not since the invention of language, which as far as we know marked the moment when Homo sapiens first branched from its primate cousins.

That’s how big a deal AI is, according to academics, politicians, and a rapt technology and capital ecosystem starved for The Next Big Thing.

I tend to agree. First we created language, then we created its digital doppelganger with computer code, and with generative AI, we’re melding the two into a shimmering and molten fun house mirror, one that forces us to question our very consciousness. What the hell does it mean to be human when we’ve created machines that seem to transcend humanity?

“…the Digital Revolution is whipping through our lives like a Bengali typhoon…[bringing] social changes so profound their only parallel is probably the discovery of fire.”

Ah, fire. I forgot about fire, which likely preceded language by a good 50,000 years. Those lines introduced the very first issue of Wired magazine 30 years ago. As founders we were convinced every aspect of society would be reshaped – our culture, our economy, our social lives, our faiths, our sense of self. In those early days we were essentially a cult, a non-denominational sect stoned on a buoyant certainty that we were right – that technology offered all of us an offramp from the tired shit-show of the industrial revolution. Of course the Internet was going to rewire everything – it was obvious. If you didn’t see that coming, you just weren’t paying attention. Our job was to slap you into seeing what was right in front of our eyes: The future, coming fast, screaming into our face with possibility and promise.

And now, here we are. The starting gun has been fired once again – this time with the release of ChatGPT. After a decade of trillion-dollar platform consolidation based on surveillance capitalism and trickle-down innovation, tech once again brims with optimism, with that original possibility and promise.

If, that is, we don’t fuck it up by forcing our new tools into the structures of the past.

Yesterday Fred posted about voice input over on AVC, and it reminded me how long it takes for consumers to adopt truly new behaviors, regardless of how enthusiastic we might get about a particular technology’s potential. As Fred points out, voice input has been around for a decade or so, and yet just a fraction of us use it for much more than responding to texts or emails on our phones.

While tens of millions of us have begun to use generative AI in various ways, its “paradigm shifting” impacts are likely years away. That’s because while consumers would love to have AI genies flitting around negotiating complex tasks on our behalf, first an ecosystem of developers and entrepreneurs will have to do the painstaking work of clearing the considerable brush which clogs our current technology landscape – and it’s not even certain they’ll be able to.

Some historical context is worth considering. When the World Wide Web hit in 1993, I was convinced this new platform would change everything about, well, everything. Culture, business, government – all would be revolutionized. 1993 was the year Wired first published, and we took to the technology with abandon. We launched Hotwired, one of the first commercial websites, in 1994 – but quickly realized the limitations of the early Web. There was no way to collect payment, serve advertising, or even identify who was visiting the site. All of those things and more had to be invented from scratch, and it took several years before the entrepreneurial ecosystem ramped up to meet the challenge. Then, of course, the hype overwhelmed the technology’s ability to deliver, and it all came crashing down in 2001.

Fast forward to the launch of the iPhone in 2007, and once again, everyone was convinced the world was going to change dramatically. But Airbnb launched in late 2008, Uber in 2009, and neither gained widespread traction until 2011 or 2012. It took another seven to nine years for these two stalwarts of the mobile revolution to go public. Along the way tens of thousands of smaller companies were building apps, exploring new opportunities, and generally laying the groundwork for the world as we know it today. But to win, they learned that they had to play by the increasingly rigid policies of the dominant platforms: Apple, Google, Amazon, and Facebook. The dream of “Web 2” – where the Internet would be an open platform allowing innovation to flourish – never truly materialized. The platforms became some of the largest corporations ever to roam the earth, and quite predictably, enshittification followed.

So while many of us are currently enraptured with the rise of generative AI, it’s worth remembering that despite the technology’s huge potential, this will all take time. And unlike 1993, when the Internet was literally a blue ocean opportunity, or 2007, when smartphones were as well, this time everyone’s in on the joke. Yes, billions upon billions of venture capital dollars are now being deployed against what feel like unlimited opportunities in the space, but these new startups will have to battle deeply entrenched incumbents with almost no interest in seeing their moats breached.

Thirty years after the first issue of Wired, it’s still making for one hell of a story.

Can Gather Change the Course of Internet History?

The Gather founding team from top left: Zan Doan, CTO, Sudhir Kandula, COO, Mengmeng Chen, Cofounder & CPO, Sumit Agarwal, Cofounder & CEO

A few weeks ago I was genuinely thunderstruck. My co-editor at P&G Signal (thanks Stan!) introduced me to a new company – one that promised to give consumers control over their personal data in new and innovative ways. At first I was skeptical – I’d seen quite a few “personal data lockers” come and go over the past decade or so. I even invested in one way back in 2012. Alas, that didn’t work out.

For as long as I can remember, I’ve been writing – over and over and over – about how the Internet’s central problem is the lack of leverage that consumers have over the data they co-create with the hundreds of apps, sites, and platforms they use. But data lockers never got any traction – most were confusing to install and run, and they all suffered from a lack of tangible consumer benefits. Sure, having a copy of all my personal data sounds great, but in the end, what can it do for me? Up till now, the answer was not much.

It was with all those caveats – and honestly pretty low expectations – that I took a meeting with Sumit Agarwal and his team at Palo Alto, CA-based Gather, an early stage startup still in its first year of operation. Fifteen minutes later I was hooked – here was a company that was addressing the “what can my data do for me” problem by building out a generative AI agent that just might spark the kind of personal data revolution I’ve been writing about for more than a decade. And this was no fly-by-night startup – the company’s founders, team, and investors are all deeply experienced in AI, Internet security, scaled engineering, product design, marketing, and much more.

Before diving in, a caveat: Gather is still at a very early stage, as is the overheated AI ecosystem in which Gather’s products will eventually live. Agarwal told me he’s not even sure if his company will be called Gather by the time its first product becomes available later this year. In addition, the company faces fearsome obstacles to success – including entrenched platform players like Google, Amazon, and Apple, whose business interests do not align with the concept of a newly empowered consumer base. While I usually like to write about companies and products that readers can use immediately, I’m breaking that rule for Gather. No matter the business you’re in, it will pay to understand the shift in consumer behavior that tools like Gather could unlock. And as I said before, this company is the first I’ve seen that has assembled the team, vision, and execution chops to pull it off.

Timing Is Everything

With startups, timing is everything. There’s only so much you can control – what you make, how you spend your investors’ money, the people you hire. But nearly everything else is driven by externalities you must navigate. Is the technology ecosystem capable of supporting your vision, or is your product ahead of its time? Netflix, for instance, had to wait until broadband was pervasive enough to launch its streaming service. Are consumers ready for your idea, or is it out of sync with their expectations? Uber and Airbnb faced this challenge in their early years. Will huge competitors copy your idea, or change their policies and make it impossible for you to thrive? Ask Yelp how it feels about Google’s review summaries, or ask Epic Games about Apple’s 30 percent tax in the app store.

Gather faces all these timing challenges and more, but the company does have one huge tailwind: AI is hot, and investors can’t get enough of it. This past month alone, VCs poured more than $11 billion into AI startups, up 86 percent from a year ago. But while AI funding tipped into a frenzy with OpenAI’s launch of ChatGPT last November, Gather managed to raise an impressive seed round five months earlier, in June of 2022. Agarwal and several of his co-founders were already seasoned operators with a billion-dollar exit in the Internet security sector (Shape Security, sold to F5 three years ago). Gather’s $9 million round was led by general partners at respected firms Bain, Floodgate, and Wing Ventures, with participation from experienced Valley angels like Gokul Rajaram – an investor and director at The Trade Desk, Pinterest, and Coinbase, among many others – and Vivek Sharma, the co-founder and CEO of Movable Ink.

“I’ve known Sumit for about 15 years,” said Gaurav Garg, founder of Wing Venture Capital and an early investor in and board member at Gather. “He has a deep background in consumer and enterprise products used by hundreds of millions of people, as well as exposure to government policy, across technology areas including security, identity, privacy, and e-commerce.” Garg also noted that Sumit has the attributes of a great founder – drive, persuasiveness, perspective, and a learning mindset – qualities that are crucial when you’re looking to reimagine something as big as how consumers will interact with AI and the Internet.

“The team are all exceptional founders,” added Bain Capital Ventures’ Ajay Agarwal, who’s known and worked with several members of the Gather team over the past 25 years. He added a key observation: The tech ecosystem is at an inflection point – mainstream devices like phones and computers can now power distributed platforms like Gather, large language models have evolved to conversational levels, and there’s even just the right amount of government regulation to create conditions for a sea change in how consumers control their data. We’ll get into all of that, but first, the product.

First, Gather The Data

My Amazon data in Gather: Who knew I bought that back in 2006?!

Gather acts as your trusted and secure agent, logging into various data-hoarding services like Google (think Maps, Search, Gmail, Google Play, YouTube), Amazon, Uber, Strava, and many more. At your direction, Gather then downloads copies of your data from each service to your local device – a right codified into law by the 2018 European General Data Protection Regulation (GDPR) and adopted, in broad strokes, by several states in the US – California chief among them. Until Gather came along, no one had built a service that automates what is otherwise a tedious and frankly pretty pointless process – almost no one actually downloads copies of their data from online services, because, as we’ve already established, there was simply no use case for doing so.

But once you have a critical mass of that data, and it’s organized in a way where questions can be asked of it, a whole new world opens up. Gather added me to a pre-release version of its platform, and it was magical to watch the service engage with Amazon, Uber, Google, Twitter, Venmo, Strava, and many others. Within minutes, copies of my data were presented and organized in my Gather app. And that’s when things started to get interesting. As Gather marketing lead Niki Aggarwal pointed out, now that I had the data in my control, I could start to ask it questions. What kind of bias, if any, might be evident in the stories I was reading from Google News? How much did I spend on Amazon each month, and would I have saved money if I had bought those same items at Walmart? Was there an optimal time of day to get a personal best when riding on Strava?

Of course, this is only the tip of the proverbial iceberg when it comes to what’s possible with data sets like these. Sure, it’s cool to have a copy on my own device, in my control. But the real fun will start when you add a personalized AI agent capable of instantaneously answering those initial questions, as well as conjuring up ones you’ve not thought to ask. Even more exciting, imagine that same agent as a trusted confidant, leveraging your data as it interacts with the rest of the online world. Now that’s a consumer benefit – a personal agent that knows my preferences and can, say, plan a complicated business trip, or negotiate for the best price for an item I want to buy online. The time savings alone make the idea compelling. I’ve come to call such agents “genies” – because they work only for you, and they can produce all kinds of magic (and as a bonus, they aren’t limited to three wishes!).

Where Genies Play

Gather hasn’t released its “genie” yet, but it’s working on it. Codenamed “Sidekick,” the product will consist of several elements. First is your personal datastore, which lives on your own device and remains under your control at all times. Second is the Sidekick agent, which Agarwal describes as “an AI product that proactively and intuitively helps you.” He continues: “We are trying to get away from ‘blinking cursor in an empty text box’ and get to ‘intelligent character that thinks about your needs, intuits your desires, [and] acts on your behalf.'” Gather’s third element is a platform that manages how outside organizations interact and create value with your data.

Agarwal offers an example of how Sidekick might work: “You visit Amazon Music, and there’s an offer for six months of free service if you upload your Spotify play history. Your Gather Sidekick allows you to upload with a single click. Your data moves to an external service (Amazon Music) and you get value. You can think of the same example in many contexts – food/dieting, fitness, other entertainment apps, medical apps, etc. The concept is simply that a specific – and sometimes complex – ‘slice’ of your data needs to move somewhere in order for you to receive some value (economic or otherwise).” (This example calls to mind my 2018 piece, in which I imagined how Walmart might compete with Amazon online by leveraging a consumer’s Amazon purchase history.)
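To make the “slice” idea concrete, here is a minimal sketch in Python of how an on-device store might release only the fields an outside service asks for, and only after the user says yes. Every name in it (SliceRequest, LocalDataStore, export_slice, and the sample Spotify-style fields) is hypothetical and invented for illustration; Gather hasn’t published an API, and this is not its implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SliceRequest:
    """Hypothetical request from an outside service for part of a user's data."""
    requester: str      # e.g. "Amazon Music"
    source: str         # which service the data came from, e.g. "spotify"
    fields: list[str]   # only these fields may leave the device
    offer: str          # what the user gets in exchange


@dataclass
class LocalDataStore:
    """Hypothetical on-device store: records grouped by the service they came from."""
    records: dict[str, list[dict]] = field(default_factory=dict)

    def export_slice(self, req: SliceRequest,
                     approve: Callable[[SliceRequest], bool]) -> list[dict]:
        # Nothing leaves the device unless the user explicitly approves this request.
        if not approve(req):
            return []
        # Project each record down to the requested fields; everything else stays local.
        return [
            {k: r[k] for k in req.fields if k in r}
            for r in self.records.get(req.source, [])
        ]


store = LocalDataStore({"spotify": [
    {"track": "Song A", "artist": "X", "played_at": "2023-05-01", "device_id": "abc"},
]})
req = SliceRequest("Amazon Music", "spotify", ["track", "artist"], "six months free")
print(store.export_slice(req, approve=lambda r: True))   # [{'track': 'Song A', 'artist': 'X'}]
print(store.export_slice(req, approve=lambda r: False))  # []
```

The design choice worth noticing is that consent and minimization happen in the same place: the user approves a specific request, and the request itself names the narrow slice that moves.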

Agarwal’s second example envisions an instance where you don’t necessarily trust the service that wants to leverage your data. Imagine, for example, that a third-party developer has created “the ultimate music recommendation app.” It sounds appealing, but you’re wary of uploading your Spotify or Apple Music data to a service that has yet to prove it’s trustworthy. In this case, Gather becomes a secure platform that runs on your device. “With Gather,” Agarwal explains, “that recommendation app can run locally in your environment. This is a win for you because you have no data sharing concerns, so you can comfortably let the app engage with your data to get the very best recommendations. The Gather platform keeps your data local but publishes the schema so developers know how to interact with the platform – without seeing your data.”
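A rough way to picture Agarwal’s second example: the developer codes only against a published schema, the platform runs that code on the user’s device, and the raw rows never leave it. The sketch below assumes a hypothetical schema (PLAY_HISTORY_SCHEMA), a hypothetical runner (run_local_app), and a toy “recommendation” app. It models only the data flow in plain Python and says nothing about how Gather actually isolates third-party code, which a real platform would have to enforce far more strictly.

```python
from collections import Counter
from typing import Callable

# Hypothetical schema a platform might publish: field names and types, no user data.
PLAY_HISTORY_SCHEMA = {"track": str, "artist": str, "played_at": str}


def run_local_app(app: Callable[[list[dict]], list[str]], needed: set[str],
                  history: list[dict]) -> list[str]:
    """Run a third-party app on-device against the user's play history.

    The developer only ever saw PLAY_HISTORY_SCHEMA. The rows stay local, and the
    app's output goes to the user. (A plain function call is not a sandbox; this
    only illustrates the data flow.)
    """
    unknown = needed - set(PLAY_HISTORY_SCHEMA)
    if unknown:
        raise ValueError(f"fields not in the published schema: {sorted(unknown)}")
    # Hand the app only the fields it declared, not the full records.
    view = [{k: row[k] for k in needed} for row in history]
    return app(view)


def top_artists_app(rows: list[dict]) -> list[str]:
    """Toy app written against the schema alone: suggest the most-played artists."""
    counts = Counter(row["artist"] for row in rows)
    return [artist for artist, _ in counts.most_common(2)]


history = [
    {"track": "a", "artist": "X", "played_at": "2023-05-01"},
    {"track": "b", "artist": "Y", "played_at": "2023-05-02"},
    {"track": "c", "artist": "X", "played_at": "2023-05-03"},
]
print(run_local_app(top_artists_app, {"artist"}, history))  # ['X', 'Y']
```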

I think of Gather’s platform as something like Apple’s App Store or Google Play, but with one critical difference: The power to decide who gets access to the platform resides not with a massive corporation, but with you, the consumer. This seemingly small distinction is in fact a massive shift in power, agency, and value from the centralized model of Web 2 toward a decentralized vision more aligned with the original architecture of the Internet – pushing intelligence and control to the edge of the network.

The Iceberg Metaphor

Over the course of many emails, calls, and Zoom meetings with Agarwal and his co-founders Sudhir Kandula and Mengmeng Chen, both former colleagues of Agarwal’s at Shape Security, we touched on topics as varied as security and privacy, Internet history, and information theory. Gather emerged from its founders’ dissatisfaction with what author and Internet OG Cory Doctorow calls the Internet’s “ensh*ttification.” As large companies have consolidated control of our online lives, our experiences have begun to degrade. This is why so many technology observers are excited by Microsoft’s integration of ChatGPT into its Bing search service: Google search has become so clogged with low-quality results and ads that Microsoft’s “conversational search” promises a better, clutter-free user experience. But Agarwal sees many more use cases beyond search – and to understand how those might work, it’s worth a dive into what he calls “the iceberg metaphor,” a visualization of how humans might best communicate with infinitely capable AIs.

As we know, 90 percent of an iceberg is underwater. At its tip – the 10 percent – is human interaction with AI – the prompts we type, or soon, the words we utter. That interaction is limited by our ability to speak or type, which, compared to machines, is very low bandwidth: about 50 words per minute. But human speech is richly nuanced, and informed by executive function – this is where decision making occurs.

It’s in the 90 percent underwater where AI can excel. Machines can speak to other machines at mind-bending speeds – one AI genie speaking to countless others, negotiating information demands, price comparisons, and complex, multi-step transactions like scheduling a meeting or building a travel itinerary. “Underneath the water the GPT is listening, watching, reading, and comprehending on your behalf,” Agarwal says. “It’s unconstrained by our puny 50 word-per-minute input.”

The key to the iceberg model is that top ten percent – no substantive decision is made, and no meaningful action taken, until the human in charge says so. Good genies will surface questions and clarifications at the speed of human language, then dive back below the surface to negotiate next steps in the underwater world of machine-to-machine communication.
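The iceberg model maps onto a familiar human-in-the-loop pattern: the agent can do as much machine-speed work below the waterline as it likes, but it commits only after one short question to the person it works for. Here is a minimal sketch under stated assumptions: the Quote records and seller names are hypothetical, the “negotiation” is simply picking the lowest price, and a callback stands in for the human.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quote:
    """Hypothetical offer gathered from another machine agent."""
    seller: str
    price: float


def negotiate(quotes: list[Quote]) -> Quote:
    """Below the waterline: machine-speed comparison of many offers (here, lowest price)."""
    return min(quotes, key=lambda q: q.price)


def act_with_approval(quotes: list[Quote], ask_human: Callable[[str], bool]) -> str:
    """Above the waterline: surface one short question, and act only on a yes."""
    best = negotiate(quotes)
    question = f"Buy from {best.seller} for ${best.price:.2f}?"
    if ask_human(question):
        return f"purchased from {best.seller}"
    return "no action taken"


quotes = [Quote("store-a", 24.99), Quote("store-b", 19.50), Quote("store-c", 22.00)]
print(act_with_approval(quotes, ask_human=lambda q: True))   # purchased from store-b
print(act_with_approval(quotes, ask_human=lambda q: False))  # no action taken
```

However much work happens inside negotiate, the only thing that crosses the waterline is a single sentence, which is what keeps the human’s low-bandwidth channel focused on the decision itself.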

Will Tech Giants Let It Happen? The Plaid Example.

For Gather to scale, it needs hundreds of thousands, if not millions, of people to engage with its platform. Agarwal is reserved when pressed on the use cases that might drive that engagement, but it’s not hard to imagine any number of “killer apps” that could get the startup to its first million users. But as challenging as scaling an initial user base may be, Gather faces an even larger threat: The data use policies of big platforms like Amazon and Google. Regulations like GDPR guarantee a user’s right to access and download their own data, but most big tech platforms have “terms of service” policies that prohibit automated retrieval of user data. These policies are ostensibly in place to counter malicious actors who are spoofing real users’ accounts, but they could also be employed to stymie Gather’s work on behalf of its user base.

This is where the Gather team’s experience at Shape Security comes into play. Shape’s core business was to help large financial institutions fight automated attacks against the banking industry’s consumer portals. Agarwal and his colleagues spent years understanding and perfecting defenses against sophisticated “attack vectors” in a sector where the stakes are high – people do not like to lose their money. For much of their time at Shape, one of their most vexing opponents was a company called Plaid – then a startup, but now a $15 billion industry leader that offers consumers a platform to retrieve, manage, and gain value from their personal financial data across a majority of banking institutions online. If you’ve ever used Robinhood, or moved money from your bank to Venmo, you’ve used Plaid. Like Gather, Plaid works as an agent on behalf of its individual customers. For years big banks fought against the idea of their own consumers taking control of their data, and Shape Security was one of their most potent weapons. While Shape was able to win at the tactical level – stopping Plaid from accessing data on behalf of clients like Capital One – Plaid managed to win the overall war, because its ardent users pressured their financial institutions (and regulators) to allow them access to their own data.

The Plaid example can’t help but be front and center in the Gather team’s minds as they embark on the next phase of their journey. Agarwal, a former Network Warfare Officer with the United States Air Force, often uses military terminology when describing the state of consumer data rights. “In the military there’s a term called preparing the battlefield – months and months of preparatory work before you commit,” Agarwal says. “As soon as we finished up at our acquirer, we started brainstorming about how to protect more users in more profound ways.” The idea for Gather, he said, hit him “like a lightning bolt,” and work on Gather began almost immediately afterward.

What’s In It For Brands?

Agarwal is committed to making Gather free for its users, a tactic that will certainly help the company garner its initial user base. Once a critical mass of consumers is on the platform, he envisions charging enterprises API fees when they reach out to consumers and request consumer data, with the consumer’s permission, of course. As Agarwal imagines it, brands might want to offer promotions much like the example he mentioned above – where Amazon Music offers six months free in exchange for a user’s Spotify data. Walmart might do the same in a bid to lure away Amazon customers. But the examples can get even more granular – McDonald’s might offer otherwise hard-to-reach consumers in a certain zip code free delivery via DoorDash, or a company like P&G might pilot a Pampers subscription service based on a user’s past purchase data. The possibilities are infinite – if Gather gets to scale.

Should he succeed, Agarwal and team are hoping to jumpstart an entirely new value equation for consumer-driven data, one that just might force all businesses to abandon today’s dominant model of hoarding data and steering consumers into a limited set of choices. “Today, our data is so siloed, but it’s so valuable,” Agarwal told me. “We can change the power balance from the platform back to the user.”

Longtime readers know how I feel about Agarwal’s sentiment – I believe unleashing the consumer data economy could drive a huge increase in economic innovation and flourishing. I suspect this won’t be the last time I write about Gather, and certainly not the last time I write about the sea change it hopes to spark. In the meantime, Agarwal will be presenting his vision for Gather at P&G Signal this coming July 12th. You can find a registration link for the event here.

—

You can follow whatever I’m doing next by signing up for my site newsletter here. Thanks for reading.
