John Battelle's Search Blog

Our Data Governance Is Broken. Let’s Reinvent It.

This is an edited version of a series of talks I first gave in New York over the past week, outlining my work at Columbia. Many thanks to Reinvent, Pete Leyden, Capgemini, Columbia University, Cossette/Vision7, and the New York Times for hosting and helping me.

Prelude.

I have spent 30-plus years in the tech and media industries, mainly as a journalist, observer, and founder of companies that either make or support journalism and storytelling. When it comes to many of the things I am going to talk about here, I am not an expert. If I am an expert at anything at all, it’s asking questions of technology, and of the media and marketing platforms created by technology. In that spirit I offer the questions I am currently pursuing, in the hope of sparking a dialog with this esteemed audience that leads us toward better answers.

Some context: Since 1986, I’ve spent my life chasing one story: The impact of technology on society. For whatever reason, I did this by founding or co-founding companies. Wired was kind of a first album, as it were, and it focused on the story broadly told. The Industry Standard focused on the business of the Internet, as did my conference, Web 2.0. Federated Media was a tech and advertising platform for high quality “conversational” publishers, built with the idea that our social discourse was undergoing a fundamental shift, and that publishers and their audiences needed to be empowered to have a new kind of conversation. Sovrn, a company I still chair, has a similar mission, but with a serious data and tech focus. NewCo, my last company (well, I’ve got another one in the works, perhaps we can talk about that during Q&A) seeks to illuminate the impact of companies on society.

It’s Broke. Let’s Fix It.

And it is that impact that has led me to the work I am doing now, here in New York. I moved here just last fall, seeking a change in the conversation. To be honest, the Valley was starting to feel a bit…cloistered.

A huge story – the very same story, just expanded – is once again rising. Only it’s just … more urgent. 25 years after the launch of Wired, the wildest dreams of its pages have come true. Back in 1992 we asked ourselves: What would happen to the world when technology becomes the most fundamental driver of our society? Today, we are living in the answer. Turns out, we don’t always like the result.

Most of my career has been spent evangelizing the power of technology to positively transform business, education, and politics. But five or so years ago, that job started to get harder. The externalities of technology’s grip on society were showing through the shiny optimism of the Wired era. Two years ago, in the aftermath of an election that I believe will prove to be the political equivalent of the Black Sox scandal, the world began to wake up to the same thing.

So it’s time to ask ourselves a simple question: What can we do to fix this?

Let’s start with some context. My current work is split between two projects: One has to do with data governance, the other with political media. How might they be connected? I hope by the end of this talk, it’ll make sense.

So let’s go. In my work at Columbia, I’m currently obsessed with two things. First,

Data.

How much have you thought about that word in the past two years?

Given how much it’s been in the news lately, likely quite a lot. Big data, data breaches, data mining, data science…Today, we’re all about the data.

And second….

Governance.

When was the last time you thought about that word?

Government – well for sure, I’d wager that’s increased given who’s been running the country these past two years. But Governance? Maybe not as much.

But how often have you put the two words together?

Data Governance.

Likely not quite as much.

It’s time to fix that.

Why?

Because we have slouched our way into an architecture of data governance that is broken, that severely retards economic and cultural innovation, and that harms society as a whole.

Let’s unpack that and define our terms. We’ll start with Governance.

What is governance? It’s an …

Architecture of control

A regulatory framework that manages how a system works. The word is most often used in relation to political governance – which we care about a lot for the purposes of this talk – but the word applies to all systems, and in particular to corporations, which is also a key point in the research we’re doing.

Governance in corporate context is “the system of rules, practices and processes by which a firm is directed and controlled.”

But in my work, when I refer to governance, I am referring to “the system of rules, practices and processes by which a firm controls its relationship to its community.” Who’s that community? You, me, developers and partners in the ecosystem, for the most part. More on that soon.

Now, what is data? I like to think of it as…

Unrefined Information.

I’m not in love with this phrase, but again, this is a first draft of what I hope will grow to more refined (ha) work. Data is the core commodity from which information is created, or processed. Data has many attributes, not all of which are agreed upon. But I think it’s inarguable that the difference between data and information is …

Human meaning.

That’s Socrates, who thought about this shit, a lot. Information is data that means something to us (and possibly the entire universe, as it relates to the second law of thermodynamics. But physics is not the focus of this talk, nor is a possible fourth law of thermodynamics….).

As we’ve learned – the hard way – over the past decade, a few very large companies have purview over a massive catalog of data that is meaningful not only to us, but to society at large. And it’s this societal aspect that, until recently, we’ve actively overlooked. We’re in the midst of a grand data renaissance, which, if history echoes even remotely, I fervently hope will give rise to …

A (Data) Enlightenment

That’s John Locke, an Enlightenment philosopher. Allow me to pull back for a second and attempt to lay some context for the work I hope to advance in the next few years. It starts with the Enlightenment, a great leap forward in human history (and the subject of a robust defense by Steven Pinker last year).

Arguably the crowning document of the Enlightenment is…

The United States Constitution

This declaration of the rights of humankind (well mankind for the first couple of centuries) itself took more than three centuries to emerge (and cribbed generously from the French and English, channeling Locke and Hume). Our current political and economic culture is, of course, a direct descendant of this living document. American democracy was founded upon Enlightenment principles. And the cornerstone of Enlightenment ideas is …

The Scientific Method

That’s Aristotle, often credited with originating the scientific method, which is based on considered thesis formation, rigorous observation, comprehensive data collection, healthy skepticism, and sharing/transparency. The scientific method is our best tool, so far, for advancing human progress and problem solving.

And the scientific method – the pursuit of truth and progress – all that turns on the data. Prompting the question….

Who Has the Most (and Best) Data?

This is the question we are finally asking ourselves, the answer to which is sounding alarms. As we all know, we are in a renaissance, a deluge, an orgy of data creation. We have invented sophisticated new data sensing organs – digital technologies – that have delivered us superhuman powers for the discovery, classification, and sense-making of data.

Not surprisingly, it is technology companies, driven as they are by the raw economics of profit-seeking capital and armed with these self-fulfilling tools of digital exploration and capture, that have initially taken ownership of this emerging resource. And that is a problem, one we’ve only begun to understand and respond to as a society. Which leads to an important question:

Who Is Governing Data?

In the US, anyway, the truth is we don’t have a clear answer to this question. Our light-touch regulatory framework created a tech-driven frenzy of company building, but it failed to anticipate the massive externalities that have surfaced now that these companies dominate our capital markets. Clearly, the Tech Platform Companies have the most valuable data – at least if the capital markets are to be believed. Companies like Google. Facebook. Amazon. Apple.

All of these companies have very strong governance structures in place for the data they control. These structures are set internally, and are not subject to much (if any) government regulation. And by extension, nearly all companies that manage data, no matter their size, have similar governance models because they are all drafting off those companies’ work (and success). This has created a phenomenon in our society, one I’ve recently come to call …

The Default Internet Constitution

Without really thinking critically about it, the technology and finance industries have delivered us a new Constitution, a fundamental governance document controlling how information flows through the Internet. It was never ratified by anyone, never debated publicly, never published with a flourish of the pen, and it’s damn hard to read. But, it is based on a discoverable corpus. That corpus, at its core, is based on …

Terms of Service and EULAs

Like it or not, there is a governance model for the US Internet and the data that flows across it: Terms of Service and End User License Agreements. Of course, we actively ignore them – who on earth would ever read them? One researcher did the math, and figured it’d take 76 work days for the average American to read all of the policies she clicks past (and that was six years ago!).

Of course, ignoring begets ignorance, and we’ve ignored Terms of Service at our peril. No one understands them, but we certainly should – because if we’re going to make change, we’ll want to change these Terms of Service, dramatically. They create the architecture that determines how data, and therefore societal innovation and value, flow around the Internet.

And let’s be clear, these terms of service have hemmed data into silos. They’re built by lawyers, based on the desires of engineers who are – for the most part – far more interested in the product they are creating than any externalities those products might create.

And what are the lawyers concerned with? Well, they have one True North: Protect the core business model of their companies.

And what is that business model? Engagement. Attention. And for most, data-driven personalized advertising. (Don’t get me started about Apple being different. The company is utterly dependent on those apps animating that otherwise black slate of glass they call an iPhone).

So what ensures engagement and attention? Information refined from data.

So let’s take a look at a rough map of what this Terms of Service-driven architecture looks like:

The Mainframe Architecture

Does this look familiar? If you’re a student of technology industry history, it should, because this is how mainframes worked in the early days of computing. Data compute, data storage, and data transport are handled by the big processor in the sky. The “dumb terminal” lives at the edge of the system, a ‘thin client’ for data input and application output. Intelligence, control, and value exchange live in the center. The center determines all that occurs at the edge.

Remind you of any apps you’ve used lately?

But it wasn’t always this way. The Internet used to look like this:

The Internet 1.0 Architecture

I’m one of the early true believers in the open Internet. Do you remember that world? It’s mostly gone now, but there was a time, from about 1994 to 2012, when the Internet ran on a different architecture, one based on the idea that the intelligence should reside in the nodes (the sites), not at the center. Data was shared laterally between sites. Of course, back then the tech was not that great, and there was a lot of work to be done. But we all knew we’d get there….

…Till the platforms got there first. And they got there very, very well – their stuff was both elegant and addictive.

But could we learn from Internet 1.0, and imagine a scenario inspired by its core lessons? Technologically, the answer is “of course.” This is why so many folks are excited by blockchain, after all (well, that and ICO Ponzi schemes…).

But it might be too late, because we’ve already ceded massive value to a broken model. The top five technology firms dominate our capital markets. We’re seriously (over)invested in the current architecture of data control. Changing it would be a massive disruption. But what if we can imagine how such change might occur?

This is the question of my work.

So…what is my work?

A New Architecture

If we’re stuck in an architecture that limits the potential of data in our society, we must envision a world under a different kind of architecture, one that pushes control, agency, and value exchange back out to the node.

Those of us old enough to remember the heady days of Web 1.0 foolishly assumed such a world would emerge unimpeded. But as Tim Wu has pointed out, media and technology run in cycles, ultimately consolidating into a handful of companies with their hands on the Master Switch – we live in a system that rewards the Curse of Bigness. If we are going to change that system, we have to think hard about what we want in its place.

I’ve given this some thought, and I know what I want.

Let The Data Flow

Imagine a scenario where you can securely share your Amazon purchase data with Walmart, and receive significant economic value for doing so (I’ve written this idea up at length here). Of course, this idea is entirely impossible today. This represents a major economic innovation blocked.

Or imagine a free marketplace for data that allows a would-be restaurant owner to model her customer base’s preferences and unique tastes (I’ve written this idea up at length here). Of course, this is also impossible today, representing a major cultural and small-business innovation impeded.

Neither of these ideas is even remotely possible today – nor are the products of thousands of similar questions entrepreneurs might ask of the data rotting in plain sight across our poorly architected data economy.

We all lose when the data can’t flow. We lose collectively, and we lose individually.

But imagine if it were possible!

How might such scenarios become reality?

We’re at a key inflection point in answering that question.

2019 is the year of data regulation. I don’t believe any meaningful regulation will pass here in the US, but it’ll be the year everyone talks about it. It started with the Cambridge Analytica/Facebook hearings, and now every self-respecting committee chair wants a tech CEO in the hot seat. Congress and the American people have woken up to the problem, and any number of regulatory fixes are being debated. Beyond the privacy shitstorm and its associated regulatory response, which I’d love to toss around during Q&A, the most discussed regulatory remedy is antitrust: the curse of bigness is best fixed by breaking up the big guys. I understand the goal, and might even support it, but I don’t think we even need to do that. Instead, I submit for your consideration one improbable, crazy, and possibly elegant solution.

The Token Act

I’m calling it the Token Act.

It requires one thing: Every data processing service at a certain scale must deliver back to its customers any co-created data in machine readable format, easily portable to any other data processing service.

Imagine the economic value unlocked, the exponential impact on innovation such a simple rule would have. Of course we must acknowledge the negative short term impact such a policy would have on the big guys. But it also creates an unparalleled opportunity for them – the token of course can include a vig – a percentage of all future revenue associated with that data, for the value the platform helped to create. This model could drive a far bigger business in the long run, and a far healthier one for all parties concerned.
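To make the mechanism a little more concrete, here is a minimal Python sketch of what a portable “token” might carry under such a rule: co-created records exported in a machine-readable format (JSON here), importable by a second service, plus a vig owed back to the issuing platform. Everything in it is a hypothetical illustration, not part of the proposal itself: the `DataToken` class, its field names, the JSON layout, and the 5% vig rate are all assumptions.

```python
import json
from dataclasses import dataclass, field, asdict
from decimal import Decimal, ROUND_HALF_UP


@dataclass
class DataToken:
    """Hypothetical portable unit of co-created data (illustrative only)."""
    user_id: str
    issuing_platform: str
    records: list = field(default_factory=list)
    # Hypothetical vig: share of future revenue owed to the issuing platform.
    # Stored as a string so the token stays JSON-serializable.
    vig_rate: str = "0.05"

    def to_json(self) -> str:
        # Machine-readable export that any other service could parse.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "DataToken":
        # A receiving service rebuilds the token from the export.
        return cls(**json.loads(payload))

    def vig_owed(self, revenue: str) -> Decimal:
        # Issuer's cut of revenue a receiving service earns with this token,
        # rounded to cents using Decimal to avoid float rounding errors.
        amount = Decimal(revenue) * Decimal(self.vig_rate)
        return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Usage: Platform A exports a user's token; a second service imports it.
token = DataToken("user-123", "Platform A", [{"item": "book", "price": "12.99"}])
payload = token.to_json()
imported = DataToken.from_json(payload)
print(imported == token)             # True
print(imported.vig_owed("200.00"))   # 10.00
```

The point of the sketch is the shape of the deal: the user (or whoever the user authorizes) carries the data elsewhere, and the issuing platform still earns a cut of the value created downstream.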

I can’t prove it yet, but I sense this approach could 10 to 100X our economy. We’ve got some work to do on proving that, but I think we can.

Imagine what would occur if the data was allowed to flow freely. Imagine the upleveling of how firms would have to compete. They’d have to move beyond mere data hoarding, beyond the tending of miniature walled gardens (most app makers) and massive walled agribusinesses (in the case of the platforms – and ADM and Monsanto, but that’s another chapter in the book, one of many).

Instead, firms would have to compete on creating more valuable tokens – more valuable units of human meaning. And they’d encourage sharing those tokens widely – with the fundamental check of user agency and control governing the entire system.

The bit would flip, and the intelligence would once again be driven to the nodes.

To us!

But the Token Act is just an exercise in envisioning a society governed by a different kind of data architecture. There are certainly better or more refined ideas.

And to get to them, we really need to understand how we’re governed today. And now that I’ve gotten nearly to the end of my prepared remarks, I’ll tell you what I’m working on at Columbia with several super smart grad students:

Mapping Data Flows

If we are going to understand how to change our broken architecture of data flows, we need to deeply understand where we are today. And that means visualizing a complex mess. I’m working with a small team of researchers at Columbia, and together we are turning the Terms of Service at Amazon, Apple, Facebook and Google into a database that will drive an interactive visualization – a blueprint of sorts for how data is governed across the US internet. We’re focusing on the advertising market, for obvious reasons, but it’s my hope we might create a model that can be applied to nearly any information rich market. It’s early stages, but our goal is to have something published by the end of May.
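One way to picture “turning Terms of Service into a database” is a table with one row per clause that permits a data flow, aggregated into weighted edges that a graph visualization could draw. The sketch below assumes a hypothetical schema (`tos_flows`, with company, data category, recipient, and clause reference columns) and uses Python’s built-in `sqlite3`. The company names and rows are invented placeholders, not drawn from any real Terms of Service or from the Columbia project’s actual data model.

```python
import sqlite3

# Hypothetical schema: one row per ToS clause that permits a data flow.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tos_flows (
           company       TEXT NOT NULL,
           data_category TEXT NOT NULL,
           recipient     TEXT NOT NULL,
           clause_ref    TEXT
       )"""
)

# Invented placeholder rows, NOT taken from any real Terms of Service.
rows = [
    ("Company A", "location", "advertisers", "sec. 3.1"),
    ("Company A", "purchase history", "affiliates", "sec. 4.2"),
    ("Company B", "location", "advertisers", "sec. 2"),
]
conn.executemany("INSERT INTO tos_flows VALUES (?, ?, ?, ?)", rows)

# Aggregate clauses into weighted edges (company -> recipient) for a graph view.
edges = conn.execute(
    """SELECT company, recipient, COUNT(*) AS n
       FROM tos_flows
       GROUP BY company, recipient
       ORDER BY company, recipient"""
).fetchall()

for company, recipient, n in edges:
    print(f"{company} -> {recipient}: {n} clause(s)")
```

Keeping the clause reference on every row means each edge in the eventual visualization can be traced back to the specific contract language that permits it.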

Finally, Advertising

I’ve not spoken much about advertising during this talk, and that was purposeful. I’ve written at length about how we came to the place we now inhabit, and the role of programmatic advertising in getting us there.

Truth is, I don’t see advertising as the cause of this problem, but rather an outgrowth of it. If you offer any company a deal that puts new customers on a platter, as Google did with AdWords, or Facebook has with News Feed, well, there’s no way those companies will refuse. Every major advertiser has embraced search and social, as have millions of smaller ones.

Our problem is simply this: The people who run technology platforms don’t actually understand the power and limitations of their systems, and let’s be honest, nor do we. Renée DiResta has pointed this out in recent work on Russian interference in our national dialog and elections: Any system that allows for automated processing of messages is subject to directed, sophisticated abuse. The place for regulation is not in advertising (even though that’s where it’s begun with the Honest Ads Act), it’s in how the system works architecturally.

But advertisers must be highly aware of this transitional phase in the architecture of a system that has been a major source of revenue and business results. We must imagine what comes next, we must prepare for it, and perhaps, just perhaps, we should invent it, or at the very least play a far more active role than we’re playing currently.

I believe that if we unite – industry, government, media, and consumers collectively – to address the core architectural issues inherent to how we manage data, in the process giving consumers economic, creative, and personal agency over the data they co-create with platforms, the question of toxic advertising will disappear faster than it arose.

But I’ve talked (or written) long enough. Thank you so much for coming (for reading), and for being part of this conversation. Now, let’s start it.

10 thoughts on “Our Data Governance Is Broken. Let’s Reinvent It.”

Anne McCrossan says:

January 25, 2019 at 10:26 am

Great post and the idea of a Token Act is intriguing, John and I am a big supporter of generating mutual benefit from data flows.

All we need now is for data monopolists to recognise a) that they are and b) it’s not a long-term sustainable business model…but I suspect we may be a while waiting.

Pingback: Newco Shift | The Internet Must Change. To Get There, Start With the Data.
Pingback: John Battelle's Search Blog Lead, Business Affairs
Pingback: John Battelle's Search Blog We Dream of Genies – But Who Will They Work For?
Pingback: John Battelle's Search Blog Can Gather Change the Course of Internet History?
Pingback: John Battelle's Search Blog On AI: What Should We Regulate?
Pingback: John Battelle's Search Blog Mine, Mine, All Mine
Pingback: John Battelle's Search Blog We Dream of Genies, But Will Big Tech Let Us Use Them?
Pingback: John Battelle's Search Blog The Web We Want Vs. The Web We Have
Bianca Cavalcanti says:

June 11, 2026 at 4:40 pm

Building on the data governance point, one overlooked aspect is how much unnecessary context we feed AI agents. That is wasted tokens and money. I see teams ignore this as a governance issue. We built something similar to help enforce prompt hygiene.
