John Battelle's Search Blog The Tragedy of the Data Commons

Before, and after?

A theme of my writing over the past ten or so years has been the role of data in society. I tend to frame that role anthropologically: How have we adapted to this new element in our society? What tools and social structures have we created in response to its emergence as a currency in our world? How have power structures shifted as a result?

Increasingly, I’ve been worrying a hypothesis: Like a city built over generations without central planning or consideration for much more than fundamental capitalistic values, we’ve architected an ecosystem around data that is not only dysfunctional, it’s possibly antithetical to the core values of democratic society. Houston, it seems, we really do have a problem.

I know, it’s been a while since I’ve written here, and most of my recent stuff has focused on Facebook. I’ve been on the road the entire summer, and preparing to move from the Bay Area to NYC (that’s another post). But before you roll your eyes in anticipation of yet another Facebook rant, no, this post is not about Facebook, despite that company’s continued inability to govern itself.

No, this post is about the business of health insurance.

Last week ProPublica published a story titled Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates. It’s the second in an ongoing series the investigative unit is doing on the role of data in healthcare. I’ve been watching this story develop for years, and ProPublica’s piece does a nice job of framing the issue. It envisions “a future in which everything you do — the things you buy, the food you eat, the time you spend watching TV — may help determine how much you pay for health insurance.” Unsurprisingly, the health industry has developed an insatiable appetite for personal data about the individuals it covers. Over the past decade or so, all of our quotidian activities (and far more) have been turned into data, and that data can be, and is being, sold to the insurance industry:

“The companies are tracking your race, education level, TV habits, marital status, net worth. They’re collecting what you post on social media, whether you’re behind on your bills, what you order online. Then they feed this information into complicated computer algorithms that spit out predictions about how much your health care could cost them.”

HIPAA, the regulatory framework governing health information in the United States, only covers and protects medical data – not search histories, streaming usage, or grocery loyalty data. But if you think your search, video, and food choices aren’t related to health, well, let’s just say your insurance company begs to differ.

Lest we dive into a rabbit hole about the corrosive combination of healthcare profit margins with personal data (ProPublica’s story does a fine job of that anyway), I want to pull back and think about what’s really going on here.

The Tragedy of the Commons

One of the most fundamental tensions in an open society is the potential misuse of resources held “in common” – resources to which all individuals have access. Garrett Hardin’s 1968 essay on the subject, “The Tragedy of the Commons,” explores this tension, concluding that the problem of human overpopulation has no technical solution. (A technical solution is one that requires no shift in human values or morality, which is to say no political solution, and can instead be reached by applying science and/or engineering.) Hardin’s essay has become one of the most cited works in social science – the tragedy of the commons is a facile concept that applies to countless problems across society.

In the essay, Hardin employs a simple example of a common grazing pasture, open to all who own livestock. The pasture, of course, can only support a finite number of cattle. But as Hardin argues, cattle owners are financially motivated to graze as many cattle as they possibly can, driving the number of grass munchers beyond the land’s capacity, ultimately destroying the commons. “Freedom in a commons brings ruin to all,” he concludes, delivering an intellectual middle finger to Smith’s “invisible hand” in the process.
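To make the herder's arithmetic concrete, here is a toy sketch in Python. The numbers and the linear payoff rule are my own assumptions for illustration, not anything taken from Hardin's essay: ten herders share a pasture that is most productive at 100 animals, and each animal is worth less as the herd grows.

```python
# Toy model of a shared pasture. All figures are illustrative assumptions.
CAPACITY = 100  # herd size at which the pasture's total output peaks (assumed)

def value_per_animal(total_animals):
    """Each animal's value falls linearly as the herd grows, hitting zero at 2x capacity."""
    return max(0.0, 1.0 - total_animals / (2 * CAPACITY))

def payoff(own_animals, total_animals):
    """What one herder earns given their own herd and the total on the pasture."""
    return own_animals * value_per_animal(total_animals)

# Ten herders with 10 animals each: the herd sits at capacity.
baseline = payoff(10, 100)

# One herder adds an animal. They keep the whole gain; the cost is spread over everyone.
defector = payoff(11, 101)
bystander = payoff(10, 101)

# Every herder follows the same logic and grows to 18 animals.
crowded = payoff(18, 180)

print(f"baseline {baseline:.3f}, defector {defector:.3f}, "
      f"bystander {bystander:.3f}, crowded {crowded:.3f}")
```

Under these assumptions the herder who adds an animal comes out ahead of the baseline while everyone else slips slightly behind, and once all ten herders follow suit each of them earns well under the baseline. That is the shape of Hardin's argument, whatever the real numbers might be.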

So what does this have to do with healthcare, data, and the insurance industry? Well, consider how the insurance industry prices its policies. Insurance has always been a data-driven business – it’s driven by actuarial risk assessment, a statistical method that predicts the probability of a certain event happening. Creating and refining these risk assessments lies at the heart of the insurance industry, and until recently, the amount of data informing actuarial models has been staggeringly slight. Age, location, and tobacco use are pretty much how policies are priced under Obamacare, for example. Given this paucity, one might argue that it’s a *good* thing that the insurance industry is beefing up its databases. Right?

Perhaps not. When a population is aggregated on high-level data points like age and location, we’re essentially being judged on a simple shared commons – all 18-year-olds who live in Los Angeles are being treated essentially the same, regardless of whether one person has a lurking gene for cancer and another will live without health complications for decades. In essence, we’re sharing the load of public health in common – evening out the societal costs in the process.

But once the system can discriminate on a multitude of data points, the commons collapses, devolving into a system rewarding whoever has the most profitable profile. That 18-year-old with flawless genes, the right zip code, an enviable inheritance, and all the right social media habits will pay next to nothing for health insurance. But the 18-year-old with a mutated BRCA1 gene, a poor zip code, and a proclivity to sit around eating Pringles while playing Fortnite? That teenager is not going to be able to afford health insurance.
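A back-of-the-envelope sketch in Python shows the difference between the two pricing regimes. The five members and their expected annual costs are invented for illustration, and real premiums include overhead, margin, and regulation that this ignores.

```python
from statistics import mean

# Hypothetical expected annual claim costs for five people in one risk pool.
expected_costs = {"A": 500, "B": 800, "C": 1200, "D": 3000, "E": 14500}

def community_rated(costs):
    """Everyone pays the pool average: the load is shared in common."""
    pooled_premium = mean(costs.values())
    return {member: pooled_premium for member in costs}

def risk_rated(costs):
    """Everyone pays their own predicted cost: the pool is priced one profile at a time."""
    return dict(costs)

pooled = community_rated(expected_costs)
individual = risk_rated(expected_costs)

for member in expected_costs:
    print(f"{member}: pooled {pooled[member]:>6}, individualized {individual[member]:>6}")
```

Both schemes collect the same total in this toy example. What changes is who pays it: the more precisely the data predicts each person's cost, the closer the lowest-risk member's premium falls toward nothing and the further the highest-risk member's premium climbs out of reach.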

Put another way, adding personalized data to the insurance commons destroys the fabric of that commons. Healthcare has been resistant to this force until recently, but we’re already seeing the same forces at work in other aspects of our previously shared public goods.

A public good, to review, is defined as “a commodity or service that is provided without profit to all members of a society, either by the government or a private individual or organization.” A good example is public transportation. The rise of data-driven services like Uber and Lyft has been a boon for anyone who can afford these services, but the unforeseen externalities are disastrous for the public good. Ridership, and therefore revenue, falls for public transportation systems, which fall into a spiral of neglect and decay. Our public streets become clogged with circling rideshare drivers, roadway maintenance costs skyrocket, and – perhaps most perniciously – we become a society of individuals who forget how to interact with each other in public spaces like buses, subways, and trolley cars.

Once you start to think about public goods in this way, you start to see the data-driven erosion of the public good everywhere. Our public square, where we debate political and social issues, has become 2.2 billion data-driven Truman Shows, to paraphrase social media critic Roger McNamee. Retail outlets, where we once interacted with our fellow citizens, are now inhabited by armies of Taskrabbits and Instacarters. Public education is hollowed out by data-driven personalized learning startups like AltSchool, Khan Academy, or, let’s face it, YouTube how-to videos.

We’re facing a crisis of the commons – of the public spaces we once held as fundamental to the functioning of our democratic society. And we have data-driven capitalism to blame for it.

Now, before you conclude that Battelle has become a neo-luddite, know that I remain a massive fan of data-driven business. However, if we fail to re-architect the core framework of how data flows through society – if we continue to favor the rights of corporations to determine how value flows to individuals absent the balancing weight of the public commons – we’re heading down a path of social ruin. ProPublica’s warning on health insurance is proof that the problem is not limited to Facebook alone. It is a problem across our entire society. It’s time we woke up to it.

So what do we do about it? That’ll be the focus of a lot of my writing going forward. As Hardin writes presciently in his original article, “It is when the hidden decisions are made explicit that the arguments begin. The problem for the years ahead is to work out an acceptable theory of weighting.” In the case of data-driven decisioning, we can no longer outsource that work to private corporations with lofty sounding mission statements, whether they be in healthcare, insurance, social media, ride sharing, or e-commerce.

6 thoughts on “The Tragedy of the Data Commons”

Pingback: How Your Personal Health Data Will Be Used Against You - Evonomics
andrew j campbell says:

July 29, 2018 at 5:19 pm

https://roarmag.org/essays/max-haiven-crises-of-imagination/

“In Crises of Imagination, Crises of Power (Zed Books, forthcoming), I argue that this sense of futility is the residual effect of the way capitalism ‘encloses’ not only our time, our communities and our environment; but also our imaginations. Enclosure here is a metaphor borrowed from the process by which medieval peasants were dispossessed of their common lands and forced to rely on wage labor for survival. Throughout the book, I argue in various ways that this enclosure of the imagination is something that occurs not simply at the level of the individual mind, but at the level of social and material relationships.”

Zachary Orlov says:

July 29, 2018 at 5:51 pm

Genetic Information Nondiscrimination Act, or GINA. Not good enough? Perhaps all our personal data points (purchases, posts, Fitbit sleep diaries, etc.) will develop into a data phenotype that is far more predictive of the majority of measurable costs to our health care system. The issue then becomes that we need a new health care finance system.

kirkfish says:

August 7, 2018 at 12:38 pm

Although your details are off (guaranteed issue & community rating remain intact, yet are under heavy attack – https://goo.gl/njmaLz), this is certainly an interesting POV given your seat on the Acxiom board. Would have liked to see you develop the point that health insurers are fostering a tragedy of the commons by employing consumer data to manipulate self-selection segmentation at scale in order to favorably balance out their IFP risk pools.

Pingback: Whose Data, Which Commons, What Tragedy? – Newco Shift
Pingback: 21 Great SEO Blogs for Business Owners | LoTops
