John Battelle's Search Blog

Web 2 Map: The Data Layer – Visualizing the Big Players in the Internet Economy

As I wrote last month, I’m working with a team of folks to redesign the Web 2 Points of Control map along the lines of this year’s theme: “The Data Frame.” In the past few weeks I’ve been talking to scores of interesting people, including CEOs of data-driven startups (TrialPay and Corda, for example), academics in the public dataspace, policy folks, and VCs. Along the way I’ve solidified my thinking about how best to visualize the “data layer” we’ll be adding to the map, and I wanted to bounce it off all of you. So here, in my best narrative voice, is what I’m thinking.

First, of course, some data.

On the left hand side are eight major players in the Internet Economy, along with two categories of players who are critical, but who I’ve lumped together – payment players such as Visa, Amex, and Mastercard, and carriers or ISP players such as Comcast, AT&T, and Verizon.

I’ve given each company my own “finger in the air” score for seven major data categories, which are shown across the top (I don’t claim these are correct, rather, clay on the wheel for an ongoing dialog). The first six scores are in essence percentages, answering the question “What percentage of this company’s holdings are in this type of data.” The seventh, which I’ve called Wildcard data, is a 1-10 ranking of the potency of that company’s “wildcard” data that it’s not currently leveraging, but might in the future. I’ll get to more detail on each data category later.

Toward the far right, I’ve noted each company’s overall global uniques (from Doubleclick, for now, save the carriers and payment guys – I’ve proxied their size with the reach of Google). There is also an “engagement” score (again, more on that soon). The final score is a very rough tabulation computing engagement over uniques against the sum of the data scores. There are pivots to be built from this data around each of the scores for various types of data, but I’ll leave that for later. This is meant to be a relatively simple introduction to my rough thinking about the data layer. Hopefully, it’ll spark some input from you.
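To make the arithmetic concrete, here is a minimal Python sketch of one possible reading of that final score: engagement divided by uniques, multiplied by the sum of the seven data scores. The field names, the choice to fold the wildcard ranking into the sum, and the sample numbers are all assumptions for illustration; none of them come from the chart.

```python
from dataclasses import dataclass

# The six percentage-style categories named in the post.
CATEGORIES = ("purchase", "search", "social", "interest", "location", "content")


@dataclass
class Company:
    name: str
    data_scores: dict   # category -> rough percentage score (hypothetical values)
    wildcard: int       # 1-10 ranking of unleveraged "wildcard" data
    uniques: float      # global uniques, in millions
    engagement: float   # engagement score (units are an assumption)


def rough_score(company: Company) -> float:
    """One reading of the final score: (engagement / uniques) * sum of data scores.

    Assumption: the wildcard ranking is simply added to the six percentages.
    """
    data_total = sum(company.data_scores.get(cat, 0) for cat in CATEGORIES)
    data_total += company.wildcard
    return (company.engagement / company.uniques) * data_total


# Hypothetical company with made-up numbers, purely to exercise the function.
example = Company(
    name="ExampleCo",
    data_scores={cat: 10 for cat in CATEGORIES},
    wildcard=5,
    uniques=100.0,
    engagement=50.0,
)
print(example.name, rough_score(example))
```

Under this reading the score is a ratio times a sum, so nudging any single finger-in-the-air score moves the result proportionally, which makes it easy to see how sensitive the final number is to a given guess.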

Now, before you rip it apart, which I fully invite (especially those of you who are data quants, because I am clearly not, and I am likely mixing some apples and watermelons here), allow me to continue to narrate what I’m trying to visualize here.

As you know, the map is a metaphor, showing key territories as “points of control.” The companies I’ve highlighted in the chart all have “home territories” where they dominate a sector – Google in search, Facebook in social, Amazon and eBay in commerce, etc. What I plan to do is create a layer based on the data in the chart that, when activated, shows those companies’ relative size and strength.

But how?

[Image: very rough early visualization sketch (v.earlyviz.png)]

Well, the best idea we’ve come up with so far is to show each as a small city of sorts, where the relative height of the buildings is determined by a corresponding data point. So Twitter, for example, will have a tall building in the middle of its city, representing “Interest data.” Google’s tallest building will be search. Facebook’s, social, and so on. And of course the cities can’t be all on the same scale, hence our use of total global uniques, and total engagement. Yahoo may be nearly as big as Facebook, but it doesn’t have nearly the engagement per user. So its city will be smaller, relatively, than Facebook’s.

What is interesting about this approach is that each company’s “cityscape” emerges as distinct. Microsoft’s is wide but not tall – they have a lot of data in a number of areas. It will probably end up looking like a suburban office park – funnily enough, that’s what Microsoft really looks like, for the most part. Amazon and eBay will have high towers of payment data, with a smattering of shorter buildings. And so on. I don’t have a good visualization of this yet, but the designers at Blend, who I’m working with, have sketched out a very rough early version just so you can get the idea. The structures will be more whimsical, and of course be keyed with color. But I think you get the idea.

I’m even thinking of adding other features, like “openness” – i.e., can you access, gain copies of, share, and mash up the data controlled by each company? If so, the city won’t be walled. Apple, on the other hand, may well end up a walled city, with a moat, on top of a hill.
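As a rough illustration of the city idea, the sketch below turns a company’s data scores into building heights, shrinks or grows the whole city by a scale factor built from uniques and engagement, and flags closed data as a walled city. The function names, the scale formula (uniques times engagement per user, divided by a reference value), and every number are hypothetical; the actual design is still being worked out.

```python
def city_scale(uniques: float, engagement_per_user: float, reference: float) -> float:
    """Relative size of a city: total engagement compared to a reference value.

    Assumption: total engagement is uniques * engagement_per_user, and
    `reference` is whatever total the largest city on the map should have.
    """
    return (uniques * engagement_per_user) / reference


def build_city(name, data_scores, uniques, engagement_per_user, reference, is_open):
    """Map one company's data scores to scaled building heights."""
    scale = city_scale(uniques, engagement_per_user, reference)
    # One building per data category; height is the score times the city scale.
    buildings = {category: score * scale for category, score in data_scores.items()}
    return {
        "name": name,
        "buildings": buildings,
        "tallest": max(buildings, key=buildings.get),
        "walled": not is_open,  # closed data means a walled city
    }


# Made-up example: a search-heavy company with open data.
city = build_city(
    name="ExampleCo",
    data_scores={"search": 80, "social": 20},
    uniques=100,
    engagement_per_user=2,
    reference=400,
    is_open=True,
)
print(city)
```

Two companies with similar uniques but different engagement per user would get different scale factors here, which is the Yahoo versus Facebook effect described above.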

Now, a bit more detail on the data categories. You all gave me a lot of really good input on my earlier post, where I posited these original categories. But I’ve kept them the same, save the addition of the wildcard data. Why? Because I think each can be interpreted as larger buckets containing a lot of other data. I’ll go through each briefly in turn:

Purchase Data: This is information about who buys what, in essence. But it’s also who *almost* buys what (abandoned carts), *when* they buy, in what context, and so on.

Search Data: The original database of intentions – query data, path from query data, “intent” data, and tons more search signals.

Social Data: Social graph, but also identity data. Not to mention how people interact inside their graphs, etc.

Interest Data: This is data that describes what is generally called “the interest graph” – declarations of what people are interested in. It’s related to content, but it’s not just content consumption. It includes active production of interest datapoints – like tweets, status updates, checkins, etc.

Location Data: This is data about where people are, to be sure, but also data about how often we are there, and other correlated data – i.e., what apps we use in location context, who else is there and when, etc.

Content Data: Content is still a king in our world, and knowing patterns of content consumption is a powerful signal. This is data about who reads/watches/consumes what, when, and in what patterns.

Wildcard Data: This is data that is uncategorized, but could have huge implications. For example, Microsoft knows how people interact with their applications and OS. Microsoft and Google have a ton of language data (phonemes, etc.). Carriers see just about everything that passes across their servers, though their ability to use it might be regulated. Google, Yahoo and Microsoft have tons of email interaction data. And so on….

Now, of course all these data categories get more powerful as they are leveraged one against the other, and of course, I’ve left tons of really big data players off the map entirely (Tons of small startups like Tynt, Quora, or Sharethis have massive amounts of data, as do very large companies like Nielsen, Quantcast, etc.). But you have to make choices to make something like this work.

So, that’s where we are with the Web 2 Summit map data layer. Naturally, once the data layer is live, it will be driven by a database, so we can tweak the size and scope of the cities and buildings based on the collective intelligence of the map users’ feedback. What do you think? What’s your input? We’ll be building this over the next two months, and I’d love your feedback before we get too far down the line. Thanks!

13 thoughts on “Web 2 Map: The Data Layer – Visualizing the Big Players in the Internet Economy”

What’s value of data if not queried against existing industries where billions are spent: namely TV advertising and TV media rights.

Data owners you mention hardly touch sports and entertainment, which control 70%+ of all dollars spent in the media rights and TV advertising industries. Which are of course the holy grail of all these companies.

So they’re all going after the same customer when the major elephant is one that needs to be brought down.

Why does it still exist? Because only social networks that have been mapped are school and work networks. Once sports and entertainment networks (ecosystems) are mapped–so the relationship a “user” has to various things in those graphs–those elephants can be brought down.

Love your stuff otherwise. But all data is not equal if it’s going after the same customer, the same cluster of networks.

World needs more networks to be mapped. Networks are the highways that drive lower customer acquisition, which eventually reinvents industries.

The signal in sports can’t shift from TV/print/radio till the sports graph has been mapped.

As you have often pointed out “intent” data is key so I’m not surprised to find it twice among your seven categories; once implied (“Purchase”) and the second time explicit (“Search”).

To reach its full potential “intent” data needs to expand from the present one-dimensional approach that equates search keywords with ‘what’ buyers are interested in purchasing. Missing are two added components beyond simply ‘what’: (1) which brands are in play (Consideration Set) and (2) when, the time to purchase. (In more technical terms: intent is not a one-component scalar but a three-component vector.)

Considering Purchasing Intent as a triplet (what, which, when) also applies to your ‘Purchasing’ category, where you list candidates for inclusion in structuring “Purchase” data. In fact, the Purchasing Intent triplet (what, which, when) may be a viable instance of one such structure. After all, those three elements form the heart of every sales transaction, whether off or on line.

Nathan Schor nathans@netmeals.net

Very interesting analysis. I look forward to the complete work.

In this context, it seems worthwhile to have a scale of how exposable the organization’s data is, or alternatively how latent or dark their data is (dark as in dark fiber).

Facebook would seem to have the most explicit, exposable data, while carriers and payment processors have huge, significant barriers to mining the data in a strategic way.

I’d like a column for “Platform,” an indicator that “People are building stuff here.” It would be quite high for Amazon (on their cloud), Google (duh), Facebook (applications, integration), Apple (iOS) but low for almost everyone else.

I like the small cities approach for measuring things. For using so many different factors and variables I think you are off to a good start.

Jeffrey’s column of Platform might be a good one. The sites that are constantly developing new software, apps, or features should get a bit of a bump in score.

How does Apple score higher than Google in terms of location?

– Google localizes search results for some generic search queries

– 20% of searches are local in nature (and that % will go up over time as more searches are from mobile devices)

– Google is the default search provider on iPhone & Android

Is there a Google Spreadsheet or similar that enables one to access the data in the table you’ve embedded in this post?

John – Any reason why the content providers to professional markets (finance, legal, etc.) are not included in this study? Or perhaps I missed them in the infographic?

Maybe these professional content providers do not fit within the purview of web 2.0 points of control, but I feel these players are ripe for disruption, and some of them are observing how to gain the benefits of the open data model without disrupting their business models. StockTwits is one example of a financial info provider using web 2.0 style info sharing, which is quite contrary to the norm in that industry.

Thoughts?

Where are Skype and other realtime social media in this? Is it a casual omission or did you have some thoughts on why it wasn’t interesting in this context?

Let me point you to my response to last year’s map on Skype Journal: http://www.slideshare.net/evanwolf/skypelandia-the-lost-continent-of-realtime-communications and http://skypejournal.com/blog/2010/11/15/dear-john-and-tim-i-found-skypelandia/. I’ll be updating it this weekend.

The map is epic. Great work!

The map is not epic – it suffers from the same problems that 2.5D bar charts do – you cannot accurately compare values. Clustering them without a good baseline compounds the difficulties that users will have. Yeah, it looks cool, but does a poor job of conveying the message. Just use a bar chart. thanks, JP
