Building A New Map And I Need Your Help: What Are The Key Categories of Data In Today’s Network Economy?

Map 2010.png

Many of you probably remember the “Points of Control” Web 2 Summit Map from last year, which was very well received. Hundreds of thousands of folks came to check it out, and the average engagement time was north of six minutes per visitor. It was a really fun way to make the conference theme come to life, and given the work that went into its creation, we thought it’d be a shame to retire it simply because Web 2 has moved on to a new theme.

As I posted last week, this year’s theme is “The Data Frame.” From my updated verbiage describing the theme:

For 2011, our theme is “The Data Frame” – focusing on the impact of data in today’s networked economy. We live in a world clothed in data, and as we interact with it, we create more – data is not only the web’s core resource, it is at once both renewable and boundless.


Consumers now create and consume extraordinary amounts of data. Hundreds of millions of mobile phones weave infinite tapestries of data, in real time. Each purchase, search, status update, and check-in layers our world with more of it. How our industries respond to this opportunity will define not only success and failure in the networked economy, but also the future texture of our culture. And as we’re already seeing, these interactions raise complicated questions of consumer privacy, corporate trust, and our governments’ approach to balancing the two.

How, I wondered, might we update the Points of Control map such that it can express this theme? Well, first of all, it’s clear the game is still afoot between the major players. Some boundaries may have moved, and progress has been made (Bing has gained search share, Facebook and Google have moved into social commerce, etc.), but the map in essence is intact as a thought piece.

Then it struck me – each of the major players, and most of the upstarts, have as a core asset in their arsenals *data*, often many types of it. In addition, most of them covet data that they’ve either not got access to, or are in the process of building out (think Google in social, for example, or in deals, which to my mind is a major play for local as well as purchase data). Why not apply the “Data Frame” to the map itself, a lens of sorts that when overlaid upon the topography, shows the data assets and aspirations of each player?

So here’s where you come in. If we’re going to add a layer of data to each player on the map, the question becomes – what *kind* of data? And how should we visualize it? My initial thoughts on types of data hew somewhat to my post on the Database of Intentions, so that would include:

– Purchase Data (including credit card info)

– Search Data (query, path taken, history)

– Social Graph Data (identity, friend data)

– Interest Data (Likes, tweets, recommendations, links)

– Location Data (ambient as well as declared/checked in)

– Content Data (Journey through content, likes, engagement, “behavioral”)

Those are some of the big buckets. Clearly, we can debate if, for example, identity should be its own category, separate from social, etc, and that’s exactly the kind of argument I hope to spark. I’m sure I’ve missed huge swaths of landscape, but I’m writing this in a rush (have a meeting in five minutes!) and wanted to get the engine started, so to speak.
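
One way to picture the data layer is as a small record per player: the categories it already holds and the categories it covets. The sketch below is only an illustration under stated assumptions. The names `DataCategory`, `PlayerDataLayer`, and `players_coveting` are hypothetical, the category list simply mirrors the buckets above, and the single Google entry reflects the example mentioned earlier in this post, not any real inventory of its data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class DataCategory(Enum):
    """The six buckets proposed in the post (open to debate)."""
    PURCHASE = "Purchase Data"
    SEARCH = "Search Data"
    SOCIAL_GRAPH = "Social Graph Data"
    INTEREST = "Interest Data"
    LOCATION = "Location Data"
    CONTENT = "Content Data"


@dataclass
class PlayerDataLayer:
    """Hypothetical overlay for one player on the map."""
    name: str
    assets: Set[DataCategory] = field(default_factory=set)       # data the player holds
    aspirations: Set[DataCategory] = field(default_factory=set)  # data the player covets


def players_coveting(layers: List[PlayerDataLayer], category: DataCategory) -> List[str]:
    """Return the names of players that list `category` as an aspiration."""
    return sorted(p.name for p in layers if category in p.aspirations)


# Illustrative entry only, based on the Google example in the post.
LAYERS = [
    PlayerDataLayer(
        name="Google",
        assets={DataCategory.SEARCH},
        aspirations={
            DataCategory.SOCIAL_GRAPH,
            DataCategory.LOCATION,
            DataCategory.PURCHASE,
        },
    ),
]

print(players_coveting(LAYERS, DataCategory.SOCIAL_GRAPH))
```

Keeping assets and aspirations as separate sets means the same structure can drive two different overlays on the map, and adding or splitting a category (identity versus social, say) only touches the enum.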

I’m gathering a small group of industry folks at my home in the next week to further this debate, but I most certainly want to invite my closest collaborators, the readers here at Searchblog, to help us out as we build the next version of the map. Which, by the way, will be open sourced and ready for hacking….

So please dive into comments and tell me, what are the key categories of data that companies are looking to control?

27 thoughts on “Building A New Map And I Need Your Help: What Are The Key Categories of Data In Today’s Network Economy?”

  1. John the only deficiency I see in your categories is your parenthetical gloss on Content: (Journey through content, likes, engagement, “behavioral”) — this is all amplification stuff and so it’s rather redundant of your “Interest Data” category.

    To me, the Content Data category has to be “Aboutness”: getting at how we can carve content into pieces and see its various aspects and dimensions, and know how (and why and when) those pieces are relevant to other pieces of content (or aspects or dimensions thereof).

    People who own this, own a lot.

  2. I would offer the following ideas.

    Some data sets that come to mind that are building value as more apps and users interact with them. Some may fit into your categories above.

    International Development data
    Financial Transaction data
    Game Layer data

    Also, I would want to consider the data streams of the “network of things”. For example: home appliances and electric vehicles talking to utilities and negotiating electrical rates, home-based medical testing devices, implanted medical devices, apparel with bio-sensors talking with insurance companies, doctors and expert systems, and vehicle to vehicle to transportation network communications.

    Best,

    Paul Cline

  3. Hi John,
    I definitely agree with the suggestion above regarding the internet of things. The way we’re now using sensors within cities is quite fascinating. Streetline is setting up mesh networks within cities and capturing parking/traffic data using sensors…but this is only the beginning 😉 Here’s an interview I did with Mashable a few weeks ago: http://mashable.com/2011/04/13/smart-parking-tech/

    Definitely something I’d love to see addressed on the map and at Web 2.

    Best,
    Kelly Schwager, Streetline

  4. John,

    In addition to consumption, transaction and engagement, what about contribution? You touch on it with interest (likes, recommendations) and location (checked in) but do you need to include some measure of the ways in which users contribute to dialog by generating content such as reviews, uploading video, etc.?

    This type of data can contribute to finding out who the influencers are, which I think can be pretty important in the context you are describing. I hope I am not misinterpreting your intent or splitting too fine a hair here though.

    -Dan Lubart

  5. To extend Kelly’s thoughts, I would add sites that actually collect sensor data of all kinds; data logger sites like:
    * http://Pachube.com/
    * http://open.sen.se/
    * http://thingspeak.com/

    Great story about crowd sourcing radiation levels in Japan through DIY Geiger counters and data collected on Pachube.

    See also:
    Data Source Handbook – A Guide to Public Data
    http://oreilly.com/catalog/0636920018254

    Weatherbug is doing some crowd sourcing for what they’re calling hyper-local weather.

  6. Kelly, and Chris, I agree, but I’m trying to find high-level classes of data, and I’m not sure if “sensor” data is not TOO high level, but then “radiation levels” (for example) is far too granular….hmmm.

  7. John,

    The data frame is a great focus for the conference. I offer three data type comments/additions:

    Personally Identifiable Data – expanding definition is key in the global privacy debates and rash of new lawsuits

    Sensor Data – exploding with internet of things

    Meta Data – the data about each data type (including permissions, source,

  8. John, I’m glad to see this increasing attention to the import of data and metadata. Your emerging categories are sound. I suggest also including representations of how “open” and “structured” are these data sets. Open Linked Data http://linkeddata.org/ coupled with harmonized protocols for accessing and rendering them offers monumentally scalable opportunities for innovation and understanding.

    The “Points of Control” corollary map for the Linked Data frame has been the Linking Open Data cloud diagram http://richard.cyganiak.de/2007/10/lod/ It presents datasets that have been published in Linked Data format, by contributors to the Linking Open Data community project and other individuals and organizations. It is based on metadata collected and curated by contributors to the CKAN directory.

    With Facebook experimenting with partial implementations of RDFa in their Open Graph protocol and with Twitter still considering something similar with Twitter Annotations, large-scale social graph data clusters are moving towards the threshold of useful Linked Data exploitations.

    I can imagine map variations of Points of Control representations that position companies and their data sets by degrees of openness (to all or dynamic partners) and how closely they hew to structured data standards.

  9. I think “Interest Data” is the richest from your list. A better label for it might be “intention”, or keyword #VRM (the idea, even if people have no idea what it means yet, it’s an important flipside to marketing).

    I think “interest” is pre-search and pre-purchase, whereas “intention” could be derived (as a facet) from search, purchase, social, location.

    Anyhow, got me thinking for sure. I am esp. interested in how SMBs understand/act on the map, that’s what we are focused on at Needium.

  10. As an extension of Purchasing Data I would suggest Purchasing Intent information (which obviously intersects nicely with your original Database of Intentions), but maintained in a structured format. For example, at a minimum every buyer knows at least three data elements when they are in acquisition mode:

    (1) What item they intend to purchase
    (2) Which brands are candidates for acquiring (‘consideration set’)
    (3) When they plan to make the purchase.

    Assuming there existed a mechanism (with appropriate privacy controls) where buyers volunteered this triplet in exchange for compelling rewards, any permutation of the above three items is clearly valuable to sellers.
    Starting with the first, the value of the data items accumulates sequentially: the first two items are more valuable than the first item alone and, of course, all three items together are most valuable.

    In fact, the above scenario could readily constitute a viable form of Advertising-by-Invitation. Once sellers have access to such buyer provided information then they could deliver relevant marketing messages (versus intrusive and interruptive advertisement) resulting in a markedly more efficient sales transaction.

    Note that the above scenario is a step beyond Google’s Database of Intentions, since buyer-volunteered purchasing data is inherently more accurate (and correspondingly more valuable) than Google’s algorithmic approach.

    Finally, considering that buyer’s Purchasing Intent is organic, constantly re-generated, and thus effectively limitless, it would feed an immense data reservoir, thereby deserving a significant chunk of mass on your Data Map. In fact, buyer’s Purchasing Intent may be the most valuable piece of untapped wealth in the history of commerce.
    Nathan Schor nathans@netmeals.net

  11. I don’t think it is enough to just capture information as FB does about “Like” a track or an artist. You need more dimensions/vectors on that data.

    Some dimensions you need:
    – Frequency
    How many times have I listened to this artist/album/track? How many times in the last week? in the last month? in the last year? all time?

    – Passion (strength of sentiment)
    How much do I love/hate this track (Last.fm loves/bans, Pandora’s up/down thumbs)

    – location/time/method of access
    Did I play this at home at 9pm on my Sonos box? On vacation using my iphone? At work at 10am on my laptop via Spotify?

    Another important question is the way it was collected. Was it automatically collected (like scrobbles on Last.fm or plays on iTunes), or was it collected when a user explicitly “published” to a social network (tweeted it or published a link to it on Tumblr)? The two have different values and meanings.

    Finally, tying music back to live vs recorded experiences has value. So tying live event check-ins for an artist or band you love adds even more value to your data around that artist.

  12. Whoops. I left the first sentence off that post. I meant to contextualize the vectors or dimensions of data that you need to capture on the Interest Graph to make it useful. And since I know the music space best, I used music interest data as the example.

  13. Hi,

    Since I have nothing more to share as they had been shared by others at the TOP, I think I’ll just watch out for its availability. It’s such a nice idea to collect categories from suggestions of friends here.

  14. John,

    Great post.. You’re right, adding a data layer to the “Points of Control” map seems like a natural next step.

    We’ve been looking at data for quite some time here at Infochimps, and the groups you’ve laid out are almost identical to the way we see the world.

    Identity & Social Graph, however, doesn’t seem like its own continent. Rather, it seems more appropriate to consider it a subterranean level of the world underneath all of the data continents.

    For example, location data is a continent populated by players like Foursquare, Yelp, Facebook, Twitter, etc. So, at the surface, aggregated data can provide excellent insights without compromising privacy. But, if you dig deeper (mine it, if you will), then it’s possible to get the social data. But, of course, digging deeper is messy and carries its own set of dangers.

    I feel like all of your data continents that you’ve proposed are great, but they all seemed to be tinged with the specter of individual data. Let’s not forget that the “Database of Intentions” still works quite well even if we don’t dig down into personally identifiable keys.

    In any case, for your map metaphor here, I suppose we at Infochimps are building bridges between continents and also running a commercial shipping fleet that transports data all around the world. There’s a lot of data to move around, and we’re having a great time building the infrastructure to support that need.

    Cheers,
    dennis.

  15. For what it’s worth, here are the data types that we have the most interest in (speaking as someone that sells data to 100s of customers / 1000s of users):

    1. Location Data
    2. Social / Identity Data
    3. Product Data
    4. Review Data
    5. Real Estate Data
    6. News / Blog Data
    7. Travel Data

    The main issue I have with the approach being taken here is that it seems to come from the perspective of someone producing data, rather than those consuming data.

    As far as the question of interest graph, etc.. so far we’ve seen the above 7 types as the main base and most everything else is derived.

  16. I think your main categories cover all the data layers but I would reorganize them in different buckets.

    To start, I see 5 layers that define the “behavioral / segment” of a user
    – Transactional
    – Search
    – Social network
    – Engagement (reviews, check-in)
    – Demographic

    Then you have on their own
    – Location
    – Products / Services
    – Knowledge
    – Events

    Of course, the way we perceive data layers is strongly influenced by the nature of our respective industry.

  17. Energy data! There are heaps of interesting things happening these days around the smart grid and also building benchmarking, monitoring of distributed generation projects, demand response etc. This will be one of the most important data buckets for the health and well being of our planet in the near-term future.

  18. For what it’s worth, here are the data types that we have the most interest in (speaking as someone that sells data to 100s of customers / 1000s of users):
    1. Location Data
    2. Social / Identity Data
    3. Product Data
    4. Review Data
    5. Real Estate Data
    6. News / Blog Data
    7. Travel Data
    The main issue I have with the approach being taken here is that it seems to come from the perspective of someone producing data, rather than those consuming data.
    As far as the question of interest graph, etc.. so far we’ve seen the above 7 types as the main base and most everything else is derived.
    ————
    “Take your stinking paws off me, you damned dirty ape!”

    My data is MY DATA.

  19. From my POV, measuring “time” data is essential. The new delivery tools (smartphones, tablets, Twitter) have streamlined our lives. We now spend less time performing cumbersome activities (navigating), and we’re now subscribed to platforms. So where have we shifted that time? I think the battleground is for people’s time; everything else converts from there.

    Just a thought…

  20. John, thanks for the opportunity to contribute.

    I see significant value associated with Conversation Data in both the consumer and enterprise space.

    For me, elements of conversation data include things like:
    – conversing parties
    – converse time / date / duration
    – conversation mode – voice, video, text
    – conversation threading
    – conversation followers
    – conversation lifecycle
    – conversation context
    – conversation graph

  21. Web 2.0 was all about making the web a bidirectional engagement medium, and all this engagement resulted in an incredible accumulation of data from users and about users, and as John points out, we now need to classify and organize all this data to make use of it more easily.

    In addition to all this consumer-centric data, let’s not forget that there are lots of additional classes of data that have moved onto the web – for example, government data, public records, prices – data that was previously locked in databases and file systems behind firewalls. I expect that being able to access and normalize these additional classes of data will be a key ingredient for additional insights by correlating it with all this not-previously-available consumer-centric data that John is discussing in this blog post.

    Unexpected correlations between disparate data sets lead to unexpected insights!

    Timo Kissel
    CTO
    Fetch Technologies

  22. John, not sure how this fits, but something of interest for the map’s readers may be the “flow or currents” by which data can be influenced (after all, the visual nature of the ocean lends itself to the analogy), specifically as it relates to use cases or research showing the Push-Pull breakdown of Search to Content to Interest to Transaction data, and how those layers or currents support the work of those involved with the Loyalty Loop. See http://socialcommercetoday.com/speed-summary-hbr-on-social-media-new-rules-of-branding/
    Thanks for the efforts and I look forward to seeing the outcome…
