Twitter and the Ultimate Algorithm: Signal Over Noise (With Major Business Model Implications)

Note: I wrote this post without contacting anyone at Twitter. I do know a lot of folks there, and as regular readers know, have a lot of respect for them and the company. But I wanted to write this as a “Thinking Out Loud” post, rather than a reported article. There’s a big difference – in this piece, I am positing an idea. It’s entirely possible my lack of reporting will make me look like an uninformed boob. In the reported piece I’d posit the idea privately, get a response, and then report what I was told. Given I’m supposedly on a break this week, and I’ve wanted to get this idea out there for some time, I figured I’d just do so. I honestly have no idea if Twitter is actually working on the ideas I posit below. If you have more knowledge than me, please post in the comments, or ping me privately. Thanks!

[Image: twitter issue.png]

---

I find Twitter to be one of the most interesting companies in our industry, and not simply because of its meteoric growth, celebrity usage, founder drama, or mind-blowing financings. To me what makes Twitter fascinating is the data the company sits atop, and the dramatic tension of whether the company can figure out how to leverage that data in a way that will ensure it a place in the pantheon of long-term winners – companies like Microsoft, Google, and Facebook. I don’t have enough knowledge to make that call, but I can say this: Twitter certainly has a good shot at it.

My goal in this post is to outline what I see as the biggest challenge/opportunity in the company’s path. And to my mind, it comes down to this: Can Twitter solve its signal to noise problem?

Many observers have commented on how noisy Twitter is: once you follow more than about fifty or so folks, your feed becomes unmanageable. If you follow hundreds, like I do, it’s simply impossible to extract value from your stream in any structured or consistent fashion (see image from my stream at left). Twitter’s answer to this issue has been anemic. One product manager even insisted that your Twitter feed should be viewed as a stream you dip into from time to time, using it as a thirsty person might use a nearby water source. I disagree entirely. I have chosen nearly 1,000 folks who I feel are interesting enough to follow. On average, my feed gets a few hundred new tweets every ten minutes. No way can I make sense of that unassisted. But I know there’s great stuff in there, if only the service could surface it in a way that made sense to me.

You know – in a way that feels magic, the way Google was the first time I used it.

I want Twitter to figure out how to present that stream in a way that adds value to my life. It’s about the visual display of information, sure, but it’s more than that. It requires some Really F*ing Hard Math, crossed with some Really Really Hard Semantic Search, mixed with more Super Ridiculous Difficult Math. Because we’re talking about some super big numbers here: 200 million tweets a day across hundreds of millions of accounts. And that’s growing bigger by the hour.

A mini industry has evolved to address this issue – I use News.me, Paper.li, TweetDeck (recently purchased by Twitter), Percolate and others, but the truth is, they are not fully integrated, systemic solutions to the problem. Only Twitter has access to all of Twitter. Only Twitter can see the patterns of usage and interest and turn meaningful insights and connections into algorithms which feed the entire service. In short, it’s Twitter that has to address this problem. Because, of course, this is not just Twitter’s great problem, it is also Twitter’s great opportunity.

Why? Because if Twitter can provide me a tool that makes my feed really valuable, imagine what it can do for advertisers. As with every major player that has scaled to the land of long-term platform winners (as I said, Google, Microsoft, Facebook), product comes first, and business model follows naturally (with Microsoft, the model was software sales of its OS and apps, not advertising).

If Twitter can assign a rank, a bit of context, a “place in the world” for every Tweet as it relates to every other Tweet and to every account on Twitter, well, it can do the same job for every possible advertiser on the planet, as they relate to those Tweets, those accounts, and whatever messaging the advertiser might have to offer. In short, if Twitter can solve its signal-to-noise problem, it will also solve its revenue scale problem. It will have built the foundation for a real-time “TweetWords” – an auction-driven marketplace where advertisers can bid across those hundreds of millions of tweets for the right to position relevant messaging in real time. If this sounds familiar, it should – this is essentially what Google did when it first cracked truly relevant search, and then tied it to AdWords.

Now, I do know that Twitter sees this issue as core to its future, and that it’s madly working on solving it. What I don’t know is how the company is attacking the problem, whether it has the right people to succeed, and, honestly, whether the problem is even soluble regardless of all those variables. After all, Google solved the problem, in part, by using the web’s database of words as commodity fodder, and its graph of links as a guide to value. Tweets are more than words: they carry sentiment and semantics, and they have a far shorter shelf life (and far less structure) than an HTML document.

In short, it’s a really, really, really hard problem. But it’s a terribly exciting one. If Twitter is going to succeed at scale, it has to totally reinvent search, in real time, with algorithms that understand (or at least replicate patterns of) human meaning. It then has to take that work and productize it in real time to its hundreds of millions of users (because while the core problem/opportunity behind Twitter is search, the product is not a search product per se. It’s a media product.)

To my mind, that’s just a very cool problem on which to work. But I sense that Twitter has the solution to the problem within its grasp. One way to help solve it is to throw open the doors to its data, and let the developer community help (a recent move seems to point in that direction). That might prove too dangerous (it’s not like Google is letting anyone know how it ranks pages). But it could help in certain ways.

Earlier in the week I was on the phone with someone who works very closely in this field (search, large scale ad monetization, media), and he said this of Twitter: “There’s definitely a $100 billion company in there.”

The question is, can it be built?

What do you think? Am I off the reservation here? And who do you know who’s working on this?

42 thoughts on “Twitter and the Ultimate Algorithm: Signal Over Noise (With Major Business Model Implications)”

  1. John,

    There are so many opportunities to improve the signal-to-noise ratio from large, unstructured data streams; e.g., better filters, personal curation, algorithmic relevance, etc.

    At TrendSpottr (http://trendspottr.com) we focus on two aspects of improving signal-to-noise ratio: timeliness and relevance of data. Our algorithms are tuned to identify trends and trending information from Twitter, Facebook and other real-time data streams at their point of acceleration; i.e., before they have become “popular”. As a result, we provide early and predictive insights about the information that is trending now on the social Web and is expected to continue to accelerate and gain general awareness.

    The ability to turn a chronological firehose of data into a relevant stream of insightful & predictive information provides advertisers, brands, journalists, daytraders and others the tools to proactively prepare for and predict market and social behaviors, consumer preferences, impending crises and major trending events.

    We’ll soon be announcing integrations with large media and social analytics companies, which will use our API to offer real-time trending products and services to their customers.

    At the end of the day, there are various opportunities and solutions to improve signal-to-noise on the social Web. In fact, one person’s signal may be another’s noise. We offer one approach; there are many others, like DataSift, Siftee, Summify and KnowAboutIt.
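The idea of catching a topic "at its point of acceleration" can be illustrated with a very small sketch. This is not TrendSpottr's algorithm, which the comment does not describe; it simply assumes each topic has a list of mention counts per fixed time interval and uses the second difference of the last three counts as a crude acceleration score. The function names and the `min_acceleration` threshold are hypothetical.

```python
def acceleration(counts):
    """Second difference of the last three per-interval mention counts.

    Positive values mean the growth in mentions is itself growing.
    """
    if len(counts) < 3:
        return 0
    return counts[-1] - 2 * counts[-2] + counts[-3]


def accelerating_topics(series, min_acceleration=1):
    """Return (topic, score) pairs whose score meets the threshold, highest first.

    `series` maps a topic to its mention counts per interval (assumed shape).
    """
    scored = {topic: acceleration(counts) for topic, counts in series.items()}
    hits = [(topic, score) for topic, score in scored.items()
            if score >= min_acceleration]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A topic that is already popular but flat scores zero here, which is the point of the distinction the comment draws between "trending" and merely "popular".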

  2. John,

    Check out the demo I sent you the other day. We try to extract a lot of that value with Roundtable. Our prototype is going to be ready on Monday and we’re meeting with DFJ Gotham, AOL Ventures, and Lerer next week.

    Would love for you to be one of the few thought leaders chosen as one of our beta adopters!

  3. John,

    Check out the demo I sent you the other day (Roundtable). Our product extracts Twitter’s best “signals”.

    We would love for you to be one of the few Beta testers when our prototype is completed next week.

  4. John,

    By the way, here’s an example of how we can improve signal-to-noise using your “Digital Marketers” curated Twitter list. As you know, Twitter lists are already an attempt to improve signal-to-noise ratio. However, a large list is still very noisy and is essentially a chronological stream without any relevance.

    Compare your Twitter list (http://twitter.com/#!/johnbattelle/digital-marketers) on Twitter with the same list on TrendSpottr (http://t.co/LtWpf1O).

    Would love your feedback.

    – Mark

  5. Eh, I dunno. I hate when an algorithm hides what it’s doing. Bad mental model. Facebook tries to surface most relevant stuff. I hate their stream.

    In fact, it was much better when they let you control it somewhat. “More stories like this”. “Less from this user”. etc..

    Twitter’s success stems from its simplicity and you want them to get super complicated with crazy math.

    Why not start with something more simple? The bread and butter of Twitter are the links. Some links trend. Twitter knows who posted a link first. How many of your followers posted a link? Retweeted it?

    Have a tab for links posted by folks you follow. Ordered descending by when the link was first posted.

    The look should be similar to RTs, with a number and the names of a few of your friends in order of importance and a “more” button.
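The links tab proposed in the comment above can be sketched in a few lines. This is a minimal illustration, not Twitter's API: it assumes tweets arrive as hypothetical `(author, url, posted_at)` tuples, keeps only links shared by accounts you follow, and orders them by when each link was first posted, newest first. Ranking friends "in order of importance" is left out; posters are kept in the order they were seen.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LinkEntry:
    url: str
    first_posted: datetime
    posters: list = field(default_factory=list)  # followed accounts, in order seen


def build_links_tab(tweets, following):
    """Group tweets by link, keeping only links shared by accounts in `following`.

    `tweets` is an iterable of (author, url, posted_at) tuples, an assumed shape.
    """
    entries = {}
    for author, url, posted_at in tweets:
        if author not in following:
            continue
        entry = entries.get(url)
        if entry is None:
            entry = entries[url] = LinkEntry(url, posted_at)
        # Track the earliest time any followed account posted this link.
        entry.first_posted = min(entry.first_posted, posted_at)
        if author not in entry.posters:
            entry.posters.append(author)
    # Most recently first-posted link at the top, as the comment suggests.
    return sorted(entries.values(), key=lambda e: e.first_posted, reverse=True)
```

The length of `posters` gives the count to show next to each link, and the first few names can be displayed with a "more" button for the rest.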

  6. John, You are right in that we need to separate the wheat from the chaff, but the solution is not via search. One man’s wheat is another man’s chaff, and we need to cater to different interest groups. The solution here is indeed via information filtering and categorization – exactly what we are working on at Evri. It’s not just about who you know (or equivalently, the social media or the news sources you follow) spouting info on a variety of topics; rather, it’s about intelligently classifying items into topic buckets of the right granularity, and either pushing them to people or letting them self-select the topics they want to follow. So you follow topics, and not only people/sources; you discover new and related topics as you browse stories in these topic areas, so your interest graph evolves with time and your changing interests. A post from yesterday by Evri’s VP of Product, Adrian Klein, “The Interest Graph: Smart Filters for Social Media”, discusses exactly these ideas. Fortunately, Twitter is cultivating a platform that lets partners innovate in this space as well. Just like Summize tackled the problem of search while Twitter focused on core stability and infrastructure issues, the opportunity is there today for other companies, like ours, to tackle the noise/filter problem.

  7. Take faith from their acquisition of BackType. Those guys have been thinking about real-time and conversational search and working on data-at-scale for years. (The press seemed to forget pretty quickly that these guys built and were open-sourcing Storm before Twitter bought them, too.)

  8. “Only Twitter has access to all of Twitter” –> well that’s not accurate, is it?

    Twitter’s firehose is licensed out to at least 10 publicly disclosed companies (my former employer Kosmix being one of them, and Google and Bing being others), and presumably more people now have their hands on it. Of course, those companies don’t see user passwords, but they have access to just about every other piece of data and can build, from a systems standpoint, just about everything Twitter can or could.

    No?

  9. I think there might be a big conceptual error in what you are considering here. Twitter is a thing that’s not like the others. You are wanting it to take on the characteristics of the others, but why? Why don’t you just use the others?

    To demonstrate this, take all our current communication options and throw them all on the table, sort them by their advantages and disadvantages, and see what you’ve got. There are advantages to each, and changing Twitter would likely turn it into something that already exists.

    Twitter is constrained by extremely short length communication that breeds noise and only has enough room to REFER to outside things that are meaningful, and of a longer, more thoughtful length. Even the iPhone gives you unlimited length text messages.

    Using algorithms to compensate for a fundamental characteristic of Twitter is probably treating the symptoms and not the cause.

    Consider what algorithms, and the gaming of them has done to Google search, which is bordering on useless now. How often do you search for things and the second entry is something from 2002? You need to watch http://www.thefilterbubble.com/ted-talk to see what algorithms will do to Twitter.

    Finally, why don’t you ask this question on Twitter?

    Exactly.

    Mark Hernandez
    The Information Workshop

  10. Hey John,

    Check out our startup The Shared Web (we’re a TechStars-funded company, with a great group of investors). We’re still early in our product iteration, and we’re only using Twitter as a way to ‘kickstart’ content and community on our site.

    We do a pretty decent job of adding some value on top of twitter (with aggregation of tweets from people you follow, and ranking based on your interests, along with human powered categorization of the items that come in).

    Our vision is for this to become a sort of Reddit/HackerNews with your real identities, and with content in any topic you’re interested in! Read your blog regularly, and would love to chat if you want to know more:

    founders @ thesharedweb.com

  12. Agreed with you there is an opportunity. Not just for current users as your blog points to, but for getting new users. The issue with Twitter is that “tuning in” is insanely hard for the average user.

    Finding “channels” …

    for shared interests (i.e. sustainability) domains (say, business) or categories to watch (i.e. technology) or personal passions (i.e. guitar players) …

    all require way too much energy for the average newbie. I posit that they leave or they use less than potential, which leaves twitter vulnerable.

    If they can use the data to create better “channels”, then their usage will go up, which will in turn add more value, which will perpetuate its dominance, and data, and so on.

    It’s a gap — more like a gaping hole — that unless fixed leaves an opportunity for other social networks and Twitter of course.

  13. Hi John,

    I was glued to my iPhone reading this article. I think what you say is 100% on the mark – Twitter has a big opportunity here and it is a really, really hard problem to solve. I think one thing Twitter has is a third-party army at its disposal. It could be a blessing or a curse, but it’s the route they have taken, so I hope they will embrace this community as their way forward. It hasn’t always been an easy road as a third-party company that uses the Twitter API, but I think @rsarver is making a real effort to keep developers excited about moving Twitter forward.

    -Tammy, CEO @MarketMeSuite

  14. Firstly, I despair that Twitter can do anything better than me, in terms of what I want. Twitter reminds me of MS Windows, desperately seeking self-justification through adding value, though actually making things infinitely more complex by trying to simplify them for users.

    What we need are APIs, and of all the services mentioned here, it seems only DataSift is promoting their API. Further, I’ve lately been looking for a cloud-hosted machine learning classification API in order to cluster tweets by similarity, and therefore filter them on similarity. On @Quora: Are there any cloud-based APIs for clustering Tweets? Answer: http://qr.ae/7nuLP

    – Marcus Endicott http://meta-guide.com
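Clustering tweets by similarity, as the comment above asks for, does not strictly require a hosted machine-learning service to prototype. The sketch below is an assumption-laden illustration rather than any particular API: it treats each tweet as a set of lowercase tokens, measures Jaccard similarity between token sets, and greedily assigns each tweet to the first cluster whose seed tweet is similar enough. The function names and the `threshold` value are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word, hashtag and mention tokens as a set."""
    return set(re.findall(r"[a-z0-9#@]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_tweets(tweets, threshold=0.5):
    """Greedy single-pass clustering.

    Each tweet joins the first cluster whose seed tweet has Jaccard
    similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_tokens, members)
    for tweet in tweets:
        toks = tokens(tweet)
        for seed, members in clusters:
            if jaccard(toks, seed) >= threshold:
                members.append(tweet)
                break
        else:
            clusters.append((toks, [tweet]))
    return [members for _, members in clusters]
```

Showing one representative per cluster is then a simple way to filter near-duplicate tweets out of a stream.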

  15. “whether the company can figure out how to leverage that data in a way that will insure it a place in the pantheon of long-term winners – companies like Microsoft, Google, and Facebook.”

    I don’t think you should include Facebook in the ‘pantheon’ just yet. Clearly their best days seem to lie ahead, but they are in the same position as Twitter. Facebook needs to solve for FaceWords just as much as Twitter needs TweetWords. As of now, Facebook is a roughly $3b revenue company with 700m uniques, which makes it more like Yahoo! ($4b revenue company with 650m uniques) than Google.

    I am interested to see how both companies leverage their data. More interesting is who has more monetizable data? Facebook’s social graph or Twitter’s intent graph?

  16. Even a simple filter can make a big difference to the usability of Twitter. I built a prototype Bayesian filter for Seesmic and it worked quite well for improving the signal to noise ratio: http://bit.ly/bfX12z
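For readers unfamiliar with the technique, here is a minimal sketch of what a Bayesian tweet filter can look like. It is not the Seesmic prototype mentioned above, whose details are not given; it is a generic naive Bayes classifier with add-one smoothing, trained on tweets the user has labeled "signal" or "noise". The class name, labels, and tokenizer are assumptions made for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesTweetFilter:
    """Two-class naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"signal": Counter(), "noise": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9#@']+", text.lower())

    def train(self, text, label):
        """Record one labeled tweet; `label` is "signal" or "noise"."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def log_score(self, text, label):
        """Log prior plus summed log likelihoods of the tweet's words."""
        vocab = set(self.word_counts["signal"]) | set(self.word_counts["noise"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            # The extra +1 in the denominator reserves mass for unseen words.
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def is_signal(self, text):
        return self.log_score(text, "signal") > self.log_score(text, "noise")
```

In practice the labels could come from explicit feedback of the kind an earlier comment asks for ("more stories like this", "less from this user"), which also keeps the filter's behavior visible to the user.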

  17. Well said, and totally agreed. I commented on Fred Wilson’s blog last week that Twitter has to move beyond “the tweet” as the main unit of value. They are sitting on a ton of knowledge that’s hard to process already.

    I think it’s wrong for Twitter to outsource the tweet analysis to Gnip, DataSift et al. That’s their core.

    Twitter has shown a total lack of imagination in innovation. They have not been able to manage growth AND innovation at the same time. Here is my reaction to the Adam Lashinsky interview with Dick Costolo, as it dovetails well with your analysis.

    1) Twitter has to grow beyond “the tweet”. I’m worried that they may be too focused on Ad sales as the key revenue element. Costolo said “the Ad platform is organic to the Twitter platform”, as they are focused on selling Ads to the world’s top brands where they charge by “engagement” (which is innovative). But these Promoted tweets and Promoted trends seem like the old school of ‘interruption marketing’. There must be monetizable value beyond “the tweet”.

    2) The eco-system positioning is still murky. Dick was hard pressed to find more than 2 examples of eco-system added-value: CRM & Analytics, and he repeated these examples. Twitter needs to describe a more rich and inspiring eco system. They possess the most intelligence on what users are asking and they know where the holes are from the inside. Why don’t they lay out a great vision for it, instead of leaving us to guess what the next move might be?

    3) Great motto: “The World in your pocket”, but I find it hard to get it.
    Costolo said they are true to their original mission of “The World in your pocket”. That’s a great vision for the end-user, but it doesn’t seem to jibe with their focus on ad sales & gamed tweets. Where is that easy-to-use client App that gives me that?

    4) They need marketing beyond the brand marketing. Costolo said they’re focused on hiring a senior marketing person who will focus on managing/curating the Twitter brand. OK, but I’m hoping for more than that. They definitely need some marketing muscle to explain who/what they are.

    It’s not easy to express the value of a multi-purpose platform as complex and as diversified as Twitter, which is simultaneously a communications medium, a social/online network, a development platform, a broadcast channel, a listening post, a knowledge repository…and an Ad network.

    We all want Twitter to step-up.

  18. I am honored by the thoughtfulness of all the comments here, and will respond with another post, as I don’t have a decent commenting system in place that allows all of you to see I am writing back to you. I’m putting in Disqus soon, but I’ll respond in a post, so stay tuned. Thanks all of you so far, and all to come, should you feel inspired to post more.

  19. I am definitely looking forward to this next phase of the Twitter platform, but I would say that Twitter probably needs more data to figure out user intent.

    Consider the more casual Twitter user, who doesn’t post or RT so much as sift for signal. How would Twitter provide a better experience for those users?

  20. John, The brands are struggling with the same issue and finding relevance in the cacophony of NOISE is the money maker. I’m working on a similar post now in the same theme. Your panel

    The signal-to-noise ratio is a problem and few can scale to do it well for brands. Taking trending information from Twitter, Facebook and other real-time data streams is not new at all. What you do with it once you find it is key. There are MANY sites doing the simple curation filter interfaces; see “Winning Curation Strategies and Brands Becoming Publishers”. Scoop.it http://bit.ly/jxHQmg is doing a good job on this top-level filtering.

    Our spot on SocialTimes http://bit.ly/mgjrv6, “5 Ways Curation and Content Automation Increases Engagement,” really talks to what you have here. I love Jason’s statement “More interesting is who has more monetizable data? Facebook’s social graph or Twitter’s intent graph?” The brands we work with are stuck in the middle and don’t really give a damn; they just want to increase engagement with their consumers to drive sales at the end of the day.

    This post “Behavioral Analytics using Facebook” http://bit.ly/qut5Hh by Marshall Sponder author of “Social Media Analytics” really nailed a key element in the value of behavior. Will follow up.

    @chasemcmichael

  21. Nick Bilton talks about this in his new book “I live in the future & here’s how it works”. But he appears to have the signal-to-noise ratio working well for him. I’m glad to read your blog, because I have the same problem you do. I’ve “solved” it by reducing the number of people I follow, but that is its own problem. I don’t find TweetDeck helpful – how does it help you?

  22. Great post John! It would be wonderful if Twitter could solve this problem. In the meantime, I manage it personally by assigning most of the people I follow to a list. And I even list some people that I don’t follow. I then pull those lists into my Hootsuite account and 9 times out of 10, if I see your tweet, it’s because I was reading the list feed through my Hootsuite App. This is how I keep things straight. I do this with hashtags I regularly follow as well. It’s the only way I can break through the noise and it works pretty well for me.

  23. Great stuff here…I’ve been thinking about and working on this problem for a fairly long time myself so I can attest to the challenge that lies within 😉

    One of the problems I see is not just within the data though…there is a strong interface (see my friendstat.us project as one example of playing around with that problem) and intent (see my knowabout.it project as an example here) problem around all of this that also needs to be addressed.

    Different people think of Twitter (and social data in general) in different ways…and use each system in different and personal ways. Many just use Twitter as a conversation tool, many just as a promotional tool (mini-blog), and many others use it to discover new people, ideas, and view points.

    And still many more use it for other things (that I’m sure I’m not even aware of).

    So the challenge Twitter itself faces first, is one of true definition…is it a protocol that lies behind social communication or is it a product itself…and if it’s a product, what actual product is it? (search, conversation, marketing, etc.)

    Without clear answers on these things from Twitter itself, the rest of us are forced to come up with our own personal answers and use cases…and try to figure out our own ways to deal with the ‘noise’.

    To be completely honest with you, Twitter’s noise at the moment is one of the very things that makes it so interesting and powerful (to me)…it’s the wild west of social data right now…and that’s a good thing!

  24. What Twitter is good at is providing serendipity in my information stream because of its human element. Algorithm-driven recommenders and rankers kill this, as they cannot provide it (yet). To me the answer is to reduce the number of people in your Twitter stream, going for quality rather than quantity there, and to use RSS to go for quantity.

  25. John –

    I think one way to break down the problem is to start with what your objective is in terms of the information you’re consuming and what “signal” you’re trying to extract from the noise.

    Conceptually, and as it’s often been said, there are two high-level types of media consumption: grazing and hunting/gathering. When grazing, most people probably don’t necessarily want signal. Rather, they want serendipity or diversity. Put simply, they want entertainment. They’re killing time, and what interests one person might be very different from what interests the next. Heck, what might interest the same person from one moment to the next might be totally different. In this case, the Twitter Product Manager’s metaphor of dipping your toe into a stream is probably pretty apropos. On the other hand, when you’re hunting, you’re on a mission. In this case, you do want to extract as much signal from the noise as possible. There are certainly lots of variables, semantic, contextual and other, which could help present more meaning from the cacophony which is the Twitter stream, even when only following a few people who are active tweeters. No doubt, semantic analysis and other algorithms can help thread and make the information more relevant and meaningful. But as you suggest, that’s a pretty daunting challenge. And, it’s especially hard to do it in a vacuum and without knowing what you’re hunting for. It’s sort of like trying to answer every question before it’s been asked and then predict what question you’re interested in asking and present you with the answer. I’d posit that’s an unattainable objective.

    In order for Twitter to “figure out how to present that stream in a way that adds value to [your] life” as you say, you need to be part of the hunt. And, you probably need to be an active part of it, not just passive. I’d propose two simple heuristics which might actually help: first, help people pick targets and provide input. Most everyone does this all the time on more traditional media outlets. We set up categories, topic alerts, keywords, favorites, etc. In a word, we search. Second, make it easier to filter the people you’re following. As they say, if you’re hunting for bison, go to the plains; if you’re looking to gather berries, look in the brush.

    We’re trying to take a simple step forward on this last point with Poptuit (if you’re interested, you can check out some coverage here http://t.co/BS34wJ9). Of those 1,000 people you’re following, do you really care about what all of them are saying … all of the time … and on every subject? I’d guess probably not. When a specific topic comes to mind, the first filter most people apply is who — Who can/will help me? Who’s an expert? Who do I trust? After the who, we usually then go to the what/when/why questions. When I wanted to find a good bistro to take a client to on my recent trip to NY, I knew exactly which friends to ask — a few live in NYC and one is an incredible foodie that always knows good places for all occasions anywhere and everywhere. But, when I got stuck helping my son fix the clutch on his dirt bike this weekend, I asked someone entirely different. Not rocket science and maybe it wasn’t perfect, but both queries did the trick. Making it easier, faster and more intuitive to filter our ever-growing social networks with our personal networks could be a first, simple step towards getting more signal from the noise in Twitter.
