In the comments on this previous post, I promised I’d respond with another post, as my commenting system is archaic (something I’m fixing soon). The comments were varied and interesting, and fell into a few buckets. I also have a few more of my own thoughts to toss out there, given what I’ve heard from you all, as well as some thinking I’ve done in the past day or so.
First, a few of my own thoughts. I wrote the post quickly, but have been thinking about the signal to noise problem, and how solving it addresses Twitter’s advertising scale issues, for a long, long time. More than a year, in fact. I’m not sure why I finally got around to writing that piece on Friday, but I’m glad I did.
What I didn’t get into is some details about how massive the solving of this problem really is. Twitter is more than the sum of its 200 million tweets, it’s also a massive consumer of the web itself. Many of those tweets have within them URLs pointing to the “rest of the web” (an old figure put the percent at 25, I’d wager it’s higher now). Even if it were just 25%, that’s 50 million URLs a day to process, and growing. It’s a very important signal, but it means that Twitter is, in essence, also a web search engine, a directory, and a massive discovery engine. It’s not trivial to unpack, dedupe, analyze, contextualize, crawl, and digest 50 million URLs a day. But if Twitter is going to really exploit its potential, that’s exactly what it has to do.
The same is true of Twitter’s semantic challenge/opportunity. As I said in my last post, tweets express meaning. It’s not enough to “crawl” tweets for keywords and associate them with other related tweets. The point is to associate them based on meaning, intent, semantics, and – this is important – narrative continuity over time. No one that I know of does this at scale, yet. Twitter can and should.
Which gets me to all of your comments. I heard both in the written comments, on Twitter, and in extensive emails offline, from developers who are working on parts of the problems/opportunities I outlined in my initial post. And it’s true, there’s really quite a robust ecosystem out there. Trendspottr, OneRiot, Roundtable, Percolate, Evri, InfiniGraph, The Shared Web, Seesmic, Scoopit, Kosmix, Summify, and many others were mentioned to me. I am sure there are many more. But while I am certain Twitter not only benefits from its ecosystem of developers, it actually *needs* them, I am not so sure any of them can or should solve this core issue for the company.
Several commentators noted, as did Suamil, “Twitter’s firehose is licensed out to at least publicly disclosed 10 companies (my former employer Kosmix being one of them and Google/Bing being the others) and presumably now more people have their hands on it. Of course, those cos don’t see user passwords but have access to just about every other piece of data and can build, from a systems standpoint, just about everything Twitter can/could. No?”
Well, in fact, I don’t know about that. For one, I’m pretty sure Twitter isn’t going to export the growing database around how its advertising system interacts with the rest of Twitter, right? On “everything else,” I’d like to know for certain, but it strikes me that there’s got to be more data that Twitter holds back from the firehose. Data about the data, for example. I’m not sure, and I’d love a clear answer. Anyone have one? I suppose at this point I could ask the company….I’ll let you know if I find out anything. Let me know the same. And thanks for reading.
3 thoughts on “More on Twitter’s Great Opportunity/Problem”
John, This is a great opportunity to bring all these companies together. I sent you a link regarding InfiniGraph that provides more of a brand angle to all this. Tell me what you think
> I’d like to know for certain, but it strikes me that there’s got to be more data that Twitter holds back from the firehose. Data about the data, for example. I’m not sure, and I’d love a clear answer.
Well here’s what a mapped out tweet looks like http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php and as we know developers have been doing their own additions with annotations to add any additional metadata to a Twitter post.
That’s right, any data.
Hope that helps.
I don’t think that Twitter’s biggest problem is anything technical or has anything to do with their API. Sure, this kind of semantic data analysis is an opportunity and I think it is really useful, but I think Twitter’s greatest problem seems to be finding a way to allow businesses to advertise and for Twitter to make some real money that is inline with their ridiculous traffic levels. There are all kinds of companies that are helping businesses get Twitter followers such as http://twitter.popularfans.com and Twitter needs to find a way to take advantage of their traffic to actually try and make some money. Making money will ensure that they can tackle all of these other interesting problems. I have heard that Twitter’s investors are getting antsy about this more than anything. I know that you have to build up the product, but focusing on technical issues when you still don’t have a solid business model doesn’t make as much sense to me.