Yesterday I met with Christopher Ahlberg, the PhD co-founder of Recorded Future, a company I noted in these pages back in mid-2010. Ahlberg is one of those rare birds you just know is making stuff that matters – a scientist, an entrepreneur, a tinkerer, and an enthusiast all wrapped into one.
He ran me through Recorded Future’s technology and business model, and I found it impressive. In fact, I’m hoping I can somehow put it to work in my book research. And that conditional “hoping” is the main problem I have with Ahlberg’s creation – it’s a rather complicated system to use. Then again, what of worth isn’t, I suppose?
Recorded Future is, at its core, a semantic search engine that consumes tens of thousands of structured information feeds as its “crawl.” It then parses this corpus for several core assets: Entities, Events, and Time (or Dates). Recorded Future’s algorithms are particularly adept at identifying and isolating these items, then correlating them at scale. If that sounds simple, it ain’t.
The service then employs a relatively complicated query structure that allows you to project the road ahead for your question. For example, you might choose “Amazon” as your entity, and then set your timeframe to events involving Amazon over the past two months and into the next two months. Recorded Future will analyze its sources (SEC filings, blogs, news sites, etc.) and create a timeline-like “map” of things that have happened and are predicted to happen with regard to Amazon across that four-month window. You can further refine a search by adding other entities or events (“earnings” or “CEO,” for example).
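To make that concrete, here is a purely hypothetical sketch, in Python, of what such an entity/event/time query might look like. The field names, sources, and structure are my own invention for illustration; this is not Recorded Future’s actual API.

```python
# Hypothetical sketch of an entity/event/time query, in the spirit of what
# Ahlberg showed me. The field names and structure are invented, not
# Recorded Future's real API.
import json
from datetime import date, timedelta

today = date.today()

query = {
    "entity": "Amazon",                            # the company we care about
    "refinements": ["earnings", "CEO"],            # optional extra entities/events
    "time_window": {
        "start": str(today - timedelta(days=60)),  # two months back
        "end": str(today + timedelta(days=60)),    # two months ahead
    },
    "sources": ["sec_filings", "news", "blogs"],
}

# A real integration would send this to the service and get back a timeline
# of past and predicted events; here we just print the query itself.
print(json.dumps(query, indent=2))
```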
How does it work? Well, it turns out the Internet is rife with whisperings of the future; you just need to learn how to listen. That’s Recorded Future’s specialty. As you might imagine, Wall Street quants and government spooks just love this stuff. I’d imagine journalists would as well, but most of us are too strapped to afford the company’s services. Embedded below is a new feature of the site, a weekly overview of a news-related entity.
Recorded Future’s engine is not limited to the sources it currently consumes. Not only is Ahlberg adding more every month; his customers can also add their own corpora. Imagine throwing Wikileaks into Recorded Future, for example.
Perhaps the coolest aspect of the service is a visualization of how entities relate to each other over time. Ahlberg showed me a search for mobile patents, then toggled into a relationship graph. Guess what entity broke into the center of the graph, connected to nearly everything else? Yup – Motorola.
Did I mention that Google is an investor in Recorded Future?
As I said, I hope to start using the service soon, and perhaps posting my findings here.
From time to time I have the honor of contributing to a content series underwritten by one of FM’s marketing partners. It’s been a while since I’ve done it, but I was pleased to be asked by HP to contribute to their Input Output site. I wrote on the impact of location – you know I’ve been on about this topic for nearly two years now. Here’s my piece. From it:
Given the public face of location services as seemingly lightweight consumer applications, it’s easy to dismiss their usefulness to business, in particular large enterprises. Don’t make that mistake. …
Location isn’t just about offering a deal when a customer is near a retail outlet. It’s about understanding the tapestry of data that customers create over time, as they move through space, ask questions of their environment, and engage in any number of ways with your stores, your channel, and your competitors. Thanks to those smartphones in their pockets, your customers are telling you what they want – explicitly and implicitly – and what they expect from you as a brand. Fail to listen (and respond) at your own peril.
More on the Input Output site.
In the comments on this previous post, I promised I’d respond with another post, as my commenting system is archaic (something I’m fixing soon). The comments were varied and interesting, and fell into a few buckets. I also have a few more of my own thoughts to toss out there, given what I’ve heard from you all, as well as some thinking I’ve done in the past day or so.
First, a few of my own thoughts. I wrote the post quickly, but have been thinking about the signal to noise problem, and how solving it addresses Twitter’s advertising scale issues, for a long, long time. More than a year, in fact. I’m not sure why I finally got around to writing that piece on Friday, but I’m glad I did.
What I didn’t get into is some details about how massive the solving of this problem really is. Twitter is more than the sum of its 200 million daily tweets; it’s also a massive consumer of the web itself. Many of those tweets have within them URLs pointing to the “rest of the web” (an old figure put the share at 25 percent; I’d wager it’s higher now). Even if it were just 25%, that’s 50 million URLs a day to process, and growing. It’s a very important signal, but it means that Twitter is, in essence, also a web search engine, a directory, and a massive discovery engine. It’s not trivial to unpack, dedupe, analyze, contextualize, crawl, and digest 50 million URLs a day. But if Twitter is going to really exploit its potential, that’s exactly what it has to do.
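Just to give a flavor of the very first step of that pipeline, here is a minimal Python sketch that pulls links out of tweets, expands the shorteners, and dedupes by normalized URL. It assumes the tweets are already in hand, and it waves away rate limits, retries, and everything else that makes this hard at 50 million URLs a day.

```python
# Minimal sketch: pull URLs from tweets, expand shortened links, and dedupe
# by normalized URL. Rate limits, retries, and scale (the hard parts) are
# deliberately ignored here.
import re
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

URL_RE = re.compile(r"https?://\S+")

def normalize(url):
    """Lowercase the host and drop fragments so trivially different links
    to the same page collapse to one key."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       parts.query, ""))

def expand(url):
    """Follow redirects (e.g. a shortener) to the final destination."""
    try:
        return urlopen(url, timeout=5).geturl()
    except Exception:
        return url  # keep the original if expansion fails

def unique_links(tweets):
    counts = {}
    for tweet in tweets:
        for url in URL_RE.findall(tweet):
            final = normalize(expand(url))
            counts[final] = counts.get(final, 0) + 1
    return counts  # URL -> how many tweets pointed at it

if __name__ == "__main__":
    sample = ["Great read: http://bit.ly/abc123",
              "Same story here http://bit.ly/abc123 #worthit"]
    print(unique_links(sample))
```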
The same is true of Twitter’s semantic challenge/opportunity. As I said in my last post, tweets express meaning. It’s not enough to “crawl” tweets for keywords and associate them with other related tweets. The point is to associate them based on meaning, intent, semantics, and – this is important – narrative continuity over time. No one that I know of does this at scale, yet. Twitter can and should.
Which gets me to all of your comments. I heard from developers who are working on parts of the problems/opportunities I outlined in my initial post: in the written comments, on Twitter, and in extensive emails offline. And it’s true, there’s really quite a robust ecosystem out there. Trendspottr, OneRiot, Roundtable, Percolate, Evri, InfiniGraph, The Shared Web, Seesmic, Scoopit, Kosmix, Summify, and many others were mentioned to me. I am sure there are many more. But while I am certain Twitter not only benefits from its ecosystem of developers but actually *needs* them, I am not so sure any of them can or should solve this core issue for the company.
Several commenters made a similar point; as Suamil put it, “Twitter’s firehose is licensed out to at least publicly disclosed 10 companies (my former employer Kosmix being one of them and Google/Bing being the others) and presumably now more people have their hands on it. Of course, those cos don’t see user passwords but have access to just about every other piece of data and can build, from a systems standpoint, just about everything Twitter can/could. No?”
Well, in fact, I don’t know about that. For one, I’m pretty sure Twitter isn’t going to export the growing database around how its advertising system interacts with the rest of Twitter, right? On “everything else,” I’d like to know for certain, but it strikes me that there’s got to be more data that Twitter holds back from the firehose. Data about the data, for example. I’m not sure, and I’d love a clear answer. Anyone have one? I suppose at this point I could ask the company….I’ll let you know if I find out anything. Let me know the same. And thanks for reading.
Last night I had the pleasure of interviewing Steven Levy, an old colleague from Wired, on the subject of his new book: In The Plex: How Google Thinks, Works, and Shapes Our Lives. The venue was the Commonwealth Club in San Francisco, and I think they’ll have the audio link up soon.
Steven’s interview was a lot like his book – full of previously untold anecdotes and stories that rounded out pieces of Google’s history that many of us only dreamt of knowing about. When I was reporting my book, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, I had limited access to folks at Google, and *really* limited access to Larry Page and Sergey Brin. Levy had the opposite experience, spending more than two years inside the company and seeing any number of things that journalists would have killed to see in years past.
The result is a lively and very detailed piece of reporting about the inner workings of Google. But I was a bit disappointed with the book in that Steven didn’t take all that new knowledge and pull back to give us his own analysis of what it all meant. I asked him about this, and he said he made the conscious decision to not editorialize, but rather lay it all out there and let the reader draw his or her own conclusions. I respect that, but I also know Steven has really informed opinions, and I wish he’d give them to us.
What I took away from In the Plex was a renewed respect for the awesome size and scope of Google’s infrastructure, as well as its ambition. Sometimes we forget that Google is more likely than not the largest manufacturer of computers in the world, and runs the largest single instance of computing power in the world. It’s also one of the largest collectors and analyzers of data in the world. All of this has drawn serious scrutiny, but I don’t think even the regulators really grok how significant Google’s assets are. They should all read Steven’s book.
Levy only grazes the surface of Google’s social blindness, unfortunately, and due to timing could only mention Page’s ascendancy to CEO in his epilogue. But his reporting on how the China issue played out is captivating, as are the many details he fills out in Google’s early history. If you’re fascinated by Google, you’ve got to add this one to your library.
Today I spoke at Sony HQ in front of some Pretty Important Folks, so I wanted to be smart about Sony’s offerings lest anything obviously uninformed slip out of my mouth. To prepare I did a bunch of Google searches around Sony and its various products.
Many of these searches are what I call “head end” searches – a lot of folks are searching for the terms I put in, and they are doubly important to Google (and its advertising partners) because they are also very commercial in nature (not in my case, but in general). Usually folks searching for “Sony Tablets” have some intent to purchase tablets in the near future, or at the very least are somewhere in what’s called the “purchase funnel.”
I was struck by the results, so much so that I took a screen shot of one representative set of results. In traditional print, we used to watch a metric called the “ad/edit ratio” very closely (as did the government, for reasons of calculating postal rates). Editors at publications lobbied for low ad/edit ratios (so they’d get more space to put their content, naturally). Advertising executives lobbied for higher ad/edit ratios (so they could sell more ads, of course). We usually settled somewhere around 50-50 – half ads, half editorial.
Google is way lower than that on any given search. But not for head end searches. In fact, the split of actual “editorial” (organic search results) versus “paid” is pushing towards 35/65 or more, at least when you measure the space “above the fold” on a typical screen.
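For anyone who wants to check my back-of-the-envelope math, the calculation is simply the share of above-the-fold space taken by organic results versus ads. The pixel figures in this little Python sketch are invented placeholders; measure your own screenshot to get real numbers.

```python
# Back-of-the-envelope split of organic vs. paid space above the fold.
# The pixel figures are invented for illustration only.
above_fold_px = 600                    # visible height of the results area
paid_px = 390                          # space occupied by ad units
organic_px = above_fold_px - paid_px   # what's left for organic results

organic_share = organic_px / above_fold_px * 100
paid_share = paid_px / above_fold_px * 100
print(f"organic {organic_share:.0f}% / paid {paid_share:.0f}%")  # ~35/65
```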
Then again, in the case of AdWords, one could argue the ads are contextually relevant and useful.
Just felt worth pointing out, if for no other reason than to add a page to the historical record of how the service is evolving. Once “media” adwords start taking over, this picture may well change again, and it might not be a change that folks like much.
When I read Google announcements like this one, An update to Google Social Search, I find myself wondering why Google doesn’t just come out and say something like this: “We know social search is important, and we’re working on it. However, we don’t think the solution lies in working only with Facebook, because, to be honest, we think social media is bigger than one company, one platform, or one “social graph.” We’ve got a bigger vision for what social means in the world, and here it is.”
Wouldn’t that be great?
Because honestly, I think the company does have a bigger vision, and I think it’s rooted in the idea of instrumentation and a multitude of signals (as in, scores if not thousands of signals understood to be social in nature). In other words, there is not one “ring to rule them all” – there is no one monoculture of what “social” means. For now, it appears that way. Just like it appears that there’s one tablet OS. But the world won’t shake out that way – we’re far too complicated as humans to relegate our identity to a single platform. It will be distributed, nuanced, federated. And it should be instrumented and controlled by the individual. At least, I sure hope it will be.
Google might as well declare this up front and call it a strategy. In that context, it might even make sense to do further Facebook integration in the near term, as one of many signals, of course. Google already uses some limited Facebook data (scroll down), but clearly has decided not to lean in here (or can’t come to terms with Facebook around usage policies). Clearly the two companies are wary of working together. But it’s my hope that over time, whether or not they do will be a moot point.
Why? Because I view my Facebook data as, well, mine. Now, that may not really be the case, but if it’s mine, I should be able to tell Google to use it in search, or not. That’s an instrumentation signal I should be able to choose. Just like I can choose to use my Facebook identity to log into this blog, or any number of other sites and services. It should be my choice, not Facebook’s, and not Google’s either.
Switch the control point to the customer, and this issue sort of goes away. I have a longer post in me about “social clothing” – came up on a phone call with Doc Searls yesterday – and hopefully when I get to that, this might make a bit more sense….
Seems so. I’ve written about this a lot, so much that I won’t bother to link to all the stuff I’ve posted. It was the basis of a chapter in the book, where I pointed out that (at the time) Google claimed algorithmic innocence, and Yahoo, on the other hand, was cheerful in its presumption that Yahoo services were the best answer to certain high value searches (like “mail”).
Now comes this study, from Harvard professors no less, which pretty much states the obvious. Check this graph:
It’s clear that in some cases, one might argue that Google services should win (maps, for example). But for “chat”? Or for “mail”? A stretch.
Here’s the paper’s authors’ general conclusion: “Google typically claims that its results are “algorithmically-generated”, “objective”, and “never manipulated.” Google asks the public to believe that algorithms rule, and that no bias results from its partnerships, growth aspirations, or related services. We are skeptical.”
So am I.
Update: Danny has, as always, a more nuanced point of view. Thanks, my always smarter commentators.
Blekko is a new search engine that fundamentally changes a few key assumptions about how search works. It’s not for lazywebbers – you have to pretty much be a motivated search geek to really leverage blekko’s power. But then again, there are literally hundreds of thousands of such folks – the entire SEO/SEM industry, for example. I’ve been watching blekko, and the team behind it, since before launch. They are search veterans, not to be trifled with, and they are exposing data that Google would never dream of exposing (yes, they do pretty much a full crawl of the web that matters). In a way, blekko has opened up the kimono of search data, and I expect the service, once it leaves private beta, will become a favorite of power searchers (and developers) everywhere.
The cool thing is, using blekko’s data and (I hope) robust APIs, one can imagine all sorts of new services popping up. I for one wish blekko well. It’s about time.
And in case you are wondering what the big deal is, besides all the data you can mine, to my mind it’s the ability to cull the web – to “slash” the stuff you don’t care about out of your search results. Now, not many of us actually will do that ourselves. But will services take that and run with it? I certainly hope so.
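To give a flavor of what that looks like in practice, here is a tiny illustrative Python helper for composing slashtag-style queries. The tag names are made up for the example; blekko’s own documentation is the place to go for the real ones.

```python
# Illustrative only: composing blekko-style "slashed" queries.
# The tag names below are invented for the example.
def slash(query, *tags):
    """Append one or more slashtags to a query string."""
    return query + " " + " ".join("/" + tag for tag in tags)

print(slash("mobile patents", "techblogs"))          # a query restricted by one slashtag
print(slash("mobile patents", "techblogs", "date"))  # a query combining two slashtags
```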
For a quick overview of blekko’s core feature – “slashtags” – check out the new video, above. And to bone up on the various merits of the service, here are a few key links:
Update: First 500 readers get a beta invite! Email email@example.com to get in on it!
A while back I wrote a post titled “The Gap Scenario.” In it I outlined one (of many) scenarios that I imagined would become pretty commonplace as location based services, search, and social merged into a retail setting.
Today’s news (Business Insider) that publisher Daily Candy has created an Android app that sends users articles when they are near “current local happenings” such as designer sales, spas, and concerts got me thinking about this scenario once again.
The app monitors your location in the background, then matches content and, one must assume, eventually offers as well. It works only in New York for now, but more cities are expected.
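For the curious, the matching step itself is conceptually simple. Here is a minimal sketch, with invented coordinates and a made-up list of happenings, of flagging events within walking distance of a user’s last reported position; a real app like Daily Candy’s would layer background location updates, opt-in, and battery concerns on top of this.

```python
# Minimal sketch of matching nearby "happenings" to a user's position.
# Coordinates and events are invented for illustration.
from math import radians, sin, cos, asin, sqrt

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

HAPPENINGS = [
    {"title": "Designer sample sale", "lat": 40.7233, "lon": -74.0030},
    {"title": "Rooftop concert",      "lat": 40.7614, "lon": -73.9776},
]

def nearby(user_lat, user_lon, radius_km=1.0):
    return [h["title"] for h in HAPPENINGS
            if distance_km(user_lat, user_lon, h["lat"], h["lon"]) <= radius_km]

# A user standing in SoHo sees the sample sale but not the midtown concert.
print(nearby(40.7209, -74.0007))
```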
As I laid out in my original post (and my 2005 book), location-aware services are not yet a cultural habit, in particular ambient ones. But it won’t be long before we assume that our public presence is, in effect, a search, one for which we will expect a response from any number of potential respondents.
What I find interesting is that the first innovators in this space are publishers, for the most part, rather than marketers. I’m not certain that this will stand. As many of you know from reading my thoughts here, I’m convinced that all marketers are now publishers, and the best ones will figure out how to add value in the context of ambient, location-aware scenarios. Platforms (like Google, Twitter, Facebook, Yelp) will be key mediators, but I’m not sure that what we understand to be traditional publishers (like Daily Candy) can hold this ground. We’ll see….