Grokking PubSub and Data Lock In

John Battelle

19 years ago

Earlier this week I spent some time on the phone with Bob Wyman, CTO and founder of PubSub. Over the past year Bob has been heckling me for focusing on “retrospective search” (Google, Yahoo, et al.) while ignoring his offering of “prospective search,” or searching what he calls the “GrayWeb”: that part of the web which is available and open, but rarely seen, because our view of the web is so dependent on traditional approaches to search. Wyman focuses on the portion of the GrayWeb that changes rapidly, the “ChangingWeb,” where the future hits the present and where the unique element of the dataset is the fact of its newness. That window, when the information is knowable but before it becomes forever eternalized in The Index, is where PubSub lives.

In short, PubSub crawls (mostly) blog feeds and offers a service that lets you stay abreast of the topics you choose as new information breaks. (PubSub just announced a political cut of this kind of data, for example.) To me, PubSub felt a lot like Google or Yahoo news alerts on steroids, or a Feedster clone. But after talking to Bob, I came away convinced that there’s more to PubSub than meets the eye.

PubSub is named for “publish/subscribe,” a well-traveled piece of IT theory that has, at its core, the assumption of structured data. Back in the earlier days of the computer biz, Apple, DEC, and others realized that users needed to be alerted when things changed: in a database publishing model, for example, a new rev of a document would trigger an alert. These companies invented publish/subscribe models that, for the most part, never really took off. Why? I think the code was overspecified and the user interface cumbersome. Wyman worked on pub/sub apps at DEC; in fact, he built the pub/sub piece of ALL-IN-1, a Notes-like application that had a brief moment in the sun in the late ’80s, if memory serves.

A few years ago Wyman found himself wondering whether it was possible to apply the publish/subscribe model to the entire World Wide Web. That’s a pretty audacious idea, but focusing on blogs was a good way to start, because blogs have a wealth of feed-based structured data around each post (timestamp, author, title, often a category). Wyman claims to have figured out algorithms that allow PubSub to process the ChangingWeb rapidly and “at internet scale.”
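To make the “prospective search” idea concrete: instead of indexing a stored corpus and running queries against it later, you store the queries first and run each new post against them the moment it arrives. The Python sketch below is a hypothetical toy, not PubSub’s algorithm (which Wyman didn’t describe to me). The names FeedItem, Subscription and ProspectiveMatcher are illustrative, and matching is assumed to be a naive “all keywords appear in the title” test plus an optional filter on the post’s category field.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeedItem:
    # Structured fields a blog feed typically carries for each post.
    timestamp: datetime
    author: str
    title: str
    category: str | None = None


@dataclass(frozen=True)
class Subscription:
    # A standing ("prospective") query, registered before any matching post exists.
    subscriber: str
    keywords: frozenset[str]  # assumed to be lowercase words
    category: str | None = None  # optional filter on the post's category field

    def matches(self, item: FeedItem) -> bool:
        if self.category is not None and item.category != self.category:
            return False
        words = set(re.findall(r"\w+", item.title.lower()))
        # Every keyword must appear in the title (simple AND semantics).
        return self.keywords <= words


class ProspectiveMatcher:
    """Hypothetical toy matcher: queries are stored up front, and each newly
    published item is checked against all of them once, as it arrives."""

    def __init__(self) -> None:
        self._subscriptions: list[Subscription] = []

    def subscribe(self, sub: Subscription) -> None:
        self._subscriptions.append(sub)

    def publish(self, item: FeedItem) -> list[str]:
        # Linear scan over every stored query; returns who should be alerted.
        return [s.subscriber for s in self._subscriptions if s.matches(item)]


matcher = ProspectiveMatcher()
matcher.subscribe(Subscription("alice", frozenset({"election"})))
matcher.subscribe(Subscription("bob", frozenset({"novel"}), category="book review"))

post = FeedItem(datetime(2005, 6, 1, 9, 30), "a blogger", "Election night liveblog", "politics")
print(matcher.publish(post))  # ['alice']
```

A linear scan like this is fine for a handful of queries; anything claiming to work “at internet scale” would presumably need to index the stored queries themselves (say, by keyword), so that each incoming post touches only the subscriptions it could possibly satisfy.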

I’m not in a position to judge those claims, but I like the theory behind Bob’s intentions. He plans to create tools that allow bloggers to easily tag their posts with category-like information (“this is a book review” or “this is an event announcement”). He’s already built plug-ins for WordPress and is looking to continue his work with other platforms like Movable Type (MT), which have similar widgets that so far are not aligned around a particular standard.
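One plausible home for category tags like these is the `<category>` element that RSS 2.0 already allows inside each `<item>`. Whether Bob’s WordPress plug-in actually writes to that element is an assumption here, and the sample feed is made up. The sketch uses Python’s standard xml.etree.ElementTree to group a feed’s posts by category, so a consumer could, for example, pull out only the book reviews; untagged posts land under a hypothetical “uncategorized” key.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical sample feed; titles and category labels are illustrative only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Reading a new novel</title>
      <category>book review</category>
    </item>
    <item>
      <title>Meetup next Tuesday</title>
      <category>event announcement</category>
    </item>
    <item>
      <title>Random thoughts</title>
    </item>
  </channel>
</rss>"""


def items_by_category(feed_xml: str) -> dict[str, list[str]]:
    """Group item titles by their RSS 2.0 <category> values."""
    root = ET.fromstring(feed_xml)
    grouped: dict[str, list[str]] = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        categories = [c.text.strip() for c in item.findall("category") if c.text]
        # Untagged posts fall into a placeholder bucket (hypothetical label).
        for category in categories or ["uncategorized"]:
            grouped.setdefault(category, []).append(title)
    return grouped


print(items_by_category(SAMPLE_FEED)["book review"])  # ['Reading a new novel']
```

Because the element is free text, nothing stops one blog from writing “book review” and another “Book Reviews,” which is the same lack of a shared standard across WordPress, MT, and other platforms noted above.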

In theory, anyway, Bob is onto something here. It’s yet another attempt to build the semantic web from the bottom up, and it suffers from all the foibles of such an effort, but the intent is good: let individual publishers build data structures which, in aggregate, create a fuzzy kind of value that developers can tap into. Were enough of these kinds of structured and tagged data sets to become available (“this is a job posting,” “this is something for sale”), we might well see services evolve that are built on the premise of freely available data. In other words, a new kind of publishing model, one where value comes from what you do with the data, as opposed to who owns access to the data. That may not seem like a big change, but in fact it would be: eBay, Monster, Yahoo, et al. are all based on the idea of owning the environment in which structured data lives. More on this shortly, but for now, check out PubSub and let me know what you think.
