The Web Time Axis

One of my largest gripes about the web is that it has no memory. But I think this will soon change – at some point in the not too distant future we’ll have live and continuous historical copies of the web that will be searchable – creating, if you will, a time axis for the web, a real-time Wayback Machine (only there’ll be no broken links). In other words, in our lifetimes we’ll see our cultural digital memory – as we understand it through the web and engines like Google – become contiguous, available, always there. And barring a revival of the Luddites or total nuclear war, this chain will most likely be unbroken, forever, into the future. Historians looking back to this era will mark it as a watershed. At some definable point in the early 21st century, the web will gain a memory of itself, one that will never be lost again. Most likely, this will start as a feature of a massively scaled company like Yahoo or Google, much like Gmail or search itself is now. But it’s coming, and the implications are rather expansive.

If the web had a time axis, you could search constrained by webdate. You could ask questions like “show me all results for my query from this time period…” or “Tell me what were the most popular results for XYZ during the 3rd of May in 20XX.” How about “show me every reference to my great grandfather, born in 2050,” asked by a great grandson in 2150? Impossible? Yeah, seems that way, but…so did a free gig of mail and the concept of the entire Internet in RAM. Thanks to the dramatic decrease in the cost of storage, 64-bit computing, abundant memory (jesus, there’s an entendre), and the scalable business model of paid search, I think this day is not far off. The web is just ten years old, for the most part, but think what it might be like when it’s 100 years old. That’s a lot of data to search.
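
To make “search constrained by webdate” a little more concrete, here is a minimal sketch of the idea, not a description of how any real engine or archive works. It assumes every crawl of a page is kept as a dated snapshot instead of overwriting the previous one. The class name `TimeAxisIndex`, its methods, and the example URLs are all hypothetical, made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class TimeAxisIndex:
    """Toy archive (hypothetical): each crawl of a URL is kept as a dated snapshot."""

    def __init__(self):
        # url -> list of (webdate, text) tuples, kept sorted by webdate
        self._snapshots = defaultdict(list)

    def add(self, url, webdate, text):
        """Record what a page said on a given date without discarding older copies."""
        self._snapshots[url].append((webdate, text))
        self._snapshots[url].sort(key=lambda snap: snap[0])

    def as_of(self, url, webdate):
        """Return the page text as it stood on webdate, or None if not yet archived."""
        snaps = self._snapshots.get(url, [])
        dates = [d for d, _ in snaps]
        i = bisect_right(dates, webdate)  # count of snapshots dated on or before webdate
        return snaps[i - 1][1] if i else None

    def search(self, term, start, end):
        """Return (url, webdate) pairs for snapshots in [start, end] that contain term."""
        needle = term.lower()
        hits = []
        for url, snaps in self._snapshots.items():
            for webdate, text in snaps:
                if start <= webdate <= end and needle in text.lower():
                    hits.append((url, webdate))
        # Oldest first, ties broken by URL
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))


index = TimeAxisIndex()
index.add("a.example", date(2002, 4, 14), "buzz about blogs")
index.add("a.example", date(2004, 4, 14), "buzz about search")
print(index.search("buzz", date(2002, 1, 1), date(2002, 12, 31)))
print(index.as_of("a.example", date(2003, 6, 1)))
```

This only covers the “show me all results from this time period” kind of question, which needs nothing more than the archived content and its dates. A linear scan like this obviously would not scale to the whole web; the point is only that the date becomes one more field to constrain on.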

I was reminded of this idea (I had written it down a while back while musing for the book) when Gary sent word that Daypop is archiving its Top 40 back to 2002. It’s fascinating to see what was the buzz, say, two years ago today.

5 thoughts on “The Web Time Axis”

Ezra Ball says:

April 14, 2004 at 2:19 pm

I think your basic premise, that advances in storage, memory, 64-bit processing, etc. will make it possible to create a drastically more useful Wayback Machine is right.

But “show me all results for my query from this time period…” and “Tell me what were the most popular results for XYZ during the 3rd of May in 20XX.” are two drastically different things.

The former only requires an archive of all the content that was publicly available on the web, which is something that storage, processing, and memory can help you with. Moore’s law is on the side of this becoming possible.

The latter is a completely different story. For you to be able to do that, you’d have to have an archive of every version of a search engine’s *software* up and running. Even if all the software that’s currently locked behind all the servers were publicly available, creating a viable system to make sense out of how to run it all is an enormous software engineering challenge.

There’s no Moore’s law for software.

Avi Rappoport says:

August 20, 2004 at 2:01 pm

There are legitimate reasons to want to remove stuff from the record though. From the simple: a friend of mine had to get stuff out of google/yahoo/askjeeves because he’s writing on very personal topics under a different name at the same time he’s looking for a job, to the life-threatening: people who are being stalked may need to remove things which could disclose their identity. These are exceptional cases but not that unusual. The archive really should have an exceptions policy.

~ SearcH EngineS WeB ~ says:

November 13, 2006 at 9:43 pm

CORRECTION: Daypop TRIED to Archive its Top 40 back to 2002

Daypop down until further notice…

Sorry for the inconvenience.

After adding a bunch of submitted sites, Daypop no longer has enough memory to calculate the Top 40 and other Top pages.

If there’s no simple fix, Daypop won’t be back up until a new search/analysis engine is in place.

A new engine will take at least a month to get online.

JG says:

October 2, 2008 at 8:01 am

How about “show me every reference to my great grandfather, born in 2050,” asked by a great grandson in 2150? Impossible?

Naw, that particular need does not sound all that impossible…but it depends on the following:

Will the manner in which we get results from that search still be 3 ads at the top, followed by 4 blue links above the fold, 6 blue links below the fold, and 8 advertisements to the right?

If that is going to be our only possible way to interact with the results of the search, then yes, I do believe the task will remain impossible.

But if we’re willing to move beyond it, by doing something other than tweaking whitespace, then lots more becomes possible.

Book says:

October 2, 2008 at 12:43 pm

There are legitimate reasons to want to remove stuff from the record though. From the simple: a friend of mine had to get stuff out of google/yahoo/askjeeves because he’s writing on very personal topics under a different name at the same time he’s looking for a job, to the life-threatening: people who are being stalked may need to remove things which could disclose their identity. These are exceptional cases but not that unusual. The archive really should have an exceptions policy.
