AOL: Dooooooh! | John Battelle's Search Blog

AOL: Dooooooh!

By - August 07, 2006

AOL has officially responded to the recent ruckus over data released by folks in its research group. The summary: Man, did we screw up.

I emailed my contacts there and got an early draft of the release:

“This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.

Although there was no personally-identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.

Here was what was mistakenly released:

* Search data for roughly 658,000 anonymized users over a three-month period from March to May.

* There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information.

* According to comScore Media Metrix, the AOL search network had 42.7 million unique visitors in May, so the total data set covered roughly 1.4% of May search users.

* Roughly 20 million search records over that period, so the data included roughly 1/3 of one percent of the total searches conducted through the AOL network over that period.

* The searches included as part of this data only included U.S. searches conducted within the AOL client software.”
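As a quick check on the release's arithmetic, the sketch below recomputes both ratios from the figures quoted above. Dividing 658,000 users by May's 42.7 million unique visitors gives about 1.54%, a little above the draft's 1.4%. Treating 20 million records as one-third of one percent implies roughly 6 billion AOL searches over the three months. The user ratio compares a three-month user count with a single month's visitors, so it is only a rough figure.

```python
# Figures as stated in AOL's draft release, quoted above.
users_in_release = 658_000          # anonymized users in the data set
may_unique_visitors = 42_700_000    # comScore Media Metrix, AOL search network, May
records_in_release = 20_000_000     # search records, March to May
share_of_searches = (1 / 3) / 100   # "roughly 1/3 of one percent"

# Share of May search users covered by the release.
user_share = users_in_release / may_unique_visitors

# Total AOL searches implied by "20 million records = 1/3 of 1%".
implied_total_searches = records_in_release / share_of_searches

print(f"share of May users: {user_share:.2%}")
print(f"implied total searches, March-May: {implied_total_searches / 1e9:.1f} billion")
```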



8 thoughts on “AOL: Dooooooh!”

  1. King Troll says:

    Battelle, check out this guy's searches. Fucking nuts, bro. Do you think they are going to release my "How to Kill Battelle" and "Battelle address" searches?! I'm just kidding, bro. Is your book on CD? It takes me 1 hour to read 20-30 pages, so that's about 10 hours per book, and it only takes about 2 hours on CD. I only listen to audiobooks now.

    Weirdo in AOL's searches:
    17556639 how to kill your wife
    17556639 how to kill your wife
    17556639 wife killer
    17556639 how to kill a wife
    17556639 poop
    17556639 dead people
    17556639 pictures of dead people
    17556639 killed people
    17556639 dead pictures
    17556639 dead pictures
    17556639 dead pictures
    17556639 murder photo
    17556639 steak and cheese
    17556639 photo of death
    17556639 photo of death
    17556639 death
    17556639 dead people photos
    17556639 photo of dead people
    17556639 http://www.murderdpeople.com
    17556639 decapatated photos
    17556639 decapatated photos
    17556639 car crashes3
    17556639 car crashes3
    17556639 car crash photo

  2. Here's what people in the sample set searched for before landing at battellemedia.com (I would've removed personally identifiable information, though there wasn't any):

    2006 predictions [2006-03-03 21:46:11]
    class action lawsuit mircrosoft [2006-03-13 15:17:02]
    tatas [2006-03-27 22:36:03]
    google real estate [2006-04-25 11:45:12]
    goog prospectus [2006-04-24 02:31:51]
    earth google [2006-05-01 01:59:59]
    first time models [2006-03-01 02:50:29]
    predictions for 2006 [2006-03-01 18:11:42]
    give me a site that will let me read books onlin for free [2006-05-25 15:39:41]
    predictions 2006 [2006-03-13 14:23:42]
    predictions 2006 [2006-03-13 14:23:42]
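A list like this can be pulled from the release by keeping only records whose click-through URL points at battellemedia.com. The minimal sketch below assumes the released files are tab-separated with an AnonID, Query, QueryTime, ItemRank, ClickURL header, and that rows with no click carry only the first three fields. The AnonIDs and every row except the first query and its timestamp are hypothetical.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical sample in the ASSUMED layout of the released files:
# tab-separated, header AnonID, Query, QueryTime, ItemRank, ClickURL;
# rows with no click carry only the first three fields.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1000001\t2006 predictions\t2006-03-03 21:46:11\t1\thttp://www.battellemedia.com\n"
    "1000002\tweather forecast\t2006-03-04 08:12:40\t1\thttp://www.weather.com\n"
    "1000003\tcheap flights\t2006-03-05 19:30:02\n"
)


def queries_landing_on(lines, domain):
    """Yield (query, timestamp) for rows whose click-through URL is on `domain`."""
    # QUOTE_NONE: raw search text may contain quote marks that are not CSV quoting.
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Missing ClickURL fields come back as None from DictReader.
        host = urlparse(row.get("ClickURL") or "").hostname or ""
        # Accept the bare domain or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            yield row["Query"], row["QueryTime"]


hits = list(queries_landing_on(io.StringIO(SAMPLE), "battellemedia.com"))
print(hits)
```

Matching on the parsed hostname rather than a substring avoids false hits on URLs that merely mention the domain in a path or query string.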

  3. Teddie says:

    Can anyone point me to where I can still download the dataset, or email it to me? It's now officially offline.

  4. Search Engines W says:

    http://www.aolsearchdatabase.com/

    If you don't want to download the data, someone has created an online search service.

  5. mc says:

    There is personally identifying information there, because every record has a timestamp and a link to the site the user clicked through to. So for the searches above that led here, John could look at his logs, get an IP address, and possibly see thousands of searches the person at that IP has done. And if they happened to leave a comment, as Phillip has pointed out, you would have their name.

    I'm sure it would be possible to conclusively identify about 5-10% of those AOL users if big websites searched their logs for AOL referrer headers. None of the major media seem to realise that this is the personally identifying information AOL denies it has given out. I'm sure the NYT could link thousands of AOL users who have registered with them to searches they have done; how is that not personally identifying? This is not a theoretical problem.
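The linkage mc describes can be sketched as a time-window join: take each AOL record that clicked through to a site, then look in that site's own access log for visits arriving just after the click. This is a toy illustration, assuming both clocks are aligned to the same time zone; all IDs, IPs (from documentation-only address ranges), and the 10-second window are hypothetical. A real attempt would also have to cope with clock skew and shared IPs.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical AOL rows that clicked through to our site: (AnonID, Query, QueryTime).
aol_clicks = [
    ("1000001", "2006 predictions", "2006-03-03 21:46:11"),
]

# Hypothetical site access log of AOL-referred visits: (client IP, request time).
access_log = [
    ("198.51.100.7", "2006-03-03 21:46:14"),
    ("203.0.113.42", "2006-03-03 22:10:05"),
]


def link_clicks_to_ips(clicks, log, window=timedelta(seconds=10)):
    """Map each AnonID to the IPs that hit the site within `window` after its click."""
    parsed_log = [(ip, datetime.strptime(t, FMT)) for ip, t in log]
    links = {}
    for anon_id, _query, t in clicks:
        clicked = datetime.strptime(t, FMT)
        for ip, seen in parsed_log:
            # Only visits at or shortly after the click are plausible matches.
            if timedelta(0) <= seen - clicked <= window:
                links.setdefault(anon_id, set()).add(ip)
    return links


print(link_clicks_to_ips(aol_clicks, access_log))
```

Once an AnonID is tied to an IP, any registration or comment made from that IP on the same site can attach a name to the AnonID's entire search history.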

  6. Teddie says:

    Search Engines Web, that website isn't very useful.

    Managed to get the DB from:
    http://www.gregsadetsky.com/aol-data/

  7. psych787 says:

    This makes me sick… I personally wouldn't want my entire Google search history in the public domain, anonymous or not.

    Not that I’d be affected by something like this; I use Tor so my queries don’t all come from the same IP, and thus no one could “profile” my interests, but I’m wondering how all those unsuspecting people out there who were using AOL search feel about this? OMFG is the least of it, especially for the unfortunates whose queries are now a standing joke…

  8. Jeremy Dunck says:

    psych: wrong. AOL almost certainly correlates this data based on cookies, which Tor won’t help you with.

    If you’re browsing and accepting cookies, you’re not anonymous.

    Security is hard. Tor is not a silver bullet.
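Jeremy's point can be shown with a toy request log. If the exit IP changes between Tor circuits but the browser keeps sending the same cookie, grouping by IP scatters the queries across several apparent users, while grouping by cookie puts them back into one profile. The cookie value, IPs (documentation-only ranges), and queries are hypothetical, and this is not a description of how AOL actually correlates its data.

```python
from collections import defaultdict

# Hypothetical request log: (exit IP, cookie ID, query).
# Under Tor the exit IP changes; the cookie persists unless the user clears it.
requests = [
    ("192.0.2.10", "cookie-abc", "query one"),
    ("198.51.100.23", "cookie-abc", "query two"),
    ("203.0.113.5", "cookie-abc", "query three"),
]


def group_queries(records, key_index):
    """Group queries by the field at `key_index` (0 = IP, 1 = cookie)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)


by_ip = group_queries(requests, 0)      # three one-query "users"
by_cookie = group_queries(requests, 1)  # one user holding all three queries
print(len(by_ip), len(by_cookie))
```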