From his post:
I’m disappointed in the government for wanting to use the online behavior of millions of people in an attempt to justify a law that many of those millions are likely against. I’m disappointed in them for making people even more fearful of “being tracked online” and the Bush Administration’s attempts to keep an eye on the public.
I’m disappointed in those companies that appeared not to put up a fight, notify their users, or explain what happened in a timely fashion. I’m disappointed in them for not providing an opt-out mechanism. I guess that’s everyone but Google so far.
Is it too much, I keep asking, to ask our online services to provide us:
– Access to a record of all the information they keep on us and how they use it
– The ability to challenge that data’s accuracy, and edit it for accuracy
– The ability to opt out (with a clear understanding of the resulting loss of services and opportunities that might result)
– The ability to set permissions as to who else might see the data
– The right to maintain a user copy of that data for archival purposes
– The right to share in the value of that data on negotiated terms
Is that so freaking hard to do? I sense that, increasingly, there is a market opportunity in doing this. I bet 95% of the public will never edit, or even view the data more than once. But the sense that the control panel is there, just in case, will be invaluable to establishing trust.
21 thoughts on “Yahoo’s Jeremy Is Disappointed, I’m Bewildered”
It’s happening. See root.net.
Could it be that Google is actually hiding sources of revenue? How much of their advertising revenue is derived from potentially embarrassing sources? How much of it is on the fringe of being illegal?
“Is that so freaking hard to do?”
Are you kidding? I doubt that it’d be *possible*, but even if it was, I don’t think that any advertising-based business could make money providing that level of personal service to individual users (e.g., the cost of dealing with someone’s challenge would exceed the lifetime value of that customer by more than 100x or much, much more).
John, it is awfully freaking hard to do. Consider the case at hand: search queries. Imagine gigs upon gigs of web server logs that contain personal information in the form of search queries and click tracking. As it is, it’s freaking hard to process volumes of server log data offline in a non-realtime fashion. An individual’s personal information is spread out amongst all those files; adding the ability for individual users to query and update such information would be a task of enormous, unrivaled scale.
Doh! Meant to add: call me back when FM does this 😉
What do you have on me? I’ve answered some of the surveys. I’ve got your cookies (through DoubleClick?? — and what did you tell DoubleClick about me?) Can I just call your cell to figure out what needs to be corrected?
uh…Google Search History
OK OK guys.
>>I don’t think that any advertising-based business could make money providing that level of personal service to individual users (e.g., the cost of dealing with someone’s challenge would exceed the lifetime value of that customer by more than 100x or much, much more)
But wait, can’t this be automated?
>>call me back when FM does this 😉
I wish. I’d LOVE to work with a vendor that can do this for us.
I’ve been talking with a lot of web-savvy folks, mostly webmasters, and I’m pretty appalled at the typical roll-over response I’m hearing on this issue. These are actual quotes:
• “If you don’t have anything to hide, what’s the big deal?”
• “Google should hand it all over just like the others have.”
• “We have to stop the kiddy-porn perverts and this is the only way.”
• “Give it to ’em. Print out the entire server log history and let them [.gov] try to find any useful information.”
I had an ironic thought … suppose Google caved and did the deed; suppose the data was submitted in the form of CDs; suppose the NSA geek assigned the task searched the data with Google Desktop?
There seems to be a campaign started by freedom lovers and it goes like this: search Google for “If you are reading this, you are infringing on my 4th amendment rights.”
Get Free at TechnoHippie.Com
Savvy Users always have the option of using Proxies;
– there are both Online Web Services, Encrypted and Software Based options!!
Also, one could customize their “Internet Options” NOT to allow certain cookies, or any cookies at all.
Unfortunately – Search Engines would likely have to resort to a Subscription model for advanced services and what are now Free based tools and Features – if not for the Advertisers and the Targeted Ads.
And of course – even with subscription based services – one would STILL be required to Log In.
So, the ONLY way to get information in true Anonymity – would be to just buy Newspapers and destroy them once finished – and to never borrow from a Library – and to pay Cash for any Book Purchases – and never use Cable Services or Satellite subscription Radios…
The nature of this Evolving society is that Everyone will leave a trace in return for using the Ubiquitous, Free, Real-time, Dynamically Updated, Global Information Resources that are being made available.
As someone said above, Google Search History seems to be working towards this exact thing.
I think it would be helpful if some reputable organization, such as the EFF, provided a list of what sort of data users should be allowed to manage and who should be allowed to manage it. I personally would love to enable my users to manage their data, and I know how to achieve it technically, but I’m confused as to which data they should have access to (I have a lot, in lots of different forms) and who should have access to it (should it be done by IP, or user ID, etc.? And should your spouse be able to see it? How about your parents?).
“can’t this be automated?”
I’m not sure, to be honest. The idea of someone reading your mail when you leave yourself logged on at a public terminal is bad enough – imagine if they could see your search history, the times you’re most likely to be online, links clicked on, etc … and then make changes by filling out a form.
Anyway, very, VERY, complex. Desirable, but almost impossibly hard to do right.
John, I think that you are basically asking for companies to really follow the Attention Trust model and give users control over their data.
It’s an issue that Paul Martino and I at Aggregate Knowledge have been following closely as we have been building out our attention/behavior based recommendation engine. We must give users access to that data and allow them to really have control over it. We aren’t there yet but we are actively working with Seth Goldstein at Attention Trust to figure out the best way for us to do it. I envision a future where you’ll be able to use an Attention Trust interface to get at data that you have created on our servers or anyone’s.
One place where I think that you and others aren’t going far enough is in only asking our online service providers to give us this control. Why doesn’t Visa give me this control? How about my bank? I’m just as concerned, if not more so, about THAT information getting into the wrong hands as I am about my search behavior.
I’d be the first one to buy – however, I believe there is no market for this. Just like you said, “95% of the public will never edit”. A service like this would be nice (very nice), but who would pay for this, east of the Bay Area hills?
John, I’m floored to see so many question your approach which offers at the least an excellent starting point for the debate that should be raging over *commercial* use of OUR DATA that is PROVIDED BY US, and is ABOUT US!
The same people who seem to worry that GW will toss them into Guantanamo after viewing their search history have few concerns about getting quietly and secretly assaulted by an SE marketing department gone wild.
Processing search data via a control panel for those that request this should NOT be a huge challenge (this is what cheap scalable parallel processing was made for, OR just use a local program on the user’s computer!).
Google Analytics (formerly Urchin) does a far more robust analysis of websites at a far greater level of detail than would be needed to simply review search queries.
Privacy is ignored when people enjoy free online services. This may turn out a big issue for some people in the future. See more at
Well, I would probably agree with the lot that say it is near impossible… but I’d also say that it is probably the agenda of these companies to invade privacy, in order to reduce us to the consumers we have become! If we regulate what the search engines can use on us then the whole pile of knowledge or minable information becomes redundant because it is inaccurate and incomplete.
I’m a little late to the party on this one (yesterday was… uhm, busy) but other folks seemed to have covered this pretty well.
Policy issues aside, it’s a technically hard problem due to the sheer scale of the infrastructure involved in collecting, organizing, crunching, and using that data. There are home-grown tools, third party commercial tools we don’t have source code to, and many groups involved.
Had the system been designed with that goal in mind from day #1 (who’d have thought to do that years ago?), it’d be one thing. But this is a very large retrofit project. And we’re still struggling to find enough great engineers to build things we’ve had on the drawing boards for quite some time now.
Google history is an aggregation of personal search data that is almost certainly stored in addition to rather than in place of regular search logs. As such, Google history only solves the “see my data” problem. (By saying it “solves” this problem, we assume that history can be enabled by default for all Google search users, registered with Google or not, and still meet scaling demands. It also assumes that Google history would grow to encompass all forms of personal information that are logged, such as click paths and page views.)
The “update my data” problem is significantly harder, for the reason I stated before: the entirety of your personal information is spread out amongst vast numbers of gigantic log files. These files are stored on GFS, which is not designed for efficient random write access. The scale problems are tremendous.
This is not to say it can’t be done, only that it’s hard to do. It would be great if one of the major search providers implemented John’s suggestions. My comments are in response to John’s tone of voice, through which he suggests that this problem is easy to solve. I don’t believe that is the case.
“It can’t be done” ?? – In some countries, it’s the law
Most European countries have data-protection laws that grant individuals the exact rights which John was asking for, not only of government agencies, but also any private firm. Example: Switzerland (this is only the part of the law concerning transfer of data abroad. The law also contains the exact rights as stipulated by John to any person in Switzerland against the collectors of data).
I find it a very interesting piece of cultural difference that US citizens don’t care about their data being traded and (mis-) used by virtually everyone who can afford to pay for it. Why in the world should I trust a commercial company, whose first goal is to make money, more than the government which should be in my service (not that I trust them, either)?