As I muddle my way through yet another iteration of my outline, and think about the issues raised in my recent ephemeral/eternal post, it seems apparent to me that as a culture we are nowhere near consensus on what rights, if any, we have with regard to the data we create and/or provide to third-party applications like A9, Gmail, Plaxo, and the like. Clearly we are touchy about all of this, as the reaction to Gmail proves. In the process of my research, I started reading the terms of service and privacy policies for various services, and found them inconsistent, often vague, and in general difficult to understand.
Now, I know there is a vocal contingent of folks who believe that we should simply assume we have no privacy online, and assume the quid pro quo for any service that we use is loss of control over the metadata/personal information we create along the way. I certainly understand this line of thinking, but…it strikes me as a cop-out. In the end, I’d warrant that business models are going to evolve to the point where services will spring up that offer consumers access to their own clickstreams in new and powerful ways, and I’m going to predict that we will want that access as a right. I’d prefer we not have early lockdown on this issue, if we can at all avoid it.
The nice thing about doing a book is that people help you. I have had and continue to have help from a lot of smart folks, and one of them is Abigail Phillips, a lawyer who has worked with the CDT and the Berkman Center. Abigail is helping me pull together a little research project that will compare the policies of several well known platform players as they relate to what I’m calling “clickstream/stored information” – the data exhaust we all create when we interact with web-based services.
Now, I imagine this kind of work is ongoing at lots of places, and hopefully this lazyweb request will point me toward that work, if indeed it exists, as well as pertinent case law from the real world. In any case, we’ve tried to outline what the major issues are in the form of what we hope are clarifying questions. Below, I submit them to this readership for feedback and input. Once we get a good sample set – and we’re trying to keep it simple, and avoid overly focused, complicated, specific or situational questions – we intend to review the Terms Of Service and Privacy Policies of four major services (we plan to start with an email provider, a major ecommerce player, a search site, and a social networking/contact site), and see what we learn.
If nothing else, we hope that we can report out a clearer sense of how each site “scores” on issues of consumer data protection and usage. That said, here are the questions, laid out in three rough categories of Ownership, Privacy/Usage, and Account Modification/Deletion. If you’re into this kind of thing, please give them a read and post your responses. If not, stay tuned, and we’ll report what we find out.
Thanks in advance!
Ownership
Who owns the information-trail (clickstream) and/or stored personal information or profiles (stored information) created while using the service?
If the service owns it, does the user have any rights to view and/or edit that clickstream/stored information? Does the user have any rights to republish, aggregate, or profit from that information in other venues apart from the service where it was created?
Can the user transfer his or her clickstream/stored information to another web-based service? If so, can it be done easily, or is it a difficult and time-consuming task?
Does the service make it easy or difficult to access, edit, and/or retrieve copies of the user’s clickstream/stored information?
Privacy/Usage
Who has access to the clickstream/stored information that a user posts or creates on the site?
Is there a place where the service outlines and regularly updates exactly how it uses this information? Is there a reasonable mechanism for the user to request and receive information on such use?
What is the strategic role of such information in the ongoing business/service, both specifically to the service and more generally to the larger business?
Does the site transfer to third parties personal data that the user submits to or creates on the site? If so, is it connected to specific user profiles, or is it delivered in aggregate form?
Under what circumstances (request, subpoena, etc) will the clickstream/stored information be released to law enforcement or government entities?
Account Modification/Deletion
Does the service have the right to delete an account and all related information without notice to the affected user?
When a user deletes information from an account, is it deleted from the service’s servers and any backups the service may have? If not, does the user have recourse to ensure permanent deletion?
If the user closes an account, does the service delete all copies of the information that is stored in the account? Do all third parties that have received user information through the service delete that information?
What happens to user information in the event the user dies while the account is still active? If the user owns that information, or has rights to that information, can those rights be transferred to others, such as an estate or family?
What guarantees do users have that their information will be protected if the service is sold to another company?
What is the service’s policy as it relates to altering its terms of service/privacy policies? Will a user be notified prior to such changes, and will the users have a period of time to react prior to those changes taking effect?
6 thoughts on “Terms of Service and the Clickstream: A Survey”
I wonder whether people are really worried about their online privacy, or whether they are just not aware of what is happening. I guess it is the latter. No real abuses have been reported. The noise after the Gmail announcement is really only coming from the privacy advocates, who have a much better idea of what is happening and what is possible.
A few years back I worked on a set of services intended to help users protect their personal information. In the end, however, we decided that there was no demand yet and stopped our work. I do not get the feeling that anything has changed in these few years. People do not seem to care. Alas.
Your “Ownership” questions presume that sites store clickstream data like they store registration data, in user-profile records. It rarely happens that way. For a variety of technical cost/benefit reasons, you will find few large-scale sites that can easily answer a question like, “Show me all John Battelle’s clicks from his last 5 visits.”
Usually, clickstream data lives in its own repository, which is optimized for aggregated reporting–questions like, “How many clicks did page X get last week?” For some sites, that’s the end of the story; they can’t associate clicks with registered users because the data lives in different worlds and there’s no bridge. Or, they don’t have registration and thus don’t have personally identifying records to associate clicks with in the first place.
However, some sites associate a user identifier with each click event. In theory, this can be used to join to a registration record that has the same identifier, which would make the query above possible. In practice, there’s so much data in the clickstream repository that such a query would be, in the words of one of your questions, a “difficult and time-consuming task”–and that’s just for the site to do it internally; if the site had to make such a function available to end users, it would be even harder.
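To make the contrast concrete, here is a minimal sketch in Python. The record layout (`user_id`, `visit`, `page`), the sample data, and both function names are assumptions made up for illustration, not any real site's schema. It shows the cheap aggregate question next to the per-user question, which needs a join against registration records and a scan of every click.

```python
from collections import Counter, defaultdict

# Hypothetical click events; field names are illustrative assumptions.
clicks = [
    {"user_id": "u1", "visit": 1, "page": "/home"},
    {"user_id": "u2", "visit": 1, "page": "/home"},
    {"user_id": "u1", "visit": 1, "page": "/product/x"},
    {"user_id": None, "visit": 1, "page": "/home"},  # anonymous visitor
    {"user_id": "u1", "visit": 2, "page": "/product/x"},
]

# Hypothetical registration records keyed by the same identifier.
registrations = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}


def clicks_per_page(events):
    """Aggregate report: how many clicks did each page get?"""
    return Counter(e["page"] for e in events)


def clicks_for_user(events, users, name, last_n_visits):
    """Per-user report: join clicks to a registration record by user_id."""
    if last_n_visits <= 0:
        return {}
    # Find the identifier(s) registered under this name.
    ids = {uid for uid, rec in users.items() if rec["name"] == name}
    by_visit = defaultdict(list)
    for e in events:  # full scan of the click repository
        if e["user_id"] in ids:
            by_visit[e["visit"]].append(e["page"])
    # Keep only the most recent visits, in ascending visit order.
    recent = sorted(by_visit)[-last_n_visits:]
    return {v: by_visit[v] for v in recent}
```

With five events the scan is trivial; the point of the sketch is only that the per-user question touches every event and depends on the shared identifier, while the aggregate question needs neither.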
That said, there is a middle ground. Some sites have *selective* clickstream data attached to user profiles. So, for example, if a site says you recently browsed product X, that’s a piece of information the site has saved in a slot dedicated to “most recently browsed product.” For technical expediency, many of the sites that do this take a shortcut and store it in your cookie rather than keeping it as part of a server-side profile record. Either way, the data is not in the “this will go down on your permanent record” category.
Also in the middle ground, some sites abstract the clickstream data’s details into higher-level business concepts (“High-Value Visitor”), which are then attached to profiles. This is the closest thing to fitting with the premise of your “Ownership” questions. However, the metadata that maps clickstream detail to business concepts tends to be proprietary, especially when the business concepts involve behavioral segmentation.
An analogy here is the difference between the demographics you provided at registration versus your behavioral-segment membership as calculated by the site’s data-mining algorithm (which took as input the demographics you provided). The site will let you view/edit the demographic facts you provided but not the derived fact (that you are a “High-Value Visitor”), which is part of a proprietary user-classification system. The same thing potentially applies to meta-clickstream data, where your behavior has been abstracted to business concepts.
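The provided-versus-derived split can be sketched as follows, again in Python. The `Profile` class, the `classify` rule, and its thresholds are all hypothetical stand-ins; a real site's segmentation model would be proprietary and far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Facts the user supplied at registration; viewable and editable.
    demographics: dict = field(default_factory=dict)
    # Derived label computed by the site; not exposed to the user here.
    _segment: str = "Unclassified"

    def user_view(self):
        """Return only what the user is allowed to see and edit."""
        return dict(self.demographics)


def classify(profile, purchase_count):
    """Hypothetical rule standing in for a proprietary data-mining model."""
    age = profile.demographics.get("age", 0)
    if purchase_count >= 3 and age >= 25:
        profile._segment = "High-Value Visitor"
    else:
        profile._segment = "Standard Visitor"
    return profile._segment


profile = Profile({"age": 30, "zip": "94110"})
label = classify(profile, purchase_count=5)
```

In this sketch `user_view()` returns the demographics but never the derived label, which mirrors the situation described above: the input facts are the user's to edit, while the classification stays on the site's side.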
Very helpful and clear points, Steve, thanks. But I wonder whether, in time, the computer science of mining so much metadata will get solved. There certainly is significant potential upside, both in new types of services and profitable business models. And to your “High-Value Visitor” example, I wonder, does a user have the right to know how he or she is classified by the service? To know what “profile” he or she fits into?
Yes, over time, it will get cheaper and easier to enable the scenario you are presuming. I was highlighting the state of things now because your survey appears to be about today’s practices.
That said, in a world where the technical problems of storing/mining the data are solved, the power position in the clickstream game is the ISP: Each site only sees a little bit of you, whereas the ISP sees it all.
From this position, ISPs can play either side (or potentially both sides) of the fence: They can provide privacy services, like disposable identities, for consumers; or they can ally with search engines, portals, or other wide-scope players whose power as infomediaries is proportional to the richness of their profiling.
probably you’re aware of http://www.openprivacy.org? as i read the outline, i thought to myself: “we should just come up with some xml definition of user profile data, as well as some protocol whereby the data can be held by some trusted escrow service, but still indexed by websites.” naturally, i used some search engine (can’t remember which — but who cares, they’re just a commodity item:) and open privacy came up first. while their work may not have an explicit answer to your questions, i expect such would be implicit in their software and protocols.
Also (in response to Steve’s points), some sites are starting to preserve clickstream data in user profiles to enhance their services — for example, A9, which stores personal search histories that users can access at any time. I can imagine that these kinds of add-ons will become more common over time.
It’s an interesting question whether users have a right to know how they’ve been classified by the service. I doubt it, unless they could come up with some legal rationale for finding out, such as a claim that they’re being discriminated against by the service or something. This isn’t information that the user provides or creates, so I’m not sure on what basis a user could assert the right to see it.