Google Personalized Search: Who Owns the Profiles?

Continuing on my theme of worrying about the Database of Intentions and its use as a potential privacy trap, Greg Linden reports on insights gleaned from reading a Google paper on Bigtable, a distributed storage system.



One tidbit I found curious in the Google Bigtable paper was this hint about the internals of Google Personalized Search:

Personalized Search generates user profiles using a MapReduce over Bigtable. These user profiles are used to personalize live search results.



This appears to confirm that Google Personalized Search works by building high-level profiles of user interests from their past behavior.
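To make "a MapReduce over Bigtable" concrete, here is a toy sketch of how raw per-user query rows could be reduced to a term-count profile. It is only an illustration under loud assumptions: the log rows, the `map_phase`/`reduce_phase` names, and the idea that a profile is a bag of query terms are all hypothetical, and nothing here reflects Google's actual schema or pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical raw log rows: (user_id, query). Invented for illustration.
LOG = [
    ("u1", "bigtable paper"),
    ("u1", "mapreduce tutorial"),
    ("u2", "simpsons episode guide"),
    ("u1", "bigtable schema"),
]


def map_phase(rows):
    """Map step: emit one (user_id, term) pair per query term."""
    for user_id, query in rows:
        for term in query.lower().split():
            yield user_id, term


def reduce_phase(pairs):
    """Reduce step: group pairs by user and count terms per user."""
    profiles = defaultdict(Counter)
    for user_id, term in pairs:
        profiles[user_id][term] += 1
    return dict(profiles)


def build_profiles(rows):
    """Run the map step, then the reduce step, over the raw rows."""
    return reduce_phase(map_phase(rows))


profiles = build_profiles(LOG)
print(profiles["u1"].most_common(1))
```

Note that the raw rows in `LOG` still exist after the profiles are built; the profile is a summary computed from them, not a replacement for them.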

Greg goes on to say that he worries this approach will not work so well for the task at hand, and I agree with him, but that’s not my topic for this post. What I want to point out is simply this: what rights do you, I, or anyone else have to edit, delete, or own these profiles?

Anyone from Google care to answer that one?

12 thoughts on “Google Personalized Search: Who Owns the Profiles?”

  1. John:

    This question of editability and even portability assumes something that is likely untrue about the profiles. It assumes they are human readable, separable into components (the sites I visited or queries), and compatible with other systems.

    It is likely that these profiles are mathematical signatures (vectors) and not the original data. As such, they are reductions of the original data much as a mean of a series is a reduction of the individual numbers. There is no way to “edit” a mean value. Nor is there a way to reverse out the numbers from the mean.

    Also, the specific signature probably is only usable with the targeting algorithms that compute the dot products between vectors. Think of this as distance or similarity. The signature can’t be used by itself, only in relation to all other signatures.

    The question of ownership is entirely different. Does Google own the summary statistics of my behavior on Google, or do I? Who owns my Google cookie? Who owns my emails on Gmail? What rights do I grant Google (implicitly) by using their service? That IS a meaningful and interesting question and at the core of their business model.

    I happen to know something about this stuff because we did something quite similar at Infoseek ten years ago, using neural networks. We called it Ultramatch. It was part of a partnership with HNC software at the time.
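The two claims in the comment above (a mean cannot be reversed into its inputs, and a signature is only meaningful through dot products with other vectors) can be sketched in a few lines. This is a minimal illustration with made-up numbers; the vectors and their dimensions are hypothetical and say nothing about how any real search engine represents a profile.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Two different histories reduce to the same mean, so the mean alone
# cannot tell you which history produced it.
history_a = [1, 5, 9]
history_b = [5, 5, 5]
same_mean = mean(history_a) == mean(history_b)

# Hypothetical signatures over three made-up interest dimensions.
user = [0.9, 0.1, 0.0]
doc_tech = [1.0, 0.0, 0.0]
doc_sports = [0.0, 0.0, 1.0]

sim_tech = cosine_similarity(user, doc_tech)
sim_sports = cosine_similarity(user, doc_sports)
print(same_mean, sim_tech > sim_sports)
```

The user vector means little on its own; it becomes useful only when compared against other vectors, which is the sense in which such a signature would not be portable to a system that uses a different vector space.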

  2. Peter: What you say about the editability and portability of the profiles is absolutely true. By the time you get to the actual personal profile, all you have is a mathematical signature of some sort, whether that be a vector of weights on a neural network, sufficient statistics such as mean and stddev, histograms over a term vocabulary, etc.

    But go back and read the part that Battelle highlights in bold: Personalized Search generates user profiles using a MapReduce over Bigtable. These user profiles are used to personalize live search results.

    To me, this says that while the profile itself is a mathematical signature, the data that was used to create this signature is stored in Bigtable.. in an unreduced, unsummarized, fully human readable form. And it is most likely not deleted, even after the summary statistics have been calculated!

    So the question is: who owns my entries in Big[brother]table? My entries, my personal creations, the ones that I as a Lebenskuenstler, creatively expressed.

    As I argued earlier, I feel that just because Google has the recording devices that allow them to record my expressions does not give them ownership. Otherwise, my VCR would give me ownership of any Simpsons episode. The cable company, acting with the permission of FOX, expressed those signals to me. And I have the capability to record them. Therefore, I now own them, right? No, of course not.

    Similarly, Google should have no right to ownership over my DMCA-protected query “broadcasts”?

  3. I bet our present day answer resides in the T&C’s of the GYM: we don’t yet own squat of our collected behavioral data. But…

    Let’s pretend for a bit ONE of the GYM decides to make our profile editable (my money’s on G). What kind of a competitive advantage does this REALLY create for them? LOTS of goodwill at the influencer level to be sure. For instance, Mr. Battelle, I know you’d be cheering on the next version of editable profiles and chiding the laggards. But, what does it mean for the longer tail of users (grandma and the non-geek next door at the office)? It means squat! These are likely the same people who don’t use advanced search features you note in your book. They probably also won’t care until something foul happens in conjunction with such a profile (an icon falls for illegal behavior as recorded in their searches and click throughs). As soon as it affects the popular culture… THEN it will matter. For all of about two days. Then it’s back to the throw away silliness of the world.

    It’s just one more feature, like pivot tables in Excel, most people will never use but taken in aggregate add up to be the unique, secret sauce of a REAL offering.

  4. Yea, this is a really interesting topic. I just noticed that google asked me today if I wanted them to customize my search. What really is interesting to me is what they are going to do with that data.

    Example: I have been really getting into SEO for a while now. I am a complete novice who is currently sandboxed due to a really stupid rookie mistake. One of the things I have been doing is working to optimize our small web store. By reading different things on SEO I have been looking into keyword stuffing and wondering where the line is between optimization and stuffing. So, wanting to get out and stay out of the sandbox, I decided to do some research. I went to Google and typed in: “keyword stuffing in title seo”. The site that popped up was seoblackhat.com/category/keyword-stuffing/
    Not reading the url I just opened the link and started reading. A somewhat funny feeling hit me that big brother was watching. My thought was “Google knows I just visited this site, and they know who I am because I log into my g-mail, sitemaps, and adwords account.”

    I don’t think Google is looking for those patterns yet, but I could see them in the future trying to track patterns of site owners and apply that to their algorithm. Especially if a site owner is hanging out on black-hat sites all day.

  5. 1. If you go into a store (say Walmart or BestBuy) and use your credit card or a customer loyalty token like a keychain fob or customer card, can you legally (_LEGALLY_) prevent them from remembering things about you? Absolutely not! __WHY__? Because if you were concerned about your privacy (whatever that means in such cases), you would be vigilant (or stupid) enough not to use your credit card or a customer token (like a loyalty card or keyring fob).

    2. If you go into your local store, one of the very useful ones just around the corner of your street, can you prevent (or legally stop) the store owner from recognizing and greeting you in person? Can you prevent them from suggesting things that were not available to you before but are in stock now? If you were paranoid about such things, you would simply go to a superstore like Walmart or Tesco, and not be bothered by the convenience that such small local stores can offer.

    Increasingly I see people, like John Battelle and his kind, treat Google as if it were a tax-paid, subsidized, government-sponsored, free-lunch public service, which in this year, decade or century it is absolutely not. If such people were bothered about the privacy of interactions with Google, then it would simply be best to go to a random Internet cafe and use any Internet search engine there. Or lobby their Government to sponsor a public service utility for Internet searches (shudder … (think of NSA) … shudder again …)

    Google (or Yahoo, MSN whatever …) have every right to use their data, which includes user interactions, profiles, history, preferences, for anything or even sell it (currently in the form of targeted Ads to Sponsors). Their behaviour will change the users’ perspective towards them. For instance, if they sell (or forward) my e-mail address to Sponsors (or even stores, as Paypal or Amazon currently do), then I will stop using the services they offer (or set up a fake email address where such spam can be directed).

    If you are not happy about this reality then don’t use these sites, or as a saying goes STFU …

  6. Sorry, Alok, it isn’t the same. The retail data business has gone through a generation of learning, self-policing and govt policy development which has led to a level of protective measures that are carefully watched and maintained.

    What you buy in retail is also less personal than indicators of “what you are thinking” the way search strings are. Search data is intensely more personal and capable of violating more real privacy.

    Additionally, the participants in this market are acting aloof and removed, which is what is causing the quiet groundswell. They don’t want to crack the door open and give consumers control over recordings of their personal lives because advertisers will view it as invaluable.

    I think you have a point (albeit delivered without much class) regarding consumer behavior being able to trump this. However, the volume of the issue has nowhere near reached a point of incline where it will count. Yet.

    It just ain’t that simple, and STFU represents the kind of infantile behavior that does little to contribute to meaningful conversation. QTFD.

  7. Alok writes: Google (or Yahoo, MSN whatever …) have every right to use their data, which includes user interactions, profiles, history, preferences, for anything or even sell it (currently in the form of targeted Ads to Sponsors).

    Not if it is against the DMCA they don’t have that right!

    But look, let’s turn it around: Suppose I were to do the same thing to Google as they are doing to me: Take all the results pages that they returned to me, and profile them, record them, etc. Let’s go one step further: Implement a firefox plugin that lets -everyone- in the world profile and record the Google results. Then, we all just aggregate the results (anonymously!), and create our own, community-based, peer2peer search engine, serving Google results.. but without the ads! It would be just like filesharing.. except instead of sharing files we’d be sharing Google rankings.

    Don’t you see the beautiful symmetry in this? Google mines the data from millions of users. So why can’t millions of users mine data from Google?

    And it wouldn’t even be against Google’s TOS, as Google says no automated queries are allowed. None of these queries would be automated. They would all be real queries. We would just have millions of users aggregating them…which would have the same effect of automation, without actually being so.

    But I can still see Google saying: “Wait, hold on! Those ranked results are still OUR intellectual property. They don’t belong to you, just because you issued the query that created the ranked list.”

    Well, what’s the difference? Why does Google get to claim property on its actions, and I can’t on mine?

  8. The right that one has begins with NOT taking advantage of the (log-in) FREE Services offered by Google (GYM)….

    It would be very difficult for them to convince Advertisers to maintain their relationships, if there is not high quality relevance and correlation with prospective buyers.
    (Gone are the days of only the one-size-fits-all animated Gifs banners, and pop-ups)

    Many of these services are being paid for indirectly by the dollars generated from Advertisers. It might defeat the purpose to NOT utilize and analyze tenaciously.

    By being PROACTIVE users can delete Google cookies or use proxies or dynamic IPs, and not use any of the log-in-required free services offered by Google (GYM).

    But other perspectives are:

    – wouldn’t it also be to your advantage to get the most RELEVANT ads pushed to you as opposed to generic ads?

    – if you were an Advertiser, wouldn’t YOU want the most likely prospects for your hard earned dollar$?

  9. Search Engines WEB wrote
    “- wouldn’t it also be to your advantage to get the most RELEVANT ads pushed to you as opposed to generic ads?”

    Sure it is. But since my profile makes advertising on the internet more valuable, I want the benefit of that profile in the form of lower advertising rates for advertisers and therefore lower prices for me, instead of a web company making increased profits by raising advertising rates and not letting me have the fruits of my own profile.

  10. Dear John,

    Google answered part of your question about personalized search in some correspondence with the Mercury News that we posted as part of a recent project on Web privacy. The short answer is yes, people can edit and delete their own view of their own searches, however, Google will continue to retain a copy in its logs.
    People who are interested in this topic are welcome to review our correspondence with Google and the other top three Internet companies (search engines/portals). We sent long lists of questions about data collection and retention to Google, Microsoft and Yahoo and conducted a combination of phone interviews and written follow-up with AOL. The best way to find our project is by typing my name, the names of the companies and “privacy” into one of the major search engines. The companies didn’t answer as many questions as I would have liked. However, I think there is a possibility they will be more forthcoming once they realize how important this issue is to the hundreds of millions of people who use their services every day.

    Best regards,

    Elise Ackerman
    Reporter
    San Jose Mercury News
