Privacy, Gmail, and Unintended Consequences

The email below comes to me via Dave Farber’s IP list. I quote it in full with the author’s permission. I think the story he tells is quite interesting as it relates to our communications and intentions moving from the ephemeral to the eternal (the title of a chapter in my book). This email was written by JA Terranson, who is on Dave’s IP list, in response to an article by Declan McCullagh on issues of privacy and Gmail.

Subject: Opposing view of Gmail issues (Cypherpunk tie in)

Good Afternoon Declan,

As with much of the online community, I have been discussing this
topic since it was announced by Google, and until recently, I was also of
the opinion that this was a simple contractual choice between the user of
Gmail and Google.

My opinion was altered by a gentleman in England, who used the
following story to illustrate his point:

When Google released their toolbar, he, like most of us, installed
it. What was different was that he installed it with all of the advanced
features (including the tracking options, which Google goes out of their
way to make crystal clear *is* tracking software). His reasoning was
similar to the thoughts you expressed below: he had nothing to hide, he
believed Google really was stripping identity data from their observations
of his browsing habits, and he did not mind having them “watch”.

One day he had a firewall issue when trying to retrieve a file,
and the person who was hosting it offered to put it on a “private” (i.e.,
unlinked) page for him to grab over HTTP. He accepted, downloaded the
document, and promptly forgot about it – until this document, which had
extremely personal information on it (personal to the person *hosting* it,
not the person retrieving it) showed up on Google a short time later. You
see, the toolbar had seen him go to a web page that Google did not have,
and so they indexed it right away.

Without meaning to, the user of the toolbar had helped Google to
violate the privacy of the person who went out of his way to keep this
document private. This person knew nothing of the toolbar, and had no
agreement with Google, yet he became an unwilling participant in Google’s
web cache.

The senders of email to users of Gmail are in the very same
position as our friend above: they know nothing of the agreement, they are
not participants in the Gmail program – they have never agreed to allow a
third party to access *their* private thoughts and utterances, yet they
too are caught in the middle.

As much as it goes against my gut reaction, I must admit that
Gmail has some very serious privacy implications, some of which almost
definitely fall under EU privacy laws.

The ultimate solution to the problem is close to what was
suggested in the essay below: encryption. But not by Google. Encryption
by the senders. The Cypherpunk cry of “Encryption Everywhere” lands
smack dab in the middle of the plate here – email stays private,
regardless of Google indexing, government snooping, or end user
negligence. Pity that people will spend thousands of hours, and millions
of dollars arguing over the best way to protect us from ourselves, but
that we won’t spend five minutes learning to use a simple encryption
system that could completely erase these very issues.

Yours,

Alif Terranson

19 thoughts on “Privacy, Gmail, and Unintended Consequences”

  1. Did he password protect the file? Did he exclude the directory in robots.txt? The person did not go “out of his way to keep this document private”. The toolbar example is BS.

    As for the privacy laws re the email, I can’t see the problem. By this logic, every corporate spam filter is a violation of EU privacy laws. Don’t want google to see your email? Don’t send it to gmail.com.

  2. Robert,

    Have to disagree with you. I’ve put a file up on a server for just one person to download. Even though I removed it a few hours later, that doesn’t mean it wouldn’t have been indexed and in Google’s cache for the entire world to see.

    Sure we know about robots.txt and password protecting files, but your average web user doesn’t. Why should the onus be on them?

    You had better be prepared to share any links or file attachments you send to a GMail address with the entire world.

    I thought that the whole fuss about GMail was foolish. But Mr. Terranson’s example has caused me to totally reevaluate my position.

  3. Robert, just because you don’t send it to a gmail.com address doesn’t mean it won’t arrive in a gmail.com inbox. Lots of folks will forward all their mail to gmail to make it searchable.

  4. What I find most interesting about the Toolbar story is that it’s unquestioningly believed even though it’s being told fifth-hand (from the Englishman’s friend to the Englishman to Alif to John to everyone else), and there exist several other ways that the item could have been indexed, not the least of which is if the file was placed in a directory which was linked to, but which had no index file. A bot (any bot, Google or otherwise) following the link to that directory would be served a listing of all the files in it, which would then be spidered. Sadly, without being able to talk to the friend of the Englishman in question and find out the technical details, it’s impossible to say.

    The underlying point of the story, whether it’s true or not, is still made though. I personally believe that what it comes down to is whether an ISP, any ISP, Google or otherwise, holds to their agreements. Google’s policy says what they will and won’t do with a user’s data, but most of the concerns in this thread arise over what they *can* do. If the metric for privacy concern is what a company or person /can/ do, then yes, end to end encryption and self-destruction is the only way to make sure your data is only seen by the intended person, because in a lawless world where privacy policies and terms of use aren’t seen as binding documents, brute force protection would probably be necessary. However, call me naive, but I don’t believe that’s the world we live in yet.

  5. With all due respect, I have yet to see an actual example where someone can prove that the toolbar caused Google to crawl a URL. Other explanations include:
    – referer logs as you surf from “secret” pages to outside pages
    – some browsers are broken and pass referers even if the user didn’t click on a link
    – various proxies can expose visited pages as clickable urls
    – various web servers serve up stats pages on most-visited pages as clickable urls
    – people can submit a url directly to search engines
    – etc. etc.

    There are so many ways that a “secret” URL can leak to the outside world that Google even has an FAQ entry about it:
    http://www.google.com/webmasters/faq.html#secretserver

    I agree that people should encrypt things if they want them to be secret. But I’ve heard this toolbar claim reported over and over, always as a friend of a friend who is really convinced, but never with concrete evidence. At this point, I think snopes should add an entry about this. 🙂 John, I think it does you a disservice to quote this letter without questioning the toolbar example.

  6. I don’t think Google crawls pages that have no incoming links. Even explicitly submitted pages might not get crawled if they’re not linked anywhere. Otherwise, imagine how much spam could be generated by someone automating IE to repeatedly surf a huge link farm that’s disconnected from the net. Right now these spammers have to invest some effort in ensuring some incoming links from somewhere on the net.

    Furthermore, is this something specific to Google?

    Yahoo has a toolbar, webmail and a search engine.

    MSN has a toolbar, webmail and a search engine.

    Etc.

    Is there something in Yahoo/MSN’s ToS and policies preventing them from doing this? You think Yahoo wouldn’t want to jumpstart its young search engine with a bunch of fresh links harvested from mail and toolbar logs?

    TZ

  7. As for “Don’t want google to see your email? Don’t send it to gmail.com”, that’s not a great argument. After all if you send to my email as above (blog@outer-court.com) it will land in my Gmail-inbox, and you don’t have any way of knowing that. Even if you did, you did not agree to my Gmail ToS, but that’s another issue.

    The crucial point to me is that email is not private, never was, not at any webmailer, and if you want to have your complete theoretical privacy (not just the practical, pragmatic privacy which means your emails won’t land on some public server to be googled by everyone), you better encrypt (or stop sending email in the first place).

    And as for the Google Toolbar, yes, it most certainly has its ways of indexing files which are otherwise not found. Either that, or Google is trying out random URLs on servers. I saw it on mine before (something completely unlinked goes into Google), and others in newsgroups reported it too. And we are toolbar users. I still do not have any proof I would bet my life on, as is the case with most things we argue about Google. Google’s the black box we reverse-engineer, after all. Reverse-engineering is a soft science.

    Now on to the friend whose file was found — sorry, pal, you didn’t secure the file, you don’t have *any* reason to assume it’s not public. If you want to secure your files… well, then secure your files (there are easy ways of password-protecting them). There really is not much else to say about this.

  8. If Google discovers URLs that a site’s robots.txt blocks, the URLs still appear in the search results (SERPs), just uncrawled.

    In other words Google shows the url but not the content.

    I believe other engines do not show these types of urls.

  9. I think the problem does not lie with google, but rather with the person hosting the file. Web traffic is PUBLIC traffic. The whole point of a web server is to SHARE information. If you don’t want to share it, don’t put it on a web server. Just because you didn’t know google would index it doesn’t mean it isn’t your fault for sharing it. The same thing goes for email. As soon as you send a message, you lose all rights to that message. Think of it this way: don’t tell an untrustworthy person something you don’t want them to share. With email, everyone is untrustworthy.

  10. This seems to be a textbook example of failed security through obscurity. As nosebreaker.com (the last poster) said, a webserver is made to share information with the public. Putting a file in robots.txt is also a bad idea in my opinion; it’s just a different way of security through obscurity. A hacker can look at robots.txt just as easily as Google can and see the URL for the file with “extremely personal information”. They can then download it and use the information for whatever they want. johnny.ihackstuff.com/index.php has a list of tons of searches that find stuff that has been spidered by Google and made public because the user trusted security through obscurity. A simple way to have solved this problem would have been to password protect the file or possibly the whole folder (which is very easy to do in both Apache and IIS). Security through obscurity does not work! Also, a note on Gmail: nobody should trust email to be secure or reliable. Email is easy to spoof, open to man-in-the-middle attacks, and its integrity is not guaranteed. Gmail didn’t create this problem, but I’m glad it’s making news and getting this issue out so people will realize it.

  11. This could actually explain why the Googlebots tried to index pages hosted on my personal computer (address never revealed to anyone) only minutes after I activated a “dynamic dns” service (which made the pages accessible to the outside world, only protected by a firewall) and I tested the pages using the “external” URL.

    I never understood how they (The Bot-monsters) could guess I had stuff there. I was wondering about the dyn-dns service… But Googlebar, that actually makes sense…

  12. In the free version of Opera, if you allow AdWords advertising instead of generic banners, you get a similar effect. I am very cautious of who I let know “secret” URLs for that reason.

  13. Rick Mason wrote: “You had better be prepared to share any links or file attachments you send to a GMail address with the entire world.”
    Err.. that’s not exactly true. Because links are spiderable, there is a chance they end up in google’s cache, but that’s what links do: they point people to /public/ resources, on /public/ webservers.
    The attachments are not spiderable, and in no way public.

  14. this is the internet, the world-wide web, everything on it is in the public domain, unless it is secured, which in this story was not the case. i personally would never upload personal information to a location online unless i didn’t mind the consequences. ‘oops, i enabled something that specifically stated it was tracking software and clicked on a “private” link’. must be google’s fault of course. slightly sarcastic, i know, but the onus is not on google any more than it is on either end-user.

    as for gmail, it’s not in the public domain. your emails end up on google’s secure gmail servers, which i’m sure is a legal requirement which is strictly in force. whether you are aware or not that people browse with link indexing software enabled (google or not), you should not be sending private links across the internet without any type of security in place. also, when it comes to the ads, google don’t hire employees to sit and read through your email and pick out keywords. it’s done by machine, the same ones that send your emails from one server to the next, scanning them for keywords which trigger spam detectors, as your ‘private’ text file traverses the planet.

    c’mon folks.

    oh, and all email to my domain is also redirected to gmail. you’ve got to love it.

  15. “Pity that people will spend thousands of hours, and millions of dollars arguing over the best way to protect us from ourselves, but that we won’t spend five minutes learning to use a simple encryption system that could completely erase these very issues.”

    It’s there already, and almost every modern e-mail client supports it: it’s called S/MIME.
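Several commenters (comments 1, 8 and 10) argue over robots.txt as a way to keep a “private” file out of Google. A minimal sketch of the trade-off comment 10 describes, assuming a hypothetical robots.txt that disallows a /private/ directory on example.com and using Python’s standard urllib.robotparser module: a well-behaved crawler is told to stay away from the file, but the same text hands the path to anyone who reads it. It models only crawlers that honour robots.txt and says nothing about how Google or its toolbar actually discover URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner tries to hide one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def parse_robots(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def disclosed_paths(text: str) -> list:
    """Return every Disallow path, i.e. what anyone who reads robots.txt learns."""
    paths = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    rules = parse_robots(ROBOTS_TXT)
    # A compliant crawler skips the "private" file...
    print(rules.can_fetch("Googlebot", "http://example.com/private/notes.txt"))  # False
    print(rules.can_fetch("Googlebot", "http://example.com/index.html"))  # True
    # ...but robots.txt itself announces where to look.
    print(disclosed_paths(ROBOTS_TXT))  # ['/private/']
```

robots.txt is advisory: it only asks crawlers to stay away, whereas the password protection suggested in comments 1 and 10 actually restricts who can fetch the file.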
