John Battelle's Search Blog

Google And Pre-Fetching

Google has found itself in the midst of another tempest; whether this particular one is in a teapot depends on your point of view. The issue has to do with “pre-fetching”, a practice for which Google got some heat back when it introduced its Web Accelerator.



I first saw note of this on Dave Farber’s IP list. From the original post, by privacy advocate Lauren Weinstein:

…about a month ago, Google started triggering “prefetch” page data for the top listings in search results. This behavior is reportedly currently limited to users on Mozilla-based browsers (Mozilla, Netscape, Firefox…)

The goal of this procedure is to allow users of those browsers to see the top link results faster, since they’d already be cached locally. But there are big downsides to this process.

One obvious problem is that it can distort Web server statistics, by creating “hits” from users who never actually chose to visit the sites in question, but were prefetched when their search listed those sites at the top of results. For some sites, this may be a mere annoyance, for others it could be a significant problem that could affect their revenue patterns. This also has the side-effect of creating a sudden artificial boost in Mozilla-based browser usage statistics.

A much more serious issue is that the prefetching causes users to actually access sites without ever having touched the associated links — and this includes the receiving of cookies. …..

….This means that your IP address and other typical connection data have *already* been dropped into that site’s logs, even though you never chose to access that site, and you may now already be holding cookies from them as well.

… imagine if an innocent search returned results where the top-listed site contained information you’d never want to be associated with nor access in any way (child porn, browser exploit sucker-bait sites, illicit files — you name it). Keep in mind that such sites will often use various techniques specifically to boost their rankings in search results….

…Bottom line: Creating a situation where users are “automatically” accessing search-result sites without their having taken explicit actions to do so is very bad policy. This problem is not the fault of Google alone — the prefetching mechanism has been present in Mozilla-based browsers for quite some time.

However, when the planet’s major search engine begins to routinely use this technique in the manner that Google has done, it at the very least suggests that they did not fully think through the potentially serious anti-privacy ramifications of their actions, when applied on the vast scale of their user base.

Tim posted on this issue as well, offering a clarification that Google only prefetches the first result.

I called Google and spoke to some folks there. They acknowledged that Google does the pre-fetching for Mozilla clients, but they argued that Google is doing it in a fashion that is compliant with web standards, and for a good reason: to speed up the web. Sophisticated webmasters can easily filter out pre-fetches from other kinds of requests, so logs won’t be inflated, and users can turn pre-fetching off if they want. For more on this issue, Google pointed me to Mozilla’s link prefetching FAQ.
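
For the curious, here is roughly what that filtering might look like. As I understand Mozilla’s FAQ, the browser marks each prefetch request with an "X-moz: prefetch" header, so a site can check for it before counting a hit. This is only a sketch under that assumption: the function names are my own invention, and it supposes your server hands you each request’s headers as a simple dictionary.

```python
def is_prefetch(headers):
    """Return True if the headers mark a Mozilla link prefetch.

    Assumes the browser sends "X-moz: prefetch" on prefetch requests.
    Header names are compared case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "x-moz" and value.strip().lower() == "prefetch":
            return True
    return False


def count_real_hits(requests):
    """Count requests that were not flagged as prefetches.

    `requests` is a hypothetical list of header dictionaries,
    one per incoming request.
    """
    return sum(1 for headers in requests if not is_prefetch(headers))
```

A log-processing script could call count_real_hits on a day’s requests so that prefetches don’t pad the totals. Note that this only cleans up the statistics; it does nothing about the cookie and IP logging concerns Weinstein raises.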
