(Non-Ficticious) News: Google Publishes Paper On Click Fraud

Here’s the situation: on the one hand you have your customers, insisting that there is a problem and that you do something about it. On the other hand, you have your engineers, insisting there is not a problem. Further complicating the issue, your customers, unsatisfied with your insistence that their concerns are, in fact, not a concern, have gone and hired third-party firms who then validate their concerns (and turn click fraud detection into yet another industry – see the ads on here). Then, of course, the press whips those concerns into a major frenzy, threatening your $100+ billion market cap.

And all of this is due to one thing: you aren’t willing to show your cards as to why you believe your customers’ concerns are invalid. Doing so would dull the edge of competitive differentiation that made your product what it is in the first place.

This is the situation in which Google finds itself right now with its AdSense advertisers. It’s not a pretty place to be. So to dampen the criticism, Google has responded with a 17-page white paper attacking the methodology that third-party click fraud reporting firms use. They’ll have to walk a fine line here.

Titled “How Fictitious Clicks Occur in Third-Party Click Fraud Audit Reports,” the paper sets out to set the record straight.

“We have seen numerous reports of click fraud estimates which we believe significantly overestimate the impact on advertisers,” the report states in its background section. “The most fundamental flaw that we have seen in these reports is the existence of fictitious clicks: events which are reported as fraudulent but do not appear within Google’s logs as AdWords clicks. This report identifies the root causes behind these fictitious clicks and illustrates the extent to which this flaw impacts click fraud estimates from these firms.”

I am still reading this report, and was given a 9 am publishing embargo, so I’m going to go ahead and upload the document here, and let you all read it with me. I’ll be back with more thoughts as they occur to me, or please, add yours to this thread.

Update: Wow, the document reads far more combative than I thought it would. It’s more of an indictment of the nascent click fraud detection industry, and three firms in particular are called out. To wit:

We have been aware of the presence of fictitious clicks in third-party reports for some time. We have given feedback to advertisers (and indirectly, to some of these third-party auditing firms) and pointed out the various flaws we’ve observed in their reports, but have met with little in the way of a positive response or interest in correcting their methodologies. They maintained that their click fraud detection methodologies differed from ours, and that fact alone accounted for any differences. ….

We discovered some basic engineering and accounting issues across the industry – problems which were in fact completely separate from the issue of accurate click fraud detection – which have in each case led to dramatic overestimation of click fraud rates by these firms. As an example, a single AdWords click may appear as five events in some reports, leading to (a) the identification of these events as “click fraud”, and (b) the reporting of five fraudulent clicks. …

Appendix B presents detailed case studies for three firms: ClickFacts, Click Forensics, and AdWatcher. Click fraud estimates from ClickFacts and Click Forensics have received widespread media coverage. And among third-party auditing reports submitted to us by advertisers, reports from AdWatcher are the most common.

All three case studies exhibit the problem of severe click inflation in their reports primarily due to the presence of fictitious clicks, which generally render their published estimates on click fraud invalid.
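To make the "fictitious click" idea concrete, here is a minimal sketch of the kind of reconciliation the paper describes: checking each event in a third-party audit report against the clicks that actually appear in the ad network's log. The function name `split_fictitious`, the `click_id` field, and the sample data are all hypothetical, and the sketch assumes both sides share a common click identifier, which the paper excerpt does not spell out. It is not Google's or any auditing firm's actual method.

```python
def split_fictitious(audit_events, billed_click_ids):
    """Separate audit events into real clicks and fictitious ones.

    An event counts as real only the first time its (hypothetical)
    click_id matches a billed click. Repeats of the same click and
    events with no matching billed click are treated as fictitious.
    """
    billed = set(billed_click_ids)
    counted = set()
    real, fictitious = [], []
    for event in audit_events:
        click_id = event["click_id"]
        if click_id in billed and click_id not in counted:
            counted.add(click_id)
            real.append(event)
        else:
            fictitious.append(event)
    return real, fictitious


# One billed click that shows up as five events, plus one event
# that never appears in the billing log at all.
events = [{"click_id": "a"} for _ in range(5)] + [{"click_id": "zzz"}]
real, fictitious = split_fictitious(events, ["a", "b"])
print(len(real), len(fictitious))
```

Under these assumptions, a report that labeled all six events as fraud would be counting five events that were never billed as separate clicks.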

I am still grokking this, and I’m not a fraud detection expert. Watch SEW for more, I’d wager. Also, here is Google’s post on the report.

And lastly, coverage from SEW from Publishing 2.0. Sounds like there were fireworks on stage, so sorry to be missing the conference…

9 thoughts on “(Non-Ficticious) News: Google Publishes Paper On Click Fraud”

  1. If these click fraud detectors are selling snake oil, then Google is right to call them out. But just because these guys can’t isolate and identify it, it doesn’t prove or disprove the existence or extent of click fraud. It just proves that click fraud is very difficult to detect, which we knew.

    Now how about putting some tools in AdSense to show click spikes, document the time interval between clicks from the same origin (and exceptions from these averages), and, better yet, list the URLs where clicks originated?

  2. That was a good paper put out by Google – and I read through it and posted about it today. I’m just wondering what Google would consider, or be happy with, in a 3rd party solution?

  3. Having seen more than my share of reporting discrepancies in click, page, etc. counting, I still maintain that the best solution is not to engage in more of the same, but to move to business models that are less subject to fraud, and carry less of the overhead needed to detect it. We can quarrel endlessly about how a particular counting methodology is or is not correct, or take steps to minimize everyone’s burden.

    As a thought exercise, the next time someone gives you a coin or bill in exchange for something, ask them to prove to you that it is not counterfeit. Then consider why we have systems in place that allow currency to be exchanged, despite the existence of counterfeiters.

  4. Clearly, Google and click fraud detection companies both have a conflict of interest when it comes to reporting fraud levels.

    That makes it difficult to take their data seriously, whether it comes from auditing reports by click fraud companies, or Google’s own ‘estimating invalid clicks’ reporting tool.

    The Alexander Tuzhilin report was a start, and maybe this IAB Click Measurement Guideline initiative will move things in the right direction.

  5. Esoos, what this report mostly highlighted was that they weren’t properly cross-referencing their reports with Google’s to check whether the supposed click fraud had ever actually been billed. This is one of the checks the report suggests everyone should do; it is a standard part of our own internal process, and a quick and dirty way to compare engines:

    http://www.search-engine-war.co.uk/2006/08/google_clickfra.html

    IAB Click Measurement Guidelines – I am pretty sanguine on this as the IAB has hardly got its head around search in the past. The engines would be better chatting to SEMPO and SMA where the search advertisers hang out to find out what we really want. My opinion is that an industry standard version of the Google autofill feature which contained some data about the referring click (timestamp, keyphrase etc) that used shared key encryption, so could be decoded by only that account holder or the engine is the way to go. That way you could at least precisely validate what clicks came from the engine (no third party tampering), and when submitting clickfraud reports the engine would know exactly which clicks you were on about, for instance where they had a dodgy affiliate.

    It seems pretty obvious really.

  6. This is always fascinating for me as I have been involved in running a finance portal for the last 7 years and “incorrect click counting” has been something that we have had concerns about since we launched back in the dark ages.

    Firstly, the Google paper is technical in approach and deals with the problems that 3rd party click firms have assessing the validity of Google clicks. It does not consider “real but malicious clicks” – 1 company’s employees clicking on their competitors’ AdWords ads. In percentage terms this is not a problem for popular products like car insurance or loans, but the CPCs for these terms are high and therefore the extra cost to providers is bound to be high.

    However, the real problem with this behavior is that it is all but untraceable. In niche markets where there are only a small number of clicks each day, a few extra will make a big difference in percentage terms.

    We do not believe that our site gets targeted by individuals trying to increase their commissions from us. However, we are constantly battling to ensure that the clicks we are counting are real. There is a lot of artificial traffic generated on the internet and sussing out what is real and what is not can be very difficult. This means that for our little business, we have a series of rules to help remove spiders.

    We use a combination of IP, cookie, regularity of clicks, User Agent & Browser information to help ensure that we remove all that we should. However, as yet we have not managed to create rules that accurately remove all the artificial traffic. We then also look for any IPs, or cookie’d users, that have produced more than 20 clicks in a month. We then look at what has been clicked on to see if we can spot any patterns that make us suspicious; if we do, we remove the traffic.

    3rd party firms will not have the luxury of being able to check whether a particular “user” has clicked on a whole bunch of links, they will only be able to see 1 click on their own websites.

    I would like to see 1 of the accountancy / consulting firms given real access to all the data, to do their own independent evaluation of how Google is removing artificial traffic by looking at swathes of data across companies, not just looking at individual companies.

    I better get on, but just thought the above might be of some interest.

    Thanks,

    Richard

  7. Overall assessment: Definitely combative in tone, which is surprising given that Google has a real issue with click fraud on their hands. Counting methodologies are an issue when comparing any two tracking systems, but at least 3rd party tools are giving us some data to look at to identify where there might be an issue. This is a far cry more than Google is giving its advertisers. Does this data need to be verified with billed clicks? Of course. What the paper fails to mention is that Google refunds people that ask, and doesn’t necessarily refund people that don’t ask to the same degree. I would like to see a statistical analysis of the refund rate for clients using a 3rd party tool, versus those that are not. Now that would be interesting.

    I am not vouching for the firms in the report, but am fairly certain any one of them can produce a client that has received thousands of dollars in refunds. Google is giving this money back out of the goodness of their hearts? Please.

    What the report also fails to mention is how Google is dealing with sophisticated click fraud, like that generated using clickbot.a, an automated program running on bot nets and clicking on AdSense sites. We all know the failings of 3rd party tracking. What we don’t know is how Google is filtering clicks of this type (answer: they’re not). Other reports have indicated that Google’s click fraud detection is basic in its approach, so it is unlikely they are filtering anything other than the most obvious forms.

    The real issue is that the AdSense model is prone to click fraud due to the motivations of the thousands of small publishers that make up the network. Remove the financial motivation for fraud, and you solve click fraud for the most part. Can’t be done? Transparency might help. If we really want to see the market effect Eric Schmidt has talked about, then introducing better information into the decision-making process can only help. Sure, I can look at ROI and if it is not working, not play. But what if fraud is causing me to throw the baby out with the bath water?

    The best approach would be for Google to release their record of the click time, click IP, click UA, an anonymized click number, keyword, and referring URL. This is already available in most 3rd party tracking systems (good ones), and getting the official record from Google, much like your phone bill, would put an end to the disagreement. The claim that this is competitive intelligence is bunk. Until they do this, the issue will remain mired in disagreement and suspicion.

  8. Stuart: “the AdSense model is prone to click fraud due to the motivations of the thousands of small publishers that make up the network”

    First of all, none of this is new. Click programs have been around since 1994 at least, when a guy in my dorm was earning $1/click. Google did not invent the click program. Transparency is not the solution. No, it’s not bunk. The more info Google gives out, the *more* fraud you will have. There are plenty of hungry hackers out there living in poverty studying Google’s counting habits. Google won’t make it easier for hackers by giving away their algorithm secrets with detailed reports.

    Second, I’m tired of publishers getting the blame for this. The Internet is full of non-human traffic. Get over it. Stuart, are you on another Internet we don’t know about, maybe Web 3.0 Strict where everyone scans fingerprints to access your website? Get real. Anyway, you never know when a robot will link to your website, add you to a search engine or report your info back to real humans.

    If you don’t like our traffic, don’t buy it. It’s a good deal as it is. Otherwise, you wouldn’t buy it. The more you push Google to block traffic, the more you push everyone to better paying programs like TextLinkAds.com. Google is not your problem. The Internet is your problem.
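The screening rule Richard describes in comment 6 (flag any IP or cookie'd user with more than 20 clicks in a month, then review what they clicked on) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `flag_heavy_clickers`, the `(month, ip, cookie)` tuple layout, and the sample values are hypothetical, and the comment does not say how his site actually stores or processes its click data.

```python
from collections import Counter

# Threshold taken from the comment: "more than 20 clicks in a month".
CLICK_THRESHOLD = 20


def flag_heavy_clickers(clicks, threshold=CLICK_THRESHOLD):
    """Return the (month, kind, value) keys whose click count exceeds threshold.

    `clicks` is an iterable of (month, ip, cookie) tuples, a hypothetical
    layout. `cookie` may be None when the visitor has no cookie set.
    """
    counts = Counter()
    for month, ip, cookie in clicks:
        counts[(month, "ip", ip)] += 1
        if cookie is not None:
            counts[(month, "cookie", cookie)] += 1
    return {key for key, n in counts.items() if n > threshold}


# 21 clicks from one IP (over the threshold), 20 from another (not over).
sample = [("2006-08", "1.2.3.4", None)] * 21 + [("2006-08", "5.6.7.8", None)] * 20
print(flag_heavy_clickers(sample))
```

Flagged keys are only candidates for the manual pattern review the comment mentions; the sketch does not remove any traffic by itself.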
