Google's Fraud Squad Battles Phantom Clicks
An anonymous reader writes "It's an open secret that low-cost workers in India, China and other countries are hired to boost traffic for online ads by clicking on text links, banners, etc. Internet marketers facing high advertising fees on search networks like Google are becoming increasingly concerned about this form of online fraud. This problem has reached a critical stage, and even Google recognizes that it has been the target of individuals and entities 'using some of the most advanced spam techniques for years'. A Google spokesperson said the company has 'applied what we have learned with search to the click fraud problem and employed a dedicated team and proprietary technology to analyse clicks.'"
For 2004, about 96% of Google's revenue has come from ads.
Here is some more detailed info.
Because of their IPO, a lot of financial info is now available.
The position of your AdWords ads is based on CTR (click-through rate as a percentage, i.e. clicks divided by impressions, times 100) and the price you're willing to pay.
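To make the CTR arithmetic concrete, here is a minimal Python sketch. It assumes, purely for illustration, that position is decided by CTR multiplied by the maximum cost-per-click bid; the comment only says both factors matter, so the scoring rule, the `rank_ads` helper, and the sample ads are hypothetical.

```python
# ASSUMPTION: rank score = CTR (%) * max bid. This is an illustrative rule,
# not Google's actual (unpublished) formula.

def ctr_percent(clicks: int, impressions: int) -> float:
    """Click-through rate as a percentage: clicks / impressions * 100."""
    if impressions == 0:
        return 0.0
    return clicks / impressions * 100


def rank_ads(ads: list[dict]) -> list[str]:
    """Return ad names ordered by the hypothetical score CTR (%) * max_bid."""
    scored = [
        (ctr_percent(ad["clicks"], ad["impressions"]) * ad["max_bid"], ad["name"])
        for ad in ads
    ]
    # Highest score takes the top position.
    return [name for _, name in sorted(scored, reverse=True)]


ads = [
    {"name": "A", "clicks": 20, "impressions": 1000, "max_bid": 0.50},  # 2% CTR
    {"name": "B", "clicks": 5, "impressions": 1000, "max_bid": 1.50},   # 0.5% CTR
    {"name": "C", "clicks": 10, "impressions": 1000, "max_bid": 1.20},  # 1% CTR
]
print(rank_ads(ads))  # ['C', 'A', 'B']
```

Under this rule, a cheap ad that people actually click (A) can outrank a pricier one nobody clicks (B), which is exactly why inflated clicks are worth money to someone.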
If Google just let this happen, they would be saying to advertisers "you're getting screwed, but we're profiting, so we're happy." This might tarnish Google's saintly image and make people not want to pay them money.
You might as well say that cellphone companies shouldn't stop phone cloning, because if someone steals my identity and starts making calls to Nigeria, the phone company can bill me big time! But if they didn't do their best to stop the fraud, they would soon lose my custom.
With Google, it is not as easy as with some other companies out there.
Google's code is placed on the site as a JavaScript include that gets rendered to the screen at runtime, when a browser executes it.
That means if you have a script hit the page and fetch its source, all you get is the JavaScript include.
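A small illustration of what such a script sees, assuming a hypothetical, simplified page (`PAGE_SOURCE`) whose ad slot is a single script include pointing at a placeholder URL, not Google's real ad-serving endpoint. Parsing the raw HTML with Python's standard `html.parser` finds the include and no ad links at all.

```python
from html.parser import HTMLParser

# ASSUMPTION: hypothetical page; the script URL is a placeholder.
PAGE_SOURCE = """
<html><body>
  <p>Some article text.</p>
  <script type="text/javascript" src="http://ads.example.com/show_ads.js"></script>
</body></html>
"""


class IncludeAndLinkCollector(HTMLParser):
    """Collects <script src=...> includes and <a href=...> links from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.script_srcs: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "script" and "src" in attrs:
            self.script_srcs.append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])


collector = IncludeAndLinkCollector()
collector.feed(PAGE_SOURCE)
print(collector.script_srcs)  # ['http://ads.example.com/show_ads.js']
print(collector.links)        # []
```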
If you write a page that lets you view the content of the Google iframe on click (the JavaScript include writes out an iframe that is then filled with a page from Google), you will see more of the code.
They use several layers of JavaScript, and none of the pages render links directly, so they are hard to scrape with a bot, since a bot only sees the source.
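The same problem one layer down: if the iframe's page builds its links with JavaScript DOM calls, a bot reading the source never sees an anchor tag. The markup below is a hypothetical stand-in for such a page, not Google's actual code. A regex link extractor comes back empty, and a generic URL pattern only catches a fragment.

```python
import re

# ASSUMPTION: hypothetical iframe content. The ad link is assembled by DOM
# calls at runtime, so no literal <a href="..."> markup exists in the source.
IFRAME_SOURCE = """
<script type="text/javascript">
  var a = document.createElement("a");
  a.setAttribute("href", "http://click." + "example.com" + "/redir?ad=" + 42);
  a.appendChild(document.createTextNode("Cheap widgets"));
  document.body.appendChild(a);
</script>
"""

# Naive bot logic: pull href values out of anchor tags in the raw source.
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)
# Fallback: grab anything that looks like an absolute URL.
URL_RE = re.compile(r"https?://[^\s\"']+")

print(HREF_RE.findall(IFRAME_SOURCE))  # [] (no anchor tags in the source)
print(URL_RE.findall(IFRAME_SOURCE))   # ['http://click.'] (only a fragment)
```

Getting the real URL means executing the script the way a browser would, which is the point the comment makes.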
You could load the pages individually (outside of the iframe) and look at them, but that doesn't always work, and when you load that page it also reports back to Google the site, location, and name of the page it is being loaded on.
So if you have a site, ballsweat.com, with Google ads on it, and you start messing around with a copy to get a better idea of what the ads look like, Google will see that the ad code is no longer showing up on the site but on your hard drive instead (or, if you put the copy on your server, they can read the code you are using).
That alone will tip them off that you are looking into it. You could claim it was someone else and not you (assuming the copy was on a local drive), but then again, that could just mean you used someone else's site to test.
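One way the server side of that check might look, as a minimal sketch: it assumes the ad request carries the hosting page's URL (the comment only says the page "sends back a reference" to Google), and the publisher registry, the ID `pub-123`, and `page_location_matches` are all hypothetical.

```python
from urllib.parse import urlparse

# ASSUMPTION: hypothetical registry mapping a publisher ID to its domain.
REGISTERED_SITES = {"pub-123": "ballsweat.com"}


def page_location_matches(publisher_id: str, reported_url: str) -> bool:
    """True if the reported page URL is on the publisher's registered domain."""
    parsed = urlparse(reported_url)
    if parsed.scheme not in ("http", "https"):
        return False  # e.g. file:// means a copy opened from someone's hard drive
    host = (parsed.hostname or "").lower()
    domain = REGISTERED_SITES.get(publisher_id)
    if domain is None:
        return False
    return host == domain or host.endswith("." + domain)


print(page_location_matches("pub-123", "http://www.ballsweat.com/index.html"))   # True
print(page_location_matches("pub-123", "file:///C:/test/ballsweat.html"))        # False
print(page_location_matches("pub-123", "http://myserver.example.net/copy.html"))  # False
```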
So anyway, back to getting the data: you would have to load the source and then parse and execute the JavaScript to build the page the same way a browser does (hopefully there are objects in Windows that let you simulate this and dump the post-rendered contents into a variable you can scan; I don't know about that).
OCR is out of the question, since it is not going to get you the proper link. The links are listed, but payment only goes out if you click a link that first routes you through a Google site, so it can register the click and track the stats, and then redirects you to the advertiser's site. When you mouse over a link it shows the regular site link, but that is done via JavaScript.
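To show why the visible link is not the one that counts, here is a sketch using a made-up click-tracking URL in which the advertiser's destination rides along in a query parameter. The host `click.example.com` and the `adurl` parameter name are hypothetical; the real AdWords URL format is not given in the comment.

```python
from urllib.parse import parse_qs, urlparse

# ASSUMPTION: hypothetical tracking link. The browser requests the tracker
# first; the destination is only a parameter inside it.
tracking_href = (
    "http://click.example.com/redir?ad=42"
    "&adurl=http%3A%2F%2Fwidgets.example.org%2F"
)


def split_tracking_link(href: str) -> tuple:
    """Return (tracker host, decoded destination URL or None)."""
    parsed = urlparse(href)
    params = parse_qs(parsed.query)  # parse_qs also percent-decodes values
    return parsed.hostname, params.get("adurl", [None])[0]


print(split_tracking_link(tracking_href))
# ('click.example.com', 'http://widgets.example.org/')
```

Text captured by OCR or a mouseover would give you only the destination half; fetching it directly never touches the tracker, so no click is registered.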
Then you run into the issue that Google would have to be stupid to let a single IP crunch through a ton of ads every day.
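The kind of per-IP check the comment takes for granted can be sketched as a sliding-window counter. The limit of 3 clicks per 24 hours and the class name are hypothetical; nothing here reflects Google's actual thresholds.

```python
from collections import defaultdict, deque


class PerIpClickLimiter:
    """Flags an IP once it exceeds max_clicks within window_seconds (sliding window)."""

    def __init__(self, max_clicks: int, window_seconds: float) -> None:
        self.max_clicks = max_clicks
        self.window = window_seconds
        self.clicks: dict[str, deque] = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record a click; return True if this IP is now over the limit."""
        q = self.clicks[ip]
        q.append(timestamp)
        # Drop clicks that have fallen out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) > self.max_clicks


# ASSUMPTION: illustrative limit of 3 clicks per IP per day.
limiter = PerIpClickLimiter(max_clicks=3, window_seconds=24 * 60 * 60)
results = [limiter.record("10.0.0.1", t * 60.0) for t in range(5)]
print(results)                            # [False, False, False, True, True]
print(limiter.record("10.0.0.2", 300.0))  # False
```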
So then you have to worry about spoofing (arguably blind spoofing in this case). The problem isn't loading web pages; that would actually work with blind spoofing (say I am computer A, and I want to tell server B that computer C is connecting to it, so that B sends the page data there). The problem, again, is that B is only going to send raw HTML/JavaScript source down that connection, and it is then just going to drop off at that machine.
So the site (Google in this case, since you loaded a page and then "clicked" a link) registers the hit, but the page never gets rendered, so the Google page is never displayed and the redirect never happens. One could assume that Google is aware of this and wouldn't count that as a hit, since the other page never gets loaded.
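If Google does work the way the comment guesses, the filter could be as simple as pairing each recorded click with the follow-up request that only a browser actually rendering the intermediate page would make. This sketch assumes each click gets an ID that reappears in that follow-up request; the pairing scheme and `billable_clicks` are hypothetical.

```python
def billable_clicks(click_ids: list[str], followup_ids: list[str]) -> list[str]:
    """Keep only clicks whose redirect was actually followed.

    ASSUMPTION: hypothetical scheme in which a click ID is logged on the
    first hit and logged again when the rendered page performs the redirect.
    """
    followed = set(followup_ids)
    return [c for c in click_ids if c in followed]


clicks = ["c1", "c2", "c3", "c4"]
followups = ["c1", "c3"]  # c2 and c4 never rendered, e.g. responses sent to a spoofed host
print(billable_clicks(clicks, followups))  # ['c1', 'c3']
```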
So even if you could get past all of that (heh, shades of Ocean's 11), there is the issue that Google (technically it isn't Google but a series of companies they farm the AdWords content out to; I learned that from an investment bank friend who sat in on the IPO workings, yay) monitors this shit and looks for anomalies.
So if you were getting 200 hits and 2 clicks every day for a month, and all of a sudden you are getting 2,000 hits and 200 clicks a day, they are going to investigate your site.
If nothing has changed to suggest there should be new interest in your site (new ad placement, new content, etc.), and they can run searches and see that no new sites are pointing to you, then all signs point to you cheating.
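The numbers above work out as follows: 2 clicks on 200 hits is a 1% CTR, while 200 clicks on 2,000 hits is 10%, so traffic is up 10x and CTR is up 10x at the same time. A minimal sketch of the "jump with no explanation" rule, assuming a hypothetical 3x threshold and two boolean signals standing in for "site changed" and "new inbound links":

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Clicks per impression (0.01 == 1%)."""
        return self.clicks / self.impressions if self.impressions else 0.0


def looks_suspicious(baseline: DailyStats, today: DailyStats,
                     site_changed: bool, new_inbound_links: bool,
                     ratio_threshold: float = 3.0) -> bool:
    """Flag a traffic or CTR jump that nothing visible explains.

    ASSUMPTION: the threshold and both boolean signals are illustrative.
    """
    traffic_jump = today.impressions >= ratio_threshold * baseline.impressions
    ctr_jump = today.ctr >= ratio_threshold * baseline.ctr
    unexplained = not site_changed and not new_inbound_links
    return (traffic_jump or ctr_jump) and unexplained


baseline = DailyStats(impressions=200, clicks=2)     # 1% CTR
today = DailyStats(impressions=2000, clicks=200)     # 10% CTR
print(looks_suspicious(baseline, today, site_changed=False, new_inbound_links=False))  # True
print(looks_suspicious(baseline, today, site_changed=True, new_inbound_links=False))   # False
```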
And then on top of all of that, we can show that a Gaussian distribution