Spam Sites Infesting Google Search Results
The Google Watchdog blog is reporting that "Spam and virus sites infesting the Google SERPs in several categories" and speculates, ...Google's own index has been hacked. The circumvention of a guideline normally picked up by the Googlebot quickly is worrisome. The fact that none of the sites have real content and don't appear to even be hosted anywhere is even more scary. How did millions of sites get indexed if they don't exist?
For years Yahoo was infested with spammers on their front page, but the fact is -- Google is susceptible to an erosion of moral tenacity, just like any other corporation. Someone from within has given the keys to someone who has paid a lot of money to get them. This isn't a hack job... it's an inside job.
The dangers of knowledge trigger emotional distress in human beings.
Which raises the question: Why not have GoogleBot do a check also as a normal user-agent (IE/Firefox/etc.) and see if the page is significantly different than when it identifies itself? At the very least GoogleBot could check if there are common blacklist words ("viagra" et al) on the website when identifying itself as IE or Firefox.
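A rough sketch of the check suggested above, for anyone who wants to play with the idea. It assumes nothing about how GoogleBot actually works: the user-agent strings, the blacklist, the 0.5 threshold, and every function name here are made up for illustration. The comparison works on HTML strings so it can be tried without touching the network.

```python
import difflib
import re
import urllib.request

# Hypothetical user-agent strings, chosen only for illustration.
BOT_UA = "ExampleBot/1.0"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; rv:100.0) Gecko/20100101 Firefox/100.0"

# Hypothetical blacklist; a real one would be far larger.
BLACKLIST = {"viagra", "casino"}


def fetch(url, user_agent, timeout=10):
    """Fetch a URL while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def words(html):
    """Crudely strip tags and return the page's lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(html_a, html_b):
    """Ratio in [0, 1] describing how alike the two word sequences are."""
    return difflib.SequenceMatcher(None, words(html_a), words(html_b)).ratio()


def blacklist_hits(html, blacklist=BLACKLIST):
    """Return the blacklisted words that appear on the page, sorted."""
    return sorted(set(words(html)) & set(blacklist))


def looks_cloaked(bot_html, browser_html, threshold=0.5):
    """Flag a page whose browser view differs a lot from the bot view,
    or whose browser view contains blacklisted words the bot never saw."""
    differs = similarity(bot_html, browser_html) < threshold
    new_spam = set(blacklist_hits(browser_html)) - set(blacklist_hits(bot_html))
    return differs or bool(new_spam)
```

To try it on a live page you would compare fetch(url, BOT_UA) against fetch(url, BROWSER_UA). Note that this only catches sites that key off the User-Agent header; anything that decides by IP address would sail straight through.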
I imagine that spammers could band together, or simply get botnets 'clicking', as independent IP addresses, on links that boost their page rank. That's how it worked with Bush: people simply linked to his homepage with the text "miserable failure", and suddenly he was the number one result for that query in Google.
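To make the "miserable failure" effect concrete, here is a toy model, not Google's ranking algorithm: it assumes every link counts as one vote for its target under the link's anchor text, and it makes no attempt to weight or deduplicate the linking sites. The function name and the example URLs are hypothetical.

```python
from collections import Counter, defaultdict


def rank_by_anchor_text(links):
    """links: iterable of (anchor_text, target_url) pairs.

    Returns a dict mapping each normalized anchor phrase to the list of
    target URLs, ordered from most-linked to least-linked.
    """
    votes = defaultdict(Counter)
    for anchor, target in links:
        votes[anchor.strip().lower()][target] += 1
    return {
        phrase: [url for url, _ in counts.most_common()]
        for phrase, counts in votes.items()
    }


# Three coordinated links outvote one unrelated link for the same phrase.
links = [
    ("miserable failure", "http://a.example/"),
    ("Miserable Failure", "http://a.example/"),
    ("miserable failure ", "http://a.example/"),
    ("miserable failure", "http://b.example/"),
]
ranking = rank_by_anchor_text(links)
```

In this model a botnet or a group of cooperating sites wins simply by producing more links than anyone else, which is the weakness the comment above is pointing at.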
I find this more likely an explanation than someone changing the data or values in the database. There's going to be plenty of evidence left in the logs & it's not like nobody's going to notice. This is Google's bread & butter, no amount of money in the world could entice a worker to mess with it. They would have to be exceptionally stupid as the lawsuits that follow would be in the billions.
My work here is dung.
I was pretty sure that Google already did some kind of checking for this sort of dodge. It could be that the sites in question have found some way to dodge the dodge -- maybe they figured out when a Google revisit (with a different user agent) would occur, or maybe they recognize Google IP addresses and always give the scammed page regardless of user agent, or some other similar trick.
That's what makes this scary -- as I said, I thought Google was already on the lookout for such scams, and if they're being beaten on such a large scale it might mean a major shift in Google's strategy is in order...
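A toy model of the IP-based trick speculated about above, to show why a recheck with a browser user agent would not help if it comes from a recognizable address. Everything here is an assumption for illustration: the function names are invented, and 192.0.2.0/24 is a documentation-only range (RFC 5737) standing in for whatever crawler ranges a site might have collected.

```python
import ipaddress

# Hypothetical crawler range; 192.0.2.0/24 is reserved for documentation.
KNOWN_CRAWLER_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def is_known_crawler_ip(addr):
    """True if the address falls inside any of the listed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in KNOWN_CRAWLER_NETS)


def page_served(addr, user_agent):
    """Toy IP-keyed cloaker: the user_agent argument is deliberately ignored."""
    if is_known_crawler_ip(addr):
        return "keyword-stuffed page"
    return "redirect to another domain"


# The same crawler address sees the same page under either user agent,
# so comparing the two responses finds no difference.
as_bot = page_served("192.0.2.10", "ExampleBot/1.0")
as_browser = page_served("192.0.2.10", "Mozilla/5.0")
as_visitor = page_served("198.51.100.7", "Mozilla/5.0")
```

If something like this is what the sites are doing, the recheck would presumably have to come from addresses the site cannot tie back to the crawler.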
Google does this already, perhaps not with spiders, or in the way you described. But they do seek out and destroy sites that are caught faking keyword densities and other SEO tactics on crawl pages vs human pages.
www.jmagar.com
The story would be more interesting if it included an example hijacked search phrase.
I'd like to check it out myself.
Yeah, I think "not hosted anywhere" is somewhat of a simplification for "actually hosted somewhere but never show any content to a normal user because they redirect you to another domain instead". While it might fly for a complete non-techy, I wouldn't have thought /. would have too many people believing in responses from machines that don't exist.
I think he needs to run AdAware. Seriously. I've entered a bunch of the usual suspects into Google trying to find these hordes of .cn sites that pop up. No joy yet. Has anyone else found one?
The old believe everything, the middle-aged suspect everything, the young know everything. - Oscar Wilde
Did you even read the link I posted? Nothing gets hidden from Google; it just tells Google to ignore links in the comments via rel="nofollow" (and thus removes any point in link spamming).
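For anyone unsure what that looks like from the crawler's side, here is a minimal sketch using Python's standard html.parser. It is only an illustration of sorting links by the rel attribute, not a description of how any real crawler is implemented; the class and function names and the example URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs, separating rel="nofollow" links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.ignored.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, ignored) lists of link targets found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.ignored


comment_html = (
    '<p><a href="http://example.com/post">the post</a> '
    '<a rel="nofollow" href="http://spam.example/">buy now</a></p>'
)
followed, ignored = split_links(comment_html)
```

The spam link is still on the page and still visible to readers; it just would not be counted as a vote for its target.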
and commented on by Dvorak. (God, did I just say that he confirmed anything!?!)
http://www.pcmag.com/article2/0,1895,2188281,00.asp
Also, the Reg noticed - after my Slashdot posting, for once - so they are chasing this tail!
http://www.theregister.co.uk/2007/10/01/google_spam_infiltration/
Wheee!
"Flyin' in just a sweet place,
Never been known to fail..."
They've had people working on their algorithms for quite some time now. I doubt it's in the state where it's something you can just give away all at once... or precisely target, for that matter. It's probably hundreds of thousands of lines of code by now, if not more. They should have systems in place to notify them when that much data is copied at once.
:)
Still waiting for them to allow weighting of search terms, though.