Spam Sites Infesting Google Search Results

Posted by CmdrTaco on Monday October 1, 2007 @01:16AM from the hate-when-that-happens dept.

The Google Watchdog blog is reporting "Spam and virus sites infesting the Google SERPs in several categories" and speculates, "...Google's own index has been hacked. The circumvention of a guideline normally picked up by the Googlebot quickly is worrisome. The fact that none of the sites have real content and don't appear to even be hosted anywhere is even more scary. How did millions of sites get indexed if they don't exist?"

Nothing New by mfh · 2007-10-01 01:21 · Score: 1, Interesting

For years Yahoo was infested with spammers on their front page, but the fact is -- Google is susceptible to an erosion of moral tenacity, just like any other corporation. Someone from within has given the keys to someone who has paid a lot of money to get them. This isn't a hack job... it's an inside job.

--
The dangers of knowledge trigger emotional distress in human beings.
Re:SEOs by glindsey · 2007-10-01 01:36 · Score: 4, Interesting

Which raises the question: Why not have GoogleBot do a check also as a normal user-agent (IE/Firefox/etc.) and see if the page is significantly different than when it identifies itself? At the very least GoogleBot could check if there are common blacklist words ("viagra" et al) on the website when identifying itself as IE or Firefox.
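
A minimal sketch of the check suggested above, in Python. It assumes the page is fetched twice, once with a crawler User-Agent and once with a browser one, and the two copies are then compared. The user-agent strings, the one-word blacklist, the 0.5 threshold, and every function name here are hypothetical choices for illustration, not anything Google is known to use.

```python
import difflib
import re
import urllib.request

# Hypothetical user-agent strings, for illustration only.
BOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0"

# Hypothetical blacklist; a real one would be much larger.
BLACKLIST = {"viagra"}


def fetch(url, user_agent, timeout=10):
    """Fetch a page while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def words(html):
    """Strip tags crudely and return the lowercase words of a page."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(html_a, html_b):
    """Ratio in [0, 1] of how alike the two pages' word sequences are."""
    return difflib.SequenceMatcher(None, words(html_a), words(html_b)).ratio()


def looks_cloaked(bot_html, browser_html, threshold=0.5):
    """Flag a page whose browser view differs a lot from the crawler view,
    or whose browser view has blacklisted words the crawler view lacks."""
    hidden_spam = (set(words(browser_html)) & BLACKLIST) - set(words(bot_html))
    return similarity(bot_html, browser_html) < threshold or bool(hidden_spam)
```

Usage would be along the lines of `looks_cloaked(fetch(url, BOT_UA), fetch(url, BROWSER_UA))`. As the replies below point out, a site that recognizes the crawler by IP address rather than by User-Agent would defeat this particular comparison.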
I Bet It's a Simpler Explanation by eldavojohn · 2007-10-01 01:37 · Score: 5, Interesting

"Google is susceptible to an erosion of moral tenacity, just like any other corporation."

This would be far more interesting, but the sad fact is that the simplest explanation is probably the right one: spammers are merely getting more sophisticated. I mean, a while ago a few people teamed up to Google bomb Bush as a "miserable failure" and it worked. They exploited Google's page ranking system. It's pretty easy to exploit because they patented it, so you merely need to read the patent. From there you get an idea of how to exploit it.

I imagine that spammers could band together, or simply get botnets 'clicking' from independent IP addresses on links that boost their page rank. That's how it worked with Bush: they simply linked his homepage as "miserable failure" and suddenly he was the number one result for that query in Google.

I find this more likely an explanation than someone changing the data or values in the database. There's going to be plenty of evidence left in the logs & it's not like nobody's going to notice. This is Google's bread & butter, no amount of money in the world could entice a worker to mess with it. They would have to be exceptionally stupid as the lawsuits that follow would be in the billions.

--
My work here is dung.
1. Re:I Bet It's a Simpler Explanation by nahdude812 · 2007-10-01 04:07 · Score: 3, Interesting
  
  Or Google Analytics.
  
  --
  Slay a dragon... over lunch!
Re:SEOs by dschuetz · 2007-10-01 01:41 · Score: 3, Interesting

I was pretty sure that Google already did some kind of checking for this sort of dodge. It could be that the sites in question have found some way to dodge the dodge -- maybe they figured out when a Google revisit (with a different user agent) would occur, or maybe they recognize Google IP addresses and always give the scammed page regardless of user agent, or some other similar trick.

That's what makes this scary -- as I said, I thought Google was already on the lookout for such scams, and if they're being beaten on such a large scale it might mean a major shift in Google's strategy is in order...
Re:SEOs by jmagar.com · 2007-10-01 01:42 · Score: 4, Interesting

Google does this already, perhaps not with spiders, or in the way you described. But they do seek out and destroy sites that are caught faking keyword densities and other SEO tactics on crawl pages vs human pages.

--
www.jmagar.com
specific phrases? by rubberglove · 2007-10-01 01:43 · Score: 5, Interesting

The story would be more interesting if it included an example hijacked search phrase.
I'd like to check it out myself.
Re:Not hosted anywhere? by IBBoard · 2007-10-01 01:43 · Score: 4, Interesting

Yeah, I think "not hosted anywhere" is somewhat of a simplification for "actually hosted somewhere but never show any content to a normal user because they redirect you to another domain instead". While it might fly for a complete non-techy, I wouldn't have thought /. would have too many people believing in responses from machines that don't exist.
Sure it's not his browser that's porked? by AskChopper · 2007-10-01 02:04 · Score: 2, Interesting

I think he needs to run AdAware. Seriously.. I've entered a bunch of the usual suspects into google trying to find these hordes of .cn sites that pop up. No joy yet.. Anyone else found one?

--
The old believe everything, the middle-aged suspect everything, the young know everything. - Oscar Wilde
1. Re:Sure it's not his browser that's porked? by Anonymous Coward · 2007-10-01 02:19 · Score: 1, Interesting
  
  I just randomly found some.
  
  Search for "vnc pips e61" without the quotes and check page 7. There are some in other pages, but that one has the most.
Re:Search Engine Pessimisation by Anonymous Coward · 2007-10-01 08:25 · Score: 1, Interesting

Did you even read the link I posted? Nothing gets hidden from Google, it just tells Google to ignore links in the comments (rel="nofollow"), and thus removes any point in link spamming.
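
To make the rel="nofollow" point concrete, here is a rough Python sketch of how a crawler might skip such links when gathering the ones it gives credit to. The class and function names are made up for illustration, and this is only an assumption about how the attribute could be honored, not a description of Googlebot itself.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collect href values, skipping anchors marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            return  # link stays visible to readers but earns no credit
        if attr.get("href"):
            self.links.append(attr["href"])


def followable_links(html):
    """Return the hrefs a nofollow-respecting crawler would count."""
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    return parser.links
```

The comment text itself is still parsed and indexed; only the link is left out, which is why nothing is "hidden" from the crawler.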
This Finding was Validated by Jeremiah+Cornelius · 2007-10-01 13:24 · Score: 2, Interesting

and commented on by Dvorak. (God, did I just say that he confirmed anything!?!)
http://www.pcmag.com/article2/0,1895,2188281,00.asp

Also, the Reg noticed - after my Slashdot posting, for once - so they are chasing this tail!
http://www.theregister.co.uk/2007/10/01/google_spam_infiltration/

Wheee!

--
"Flyin' in just a sweet place,
Never been known to fail..."
Re:I call Bullshit!!! by Metasquares · 2007-10-01 16:32 · Score: 2, Interesting

They've had people working on their algorithms for quite some time now. I doubt it's in the state where it's something you can just give away all at once... or precisely target, for that matter. It's probably hundreds of thousands of lines of code by now, if not more. They should have systems in place to notify them when that much data is copied at once.

Still waiting for them to allow weighting of search terms, though :)