Slashdot Mirror


Spam Sites Infesting Google Search Results

The Google Watchdog blog is reporting that "Spam and virus sites infesting the Google SERPs in several categories" and speculates, "...Google's own index has been hacked. The circumvention of a guideline normally picked up by the Googlebot quickly is worrisome. The fact that none of the sites have real content and don't appear to even be hosted anywhere is even more scary. How did millions of sites get indexed if they don't exist?"

42 of 207 comments

  1. It's the Rand Corporation by OptimusPaul · · Score: 3, Funny

    in conjunction with the saucer people under the supervision of the reverse vampires are forcing our parents to go to bed early in a fiendish plot to eliminate the meal of dinner. We're through the looking glass, here, people...

  2. Google index hacked? by InvisblePinkUnicorn · · Score: 5, Funny

    Hacking of Google databases might explain why Google Translator used to translate the Russian name for "Ivan the Terrible" as "Abraham Lincoln".

  3. SEOs by Chilled_Fuser · · Score: 5, Informative


    They're using one page of information for Google's spider and then a redirect for non-spider users. It's an SEO tactic.

    1. Re:SEOs by glindsey · · Score: 4, Interesting

      Which raises the question: Why not have GoogleBot do a check also as a normal user-agent (IE/Firefox/etc.) and see if the page is significantly different than when it identifies itself? At the very least GoogleBot could check if there are common blacklist words ("viagra" et al) on the website when identifying itself as IE or Firefox.
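
      A minimal sketch of that comparison, assuming the crawler has already fetched the same URL twice (once identifying as a bot, once as a browser). The function name `looks_cloaked` and the 0.5 threshold are hypothetical, chosen for illustration; nothing here reflects how Google actually does it.

```python
from difflib import SequenceMatcher


def looks_cloaked(body_as_bot: str, body_as_browser: str, threshold: float = 0.5) -> bool:
    """Return True when the two responses are less similar than `threshold`.

    Both arguments are page bodies fetched from the same URL with
    different User-Agent headers. The threshold is an arbitrary assumption.
    """
    # Compare word sequences rather than raw characters so that small
    # differences (a timestamp, a rotating ad) do not dominate the score.
    matcher = SequenceMatcher(
        None, body_as_bot.split(), body_as_browser.split(), autojunk=False
    )
    return matcher.ratio() < threshold
```

      A real check would probably need to strip markup and follow redirects before comparing, which this sketch leaves out.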

    2. Re:SEOs by dschuetz · · Score: 3, Interesting

      I was pretty sure that Google already did some kind of checking for this sort of dodge. It could be that the sites in question have found some way to dodge the dodge -- maybe they figured out when a google revisit (with a different user agent) would occur, or maybe they recognize google IP addresses and always give the scammed page regardless of user agent, or some other similar trick.

      That's what makes this scary -- as I said, I thought google was already on the lookout for such scams, and if they're being beat on such a large scale it might mean a major shift in google's strategy is in order...

    3. Re:SEOs by jmagar.com · · Score: 4, Interesting

      Google does this already, perhaps not with spiders, or in the way you described. But they do seek out and destroy sites that are caught faking keyword densities and other SEO tactics on crawl pages vs human pages.

    4. Re:SEOs by Tim+C · · Score: 5, Insightful

      At the very least GoogleBot could check if there are common blacklist words ("viagra" et al) on the website when identifying itself as IE or Firefox.

      So medical supply or information websites shouldn't be indexed by Google?

      I know what you're trying to do, but no word is 100% inappropriate. What if someone is actually looking for information on Viagra, or replica Swiss watches, or cheap stocks? What if someone is looking for information on spam?

      Check for significant differences in content with different user-agents yes, but banned words? That really doesn't seem like a good idea to me.

    5. Re:SEOs by suv4x4 · · Score: 4, Insightful

      Which raises the question: Why not have GoogleBot do a check also as a normal user-agent (IE/Firefox/etc.) and see if the page is significantly different than when it identifies itself? At the very least GoogleBot could check if there are common blacklist words ("viagra" et al) on the website when identifying itself as IE or Firefox.

      It does. It also detects landing pages mentioned above. Apparently it's something more subtle than what one could think of in few mins on Slashdot, and we'll learn soon enough.

    6. Re:SEOs by Billosaur · · Score: 4, Informative

      It's more likely related to IP address than user agent. I used to work in web site metrics, and the number of fouled up user agents and spoofs was always staggering, but IP was a pretty good indicator of who was doing something. No doubt the bad guys have tracked the Google bot's IP over a long period of time and perhaps made some correlations to give them a pretty good idea if the site is being revisited by Google under an assumed user agent. I'm not sure, but it would seem to me that Google would have thought of spoofing its IPs long ago, to avoid people being able to track them, though I can't say how you'd go about that.

      --
      GetOuttaMySpace - The Anti-Social Network
    7. Re:SEOs by colourmyeyes · · Score: 5, Funny

      "Apparently it's something more subtle than what one could think of in few mins on Slashdot"

      Blasphemy! In my relatively short time lurking on Slashdot, I've seen nearly all the world's problems, including hideously complicated questions of physics, SOLVED in posts no more than a few paragraphs long.

      It's amazing, really.
      --
      My grandmother used anecdotal evidence all the time, and she lived to be 120 years old.
    8. Re:SEOs by glindsey · · Score: 3, Insightful

      "What if someone is actually looking for information on Viagra, or replica Swiss watches, or cheap stocks? What if someone is looking for information on spam?"

      That's a good point. But perhaps combinations of keywords would work -- it's pretty unlikely that you'd see "viagra" and "mortgage" on the same site, for example. If you partner this with checking for significant user-agent differences it could become a pretty good tool, I think.
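
      The combined heuristic could be sketched roughly as follows. The term list, the `min_terms` cutoff and the function names are all made up for illustration; the cloaking signal is assumed to come from a separate user-agent comparison.

```python
import re

# Hypothetical list of terms that rarely belong together on one legitimate site.
SUSPICIOUS_TERMS = {"viagra", "mortgage", "replica", "casino"}


def suspicious_term_count(text: str) -> int:
    """Count how many distinct suspicious terms appear in the page text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & SUSPICIOUS_TERMS)


def flag_page(text: str, cloaking_suspected: bool, min_terms: int = 2) -> bool:
    """Flag a page only when both signals agree.

    A single term (e.g. a medical site mentioning Viagra) is not enough,
    and neither is a term combination without a user-agent difference.
    """
    return cloaking_suspected and suspicious_term_count(text) >= min_terms
```

      Requiring both signals is what keeps a legitimate single-topic site from being flagged on one word alone.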
  4. Google hacked, sites don't exist, um ... by icepick72 · · Score: 3, Insightful
    Submitter says Google's index has been hacked, which could imply the severe case (a direct security breach of the index itself) or, more likely, someone managing to get Google to index something it would not want to index.

    Submitter asks: How did millions of sites get indexed if they don't exist?

    Okay, I call this an idiot story. Millions of sites come into being and go out of being all the time. What does this statement have to do with anything? It seems like submitter has a lack of understanding how basic Google and the web work, but the story has made it to Slashdot. I think the Slashdot IQ level is dropping because this is a Digg story.

    1. Re:Google hacked, sites don't exist, um ... by Clandestine_Blaze · · Score: 3, Informative

      "Millions of sites come into being and go out of being all the time. What does this statement have to do with anything? It seems like submitter has a lack of understanding how basic Google and the web work, but the story has made it to Slashdot."

      If you had bothered reading the article, you would have seen:

      • The .cn sites don't appear to be hosted ANYWHERE. They are simply redirected domain names. How they got ranked in Google in such a short period of time for fairly competitive keywords is a mystery. Google's index even shows legitimate content for the .cn sites.
      • It appears that the faked sites are redirecting the Googlebot to a location where content can be indexed, while at the same time recognizing normal users and redirecting them to a site that includes the malware mentioned earlier. This is an obvious violation of Google's guidelines, but the spammers have found ways to circumvent the rule and hide it from the Googlebot.
      Yes, millions of sites do come into being all the time. Had Google indexed a site, and had said site disappeared before the index was updated, you would simply either hit a landing page (if that domain was purchased but not set up) or you would get an error message.

      The submitter was referring to cases where a fake redirector is set up to trick the Googlebot, sending it to pages with content and keywords while sending normal users to malware-infested sites. This is a completely different situation from "Millions of sites come into being and go out of being all the time." In this case, those sites are still there and are appearing pretty high up in the index, while redirecting unsuspecting users to other websites. They exist in the physical sense, but that's about it.

      "I think the Slashdot IQ level is dropping because this is a Digg story."

      Or because the readers simply don't bother to read the articles they comment on any more.
  5. Not hosted anywhere? by Vicegrip · · Score: 2, Informative

    The article makes the claim that the "hijacked keywords" are going to redirection websites that do not "appear to be hosted anywhere".

    That seems a little incredible to me. :)

    Invisible, IPless, Chinese web-servers are taking over Google! Personally, I'll just let Google worry about trying to protect its search engines. :)

    --
    Do not spread "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0" over the internet, thank you.
    1. Re:Not hosted anywhere? by IBBoard · · Score: 4, Interesting

      Yeah, I think "not hosted anywhere" is somewhat of a simplification for "actually hosted somewhere but never show any content to a normal user because they redirect you to another domain instead". While it might fly for a complete non-techy, I wouldn't have thought /. would have too many people believing in responses from machines that don't exist.

    2. Re:Not hosted anywhere? by TheRaven64 · · Score: 4, Funny

      Those of us on Internet 3.0, Quantum Edition, have this problem all the time. Quoogle indexes sites without collapsing their wave functions. When you click on a link, the waveform collapses and the server may or may not exist. Web spiders are therefore being replaced by cats.

      --
      I am TheRaven on Soylent News
  6. I call Bullshit!!! by Jennifer+York · · Score: 4, Insightful
    Any evidence to back that up? I seriously doubt that a single individual has the ability to make a change on production boxes without a committee of senior managers approving the change.

    Google will adjust, find the method of manipulating the page ranks, and close the hole.

    1. Re:I call Bullshit!!! by Billosaur · · Score: 5, Insightful

      It may not be a question of a single developer making changes, as much as a single developer (or group of them -- safety in numbers) divulging to certain third parties how the algorithms work in the page ranking system. It's very rare any company gives anyone production access to make changes, but then again I've seen that happen too, where something breaks, they give a developer access to patch it in a hurry before the hue and cry set in, then forget to revoke his/her access. Of course Google is global, so any change would have to propagate through the system via source control, so tracking it wouldn't be that hard. I doubt any developer, no matter how nefarious, would take the risk.

      --
      GetOuttaMySpace - The Anti-Social Network
    2. Re:I call Bullshit!!! by zymano · · Score: 2, Insightful

      No it's not. Whenever you ask just a computer program to weed out spam, it will always be outwitted by average human intelligence.

      There are websites strictly devoted to google ranking.

      Let me add this about Google. The google corporation really isn't 100% innovative. Their search uses common links to rank. This has led to evolution of the spammers. They load their pages with links to spam. So my point to slashdot is......

      If Google is so damn loaded with money, and their search tech relies on common user links, why not pay people/moderators for 'quality' links to information?

    3. Re:I call Bullshit!!! by Metasquares · · Score: 2, Interesting

      They've had people working on their algorithms for quite some time now. I doubt it's in the state where it's something you can just give away all at once... or precisely target, for that matter. It's probably hundreds of thousands of lines of code by now, if not more. They should have systems in place to notify them when that much data is copied at once.

      Still waiting for them to allow weighting of search terms, though :)

  7. I Bet It's a Simpler Explanation by eldavojohn · · Score: 5, Interesting

    Google is susceptible to an erosion of moral tenacity, just like any other corporation. This would be far more interesting but the sad fact is that it's probably the simplest explanation: spammers are merely more sophisticated. I mean, a while ago a few people teamed up to Google bomb Bush as a "miserable failure" and it worked. They exploited Google's page ranking system. It's pretty easy to exploit because they patented it so you merely need to read the patent. From there you get an idea of how to exploit it.

    I imagine that spammers could band together or simply get botnets 'clicking' as independent IP addresses links that boost their page rank. That's how it worked with Bush, they simply linked his homepage as "miserable failure" and suddenly he was the number one result from that query in Google.

    I find this more likely an explanation than someone changing the data or values in the database. There's going to be plenty of evidence left in the logs & it's not like nobody's going to notice. This is Google's bread & butter, no amount of money in the world could entice a worker to mess with it. They would have to be exceptionally stupid as the lawsuits that follow would be in the billions.
    --
    My work here is dung.
    1. Re:I Bet It's a Simpler Explanation by suv4x4 · · Score: 2, Informative


      I imagine that spammers could band together or simply get botnets 'clicking' as independent IP addresses links that boost their page rank. That's how it worked with Bush, they simply linked his homepage as "miserable failure" and suddenly he was the number one result from that query in Google.


      I like your post, but Google can't detect if you "click" a link. It doesn't need botnets to click links from different IP addresses.

      It just needs the mere *presence* of those links, with the same text, to the same page. Also, the servers hosting those sites should have different IPs.

      The miserable failure bomb was simply a bunch of bloggers posting a link on their blogs. When GoogleBot came around and found the links, the attack was accomplished.
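
      As a rough illustration of why no clicks are needed, the sketch below counts, for each (anchor text, target) pair, how many distinct hosts link to it. Hostnames stand in for the hosting IPs mentioned above, and the function name and input format are assumptions; this is not Google's actual ranking algorithm.

```python
from collections import defaultdict
from urllib.parse import urlparse


def anchor_votes(links):
    """Count distinct linking hosts per (anchor text, target URL) pair.

    `links` is an iterable of (source_url, anchor_text, target_url) tuples,
    as a crawler might collect them. Many links from one host count once.
    """
    hosts_by_key = defaultdict(set)
    for source_url, anchor_text, target_url in links:
        key = (anchor_text.strip().lower(), target_url)
        # Hostname is used here as a simple proxy for "different server".
        hosts_by_key[key].add(urlparse(source_url).hostname)
    return {key: len(hosts) for key, hosts in hosts_by_key.items()}
```

      Everything this needs is found by crawling the linking pages, so whether anyone ever clicks the links makes no difference to the count.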

    2. Re:I Bet It's a Simpler Explanation by suv4x4 · · Score: 2, Informative

      We're not talking about the results page, but just links. In sites separate from Google.

    3. Re:I Bet It's a Simpler Explanation by Arthur+B. · · Score: 2, Insightful

      Unless the sites happen to have google ads...

      --
      \u262D = \u5350
    4. Re:I Bet It's a Simpler Explanation by nahdude812 · · Score: 3, Interesting

      Or Google Analytics.

  8. specific phrases? by rubberglove · · Score: 5, Interesting

    The story would be more interesting if it included an example hijacked search phrase.
    I'd like to check it out myself.

    1. Re:specific phrases? by wbean · · Score: 2, Informative

      There's a sample search phrase posted in the comments to the original blog entry. It produced a lot of funny .cn results for me. Here it is:

      Bayesian networks and decision graphs Finn rapidshare

  9. Wait and see. by eniac42 · · Score: 5, Insightful

    People, it's just a blog. If someone has really hacked Google, we will hear soon enough. Otherwise scamming and spoofing the ratings with rubbish sites is a sport that's been going on a long, long time.

    --
    "A nation that forgets its past is doomed to repeat it." - Churchill
    1. Re:Wait and see. by tbannist · · Score: 4, Insightful

      Actually, it's worse than that. It's a blog that can't provide any actual evidence that anything they claim is true. As far as we know, the entire story is bogus because the blogger has provided nothing to prove that any of his claims are true.

      --
      Fanatically anti-fanatical
  10. Nutcase conspiracy theory adopters web2.0 version by georgeb · · Score: 2, Insightful

    Quotes:

    "Some searches (very specific phrases, and I won't list any of them right now - Google knows which they are) return results with a large number of .cn (Chinese) sites."

    "The .cn sites don't appear to be hosted ANYWHERE." (wow!)

    "[...] the Word-Confirm on all of their sites, including the one I will have to use to post this, generate a large number of rogue responses, and the HELPDESK facilities with thousands of consoles and employees each all over the planet watch the responses and other traffic characteristics [...]"

    How the HECK did _this_ get on /.? It's a new low, I swear.

  11. Sure it's not his browser that's porked? by AskChopper · · Score: 2, Interesting

    I think he needs to run AdAware. Seriously.. I've entered a bunch of the usual suspects into google trying to find these hordes of .cn sites that pop up. No joy yet.. Anyone else found one?

    --
    The old believe everything, the middle-aged suspect everything, the young know everything. - Oscar Wilde
  12. Google is working on this ... by miller60 · · Score: 3, Informative

    Back in May Google launched an online security blog as part of a broader effort to detect malware sites, presumably to exclude them from the SERP results. They're clearly behind the curve. But this post offers an overview of Google's efforts and ambitions in this area.

  13. Simple way to eliminate pharmaceutical spam by Alzheimers · · Score: 2, Funny

    Free universal health care

  14. Where do all the calculators go when they die? by Scrameustache · · Score: 3, Funny

    "I wouldn't have thought /. would have too many people believing in responses from machines that don't exist."

    We're getting phantom pings from the ghosts of the still-smoldering servers we slashdotted in our folly!
    I'm scared...
    --
    You can't take the sky from me...

  15. What hijacked phrases? Not seeing this. by Animats · · Score: 4, Informative

    I'm not seeing any of this. I'm trying commonly spammed phrases in Google, and seeing nothing unusual.

    • "digital camera" - OK
    • "ink cartridge" - OK
    • "flat screen TV" - PCworld at the top
    • "auto parts" - OK
    • "london hotels" - usual results
    • "britney spears" - usual results
    • "viagra" - Pfizer, Wikipedia, etc.
    • "rebelde" (the Mexican telenovela, one of the top ten searches) - normal
    Not one .cn site in the top 10 for any of these.
  16. Re:Calling Bullshit along with this one :Nothing N by onepoint · · Score: 2, Insightful

    Well, for those of us who deal with Google as our livelihood (I currently run PPC campaigns and do some SEO work on my web sites), this was a problem.

    I spent the better part of an afternoon about 2 weeks ago submitting my searches to Google, asking them to look at these sites.

    They were under my keyword group and it was driving me nuts.

    --
    if you see me, smile and say hello.
  17. google-analytics.com by TFGeditor · · Score: 2, Insightful

    Has anyone ever looked into how google-analytics.com (formerly Urchin) works? This blogger http://labnol.blogspot.com/2005/11/prevent-google-analytics-from-tracking.html gives a bit of info--and it does not appear to comply with the Google "don't be evil" mantra.

    --
    Ignorance is curable, stupid is forever.
    1. Re:google-analytics.com by nicolastheadept · · Score: 2, Insightful

      And how is a webmaster simply monitoring what people click on evil? Just because you may be paranoid, it doesn't make Google evil.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
  18. Search Engine Pessimisation by ajs318 · · Score: 2, Insightful

    Worse, I think, is the act of spamming blogs with links. The theory is that, the more links there are pointing to a website, the more popular it must be; so, by using commonly-available, spam-advertised commercial software to pollute blogs with links unrelated to the subject matter, webmasters imagine they can improve their ranking without paying baksheesh to the search engine companies.

    I have had an idea for a hack to WordPress, which will make all links invisible to GoogleBot (and maybe the other search engines too). This should make it pointless for anybody to spam blogs with links to their site, since the links won't be picked up by search engines. In a nod to Mel, I call this "Search Engine Pessimisation".
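
    One existing mechanism in this spirit is the `rel="nofollow"` link attribute, which asks search engines not to count a link. The sketch below adds it to anchors in a comment's HTML. It is written in Python rather than as an actual WordPress (PHP) plugin, the function name is hypothetical, and a regex is only a rough stand-in for a real HTML parser; it is not necessarily what the poster had in mind.

```python
import re

# Matches an opening <a ...> tag and captures its attribute string.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_HAS_REL = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""

    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        if _HAS_REL.search(attrs):
            # Leave tags with an existing rel attribute untouched.
            return match.group(0)
        return f'<a rel="nofollow"{attrs}>'

    return _A_TAG.sub(fix, html)
```

    Unlike hiding links from the crawler outright, this leaves the page identical for bots and human readers.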

    --
    Je fume. Tu fumes. Nous fûmes!
  19. What is up with images? They being abused too? by hurfy · · Score: 2

    I just did an image search and forgot a space. I got a lot of bizarre results, a large number of odd ones come from .hu

    I searched on Opel Manta but forgot the space. With the space I got many matches and very little junk in the first 10 pages. Without the space I got weird results starting on the first page. What does a car name have to do with a naked chick with a Nokia phone? Mud wrestlers? Homer Simpson? Paris Hilton? Dozens and dozens of unrelated pictures it seems.

    Spyware is off ATM so I didn't get any farther than that.

    1. Re:What is up with images? They being abused too? by Daniel+Spiewak · · Score: 2, Funny

      "What does a car name have to do with a naked chick with a Nokia phone? Mud wrestlers? Homer Simpson? Paris Hilton?"

      ...as the number of people searching for "Open Manta" unexpectedly jumps by a factor of 8000x.
  20. This Finding was Validated by Jeremiah+Cornelius · · Score: 2, Interesting

    and commented on by Dvorak. (God, did I just say that he confirmed anything!?!)
    http://www.pcmag.com/article2/0,1895,2188281,00.asp

    Also, the Reg noticed - after my Slashdot posting, for once - so they are chasing this tail!
    http://www.theregister.co.uk/2007/10/01/google_spam_infiltration/

    Wheee!

    --
    "Flyin' in just a sweet place,
    Never been known to fail..."