Spam Sites Infesting Google Search Results
The Google Watchdog blog is reporting that "Spam and virus sites infesting the Google SERPs in several categories" and speculates, "...Google's own index has been hacked. The circumvention of a guideline normally picked up by the Googlebot quickly is worrisome. The fact that none of the sites have real content and don't appear to even be hosted anywhere is even more scary. How did millions of sites get indexed if they don't exist?"
The reason they don't appear to have content is probably that the sites respond differently to requests from Google's crawler than to requests from users. It would seem that they recognize Google's crawler, either from the user agent or from the IP range, and only then respond with content. It seems they get that content by proxying US sites. I don't think that is anything new; it's just being done on a larger scale.
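To make the "user agent or IP range" check concrete, here is a rough sketch of how a server might classify an incoming request. Everything here is an assumption for illustration: the function name `looks_like_crawler` is made up, and the network in `ASSUMED_CRAWLER_NETS` is a documentation-only range (RFC 5737) standing in for whatever addresses such a site would really track.

```python
import ipaddress

# Hypothetical placeholder: a documentation-reserved range, NOT a real
# crawler range. A real list would have to come from somewhere else.
ASSUMED_CRAWLER_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def looks_like_crawler(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request appears to come from a search crawler."""
    # Check 1: the User-Agent header names the crawler.
    if "googlebot" in user_agent.lower():
        return True
    # Check 2: the source address falls inside a tracked network.
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in ASSUMED_CRAWLER_NETS)
```

A request that passes either check would get the proxied content; anything else would get whatever the site serves to ordinary visitors.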
When they serve the proxied content to Google, they could rewrite links on the fly to point to their own domains. They could basically appear to mirror the whole internet. When a request comes in from a user, since it doesn't carry a Google user agent, they would just send it to their trojan-infested site.
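Rewriting links on the fly is not much work. As a minimal sketch (my guess at the general idea, not anything taken from these sites), the function below swaps the host of every absolute `href` for a mirror host while keeping the path and query. The name `rewrite_links` and the host `mirror.example` are made up, and a regex is used only to keep the example short; a real proxy would more likely use an HTML parser.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def rewrite_links(html: str, mirror_host: str) -> str:
    """Point every absolute http(s) href at mirror_host, keeping path and query."""

    def swap(match: re.Match) -> str:
        parts = urlsplit(match.group(2))
        # Rebuild the URL with only the host replaced.
        new_url = urlunsplit(
            (parts.scheme, mirror_host, parts.path, parts.query, parts.fragment)
        )
        return match.group(1) + new_url + match.group(3)

    # Only double-quoted absolute links are touched; relative links are left alone.
    return re.sub(r'(href=")(https?://[^"]+)(")', swap, html)


page = '<a href="http://example.com/a?b=1">x</a> <a href="/local">y</a>'
print(rewrite_links(page, "mirror.example"))
```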
The sites could show one version of a page to Googlebot and another to normal visitors. Google has to test with a different user-agent string, and if the contents differ, it just has to junk the whole domain. I am sure they already do.
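The "fetch twice and compare" test could look roughly like this. It is only a sketch under assumptions: `is_cloaking` and `http_fetch` are names I made up, the browser user-agent string is arbitrary, and I have no idea what Google actually runs. The fetcher is passed in as a parameter so the comparison can be tried without touching the network.

```python
import hashlib
import urllib.request
from typing import Callable

CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # arbitrary browser-like string


def http_fetch(url: str, user_agent: str) -> bytes:
    """Fetch a URL with the given User-Agent header and return the raw body."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def is_cloaking(url: str, fetch: Callable[[str, str], bytes] = http_fetch) -> bool:
    """Return True if the body served to a crawler UA differs from the browser one."""
    crawler_body = fetch(url, CRAWLER_UA)
    browser_body = fetch(url, BROWSER_UA)
    # Comparing digests keeps the check cheap for large pages.
    return hashlib.sha256(crawler_body).digest() != hashlib.sha256(browser_body).digest()
```

Two caveats: an exact comparison will also flag honest pages that change on every load (ads, timestamps), so a real check would need something fuzzier, and changing only the user agent won't catch a site that keys on the crawler's IP range instead.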
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact