Google Warns About Search-Spammer Site Hacking
Al writes "The head of Google's Web-spam-fighting team, Matt Cutts, warned last week that spammers are hacking more and more poorly secured websites in order to 'game' search-engine results. At a conference on information retrieval, held in Boston, Cutts also discussed how Google deals with the growing problem of search spam. 'I've talked to some spammers who have large databases of websites with security holes,' Cutts said. 'You definitely see more Web pages getting linked from hacked sites these days. The trend has been going on for at least a year or so, and I do believe we'll see more of this [...] As operating systems become more secure and users become savvier in protecting their home machines, I would expect the hacking to shift to poorly secured Web servers.' Garth Bruen, creator of the Knujon software that keeps track of reported search spam, added that some campaigns involve creating up to 10,000 unique domain names."
The link to the original source at the bottom of the article just goes back to the top of the same article.
I don't know about you, but something else that REALLY annoys me is pages that contain lists of words just so they come up on many searches... with no actual content. Or sites like "Buy *search term* at low prices" and they don't even sell what you're looking for. What's being done about those?
I found this pretty interesting: "Authentication [across the Web] would be really nice," says Tunkelang. "The anonymity of the Internet, as valuable as it is, is also the source of many of these ills." Having to register an e-mail before you can comment on a blog is a step in this direction, he says, as is Twitter's recent addition of a "verified" label next to profiles it has authenticated.
The idea of universal authentication has been tossed around for a while. I feel like the biggest drawback is privacy (we'd have to trust some universal authentication system to hold onto some identifier even if posting anonymously) and the biggest obstacle is the need for universal participation. It's kind of too late to make an opt-in system. But I've liked the idea ever since early sci-fi interwebs (read: Ender's Game) had SOME kind of authentication.
Anyone who frequently uses google knows this already. Plug in any kind of search and you're bound to get a slew of crap results along the lines of:
Download [term] full version
Torrent [term] keygen
Torrent [term] latest version
Torrent [term] hacked no-cd
You'll get those even when searching for books.
Or perhaps he meant it's only been popular in the last year or so. I've seen this going on for the last three years at the least.
If your website's front page has a PageRank score of 3/10 or higher it is a prime candidate for hijacking. Google gives extra clout (aka "link juice") to hyperlinks from sites with a high PageRank, so it's easier for a malicious party to hijack a small number of high-ranking sites than a large number of low-ranking sites. The higher your PageRank, the greater your risk.
I am assuming you can produce a list of candidate sites that may be benefiting from this by tracking sudden rapid growth in inbound links. From there you should be able to come up with an algorithm that compares what the beneficiary site is about with what the linking sites are about. I would assume the hacked sites will have a random distribution of topics and sources, or a highly clustered distribution if a certain type of site is most often hacked. Either way, the distribution should be markedly different from that of a typical site.
NB: I am not very familiar with search engine algorithms so there is sure to be room for +5 comments whether you explain why this can work or can't work.
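For what it's worth, here is a rough sketch of the heuristic described above: flag a site when its inbound links spike and the topics of the linking sites look nothing like a "typical" profile. Everything here is an assumption made for illustration: the function names, the data shapes (weekly link counts, one topic label per linking site), the thresholds, and the use of total variation distance to compare distributions. It is not how any search engine is known to do it.

```javascript
// Hypothetical helpers; names, thresholds, and data shapes are illustrative assumptions.

// Flag a sudden jump: the latest count is at least `factor` times the mean of earlier counts.
function hasLinkSpike(weeklyInboundLinks, factor = 5) {
  if (weeklyInboundLinks.length < 2) return false;
  const earlier = weeklyInboundLinks.slice(0, -1);
  const latest = weeklyInboundLinks[weeklyInboundLinks.length - 1];
  const mean = earlier.reduce((sum, n) => sum + n, 0) / earlier.length;
  return latest >= factor * Math.max(mean, 1);
}

// Turn a list of topic labels into a Map of topic -> share of the total.
function topicShares(topics) {
  const shares = new Map();
  for (const t of topics) {
    shares.set(t, (shares.get(t) || 0) + 1 / topics.length);
  }
  return shares;
}

// Total variation distance between two topic distributions:
// 0 means identical, 1 means no topics in common.
function topicDistance(a, b) {
  const keys = new Set([...a.keys(), ...b.keys()]);
  let sum = 0;
  for (const k of keys) {
    sum += Math.abs((a.get(k) || 0) - (b.get(k) || 0));
  }
  return sum / 2;
}

// A site is suspicious if its links spiked AND its linkers look atypical.
function looksSuspicious(site, typicalShares, maxDistance = 0.5) {
  return hasLinkSpike(site.weeklyInboundLinks) &&
    topicDistance(topicShares(site.linkerTopics), typicalShares) > maxDistance;
}

// Made-up example data, not real measurements.
const typical = new Map([["finance", 0.8], ["news", 0.2]]);
const candidate = {
  weeklyInboundLinks: [10, 12, 11, 200],
  linkerTopics: ["church", "school", "knitting", "cars"],
};
console.log(looksSuspicious(candidate, typical));
```

The hard parts in practice would be getting the topic labels and a trustworthy baseline in the first place; the comparison itself is the easy bit.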
If you look at the discussion for almost any stock, it is all stock scam spam. Given that Google catches most of my email spam and newsgroups are pretty clean, this is a bit surprising.
"and users become savvier in protecting their home machines"
And when pigs fly...
The funniest part of this is that Google itself seems to fund them and has the ability to stop these MFA (made-for-AdSense) sites and link fraud sites -- this is a connected issue -- but for some (very obvious) reason keeps it quiet.
Google can't solve this problem because their business model requires web spam.
Google is in the advertising business, not the search business. Search is a traffic builder for the ads. Google's customers are their advertisers, not their search users. They have to maximize ad revenue. The problem is that more than a third of Google's advertisers are web spammers, broadly defined. All those "landing pages", typosquatters, spam blogs, and similar junk full of Google ads are revenue generators for Google. Every time someone clicks on an AdWords ad, Google makes money, no matter what slimeball is running the ad. Google can't crack down too hard, or their revenue will drop substantially. Google does have some standards, but they're low.
Google went over to the dark side around 2006. In 2004 and 2005, Google sponsored the Web Spam Summit, devoted to killing off web spammers. From 2006, Google sponsored the Search Engine Strategies conference, where the "search engine optimization" people meet. That was a big switch in direction, and a sad one.
As we demonstrate with SiteTruth, it's not that hard to get rid of most web spam if you're willing to be a hardass about requiring a legit business behind each commercial web site. Google can't afford to do that. It would hurt their bottom line.
However, cleaning up web search results with browser plug-ins is a viable option. Stay tuned.
I saw this in the wild a few weeks ago. I had a google email alert running for my bank, which pointed me to a page which was blog-like but when you looked closer it was completely auto-generated gibberish. They had built the whole thing based on a list of banks and insurance companies. As it was under envsci.rutgers.edu I guessed they had been compromised.
I reported it to the webmaster and I see that it is gone (both from Google's index and the server). Not a word of thanks though. How long does that take...
Maybe someone here will give me a medal instead?
There's a Greasemonkey script for that:
// ==UserScript==
// @name        No Experts Exchange
// @namespace   userscripts.org
// @description Hide Experts-Exchange.com Results From Google
// @version     0.1
// @include     http://google.com/search?*
// @include     http://www.google.com/search?*
// @include     http://*.google.com/search?*
// ==/UserScript==
var url = document.URL;
// Match the q= parameter itself, not aq= or oq=, and skip URLs that have no query.
var match = url.match(/([?&]q=)([^&#]*)/);
// Only rewrite once: do nothing if the exclusion is already in the URL.
if (match && url.indexOf('-site%3Aexperts-exchange.com') === -1) {
    var newQuery = match[2] + '+-site%3Aexperts-exchange.com';
    // Reload the search with the exclusion appended to the query.
    window.location.replace(url.replace(match[0], match[1] + newQuery));
}
This is particularly bad at the .edu domains. It is shocking and inexplicable that the IT departments at these universities don't know what's going on with their own servers and in their own zone files. There are literally thousands of hijacked subdomains under valid .edu domains. How can the network administrators not know what's going on? Don't they check their logs? Don't they see the google referrers for this spammy content? Could they be responsible for it themselves, or maybe getting a payoff for looking the other way? Just look at the results of this google search and see just how bad it is:
http://www.google.com/search?hl=en&safe=off&q=%22low+cost+payday+loans%22+site%3A.edu&aq=f&oq=&aqi=
These schools are required by law and regulation to protect their students' private information. If their servers are so badly compromised, how can their students and employees trust them with their personal and financial information? It displays a shocking disregard for security or utter incompetence, or perhaps even corruption on the part of the IT staff, and it seriously needs to be investigated and corrected without delay!
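If you run one of these domains and want to check your own zone, the search above is easy to generate for any phrase and domain. A minimal sketch, assuming the same URL shape as the link above; `buildSiteQueryUrl` is a made-up helper name and the phrase list is only a placeholder:

```javascript
// Hypothetical helper: builds a Google search URL for an exact phrase restricted to one domain.
function buildSiteQueryUrl(phrase, domain) {
  const query = '"' + phrase + '" site:' + domain;
  // encodeURIComponent escapes the quotes and the colon; spaces become %20, so swap them for "+".
  const encoded = encodeURIComponent(query).replace(/%20/g, '+');
  return 'http://www.google.com/search?q=' + encoded;
}

// Placeholder phrases; swap in whatever spam terms you want to audit.
const phrases = ['low cost payday loans'];
for (const phrase of phrases) {
  console.log(buildSiteQueryUrl(phrase, '.edu'));
}
```

Passing your own domain (say, a hypothetical `example.edu`) instead of `.edu` narrows the results to your own hosts.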
IMHO, it's mostly script kiddies doing this "hacking". Over the years I have had a few sites developed and largely left to sit for posterity. Unfortunately, the ones that were running off-the-shelf packages such as phpNuke (CMS) or WordPress (blog) or phpBB (forum) have been hacked, or overrun by spammers, at least once. All of those packages had security flaws over the years... some worse than others.
Yes, I should have kept them up to date, but no, I didn't, and lots of people don't.
I want to keep the blog for a crazy project up for kicks, but I don't want to keep updating WordPress on every release just to have that privilege.
Anyway, it's getting better these days; all the major packages are much more security aware.