Google Purges Thousands of Malware Sites
Stony Stevenson sends in word on the most massive "SEO poisoning" seen to date. The attack was directed at Google in particular and resulted in tens of thousands of Web pages hosting exploits showing up on the first page of Google searches for thousands of common terms (PDF). Sunbelt Software blogged about the attack on Monday after investigating it for months. By Wednesday Google had removed tens of thousands of malware-hosting pages from its index.
http://news.bbc.co.uk/1/hi/technology/7118452.stm
The sites were targeting IE exploits.
Recently (end of October) Google reordered some of their sites and dropped the PageRank on many (mine included); there was a blog post about it here. My PageRank suffered immensely, dropping from an overall high of 6/10 to the current 3/10. The most noticeable difference for me was that for the next two weeks (and for the first time ever) I was no longer the #1 hit for: Bill Roehl, "Bill Roehl", or any variation thereof. Not only that, but the first result from Google wasn't even for my root page; it was for some post I had underneath. I found that to be very odd.
Now, while I was digging through the Google results to find out why this could have possibly happened (prior to reading the blog post linked above) I found tons of SEO spam sites that my site had been linked from. I had never seen that many junk results returned before and was surprised they were getting through. I was seriously concerned that they had something to do w/my ranking drop.
At least Google is getting back on track dumping those bastards. While most people probably don't change their default settings to see anything more than the first 10 results, I am constantly looking through the first 100 on various searches and have seen more and more of that. I was wondering if some of the claims of Google's drop from #1 would be imminent if something didn't change.
For those of you, like me, who did not immediately recognise this TLA, it stands for Search Engine Optimization.
Sounds like net censorship to me! What if I wanted to visit those malware sites?
When will their crawlers automatically disqualify ALL sites that contain malware though? That would be nifty.
I don't think it would be possible. I linked to a turing test program I wrote called "art.exe" from my Artificial Insanity page that I hosted on another site I owned (which I since have let lapse). The only way a crawler would know that this program was benign was because it isn't listed in any of the antivirus lists of viral signatures.
What would be nice is if Google would have its crawlers automatically check pages as they crawled. If there were any known malware, the page would be blacklisted. But there's no way I can think of to flag malware that hasn't been identified as such by humans.
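A minimal sketch of that kind of crawl-time check, assuming the crawler has a set of digests for payloads already identified as malware. The names `KNOWN_BAD_DIGESTS`, `is_known_malware` and `flag_pages` are made up for illustration and the URLs are placeholders. As noted above, this only catches what has already been identified; anything new passes straight through.

```python
import hashlib

# Hypothetical blacklist: SHA-256 digests of payloads already identified as malware.
KNOWN_BAD_DIGESTS = {
    hashlib.sha256(b"known-exploit-payload").hexdigest(),
}


def is_known_malware(payload, bad_digests=KNOWN_BAD_DIGESTS):
    """Return True if the payload's digest is on the blacklist."""
    return hashlib.sha256(payload).hexdigest() in bad_digests


def flag_pages(pages, bad_digests=KNOWN_BAD_DIGESTS):
    """Given {url: payload bytes}, return the URLs that should be blacklisted."""
    return [url for url, body in pages.items() if is_known_malware(body, bad_digests)]


if __name__ == "__main__":
    crawled = {
        "http://good.example/art.exe": b"benign turing test program",
        "http://bad.example/codec.exe": b"known-exploit-payload",
    }
    print(flag_pages(crawled))
```

A benign but unlisted program and a brand-new exploit look identical to this check: neither digest is on the list.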
-mcgrew
PS :) The downside would be that you couldn't find microsoft.com (Foghorn Leghorn says...)
PPS: I've been mulling over rewriting the Artificial Insanity program in javascript. But I'm having a hard time finding the time.
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
The pdf contains a list of 2161 popular Google search terms. This is an SEO wet dream. Thanks!
It is dangerous to be right when the government is wrong.
Wide awake.
Personally, I'm comfortable with the fact that I'm only the second-best me out there. Let that other fella have his glory, because I'm never going back to the Rob Vincent Academy. I'm not going into it here, but those bastards Rob, Rob, and Rob know why.
Slashdot Burying Stories About Slashdot Media Owned
I'm probably too late on this discussion, but I thought something needed to be said. I work in online marketing (no, that doesn't mean I am a spammer) and I think this speaks volumes about what Google is hard-pressed to admit. The system can still be gamed. And it seems to me that no matter what Google does to improve their algorithm, the system will still be vulnerable to gaming.
In part, I think this has to do with the oddness that is their ranking strategy. They want to find the most relevant sites for any given query. So they study online behavior and adjust their algorithm to reflect that behavior. At the same time, they publish "guidelines" on how webmasters should design their sites and link out/in. It seems like they're trying to influence how websites behave online and then say that they're picking up on the organic trends. But in the end, they generate the trends. And then they tell everyone how to do it. Because of this, the system will always be vulnerable.
Until, that is, PigeonRank(TM) is launched.
Yes, you can dance to Radiohead.
Nothing (except antitrust law, maybe) stops Google from "forgetting to include" live.com in its indexes now, and this situation is quite unlikely to change in the near future. The only two reasons I can think of to leave competitors in are the outrage from both the internet community and the "forgotten" competitor (perhaps culminating in lawsuits for anti-competitive behaviour, IANAL) and the desire for its own index to be perceived as fair and complete.
An independent body deciding about the malness of any ware is, even if a certain responsiveness could be guaranteed, a creepy idea. Forming such a committee would very surely be a huge leap in the direction of an often-mentioned TCPA (Palladium, NGSCB, Donkey poop)-secured blacklist society. A small aristocracy of people on this decision committee would become the target of a trillion-dollar industry and be able to decide exactly what piece of software is run by anybody. On the other hand, allowing anybody to participate in these votes would guarantee that the operation is not effective, because of the huge delay this would cause. The same goes for adding legal ways to fight a decision by this body: having one would cause the system to become as slow as many legal systems throughout the world are today; not having one would be a surefire way to cause dissatisfaction among lots and lots of developers (both natural and legal persons).
Also, don't forget to take into account the legal trouble that e.g. encryption software is currently going through. I'm certain an independent body would decide similarly to lawmakers throughout the world. Essentially, you could probably forget about running Linux (Open Source? That could run anything, including highly illegal tools like DeCSS, without any way to stop it), any CD/DVD copying software (It's fun to break the D-M-C-A (sung to the tune of YMCA)), nmap (Remember Germany banning "Hacker tools"?) or anything else.
Sorry for painting such a dystopian future, but letting any (independent, governmental or profit-oriented) body whatsoever decide what software's good and what's bad just isn't what you, me or most anybody else wants.
I do agree... and maybe an independent body would just become corrupt like the rest of them BUT.
In Google's interest, they are a search engine and not a publisher, and for that reason are not held liable for indexing child porn and other illegal activity. Once Google starts going down the road of blocking spam and other malicious sites, it could be argued that they lose the status of an automatic aggregation engine.
All The Pirate Bay does is index pointer links; all Google does is index pointer links. One of them has a safe harbour in the US and the other does not. How long before Google itself loses its 'safe harbour'?
After reading this, I immediately checked to see if Google had fixed their open redirector. No, they haven't, and there are six exploits of it listed in PhishTank. Google needs to turn that off. If they absolutely insist on having an open redirector, it needs its own subdomain, which is what Yahoo does. Then the subdomain can be blacklisted without collateral damage.
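For anyone running a redirector of their own, one way to stop it being "open" is to check the target against an allowlist before redirecting. This is a different technique from the subdomain isolation described above, and only a rough sketch: `ALLOWED_HOSTS` and `is_safe_redirect` are made-up names and the hosts are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts this site is willing to redirect to.
ALLOWED_HOSTS = {"www.example.com", "mail.example.com"}


def is_safe_redirect(target, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for absolute http(s) URLs whose host is allowlisted."""
    parts = urlparse(target)
    if parts.scheme not in ("http", "https"):
        # Rejects javascript:, data:, and scheme-relative "//host/..." targets.
        return False
    # urlparse lowercases the hostname and strips any userinfo and port.
    return parts.hostname in allowed_hosts


if __name__ == "__main__":
    for url in ("https://www.example.com/inbox",
                "http://phish.example.net/login",
                "https://www.example.com@phish.example.net/"):
        print(url, is_safe_redirect(url))
```

The third URL is the classic trick of putting a trusted name in the userinfo part. The check compares the parsed hostname, not the raw string, so it gets rejected.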
Phishing via exploits of major sites is a big problem, but involves a small number of major sites. 168 major sites today. The usual exploits are:
Out of 1.6 million domains in DMOZ, and over 10,000 phishes in PhishTank, only 168 domains are in both. So the number of sites that need to be fixed is small. In fact, some of those sites are already fixed, but the entries haven't been removed from PhishTank yet. (Hint: if you kill a hostile page on your domain, make it a 404 error; that gets the page out of PhishTank's "active and online" list automatically. Don't just change the content or redirect it somewhere else, or it stays in the tank until somebody rechecks it manually, which can take weeks.)
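The "in both" check is just a set intersection. Here is a rough sketch, assuming you have the directory domains as a set and the phish reports as a list of URLs; the function names and the sample data are invented for illustration, and this is not necessarily how the actual list is produced.

```python
from urllib.parse import urlparse


def host_of(url):
    """Extract the hostname from a URL, dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def exploited_domains(directory_domains, phish_urls):
    """Return the sorted domains that appear in both the directory and the phish reports."""
    phish_hosts = {host_of(u) for u in phish_urls}
    return sorted(set(directory_domains) & phish_hosts)


if __name__ == "__main__":
    directory = {"example.edu", "shop.example.com", "news.example.org"}
    phishes = [
        "http://www.example.edu/~user/login.html",
        "http://203.0.113.7/paypal/",
        "http://shop.example.com/redirect?u=x",
    ]
    print(exploited_domains(directory, phishes))
```

Phishes hosted on bare IP addresses or throwaway domains drop out of the intersection, which is why the resulting list is so short compared to the total number of phishes.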
For every site in the list, there's some competitor in the same business who isn't on the list. "Everybody has this problem" isn't a valid excuse any more. This is a useful point to make with management if you find your own company on the list.
This list of 168 exploited sites is updated automatically every three hours. There's also a list of sites recently removed from PhishTank. "n-insanity.com", "tropmet.res.in", "wsjob.com" were dropped from the list today; they no longer have active, online entries in PhishTank. "gentlesource.com", "t35.com" (an eBay phish), "tilapia.com" (another eBay phish), and "uic.edu" (already fixed) were added; they just appeared in PhishTank. If you have any responsibility for a site on the list, please take steps to fix the problem. If you're not part of the solution, you're part of the problem.