Google's Research on Malware Distribution
GSGKT writes "Google's Anti-Malware Team has made available some of their research data on malware distribution mechanisms while the research paper[PDF] is under peer review. Among their conclusions are that the majority of malware distribution sites are hosted in China, and that 1.3% of Google searches return at least one link to a malicious site. The lead author, Niels Provos, wrote, 'It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed.'"
"During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware." 180,000 out of billions doesn't seem like a lot to me.
Did Google consider itself to be a source of malware? http://blog.opendns.com/2007/05/22/google-turns-the-page/
I found it quite interesting that the methodology of the research doesn't even bother to check sites with Mac OS X or Linux operating systems. But on the server side, Apache websites running outdated versions of PHP were singled out for comment.
In all there were twice as many compromised IIS servers as Apache, but fully 50% of all compromised Apache servers were running some version of PHP.
It was also interesting to note that computer-related websites ranked second only to social networking sites as most likely to be compromised with redirections to malware sites. Seems we might want to tone down our holier-than-thou rhetoric. 8^)
Crumb's Corollary: Never bring a knife to a bun fight.
It occurred to me that if Google started delisting sites that tried to implant malware on visitors' computers, then webmasters would be much more diligent about keeping the crap off their sites, or it would at least keep a few more hapless victims out of harm's way.
Apocalypse Cancelled, Sorry, No Ticket Refunds
The problem is with the client software. I can understand the danger of sites that try to fool you into downloading and running an application, or infected media that harnesses an exploit in an application - but automatically infecting the machine just by visiting the site is beyond belief. There's a serious problem with what the "web" has become, forced upon us by reckless and naive developers. The WWW and HTML were never meant to be something that runs active code on the client. Period. Most of us realise there is no way this problem can ever be solved without revising exactly what a browser is supposed to be. As long as browsers run code instead of interpreting data, there will always be malicious sites set up to exploit this.
I have to observe a cast iron policy in my work. It means that quite a few sites on the internet are unavailable, but since they are mostly entertainment based it isn't a serious loss. No Javascript, no ActiveX, no Macromedia Flash. My activities are limited to viewing HTML and PDFs, even animated GIFs are blocked. In many years we have had no malware incidents (that I know of). Sometimes it's absolutely necessary to view a site containing potentially insecure content, so there is a "dirty machine" which is not allowed to connect to anything else and is wiped and reinstalled weekly.
The problem is that even serious academic and scientific sites (that should know better) are starting to add Flash plugins and heavy scripting, so it's getting hard for conscientious users to maintain security even where they want to. Insecure technology is being forced upon us by the site developers.
It would be nice if Google could display whether a site needs JavaScript, Flash or whatever and be able to search for HTML only content. The difficult way is to use Google Cache in text only mode of course.
You'll start seeing people use H1 for everything. If you are lucky they'll override it with a style sheet so it doesn't look obnoxious.
I wonder if Google has ever considered a moderation system, allowing logged-in Google users to rank the results of their searches on a random and infrequent basis. It would be easy enough to have the "click here to open" link change to a "click here to open, and open survey in new tab/window" if the user said they were willing to moderate search results.
If a page got a bad "reputation" for a given search, its rank would go down for that particular search.
If a page got a bad "reputation" as a malware haven, link farm, or other abusive page, that page would be punished.
If a page got flagged as "illegal content" Google would drop the comment with a note saying "We are not the police, but please contact your local or national police. Click here for a list of national police web sites worldwide."
If a page got flagged as a copyright violation, Google would drop the comment with a note saying "We are not in the business of enforcing private court actions. To find a copyright attorney, click here."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I wonder if there are plans to release this data to the general public. Someone could then write a pretty useful Firefox extension that would warn people about these sites or prevent them from going there at all.
Having been unable to use first Google Translate and now Google Search because of the "Error- Your request appears to be virus related please scan your computer for malware" message, I do wonder how sound any Google analysis of malware is. If they have problems distinguishing between my computer, which is not malware infected, and the transparent port 80 proxy for my home cable ISP, which is shared by 100,000s of computers, some of which are obviously malware infected, then what hope is there for a useful analysis of the much more devious and murky world of drive-by installers?
The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google Adwords are not affected...
Did you catch the above line in their article?
Power tends to corrupt, and absolute power corrupts absolutely.
In the 10 months of data the researchers used, Google found 9,340 distribution sites. The other 180,000 sites simply redirect you to the distribution site, which is where you download the malware.
It gets better - those 9,340 distribution sites are under the aegis of only 500 autonomous systems. Which means Google could send their list to those 500 AS's - and each would have (on average) around 20 malware sites to clean up. After this, Google could keep notifying AS's of the distribution sites found (less than a thousand a month).
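For anyone who wants to check the back-of-envelope numbers, here is a minimal sketch of the arithmetic. It uses only the figures quoted in this comment (9,340 distribution sites, 500 autonomous systems, 10 months of data). The variable names are made up for illustration, and the monthly figure assumes the sites were discovered at an even rate over the 10 months, which the data may not bear out.

```python
# Figures as quoted in the comment above; nothing here comes from the paper directly.
distribution_sites = 9340
autonomous_systems = 500
months_of_data = 10

# Average number of distribution sites each AS would be asked to clean up.
sites_per_as = distribution_sites / autonomous_systems

# Average discovery rate, ASSUMING an even spread over the 10 months.
new_sites_per_month = distribution_sites / months_of_data

print(f"{sites_per_as:.1f} distribution sites per AS on average")
print(f"{new_sites_per_month:.0f} distribution sites found per month")
```

So "around 20" per AS is a slight round-up of roughly 18.7, and "less than a thousand a month" matches 934. An average says nothing about how unevenly the sites are spread across AS's, of course.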
Looks like a very measurable and approachable problem now! I can't wait for Google's spam report. (They are working on one, aren't they?)