Google's Research on Malware Distribution
GSGKT writes "Google's Anti-Malware Team has made available some of their research data on malware distribution mechanisms while the research paper[PDF] is under peer review. Among their conclusions are that the majority of malware distribution sites are hosted in China, and that 1.3% of Google searches return at least one link to a malicious site. The lead author, Niels Provos, wrote, 'It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed.'"
"During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware."
180,000 out of billions doesn't seem like a lot to me.
As if we need to ask.
Where is the page listing each of the bad sites? I'd like to get started on my Virus Aquarium
Did Google consider itself to be a source of malware? http://blog.opendns.com/2007/05/22/google-turns-the-page/
Three million out of billions is not bad, assuming randomness (only, say 1 in 1000 chance of using a bad URL), but it is a lot worse than 180k out of billions.
However not all URLs are used equally. Bad URLs linked to some popular pron site, for instance, will get hit a lot more than Joe Sixpack's facebook site.
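A rough way to see the difference between counting URLs and counting visits is sketched below. The 3 billion total is an assumption (the summary only says "billions"), and the traffic figures are made up purely for illustration.

```python
def raw_fraction(bad_urls: int, total_urls: int) -> float:
    """Chance of hitting a bad URL if every URL were equally likely."""
    return bad_urls / total_urls


def visit_weighted_fraction(pages: list[tuple[int, bool]]) -> float:
    """Share of visits that land on bad pages; each entry is (visits, is_bad)."""
    total_visits = sum(visits for visits, _ in pages)
    bad_visits = sum(visits for visits, is_bad in pages if is_bad)
    return bad_visits / total_visits


# Assumption: "billions" is taken as 3 billion to match the 1-in-1000 guess.
uniform = raw_fraction(3_000_000, 3_000_000_000)

# Hypothetical traffic: one popular bad page among 999 quiet clean pages.
pages = [(100_000, True)] + [(100, False)] * 999
weighted = visit_weighted_fraction(pages)

print(f"uniform: {uniform:.4f}, visit-weighted: {weighted:.4f}")
```

With these invented numbers, one bad page in a thousand still accounts for about half of all visits, which is the point about popular sites.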
Engineering is the art of compromise.
It occurred to me that if Google started delisting sites that tried to implant malware into visitors' computers, then webmasters would be much more diligent about keeping the crap off their sites, or it would at least keep a few more hapless victims out of harm's way.
Apocalypse Cancelled, Sorry, No Ticket Refunds
The problem is with the client software. I can understand the danger of sites that try to fool you into downloading and running an application, or infected media that harnesses an exploit in an application - but automatically infecting the machine just by visiting the site is beyond belief. There's a serious problem with what the "web" has become, forced upon us by reckless and naive developers. The WWW and HTML were never meant to be something that runs active code on the client. Period. Most of us realise there is no way this problem can ever be solved without revising exactly what a browser is supposed to be. As long as browsers run code instead of interpreting data, there will always be malicious sites set up to exploit this.
I have to observe a cast iron policy in my work. It means that quite a few sites on the internet are unavailable, but since they are mostly entertainment based it isn't a serious loss. No Javascript, no ActiveX, no Macromedia Flash. My activities are limited to viewing HTML and PDFs, even animated GIFs are blocked. In many years we have had no malware incidents (that I know of). Sometimes it's absolutely necessary to view a site containing potentially insecure content, so there is a "dirty machine" which is not allowed to connect to anything else and is wiped and reinstalled weekly.
The problem is that even serious academic and scientific sites (that should know better) are starting to add Flash plugins and heavy scripting, so it's getting hard for conscientious users to maintain security even where they want to. Insecure technology is being forced upon us by the site developers.
It would be nice if Google could display whether a site needs JavaScript, Flash or whatever and be able to search for HTML only content. The difficult way is to use Google Cache in text only mode of course.
Searchers won't use your engine if it does not give them what they want.
I have seen search results on Google that show a warning that the site is known to contain malware. Perhaps they just censor the listings outright in other countries though.
You'll start seeing people use H1 for everything. If you are lucky they'll override it with a style sheet so it doesn't look obnoxious.
I wonder if Google has ever considered a moderation system, allowing logged-in Google users to rank the results of their searches on a random and infrequent basis. It would be easy enough to have the "click here to open" link change to a "click here to open, and open survey in new tab/window" if the user said they were willing to moderate search results.
If a page got a bad "reputation" for a given search, its rank would go down for that particular search.
If a page got a bad "reputation" as a malware haven, link farm, or other abusive page, that page would be punished.
If a page got flagged as "illegal content" Google would drop the comment with a note saying "We are not the police, but please contact your local or national police. Click here for a list of national police web sites worldwide."
If a page got flagged as a copyright violation, Google would drop the comment with a note saying "We are not in the business of enforcing private court actions. To find a copyright attorney, click here."
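The reputation part of this could look something like the sketch below. It is only an illustration of the idea, not anything Google is known to do; the class name, the weights, and the scoring formula are all hypothetical.

```python
from collections import defaultdict

VOTE_WEIGHT = 0.5      # hypothetical: score change per net user vote
ABUSE_PENALTY = 5.0    # hypothetical: score lost per abuse flag


class SearchModeration:
    """Tracks per-query votes and site-wide abuse flags from moderators."""

    def __init__(self) -> None:
        self.query_votes = defaultdict(int)  # (query, url) -> net votes
        self.abuse_flags = defaultdict(int)  # url -> number of abuse flags

    def rate(self, query: str, url: str, good: bool) -> None:
        # A vote only affects this page for this particular search.
        self.query_votes[(query, url)] += 1 if good else -1

    def flag_abuse(self, url: str) -> None:
        # Malware havens and link farms are punished for every search.
        self.abuse_flags[url] += 1

    def adjusted_score(self, query: str, url: str, base_score: float) -> float:
        votes = self.query_votes.get((query, url), 0)
        flags = self.abuse_flags.get(url, 0)
        return base_score + VOTE_WEIGHT * votes - ABUSE_PENALTY * flags
```

A bad vote for one query leaves the page's rank for other queries alone, while an abuse flag drags it down everywhere.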
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Well, it's in Google's best interest to fight this, as malware has the potential to affect their business.
Really, as much as I am not a MS basher, malware is almost entirely Microsoft's fault. If they had paid attention back in the day to security, we wouldn't have the steaming swamp of malware we have now.
The only serious way to fight malware is to reduce the potential infection hosts.
Fighting this is just like fighting any sort of sickness or plague. If you have enough immunized hosts, then the issue won't be as bad.
That's a stupid idea. It's been tried before against spam, look up DNSBL.
It's a shame that Google chose to not identify the three AV vendors it tested. Their ability to protect against malware ranged from bad (~80%) to abysmal (~20%). To identify them would have been a public service for us and a motivation for them.
Having first been unable to use google translate and now google search due to the "Error- Your request appears to be virus related please scan your computer for malware" I do wonder how sound any google analysis of malware is. If they have problems distinguishing between my computer that is not malware infected and the transparent port 80 proxy for my home cable ISP which is shared by 100,000s of computers some of which are obviously malware infected, then what hope a useful analysis of the much more devious and murky world of drive-by installers?
How is this a good idea? Sure, having headings suggests that the author may have gone to some trouble to structure the page, but it's no real indicator of quality. A script can easily crank out reasonable-looking headings. Same goes for HTML/XHTML compliance.
Punishing JavaScript will punish everyone using Ruby on Rails, Wordpress, or anything else that does AJAX stuff. Sure, JavaScript can be used to do bad things, but a lot of UI enhancement and "Web 2.0" stuff depends on it.
RSS feed? Only relevant for blogs/news/comics/etc. RSS feeds are not relevant or useful for reference material, or other relatively static content. And once again, they can be cranked out by a script. All those "TPG Feed" meta-porn sites have RSS feeds. That doesn't give any indication of quality.
Hyperlink directly after hyperlink will penalise all the sidebars on Slashdot, and on blogs. Penalising multi-word hyperlinks will penalise tables of contents in books and research papers.
As to your format snobbery, I have a default installation of Firefox on a Windows machine here at work. I have a default installation of Firefox on a Mac at home. Neither of them can play FLAC or Ogg/Vorbis. Why not reward standard formats like MPEG and 3GP? FLAC and Ogg/Vorbis may not be patent-encumbered, but they are developed in a closed way, by single entities. Real standards should be controlled by real standards bodies.
Usually, good conferences and journals have an anonymous peer review process. I find it very odd that Google researchers chose to publicize their paper before the peer review process is done. That is at least a lack of decorum, IMO.
One site I work on got hit by a PHPBB SQL injection attack and had a tiny iframe inserted into the forum header that pointed to a well-known malware site, hightstats.net (and if you're curious the malicious script is in the strong/044 folder). Google picked up on the iframe's contents being a malicious script and added the malware warning to the search results pertaining to the forums section of our website.
I just wonder how it is that hightstats.net can still be in existence when it contains known malicious stuff that hackers are inserting into unwary websites?!
-- thinkyhead software and media
Chinese firewall installed upside down!
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google Adwords are not affected...
Did you catch the above line in their article?
Power tends to corrupt, and absolute power corrupts absolutely.
2/3 of all malware distribution sites & sites that link to them are hosted in China.
The next worst offender is the US with 1/6.
About 3.5M websites attempt to send you to exploits from 180K distribution sites.
63% of the 180K malicious sites are IIS, 33% are Apache, and a handful are other.
80% of malware that did not come from ads (e.g. from iframes) was within 4 redirects of the malware distributor.
80% of malware from ads was more than 4 redirects from the distributor.
3/4 of distribution sites and 1/2 of landing sites are in 2 blocks occupying 6.5% of the IPv4 address space.
Among drive-by downloads, 1/2 alter your startup, 1/3 attack your security, 1/4 corrupt your preferences, and 7% install BHOs.
87% of outbound connections the malware initiates are HTTP, 8.3% are IRC.
The three AV engines tested against malware retrieved by the study had detection rates of about 35, 50, and 70%.
The part I find scariest is the 3.5M malware fronts. I mean, there are only about 70M active hosts on the entire Internet - that's 5 percent! Since I think that trying to make programmers these days write secure code is a lost cause, we should focus on breaking up the software monoculture. This kind of shit really starts to lose its efficacy if only 1 in 4 or 5 attempts even attacks the right browser...
ICANN allows the sites (typos and fronting).
IE, Outlook, and most other web/email clients take you to them happily.
And Google funds the whole ecosystem with their ads.
Maybe Google should look in a mirror once in a while. Because in the mirror it doesn't say "do no evil", it says "be a greedy profit hungry corporation or get sued by the shareholders and go to jail".
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
Maybe every time I look at a phishing site out of curiosity I'll tell them my email address is qwerty@highstats.net.
Apocalypse Cancelled, Sorry, No Ticket Refunds
But there's a huge business of websites in China that are used by spammers, phishers, and other parasites, because the Internet means that you can connect to anywhere in the world for the cost of a few hundred milliseconds. China not only has a large technically skilled population, a lot of infrastructure, and an imbalance in bandwidth usage, it also has a regulatory attitude that doesn't care too much what you do to make money selling foreigners what they want. (And DNS doesn't care what language your sysadmins speak - you can be a
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
er, ok.
Nothing of *value* then. Certainly nothing that would stop me wanting to have their results filtered from my google search results.
I guess I was talking about sites that had legitimate content but which had been poisoned by various malware or whatever.
Max.
The paper points out that most of the attacks involve redirection of some portion of page content. That's a useful piece of information, because, other than for advertising purposes, redirection of IFRAME items and images is quite rare. A useful blocking strategy would be to block all redirects below the top level page. Many ads will disappear; no great loss.
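As a minimal sketch of that blocking rule, assuming the browser or proxy knows whether a response belongs to the top-level page or to embedded content: the `Response` record and the `allow` function are hypothetical names, and a real filter would also have to deal with script-driven and meta-refresh redirects, which this ignores.

```python
from dataclasses import dataclass

# Standard HTTP redirect status codes.
REDIRECT_STATUSES = frozenset({301, 302, 303, 307, 308})


@dataclass
class Response:
    url: str
    status: int
    is_top_level: bool  # True for the page itself, False for iframes, images, etc.


def allow(response: Response) -> bool:
    """Follow redirects for the top-level page only; drop redirected sub-content."""
    if response.is_top_level:
        return True
    return response.status not in REDIRECT_STATUSES


# Hypothetical example: a page, a normal image, and a redirecting iframe.
responses = [
    Response("http://forum.example/", 302, True),
    Response("http://forum.example/logo.png", 200, False),
    Response("http://ads.example/frame", 302, False),
]
kept = [r.url for r in responses if allow(r)]
```

Here only the redirecting iframe is dropped; the page's own redirect and the directly served image go through.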
Checking for hostile full web pages is already being done. McAfee SiteAdvisor was the first to do that, then Google copied them. Our "bottom feeder filter", SiteTruth, does some of that too, although it throws out far more sites than McAfee or Google do, just by insisting that some identifiable business stand behind any page that looks commercial.
Google's revenue model depends, to some extent, on those "bottom feeder" sites: all those anonymous "landing pages", "directory pages", "made for AdWords pages", and similar junk. Those things bring in substantial AdWords revenue, although they don't usually generate much in the way of sales for advertisers. Throwing them out of the "Google Content Network" would cut Google's ad income. This is where "don't be evil" collides with Google's profitability.
This looks like a solvable problem, but the solution will come from the security companies, not the search companies. The search companies can't afford to fix it.
In the 10 months of data the researchers used, Google found 9,340 distribution sites. The other 180,000 sites simply redirect you to the distribution site, which is where you download the malware.
It gets better - those 9340 distribution sites are under the aegis of only 500 autonomous systems. Which means Google could send their list to those 500 AS's - and each would have (on average) around 20 malware sites to clean up. After this, Google could keep notifying AS's of the distribution sites found (less than a thousand a month).
Looks like a very measurable and approachable problem now! I can't wait for Google's spam report. (They are working on one, aren't they?)
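For what it's worth, here is the arithmetic plus a rough sketch of how such per-AS notification lists could be built. The two totals come from the figures quoted above; the hostnames and AS numbers in the example are invented, and `group_by_as` is a hypothetical helper.

```python
from collections import defaultdict

DISTRIBUTION_SITES = 9_340   # from the 10 months of data
AUTONOMOUS_SYSTEMS = 500     # ASes hosting those sites

# Average clean-up load per AS: 18.68, i.e. "around 20".
average_per_as = DISTRIBUTION_SITES / AUTONOMOUS_SYSTEMS


def group_by_as(sites):
    """Turn (hostname, asn) pairs into one notification list per AS."""
    report = defaultdict(list)
    for hostname, asn in sites:
        report[asn].append(hostname)
    return dict(report)


# Hypothetical data: private-use AS numbers and example hostnames.
found = [
    ("bad1.example", 64512),
    ("bad2.example", 64512),
    ("bad3.example", 64513),
]
notifications = group_by_as(found)
```

Each AS would then get only its own short list, which is what makes the problem look manageable.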
A potential vector for redirects to malware sites lies in bogus registrations on bulletin boards. A board of which I'm a member has seen a large number of such registrations purporting to originate in England, with links to sites in eastern Europe. Redirection? Walks like a duck ...
Shouldn't a centralized spider MD5 (or the like) legit binaries, with a central CA-like authority to verify them and identify these exploiters?
M
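A toy version of that registry might look like the sketch below. It swaps in SHA-256 for MD5, since MD5 collisions are practical to produce; the `BinaryRegistry` class is a hypothetical stand-in for the central authority, and a real one would also need signing and distribution, which this leaves out.

```python
import hashlib
from typing import Optional


def digest(data: bytes) -> str:
    """Hex SHA-256 digest of a binary's contents."""
    return hashlib.sha256(data).hexdigest()


class BinaryRegistry:
    """In-memory stand-in for a central list of known-good binaries."""

    def __init__(self) -> None:
        self._known = {}  # digest -> name of the legitimate binary

    def register(self, name: str, data: bytes) -> None:
        self._known[digest(data)] = name

    def lookup(self, data: bytes) -> Optional[str]:
        """Return the registered name, or None if the binary is unknown."""
        return self._known.get(digest(data))


# Hypothetical usage: one registered installer, one tampered copy.
registry = BinaryRegistry()
registry.register("installer.exe", b"legit installer bytes")
verified = registry.lookup(b"legit installer bytes")
tampered = registry.lookup(b"legit installer bytes + payload")
```

Note that this only says a download is not on the known-good list; it cannot tell malware apart from software nobody has registered yet.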
JavaScript (yes) = Punish website
JavaScript (no) = Reward website
JavaScript OnLoad = Double punish website
This seems pretty silly - just because a website uses javascript doesn't mean it *requires* it. Well designed web sites work just fine without JS but if you have it then they give you an enhanced experience.
HTML/XHTML compliance = Reward website
HTML/XHTML not compliance = Punish website
Sadly Google doesn't properly support XHTML, so you are punished anyway for using XHTML (why?!)
RSS feed = Reward website
RSS feeds are not appropriate for all websites - rewarding people for using inappropriate technologies is silly.
http://blog.nexusuk.org
That's because of outdated LAMP configs.
The GoogleBot doesn't execute JavaScript. Google listing any content from a given site means it does, to a certain point, degrade gracefully.
Also, what's your problem with JavaScript? If you ever used the Google front page (instead of your browser's quick search function or /search?q=your+query), you probably didn't mind not having to click into that textbox, now did you? JavaScript can cause some problems, but implemented sensibly (by the browser devs) it is no security threat and used responsibly (by web devs) has great benefits.
If I gathered this right, then Google can parse the content behind the links they serve, to the point of identifying the drive-bys? Okay, so why not block them at that point? And why not throw enough CPU power to parse the results before they're returned, so as to protect the users? Yeah, tag this "whatcouldpossiblygowrong".
What, then, about a browser that can identify a drive-by, by pre-parsing the content behind the links it shows. Heuristics would do that Real Well, too; I can think of a zillion methods to do Just That off the top of my head. "If it ends up writing to disk, don't." How hard is THAT?
"Yes but it uses vulnerabilities..." Yes, and? Run the browser in a VM, then, and meta-parse if it ever tries to write to a part of the disk that it should not access.
"That would be slow..." Well, seeing how many people use Azureus, a program's performance does not affect adoption. (Not that a browser in Java would be a good idea, unless you've got an FPGA wired to run Java bytecode natively. "Java sucks because Java is slow. Java is not for the desktop, because it is too slow. The desktop needs zero latency, never wait. Java can't be fast enough unless you've got an FPGA wired to run Java bytecode natively." (Repeat until you got it through your head. Azureus is painful to use even with a Raptor HD and a Q6600@3GHz and DDR3@1333MHz, and there exists nothing faster as of now. No, not the dual quad-core Xeon systems, they're stuck with DDR2@667MHz.)
I can't even begin to understand how it is possible that browsers suck that much at security. All the problems they have are long-solved, or can all be solved in under five seconds by thinking about them. Let's fix the whole lot of them right now, mkay?
-What does a browser do?
-"Send request, get reply, render."
So how is it possible that the *browser* can register system-wide extensions? THAT does not parse in MY brain. Speaking of IE, just a thought : IE could render URLs in a definable way in the address bar. Now who could have ever been stupid enough to think that was a good idea? "IE : the browser with phishing support".
And if Firefox is not more secure, then how? Why? Did Netscape suck that much? (Yes, I know : "yes".) Just re-write the whole thing from scratch then, it's not as if it was hard. "Send request, get reply, render."
Making laws based on opinions that stem up from false informations leads to witch hunts.
I'd have thought it'd be way higher than that.*
* If the anti-malware companies are to be believed.**
** Some of them can't be believed because their anti-malware contains malware.
No portion of this post may be rebroadcast without the express, written consent of Major League Baseball.
Besides, my mobile phone doesn't handle Javascript very well at all. Sites which don't degrade nicely aren't viewable on my phone, so it would be nice if Google didn't return those queries (particularly when I'm using Google from my phone.)
Of course, there are also problems with the design of Javascript (such as the lack of threading) but that's not a reason to avoid its use--just a reason to dislike it.
Anyways, JavaScript may not be the biggest of the web's security problems. Cross-site scripting can be accomplished almost as easily with pure HTML (e.g. instead of redirecting victims, a full-screen IFrame is laid over the whole affected page. Additional benefit: address bar still shows the correct domain); CSRF is a bit more limited (to GET requests, to be precise) but certainly still possible.
I hope you're kidding. We're talking about a language to enhance the web surfing experience, not the language you're going to write your next huge application in. Also, JavaScript does provide enough possibility of asynchronous operations; Prototype.js and the like make them easy to use. Adding a threading framework could very well introduce huge gaps between browsers. Again.
http://www.google.com/search?as_q=&hl=en&num=10&btnG=Google+Search&as_epq=spybot&as_oq=&as_eq=&lr=&cr=&as_ft=i&as_filetype=&as_qdr=all&as_nlo=&as_nhi=&as_occt=any&as_dt=i&as_sitesearch=&as_rights=&safe=images
I think that since Google has the technology to discover and index malware-distributing sites, they should provide a new feature that puts a small red warning beside malicious results, like the McAfee SiteAdvisor service does. This would decrease the number of infected machines on the Internet, and it would be easy for novice users to notice. ExtremeSecurity Blog Admin http://extremesecurity.blogspot.com/