Google Talks About the Dangers of User Content
An anonymous reader writes "Here's an interesting article on the Google security blog about the dangers faced by modern web applications when hosting any user-supplied data. The surprising conclusion is that it's apparently almost impossible to host images or text files safely unless you use a completely separate domain. Is it really that bad?"
...is it a server problem, with the way it interprets record data, or the browser (any browser) (maybe as instructions rather than markup)? I'm guessing server in this case, since if the stream is intercepted and there's a referrer URL that directly references an image or other blob on the same or another server on a subdomain, that could be used to pwn the account/whatever... I'm not up on that sort of hack (you can probably tell). I don't quite get how hosting blobs on an entirely different domain would mitigate against that hack, since you would require some sort of URI that the other domain would recognise to be able to serve up the correct file - which would be in the URL request! Someone want to try and make sense of what I'm trying to say here?
Operation Guillotine is in effect.
Convert the file to the site supported format and quality level in sandbox.
Tadaaaa,,,
Google's solution is effectively to make all content belong to Google.
Gooooo cloud!
For all its transparency, I've yet to see a working list of security breach attempts made on Google servers. I bet there are many, and it would be useful to know just the source and method if nothing more.
Why not check HTTP_REFERER variable and not serve up content if missing or not from the site's domain?
The usual objections about not trusting browsers seem to be misplaced... You can trust the browser when you want the browser to protect the end user.
Another objection has to do with hacks in ancient versions of Flash and other machinery that would allow referer checks to be forged/circumvented. If you are asserting this you need to show how it can still be done in 2012. These vulnerabilities were closed up years ago.
The only downside I can see is that you couldn't make content linkable from sites other than your own, which is the behavior most sites seem to prefer anyway.
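For anyone who wants to see what such a check amounts to, here is a minimal sketch in Python. The function name `referer_allowed` and the host `example.com` are made up for illustration, and it assumes the server hands you the raw Referer header value (or None when the header is absent). It only shows the host comparison; it says nothing about whether the header itself can be trusted.

```python
from urllib.parse import urlsplit

def referer_allowed(referer, site_host):
    """Return True only if the Referer header names a page on site_host."""
    if not referer:
        # Missing header: refuse to serve, as proposed above.
        return False
    host = urlsplit(referer).hostname
    # Compare the parsed host exactly; a substring test would also accept
    # hosts such as example.com.evil.test.
    return host is not None and host == site_host.lower()

print(referer_allowed("http://example.com/profile", "example.com"))
print(referer_allowed("http://example.com.evil.test/x", "example.com"))
print(referer_allowed(None, "example.com"))
```

Note that refusing a missing header also turns away every visitor whose proxy or browser extension strips Referer.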
This is what happens when you try to be lenient with markup instead of strict (note: compliant does not preclude extensible), and then proceed to use a horribly inefficient and inconsistent (by design) scripting language and a dysfunctional family of almost sane document display engines combined with a stateless protocol to produce a stateful application development platform by way of increasingly ridiculous hacks.
When I first heard of "HTML5" I thought: Thank Fuck Almighty! They're finally going to start over and do shit right, but no, they're not. HTML5 is just taking the exact same cluster of fucks to even more dizzying degrees. HOW MANY YEARS have we been waiting for v5? I've HONESTLY lost count and any capacity to give a damn when we reached a decade -- Just looked it up, 12 years. For about one third the age of the Internet we've been stuck on v4.01... ugh. I don't, even -- no, bad. Wrong Universe! Get me out!
In 20XX when HTML6 may be available I may reconsider "web development". As it stands web development is chin-deep in its own filth which it sprays with each mention, onto passers by and they receive the horrid spittle joyously not because it's good or even not-putrid, but because we've actually had worse! I can crank out a cross platform pixel perfect native application for Android, iOS, Linux, OSX, XP, Vista, Win7, and mother fucking BSD in one third the time it takes to make a web app work on the various flavours of IE, Firefox, Safari, Chrom(e|ium). The time goes from 1/3rd down to 1/6th when I cut out testing for BSD, Vista, W7 (runs on XP, likely runs on Vista & Win7. Runs on X11 + OpenGL + Linux, likely builds/runs on BSD & Mac).
Long live the Internet and actual cross platform development toolchains, but fuck the web.
The easiest way to secure embedded content would be a sandbox tag that lets you limit what kind of content can be inside it.
Put down the bong and step AWAY from the computer.
Images and text can be sanitized reliably. The problem is that this strips out all of the non-essential features. Users have a hard time understanding that, because users do not understand the trade-offs involved.
But the process is easy: Map all images to meta-data and compression free formats (pnm, e.g.) then recompress with a trusted compressor. For text, accept plain ASCII, RTF and HTML 2.0. Everything else, convert either to images or to cleaned PDF/Postscript by "printing" and OCR'ing.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Anyway, are you honestly saying that it is better to post on /. while not high?
For you? Definitely.
Your past two retorts have been boring. You have one more try if you can think up something more witty, otherwise I'm closing this tab.
I do appreciate the sentiment, though.
Assuming that today I am high, my posts are even better when I'm not high.
Stop extending HTML! HTML does not need more tags. HTML was not designed to be a presentation language for applications and certainly not to be an environment for running applications; it was designed to be a hypertext document language (yes, "hypertext" is a word with meaning beyond HTML). The worst thing we did was to allow HTML documents with embedded programs -- applets, Javascript, etc.
The real answer is a new standard that is designed for application presentation and delivery, and that does not have so much in-band signaling. We need to get it right the first time by building security into the system, not extend an already bloated monstrosity to make up for the inevitable security problems that result from turning a language for describing documents into a platform for running distributed software with malicious users.
Palm trees and 8
What about the fact that some companies don't want their staff pulling images from sites like google, and would block the images domain, but allow the search domain?
If it were all one domain and not separated, then companies of this mindset would have to make a choice of blocking all of Google, or blocking merely the images. Many of Google's ads are text based, and they would lose money if they didn't offer an alternative that would allow companies to selectively block those.
Some regimes require families to have a content filter either on their computer or on their ISP's router that is configured to block all domains with non-premoderated user-generated content if they have children below a certain age. So, if a site contains a mixture of known-safe content and user-generated content on the same domain, it will be blocked completely. That's definitely suboptimal.
You don't have a method that can output strings, at all. You have a method that can output HTMLString, and it escapes everything it outputs. If you want to output raw HTML, you have RawHTMLString. Makes it much harder to make a mistake when you're doing Response.Write(new RawHTMLString(userField)).
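A rough sketch of that idea in Python, for illustration only. The class names follow the comment above; the `write` function is a hypothetical stand-in for Response.Write, not any real framework's API.

```python
import html

class RawHTMLString:
    """Markup the programmer has explicitly chosen to trust."""
    def __init__(self, markup):
        self.markup = markup

    def render(self):
        return self.markup

class HTMLString:
    """Untrusted text; always escaped on output."""
    def __init__(self, text):
        self.text = text

    def render(self):
        return html.escape(self.text, quote=True)

def write(fragment):
    """Hypothetical stand-in for Response.Write: refuses a plain str."""
    if not isinstance(fragment, (HTMLString, RawHTMLString)):
        raise TypeError("wrap the value in HTMLString or RawHTMLString")
    return fragment.render()

user_field = "<script>alert(1)</script>"
print(write(HTMLString(user_field)))     # escaped
print(write(RawHTMLString(user_field)))  # unescaped, but visibly so in code review
```

The point of the design is that the dangerous path has to be spelled out by name, so it stands out when someone reads the code.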
Interesting technique. But how much runtime overhead do all those constructors impose for Java, C#/VB.NET, PHP, and Python?
Where originally scripts could only be defined in the HTML header, some not-to-be-named company in Redmond
It wasn't Nintendo of America, was it? :-p
decided it was a good idea to permit defining them in the document body as well.
Anywhere you have HTML element attributes beginning with on, you have scripts in the body. It's been so long ago, I can't remember: did Netscape's original version of JavaScript have onclick or onmouseover?
Before SVG, and even now with Internet Explorer on Windows XP, SWF was the most widely compatible format for displaying vector pictures on a PC.
You cite a multi-TB IDS log. May I have it filtered to the cases that came closest to a substantial intrusion?
If you actually read the article, you'd know that there are stupid browsers out there that will happily interpret a perfectly valid ASCII text file served as text/plain as HTML, making your "sanitizing" of it by requiring it to be plain ASCII text ineffective. :(
Another objection has to do with [...] machinery that would allow referer checks to be forged/circumvented. If you are asserting this you need to show how it can still be done in 2012.
Because writing a script to forge the Referer (sic) header is trivial.
Go ahead and show us how please.
Why not check HTTP_REFERER variable and not serve up content if missing
Because a lot of proxies and web browser extensions strip Referer for privacy reasons.
And madness seems like madness to those who are insightful.
A madman is more likely to seriously claim to be insightful than an insightful man is to claim he is mad.
It will be in April 2014 when Windows XP, the operating system for which the latest version of the bundled browser is IE 8, leaves extended support.
The real answer is a new standard that is designed for application presentation and delivery
That's been tried, in the form of Flex and Silverlight. Good luck getting Apple to adopt your proposed new standard.
They could be even worse when you're not high.
However, this doesn't excuse you being high and posting dreck like you do.
Here's the give: "In the days of static HTML and simple web applications, giving the owner of the domain authoritative control over how the content is displayed wasn’t of any importance."
"giving the owner of the domain authoritative control over how content is displayed"
The article says no more about this, but instead proceeds to (correctly) detail a number of flaws with common web app protocols and procedures and how Google deals with them.
I agree with Google - web apps suck eggs. The world could really use something better. But be very careful what you wish for, because for all of its warts, web apps remain one of the only viable ways to produce widely available applications using open standards. Take that away, and we're back to the 1980s, when the only way to do anything was to serve at the caprice of proprietary vendors.
This was a real problem back in the 1980s. Every time I would connect to a BBS my computer would execute any code it came across, which made it very easy for viruses to infect my PC. But lucky for me, in the early '90s the world wide web came into being and I didn't have to run executable code just to view content that someone else posted. The PC was insulated from outside threats by viewing the web "pages" only through a "web browser" that only let you view the content, which could be innocuous text, graphics, images, sound, and even animation that was uploaded to the net by way of a non-executable markup language known as HTML. It was at this time that the whole world began to use their home computers to view content online because it was now safe for amateurs and noobs to connect their PCs to the internet without any worries of being inundated with viruses and other malware.
Today I only surf the web with browsers like Erwise, Viola, Mosaic, and Cello. People today are accessing the internet with applications that run executable code, such as Internet Explorer and Firefox. Very dangerous for amateurs and noobs.
I googled and found that it is TRUE...
The problem is you currently can't escape everything reliably.
Why? Because the mainstream browser security concept is making sure that all the thousands of "Go" buttons are not pressed aka "escaped". But people are always introducing new "Go" buttons. If your library is not aware of the latest stuff it will not escape the latest crazy "Go" button the www/html/browser bunch have come up with.
So in theory a perfectly safe site could suddenly become unsafe, just because someone made a new "Go" button for the latest browser. Your library could also parse things differently from the victim browser.
Many years ago I proposed a tag to disable any active stuff. A "Stop" button if you like in a world full of "Go" buttons. But most of the browser and W3C people weren't interested. If they had done it, a lot of those worms (MySpace etc) wouldn't have worked at all.
Only recently they have finally come up with something called Content Security Policy: https://developer.mozilla.org/en-US/docs/Security/CSP/Introducing_Content_Security_Policy
"Stop" buttons aren't 100% but it's way easier to specify a "Stop" than it is to make sure that all the hundreds of current AND future "Go" buttons are properly escaped.
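To make the "Stop" button concrete, here is a minimal sketch in Python of a server building a restrictive policy header. The function name `csp_header` and the particular policy are assumptions chosen for illustration; the header name and directive syntax are those of the standardized Content Security Policy, and a real site would need a policy tuned to its own content.

```python
def csp_header(allow_scripts=False):
    """Build a (name, value) pair for a restrictive Content-Security-Policy."""
    # default-src 'none' blocks everything not explicitly allowed below,
    # which includes inline and external scripts.
    directives = ["default-src 'none'", "img-src 'self'"]
    if allow_scripts:
        # Opt back in to scripts, but only ones served from the page's own origin.
        directives.append("script-src 'self'")
    return ("Content-Security-Policy", "; ".join(directives))

name, value = csp_header()
print(f"{name}: {value}")
```

One short "deny by default" line replaces the job of escaping every script-capable construct the page might contain.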
Car Analogy: before CSP, browsers were like cars with hundreds of accelerator pedals. To stop you had to make sure ALL the pedals were not pressed!
Anyone who thinks escaping is easy to do 100% should go look at the various security researchers' and hackers' guides on exploiting stuff. Especially if you are trying to still allow HTML content (say from advertisers or HTML email for your users). It's easy if you are only going to allow ASCII text. But once you throw in HTML and Unicode, it all starts to get complicated.
Along the same lines, someone has also described and published tools to create JavaScript using only the following characters: ()[]{}!+
The blog post was a bit terse, but I gather one of the main problems is the following:
Google lets users upload profile photos. So when anyone views that user's page, they will see that photo. But malicious users were making their photo files contain Javascript/Java/Flash/HTML code. Browsers (I think it's always IE) are very lax and will try to interpret files how they please, regardless of what the web page says. So, the webpage says it's pointing to an IMG, but some browsers will interpret it as Javascript/Java/Flash/HTML anyway once they look at the file. So now a malicious user can serve up scripts that seem to be coming from Google.com, and so they are given a lot of access at Google.com and break their security (e.g., let you look at other people's private files).
Their solution: user images are hosted at googleusercontent.com. Now, if a malicious user tries to put a script in there, it will only have the privileges of a script run from that domain--which is no privileges at all. Note this just protects Google's security...you're still running some other user's malicious script. Not google's problem.
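A simplified sketch, in Python, of the comparison a browser makes here. It assumes the usual definition of an origin as scheme plus host plus port; the helper names and the URL paths are hypothetical, and real browsers have additional rules (cookie scoping, for one) that this ignores.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Reduce a URL to its (scheme, host, port) origin triple."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    """True if both URLs share scheme, host and port."""
    return origin(a) == origin(b)

app = "https://www.google.com/settings"
upload = "https://googleusercontent.com/photo.jpg"
print(same_origin(app, "https://www.google.com/profile"))
print(same_origin(app, upload))
```

A script that runs from the second URL is treated as belonging to a different site, so it does not get the application's privileges.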
The article then discusses how trying to sanitize images can never work, since valid images can appear to have HTML/whatever in them, and their own internal team worked out how to get HTML to appear in images even after image manipulation was done.
Shorter summary: Browsers suck.
And madness seems like madness to those who are insightful.
Or perhaps these self-appointed judges are mere dullards assuming themselves to have insight.
I read TFA, and that's a great summary.
It's like waking up in a crappy mirror universe where all the work that we have done on security in the past 10 years is out the window, because unbeknownst to anyone but the browser vendors, our web browsers will go ahead and execute code embedded in non-executable mimetypes.
Would it have been so hard to limit JavaScript execution to the handful of content types where it is supposed to be found? Apparently. So now images are Turing-complete, and all your cookies can be lifted by someone who puts <script src="http://private.com/users/you/profile.jpg"></script> in a page you visit.
Images and text can be sanitized reliably. The problem is that this strips out all of the non-essential features. Users have a hard time understanding that, because users do not understand the trade-offs involved.
But the process is easy: Map all images to meta-data and compression free formats (pnm, e.g.) then recompress with a trusted compressor. For text, accept plain ASCII, RTF and HTML 2.0. Everything else, convert either to images or to cleaned PDF/Postscript by "printing" and OCR'ing.
If you'd read TFA, you'd know that it explains why this is insufficient:
For a while, we focused on content sanitization as a possible workaround - but in many cases, we found it to be insufficient. For example, Aleksandr Dobkin managed to construct a purely alphanumeric Flash applet, and in our internal work the Google security team created images that can be forced to include a particular plaintext string in their body, after being scrubbed and recoded in a deterministic way.
In your case, yes. Clip related: http://www.youtube.com/watch?feature=fvwp&NR=1&v=WGmY96qhnBI
Google around.
around didn't provide relevant results.
But with the literal-minded housekeeper costume off, forge referer and spoof referer still don't. This page is from 2006, and this page likewise explains a flaw that has since been fixed. This page claims that it's possible to forge a referer in the visitor's browser using redirection, but only from a domain that the attacker controls. This result claims that the only way is to get the user to install a plug-in: "If you want to redirect a visitor to another website and set their browser's referer to any value you desire, you'll need to develop a web browser-plugin or some other type of application that runs on their computer. Otherwise, you cannot set the referer on the visitor's browser." A bunch of results were links to such plug-ins, but the viewer is likely to decline the plug-in installation. What am I missing?
The point of the article is that they can't. Internet Explorer can be coerced into interpreting JPEG images as HTML, interpreting ASCII text as Flash, and interpreting text/plain documents as text/html, among other things. You can also play games with the encoding-recognition code by tweaking the first few bytes of the file, such that a document uploaded as ISO-8859-1 is interpreted by IE as UTF-7, or whatever other encoding suits your purposes. Note that in all of these attacks, the file is entirely valid in its original format, so there is nothing the server can do to prevent them.
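The encoding trick can be demonstrated with nothing but Python's standard codecs. The payload below is an illustrative example, not one taken from the article: it is plain ASCII with no angle brackets, so an HTML escaper has nothing to change, yet it decodes to a script tag under UTF-7.

```python
import html

# Pure ASCII bytes: no '<', '>', '&' or quote characters anywhere.
uploaded = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

as_ascii = uploaded.decode("ascii")
# HTML escaping leaves the text untouched, so a sanitizer sees nothing wrong.
assert html.escape(as_ascii) == as_ascii

# A client that guesses UTF-7 sees something quite different.
as_utf7 = uploaded.decode("utf-7")
print(as_utf7)
```

The server-side check and the browser are simply reading the same bytes under different rules.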
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
xhtml did save us. It was easy to check for well-formed xhtml with all the xml tools. But apparently proper xml is so hard to write these days, that in its shiny new specification, html5 did away with that entirely and went back to the old html model of a stream of junk tags where anything goes and the browser does its best to interpret it.