Google Talks About the Dangers of User Content

← Back to Stories (view on slashdot.org)

Google Talks About the Dangers of User Content

Posted by samzenpus on Wednesday August 29, 2012 @06:59PM from the watch-what-you-say dept.

An anonymous reader writes "Here's an interesting article on the Google security blog about the dangers faced by modern web applications when hosting any user supplied data. The surprising conclusion is that it's apparently almost impossible to host images or text files safely unless you use a completely separate domain. Is it really that bad? "

10 of 172 comments (clear)

Min score:

Reason:

Sort:

Re:I don't know if the question should be... by Sarusa · 2012-08-29 19:49 · Score: 5, Informative

It's fundamentally a problem with the browsers. Without getting too technical...
Problem 1: Browsers try real hard to be clever and interpret maltagged/malformed content so people with defective markup or bad mime content headers won't say 'My page doesn't work in Browser X, Browser X is defective!'. Or if the site is just serving up user text in html, stick some javascript tags in the text. Whichever way, you end up so someone malicious can upload some 'text' to a clipboard or document site which the browser then executes when the malicious person shares the URL.
Problem 2: There are a lot of checks in most browsers against 'cross site scripting', which is a page on site foobar.com (for instance) making data load requests to derp.com, or looking at derp.com's cookies, or even leaving a foobar.com cookie when derp.com is the main page. But if your script is running 'from' derp.com (as above) then permissions for derp.com are almost wide open, because it would just be too annoying for most users to manage permissions on the same site. Now they can grab all your docs, submit requests to email info, whatever is allowed. This is why just changing to another domain name helps.
There's more nitpicky stuff in the second half of TFA, but I think that's the gist of it.
Re:It's called reprocessing by Anonymous Coward · 2012-08-29 19:57 · Score: 5, Informative

As TFA points out, it is possible to create a Flash applet using nothing but alphanumeric characters. Good luck catching that in your reprocessing.
Yes, it really is that bad. by VortexCortex · 2012-08-29 19:59 · Score: 5, Interesting

This is what happens when you try to be lenient with markup instead of strict (note: compliant does not preclude extensible), and then proceed to use a horribly inefficient and inconsistent (by design) scripting language and a dysfunctional family of almost sane document display engines combined with a stateless protocol to produce a stateful application development platform by way of increasingly ridiculous hacks.
When I first heard of "HTML5" I thought: Thank Fuck Almighty! They're finally going to start over and do shit right, but no, they're not. HTML5 is just taking the exact same cluster of fucks to even more dizzying degrees. HOW MANY YEARS have we been waiting for v5? I've HONESTLY lost count and any capacity to give a damn when we reached a decade -- Just looked it up, 12 years. For about one third the age of the Internet we've been stuck on v4.01... ugh. I don't, even -- no, bad. Wrong Universe! Get me out!
In 20XX when HTML6 may be available I may reconsider "web development". As it stands web development is chin-deep in its own filth which it sprays with each mention, onto passers by and they receive the horrid spittle joyously not because its good or even not-putrid, but because we've actually had worse! I can crank out a cross platform pixel perfect native application for Android, iOS, Linux, OSX, XP, Vista, Win7, and mother fucking BSD in one third the time it takes to make a web app work on the various flavours of IE, Firefox, Safari, Chrom(e|ium). The time goes from 1/3rd down to 1/6th when I cut out testing for BSD, Vista, W7 (runs on XP, likely runs on Vista & Win7. Runs on X11 + OpenGL + Linux, likely builds/runs on BSD & Mac).
Long live the Internet and actual cross platform development toolchains, but fuck the web.
1. Re:Yes, it really is that bad. by sgrover · 2012-08-29 20:14 · Score: 5, Funny
  
  +1, but tell us how you really feel
2. Re:Yes, it really is that bad. by SuricouRaven · 2012-08-29 20:29 · Score: 5, Insightful
  
  Of course it's a mess. The combination of HTTP and HTML was designed for simple, static documents displaying predominatly text, a little formatting and a few images. By this point we're using extensions to extensions to extensions. It's a miracle it works at all.
3. Re:Yes, it really is that bad. by adolf · 2012-08-29 22:07 · Score: 4, Funny
  
  It's a miracle it works at all.
  It works?
  
  --
  Kid-proof tablet..
Re:I don't know if the question should be... by TubeSteak · 2012-08-29 20:00 · Score: 5, Insightful

It's fundamentally a problem with not validating inputs. Without getting too technical...
Problem 1: Browsers try real hard to be clever and interpret maltagged/malformed content instead of validating inputs.
Problem 2: There are a lot of checks in most browsers against 'cross site scripting', which is fundamentally a problem of not validating inputs.
/don't forget to validate your outputs either.

--
[Fuck Beta]
o0t!
Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-29 20:56 · Score: 5, Interesting

I'm actually not a big fan of validating inputs. I find proper escaping is a much more effective tool, and validation typically leads to both arbitrary restrictions of what your fields can hold and a false sense of security. It's why you can't put a + sign in e-mail fields, or have an apostrophe in your description field.
In short, if a data type can hold something, it should be able to read every possible value of that data type, and output every possible value of that data type. That means that if you have a Unicode string field, you should accept all valid Unicode characters, and be able to output the same. If you want to restrict it, don't use a string. Create a new data type. This makes escaping easy as well. You don't have a method that can output strings, at all. You have a method that can output HTMLString, and it escapes everything it outputs. If you want to output raw HTML, you have RawHTMLString. Makes it much harder to make a mistake when you're doing Response.Write(new RawHTMLString(userField)).
A multi-pronged approach is best, and input validation certainly has its place (ensuring that the user-supplied data conforms to the data type's domain, not trying to protect your output), but the first and primary line of defense should be making it harder to do it wrong than it is to do it right.

--
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Re:I don't know if the question should be... by ais523 · 2012-08-29 22:05 · Score: 5, Informative

After seeing a demonstration of a successful XSS attack on a plaintext file (IE7 was the offending browser, incidentally), I find it hard to see what sort of validation could possibly help. After all, the offending code was a perfectly valid ASCII plain text file that didn't even look particularly like HTML, but happened to contain a few HTML tags. (Incidentally, for this reason, Wikipedia refuses to serve user-entered content as text/plain; it uses text/css instead, because it happens to render the same on all major browsers and doesn't have bizarre security issues with IE.)

--
(1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
My explanation of article by kent.dickey · 2012-08-30 03:35 · Score: 5, Informative

The blog post was a bit terse, but I gather one of the main problems is the following:
Google lets users upload profile photos. So when anyone views that user's page, they will see that photo. But, malicious users were making their photos files contain Javascript/Java/Flash/HTML code. Browsers (I think it's always IE) are very lax and will try to interpret files how they please, regardless of what the web page says. So, webpage says it's pointing to a IMG, but some browsers will interpret it as Javascript/Java/Flash/HTML anyway once they look at the file. So now a malicious user can serve up scripts that seem to be coming from Google.com, and so they are given a lot of access at Google.com and break their security (e.g., let you look at other people's private files).
Their solution: user images are hosted at googleusercontent.com. Now, if a malicious user tries to put a script in there, it will only have the privileges of a script run from that domain--which is no privileges at all. Note this just protects Google's security...you're still running some other user's malicious script. Not google's problem.
The article then discusses how trying to sanitize images can never work, since valid images can appear to have HTML/whatever in them, and their own internal team worked out how to get HTML to appear in images even after image manipulation was done.
Shorter summary: Browsers suck.