Google Talks About the Dangers of User Content
An anonymous reader writes "Here's an interesting article on the Google security blog about the dangers faced by modern web applications when hosting any user-supplied data. The surprising conclusion is that it's apparently almost impossible to host images or text files safely unless you use a completely separate domain. Is it really that bad?"
...is it a server problem, with the way it interprets record data, or a browser problem (any browser), maybe treating data as instructions rather than markup? I'm guessing server in this case, since if the stream is intercepted and there's a referrer URL that directly references an image or other blob on the same server or on another server on a subdomain, that could be used to pwn the account/whatever... I'm not up on that sort of hack (you can probably tell). I don't quite get how hosting blobs on an entirely different domain would mitigate that hack, since you would need some sort of URI that the other domain would recognise to serve up the correct file, which would be in the URL request! Someone want to try and make sense of what I'm trying to say here?
Operation Guillotine is in effect.
As TFA points out, it is possible to create a Flash applet using nothing but alphanumeric characters. Good luck catching that in your reprocessing.
This is what happens when you try to be lenient with markup instead of strict (note: compliant does not preclude extensible), and then proceed to use a horribly inefficient and inconsistent (by design) scripting language and a dysfunctional family of almost sane document display engines combined with a stateless protocol to produce a stateful application development platform by way of increasingly ridiculous hacks.
When I first heard of "HTML5" I thought: Thank Fuck Almighty! They're finally going to start over and do shit right, but no, they're not. HTML5 is just taking the exact same cluster of fucks to even more dizzying degrees. HOW MANY YEARS have we been waiting for v5? I've HONESTLY lost count and any capacity to give a damn when we reached a decade -- Just looked it up, 12 years. For about one third the age of the Internet we've been stuck on v4.01... ugh. I don't, even -- no, bad. Wrong Universe! Get me out!
In 20XX when HTML6 may be available I may reconsider "web development". As it stands web development is chin-deep in its own filth which it sprays with each mention, onto passers-by, and they receive the horrid spittle joyously not because it's good or even not-putrid, but because we've actually had worse! I can crank out a cross platform pixel perfect native application for Android, iOS, Linux, OSX, XP, Vista, Win7, and mother fucking BSD in one third the time it takes to make a web app work on the various flavours of IE, Firefox, Safari, Chrom(e|ium). The time goes from 1/3rd down to 1/6th when I cut out testing for BSD, Vista, W7 (runs on XP, likely runs on Vista & Win7. Runs on X11 + OpenGL + Linux, likely builds/runs on BSD & Mac).
Long live the Internet and actual cross platform development toolchains, but fuck the web.
Umm, what does your comment have to do with the subject in TFA? They used to host content on google.com, then they moved it to googleusercontent.com for security reasons. If anything they have made it clear that the user owns it, but not for that reason.
The easiest way to secure embedded content would be a sandbox tag that lets the page limit what kind of content can be inside of it.
Images and text can be sanitized reliably. The problem is that this strips out all of the non-essential features. Users have a hard time understanding that, because users do not understand the trade-offs involved.
But the process is easy: Map all images to metadata-free, uncompressed formats (e.g., PNM), then recompress with a trusted compressor. For text, accept plain ASCII, RTF and HTML 2.0. Everything else, convert either to images or to cleaned PDF/PostScript by "printing" and OCR'ing.
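A minimal sketch of the "metadata-free format" step, assuming some trusted decoder (not shown) has already turned the upload into raw 8-bit RGB pixels. The function name `to_bare_ppm` is made up for illustration, and the recompression step is left out. It only shows that a binary PPM written this way has nowhere to carry comments or other metadata:

```python
def to_bare_ppm(width: int, height: int, rgb: bytes) -> bytes:
    """Serialize raw 8-bit RGB pixels as a binary PPM (P6) with no comment lines.

    Hypothetical helper: the pixels are assumed to come from a trusted decoder.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    if len(rgb) != width * height * 3:
        raise ValueError("pixel buffer does not match dimensions")
    # The header holds only the magic number, the size, and the max sample value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + rgb


# Two pixels: one red, one green.
pixels = bytes([255, 0, 0, 0, 255, 0])
ppm = to_bare_ppm(2, 1, pixels)
print(ppm[:11])
```

Note that the pixel bytes themselves pass through unchanged, so this step alone says nothing about what those bytes happen to spell out.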
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
This was a real problem back in the 1980s. Every time I would connect to a BBS my computer would execute any code it came across, which made it very easy for viruses to infect my PC. But lucky for me, in the early '90s the world wide web came into being and I didn't have to run executable code just to view content that someone else posted. The PC was insulated from outside threats by viewing the web "pages" only through a "web browser" that only let you view the content, which could be innocuous text, graphics, images, sound, and even animation that was uploaded to the net by way of a non-executable markup language known as HTML. It was at this time that the whole world began to use their home computers to view content online because it was now safe for amateurs and noobs to connect their PCs to the internet without any worries of being inundated with viruses and other malware.
Today I only surf the web with browsers like Erwise, Viola, Mosaic, and Cello. People today are accessing the internet with applications that run executable code, such as Internet Explorer and Firefox. Very dangerous for amateurs and noobs.
The blog post was a bit terse, but I gather one of the main problems is the following:
Google lets users upload profile photos. So when anyone views that user's page, they will see that photo. But, malicious users were making their photo files contain JavaScript/Java/Flash/HTML code. Browsers (I think it's always IE) are very lax and will try to interpret files how they please, regardless of what the web page says. So, the webpage says it's pointing to an IMG, but some browsers will interpret it as JavaScript/Java/Flash/HTML anyway once they look at the file. So now a malicious user can serve up scripts that seem to be coming from Google.com, and so they are given a lot of access at Google.com and break their security (e.g., let you look at other people's private files).
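To make the sniffing problem concrete, here is a rough sketch of a lax client. It is a made-up, simplified sniffer (`naive_sniff`, the tag list, and the 256-byte window are all assumptions for illustration), not any real browser's algorithm. It overrides the declared type whenever the start of the file looks like markup:

```python
import re

# Hypothetical heuristic: treat anything that opens one of these tags as HTML.
HTML_HINT = re.compile(rb"<\s*(html|script|body|iframe)\b", re.IGNORECASE)


def naive_sniff(body: bytes, declared_type: str, window: int = 256) -> str:
    """Return the type a lax client might act on.

    The declared type is ignored if the first `window` bytes look like HTML.
    """
    if HTML_HINT.search(body[:window]):
        return "text/html"
    return declared_type


# Starts with a GIF signature, but carries markup right after it.
payload = b"GIF89a" + b"<script>alert(document.domain)</script>"
# Starts with a GIF signature and has no markup in it.
plain = b"GIF89a\x01\x00\x01\x00"

print(naive_sniff(payload, "image/gif"))  # text/html
print(naive_sniff(plain, "image/gif"))    # image/gif
```

The server said "image/gif" both times. Only the client's guess changed, and that guess decides whether the script runs.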
Their solution: user images are hosted at googleusercontent.com. Now, if a malicious user tries to put a script in there, it will only have the privileges of a script run from that domain--which is no privileges at all. Note this just protects Google's security...you're still running some other user's malicious script. Not Google's problem.
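The reason the separate domain helps is the same-origin rule: a script gets the privileges of the origin it was loaded from, and an origin is the (scheme, host, port) triple. A small sketch of that comparison follows. The helper names and the URL paths and subdomains are invented for illustration; only the two domain names come from the discussion above:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """True if both URLs share scheme, host, and port."""
    return origin(a) == origin(b)


app_page = "https://www.google.com/profile"                  # hypothetical path
old_photo = "https://www.google.com/photos/u/1.gif"          # hypothetical path
new_photo = "https://example.googleusercontent.com/1.gif"    # hypothetical host

print(same_origin(app_page, old_photo))  # True
print(same_origin(app_page, new_photo))  # False
```

With the photo on the same origin as the app, a sniffed script inherits the app's access. On the other domain it gets a different origin and nothing of Google's to reach.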
The article then discusses how trying to sanitize images can never work, since valid images can appear to have HTML/whatever in them, and their own internal team worked out how to get HTML to appear in images even after image manipulation was done.
Shorter summary: Browsers suck.