Google Talks About the Dangers of User Content

Posted by samzenpus on Wednesday August 29, 2012 @06:59PM from the watch-what-you-say dept.

An anonymous reader writes "Here's an interesting article on the Google security blog about the dangers faced by modern web applications when hosting any user-supplied data. The surprising conclusion is that it's apparently almost impossible to host images or text files safely unless you use a completely separate domain. Is it really that bad?"

172 comments

I don't know if the question should be... by Tastecicles · 2012-08-29 19:17 · Score: 2

...is it a server problem, with the way it interprets record data, or the browser (any browser) (maybe as instructions rather than markup)? I'm guessing server in this case, since if the stream is intercepted and there's a referrer URL that directly references an image or other blob on the same or another server on a subdomain, that could be used to pwn the account/whatever... I'm not up on that sort of hack (you can probably tell). I don't quite get how hosting blobs on an entirely different domain would mitigate against that hack, since you would require some sort of URI that the other domain would recognise to be able to serve up the correct file - which would be in the URL request! Someone want to try and make sense of what I'm trying to say here?

--
Operation Guillotine is in effect.
1. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-29 19:27 · Score: 1
 
 It is a security thing. Scripts from one domain may not modify pages on another.
 So if you mix content from foo.google.com and bar.google.com on the same page then a js from foo can't do anything to the content from bar.
2. Re:I don't know if the question should be... by Sarusa · 2012-08-29 19:49 · Score: 5, Informative
 
 It's fundamentally a problem with the browsers. Without getting too technical...
 Problem 1: Browsers try real hard to be clever and interpret maltagged/malformed content so people with defective markup or bad MIME content headers won't say 'My page doesn't work in Browser X, Browser X is defective!'. Or, if the site is just serving up user text in HTML, the attacker can stick some JavaScript tags in the text. Either way, someone malicious can upload some 'text' to a clipboard or document site, which the browser then executes when the malicious person shares the URL.
 Problem 2: There are a lot of checks in most browsers against 'cross site scripting', which is a page on site foobar.com (for instance) making data load requests to derp.com, or looking at derp.com's cookies, or even leaving a foobar.com cookie when derp.com is the main page. But if your script is running 'from' derp.com (as above) then permissions for derp.com are almost wide open, because it would just be too annoying for most users to manage permissions on the same site. Now they can grab all your docs, submit requests to email info, whatever is allowed. This is why just changing to another domain name helps.
 There's more nitpicky stuff in the second half of TFA, but I think that's the gist of it.
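 As a rough illustration of why a separate domain helps, here is a minimal Python sketch of the origin comparison (scheme, host, port) that browser same-origin checks are built on. The `origin` and `same_origin` helpers and the example hostnames are hypothetical, and real browsers apply more rules than this (cookies, for instance, have their own scoping).

```python
from urllib.parse import urlsplit

# Default ports, so that https://derp.com and https://derp.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(a, b):
    """True only when both URLs share scheme, host and port."""
    return origin(a) == origin(b)
```

 Under this comparison, an uploaded file served from `https://derp.com/upload/evil.txt` shares an origin with `https://derp.com/docs/1`, while the same file served from a separate user-content domain does not.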
3. Re:I don't know if the question should be... by TubeSteak · 2012-08-29 20:00 · Score: 5, Insightful
 
 It's fundamentally a problem with not validating inputs. Without getting too technical...
 Problem 1: Browsers try real hard to be clever and interpret maltagged/malformed content instead of validating inputs.
 Problem 2: There are a lot of checks in most browsers against 'cross site scripting', which is fundamentally a problem of not validating inputs.
 /don't forget to validate your outputs either.
 
 --
 [Fuck Beta]
 o0t!
4. Re:I don't know if the question should be... by Tastecicles · 2012-08-29 20:03 · Score: 0
 
 answer to problem 1: should browsers, whose primary purpose is to interpret markup language, be specified to interpret markup language and display server-provided content according to that markup, and NOTHING MORE? As in, malformed/maltagged content should be IGNORED (ie dropped, not processed further)?
 Oh, got it. Microsoft helped specify the capabilities of mainstream browsers, didn't they - though not until after they looked at what Nutscrape were trying to do, arseraped them and implemented most of the Nutscrape shit into IE... think back to the .wmf mess, back in the days when .wmf was the preferred file format for print clipart(! Yeah, I know - I still have dozens of CDs full of wmf cliparts)...
 
 --
 Operation Guillotine is in effect.
5. Re:I don't know if the question should be... by Sarusa · 2012-08-29 20:53 · Score: 3, Insightful
 
 This is true! You could even say it's a sooper-dooper-fundamental problem of HTTP/HTML not sufficiently separating the control channel from the data channel and/or not sufficiently encapsulating things (active code anywhere? noooo.)
 But since browsers have actively chosen to validate invalid inputs and nobody's going to bother securing HTTP/HTML against this kind of thing any time soon, or fix the problems with cookies, or, etc etc etc, I figured that was a good enough high level summary of where we're at realistically. Nobody's willing to fix the foundations or 'break' when looking at malformed pages.
6. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-29 20:56 · Score: 5, Interesting
 
 I'm actually not a big fan of validating inputs. I find proper escaping is a much more effective tool, and validation typically leads to both arbitrary restrictions of what your fields can hold and a false sense of security. It's why you can't put a + sign in e-mail fields, or have an apostrophe in your description field.
 In short, if a data type can hold something, it should be able to read every possible value of that data type, and output every possible value of that data type. That means that if you have a Unicode string field, you should accept all valid Unicode characters, and be able to output the same. If you want to restrict it, don't use a string. Create a new data type. This makes escaping easy as well. You don't have a method that can output strings, at all. You have a method that can output HTMLString, and it escapes everything it outputs. If you want to output raw HTML, you have RawHTMLString. Makes it much harder to make a mistake when you're doing Response.Write(new RawHTMLString(userField)).
 A multi-pronged approach is best, and input validation certainly has its place (ensuring that the user-supplied data conforms to the data type's domain, not trying to protect your output), but the first and primary line of defense should be making it harder to do it wrong than it is to do it right.
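 The HTMLString/RawHTMLString idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `HtmlText` and `RawHtml` and the `write` function are made up here and stand in for the types and the Response.Write call mentioned above.

```python
import html

class HtmlText:
    """Untrusted text; always escaped when rendered."""
    def __init__(self, value):
        self.value = value

    def render(self):
        return html.escape(self.value, quote=True)

class RawHtml:
    """Markup the programmer explicitly vouches for; emitted verbatim."""
    def __init__(self, value):
        self.value = value

    def render(self):
        return self.value

def write(*parts):
    """Build a response body; a bare str is rejected so nothing is emitted by accident."""
    out = []
    for part in parts:
        if not isinstance(part, (HtmlText, RawHtml)):
            raise TypeError("wrap output in HtmlText or RawHtml, got %s" % type(part).__name__)
        out.append(part.render())
    return "".join(out)
```

 With this shape, emitting a user field unescaped requires typing `RawHtml(user_field)`, which is easy to spot in review, while the default path escapes.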
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
7. Re:I don't know if the question should be... by dzfoo · 2012-08-29 21:30 · Score: 3, Interesting
 
 I'm actually not a big fan of validating inputs. I find proper escaping is a much more effective tool, and validation typically leads to both arbitrary restrictions of what your fields can hold and a false sense of security.
 
 OK, fair point. How about if we expand the concept of "validating input" to include canonicalization and sanitization as well? Oh, it already does. Go figure.
 Reducing it to a mere reg-exp is missing the point. Proper canonicalization (and proper understanding of the underlying standards and protocols, but that's another argument) would allow you to use a plus-sign in an e-mail address field.
 But this won't happen as long as every kid fresh out of college wants to roll their own because they know The One True Way to fix it, this time For Real. As long as they keep ignoring everything learned before because, you know, it's old stuff and this is the new technology of The Web, where everything old does not count at all, nothing will change.
 
 A multi-pronged approach is best, and input validation certainly has its place (ensuring that the user-supplied data conforms to the data type's domain, not trying to protect your output), but the first and primary line of defense should be making it harder to do it wrong than it is to do it right.
 "MOAR TECH!!!1" and over-wrought protocols are no silver-bullet against ignorance, naivety, and hubris.
 -dZ.
 
 --
 Carol vs. Ghost
 ...Can you save Christmas?
8. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-29 21:56 · Score: 1
 
 Problem 3: Where originally scripts could only be defined in the HTML header, some not-to-be-named company in Redmond decided it was a good idea to permit defining them in the document body as well.
9. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-29 22:01 · Score: 2
 
 Your solution appears to be, "Do exactly what we've been doing, just more." My rebuttal to that is the entire history of computer security. While it's true that proper understanding of underlying standards and protocols would go a long way toward mitigating the problems, a more complete solution is to make such detail-oriented understanding unnecessary. Compartmentalization of knowledge is, in my opinion anyway, the primary benefit of computers, and the rejection of providing that benefit to other programmers or utilizing it yourself while writing software smacks of programmers who don't want others invading their turf.
 I'll grant you, new does not necessarily mean better. Some new approaches work better, some work worse, but we already know exactly what the old approach accomplishes.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
10. Re:I don't know if the question should be... by ais523 · 2012-08-29 22:05 · Score: 5, Informative
 
 After seeing a demonstration of a successful XSS attack on a plaintext file (IE7 was the offending browser, incidentally), I find it hard to see what sort of validation could possibly help. After all, the offending code was a perfectly valid ASCII plain text file that didn't even look particularly like HTML, but happened to contain a few HTML tags. (Incidentally, for this reason, Wikipedia refuses to serve user-entered content as text/plain; it uses text/css instead, because it happens to render the same on all major browsers and doesn't have bizarre security issues with IE.)
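 For what it's worth, here is a small Python sketch of response headers a server might send with user-uploaded text to discourage content sniffing. It is an assumption, not something described above: `X-Content-Type-Options: nosniff` and `Content-Disposition: attachment` are real headers, but they only help in browsers that honor them, and the `headers_for_user_upload` helper is hypothetical.

```python
def headers_for_user_upload(filename):
    """Headers for serving an untrusted text upload as a download, not a page."""
    # Keep only a conservative ASCII subset so the filename cannot smuggle
    # quotes or CR/LF into the header value.
    safe_name = "".join(
        c for c in filename if c.isascii() and (c.isalnum() or c in "._-")
    ) or "download"
    return [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Asks the browser not to second-guess the declared type.
        ("X-Content-Type-Options", "nosniff"),
        # Asks the browser to save the file instead of rendering it inline.
        ("Content-Disposition", 'attachment; filename="%s"' % safe_name),
    ]
```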
 
 --
 (1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
11. Re:I don't know if the question should be... by dzfoo · 2012-08-29 23:14 · Score: 2
 
 You misunderstood my point, and then went on to suggest that the "old way" won't work; inadvertently falling into the trap I was pointing out.
 My "solution" (which really, it wasn't a solution per se) is not "more of the same." It is the realization that previous knowledge or practices may not be obsolete, and that we shouldn't try to find new ways to do things for the mere sake of being new.
 A lot, though not all, of the security problems encountered in modern applications have been known and addressed in the past, to various degrees of success. We should embrace this experience and apply it, not shun it as antiquated.
 Whether you want to admit it or not, lack of input validation and of understanding of data encoding at the various transport layers is the source of most security issues. We should acknowledge this and address it directly.
 You are right, a lot can be done to build solutions into our tools to ease their implementation. However, technology itself won't solve the problem of developers not understanding the risks or why they happen.
 What does not help at all is to hand-wave or diminish this particular problem and blame the tools for not doing our due diligence. Or worse, ignore experience and history and mark it as a new problem, only solvable by more technology.
 dZ.
 
 --
 Carol vs. Ghost
 ...Can you save Christmas?
12. Re:I don't know if the question should be... by Zero__Kelvin · 2012-08-29 23:16 · Score: 1
 
 "Your solution appears to be, "Do exactly what we've been doing, just more."
 No. His solution is that people need to start doing it. Your solution is to ignore solid secure programming practices. In other words, your solution is to keep failing to practice secure programming.
 
 "Some new approaches work better, some work worse, but we already know exactly what the old approach accomplishes."
 Right. And we have also seen what doesn't work. Another way to say it is: "What we've got here is failure to communicate. Some men you just can't reach. So you get what we have here now, which is the way Microsoft wants it... well, Bill Gates gets it. I don't like it any more than you men."
 
 --
 Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
13. Re:I don't know if the question should be... by ultrasawblade · 2012-08-29 23:43 · Score: 1
 
 There is, though. "Control" stuff is supposed to go in the HTTP header and "data" stuff is supposed to go in the HTTP body.
14. Re:I don't know if the question should be... by postbigbang · 2012-08-30 00:01 · Score: 1
 
 It's easier for a lot of coders to just bypass the step of input parsing and validation. That ease, which IMHO amounts to sloppy coding, is a major crux of things like injection problems, downstream coding errors (and far beyond things like simple type mismatch), and eventual corruption.
 For every programmer shrugging it off, there's another wondering if someone did the work, and probing everything from packets to simple scriptycrap to break it open for giggles, grins, and profit. They write long tomes of garbage to assault in various automated ways, big rocks to crash through the straw built from sleazy code. To those that believe in quality, carry on.
 
 --
 ---- Teach Peace. It's Cheaper Than War.
15. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-30 01:01 · Score: 0
 
 You're assuming input comes from a browser using a page you made yourself. That's not how things work. That input can come from code with deliberately malformed data to exploit your system. Which is what's been happening for well over a decade. If you aren't validating input in your server code, what is?
16. Re:I don't know if the question should be... by dzfoo · 2012-08-30 01:06 · Score: 1
 
 Agreed.
 
 --
 Carol vs. Ghost
 ...Can you save Christmas?
17. Re:I don't know if the question should be... by Khyber · 2012-08-30 01:10 · Score: 1
 
 And all of that is on the same data channel. Again, lack of proper separation.
 
 --
 Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
18. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-30 01:18 · Score: 2, Informative
 
 It doesn't "refuse" to serve text/plain, it just makes you ask for it specifically. (Use ?action=raw via index.php and/or format=txt via api.php)
19. Re:I don't know if the question should be... by cdrguru · 2012-08-30 02:51 · Score: 1
 
 I'm not sure you have a firm grasp of the problem.
 The problem, from my reading, can be explained as analogous to having a numeric data item that sometimes gets letters put in it. Rather than rejecting this as invalid browsers are making stuff up as they go along so A=1, B=2, and so on and so forth. This has the obvious benefit to users of not exposing them to the improper construction of web pages, but it does create sort of a sub-standard whereby other authors recognize this work-around and decide to make use of it in a widespread manner. Suddenly, we have a data item that is supposed to accept only numeric values but now it also accepts other things as well and interprets them.
 Maladjusted Teenager then discovers that not only do we have A=1 but on some browser !@#!=exception and makes use of this.
 The problem is the original non-validation and non-rejection of illegal input. Sure it makes this more "user friendly" but it opens huge gaping holes in any sort of standard. It then also encourages folks to intentionally code ABC when they want 123 because the browser accepts it. With widespread enough usage this interpretation is forced on all browsers because otherwise they are left flagging huge swaths of the web as "invalid". Keep doing this sort of stuff and you have the mess that we have today.
 I assure you the solution isn't to not validate and accept anything unless you are prepared to throw out the idea of any sort of restricted content data item and everything becomes a string containing any possible character. And even that doesn't really work because of context - there are contexts where a numeric value is needed and having a non-numeric string is really incorrect. Where we have gone is pushing browsers to "interpret" this as something legal even when they should not. You can see where that has gotten us.
 I'd say the correct behavior in all cases is to not interpret and not accept improper input but to throw it back. Perhaps go to the drastic step of saying because one part of this document is malformed, the document cannot be properly formatted. This would make it a lot more obvious to web designers, developers and authors they have done something wrong the first time they look at what they have done. Instead, we have interpretation trying to cover up mistakes and the result is they are hidden from both the author and the end user.
20. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-30 03:07 · Score: 0
 
 I'm not sure you have a firm grasp of the problem.
 It seems neither do a whole host of people commenting on this issue.
 
 The problem, from my reading, can be explained as analogous to having a numeric data item that sometimes gets letters put in it. Rather than rejecting this as invalid browsers are making stuff up as they go along so A=1, B=2, and so on and so forth.
 No, the problem is users uploading content containing JavaScript and then referencing that JavaScript from an external site. When a user who happens to also be logged into the popular site hosting my content views my page, my JavaScript executes in the context of the other site's domain, with access to the user's credentials.
 
 I'd say the correct behavior in all cases is to not interpret and not accept improper input but to throw it back. Perhaps go to the drastic step of saying because one part of this document is malformed, the document cannot be properly formatted. This would make it a lot more obvious to web designers, developers and authors they have done something wrong the first time they look at what they have done. Instead, we have interpretation trying to cover up mistakes and the result is they are hidden from both the author and the end user.
 Irrelevant to the topic at hand.
21. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-30 03:50 · Score: 0
 
 Yeah, if everybody just starts doing what we haven't been able to get them to start doing for the last 40 years, things will be great! Come on guys, roll your sleeves up, let's get to it!
 Ugh.
 That link is suggesting that a regex is the proper way to validate an e-mail address.
 NO NO NO NO NO NO NO NO NO NO NO NO NO
 Parse it. On the server that will be committing the data, using the very same code. It is possible to do it right. Layered defense isn't a bad thing, but a crappy half measure is often worse than nothing at all.
 It suggests that crappy heuristics is the proper way to ensure a user doesn't get access to a file they shouldn't have access to.
 NO NO NO NO NO NO NO NO NO NO NO NO NO
 You first decide exactly which file you will be accessing, and then check if it's within your sandbox. Optimally, you have a single class or set of functions that gates all access to files, and/or you don't put user input in the filesystem metadata at all to begin with.
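 A minimal Python sketch of that "resolve first, then check the sandbox" order, assuming a single gatekeeper function. The name `resolve_in_sandbox` is hypothetical, and the sketch targets POSIX-style paths (on Windows, `os.path.commonpath` raises ValueError for paths on different drives, which would need handling).

```python
import os

def resolve_in_sandbox(sandbox_root, user_path):
    """Resolve user_path under sandbox_root, refusing anything that escapes it."""
    # Resolve symlinks and ".." first, so the check sees the real target.
    root = os.path.realpath(sandbox_root)
    target = os.path.realpath(os.path.join(root, user_path))
    # The target is inside the sandbox only if the root is a whole-component prefix.
    if os.path.commonpath([root, target]) != root:
        raise PermissionError("path escapes sandbox: %r" % user_path)
    return target
```

 Every file access would then go through this one function, rather than each call site pattern-matching on "../" or similar.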
 Hilariously, it even suggests restricting user input to not allow it to contain SQL control characters. Man. The number of security bugs I've found and fixed when that fails... I've found, through experience, that explicitly allowing them actually results in more secure code, as it forces programmers to think for two seconds before committing and moving on building a new gaping hole, and ensures that they'll be in the test cases. People seriously don't even think about the possibility if it's not held right under their nose.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
22. Re:I don't know if the question should be... by Cajun+Hell · 2012-08-30 03:53 · Score: 2
 
 You're assuming input comes from a browser using a page you made yourself. .. . If you aren't validating input in your server code, what is?
 No he's not. If you do things right, then hostile input, honestly mistaken input, and perfectly valid input all get handled the same way. Instead of getting "validated," they get escaped for whatever context they're used within, as they get written to that context.
 If you're building a string for use in a SQL statement, then the string gets escaped for SQL, regardless of whether you trust it or not. You just always do it (unless some other part of the system is guaranteed to be doing it for you, later than your own handling of the data). So it's ok if the data has a single-quote character, because you're always going to be sending that to the database as '' or \'. If you're outputting it to be part of a text node in HTML, then the string gets escaped for HTML text -- always, regardless of whether you trust it or not. So it's ok if it has a < character, because you're always going to send that to web browsers as &lt;.
 Validation would impose needless restrictions (you can't have a quotation mark or a less-than sign) that are going to turn out to be useless anyway. You won't ever think of all the characters that might break something else that the data some day gets used for. I currently maintain a system where there's a rule that some data can't contain "weird characters" (it actually tells that to people as they enter it) and it's a decade too late to fix that, so it merely validates the strings and there's a shitload of code that trusts that validation to have happened, and because of that, there's an upper bound to how diversely this data can ever be used. All because someone back in the mists of time thought that input validation was the answer, rather than output escaping.
 OTOH, escaping at the last moment always fixes the problem, every time and in every context, whether we're talking about SQL, HTML, or something that hasn't been invented yet. Every format will always have some mechanism for escaping strings. Use it, as you're outputting to that format, not prematurely as you're storing the value somewhere. Do this, and you'll have no security problems related to data values, and there's nowhere your data can't go.
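 As a sketch of "escape for each context, right as you write to it", here are three small Python helpers built on the standard library. The helper names are made up for illustration; the point is that the stored value stays untouched and each output context gets its own escaping.

```python
import html
from urllib.parse import quote

def as_html_text(value):
    """Escape for an HTML text node: &, < and > become entities."""
    return html.escape(value, quote=False)

def as_html_attr(value):
    """Escape for a double-quoted HTML attribute, including the quotes themselves."""
    return '"' + html.escape(value, quote=True) + '"'

def as_url_param(value):
    """Percent-encode for use as a single URL query parameter value."""
    return quote(value, safe="")
```

 The same raw string can pass through all three without ever being "cleaned" at input time.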
 BTW, I'm not totally anti-validation. Sometimes the actual value of a string matters, although usually when it does, it means some other part of the system is mis-designed. (But we all sometimes have to maintain mis-designed systems.) An invalid input should usually be expressed as a failed lookup (e.g. I'm trying to store the foreign key for a car manufacturer named "Ferd", rather than validating that "Ferd" is the name of a manufacturer before I store that string) or a failed conversion (e.g. I wasn't able to translate 2012-08-32 into a Julian date) or something like that. If it's really raw text with no systemic meaning ("I L1ke ur b00bies in yer v1d30 and want to date u") then there's no reason it needs any sort of validation at all, regardless of whether a stupid human or a malicious robot wrote it. There is no conceivable Unicode character that you shouldn't allow in a string like that, no matter how it's going to be used, as long as you're escaping it for each context right as you use it.
 Most of the time, though, validation should be semantic. It's not that you entered an invalid name for something, it's that you entered that your movie will be in theaters in 3012 or that your thing which turned out to be a book had a blank author (whereas it would have been ok for a teacup to lack an author), or something like that.
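 A tiny Python sketch of validation as a failed conversion plus a semantic check, using the two examples above (the date 2012-08-32 and a release year of 3012). The function name and the `latest_year` cutoff are assumptions made for illustration.

```python
from datetime import datetime

def parse_release_date(text, latest_year=2100):
    """Return a date, or None if the text does not convert or makes no sense."""
    try:
        # Failed conversion: there is no 32nd of August, so strptime raises.
        parsed = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None
    # Semantic check: a well-formed date can still be implausible.
    if parsed.year > latest_year:
        return None
    return parsed
```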
 
 --
 "Believe me!" -- Donald Trump
23. Re:I don't know if the question should be... by ais523 · 2012-08-30 04:09 · Score: 2
 
 http://en.wikipedia.org/w/index.php?title=Main_Page&action=raw&ctype=text/plain
 "You have chosen to open index.php which is a: text/x-wiki from: http://en.wikipedia.org/"
 http://en.wikipedia.org/w/api.php?format=txt
 "You have chosen to open api.php which is a: text/text from: http://en.wikipedia.org/"
 It refuses to serve text/plain, even if you ask for it specifically. (Compare http://en.wikipedia.org/w/index.php?title=Main_Page&action=raw&ctype=text/css, which it'll serve quite happily.)
 
 --
 (1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
24. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-30 04:19 · Score: 1
 
 If your output is properly escaped and you're correctly using parameterized queries and not doing stupid dynamic SQL tricks that are generally necessitated by having a terrible DB layout, it doesn't matter. Go ahead. Put a billion apostrophes, and Unicode apostrophes (that MS SQL [and maybe others] will horrifically collapse down to regular if your connection is ASCII), semicolons, whatever you want. It'll sit there in the field looking pretty.
 Uh, just don't make it 2.2 billion apostrophes. Bad things.
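 A quick Python sketch with the standard `sqlite3` module shows the idea: with a parameterized query, the value travels separately from the SQL text, so apostrophes and semicolons are stored as plain data. The table name and the `store_and_fetch` helper are hypothetical, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

def store_and_fetch(text):
    """Insert text via a placeholder and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        # The ? placeholder keeps the value out of the SQL string entirely.
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        row = conn.execute("SELECT body FROM notes WHERE body = ?", (text,)).fetchone()
        count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
        return row[0], count
    finally:
        conn.close()
```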
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
25. Re:I don't know if the question should be... by dgatwood · 2012-08-30 04:32 · Score: 1
 
 Validation is required, too, in many cases. XSS is a good example. Quoting prevents somebody who only has the ability to write plain text from being able to insert arbitrary HTML. Quoting does nothing if the users are able to actually provide HTML (e.g. any website built around contentEditable).
 In the latter case, if your server does nothing but quote user-generated HTML properly, you're wide open to XSS attacks, because they do not require malformed content. Merely the fact that your server is serving the data means that any scripts on the page are trusted, including onclick handlers, onmouseover handlers, etc. Thus, you must completely parse the HTML content, rip out any such handlers, any "javascript:" links, etc., and emit the HTML in a sanitized form. This combines quoting (for any mangled structure that your parser turned into text, but that some browsers might otherwise have interpreted as code) with validation (for whitelisting allowable attributes and attribute values).
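 Here is a rough Python sketch of that parse, whitelist and re-emit approach, using the standard `html.parser` module. The allowed tags, attributes and URL prefixes are assumptions chosen for illustration, and a production site would more likely rely on a maintained sanitizer library than on a hand-rolled one like this.

```python
import html
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_URL_PREFIXES = ("http://", "https://", "/")
DROP_CONTENT = {"script", "style"}  # drop these tags together with their contents

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onclick, onmouseover, style, ...
            url = (value or "").strip().lower()
            if name == "href" and not url.startswith(ALLOWED_URL_PREFIXES):
                continue  # drops javascript:, data:, ...
            kept.append(' %s="%s"' % (name, html.escape(value or "", quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Anything the parser decided was text is re-escaped on the way out.
            self.out.append(html.escape(data, quote=False))

def sanitize(fragment):
    """Re-emit an HTML fragment keeping only whitelisted tags and attributes."""
    parser = Sanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

 Note that nothing from the input is copied through verbatim: every tag, attribute and text run is rebuilt from the parser's view of it.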
 
 --
 Check out my sci-fi/humor trilogy at PatriotsBooks.
26. Re:I don't know if the question should be... by fatphil · 2012-08-30 04:52 · Score: 1
 
 > Problem 1: Browsers try real hard to be clever and interpret maltagged/malformed content instead of validating inputs.
 
 But XHTML saved us from that over a decade ago!
 
 Channelling Erik Naggum: Clearly we're not using enough XML!
 
 --
 Also FatPhil on SoylentNews, id 863
27. Re:I don't know if the question should be... by steelfood · 2012-08-30 04:53 · Score: 1
 
 Throw up a warning screen whenever there's malformed input. Kinda like the warning screen with self-signed certs, without the stupid part of having to add the site to a permanent exception list.
 And if people want the convenience of whitelisting or just turning the message off entirely, put those in the options, just like the way browsers handle cookies.
 This warning page will show up a lot at first. But it would also ultimately shame people into fixing their outputs.
 
 --
 "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
28. Re:I don't know if the question should be... by fatphil · 2012-08-30 05:14 · Score: 1
 
 I'm with you, but I think we're in the minority. However, I cling to my view because I have a strange obsession with a kind of purity that's probably because of my pure mathematics background.
 
 If an envelope weighs less than 60g, then the postal service should deliver it. It should presume that it contains bombs, nerve poison, and corrosives, etc. but it should deliver it intact, and then it's the recipient's problem. It should let the recipient know that it's not to be trusted, of course.
 
 If a text entry field for a forum can have 60 characters, then it should be able to contain any 60 characters, and presume it contains stuff that looks like HTML, SQL, javascript, etc., and the server should happily accept that message, and then include it in the forum page delivered to the browser. Of course, it should let the browser know, by escaping it, that it's untrusted data that should not be assumed to have any particular meaning (i.e. be in any language that can be interpreted).
 
 OK, it's an imperfect analogy, but I think the server (containing user-submitted content) should just be a pipe along which arbitrary data can flow. As long as the recipient knows it's arbitrary data, then it's his own fault if he attempts to parse it and give it meaning. (I'm thinking of double-escaped '<' characters, for example, which some browsers have decoded twice accidentally, and started to parse the following tags.)
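 The double-decoding hazard in that parenthesis can be shown with Python's `html` module. This is only a sketch: the `decode` helper is hypothetical and simply stands in for a consumer that unescapes a given number of times.

```python
import html

def decode(markup, passes):
    """Unescape HTML entities the given number of times."""
    for _ in range(passes):
        markup = html.unescape(markup)
    return markup

user_text = "<script>alert(1)</script>"
once = html.escape(user_text)   # what a single escape puts on the wire
twice = html.escape(once)       # what a double escape puts on the wire

# Decoding the double-escaped form once still yields inert, escaped text;
# decoding it twice by accident yields the original live markup again.
inert = decode(twice, 1)
live = decode(twice, 2)
```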
 
 Unfortunately, my mindset prevents the user from using even the simplest markup in his forum messages. And then another meta-syntax has to be created to put that functionality back. Mess ensues...
 
 Which is why I don't design websites for anything apart from my own use...
 
 --
 Also FatPhil on SoylentNews, id 863
29. Re:I don't know if the question should be... by DamnStupidElf · 2012-08-30 05:28 · Score: 1
 
 That's not really a valid complaint. Even if HTTP was like FTP and opened a second TCP/IP connection to transfer data the exact same problems would arise if browsers ignored mime-types and tried to interpret the data contents instead of trusting the control channel.
30. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-30 05:30 · Score: 0
 
 Problem 1: Browsers try real hard to be clever and interpret maltagged/malformed content so people with defective markup or bad MIME content headers won't say 'My page doesn't work in Browser X, Browser X is defective!'. Or, if the site is just serving up user text in HTML, the attacker can stick some JavaScript tags in the text. Either way, someone malicious can upload some 'text' to a clipboard or document site, which the browser then executes when the malicious person shares the URL.
 I'll say. I once went to a porn site that prevented viewers from downloading images thanks to an HTML/image hack. If the user tried to download the image (click Save As...) they'd get an HTML file and not an image.
 It only seemed to work in Internet Explorer, though. Firefox wouldn't display the images at all.
31. Re:I don't know if the question should be... by TuringCheck · 2012-08-30 05:54 · Score: 1
 
 Blah blah.
 All nice, elegant and safe until management decides this field NEEDS to be red, and underlined, and this one too, and this one - so let's store it in database as HTML... oh, wait, what field was it?
32. Re:I don't know if the question should be... by DamnStupidElf · 2012-08-30 06:00 · Score: 1
 
 No, the problem is users uploading content containing JavaScript and then referencing that JavaScript from an external site. When a user who happens to also be logged into the popular site hosting my content views my page, my JavaScript executes in the context of the other site's domain, with access to the user's credentials.
 I'm surprised no one else has pointed this out before. The domain security model is broken. Period. It's time to move beyond that and to an explicit capability system that is granular enough to reference individual URIs, and even *that* is probably insufficient. It is entirely reasonable that a trusted script at http://trusted.example.com/trusted_path/trusted_handler.js?script=1 should be able to *not trust* http://trusted.example.com/trusted_path/trusted_handler.js?script=2, and further that even http://trusted.example.com/trusted_path/trusted_handler.js?script=1 should not be trusted if any cookie available to trusted.example.com is added, modified, or deleted between the time the URIs are fetched.
33. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-30 06:16 · Score: 0
 
 I think what the GP is saying is that if somebody wrote arbitrary HTML in a field you expected was text, then when you output it again (eg: read from database and crap it to HTML) all of the HTML they wrote is now properly escaped/entitified so when somebody tries to view the page that includes the HTML, instead of the browser running the JS/HTML, they are presented with a plain text box containing all of the JS/HTML they entered into it before AND that text is not executed anywhere because it was properly escaped/entitified when it was converted to HTML and sent to the web browser.
 So what he's saying is that instead of returning exactly what they wrote (which would then be executed by the browser), you crap out the already escaped text so it isn't executed by the browser and it is neatly displayed for them.
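 The escape-on-output idea described above can be sketched in a few lines. This is only a minimal illustration using Python's standard `html.escape`; the `render_comment` helper and the `comment` class name are hypothetical, not anything from the article or an earlier post.

```python
import html


def render_comment(user_text: str) -> str:
    """Return an HTML fragment that shows user_text literally."""
    # html.escape turns &, < and > into entities; quote=True also
    # escapes " and ' so the text is safe inside attribute values.
    escaped = html.escape(user_text, quote=True)
    return '<div class="comment">' + escaped + "</div>"


# What a malicious user typed into a "plain text" field.
stored = '<script>alert("pwned")</script>'

# The browser receives entities, so it displays the markup as text
# instead of running it.
print(render_comment(stored))
```

 The stored value is left untouched; the escaping happens only at the point where the text is written into HTML.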
34. Re:I don't know if the question should be... by Zero__Kelvin · 2012-08-30 06:23 · Score: 1
 
 "That link is suggesting that a regex is the proper way to validate an e-mail address." and "Hilariously, it even suggests restricting user input to not allow it to contain SQL control characters. Man."
 It suggests no such thing. It explicitly describes a default deny policy rather than a default allow policy. I now see why you don't think secure programming can work though, since if it is you doing the programming, it definitely won't.
 
 --
 Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
35. Re:I don't know if the question should be... by fatphil · 2012-08-30 06:40 · Score: 1
 
 > ... that explicitly allowing them actually results in more secure code ...
 
 Amen! Everything's a bomb!
 
 (and no, not one of those airport security 125ml toothpaste tube bombs which they dispose of by chucking it in the waste paper bin below the counter.)
 
 --
 Also FatPhil on SoylentNews, id 863
36. Re:I don't know if the question should be... by dgatwood · 2012-08-30 07:40 · Score: 1
 
 Yes, I'm quite aware of what the GP is saying. What I'm saying is that more often than not these days, plain text input isn't sufficient. So you have two choices: HTML input or some alternative scheme like BBCode. And as soon as you find yourself using either one, mere quoting isn't sufficient. You have to do validation.
 
 --
 Check out my sci-fi/humor trilogy at PatriotsBooks.
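 One way to picture the "quoting plus validation" point for a BBCode-style scheme is the sketch below. It is an assumption-laden illustration, not a complete BBCode implementation: the `render_bbcode` helper and the three-tag whitelist are made up for the example. Everything is escaped first, and only tags on the whitelist are turned back into markup.

```python
import html
import re

# Hypothetical whitelist: only these simple, attribute-free tags are allowed.
SIMPLE_TAGS = ("b", "i", "u")


def render_bbcode(text: str) -> str:
    """Escape everything, then re-enable only whitelisted BBCode pairs."""
    # Step 1: neutralise any raw HTML the user typed.
    out = html.escape(text, quote=True)
    # Step 2: translate matched [tag]...[/tag] pairs; an opening tag with
    # no matching closing tag is left as literal text.
    for tag in SIMPLE_TAGS:
        pattern = re.compile(
            r"\[%s\](.*?)\[/%s\]" % (tag, tag), re.DOTALL | re.IGNORECASE
        )
        out = pattern.sub(r"<%s>\1</%s>" % (tag, tag), out)
    return out


print(render_bbcode("[b]hi[/b] <script>x</script>"))
```

 Because the markup is generated by the server from a fixed list, the user never gets to supply attributes or tag names of their own. A real implementation would also need to handle mis-nested tags and anything that takes a URL.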
37. Re:I don't know if the question should be... by jgrahn · 2012-08-30 08:17 · Score: 1
 
 I'm actually not a big fan of validating inputs. I find proper escaping is a much more effective tool, and validation typically leads to both arbitrary restrictions of what your fields can hold and a false sense of security. It's why you can't put a + sign in e-mail fields, or [...]
 That's not validation! That is trying (and failing, because you are too ignorant to read an RFC) to guess what some other software wants, even if it's none of your business. A well-formed mail address is, for most purposes, one which /lib/sendmail will not complain about.
 That does of course not mean you shouldn't validate data meant to be interpreted by *you*. It's simple: if you need to interpret it, you need to validate it. Hell, you *are* validating it by interpreting it, even if you do a lousy job.
38. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-30 09:08 · Score: 0
 Default accept or default deny is splitting hairs when you're doing it wrong in the first place. It specifically says:
 
 However, most programs can be quite strict and only accept a very limited subset of e-mails to work well. In most cases, it's okay to reject technically valid addresses like "John Doe <john.doe@somewhere.com>" as long as the program can accept normal Internet addresses in the "name@domain" format (like "john.doe@somewhere.com").
 And this is under the heading "Secure programmer". If your security depends on rejecting valid e-mail addresses, it's terrible. This isn't hard, I do it all the time. What programming environment doesn't have an e-mail address parser already built-in? As for "It suggests no such thing", it suggests exactly what I said it does:
 
 The biggest problem is figuring out exactly what should be legal in the string. In general, you should be as restrictive as possible. There are a large number of characters that can cause special problems; where possible, you don't want to allow characters that have a special meaning to the program internals or the eventual output. That turns out to be really difficult, because so many characters can cause problems in some cases.
 
 Metacharacters: Metacharacters are characters that have special meanings to programs or libraries you depend on, such as the command shell or SQL.
 The author is correct that it's really difficult, but it's also really inadvisable, so no big deal. Any halfway decent SQL library has a way to escape SQL. Why would you re-invent the wheel, except square? An apostrophe is harmless in SQL except in certain contexts, which the SQL library handles when you put it through a parametrized query. For added fun, different databases have different rules on what's escaped where. Double single-quotes, backslash single-quote, you have to change whenever you change your backend. So, you put that logic in your Javascript, and you've now coupled your database layer to your UI. It's an insane, confused approach that has been recommended since 1995 and has proven to be highly ineffective, requires doing things perfectly over and over and over again, destroys program structure, needlessly restricts capabilities, and flat-out doesn't make sense. You're protecting the wrong end, out of some misguided overstretched metaphor where certain input is some abstract "poison", and you don't want to let it into your "system".
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
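 The parametrized-query approach mentioned above can be sketched with Python's standard `sqlite3` module. The `users` table and the `add_user` / `find_user` helpers are hypothetical names chosen for the example; the point is only that the apostrophe never needs to be rejected or hand-escaped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")


def add_user(conn, name):
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes in `name` are plain data, never syntax.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def find_user(conn, name):
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


add_user(conn, "O'Brien")
add_user(conn, "x'); DROP TABLE users; --")

# Both names are stored verbatim and the table is still there.
print(find_user(conn, "O'Brien"))
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

 Nothing in the calling code depends on which quoting rules the backend uses, which is the decoupling the post argues for.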
39. Re:I don't know if the question should be... by Zero__Kelvin · 2012-08-30 09:16 · Score: 1
 
 You don't seem to understand what was written or the fact that nobody is talking about Javascript programming. If you are using Javascript, you're already by definition incapable of practicing secure programming.
 
 Wheeler is talking about the API implementer, not the client programmer.
 
 --
 Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
40. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-30 09:49 · Score: 0
 
 First off, the article never states your assertion. In fact, "API" never appears once in it. He does, however, reference HTML, Perl, Cookies, URIs... you get the idea. I think it's possible that you're the one who doesn't understand what was written.
 For an actually relevant argument though, the approach's failings are only more apparent at that level. If you're talking directly to the SQL server, why in God's name would you do anything other than ask the SQL library to handle your validation? You don't have to guess what characters to allow or carefully consider, you can just go look it up since you're talking directly to the server/filesystem/whatever. And restrictions in APIs are much more serious than restrictions in a UI. You can generally rip out the UI and put a new one on, but you're often stuck bridging two APIs together. If they disagree on their arbitrary restrictions, well, now you've got a "fun" project on your hands.
 And if you can't write secure Javascript (it involves not giving JS the power to cause problems), I wouldn't go advertising that. The simple act of being capable of programming Javascript disqualifies one from being able to write secure code? Please. What you uttered is very similar to something I heard a guy say once: "I'm not a web developer, I'm a backend guy" as a defense against something stupid he did in the web app he was developing. No, you are a web developer, you're just a bad one. The idea that there's certain programming tasks that are "below" you is ridiculous. If you can't do the easy stuff, why should anyone trust you to do the hard stuff?
 It's funny how the more strident you get in your defense of these antiquated techniques, the more your misguided elitism shows.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
41. Re:I don't know if the question should be... by Zero__Kelvin · 2012-08-30 13:45 · Score: 1
 
 "First off, the article never states your assertion. In fact, "API" never appears once in it. He does, however, reference HTML, Perl, Cookies, URIs... you get the idea. I think it's possible that you're the one who doesn't understand what was written."
 Great point. It hadn't occurred to me that APIs never take these as arguments. What was I thinking?
 
 "The idea that there's certain programming tasks that are "below" you is ridiculous. If you can't do the easy stuff, why should anyone trust you to do the hard stuff?"
 You completely miss the point. I don't use Javascript, because I don't do AJAX. If I did, I would make damn sure that the sanitization was done server side in the APIs, not on the client over which I have no control. The difference between you and me is that I know how to do secure programming, and you never will.
 
 --
 Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
42. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-30 14:32 · Score: 0
 
 Ah, completely unsubstantiated claims of uber-leetness in response to me refuting your arguments and irrelevant misdirection attempts. Good work. Way to address the fact that my point the entire time has been that restricting allowed characters based on the control characters of its eventual possible destination is a weak and error-prone security practice.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
43. Re:I don't know if the question should be... by Anonymous Coward · 2012-08-30 21:25 · Score: 0
 
 > That link is suggesting that a regex is the proper way to validate an e-mail address.
 > NO NO NO NO NO NO NO NO NO NO NO NO NO
 > Parse it.
 You are stupid beyond recognition. Using a regexp IS parsing and in fact many programs use a regexp to parse e-mail addresses.
44. Re:I don't know if the question should be... by Zero__Kelvin · 2012-08-31 00:51 · Score: 1
 
 You think that was the point you were making, but the real point you were making was: "You have no clue and you aren't about to get one." Your point is loud and clear. Have a nice life.
 
 --
 Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
45. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-31 03:14 · Score: 1
 
 And you think you were saying anything different? Please.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
46. Re:I don't know if the question should be... by Zero__Kelvin · 2012-08-31 03:48 · Score: 1
 
 I know I was saying something completely different, but at least you finally admit that it was what you were actually saying. HANL
 
 --
 Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
47. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-31 04:32 · Score: 1
 
 Ohohoho, you got me good! And thank you for your kind wishes. I've had a wonderful life so far and hope it stays that way too. Although perhaps I could find some improvement if I tried to win arguments with appeals to authority and insults instead of facts and reason. I'd have more time on my hands at least.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
48. Re:I don't know if the question should be... by 19thNervousBreakdown · 2012-08-31 06:48 · Score: 1
 
 Or maybe the trick is to mod-bomb anyone I disagree with with a bunch of Overrated mods like some generous soul did in this thread. Particularly telling is that they modded down the posts with actual facts, citations, and logical arguments in them instead of the posts where this degenerated into nothing more than an immature slapfight. Why bother to support assertions when you can just silence your opponents with a dropdown?
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
49. Re:I don't know if the question should be... by lennier · 2012-09-02 17:49 · Score: 1
 
 What does not help at all is to hand-wave or diminish this particular problem, and blame the tools for not doing our due diligence.
 
 I couldn't disagree more.
 We're programmers. That means, we're in the business of creating tools to do repetitive things rather than doing them by hand. Automation, rather than hand-crafting, is what our chosen line of work is all about. And the best use of our tool-making ability is to automate, wherever and whenever we can, our own jobs.
 So it is exactly the job of the tools we create to do our due diligence for us; we're making 'em, we're using 'em, we're giving and selling 'em to others, so we darn better make sure they do the job safely and correctly.
 We're not only toolmakers, we're makers of tools that other toolmakers use to make other tools that make other tools. At every step, it's our job to make sure we're doing nothing by hand in an unsafe, unrepeatable manner that our tools can't do more simply and correctly and perfectly.
 It's because we haven't been doing this - because we've been satisfied with making badly designed, unsafe tools, that don't automate everything that can be automated and when they do, have complex interactions that aren't correctly documented and defy logic - and then patching up our tools' deficiencies with a Byzantine maze of hand-crafted exceptions and fudges and "programmer lore" - that we've got the Internet into the mess that it's in.
 So go ahead and blame our tools. They don't have feelings, but they do have bugs that need to be fixed.
 
 --
 You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
It's called reprocessing by KreAture · 2012-08-29 19:17 · Score: 1, Interesting

Convert the file to the site supported format and quality level in sandbox.
Tadaaaa,,,
1. Re:It's called reprocessing by Anonymous Coward · 2012-08-29 19:57 · Score: 5, Informative
 
 As TFA points out, it is possible to create a Flash applet using nothing but alphanumeric characters. Good luck catching that in your reprocessing.
2. Re:It's called reprocessing by 19thNervousBreakdown · 2012-08-29 20:57 · Score: 1
 
 Without an example it's tough to say for sure, but I suspect that it only works when the output isn't properly escaped.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
3. Re:It's called reprocessing by KreAture · 2012-08-29 20:59 · Score: 1
 
 How is that a picture, and how would your reprocessing code not reject it for not having header matching file name?
 I think it was obvious my post referred to pictures.
4. Re:It's called reprocessing by Anonymous Coward · 2012-08-30 00:27 · Score: 0
 
 It really wasn't. "Convert the file to the site supported format" could easily mean "reparse everything from unicode to ASCII"
5. Re:It's called reprocessing by fulldecent · 2012-08-30 00:54 · Score: 1
 
 I'd like to see that regex
 
 --
 -- I was raised on the command line, bitch
6. Re:It's called reprocessing by Jonner · 2012-08-30 06:35 · Score: 1
 
 Convert the file to the site supported format and quality level in sandbox.
 Tadaaaa,,,
 If you'd read TFA, you'd know it covers that and explains why it's insufficient.
7. Re:It's called reprocessing by Carnildo · 2012-08-30 10:05 · Score: 1
 
 Convert the file to the site supported format and quality level in sandbox.
 You're applying a known transform to the image. By reversing the transform, the attacker can craft an image such that the original upload is innocent, while the reprocessed image is malicious. I've seen it done where the upload is clean, but the generated thumbnail is goatse; it shouldn't be too hard to create a clean upload that the converter turns into something IE will interpret as Javascript.
 
 --
 "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
"user content" by Hazel+Bergeron · 2012-08-29 19:30 · Score: 1, Insightful

Google's solution is effectively to make all content belong to Google.
Gooooo cloud!
1. Re:"user content" by Anonymous Coward · 2012-08-29 19:35 · Score: 0
  
  Gooooo cloud!
  Clouds are cool until it rains. Then the cloud will be slow as fuck and I have to wait for the weather to clear.
  http://idle.slashdot.org/story/12/08/29/2215244/survey-reveals-a-majority-believe-the-cloud-is-affected-by-weather
2. Re:"user content" by Anonymous Coward · 2012-08-29 20:14 · Score: 2, Interesting
  
  Umm, what does your comment have to do with the subject in TFA? They used to host content on google.com, then they moved it to googleusercontent.com for security reasons. If anything they have made it clear that the user owns it, but not for that reason.
3. Re:"user content" by Hazel+Bergeron · 2012-08-29 20:22 · Score: 0
  
  I say we rename "Google" to "The Democratic People's Republic of Google". This would make it more clear that the product^Wcustomer^Wnetizen is empowered.
4. Re:"user content" by Anonymous Coward · 2012-08-29 21:00 · Score: 1
  
  Are you connected to reality in any meaningful way?
5. Re:"user content" by Hazel+Bergeron · 2012-08-29 21:50 · Score: 0, Offtopic
  
  The universe is not connected to anyone in any meaningful way. I wouldn't want reality to feel in my debt.
Google security breaches by romit_icarus · 2012-08-29 19:38 · Score: 1

For all its transparency, I've yet to see a working list of security breach attempts made on Google servers. I bet there are many, and it would be useful to know just the source and method if nothing more.
1. Re:Google security breaches by cbiltcliffe · 2012-08-29 19:48 · Score: 1
  
  Security breach *attempts*?
  I'm guessing a simple csv of that would be several TB in size. That's probably why you can't get a working list.
  
  --
  "City hall" in German is "Rathaus" Kinda explains a few things......
Referererer by Anonymous Coward · 2012-08-29 19:40 · Score: 0

Why not check the HTTP_REFERER variable and not serve up content if it is missing or not from the site's domain?
The usual objections about not trusting browsers seem to be misplaced... You can trust the browser when you want the browser to protect the end user.
Another objection has to do with hacks in ancient versions of Flash and other machinery that would allow referer checks to be forged/circumvented. If you are asserting this, you need to show how it can still be done in 2012. These vulnerabilities were closed up years ago.
The only downside I can see is that you couldn't make content externally linkable from sites other than your own, which is the behavior most sites seem to prefer anyway.
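The check proposed above might look roughly like the sketch below. It is only an illustration of the idea as stated, under several assumptions: `ALLOWED_HOSTS` and `referer_allowed` are hypothetical names, the headers are assumed to arrive as a plain dict keyed by `Referer`, and it says nothing about whether the check is sufficient.

```python
from urllib.parse import urlsplit

# Hypothetical: the only host allowed to link to user-supplied files.
ALLOWED_HOSTS = {"www.example.com"}


def referer_allowed(headers: dict) -> bool:
    """Serve content only when the Referer names one of our own hosts."""
    referer = headers.get("Referer")
    if not referer:
        # Missing header: refuse, as the post suggests.
        return False
    # Compare the parsed hostname exactly, so that a host such as
    # www.example.com.evil.net does not pass a substring test.
    return urlsplit(referer).hostname in ALLOWED_HOSTS


print(referer_allowed({"Referer": "https://www.example.com/gallery"}))
print(referer_allowed({"Referer": "https://www.example.com.evil.net/"}))
print(referer_allowed({}))
```

Refusing requests without a Referer also blocks users whose browsers or proxies strip the header, which is one more cost on top of the loss of external linking.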
1. Re:Referererer by 19thNervousBreakdown · 2012-08-29 20:59 · Score: 1
 
 Because writing a script to forge the Referer (sic) header is trivial.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
2. Re:Referererer by Anonymous Coward · 2012-08-30 02:52 · Score: 0
 
 Because writing a script to forge the Referer (sic) header is trivial.
 You don't understand the problem. Sending a request with a forged referer header is trivial and also irrelevent.
 The problem is how can you forge a request within the *browser* context of the user to enable you to manipulate the user's relationship with the domain? If that is so trivial why not just tell us how it can be done by forging the referer then?
3. Re:Referererer by 19thNervousBreakdown · 2012-08-30 03:57 · Score: 1
 
 I'll do that as soon as you demonstrate how to bite your own ear off.
 What's that? You never claimed it was possible? Huh.
 An implicit assumption when talking about checking an HTTP header is that it's done somewhere that actually needs to check it. A browser is the source of the header. Looking at it is silly and irrelev[b]a[/b]nt. It's you that doesn't understand the problem.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
Yes, it really is that bad. by VortexCortex · 2012-08-29 19:59 · Score: 5, Interesting

This is what happens when you try to be lenient with markup instead of strict (note: compliant does not preclude extensible), and then proceed to use a horribly inefficient and inconsistent (by design) scripting language and a dysfunctional family of almost sane document display engines combined with a stateless protocol to produce a stateful application development platform by way of increasingly ridiculous hacks.
When I first heard of "HTML5" I thought: Thank Fuck Almighty! They're finally going to start over and do shit right, but no, they're not. HTML5 is just taking the exact same cluster of fucks to even more dizzying degrees. HOW MANY YEARS have we been waiting for v5? I've HONESTLY lost count and any capacity to give a damn when we reached a decade -- Just looked it up, 12 years. For about one third the age of the Internet we've been stuck on v4.01... ugh. I don't, even -- no, bad. Wrong Universe! Get me out!
In 20XX when HTML6 may be available I may reconsider "web development". As it stands web development is chin-deep in its own filth which it sprays with each mention, onto passers by and they receive the horrid spittle joyously not because it's good or even not-putrid, but because we've actually had worse! I can crank out a cross platform pixel perfect native application for Android, iOS, Linux, OSX, XP, Vista, Win7, and mother fucking BSD in one third the time it takes to make a web app work on the various flavours of IE, Firefox, Safari, Chrom(e|ium). The time goes from 1/3rd down to 1/6th when I cut out testing for BSD, Vista, W7 (runs on XP, likely runs on Vista & Win7. Runs on X11 + OpenGL + Linux, likely builds/runs on BSD & Mac).
Long live the Internet and actual cross platform development toolchains, but fuck the web.
1. Re:Yes, it really is that bad. by sgrover · 2012-08-29 20:14 · Score: 5, Funny
 
 +1, but tell us how you really feel
2. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-29 20:19 · Score: 0
 
 Indeed. It's kind of ironic, Google's security team are grumbling about problems that are perpetuated by HTML5, a standard being driven by an employee of ... Google! Thanks Hixie!
3. Re:Yes, it really is that bad. by SuricouRaven · 2012-08-29 20:29 · Score: 5, Insightful
 
 Of course it's a mess. The combination of HTTP and HTML was designed for simple, static documents displaying predominantly text, a little formatting and a few images. By this point we're using extensions to extensions to extensions. It's a miracle it works at all.
4. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-29 20:49 · Score: 0
 
 In 20XX when HTML6 may be available I may reconsider "web development".
 I think you're missing an X there
5. Re:Yes, it really is that bad. by adolf · 2012-08-29 22:07 · Score: 4, Funny
 
 It's a miracle it works at all.
 It works?
 
 --
 Kid-proof tablet..
6. Re:Yes, it really is that bad. by svick · 2012-08-29 22:58 · Score: 1
 
 HOW MANY YEARS have we been waiting for v5? I've HONESTLY lost count and any capacity to give a damn when we reached a decade -- Just looked it up, 12 years.
 But HTML 5 is already here! It's just that it's not like the standards of old, it's a living standard. And if you don't like that, you're not agile enough.
7. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-29 23:15 · Score: 0
 
 Pixel perfect web is a myth as it was designed specifically to be adaptive on client side. And it's a good thing. I'm damn tired of "font-size: 10pt" (and similar styles) everywhere.
8. Re:Yes, it really is that bad. by arose · 2012-08-29 23:20 · Score: 1
 
 Remind me, is it possible to serve XHTML 1.0 across the board yet? I think it just about is, and we are to the point of "why the fuck bother anymore". If you can do better at getting shit implemented go right ahead, but so far HTML5 has made more tangible progress than just about any other single initiative of W3C.
 
 --
 Analogies don't equal equalities, they are merely somewhat analogous.
9. Re:Yes, it really is that bad. by Skapare · 2012-08-29 23:37 · Score: 1
 
 It's posts like this that make me wish Slashdot could do moderations above level 5.
 
 --
 now we need to go OSS in diesel cars
10. Re:Yes, it really is that bad. by TheDarkMaster · 2012-08-30 00:07 · Score: 2
 
 I think the same thing. I currently work doing "web systems". And do they work? They work; I managed to make a web application that can use a card printer. But at what price? I spent twice the time that I would have spent on compiled desktop applications, and lost count of the many horrible hacks I had to do to get similar desktop functionality using HTML.
 
 --
 Religion: The greatest weapon of mass destruction of all time
11. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-30 00:12 · Score: 0
 
 That's because it was WHATWG, not W3C. W3C can't get shit done.
 W3C only jumped on board and gave up on XHTML 2.0 when they realized WHATWG was winning with HTML5
12. Re:Yes, it really is that bad. by wolverine1999 · 2012-08-30 00:40 · Score: 1
 
 there won't be an HTML6. It's all HTML now.
 
 --
 SCIREV.NET - fanfics,reviews & more
13. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-30 00:48 · Score: 0
 
 The GP probably never has visited this website. At least not from his mobile device, you know: simple text and some pictures.
14. Re:Yes, it really is that bad. by FireFury03 · 2012-08-30 01:16 · Score: 1
 
 When I first heard of "HTML5" I thought: Thank Fuck Almighty! They're finally going to start over and do shit right, but no, they're not. HTML5 is just taking the exact same cluster of fucks to even more dizzying degrees.
 XHTML was a pretty good step in the right direction. Enforced well-formedness is a good thing (although IMHO browsers should've had a built in "please try to fix this page" function that the user could manually run over a broken page), genericising tags is sensible (if you're going to embed a rectangular object then it makes sense to have a single <object> tag to do it for all content, for example - no need to produce a whole new revision of the language just because someone has invented a new type of embeddable content).
 Unfortunately, the "industry" (Nokia, Microsoft, etc) were not interested in a major overhaul, and essentially wanted a quick bodge, so they came up with HTML 5 and more or less forced the W3C to adopt it. All the good stuff that HTML 5 brings could have easily been added to XHTML in a more generic way, but the industry weren't interested so we're left with the almighty clusterfuck known as HTML 5.
 
 --
 http://blog.nexusuk.org
15. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-30 01:43 · Score: 0
 
 Call me a Dinosaur, but I'm still using HTML1. Never had any problems with it and never had the feeling I am needing anything more or better. But this thread makes me curious. Can anyone give me a good reason to go to HTML2 ? :)
16. Re:Yes, it really is that bad. by Anonymous Coward · 2012-08-30 02:40 · Score: 0
 
 Pixel perfect won't be so perfect if Apple succeeds in getting the ball rolling for 2880 x 1800 and higher displays.
17. Re:Yes, it really is that bad. by DaveV1.0 · 2012-08-30 06:32 · Score: 1
 
 This. A thousand times this. Way too many people and companies are using the browser as a general purpose network GUI. Stop expecting to be able to shove everything and the kitchen sink into the browser and expecting it to be able to handle it all quickly and securely without a single problem.
 
 --
 There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
18. Re:Yes, it really is that bad. by Jonner · 2012-08-30 06:39 · Score: 1
 
 Do whatever kind of development floats your boat and pays the bills. As much as some aspects of web development suck, it is getting gradually better and it can't be ignored. The answer to web development problems certainly isn't to return to platform-specific binaries.
19. Re:Yes, it really is that bad. by Jonner · 2012-08-30 06:46 · Score: 1
 
 HOW MANY YEARS have we been waiting for v5? I've HONESTLY lost count and any capacity to give a damn when we reached a decade -- Just looked it up, 12 years.
 But HTML 5 is already here! It's just that it's not like the standards of old, it's a living standard. And if you don't like that, you're not agile enough.
 I'm not sure if they're on the right track in general, but at least the WHATWG is honestly recognizing that web developers have never waited for an official standard to use new browser features. It's a chicken and egg problem: if nobody used a new feature until it were described in an official standard, browsers wouldn't have much motivation to implement and test the feature.
20. Re:Yes, it really is that bad. by Jonner · 2012-08-30 06:56 · Score: 1
 
 Remind me, is it possible to serve XHTML 1.0 across the board yet? I think it just about is, and we are to the point of "why the fuck bother anymore". If you can do better at getting shit implemented go right ahead, but so far HTML5 has made more tangible progress than just about any other single initiative of W3C.
 I think IE 9 finally handles XHTML properly. Of course it's far too late, since XHTML is completely dead.
21. Re:Yes, it really is that bad. by SuricouRaven · 2012-08-30 08:38 · Score: 1
 
 Aside from the script-driven ability to expand comments and use a slider-bar to set a filter. Features which depend underneath on the ability to fetch new data via HTTP and seamlessly incorporate it into an already-open page without a full refresh.
22. Re:Yes, it really is that bad. by eugene+ts+wong · 2012-08-30 08:57 · Score: 1
 
 We haven't even got full implementations of 4.01. As far as I know, all browsers still have 1 or 2 bugs. Opera, for example, can't handle colspan, or maybe rowspan.
 
 --
 testing out my trending skills
23. Re:Yes, it really is that bad. by arose · 2012-08-30 17:00 · Score: 1
 
 I know, what I'm saying is that they made the right choice there and dismissing the effort because it is different completely misses that point.
 
 --
 Analogies don't equal equalities, they are merely somewhat analogous.
HTML needs a sandbox tag by Hentes · 2012-08-29 21:02 · Score: 2

The easiest way to secure embedded content would be a sandbox tag that lets you limit what kind of content can be inside of it.
1. Re:HTML needs a sandbox tag by Anonymous Coward · 2012-08-29 22:11 · Score: 1
  
  you mean like the iframe sandbox ? http://www.w3schools.com/html5/att_iframe_sandbox.asp (actually firefox supports it now as well in nightly)
2. Re:HTML needs a sandbox tag by Anonymous Coward · 2012-08-29 23:19 · Score: 0
  
  That was already discussed along with including a silver bullet tag.
3. Re:HTML needs a sandbox tag by TheLink · 2012-08-30 02:51 · Score: 1
  
  I suggested something like that 10 years ago: http://lists.w3.org/Archives/Public/www-html/2002May/0021.html
  http://www.mail-archive.com/mozilla-security@mozilla.org/msg01448.html
  But hardly anyone was interested. If implemented it could have prevented the Hotmail, MySpace, yahoo and many other XSS worms.
  There's Content Security Policy now:
  https://developer.mozilla.org/en-US/docs/Security/CSP/Introducing_Content_Security_Policy
  As far as I see security is not a priority for the browser and W3C bunch.
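  For readers who have not seen one, a Content Security Policy is just a response header listing where each kind of content may come from. The sketch below builds such a header value; the `build_csp` helper and the `usercontent.example.net` host are hypothetical, and the directive set is a guess at a reasonable starting point rather than a recommended policy.

```python
def build_csp(directives: dict) -> str:
    """Join {directive: [sources]} into a Content-Security-Policy value."""
    parts = []
    for name, sources in directives.items():
        parts.append(name + " " + " ".join(sources))
    return "; ".join(parts)


policy = build_csp({
    # By default, load things only from the page's own origin.
    "default-src": ["'self'"],
    # Scripts only from our own origin; no inline or third-party script.
    "script-src": ["'self'"],
    # Images may also come from a separate (hypothetical) upload domain.
    "img-src": ["'self'", "https://usercontent.example.net"],
})

# The server would send this alongside the page.
response_headers = [("Content-Security-Policy", policy)]
print(policy)
```

  With a policy like this in place, a script injected into user text is not run by a supporting browser, because inline script is not among the allowed sources.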
4. Re:HTML needs a sandbox tag by dgatwood · 2012-08-30 05:43 · Score: 1
  
  The iframe tag is less than ideal because the content must be provided out-of-band in a separate request. I mean sure, you can usually jam most things in so that they'll fit, but it isn't really ideal for content inserted dynamically.
  Also, if I read the spec correctly, this has almost all of the same flaws as Mozilla's content security policies (minus the requirement that you must use a separate server to provide the content). Specifically, it's all-or-nothing. Either you allow scripts or you don't. A more rational policy would be "no content-provided scripts". That is, the browser drops any javascript: URLs in content-provided href attributes (or similar constructs in HTML and CSS), and does not parse the values of any attributes that are interpreted as being scripts (e.g. the onclick attribute), but does allow trusted scripts (from outside the sandbox) to install handlers that trigger behavior. Without that, there are almost certainly a number of site designs that cannot practically use the iframe sandbox.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
5. Re:HTML needs a sandbox tag by DaveV1.0 · 2012-08-30 06:34 · Score: 1
  
  Or, we could stop trying to use the browser as a general, all-purpose UI and start writing secure network application frontends.
  
  --
  There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
So "No" is the answer. by Anonymous Coward · 2012-08-29 22:51 · Score: 0

Put down the bong and step AWAY from the computer.
Problem can be solved, but users are the problem by gweihir · 2012-08-29 23:06 · Score: 2

Images and text can be sanitized reliably. The problem is that this strips out all of the non-essential features. Users have a hard time understanding that, because users do not understand the trade-offs involved.
But the process is easy: Map all images to meta-data and compression free formats (pnm, e.g.) then recompress with a trusted compressor. For text, accept plain ASCII, RTF and HTML 2.0. Everything else, convert either to images or to cleaned PDF/Postscript by "printing" and OCR'ing.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Before 1939, "propaganda" just meant "PR" by Anonymous Coward · 2012-08-29 23:23 · Score: 0

Anyway, are you honestly saying that it is better to post on /. while not high?
For you? Definitely.
Re:Before 1939, "propaganda" just meant "PR" by Hazel+Bergeron · 2012-08-29 23:26 · Score: 0

Your past two retorts have been boring. You have one more try if you can think up something more witty, otherwise I'm closing this tab.
Re:Before 1939, "propaganda" just meant "PR" by Hazel+Bergeron · 2012-08-29 23:28 · Score: 0

I do appreciate the sentiment, though.
Assuming that today I am high, my posts are even better when I'm not high.
Please no... by betterunixthanunix · 2012-08-29 23:56 · Score: 1

Stop extending HTML! HTML does not need more tags. HTML was not designed to be a presentation language for applications and certainly not to be an environment for running applications; it was designed to be a hypertext document language (yes, "hypertext" is a word with meaning beyond HTML). The worst thing we did was to allow HTML documents with embedded programs -- applets, Javascript, etc.

The real answer is a new standard that is designed for application presentation and delivery, that does not have so much in-band signaling. We need to get it right the first time by building security into the system, not extend an already bloated monstrosity to make up for the inevitable security problems that result from turning a language for describing documents into a platform for running distributed software with malicious users.

--
Palm trees and 8
1. Re:Please no... by firewrought · 2012-08-30 02:43 · Score: 1
  
  The real answer is a new standard that is designed for application presentation and delivery, that does not have so much in-band signaling. We need to get it right the first time by building security into the system.
  And to help folks bridge the gap, we could deliver this app over HTTP to a browser plugin. Great idea!! Now we just need a fancy name that will make it resonate with programmers like, um.... "Java" (cause it's a type of coffee, get it?) or "Silverlight" (cause we code while the moon's up!).
  
  --
  -1, Too Many Layers Of Abstraction
2. Re:Please no... by Jonner · 2012-08-30 06:49 · Score: 1
  
  The real answer is a new standard that is designed for application presentation and delivery, that does not have so much in-band signaling. We need to get it right the first time by building security into the system, not extend an already bloated monstrosity to make up for the inevitable security problems that result from turning a language for describing documents into a platform for running distributed software with malicious users.
  Let us know how that works out.
Firewalls/Webfiltering by Anonymous Coward · 2012-08-29 23:57 · Score: 1

What about the fact that some companies don't want their staff pulling images from sites like google, and would block the images domain, but allow the search domain?
If it were all one domain and not separated, then companies of this mindset would have to make a choice of blocking all of Google, or blocking merely the images. Many of Google's ads are text based, and they would lose money if they didn't offer an alternative that would allow companies to selectively block those.
1. Re:Firewalls/Webfiltering by Richy_T · 2012-08-30 02:46 · Score: 1
  
  Any decent web filtering software allows blocking based on URL components, not just the domain. Google would have to work pretty hard to circumvent that and what would be the motivation?
2. Re:Firewalls/Webfiltering by fatphil · 2012-08-30 08:11 · Score: 1
  
  Web filtering won't work in general. See the cookie "path" issue.
  Summary: http://www.foo.com/you/trust/this/bit/../.%2e/../../oh/shit
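
To make the URL above concrete, here is a minimal sketch of why a raw prefix match on the path is not enough. It assumes the filter only compares the undecoded path against a trusted prefix, and that whatever finally serves the request percent-decodes `%2e` and collapses `..` segments; both behaviours are assumptions for illustration, and `TRUSTED_PREFIX` and the function names are made up for this example.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Hypothetical prefix that a filter (or a cookie "path") is meant to cover.
TRUSTED_PREFIX = "/you/trust/this/"


def naive_is_trusted(url: str) -> bool:
    """Prefix match on the raw, undecoded path, as a simple filter might do."""
    return urlsplit(url).path.startswith(TRUSTED_PREFIX)


def resolved_path(url: str) -> str:
    """Percent-decode the path, then collapse '.' and '..' segments."""
    return posixpath.normpath(unquote(urlsplit(url).path))


def careful_is_trusted(url: str) -> bool:
    """Apply the prefix match only after the path has been resolved."""
    return (resolved_path(url) + "/").startswith(TRUSTED_PREFIX)


url = "http://www.foo.com/you/trust/this/bit/../.%2e/../../oh/shit"
print(naive_is_trusted(url))    # raw prefix match
print(resolved_path(url))       # where the request may actually end up
print(careful_is_trusted(url))  # prefix match after resolution
```

The raw check accepts the URL while the resolved path lies outside the trusted prefix, which is the mismatch the summary URL is pointing at.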
  
  --
  Also FatPhil on SoylentNews, id 863
Another reason why a separate domain is useful by Anonymous Coward · 2012-08-30 00:28 · Score: 0

Some regimes require families to have a content filter either on their computer or on their ISP's router that is configured to block all domains with non-premoderated user-generated content if they have children below a certain age. So, if a site contains a mixture of known-safe content and user-generated content on the same domain, it will be blocked completely. That's definitely suboptimal.
Constructor overhead by tepples · 2012-08-30 00:33 · Score: 1

You don't have a method that can output strings, at all. You have a method that can output HTMLString, and it escapes everything it outputs. If you want to output raw HTML, you have RawHTMLString. Makes it much harder to make a mistake when you're doing Response.Write(new RawHTMLString(userField)).
Interesting technique. But how much runtime overhead do all those constructors impose for Java, C#/VB.NET, PHP, and Python?
1. Re:Constructor overhead by Anonymous Coward · 2012-08-30 01:27 · Score: 0
 
 None, really, since you should be escaping all user input anyway.
 Most MVCish frameworks have something like this, so you can test it yourself. (e.g. <%: in asp.net)
2. Re:Constructor overhead by Richy_T · 2012-08-30 02:34 · Score: 1
 
 You escape user input for SQL (if you're not using parameterized queries) or whatever database you're using. You escape the output for HTML or whatever you are outputting.
 If you've ever run across an application where someone has HTML escaped user input before insertion into the database and you now want to output it in a format that isn't HTML, you'll know what I'm talking about. User data should usually be *stored* as accurately to the original as possible.
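
A minimal sketch of that split, using Python's standard `sqlite3` and `html` modules: the comment is stored exactly as typed through a parameterized query, and escaping happens only when HTML is produced. The table layout and function names are assumptions made up for this example.

```python
import html
import sqlite3


def store_comment(conn: sqlite3.Connection, text: str) -> None:
    # Parameterized query: the driver handles quoting, the text is stored unchanged.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def render_html(conn: sqlite3.Connection) -> str:
    # Escape at output time, for the format being produced (HTML here).
    rows = conn.execute("SELECT body FROM comments ORDER BY id").fetchall()
    return "".join(f"<p>{html.escape(body)}</p>" for (body,) in rows)


def render_plain(conn: sqlite3.Connection) -> str:
    # A non-HTML output gets the original text back, with no HTML entities in it.
    rows = conn.execute("SELECT body FROM comments ORDER BY id").fetchall()
    return "\n".join(body for (body,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
raw = "Robert'); DROP TABLE comments;-- <script>alert(1)</script>"
store_comment(conn, raw)
print(render_html(conn))
print(render_plain(conn))
```

Because the stored value is the original text, a plain-text or search output never has to undo HTML escaping that was applied too early.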
3. Re:Constructor overhead by Cajun+Hell · 2012-08-30 03:04 · Score: 1
 
 But how much runtime overhead do all those constructors impose for Java, C#/VB.NET, PHP, and Python?
 Either nothing, or nothing significant, or something-but-it-fixed-a-bug-which-was-definitely-there.
 
 --
 "Believe me!" -- Donald Trump
4. Re:Constructor overhead by 19thNervousBreakdown · 2012-08-30 04:13 · Score: 1
 
 Seriously?
 Compared to the overhead of reading from the database, building the rest of the page's HTML, and then sending over the network, practically nothing. This is not hyperbole.
 Even if it wasn't nothing, it would have to be very significant, and performance would have to be a primary factor in the software's spec before I'd consider scrapping an extremely easy to use security practice in order for a faster runtime.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
5. Re:Constructor overhead by tepples · 2012-08-30 08:12 · Score: 1
 
 If you've ever run across an application where someone has HTML escaped user input before insertion into the database and you now want to output it in a format that isn't HTML
 For example, Slashdot comment fields contain a subset of HTML, and I imagine that they're inserted into the database as HTML. But for full-text searching, one wants to search the text, not the tags.
6. Re:Constructor overhead by tepples · 2012-08-30 08:14 · Score: 1
 
 Interesting. Do you know of a PHP framework that uses this idea? (I have to use PHP because a lot of hosting plans lack ASP.) If I knew the canonical name of this technique, I'd search for it myself, but when I tried Googling "new RawHTMLString", the only thing it found was your comment.
7. Re:Constructor overhead by 19thNervousBreakdown · 2012-08-30 08:40 · Score: 1
 
 I don't. I'm not sure it's even common enough to be considered a pattern, let alone to have good libraries; those names are just things I came up with in the moment.
 What it essentially boils down to though is you create classes and conversions between those classes that always maintain the correct escaping. If you find yourself writing the same escaping method more than once, refactor. There should be One Version of the Truth, that is a pattern. You then write output routines that refuse to render unrecognized classes, and only ever output using those routines.
 To be honest though, I'd step very carefully when leaning hard on PHP's type system as this technique does. It's got so much missing functionality you'll find yourself painted into unexpected corners, and yet is so permissive that it makes it very hard to make it hard to do things wrong. It really shines in C++ or to a slightly lesser extent C#, but obviously you can't just change languages at a whim, and not many people use C++ for web development.
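
A rough sketch of the idea in Python rather than C# or PHP. The class names `HTMLString` and `RawHTMLString` come from the thread, where they were invented on the spot; the `write` routine and everything else here is an assumption for illustration, not an existing library.

```python
import html


class RawHTMLString:
    """Markup that is already trusted; constructing one is an explicit, searchable act."""

    def __init__(self, markup: str):
        self.markup = markup

    def render(self) -> str:
        return self.markup


class HTMLString:
    """Untrusted text; it is always escaped when rendered."""

    def __init__(self, text: str):
        self.text = text

    def render(self) -> str:
        return html.escape(self.text)


def write(*parts) -> str:
    """Output routine that refuses anything that is not one of the known wrappers."""
    out = []
    for part in parts:
        if not isinstance(part, (HTMLString, RawHTMLString)):
            raise TypeError(f"refusing to render unwrapped {type(part).__name__}")
        out.append(part.render())
    return "".join(out)


user_field = "<i>x</i>"
print(write(RawHTMLString("<b>"), HTMLString(user_field), RawHTMLString("</b>")))
```

Passing a bare `str` to `write` raises `TypeError`, so forgetting to choose a wrapper fails loudly instead of silently emitting unescaped text.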
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
8. Re:Constructor overhead by Richy_T · 2012-08-30 09:23 · Score: 1
 
 Search is a whole 'nother beast, probably second only to dates in its "gotcha" potential.
Scripts in the body by tepples · 2012-08-30 00:36 · Score: 1

Where originally scripts could only be defined in the HTML header, some not-to-be-named company in Redmond
It wasn't Nintendo of America, was it? :-p

decided it was a good idea to permit defining them in the document body as well.
Anywhere you have HTML element attributes beginning with on, you have scripts in the body. It's been so long ago, I can't remember: did Netscape's original version of JavaScript have onclick or onmouseover?
1. Re:Scripts in the body by Richy_T · 2012-08-30 02:36 · Score: 1
 
 I think the <a href="javascript: predates even that.
Vector pictures by tepples · 2012-08-30 00:39 · Score: 1

Before SVG, and even now with Internet Explorer on Windows XP, SWF was the most widely compatible format for displaying vector pictures on a PC.
Filter the multi-TB IDS log plz by tepples · 2012-08-30 00:42 · Score: 1

You cite a multi-TB IDS log. May I have it filtered to the cases that came closest to a substantial intrusion?
1. Re:Filter the multi-TB IDS log plz by Gavagai80 · 2012-08-30 11:15 · Score: 1
  
  They either break in or they don't. There is no closest.
  
  --
  This space intentionally left blank
Re:Problem can be solved, but users are the proble by Anonymous Coward · 2012-08-30 00:50 · Score: 0

If you actually read the article, you'd know that there are stupid browsers out there that will happily interpret a perfectly valid ASCII text file served as text/plain as HTML, making your "sanitizing" of it by requiring it to be plain ASCII text ineffective. :(
Show how it can still be done in 2012 by tepples · 2012-08-30 00:55 · Score: 1

Another objection has to do with [...] machinery that would allow referer checks to be forged/circumvented. If you are asserting this you need to show how it can still be done in 2012.
Because writing a script to forge the Referer (sic) header is trivial.
Go ahead and show us how please.
1. Re:Show how it can still be done in 2012 by Anonymous Coward · 2012-08-30 01:59 · Score: 0
 
 Seriously? Google it. It's dead simple to put *whatever* you want in a purpose-built HTTP request. (In fact, it's *supposed* to be dead simple, so normal HTTP requests can be built easily.)
 How do you think your browser fills the Referrer field in the header when it sends a request? A script can do it exactly the same way, and it only gets easier if you don't assume said script has to run inside the browser.
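
For the non-browser case, a minimal sketch with Python's standard `urllib.request`: the client builds the request itself, so the Referer header is whatever it says it is. The URLs are placeholders, and nothing is sent over the network here. Note the scope: this is a client the attacker runs on their own machine, so it says nothing about making a victim's browser send a forged header.

```python
import urllib.request


def build_request(url: str, claimed_referer: str) -> urllib.request.Request:
    # The caller supplies the header; nothing checks that the claim is true.
    return urllib.request.Request(url, headers={"Referer": claimed_referer})


req = build_request(
    "http://www.example.com/protected/image.jpg",  # placeholder target
    "http://www.example.com/gallery",              # placeholder claimed origin page
)
print(req.get_header("Referer"))
```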
2. Re:Show how it can still be done in 2012 by 19thNervousBreakdown · 2012-08-30 04:44 · Score: 1
 
 Dammit, I was going to whip up a quick example of this, and a google showed me that browsers have added protection to the XMLHttpRequest. I was basing my claim on years-old information. My bad.
 I mean, I still think it's not a good idea, and that same google shows that there are many ways around it, and other headers don't share the same protection so trusting them is a bad habit to get into, but it looks like it would take more than the two minutes I was willing to spend. Google around. It may not be trivial anymore, but it's still very possible.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
3. Re:Show how it can still be done in 2012 by WaffleMonster · 2012-08-30 07:37 · Score: 1
 
 I mean, I still think it's not a good idea, and that same google shows that there are many ways around it, and other headers don't share the same protection so trusting them is a bad habit to get into, but it looks like it would take more than the two minutes I was willing to spend. Google around. It may not be trivial anymore, but it's still very possible.
 I would REALLY like to know how it is still possible to forge a referer field from a browser request. There are lots of talking heads on the Internet who more or less assume what many here have in this regard.
 I know how to clear the referer field, but I have previously spent dozens of hours researching this and found no solution that does not involve ancient bugs or signaling outside the browser's session context.
 Any specific pointers or implementations rather than unspecific references to talking heads assuming you are a foolish moron for asking would be very much appreciated.
4. Re:Show how it can still be done in 2012 by tepples · 2012-08-30 07:47 · Score: 1
 
 it only gets easier if you don't assume said script has to run inside the browser.
 If a script doesn't run inside the browser, then how is the victim's computer induced to execute it?
Strip Referer by tepples · 2012-08-30 00:57 · Score: 1

Why not check HTTP_REFERER variable and not serve up content if missing
Because a lot of proxies and web browser extensions strip Referer for privacy reasons.
1. Re:Strip Referer by WaffleMonster · 2012-08-30 03:13 · Score: 1
  
  Because a lot of proxies and web browser extensions strip Referer for privacy reasons.
  Privacy plugins only strip foreign referers not same domain which is all that is needed in this case.
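
The check being argued about can be sketched as follows. This is an assumption about how such a server-side test might look, not anyone's actual implementation; `referer_allowed` and its `allow_missing` switch are hypothetical names. The switch makes the trade-off explicit: refusing a missing Referer blocks visitors whose proxy or extension strips it, and accepting a missing Referer weakens the check.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_allowed(referer: Optional[str], own_host: str,
                    allow_missing: bool = False) -> bool:
    """Accept a request only if its Referer names our own host."""
    if not referer:
        # Header absent or empty: stripped by a proxy/extension, or never sent.
        return allow_missing
    # Compare the parsed hostname exactly; a substring match could be fooled
    # by hosts such as www.example.com.evil.net.
    return urlsplit(referer).hostname == own_host.lower()


print(referer_allowed("http://www.example.com/page", "www.example.com"))
print(referer_allowed("http://www.example.com.evil.net/", "www.example.com"))
print(referer_allowed(None, "www.example.com"))
```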
Re:Before 1939, "propaganda" just meant "PR" by Anonymous Coward · 2012-08-30 01:05 · Score: 0

And madness seems like madness to those who are insightful.
A madman is more likely to seriously claim to be insightful than an insightful man is to claim he is mad.
April 2014 by tepples · 2012-08-30 01:14 · Score: 1

It will be in April 2014 when Windows XP, the operating system for which the latest version of the bundled browser is IE 8, leaves extended support.
Good luck getting Apple to adopt it by tepples · 2012-08-30 01:16 · Score: 1

The real answer is a new standard that is designed for application presentation and deliver
That's been tried, in the form of Flex and Silverlight. Good luck getting Apple to adopt your proposed new standard.
1. Re:Good luck getting Apple to adopt it by Richy_T · 2012-08-30 02:43 · Score: 1
  
  Flex? Silverlight was just another Microsoft attempt to abuse the market and that's a play everyone has gotten wise to by now.
2. Re:Good luck getting Apple to adopt it by tepples · 2012-08-30 07:46 · Score: 1
  
  Flex?
  Flex was Adobe's attempt to reposition Flash Player as a rich Internet application platform.
Doesn't follow. by Anonymous Coward · 2012-08-30 01:20 · Score: 0

They could be even worse when you're not high.
However, this doesn't excuse you being high and posting dreck like you do.
1. Re:Doesn't follow. by Hazel+Bergeron · 2012-08-30 03:55 · Score: 0
  
  Exhibit A:
  
  Anyway, are you honestly saying that it is better to post on /. while not high?
  For you? Definitely.
  
  Exhibit B:
  
  [Your posts] could be even worse when you're not high.
  You failed to maintain my interest. Ta ta.
What Google wants by Anonymous Coward · 2012-08-30 01:42 · Score: 0

Here's the give: "In the days of static HTML and simple web applications, giving the owner of the domain authoritative control over how the content is displayed wasn’t of any importance."
"giving the owner of the domain authoritative control over how content is displayed"
The article says no more about this, but instead proceeds to (correctly) detail a number of flaws with common web app protocols and procedures and how Google deals with them.
I agree with Google - web apps suck eggs. The world could really use something better. But be very careful what you wish for, because for all of their warts, web apps remain one of the only viable ways to produce widely available applications using open standards. Take that away, and we're back to the 1980s, when the only way to do anything was to serve at the caprice of proprietary vendors.
Novel Solution by Sentrion · 2012-08-30 02:00 · Score: 2, Interesting

This was a real problem back in the 1980s. Every time I would connect to a BBS my computer would execute any code it came across, which made it very easy for viruses to infect my PC. But lucky for me, in the early 90's the world wide web came into being and I didn't have to run executable code just to view content that someone else posted. The PC was insulated from outside threats by viewing the web "pages" only through a "web browser" that only let you view the content, which could be innocuous text, graphics, images, sound, and even animation that was uploaded to the net by way of a non-executable markup language known as HTML. It was at this time that the whole world began to use their home computers to view content online because it was now safe for amateurs and noobs to connect their PCs to the internet without any worries of being inundated with viruses and other malware.
Today I only surf the web with browsers like Erwise, Viola, Mosaic, and Cello. People today are accessing the internet with applications that run executable code, such as Internet Explorer and Firefox. Very dangerous for amateurs and noobs.
1. Re:Novel Solution by BronsCon · 2012-08-30 04:00 · Score: 1
  
  Today I only surf the web with browsers like Erwise, Viola, Mosaic, and Cello. People today are accessing the internet with applications that run executable code, such as Internet Explorer and Firefox. Very dangerous for amateurs and noobs.
  So, which are you, an amateur or a noob?
  
  --
  APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
2. Re:Novel Solution by pipedwho · 2012-08-30 16:54 · Score: 1
  
  Huh? Maybe I missed the woosh.
  Text based terminal apps for DOS like Telix/Procomm/Zcomm/etc that were used in the '80s to connect to BBSs didn't execute any code at all. In fact, back then, with the exception of a few internet worms and trojan style applications, viruses were all disk based and propagated when someone physically passed the media to someone else.
  It wasn't until email/internet and Windows became popular in the '90s that non-disk based viruses became a problem. For a time, browsers were utterly insecure, but at the time, web content was in its infancy and drive-by attacks were few and far between. Trojaned apps, and people clicking on 'images' with .exe extensions in their email clients (i.e. Outlook) were the main concern towards the end of the '90s.
It is True by Anonymous Coward · 2012-08-30 02:14 · Score: 0

I googled and found that it is TRUE...
Escaping is hard by TheLink · 2012-08-30 02:34 · Score: 1

The problem is you currently can't escape everything reliably.
Why? Because the mainstream browser security concept is making sure that all the thousands of "Go" buttons are not pressed aka "escaped". But people are always introducing new "Go" buttons. If your library is not aware of the latest stuff it will not escape the latest crazy "Go" button the www/html/browser bunch have come up with.
So in theory a perfectly safe site could suddenly become unsafe, just because someone made a new "Go" button for the latest browser. Your library could also parse things differently from the victim browser.
Many years ago I proposed a tag to disable any active stuff. A "Stop" button if you like in a world full of "Go" buttons. But most of the browser and W3C people weren't interested. If they had done it, a lot of those worms (MySpace etc) wouldn't have worked at all.
Only recently they have finally come up with something called Content Security Policy: https://developer.mozilla.org/en-US/docs/Security/CSP/Introducing_Content_Security_Policy
"Stop" buttons aren't 100% but it's way easier to specify a "Stop" than it is to make sure that all the hundreds of current AND future "Go" buttons are properly escaped.
Car Analogy: before CSP, browsers were like cars with hundreds of accelerator pedals. To stop you had to make sure ALL the pedals were not pressed!
Anyone who thinks escaping is easy to do 100% should go look at the various security researcher/hackers guides on exploiting stuff. Especially if you are trying to still allow HTML content (say from advertisers or HTML email for your users). It's easy if you are only going to allow ASCII text. But once you throw in HTML and unicode, it all starts to get complicated.
1. Re:Escaping is hard by dgatwood · 2012-08-30 05:14 · Score: 1
  
  This is why the correct solution is always whitelisting, not blacklisting. Whitelist the allowed tags, attributes, CSS subsets, etc. that you consider safe. This way, anything added to the specification is likely to get stripped out by your filtering code.
  For example, I'm working on a website in which users provide content in a subset of HTML/XML. The only tags I'm allowing are p, span, div, select, and a couple of custom tags. The only attributes I'm allowing are the chosen value for the select element, the values for the option elements (which actually get replaced about a quarter second after the browser loads the content), the id attribute, and the class attribute. Everything else goes away and never gets served back, even to the same user.
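
A minimal sketch of whitelist filtering with Python's standard `html.parser`. The tag and attribute sets loosely follow the ones named above but are assumptions for illustration, as are the class and function names; a production filter would need far more care (CSS, URL attributes, parser differences between this library and browsers).

```python
import html
from html.parser import HTMLParser

# Assumed whitelist for illustration; anything not listed is dropped.
ALLOWED_TAGS = {"p", "span", "div", "select", "option"}
ALLOWED_ATTRS = {"id", "class", "value", "selected"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown tag (script, a, img, future tags, ...): drop it
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                safe = html.escape(value or "", quote=True)
                kept.append(f' {name}="{safe}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only close tags we opened, so stray close tags cannot break out
        # of whatever container the page wraps around the user content.
        if tag in self.open_tags:
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(html.escape(data))


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    while parser.open_tags:  # close anything the user left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(sanitize('<p class="x" onclick="evil()">hi<script>alert(1)</script></p>'))
print(sanitize("</div><p>x"))
```

In this sketch the text inside a dropped tag survives as escaped plain text; dropping it entirely would be an equally reasonable choice.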
  The fundamental problem with trying to prevent users from doing XSS using content security policies is that to do it right, you need something more fine-grained than "this entire page may not contain JavaScript". Any real-world site is going to contain JavaScript, even if it is only a simple onclick element that executes an actual JavaScript function from a .js file provided by the server. So a truly no-js policy on the whole document would be a useless policy. And since the user-served content comes from the same server, a policy that limits it to content from a single server is also a useless policy. The policy logic needs to be at the element level in order to be useful, e.g. nothing in this user-provided div element is allowed to provide any JavaScript. And even at the element level, you would still need to do sanity checking on the server—parsing, specifically—to ensure that the user doesn't specify HTML content with extra close tags to break out of the sandboxed div or whatever.
  Also, such a policy engine would have to be designed in such a way that it allows things like onclick handlers to be added to those elements using JavaScript code after the fact (just not through the onclick HTML attribute). Otherwise, you could not usefully allow user-provided HTML on any interestingly dynamic site (e.g. contentEditable). With such a design, you might be able to come up with a way to provide user-provided content in an iframe element using a data provider script running on a server in a different subdomain (assuming you don't run afoul of the browser's cross-domain security policies) so that they would not be able to provide code themselves, but your own scripts could still add executable handlers into them. However, it would be a colossal hack, with very steep costs, both in terms of needing an SSL wildcard cert and in terms of making your site performance much, much worse (because you'd need an extra HTTP request for each one of those iframe elements).
  Ultimately, what this comes down to is a fundamental flaw in the way HTML's JavaScript support was designed. Under the hood, if you assign a handler to an element, you assign a function pointer/reference, but at the HTML level, you assign a string that gets parsed and turned into a new function, which gets assigned to the actual object's attribute. If HTML had been designed for security, the syntax would be onclick="function_name", which would prevent you from encoding any arbitrary JavaScript code into those attributes. With that change, a content security policy at the document level would be feasible. Unfortunately, HTML isn't designed for security (it is designed for the designer's convenience), which means that anybody taking arbitrary third-party HTML and incorporating it into a website must either do so with human intervention or must use some sort of whitelisting technique. Anything else borders on madness.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
2. Re:Escaping is hard by fatphil · 2012-08-30 06:43 · Score: 1
  
  > The problem is you currently can't escape everything reliably.
  
  You can - by escaping everything.
  
  > If your library is not aware of the latest stuff it will not escape the latest crazy "Go" button
  
  It will. It escapes everything. What bit of "everything" did you not understand?
  
  Sure, it won't let people have crazy "go" buttons, whatever they are, but nothing of value was lost.
  
  --
  Also FatPhil on SoylentNews, id 863
JavaScript whacky encoding also by Anonymous Coward · 2012-08-30 02:41 · Score: 0

Along the same lines, someone has also described and published tools to create JavaScript using only the following characters: ()[]{}!+
My explanation of article by kent.dickey · 2012-08-30 03:35 · Score: 5, Informative

The blog post was a bit terse, but I gather one of the main problems is the following:
Google lets users upload profile photos. So when anyone views that user's page, they will see that photo. But malicious users were making their photo files contain Javascript/Java/Flash/HTML code. Browsers (I think it's always IE) are very lax and will try to interpret files how they please, regardless of what the web page says. So, the webpage says it's pointing to an IMG, but some browsers will interpret it as Javascript/Java/Flash/HTML anyway once they look at the file. So now a malicious user can serve up scripts that seem to be coming from Google.com, and so they are given a lot of access at Google.com and break their security (e.g., let you look at other people's private files).
Their solution: user images are hosted at googleusercontent.com. Now, if a malicious user tries to put a script in there, it will only have the privileges of a script run from that domain--which is no privileges at all. Note this just protects Google's security...you're still running some other user's malicious script. Not google's problem.
The article then discusses how trying to sanitize images can never work, since valid images can appear to have HTML/whatever in them, and their own internal team worked out how to get HTML to appear in images even after image manipulation was done.
Shorter summary: Browsers suck.
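
The reason the separate domain helps can be sketched as an origin comparison. This is a simplified model, assuming an origin is the (scheme, host, port) triple; real browsers add more rules on top (cookies, for example, are scoped differently), and the hostnames below are illustrative only.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple used for the comparison."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


# Content served from the main site shares its origin with the site's pages ...
print(same_origin("https://www.google.com/profile",
                  "https://www.google.com:443/settings"))
# ... while content on a separate user-content domain does not.
print(same_origin("https://www.google.com/profile",
                  "https://photos.googleusercontent.com/u/1.jpg"))
```

Under this model, a script smuggled into an uploaded file runs with the user-content domain's origin, not the main site's.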
Re:Before 1939, "propaganda" just meant "PR" by Hazel+Bergeron · 2012-08-30 04:02 · Score: 0

And madness seems like madness to those who are insightful.
Or perhaps these self-appointed judges are mere dullards assuming themselves to have insight.
Mod parent up by psydeshow · 2012-08-30 05:25 · Score: 1

I read the TFA, that's a great summary.
It's like waking up in a crappy mirror universe where all the work that we have done on security in the past 10 years is out the window, because unbeknownst to anyone but the browser vendors, our web browsers will go ahead and execute code embedded in non-executable mimetypes.
Would it have been so hard to limit JavaScript execution to the handful of content types where it is supposed to be found? Apparently. So now images are Turing-complete, and all your cookies can be lifted by someone who puts <script src="http://private.com/users/you/profile.jpg"></script> in a page you visit.
1. Re:Mod parent up by Carnildo · 2012-08-30 09:57 · Score: 1
 
 Apparently. So now images are Turing-complete, and all your cookies can be lifted by someone who puts <script src="http://private.com/users/you/profile.jpg"></script> in a page you visit.
 It's worse than that. If you're using Internet Explorer, your cookies can be lifted by someone who puts <img src="http://private.com/users/you/profile.jpg"> in a page you visit, or your flash storage tampered by <a href="http://private.com/uploads/schedule.txt">.
 
 --
 "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
Re:Problem can be solved, but users are the proble by Jonner · 2012-08-30 06:48 · Score: 1

Images and text can be sanitized reliably. The problem is that this strips out all of the non-essential features. Users have a hard time understanding that, because users do not understand the trade-offs involved.
But the process is easy: Map all images to meta-data and compression free formats (pnm, e.g.) then recompress with a trusted compressor. For text, accept plain ASCII, RTF and HTML 2.0. Everything else, convert either to images or to cleaned PDF/Postscript by "printing" and OCR'ing.
If you'd read TFA, you'd know that it explains why this is insufficient:

For a while, we focused on content sanitization as a possible workaround - but in many cases, we found it to be insufficient. For example, Aleksandr Dobkin managed to construct a purely alphanumeric Flash applet, and in our internal work the Google security team created images that can be forced to include a particular plaintext string in their body, after being scrubbed and recoded in a deterministic way.
Re:Before 1939, "propaganda" just meant "PR" by Anonymous Coward · 2012-08-30 07:10 · Score: 0

In your case, yes. Clip related: http://www.youtube.com/watch?feature=fvwp&NR=1&v=WGmY96qhnBI
No relevant results for "around". by tepples · 2012-08-30 08:09 · Score: 1

Google around.
around didn't provide relevant results.
But with the literal-minded housekeeper costume off, forge referer and spoof referer still don't. This page is from 2006, and this page likewise explains a flaw that has since been fixed. This page claims that it's possible to forge a referer in the visitor's browser using redirection, but only from a domain that the attacker controls. This result claims that the only way is to get the user to install a plug-in: "If you want to redirect a visitor to another website and set their browser's referer to any value you desire, you'll need to develop a web browser-plugin or some other type of application that runs on their computer. Otherwise, you cannot set the referer on the visitor's browser." A bunch of results were links to such plug-ins, but the viewer is likely to decline the plug-in installation. What am I missing?
1. Re:No relevant results for "around". by 19thNervousBreakdown · 2012-08-30 08:25 · Score: 1
 
 There are legitimate reasons to want to do this, and people are much more likely to be helpful when you're not asking how to exploit. With that in mind, the search set referer xmlhttprequest gives decent results. If you look through them, you'll see that it used to work in basically every browser by simply setting the header, but there are now various levels of protection depending on the browser, the calling code's domain, and where the request is going.
 All in all, basing security on a header that was never secure is a dumb idea. Instead of redefining an old header, make a new one. This is security we're talking about, not opening a Word 97 document on Word 2008. If it's not secure, it should break, it shouldn't make a best effort.
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
2. Re:No relevant results for "around". by WaffleMonster · 2012-08-30 11:57 · Score: 1
 
 All in all, basing security on a header that was never secure is a dumb idea. Instead of redefining an old header, make a new one. This is security we're talking about, not opening a Word 97 document on Word 2008. If it's not secure, it should break, it shouldn't make a best effort.
 Please see the specification, which explicitly prevents Referer and a host of other request header fields from being user-changeable.
 http://www.w3.org/TR/XMLHttpRequest
 
 All in all, basing security on a header that was never secure is a dumb idea.
 The browser is expected to moderate certain activity protecting the end user from an "anything goes" scripting environment.
 There were security bugs that got fixed, but I don't understand the assertion being made about this field never being secure. The specification for XMLHttpRequest seems pretty clear to me on the intent to protect this field.
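 As a rough illustration of what the spec asks of browsers, here is a minimal sketch of the check a browser might apply before letting a script set a request header. The header list is abridged from the XMLHttpRequest draft, and the function name script_may_set is made up for this example.

```python
# Header names a script may not set through XMLHttpRequest.setRequestHeader(),
# abridged from the W3C XMLHttpRequest draft. See the spec for the full list.
FORBIDDEN_HEADERS = {
    "accept-charset", "accept-encoding", "connection", "content-length",
    "cookie", "cookie2", "date", "expect", "host", "keep-alive",
    "origin", "referer", "te", "trailer", "transfer-encoding",
    "upgrade", "via",
}
# Any header starting with one of these prefixes is also off limits.
FORBIDDEN_PREFIXES = ("proxy-", "sec-")


def script_may_set(header_name: str) -> bool:
    """Return True if a page script would be allowed to set this header."""
    name = header_name.strip().lower()  # header names are case-insensitive
    if name in FORBIDDEN_HEADERS:
        return False
    return not name.startswith(FORBIDDEN_PREFIXES)
```

 With this sketch, script_may_set("Referer") is False while an ordinary custom header such as "X-Requested-With" is allowed.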
 
 If it's not secure, it should break, it shouldn't make a best effort.
 You seem to be making a philosophical or political argument. While I respect your opinion, no specification is perfect, and people often don't have the luxury of always doing what "sounds good" on paper.
 I invite anyone to provide an objective reason or reference to the same why this would not work in the "real world".
3. Re:No relevant results for "around". by 19thNervousBreakdown · 2012-08-30 13:18 · Score: 1
 
 I've already conceded that it is no longer supposed to be possible.
 The point is that it's not a strong security technique. If you look at the first release of the spec you linked, you'll find that there is no mention of the Referer header, and according to the spec, the opposite behavior is what was specified, and that specification was in fact followed. Do some googling, you'll find many references to this working. And XMLHttpRequest came out in 1999, so it's spent most of its life in its insecure incarnation.
 Even if relying on the server to not accidentally send malicious content based on a header that the client sent was a good idea, redefining a header from insecure to secure is an inherently bad idea.
 You're also trying to argue both sides here, by saying that browsers need to deal in the real world, but at the same time they should make assumptions that could only be valid if every browser instantly followed specifications as they were released. In the real world, the header was redefined, and the redefinition caused problems years later as developers "missed spots" when securing it. In the real world, it's easier to create an entirely new header that is supposed to be secure from the beginning--if the header is there, you are safe to assume it's secure. If it's not, you don't have that security and can make an informed decision as to whether or not to continue. In redefining the header, they took that choice away and created a situation where you had to rely on browser sniffing to determine if you're dealing with the secure or insecure version of the header (although, in the real world, nobody bothered with that because it was so spectacularly difficult to get right). It's terrible security practice all around, and this should be obvious to anyone who's actually done development.
 
 I invite anyone to provide an objective reason or reference to the same why this would not work in the "real world".
 It didn't work in the real world, as evidenced by the years spent where referer spoofing was a common attack. Just because we slogged through and came out on the other side eventually doesn't mean that we couldn't have done better by taking the obviously better tack of creating a new header.
 I personally think that we could do one better by not relying on any sort of referrer mechanism, but at the very least, saying the way it was done is the best way in light of the history of the issue is silly.
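 To make the "new header" argument concrete, here is a minimal sketch of the three-way decision it allows. The header name x-request-source and the function check_source are hypothetical, invented for this illustration only; the point is that an absent header is a distinct, detectable case rather than something to guess at by browser sniffing.

```python
from typing import Mapping, Optional

# Hypothetical header name, used only for this illustration.
TRUSTED_HEADER = "x-request-source"


def check_source(headers: Mapping[str, str], expected: str) -> Optional[bool]:
    """Return True/False when the header is present, None when it is absent."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    value = normalized.get(TRUSTED_HEADER)
    if value is None:
        return None  # older browser: no guarantee either way, caller decides
    return value == expected
```

 A caller can then treat None as "no protection available" and decide whether to continue, instead of trusting a value that may or may not have been secured.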
 
 --
 <xml><am><so><damn>Web 2.0</damn></so></am></xml>
4. Re:No relevant results for "around". by WaffleMonster · 2012-08-30 18:46 · Score: 1
 
 You're also trying to argue both sides here, by saying that browsers need to deal in the real world, but at the same time they should make assumptions that could only be valid if every browser instantly followed specifications as they were released.
 I'm not concerned about the past. I tested every current browser I could find YEARS ago and they all worked as expected.
 Old versions of browsers are subject to a wide array of exploits; dwelling on them is fruitless.
 
 In the real world, the header was redefined,
 The purpose of the referer field has always been constant as far as I understand it.
 
 redefinition caused problems years later as developers "missed spots" when securing it. In the real world, it's easier to create an entirely new header that is supposed to be secure from the beginning
 Again I don't think referer has changed. New features were added to the platform which interfered with previous security assumptions. People later fixed their mistakes after realizing the error of their ways.
 
 It's terrible security practice all around, and this should be obvious to anyone who's actually done development.
 I personally think that we could do one better by not relying on any sort of referrer mechanism, but at the very least, saying the way it was done is the best way in light of the history of the issue is silly.
 I am only concerned with the present and future. The "best" way is useless to me if it does not exist or I can't take advantage of it in the present.
 Right now the only two options other than checking referer I have any knowledge of are:
 1. Not using cookies at all
 2. Using different domains for the cookies
 Neither option is acceptable and wishing things were different does me no good in the present.
 Google's domain concept is particularly useless when access to user uploaded content is subject to authentication.
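 For reference, the Referer check discussed in this subthread might look roughly like the sketch below on the server side. The function name referer_allowed and the reject-when-missing policy are assumptions made for this example, not anything prescribed by the posts above.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_allowed(referer: Optional[str], allowed_host: str) -> bool:
    """Accept a state-changing request only if Referer names our own host."""
    if not referer:
        return False  # missing header: reject rather than guess
    try:
        parts = urlsplit(referer)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https"):
        return False
    # Compare the parsed hostname exactly; substring matches are easy to fool.
    return host == allowed_host.lower()
```

 Exact hostname comparison matters here: a check like "example.com" in referer would also accept https://example.com.evil.test/.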
Re:Problem can be solved, but users are the proble by Carnildo · 2012-08-30 09:51 · Score: 1

Images and text can be sanitized reliably.
The point of the article is that they can't. Internet Explorer can be coerced into interpreting JPEG images as HTML, interpreting ASCII text as Flash, and interpreting text/plain documents as text/html, among other things. You can also play games with the encoding-recognition code by tweaking the first few bytes of the file, such that a document uploaded as ISO-8859-1 is interpreted by IE as UTF-7, or whatever other encoding suits your purposes. Note that in all of these attacks, the file is entirely valid in its original format, so there is nothing the server can do to prevent them.
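The encoding trick is easy to reproduce offline. The sketch below uses only Python's built-in codecs; the payload is the well-known UTF-7 form of a script tag, and the variable names are just for illustration. It shows one byte string that is harmless as ISO-8859-1 and becomes markup when decoded as UTF-7.

```python
# The payload is plain ASCII, so it is valid ISO-8859-1 and contains no
# angle brackets for a sanitizer to strip.
uploaded = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

as_latin1 = uploaded.decode("iso-8859-1")  # what the server intended
as_utf7 = uploaded.decode("utf-7")         # what a sniffing browser might see

print("<" in as_latin1)  # no angle bracket in the intended reading
print(as_utf7)           # the same bytes, reinterpreted as markup
```

The server-side view never contains a "<" character, which is why filtering on the uploaded bytes alone does not help.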

--
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
html5 threw away well-formedness by Anonymous Coward · 2012-08-31 04:10 · Score: 0

xhtml did save us. It was easy to check for well-formed xhtml with all the xml tools. But apparently proper xml is so hard to write these days, that in its shiny new specification, html5 did away with that entirely and went back to the old html model of a stream of junk tags where anything goes and the browser does its best to interpret it.
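As an illustration of how little code a well-formedness check takes with stock XML tools, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name is_well_formed is made up for this example, and it checks well-formedness only, not validity against the XHTML DTD (named entities such as &nbsp; would be rejected without one).

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Properly nested markup such as "<p>ok <b>bold</b></p>" passes, while tag soup such as "<p>unclosed <b>tag</p>" is rejected outright instead of being guessed at.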