XSS Vulnerabilities Reviewed and Re-Classified
An anonymous reader writes "Security Analysts at NeoSmart Technologies have revisited the now-famous XSS-type security vulnerabilities and attempted to re-classify their status as a security vulnerability. The argument is that XSS vulnerabilities are not a mark of bad or insecure code but rather a nasty but unavoidable risk that's a part of JavaScript - and that even then, XSS 'vulnerable' sites are no less dangerous or vulnerable at heart." Are they unavoidable, or just a symptom of lazy coding, or both?
I work for a bank. A hacker found one page in the Internet banking system that echoed a value from a form into an error message. They then used this to inject some JavaScript which gathered user logons. They managed to acumulate about $70,000 of fools money into a holding account before they were caught. I don't feel particulay sorry for fools who fall for phishing scams but it was still a security hole in the web application that could simply be avoided by an echoding of values before echoing to the page. Since then all code is audited for SQL injection and XSS by an external company before being relesed to production.
XSS is a real security threat.
"By simply turning > into >
Rubbish. That's one of the most basic errors made when people start trying to filter out XSS. Suppose you have a form that takes a user's name and then uses it in a hidden field on the next page ? You could quide easily do something like :
UserName" style="background:url(javascript:alert('Getting rid of angled brackets won't help you here'))
Not an angled bracket in there, yet on most systems that'll work and display a popup. Hence the reason it's really not that simple, and the parent post referrs to "an arms race against the latest techniques"
Yet even this can be too simplistic, since there may be other things that's happening in the background.
The first book to deal with this properly that I ever saw was Innocent Code by Sverre H. Huseby (ISBN 0-470-85744-7, Wiley).
I recommend this not only to people new to web programming, but also to seasoned programmers. There's more than one time that I've heard people say "pfah, I know the pit traps, I don't need this book", and a few weeks later tell me that there were things there they hadn't thought about.
The book is concise and to the point.
Note: I'm not neutral about this book; I was one of the people who read through the book and commented on it before publishing time, and Sverre is one of my friends.
As someone else has pointed out, that's a naïve and incorrect approach.
HTML is a standard. BBcode is a whim. HTML wins for its ubiquity. BBcode gives you nothing.
People that don't think they can effectively and safely include HTML content from untrusted sources are not viewing the problem in a formal way. Address the cause, not the symptom.
The cause is not thinking of and treating your HTML input as structured data. Rather, you're thinking of it as a character stream. Textual substitutions are a sign of that line of thought.
Your user's HTML content is a tree structure. Parse it. Then filter out all elements that are not in your allowed-elements list. Filter out all element attributes that are not in your allowed-attributes lists. Construct these lists by examining the HTML specification and considering the risks of each element or attribute.
Take it a step further. For each attribute value that contains a URI, parse that URI using a formal grammar. Filter out all URI schemes ("http", "ftp", etc) that are not in your allowed-schemes list. Certain characters, like non-printables, should never occur in a URI directly—signal an exception to the user to inform them of their error. Don't just stop if you don't find anything wrong! Reconstruct the URI from its constituent parts and replace the original with your sanitized version.
Likewise, formally parse all CSS code: in referenced external stylesheets, embedded stylesheets, and in style attributes. Filter out anything not explicitly allowed. Replace any URIs with the output of the same URI-sanitization function above. Reserialize the content. (This is hard; drop all CSS as a short-cut.)
When you're done, you'll serialize the HTML document and transmit that to your clients. I guarantee that this will eliminate XSS problems stemming from Internet Explorer incorrectly interpreting malformed HTML, CSS, or URIs. There are other attack vectors; be careful of what you allow to be included inline with documents, or linked to. (Think Flash.)
This is the correct solution, and most flexible to your users. It's not another idiosyncratic language to learn. It's the world standard for rich textual documents on the World Wide Web.
Unfortunately, it requires work.