Microsoft Downplays IIS Bug Threat
snydeq writes "Microsoft confirmed that its IIS Web-server software contains a vulnerability that could let attackers steal data, but downplayed the threat, saying 'only a specific IIS configuration is at risk from this vulnerability.' The flaw, which involves how Microsoft's software processes Unicode tokens, has been found to give attackers a way to view protected files on IIS Web servers without authorization. The vulnerability, exposed by Nikolaos Rangos, could be used to upload files as well. Affecting IIS 6 users who have enabled WebDAV for sharing documents via the Web, the flaw is currently being exploited in online attacks, according to CERT, and is reminiscent of the well-known IIS Unicode path traversal issue of 2001, one of the worst Windows vulnerabilities of the past decade."
Is Microsoft 'correct' in downplaying this, in the sense that the vulnerable configuration mentioned (IIS 6 with WebDAV enabled) is not used by many?
Serious question: has the Apache package had any vulnerabilities as bad as this in the past ten years?
It sounds like the basic cause is something attempting to translate a string into "unicode" before using it.
For some reason, normally intelligent programmers turn into complete morons when presented with UTF-8 and other Unicode encodings. They become convinced that it is somehow physically impossible to do anything to these strings without first finding all the "characters" (actually Unicode code points, which are not "characters"), and will write pages and pages of elaborate, bug-prone code to do this and "count characters". This code is COMPLICATED, and there is the basic fact that the mapping is often not 1:1; even when it is, different implementations vary and thus don't invert correctly. This causes bugs, nasty ones like the one you can see right here.
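To make the "doesn't invert correctly" point concrete, here is a minimal Python sketch. It is only an illustration of lossy translation in general, not a reconstruction of what IIS does; the helper names round_trip and normalize_bytes are made up for this example.

```python
import unicodedata


def round_trip(raw: bytes) -> bytes:
    """Decode as UTF-8 (replacing invalid sequences), then encode back."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def normalize_bytes(raw: bytes) -> bytes:
    """Decode valid UTF-8, apply NFC normalization, then encode back."""
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


# Valid, already-composed UTF-8 survives the round trip unchanged.
valid = "café".encode("utf-8")
print(round_trip(valid) == valid)

# Invalid bytes are replaced by U+FFFD, so two different inputs collapse
# into the same output and the original bytes cannot be recovered.
print(round_trip(b"caf\xff") == round_trip(b"caf\xfe"))

# Normalization changes the bytes too: 'e' + combining acute accent
# comes back as a single precomposed code point.
decomposed = "cafe\u0301".encode("utf-8")
print(normalize_bytes(decomposed) != decomposed)
```

Any check done on the bytes before such a translation says nothing reliable about the bytes that come out after it, which is the general shape of this class of bug.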
In fact it would be trivial to just treat it as a string of bytes that happens to maybe represent some text. The ONLY times you need "characters" are when you are rendering the string into an image that humans will look at, or when you want to do semantic analysis such as grammar checking. They are not needed if you are looking for the period that starts the extension or trying to find a number.
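A rough sketch of what I mean, in Python, working on the raw bytes and never decoding. It relies on one property of UTF-8: bytes below 0x80 only ever stand for ASCII, so an ASCII '.', '/' or digit can never appear inside a multi-byte sequence. The function names extension and first_number are just for this example.

```python
import re


def extension(name: bytes) -> bytes:
    """Return the extension (with its dot) of a UTF-8 file name, or b''."""
    base = name[name.rfind(b"/") + 1:]   # strip any directory part
    dot = base.rfind(b".")
    # dot == -1: no period at all; dot == 0: a dotfile such as ".bashrc"
    return base[dot:] if dot > 0 else b""


def first_number(data: bytes):
    """Return the first run of ASCII digits as an int, or None."""
    match = re.search(rb"[0-9]+", data)
    return int(match.group()) if match else None


print(extension("日本語.txt".encode("utf-8")))
print(extension("/var/www/résumé.tar.gz".encode("utf-8")))
print(first_number("価格 1200 円".encode("utf-8")))
```

Nothing here ever finds or counts "characters", and the non-ASCII parts of the name pass through untouched.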
What is really sad and mysterious is that this disease only seems to be triggered by UTF-8. Nobody worries about finding the boundaries between "words". Nobody seems to worry about UTF-16 surrogate pairs, and nobody was really concerned with older Japanese multi-byte encodings.
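For anyone who thinks UTF-16 escapes the problem, a small Python sketch (assuming Python 3.3 or later, where len() on a str counts code points; the helper names are made up for this example) shows that it is a variable-width encoding too:

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

print(len(clef))               # code points
print(utf16_code_units(clef))  # a surrogate pair, so more than one unit
print(utf8_bytes(clef))
```

One code point, two UTF-16 code units, four UTF-8 bytes. Code that assumes one 16-bit unit per "character" has exactly the problem people worry about with UTF-8.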
This is NOT Microsoft-specific, so don't feel complacent. Microsoft's moronic decision to name files with UTF-16 is really bad, but witness open source Python 3.0, which has decided that all strings will have to be converted to "unicode" (actually UTF-16 or UTF-32 depending on the platform) before anything is done to them. Python is heavily used to parse HTML and URLs, and I expect a huge mess from this stupid idea.
I'm sure there will be a few responses claiming some magical property of "characters" so that you can't do anything about it. PLEASE, try some thought experiments. Try substituting "words" in your example: it will either be stupid, or you will realize that only a tiny portion of software needs it. Go and write some code where you leave the strings in UTF-8, and maybe you will learn.