Unicode Encoding Flaw Widespread

Posted by kdawson on Monday May 21, 2007 @06:33PM from the sneaking-past-the-IDS dept.

LordNikon writes "According to this CERT advisory: 'Full-width and half-width encoding is a technique for encoding Unicode characters. Various HTTP content scanning systems fail to properly scan full-width/half-width Unicode encoded HTTP traffic. By sending specially-crafted HTTP traffic to a vulnerable content scanning system, an attacker may be able to bypass that content scanning system.' A proof of concept affecting IIS is already being posted to security mailing lists. Cisco IPS and other IDS products are also affected." The CERT advisory lists 93 systems, with 6 reported as vulnerable (including 3Com, Cisco, and Snort), 5 known not vulnerable (including Apple and HP), and the rest unknown.
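
A rough Python sketch of the kind of bypass the advisory describes. Everything here is an assumption made for illustration: the scanner, the backend, and the function names are hypothetical, and the `%uXXXX` escape is a non-standard form that some servers accept. The scanner inspects the raw request, while the backend decodes the escapes and folds fullwidth forms to their ASCII counterparts.

```python
import re
import unicodedata

def decode_percent_u(text: str) -> str:
    """Decode non-standard %uXXXX escapes into the characters they name."""
    return re.sub(r"%u([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), text)

def naive_scanner_allows(raw_request: str) -> bool:
    """Hypothetical content scanner: looks only at the text on the wire."""
    return "<script" not in raw_request.lower()

def backend_view(raw_request: str) -> str:
    """Hypothetical backend: decodes %u escapes, then folds fullwidth forms."""
    return unicodedata.normalize("NFKC", decode_percent_u(raw_request))

# U+FF1C and U+FF1E are the fullwidth forms of '<' and '>'.
request = "/page?q=%uFF1Cscript%uFF1E"
scanner_verdict = naive_scanner_allows(request)
seen_by_backend = backend_view(request)

print(scanner_verdict)   # True
print(seen_by_backend)   # /page?q=<script>
```

The scanner and the backend disagree about what the request contains, which is the whole problem.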

12 of 184 comments

Not a surprise... by gweihir · 2007-05-21 20:22 · Score: 1, Insightful

That Unicode is a very bad idea in all semantics-carrying containers is nothing new. In fact, one of the counterarguments to Unicode is that it is a nightmare to secure. Filter evasion was expected to be a typical security concern. We will see more of this, and all only because some people want features without ever reflecting on what problems they might cause.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
1. Re:Not a surprise... by etnu · 2007-05-21 20:42 · Score: 5, Insightful
  
  You'd prefer securing against vulnerabilities in dozens, if not hundreds, of different encodings? The only people who are against Unicode are those that have never had to work with more than one written language in the same project. Yes, it's a lot easier to secure stuff when you only accept ASCII or ISO8859-1/Windows CP-1252, but then you're limiting your software to about a third of the world (if that). Crappy engineers are going to write crappy code no matter what the encoding. No sense compromising for the sake of poorly written software.
2. Re:Not a surprise... by KiloByte · 2007-05-21 21:23 · Score: 3, Insightful
  
  Wrong, the flaw in Cisco's "security" software and IIS is due to them converting things to 8-bit charsets, not due to Unicode. In fact, the whole idea of "code pages" is fundamentally broken, as it assumes data only ever moves between places in the same region.
  
  The idea of double-width characters is broken too, yeah, and they are there only to appease the users of some broken Chinese/Japanese software -- but there's nothing wrong with having strange characters in file names. They don't match any file they are not supposed to unless you try to shoehorn them into a limited character set.
  
  So, it's a flaw in the software, not Unicode by itself.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
3. Re:Not a surprise... by kahei · 2007-05-21 22:19 · Score: 5, Insightful
  
  Down below this post, there's a troll writing something like 'lol if u cant just use ASCII u shud let ur language die u foreign creeps lol k thx'.
  
  And a whole bunch of people then jump on the troll and criticize him for his US-centrism, and so on, and the troll is at -1.
  
  Yet the post I'm replying to, which is at +4, really comes to the same thing as this troll; it's simply UNIX 8-bit centric rather than USA ASCII centric.
  
  The fact is, computers are used for text, and much if not most text is non-ASCII. How would you rather represent that text:
  
  --With Unicode
  --With KOI-8, KOI-8R, KOI-8RU, EBCDIC, EUC-KR, EUC-JP, shift-JIS, Shift-JIS-the-Jphone-version, ISCII, VISCII, ISO-2022-*, and the many many other encodings that have evolved in different times and environments.
  
  Seriously, which is going to be easier to secure (and otherwise manage) -- one encoding (which is HEAVILY documented and discussed) or a large number of encodings (the actual number being ever-changing and impossible to really know) many of which are not well documented and have forgotten ramifications and assumptions?
  
  Right -- so now you know why people use Unicode so much.
  
  But the interesting question is, why is one error ("All teh world is teh USA lol! Shouldn't you learn to speak English?") rightly jumped on and pounded flat, whereas another form that's actually more problematic ("All teh world is C on UNIX lolz!! Shouldn't you stop wanting dangerous extra features?") isn't?
  
  Actually, I see in another window that some people have indeed been pounding the parent poster flat, so perhaps my question isn't valid after all.
  
  --
  Whence? Hence. Whither? Thither.
Re:Smelly foreigners by Anonymous Coward · 2007-05-21 20:36 · Score: 2, Insightful

To think that even English fits in 7-bit ASCII is naïve.
Nothing to see, move along ... by udippel · 2007-05-21 22:24 · Score: 4, Insightful

It is a vulnerability, in the strict sense.
It is self-inflicted misbehaviour, in the common-sense view.
It is like those silly Cisco content inspectors on port 25, that try to avoid attacks on flimsy MTAs.
It is like someone dying from a jab against measles: the jab protected that person from contracting measles, actually.
It is like those stupid anti-virus programs that are more vulnerable than the daemons they profess to protect.

When the attacker uses a codepage different from the one that you think she ought to use, she can circumvent your content filter. Which ought not be an attack vector, in any case.

As I said: nothing to see, move along ...
Re:Limited impact. by Ravnen · 2007-05-21 22:44 · Score: 2, Insightful

The Network Service account on Windows has similar privileges to a normal user, which means it can't access files owned by other users, but can of course read some files owned by the system. The notion of reserved ports doesn't exist on Windows, so no software makes security assumptions based on whether or not a port is below 1024, and the ability to open port 80 doesn't imply any higher privileges than the ability to open any other port.
At any rate, running in a chroot jail is arguably better in some ways than just running as an unprivileged user. Vista has some sandboxing features, using 'integrity levels' and redirecting various file and registry accesses to a 'virtual store', but I'm not really familiar with them, except for the basics, and I don't know if IIS uses them anyway.
Re:Limited impact. by fatphil · 2007-05-21 23:32 · Score: 4, Insightful

I think you've missed his point. There are now two ways that, for example, a quote character can be passed as user input to your program: either as " or as %ublah.

Your program, sitting below the layer performing the Unicode translations, doesn't need to do anything differently from before, as it doesn't matter which of the two methods was used. If you _relied on_ the layers above you to strip out, reject, escape, or whatever, quote characters, then you're writing teabag code, and should get a job selling flowers instead, as software engineering is beyond you.

Always validate user input to your own specification. Never rely on something external to do it.

This exploit hasn't changed the rules one little bit, it's just highlighted the fact that some idiots don't follow them.
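
A minimal Python sketch of validating at your own layer, assuming a made-up spec (usernames of 1 to 32 ASCII letters, digits, or underscores). The names `USERNAME_SPEC` and `is_valid_username` are hypothetical. The check runs on the fully decoded value and uses an allowlist, so it does not matter how a quote character was encoded on the way in.

```python
import re

# Hypothetical spec for this layer: 1-32 ASCII letters, digits, or underscores.
# The character class is spelled out because \w also matches non-ASCII letters.
USERNAME_SPEC = re.compile(r"[A-Za-z0-9_]{1,32}")

def is_valid_username(value: str) -> bool:
    """Check the fully decoded value against this layer's own allowlist."""
    return USERNAME_SPEC.fullmatch(value) is not None

print(is_valid_username("fatphil"))      # True
print(is_valid_username('bob"; --'))     # False: ASCII quote
print(is_valid_username("bob\uff02"))    # False: fullwidth quote U+FF02
```

Both forms of the quote are rejected because neither is on the allowlist, not because the code knows about fullwidth characters.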

--
Also FatPhil on SoylentNews, id 863
Re:Limited impact. by CastrTroy · 2007-05-22 00:57 · Score: 2, Insightful

Is this another problem with unescaped quotes? When will people learn? Not an hour goes by that a system doesn't get attacked by SQL injection attacks. Why do programmers continue to not use things like prepared statements, which are invulnerable to such attacks? I blame it on the people writing the tutorials. Every beginner tutorial on the web shows queries being constructed at runtime, and doesn't have any mention of how insecure doing things like this is. It's hard to break the habit once you've been programming like that for so long.
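
For anyone who has only seen the tutorial style, here is a small sketch of the difference using Python's standard `sqlite3` module. The table and the hostile input are invented for the example. With a `?` placeholder the value is bound as data and never reaches the SQL parser as query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

hostile = "x' OR '1'='1"

# Tutorial style: the query is assembled at runtime, so the quote in the
# input closes the string literal and the rest is parsed as SQL.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized style: the input is bound to the placeholder as a plain value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(unsafe_rows)  # [('alice',)]
print(safe_rows)    # []
```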

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Limited impact. by jZnat · 2007-05-22 02:56 · Score: 2, Insightful

Well, the way I see it, there are three ways to handle Unicode characters (one of which is wrong): store as full two-byte Unicode values (inefficient when using mostly ASCII characters like in English), store in a UTF character set such as UTF-8 (useful for primarily ASCII text as it is a superset of ASCII), or pretend it isn't Unicode and treat it as two (or three if input is in UTF-8 for example) separate ASCII characters (bad).

So, perhaps if data was all stored and represented in UTF-8, for example, this wouldn't be a problem? Or perhaps stored as raw Unicode characters via wchar_t (or language equivalent like u"" in Python)?
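
To make the three options concrete, a short Python sketch. It assumes UTF-16 big-endian as a stand-in for the "two-byte" option and Latin-1 as the stand-in for "pretend it isn't Unicode"; both choices are mine, picked only for illustration.

```python
samples = ["A", "<", "\uff1c", "\u20ac"]  # 'A', '<', fullwidth '<', euro sign

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    print(f"U+{ord(ch):04X}  utf-8={utf8.hex()}  utf-16={utf16.hex()}")

# The "bad" option: read the UTF-8 bytes of one character as if each byte
# were a separate single-byte character.
fullwidth_lt = "\uff1c"
utf8_bytes = fullwidth_lt.encode("utf-8")
misread = utf8_bytes.decode("latin-1")

print(len(fullwidth_lt), len(utf8_bytes), len(misread))  # 1 3 3
```

ASCII characters stay one byte in UTF-8 and take two in UTF-16, while the fullwidth less-than sign takes three bytes in UTF-8 and turns into three unrelated characters when misread.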

--
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
Re:IIS's fault by phasm42 · 2007-05-22 11:31 · Score: 2, Insightful

here are 2 ways of producing the < glyph: you can use character code x8B or xFF1C.
Shouldn't that be x3C?
I'm not sure if that's right or wrong, if there is a right and wrong way to handle this issue (I suppose that means it's excellent grounds for a religious war)--it's just important that it be handled consistently.
I thought about this a little more, and I think the difference will be in what it is used for. In HTML, the "<" glyph has a special meaning, so it makes sense that a different version (in this case, full-width) of the character should have a different meaning. From an application perspective, perhaps they should be the same. IIS translates the full-width version to the regular version, probably reasoning that if a full-width angle bracket was submitted to the webserver, such as in "<something@somewhere.com>", a regular one was intended. However, this isn't a safe assumption, which leads me to another question -- anyone know if this is optional behavior in IIS, and if so, is it defaulted to on or off?
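
A Python sketch of the folding being discussed, not a claim about how IIS implements it. The fullwidth forms U+FF01 through U+FF5E mirror ASCII U+0021 through U+007E at a fixed offset of 0xFEE0, so fullwidth `<` (U+FF1C) maps to U+003C. The helper name `fold_fullwidth` is hypothetical.

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror U+0021..U+007E

def fold_fullwidth(text: str) -> str:
    """Map fullwidth ASCII variants to their ordinary ASCII counterparts."""
    return "".join(
        chr(ord(ch) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )

sample = "\uff1csomething@somewhere.com\uff1e"
folded = fold_fullwidth(sample)

print(hex(ord(folded[0])))                             # 0x3c
print(folded)                                          # <something@somewhere.com>
print(folded == unicodedata.normalize("NFKC", sample)) # True
```

Unicode's NFKC normalization gives the same result for this input, which is one standard way such folding ends up happening without anyone writing it by hand.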

--
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
Re:half-wit encoding? by HeroreV · 2007-05-22 12:13 · Score: 2, Insightful

UTF-16 (fixed-width double byte encoding)
UTF-16 is a variable-width encoding. Code points from plane 0 are encoded in 16 bits, and code points from planes 1 through 16 are encoded as a pair of 16-bit surrogates. Many developers, like you, aren't aware of this, so it's very common for software to choke on UTF-16 with surrogate pairs.
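
A quick Python check of the surrogate arithmetic, using U+1F600 as an arbitrary example from plane 1. The helper `surrogate_pair` is a hypothetical name for this sketch.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

bmp_char = "\u00e9"          # plane 0
astral_char = "\U0001F600"   # plane 1

bmp_len = len(bmp_char.encode("utf-16-be"))
astral_len = len(astral_char.encode("utf-16-be"))
high, low = surrogate_pair(ord(astral_char))

print(bmp_len, astral_len)                     # 2 4
print(hex(high), hex(low))                     # 0xd83d 0xde00
print(astral_char.encode("utf-16-be").hex())   # d83dde00
```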
I don't understand how mistaking one character for another is going to break anything
Scenario:
1) You escape a Unicode string that contains fullwidth characters. The fullwidth characters have no special properties, so they aren't escaped.
2) You translate the escaped Unicode string into ASCII. Fullwidth characters are translated into halfwidth characters. Some of those halfwidth characters, like quotes, have special properties.

The fullwidth quote was perfectly safe, because it wasn't treated like a quote. It was treated the same as an "A" or "b". But when it was translated to a "normal" quote, it went from being a plain old character to being a quote character, with a completely different meaning.

The lesson here is that you should never translate fullwidth characters into halfwidth characters unless you know whether they should be escaped or not, and you should escape them during translation if they need to be. Also, it's not a good idea to translate an escaped string between character sets.
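
The two-step scenario above can be sketched in Python. This assumes HTML escaping via the standard `html.escape` and uses NFKC normalization as a stand-in for the translation step; the function name `translate_to_halfwidth` is hypothetical.

```python
import html
import unicodedata

def translate_to_halfwidth(text: str) -> str:
    """Stand-in for the charset translation: folds fullwidth forms to ASCII."""
    return unicodedata.normalize("NFKC", text)

user_input = "say \uff02hi\uff02"  # U+FF02 is the fullwidth quotation mark

# Wrong order: escaping sees no special characters, then the translation
# turns the fullwidth quotes into real ones.
escaped_then_translated = translate_to_halfwidth(html.escape(user_input))

# Safer order: translate first, then escape what is actually there.
translated_then_escaped = html.escape(translate_to_halfwidth(user_input))

print(escaped_then_translated)   # say "hi"
print(translated_then_escaped)   # say &quot;hi&quot;
```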