Unicode Encoding Flaw Widespread

Posted by kdawson on Monday May 21, 2007 @06:33PM from the sneaking-past-the-IDS dept.

LordNikon writes "According to this CERT advisory: 'Full-width and half-width encoding is a technique for encoding Unicode characters. Various HTTP content scanning systems fail to properly scan full-width/half-width Unicode encoded HTTP traffic. By sending specially-crafted HTTP traffic to a vulnerable content scanning system, an attacker may be able to bypass that content scanning system.' A proof of concept affecting IIS is already being posted to security mailing lists. Cisco IPS and other IDS products are also affected." The CERT advisory lists 93 systems, with 6 reported as vulnerable (including 3com, Cisco, and Snort), 5 known not vulnerable (including Apple and HP), and the rest unknown.
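To make the bypass concrete, here is a minimal sketch of why a byte-for-byte signature match misses full-width text. The signature, the sample path, and the `naive_scan` helper are hypothetical illustrations, not taken from the CERT advisory or any real IDS. The only fact relied on is that the full-width forms U+FF01 through U+FF5E mirror ASCII U+0021 through U+007E at a fixed offset.

```python
# Sketch only: the signature and scanner are hypothetical, not from any real product.
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror ASCII U+0021..U+007E


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII (except space) with its full-width twin."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )


def naive_scan(payload: str, signature: str = "cmd.exe") -> bool:
    """Return True if the raw payload contains the signature."""
    return signature in payload


plain = "/scripts/cmd.exe"
disguised = to_fullwidth(plain)

print(naive_scan(plain))      # True: the scanner sees the signature
print(naive_scan(disguised))  # False: same text after folding, but no raw match
```

A server that folds full-width characters back to ASCII after the scanner has looked at the request would still act on the disguised path, which is the gap the advisory describes.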

Limited impact. by shird · 2007-05-21 18:38 · Score: 3, Informative

This appears to be limited to content scanning and isn't really a vulnerability in itself. Relying on content scanning to prevent an exploit from reaching an exploitable system is a pretty bad idea; it is much better to fix the system than to lean on the extra layer of defense on the outside.

Content scanning is mostly useful for filtering known exploits and is hardly meant to be your primary defense. Being able to bypass this scanning won't buy you much. If the content scanner is aware of an exploit it scans for, chances are the targeted systems are too, and have been patched to protect against it.

--
I.O.U One Sig.
1. Re:Limited impact. by TheRaven64 · 2007-05-21 22:27 · Score: 4, Informative
  
  Windows makes no distinction between privileged and unprivileged ports, so any application that can open sockets can listen on port 80. That said, every port number (and every other object in the NT kernel) has an associated ACL, so it is possible to limit them on an individual basis. I've never seen this exposed to the UI though, so I've no idea how you'd go about doing it. Filesystem objects also have ACLs, so I'd imagine that IIS is not allowed access to the filesystem outside the tree it is sharing.
  The NT kernel provides a lot of facilities that are very useful for writing secure code. I often wonder if the application developers at Microsoft ever noticed that they weren't writing code on top of DOS anymore...
  
  --
  I am TheRaven on Soylent News
2. Re:Limited impact. by flydpnkrtn · 2007-05-22 01:03 · Score: 2, Informative
  
  I think he meant getting to "port object permissions" on a programmatic level... with an API. What you are describing are filesystem Access Control Lists. He's talking about using ACLs on ports. Everything being an object in NT, and being able to have ACLs applied to "everything," is a good idea. As the grandparent said, the application developers at MS just have to use them.
  
  Basically the "Security tab" you see for files could be applied to individual ports.
  
  --
  Here's to the crazy ones
3. Re:Limited impact. by rabtech · 2007-05-22 03:11 · Score: 4, Informative
  
  The NT kernel has a root namespace for everything in the system (from local filesystems to network drives to sockets to synchronization objects like mutexes), and in fact treats everything as a file (just like Unix) underneath.
  
  Using the Native (NT Executive) API you can read or set the ACL on any object in the namespace, assuming you have the appropriate user rights and you own the object (or the ACL allows you to modify the permissions). NT kernel objects can also be case-sensitive (though that can confuse some Win32 programs). Often, you can delete, move, etc. files that are locked by the Win32 subsystem, which can be useful in certain situations (though in Vista they made the IO system capable of cancelling outstanding IOs on its own, so the zombie process bug that ends up locking files doesn't happen anymore. It's unfortunate Vista is so DRM-laden, or I'd try upgrading.)
  
  The APIs are NtQuerySecurityObject and NtSetSecurityObject and I believe the devices are in \Device\Tcp, \Device\Ip, \Device\RawIp, \Device\Udp, etc. Check out http://undocumented.ntinternals.net/ for more details on what is in the native API (ntdll). This API provides everything necessary to implement a full POSIX layer, which is exactly what Services for Unix does, installing itself as a new runtime subsystem right next to the Win32 subsystem. (With Server 2003 R2 SP2 they shipped it as an available component as part of the install; I've even got setuid support and GCC installed as part of the package.)
  
  --
  Natural != (nontoxic || beneficial)
4. Re:Limited impact. by Anonymous Coward · 2007-05-22 11:01 · Score: 1, Informative
  
  "The notion of reserved ports doesn't exist on Windows" - by Ravnen (823845) on Tuesday May 22, @06:44AM (#19218927)
  
  Check this then:
  
  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  
  And, there? Check the "RESERVED PORTS" parameter... here is documentation (scant) on it from MS:
  
  How to reserve a range of ephemeral ports on a computer that is running Windows Server 2003 or Windows 2000 Server
  
  http://support.microsoft.com/kb/812873
  
  Apparently, this does exist, albeit apparently ONLY for "ephemeral ports" (short-lived ones).
  
  (AND, iirc, UDP based ones only are used for this value afaik - the reason I make that statement, is when I have attempted to perform UDP filtering, it NEVER works out right & I have connection problems, when I attempted to use that on UDP!)
  
  I outlined the port filtering setup here (which is WHY I mention that I let everything in on UDP, instead of limiting it as I did on TCP to ports 80/8080/443 only):
  
  http://it.slashdot.org/comments.pl?sid=235621&threshold=-1&commentsort=0&mode=thread&cid=19221131
  
  IP PortFiltering is done here/HOW TO, STEP-by-STEP:
  
  Start Button -> Control Panel -> Network Connections -> Local Area Connection (or whatever you called yours) -> Properties Button -> (Next popup dialog screen) -> Highlight "Internet Protocol (TCP/IP)" -> Click the PROPERTIES button -> Click the ADVANCED button @ the bottom of this screen -> Go to the OPTIONS tab & highlight TCP/IP Filtering & click the PROPERTIES button -> Check off "ENABLE TCP/IP Filtering on ALL Adapters" -> Permit only (add ports as you need to here)
  
  E.G./I.E. -> In the TCP list section, I leave 80/8080/443, for my personal home use @ least. In the UDP list I let all pass thru, & in the IP protocols list, I only allow 6 (TCP) & 17 (UDP).
  
  (Any feedback on this note is appreciated. I can learn from you all, like anyone else.)
  
  APK
Send your claim in now by Anonymous Coward · 2007-05-21 18:42 · Score: 1, Informative

Quick! Claim the $16,000! http://it.slashdot.org/article.pl?sid=07/05/18/189208

"It's very hard to exploit [those listed applications]," Aitel said. "IIS 6 hasn't had a public remotely exploitable bug in it. Ever."

Doh!
"Not vulnerable" by iamacat · 2007-05-21 19:20 · Score: 2, Informative

According to the advisory, Apple products do not provide HTTP content filtering and are therefore not vulnerable. This will do nothing to help someone build a functioning protection system.
Re:Apple and HP? by Anonymous Coward · 2007-05-21 20:38 · Score: 1, Informative

Their comment on Apple is that they don't have anything that provides this functionality, so there's nothing that can be vulnerable.
Re:Depends on alphabet size by TheRaven64 · 2007-05-21 22:24 · Score: 2, Informative

Chinese ideographs are so numerous and difficult to remember that they are considered one of the reasons for China's incredibly low literacy rate. If you want some evidence of this, then take a look at what happened to Korea when it dropped the Chinese ideograms in favour of a new, home-grown phonogram-based alphabet.

--
I am TheRaven on Soylent News
Re:Not a surprise... by gnasher719 · 2007-05-21 23:46 · Score: 2, Informative

Unicode is of course not the problem at all.

The problem is using character sets that can represent huge amounts of different characters, and among them characters that have similar looking glyphs. That is at the same time a feature that people really really want.

So spam filters will have a problem. They filter out "Viagra" but they don't filter out sequences of letters that look the same. Well, tough. If you follow the rule not to follow any links in emails but type them in yourself, that gets you mostly around it.
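A rough sketch of the lookalike problem, assuming a filter that just does a substring match on a blocked word. The `lookalike` string is a made-up example that swaps the Latin "a" for the Cyrillic letter U+0430, which most fonts render identically. Note that Unicode normalization does not help here, because the Cyrillic letter is a distinct character and not a variant form of the Latin one.

```python
import unicodedata

blocked = "Viagra"
# Hypothetical spam text: both "a" letters are CYRILLIC SMALL LETTER A (U+0430).
lookalike = "Vi\u0430gr\u0430"

print(blocked in lookalike)                                 # False
print(blocked in unicodedata.normalize("NFKC", lookalike))  # False
print(unicodedata.name(lookalike[2]))                       # CYRILLIC SMALL LETTER A
```

Catching this class of trick needs a confusables table or a mixed-script check, which is a different mechanism from normalization.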

The other "problem" is filtering to prevent SQL injection and all that crap. There I'd have to say two things: 1. It is just common sense if you accept Unicode to translate it into a canonical form first, either precomposed canonical or predecomposed canonical (by the way, predecomposed canonical UTF8 is what the MacOS X file system uses). Once that is done, nothing unexpected should slip through. 2. Why would you need to filter out anything at all? This is a completely brain-damaged approach in the first place, using user input to form commands that could potentially be dangerous and filtering out user input that would produce dangerous commands. Instead, there shouldn't be any commands that could be dangerous in the first place.
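Both points can be sketched in a few lines of Python. One caveat on point 1: the canonical forms (NFC and NFD) leave full-width letters untouched, so folding them to ASCII takes a compatibility form such as NFKC. The `canonicalize` helper, the table, and the sample strings are assumptions made up for illustration.

```python
import sqlite3
import unicodedata


def canonicalize(text: str) -> str:
    """Fold compatibility variants (such as full-width letters) before filtering."""
    return unicodedata.normalize("NFKC", text)


fullwidth = "\uff33\uff25\uff2c\uff25\uff23\uff34"  # full-width "SELECT"
print(unicodedata.normalize("NFC", fullwidth) == "SELECT")  # False
print(canonicalize(fullwidth) == "SELECT")                  # True

# Point 2: pass user input as data, never as part of the command text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

hostile = "alice' OR '1'='1"
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(rows)  # [] because the whole string is compared as a literal name
```

With the placeholder, the hostile string never becomes part of the SQL text, so there is nothing for a filter to miss.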
Re:Smelly foreigners by vtcodger · 2007-05-21 23:55 · Score: 2, Informative

***Would some of the things that led to computers - morse code, telegraphy etc have been feasible using, say, Chinese in its normal written form?***
The answer would seem to be -- sort of ... maybe. See http://www.njstar.com/tools/telecode/jim-reeds-ctc.htm.
Summary: For telegraphy, Chinese characters are assigned numeric codes in radical-stroke count order. That's the way that Japanese, and -- I assume -- Chinese, dictionaries, are arranged.
It may seem inefficient to use 20 bits (sort of) to encode a character, but remember that each character is a word, not a letter, and that composite words like "Beijing" or "paleontology" are only two words. That means that most "words" will be either 2.5 or 5 eight "bit" characters. Conventional telegraphy is really a trinary rather than a binary code -- pause, short, and long, and the 'digits' differ in length -- so bit count isn't really all that accurate an analogy.
So, no, the Chinese language probably wouldn't have made the development of computers by the Chinese all that much more difficult than European languages did. And the classic Chinese numeric notation is not as convenient as 'arabic' notation. But it's much less unwieldy than say Roman numerals, so I don't think it would have been an insurmountable hurdle either.

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
Re:Another likely example of OSS? by Frankie70 · 2007-05-22 02:51 · Score: 2, Informative

Apparently, Vista's networking stack has been rewritten from scratch -- which does make you wonder how much of the reason for that was technical, and how much was MS wanting to be seen to get rid of all the BSD/*nix code in Windows in preparation for their patent offensive...

Why should using BSD code come in the way of their patent offensive?
Using BSD code isn't infringing on BSD's or someone else's patent.
Re:Another likely example of OSS? by Anonymous Coward · 2007-05-22 04:02 · Score: 1, Informative

When NT 3.1 first shipped in 1993, Microsoft did not have the resources to develop all of the network stacks. They wrote the NetBIOS stack but licensed the OSI stack from a 3rd-party and contracted the TCP/IP stack from another 3rd-party. This TCP/IP contractor used the BSD code for expediency, which is the source of the rumor.

However, the BSD stack required emulation of the STREAMS interface which was pretty inefficient. For the NT 3.5 release in 1994 they wrote their own TCP/IP stack in-house, which is the same stack that shipped with Win95.

MS hasn't shipped the BSD TCP/IP stack for 14 years. The reason they rewrote it for Vista is to incorporate IPv6. With XP, you had to install the v6 stack separately.

dom
Re:IIS's fault by spitzak · 2007-05-22 08:43 · Score: 2, Informative

They are there for compatibility with some Japanese and Chinese character sets, which contained most of the ASCII characters in both "half" and "full width" forms. The full-width ones were twice as wide to match the square characters, which was useful for lining up columns.

This is all pointless now with proportionally-spaced fonts (and with multiple fonts, you could easily select the "wide" font to print those characters instead). However, Unicode had as a design requirement that translating from any common encoding to Unicode and back again would be the identity transform. Thus if any character set existed with two ways of representing the same character, then there had to be two ways to represent it in Unicode. Hence the full-width characters. This is also why Unicode has hundreds of random accented characters even though the combining characters would allow all of them to be represented easily with only a few dozen characters.
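The round-trip requirement can be checked with a short sketch, assuming Shift_JIS as the legacy encoding (my choice of example; the comment above does not name one). The `round_trips` helper is hypothetical. The second half shows the accented-character point: a precomposed letter and its base-plus-combining-mark spelling are different code point sequences that normalization treats as equivalent.

```python
import unicodedata


def round_trips(ch: str, codec: str = "shift_jis") -> bool:
    """True if encoding to the legacy codec and decoding again returns ch unchanged."""
    return ch.encode(codec).decode(codec) == ch


narrow, wide = "A", "\uff21"  # ASCII A and FULLWIDTH LATIN CAPITAL LETTER A

# Both survive the round trip, and the legacy bytes differ, so Unicode
# needs two distinct code points to keep the transform an identity.
print(round_trips(narrow), round_trips(wide))
print(narrow.encode("shift_jis") != wide.encode("shift_jis"))

precomposed = "\u00e9"  # e with acute as a single code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
print(precomposed == combining)                                # False
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```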
Re:Depends on alphabet size by Anonymous Coward · 2007-05-22 09:00 · Score: 1, Informative

The word you're looking for is "romaji"

As for Japanese, the language needs to change severely to make all-alphabet practical. They have homophones like you wouldn't believe (comes from the small number of sounds the language uses). It's very easy to write a sentence that, even in context, could mean multiple widely varying things if you don't have kanji to indicate meaning.
Re:Depends on alphabet size by ShakaUVM · 2007-05-22 09:05 · Score: 2, Informative

You're missing the key roadblock to simply replacing characters with pinyin, or any other romanization: Chinese is a heavily overloaded language. While there are a few homophones in English, *every* word in Chinese is a homophone, with something like 13 different homophones per sound on some of them. We differentiate some homophones by writing them differently (layed, laid, etc.); pinyin *cannot* differentiate these homophones, since it's an exact transcription of the sound. Chinese differentiate their written words with characters. When having a conversation you can get by with spoken Chinese or pinyin, since you can always ask the other person which character they meant if there's confusion, and Chinese will make do with pinyin in a pinch, but it's more or less impossible to ask them to switch to a romanization for all purposes.
Re:Smelly foreigners by jc42 · 2007-05-22 10:19 · Score: 2, Informative

The notable difference between Chinese and English (or most other written languages) is that several English characters combine to form syllables, which combine to form words (i.e., we use an alphabet). In Chinese, each character corresponds directly with a word (each character is a logogram).

Actually, this is pretty much a myth that originated from people with very little knowledge of Chinese language and writing. In all the Chinese languages ("dialects";-), most of the vocabulary is two-syllable words, as in English. Three-syllable words aren't uncommon. The writing system is actually a sort of syllabary, and the meaning of most two-character words can't be inferred from knowing what the syllables mean as standalone words.

It's similar to how lots of English words, e.g. "insight", can be parsed as two words ("in"+"sight"), but this doesn't really help you understand what the word actually means. Or, an example that shows how such things evolve is the English word "upstairs". If I say I'm going upstairs and take the elevator, did I lie to you? Of course not, because "upstairs" doesn't mean going up stairs. It did a few centuries ago, but hasn't meant that during the lifetime of anyone alive now. Similarly, proto-Chinese of N thousand years ago may have been mostly single-syllable words, but this hasn't been true for at least the few thousand years that we have readable examples of the writing system.

For a Mandarin example, which I'll write in pinyin (or pin1yin1;-) to get past the /. filters, consider the word zi4ran2. The zi4 syllable is a word, and means "from" or "since" (and is also used like "-ly" to form adverbs). The ran2 syllable is also a word, and basically means "correct" or "yes". The zi4ran2 combination means "nature" or "naturally". Like "insight", you might be able to kludge some sort of connection here, but in reality you just have to learn zi4ran2 as a separate word unrelated to its two syllables. It may have been a two-word idiom several thousand years ago; it's a two-syllable word now.

For an entertaining debunking of both this myth and a very common trope among Western pseudo-intellectuals and pop psychologists, read this article at languagelog. After chuckling at that particular bit of silliness about Chinese writing, you can find other articles there that go into the general problem in more detail. A number of experts in East-Asian linguistics regularly contribute to that blog, and they've been pushing for a campaign to debunk the nonsense that Westerners insist on saying about these languages.

Oh, well; I haven't yet heard any claim that Chinese doesn't have a word for "freedom". But I wouldn't be surprised. (Hint: the word starts with the same character as the above "zi4ran2", but has a different second character. ;-)

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Smelly foreigners by jc42 · 2007-05-23 03:23 · Score: 2, Informative

[T]he classic Chinese numeric notation is not as convenient as 'arabic' notation. But it's much less unwieldy than say Roman numerals, so I don't think it would have been an insurmountable hurdle either.

Actually, classical Chinese numbers are only slightly worse than Arabic notation (which apparently developed in India but was spread by Arab traders who knew a good accounting system when they saw it). The Chinese notation was far better than any of the Western number notations that the Arabic notation supplanted, such as the Greek or Hebrew notation. Roman was probably the worst notation ever invented, and nobody ever really used it for accounting.

The basis of the Chinese system was symbols for 1 to 9, and symbols for powers of 10. To illustrate with ascii characters, the symbol for 10 looks like a large '+' sign, so we can use + for 10, H for hundred, T for thousand. We'd write the number 5347 as 5T3H4+7. Unused powers of 10 are omitted, so 2007 would be 2T7. 1024 would be T2+4 or 1T2+4. And so on. There are symbols for a few more powers of 10, and they can be chained to get higher powers of 10, so HT could be used for 100,000.
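The ASCII stand-in notation above is simple enough to sketch as code. The function name and the limit to numbers below 10,000 are my own choices for illustration; the symbols +, H and T are the ones used in the paragraph above, and this version always writes the leading digit (1T2+4 and not T2+4).

```python
# ASCII stand-ins from the comment: T = thousand, H = hundred, + = ten.
POWERS = [(1000, "T"), (100, "H"), (10, "+")]


def to_ascii_chinese(n: int) -> str:
    """Write n as digit/power pairs, omitting unused powers of ten."""
    if not 0 < n < 10000:
        raise ValueError("this sketch only handles 1..9999")
    parts = []
    for value, symbol in POWERS:
        digit, n = divmod(n, value)
        if digit:
            parts.append(f"{digit}{symbol}")
    if n:  # leftover units digit
        parts.append(str(n))
    return "".join(parts)


print(to_ascii_chinese(5347))  # 5T3H4+7
print(to_ascii_chinese(2007))  # 2T7
print(to_ascii_chinese(1024))  # 1T2+4
```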

Nowadays, most numeric work in East Asia is done using the Western version of Arabic notation. But you also see a hybrid form that uses the Chinese 1-9 characters plus the Western 0. Converting between this notation and the traditional Chinese notation is essentially trivial and can be done as fast as you can write the numbers. But for arithmetic on paper, the Arabic form (or Arabic with Chinese digits) is a bit simpler than the traditional Chinese notation, since using 0 as a place holder results in correct alignment in columns of numbers, and the digits 1-9 are a bit faster to write than the Chinese digits.

An interesting aspect of the Chinese system is that the basic symbols have alternate "fancy" forms with a lot more strokes. These characters have the property that you can't add strokes to convert them to a different character. So they're an anti-tampering, fraud-proof way of writing numbers. I don't know of another numeric notation with this feature. Asian financial documents have historically used these fancy forms of numbers.

Actually, the Chinese and Arabic notations are the 3rd and 2nd easiest numeric notation that various societies have invented. A few years ago, Scientific American had an interesting article explaining the Mayan number system, and included an explanation of why it was a lot easier to use than the Arabic system. For example, instead of the big multiplication table that we memorized in school, the Mayan system really only needs one rule: 5x5=15. (This makes sense if you understand that they used base 20.) The rest of the rules for adding, subtracting and multiplying consist of the techniques for "carrying" and "borrowing", and are essentially similar to what you do with an abacus.
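The "5x5=15" rule is just 25 written in base 20, which a few lines can confirm. This is a sketch assuming plain base 20 throughout; the `to_base` helper is hypothetical and returns digits most significant first.

```python
def to_base(n: int, base: int = 20) -> list[int]:
    """Return the digits of a non-negative integer n in the given base."""
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]


print(to_base(5 * 5))  # [1, 5], i.e. one twenty plus five
```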

But I suppose we're stuck with the Arabic system. It's good enough, really, for the remaining uses where we don't bother with a computer.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.