Unicode Encoding Flaw Widespread
LordNikon writes "According to this CERT advisory: 'Full-width and half-width encoding is a technique for encoding Unicode characters. Various HTTP content scanning systems fail to properly scan full-width/half-width Unicode encoded HTTP traffic. By sending specially-crafted HTTP traffic to a vulnerable content scanning system, an attacker may be able to bypass that content scanning system.' A proof of concept affecting IIS is already being posted to security mailing lists. Cisco IPS and other IDS products are also affected." The CERT advisory lists 93 systems, with 6 reported as vulnerable (including 3Com, Cisco, and Snort), 5 known not vulnerable (including Apple and HP), and the rest unknown.
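To make the advisory concrete, here is a minimal sketch of the mismatch it describes: a scanner that matches raw text, sitting in front of a backend that folds full-width characters back to ASCII. The blocklist, `naive_scan` and `backend_decode` are made-up names for illustration, and using NFKC normalization as the "backend" is an assumption, not a description of how IIS or any listed product actually decodes input.

```python
import unicodedata

# Hypothetical signatures a content scanner might look for.
BLOCKLIST = ("<script", "../")

def naive_scan(payload: str) -> bool:
    """Return True if a blocked signature appears in the raw payload."""
    lowered = payload.lower()
    return any(sig in lowered for sig in BLOCKLIST)

def backend_decode(payload: str) -> str:
    """Stand-in for a backend that folds compatibility characters to ASCII.

    Assumption: NFKC normalization models the folding step.
    """
    return unicodedata.normalize("NFKC", payload)

plain = "<script>alert(1)</script>"
# Same payload with U+FF1C / U+FF1E (full-width < and >) in place of ASCII.
wide = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

print(naive_scan(plain))              # True: the scanner catches the ASCII form
print(naive_scan(wide))               # False: the full-width form slips past
print(backend_decode(wide) == plain)  # True: the backend sees the original payload
```

The scanner and the backend disagree about what the bytes mean, and that disagreement is the whole bug.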
I work incident response in a large web company (hence anonymous posting, natch) and currently we're treating this as "interesting, but case not proven". We test that our web apps filter all input, so I'm adding double-width Unicode to our security regression test cases; however, I'm happy to let the FD posters lab it out between them in the short term. These alleged IIS exploits don't work for us - which is not to say that we don't have some system, somewhere, for which this is an issue. At the end of the day it's just a clear restatement of something that's obvious to anyone - you need to filter input carefully, and you need to be aware of issues around alternative encodings. But it's not a "BRB" (big-red-button, i.e. emergency stop and all hands to the pumps to fix a vulnerability) issue for us - yet. The last time we had one of those, it was the Microsoft DNS server remote root... because most of our internal domain controllers were also running DNS servers.
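For anyone wanting to add the same kind of regression cases, a rough sketch of generating full-width variants of existing payloads follows. It relies on the fact that U+FF01..U+FF5E mirror printable ASCII 0x21..0x7E at a fixed offset of 0xFEE0; the payload list and the function names are hypothetical, not anything from the poster's test suite.

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror ASCII 0x21..0x7E

def to_fullwidth(text: str) -> str:
    """Replace each printable ASCII character with its full-width form.

    Spaces and anything outside 0x21..0x7E are left unchanged.
    """
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )

# Hypothetical payloads already present in a security regression suite.
PAYLOADS = [
    "<script>alert(1)</script>",
    "' OR 1=1 --",
    "../../etc/passwd",
]

def regression_cases(payloads):
    """Yield (original, full-width variant) pairs to feed to the input filter."""
    for payload in payloads:
        yield payload, to_fullwidth(payload)

for original, variant in regression_cases(PAYLOADS):
    # Sanity check: NFKC folding recovers the original payload.
    assert unicodedata.normalize("NFKC", variant) == original
    print(repr(original), "->", ascii(variant))
```

Each variant should be rejected (or neutralized) exactly like its ASCII original; a filter that treats the two differently is worth a closer look.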
I'm wondering if the great firewalls (Cisco product?) are also vulnerable to this. At least it'll force them to do longer string matching.
Would some of the things that led to computers - morse code, telegraphy, etc. - have been feasible using, say, Chinese in its normal written form? Are computers biased towards English (and other languages using the same or similar alphabets) because they were largely invented by English speakers, or is the language fundamentally more amenable to small, simple encoding?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
The notable difference between Chinese and English (or most other written languages) is that several English characters combine to form syllables, which combine to form words (i.e., we use an alphabet). In Chinese, each character corresponds directly with a word (each character is a logogram). If you're interested you can look up Alphabet on Wikipedia as a starting point, although I must admit I find the article hard to follow even though I know what it should be saying.
The practical result of this is that English is normally encoded as a long sequence of 0-25 values (a-z), whereas Chinese would be encoded as a shorter sequence of 0-~100,000 values (Wikipedia reports Chinese dictionaries with 85,000 characters). Naturally, there would be fewer Chinese characters required for a message as each character corresponds to an entire word.
I guess that since morse code is rather like binary and English letters can be encoded using 5 bits, Chinese morse codes would need to be... about 20 bits long? It's late at night, brain not work so good. It seems to me that morse codes using 20 dots/dashes would be extremely difficult to learn; but on the other hand it shouldn't be any more difficult than learning Chinese characters in the first place.
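A quick check of that arithmetic, assuming a fixed-length binary code where every symbol gets the same number of bits (real Morse is variable-length, so this is only a rough comparison). The symbol counts are the ones quoted above; `bits_needed` is just a helper name for this sketch.

```python
def bits_needed(symbols: int) -> int:
    """Smallest number of bits that gives every symbol its own fixed-length code."""
    return (symbols - 1).bit_length()

for label, count in [
    ("English letters", 26),
    ("dictionary characters", 85_000),
    ("rounded-up guess", 100_000),
]:
    print(f"{label}: {count} symbols -> {bits_needed(count)} bits")

# English letters: 26 symbols -> 5 bits
# dictionary characters: 85000 symbols -> 17 bits
# rounded-up guess: 100000 symbols -> 17 bits

print(2 ** 20)  # 1048576, so 20 bits would be more than enough
```

So 17 bits would cover it under these assumptions; "about 20" is in the right ballpark.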
I wouldn't be surprised if English morse codes were more robust against poor data, siny Englxsh is stvll reahible even if sew2eral cheracter; are wrong.
Disclaimer: I don't know anything about the subject, I'm talking out of my elbow for the sake of discussion.
.evom ton seod gis eht
1) Unicode is better than having a hundred other encodings to debug
2) There are nearly two billion Chinese and Indians who can't use your encoding.
3) I get just as much spam from US companies as I do from foreign ones
i thought once I was found, but it was only a dream.
IIRC, China was on its way to moving to an alphabet system (certain characters can be used for their alphabetic sounds in various circumstances) and so was Japan (look at Katakana/Hiragana).
It is likely that the introduction of the printing press (and later mass media like TV/radio and computers) has "arrested" this natural evolution. It may also be possible that the development of a national identity and cohesive society tends to put the brakes on some developments as well - if a single unified language is mandated by culture or a central authority then local variations are much less important.
Romaji (and to a certain extent English itself) is definitely influencing the Japanese; the younger generations even more so. Japan may end up using an alphabet for day to day needs almost exclusively within the next 100 years. The situation in China is much less clear but it will probably happen eventually.
If we look into the past, nearly all societies with ideographic/logographic writing systems eventually moved to an alphabetic system. Hell, even Ancient Egyptian Hieroglyphs were partially syllabic much like Katakana. Much as previous posters have pointed out, changing to an alphabetic system from Chinese characters has allowed Korea to dramatically raise literacy rates. There is only so much time for schooling and memorization, and only so much effort to expend on literacy. If a simpler writing system is more accessible then that is a net gain, even if there are a few things that logographic writing systems do better than alphabetic ones.
Natural != (nontoxic || beneficial)
The point is that you, writing your own program, might have escaped or regexed items incorrectly and be open to this attack. Of course you don't blindly depend on some "magic" function. Duh! But you yourself are mortal too. And I doubt many people knew about fullwidth/halfwidth Unicode transforms. The fact that one of the articles linked to this says they did a successful SQL injection SHOWS there are issues. BTW, insulting people is not normally a useful technique.
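As a rough illustration of how a hand-rolled regex check can miss this, and one way to close the gap: fold the input to a canonical form before matching. The pattern below is a toy, and `canonicalize` / `looks_suspicious` are hypothetical names; treating NFKC as the canonical form is an assumption that should match whatever decoding the rest of your stack actually performs.

```python
import re
import unicodedata

# Toy pattern standing in for a hand-written SQL injection check.
SUSPICIOUS = re.compile(r"('|--|;|\bunion\b|\bselect\b)", re.IGNORECASE)

def canonicalize(value: str) -> str:
    """Fold full-width/half-width compatibility forms before any checks run."""
    return unicodedata.normalize("NFKC", value)

def looks_suspicious(value: str) -> bool:
    """Run the pattern against the canonical form, not the raw input."""
    return bool(SUSPICIOUS.search(canonicalize(value)))

wide_quote = "\uff07 OR 1=1"  # U+FF07 is the full-width apostrophe

print(bool(SUSPICIOUS.search(wide_quote)))  # False: the raw check misses it
print(looks_suspicious(wide_quote))         # True: caught after folding
print(looks_suspicious("hello world"))      # False
```

The ordering is the point: decode and normalize first, then validate, so the filter looks at the same string the database layer will eventually see.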
I'm Chinese but I have never heard of this. Would you be so kind as to educate me on this...? Where did you hear such things?
I'm serious.