OS X Users: 13 Characters of Assyrian Can Crash Your Chrome Tab
abhishekmdb writes No browsers are safe, as proved yesterday at Pwn2Own, but crashing one of them with just one line of special code is slightly different. A developer has discovered a hack in Google Chrome which can crash the Chrome tab on a Mac PC. The code is a 13-character special string which appears to be written in Assyrian script. Matt C has reported the bug to Google, who have marked the report as duplicate. This means that Google are aware of the problem and are reportedly working on it.
Let us henceforth dub it the Snow Crash exploit.
This exploit rang a bell, so I searched Bruce Schneier's website. And, sure enough, on July 15, 2000, he observed ``Unicode is just too complex to ever be secure.'' Doesn't exactly warm the cockles of the paranoid's heart.
I refuse to believe corporations are people until Texas executes one. -- desert rain on http://www.dailykos.com/user/
to ditch unicode support. They recognized that experimental technology like this shouldn't be rolled out to this much users. Thank you dice for keeping slashdot safe!
That script is the Syriac script not the Assyrian one: https://en.wikipedia.org/wiki/....
Now then, this particular Assyrian, the one whose cohorts were gleaming in purple and gold,
Just what does the poet mean when he says he came down like a wolf on the fold?
In heaven and earth more than is dreamed of in our philosophy there are great many things.
But I don't imagine that among them there is a wolf with purple and gold cohorts or purple and gold anythings.
Ogden Nash
http://blogs.msdn.com/b/oldnew...
About every ten months, somebody new discovers the Notepad file encoding problem. Let's see what else there is to say about it.
First of all, can we change Notepad's detection algorithm? The problem is that there are a lot of different text files out there. Let's look just at the ones that Notepad supports.
8-bit ANSI (of which 7-bit ASCII is a subset). These have no BOM; they just dive right in with bytes of text. They are also probably the most common type of text file.
UTF-8. These usually begin with a BOM but not always.
Unicode big-endian (UTF-16BE). These usually begin with a BOM but not always.
Unicode little-endian (UTF-16LE). These usually begin with a BOM but not always.
If a BOM is found, then life is easy, since the BOM tells you what encoding the file uses. The problem is when there is no BOM. Now you have to guess, and when you guess, you can guess wrong. For example, consider this file:
D0 AE
Depending on which encoding you assume, you get very different results.
If you assume 8-bit ANSI (with code page 1252), then the file consists of the two characters U+00D0 U+00AE, or "". Sure this looks strange, but maybe it's part of the word VATNI which might be the name of an Icelandic hotel.
If you assume UTF-8, then the file consists of the single Cyrillic character U+042E
If you assume Unicode big-endian, then the file consists of the Korean Hangul syllable U+D0AE
If you assume Unicode little-endian, then the file consists of the Korean Hangul syllable U+AED0
Some people might say that the rule should be "All files without a BOM are 8-bit ANSI." In that case, you're going to misinterpret all the files that use UTF-8 or UTF-16 and don't have a BOM. Note that the Unicode standard even advises against using a BOM for UTF-8, so you're already throwing out everybody who follows the recommendation.
Okay, given that the Unicode folks recommend against using a BOM for UTF-8, maybe your rule is "All files without a BOM are UTF-8." Well, that messes up all 8-bit ANSI files that use characters above 127.
Maybe you're willing to accept that ambiguity, and use the rule, "If the file looks like valid UTF-8, then use UTF-8; otherwise use 8-bit ANSI, but under no circumstances should you treat the file as UTF-16LE or UTF-16BE." In other words, "never auto-detect UTF-16". First, you still have ambiguous cases, like the file above, which could be either 8-bit ANSI or UTF-8. And second, you are going to be flat-out wrong when you run into a Unicode file that lacks a BOM, since you're going to misinterpret it as either UTF-8 or (more likely) 8-bit ANSI. You might decide that programs that generate UTF-16 files without a BOM are broken, but that doesn't mean that they don't exist. For example,
cmd /u /c dir >results.txt
This generates a UTF-16LE file without a BOM. If you poke around your Windows directory, you'll probably find other Unicode files without a BOM. (For example, I found COM+.log.) These files still "worked" under the old IsTextUnicode algorithm, but now they are unreadable. Maybe you consider that an acceptable loss.
The point is that no matter how you decide to resolve the ambiguity, somebody will win and somebody else will lose. And then people can start experimenting with the "losers" to find one that makes your algorithm look stupid for choosing "incorrectly".
My conclusion is that the unicode guys are assholes.
(They spoke Aramaic long before they became Christian, of course.)
The people in question call themselves Assyrians at the present day; there are some Akkadian words preserved in their Aramaic language even now, although Akkadian itself probably died out in the earlier part of the first millennium BC.
The name "Syriac" is itself from a worn-down version of the same name; it was once used pretty much as the equivalent of "Aramaic" but is now generallly used to describe only one particular version of Aramaic which was a major literary language of Western Asia in early Christian times, and is still used as a liturgical language by Nestorian Christians as far afield as India. The script is used to write several modern Aramaic languages spoken by Christians.
These ancient communities have suffered greatly in the Middle East wars of recent times, and a huge proportion have left as refugees.
Aberrations have appeared in my destiny prognostication engine!