A Text Message Can Crash An iPhone and Force It To Reboot
DavidGilbert99 writes with news that a bug in iOS has made it so anyone can crash an iPhone by simply sending it a text message containing certain characters. "When the text message is displayed by a banner alert or notification on the lockscreen, the system attempts to abbreviate the text with an ellipsis. If the ellipsis is placed in the middle of a set of non-Latin script characters, including Arabic, Marathi and Chinese, it causes the system to crash and the phone to reboot." The text string is specific enough that it's unlikely to happen by accident, and users can disable text notification banners to protect themselves from being affected. However, if a user receives the crash-inducing text, they won't be able to access the Messages app without causing another crash. A similar bug crashed applications in OS X a few years ago.
it's a parsing bug, what difference would sanitizing user input make...
It's not a special character that needs escaping. It's a character that needs multiple bytes to specify the code point. The parser just isn't handling the fact that you can't just crop a character mid code point - it's operating at the byte level when it should be operating at the code point level during a crop operation.
It's not that NSString itself is broken, it's that the fact that 99.99% of the time an NSString is one 16-bit code unit per glyph that apps using it rarely test the use case where it's two code units per glyph. So a person goes in and writes an app that inserts a new character at a particular byte offset and it works 99.99% of the time, but if it happens to get stuck in the middle of a multi-code-unit glyph, the program breaks.
The documentation is no help. First off, it lies:
As we all should know, that's simply not true. Unfortunately, a lot of people don't know better. Unicode is not a universal, uniform encoding scheme that is 16 bits per character. Even UTF-16 isn't that.
characterAtIndex returns a 16-bit integer. So obviously it has no way to actually represent wider unicode characters. The length method is not the number of characters on the screen, but the number of code units, which is different, but highly misleading to programmers. They're, again, the same thing 99.99% of the time, but those rare cases where they're not generally slip through testing. And this is why UTF-16 is such a hazardous encoding to use.
Yes, NSString is old. And that's part of the problem. It was made at a time where many thought that unicode was only going to be 16 bits. It hasn't aged well. And it's caused a lot of bugs over its time. And now I'd bet that it or something similar has created a brand new iPhone-equivalent of Winnuke.
Programmers really need two types of strings, and only two, for the lion's share of tasks. One, binary strings, where a char is always 8 bytes and operations can be optimized to heck and back. And two, unicode strings, where a char always represents a whole unicode character that you would display, and the count of characters represents the count of display characters and so forth. None of this "99.99% of the time it's one thing, but every so often it's another...". That's asking for bugs.
POTUS Witch Hunt tracker: 75 charges filed against 19 witches, 4 witches cooperating and 5 witches have pled guilty.