Unicode is not a "16-bit character definition". Unicode is a "character coding system" for assigning code points to abstract characters. i'll hereby suggest that the author of this piece has confused Unicode itself with one of the encoding forms of Unicode, that is, the ways characters are expressed as bitstrings. please, shoot this down.
a "character coding system" (drawing on http://www.unicode.org/ and my copy of the Unicode Standard 3.0 here) is a system for assigning characters to code points. Unicode 3.1 assigns some 94,000-odd characters, and the roadmap for future allocations (start at http://www.unicode.org/pending/pending.html) will assign more. these assignments are just that: a mapping from an abstract character to an integer value (its code point) in the Unicode codespace. this assignment does not dictate in any way how the character is represented as data.
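to make the "a code point is just an integer" point concrete, here's a quick sketch in Python 3 (my choice of language, nothing in the standard depends on it). the helper name `encodings_of` is made up for illustration. one character, one code point, three different byte sequences depending on which encoding form you pick:

```python
def encodings_of(ch: str) -> dict[str, str]:
    """Hex dump of one character under several Unicode encoding forms.

    Big-endian variants are used so Python doesn't prepend a byte order mark.
    """
    return {enc: ch.encode(enc).hex() for enc in ("utf-8", "utf-16-be", "utf-32-be")}


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
print(f"U+{ord(e_acute):04X}")  # the assignment: just the integer 0xE9
for enc, hexbytes in encodings_of(e_acute).items():
    print(enc, hexbytes)  # the representation: differs per encoding form
```

the code point stays U+00E9 throughout; only the serialization changes (`c3a9`, `00e9`, `000000e9`).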
There are a variety of encoding forms of Unicode, each a way of representing the characters in the repertoire as data (not at all "on screen"; that's glyphs, and that's a whole other issue). The different encoding forms have different strengths and weaknesses. UTF-16 uses 16-bit code units as its base unit: characters in the Basic Multilingual Plane (U+0000 to U+FFFF) take a single unit, and through a mechanism known as surrogates, a pair of adjacent 16-bit units can represent a code point above U+FFFF, which no single 16-bit unit can express. So even UTF-16 isn't strictly fixed-width. UTF-8 is a different form that uses a variable number (one to four) of 8-bit code units per character. There is a UTF-32 form, and even a UTF-EBCDIC form, believe it or don't. These are just encoding forms; they place no restrictions on which or how many characters get assigned. If the Unicode Consortium wanted to assign abstract characters to values beyond the limits of the current encoding forms, we could certainly do something about that, but it isn't the horrible catastrophe the author makes it out to be.
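for the surrogate business, here's a minimal sketch (again Python 3, and `to_surrogates` is a hypothetical helper, not a library function) of how UTF-16 splits a code point above U+FFFF into a high/low pair, cross-checked against Python's own codec. it also counts how many code units a few characters need in UTF-8 vs UTF-16:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


clef = 0x1D11E  # MUSICAL SYMBOL G CLEF, one of the characters added in Unicode 3.1
high, low = to_surrogates(clef)
print(f"U+{clef:X} -> UTF-16 {high:04X} {low:04X}")
# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
assert chr(clef).encode("utf-16-be") == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])

# Same characters, different unit counts depending on the encoding form.
for ch in ("A", "\u00e9", "\u20ac", chr(clef)):
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2
    print(f"U+{ord(ch):04X}: {utf8_bytes} UTF-8 byte(s), {utf16_units} UTF-16 unit(s)")
```

the G clef comes out as D834 DD1E: one code point, two UTF-16 units, four UTF-8 bytes. the character assignment didn't care; only the encoding forms had to cope.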
this is just the thing that leaps out at me. thoughts?