Slashdot Mirror


Unicode Consortium Releases Unicode 8.0.0

An anonymous reader writes: The newest version of the Unicode standard adds 7,716 new characters to the existing 21,499 – that's more than 35% growth! Most of them are Chinese, Japanese and Korean ideographs, but among the changes Unicode also adds support for more languages, such as Ik, which is spoken in Uganda.

2 of 164 comments

  1. Already >= 65K characters by divec · Score: 4, Informative

    "...adds 7,716 new characters to the existing 21,499 – that's more than 35% growth!"

    There were already 113K characters in Unicode version 7.0. Which is more than 2^16 characters, so remember:
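
    For illustration, here is one well-known consequence of passing 2^16 (a sketch, not a reconstruction of the rest of this comment): a code point above U+FFFF does not fit in a single 16-bit unit, so UTF-16 has to spend two units on it. The helper name `utf16_units` is made up for this example.

```python
# 2**16 is the number of values a single 16-bit code unit can hold.
BMP_LIMIT = 2 ** 16


def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2


print(BMP_LIMIT)                   # 65536
print(utf16_units("A"))            # 1
print(utf16_units("\U0001F600"))   # 2, i.e. a surrogate pair
print(len("\U0001F600"))           # 1, because Python 3 counts code points
```

    So code that assumes "one character = one 16-bit unit" miscounts anything outside the first 65,536 code points.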

    --

    perl -e 'fork||print for split//,"hahahaha"'

  2. Re:CJK is Unicode's big failing by gustygolf · Score: 4, Informative

    In short:
    To render text properly in Japanese, you need a Japanese font. To render text properly in Chinese, you need a Chinese font. That's not just a matter of character coverage; it's a consequence of something the consortium did called Han unification.

    The Unicode consortium decided to map similar characters to the same code point. Personally, I'm not particularly bothered by this, but it leads to a technical problem: every piece of text must be supplied with a language tag so that the correct font can be selected.
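
    A small Python sketch of what that unification means in practice. U+76F4 is used here only as a commonly cited example of an ideograph that Chinese and Japanese fonts conventionally draw differently; the point is that nothing in the encoded text says which language was meant.

```python
import unicodedata

# U+76F4: one unified code point, shared by Chinese and Japanese text.
ch = "\u76f4"

print(f"U+{ord(ch):04X}")          # U+76F4
print(unicodedata.name(ch))        # CJK UNIFIED IDEOGRAPH-76F4
# The UTF-8 bytes are identical whichever language the author had in mind.
print(ch.encode("utf-8").hex())    # e79bb4
```

    The glyph you actually see depends entirely on the font the renderer picks, which is why the language tag matters.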

    And this is problematic when two CJK languages are mixed in the same document (in the GP's case, Chinese and Japanese), or when a program must automatically decide which font to render things in.

    Take a web browser for example. It reaches a random Chinese web page, encoded in UTF-8. The page's author never bothered adding a language tag. Now the web browser must guess whether to render the page in a Chinese font or a Japanese one. And a "guess" is really all that it can do.

    (Typically, software bases its guess on the user's locale. That's pretty accurate: Chinese users tend to view Chinese documents, and Japanese users Japanese ones. But the problems start when someone tries viewing a 'foreign' document...)
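
    Roughly, the guessing logic described above looks like the sketch below. This is not how any particular browser does it; `pick_cjk_font` and the font names are hypothetical, and real software has more signals to work with.

```python
from typing import Optional

# Hypothetical font names, for illustration only.
FONTS = {
    "zh": "ExampleChineseFont",
    "ja": "ExampleJapaneseFont",
    "ko": "ExampleKoreanFont",
}


def pick_cjk_font(lang_tag: Optional[str], user_locale: str) -> str:
    """Prefer the document's language tag; otherwise guess from the locale."""
    for candidate in (lang_tag, user_locale):
        if candidate:
            # Reduce "ja-JP" or "ja_JP.UTF-8" to the primary subtag "ja".
            primary = candidate.replace("_", "-").split("-")[0].lower()
            if primary in FONTS:
                return FONTS[primary]
    return FONTS["zh"]  # arbitrary last-resort default


# Tagged page: the tag wins, whatever the user's locale is.
print(pick_cjk_font("ja", "zh_CN"))      # ExampleJapaneseFont
# Untagged Chinese page viewed by a Japanese user: the guess is wrong.
print(pick_cjk_font(None, "ja_JP"))      # ExampleJapaneseFont
```

    The second call is the 'foreign document' case: with no tag, the code has nothing to go on except the viewer's own locale.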

    It's really quite ironic that the consortium decided on codepoint unification for the three languages that would most benefit from Unicode.

    --
    "Slow Down Cowboy! It's been 58 minutes since you last successfully posted a comment" -- slashdot, driving users away.