Unicode Consortium Releases Unicode 8.0.0

Posted by timothy on Friday June 19, 2015 @05:39PM from the hobo-symbols-and-the-black-tongue-of-mordor dept.

An anonymous reader writes: The newest version of the Unicode standard adds 7,716 new characters to the existing 21,499 – that's more than 35% growth! Most of them are Chinese, Japanese and Korean ideographs, but the release also adds support for new languages like Ik, used in Uganda.

Re:Ithought by Anonymous Coward · 2015-06-19 18:40 · Score: 2, Informative

> That slashdot didn't support unicode
However, Soylent News has had full Unicode support since last year.
Here is a recent thread with lots of Greek.
Already = 65K characters by divec · 2015-06-19 18:53 · Score: 4, Informative
"...adds 7,716 new characters to the existing 21,499 – that's more than 35% growth!"
There were already 113K characters in Unicode version 7.0, which is more than 2^16 (65,536), so remember:
- 1. UTF-16 is *not* always two bytes per character; code points above U+FFFF take a surrogate pair, i.e. four bytes
- 2. Therefore a "character" (char) in Java, C#, JavaScript sometimes only holds half a Unicode character
- 3. Even a whole Unicode character may be only part of a grapheme cluster, which means that taking arbitrary substrings may not result in readable text.
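A minimal Python sketch of the three points above, using only the standard library. The helper name utf16_units and the sample strings are chosen purely for illustration; Python strings index by code point, so the UTF-16 view is obtained by encoding explicitly.

```python
# Sketch: code points vs. UTF-16 code units vs. grapheme clusters.
# utf16_units is a hypothetical helper name, not a library function.

def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

bmp = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral = "\U0001F600"  # U+1F600, above U+FFFF

# Points 1 and 2: one code point can need two UTF-16 code units.
print(len(bmp), utf16_units(bmp))        # 1 1
print(len(astral), utf16_units(astral))  # 1 2

# The two code units are a surrogate pair; each half alone is not a character.
raw = astral.encode("utf-16-be")
high = int.from_bytes(raw[:2], "big")
low = int.from_bytes(raw[2:], "big")
print(hex(high), hex(low))               # 0xd83d 0xde00

# Point 3: one user-perceived character made of two code points.
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(len(decomposed))  # 2
truncated = decomposed[:1]
print(truncated == "e")  # True: the substring silently dropped the accent
```

Splitting text safely on grapheme cluster boundaries needs the segmentation rules from the Unicode standard, which Python's standard library does not provide; that part is left out of the sketch.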
--
perl -e 'fork||print for split//,"hahahaha"'
Re:Ithought by sound+vision · 2015-06-19 20:17 · Score: 3, Informative

Your post... has a Facebook icon next to it.
I knew Slashdot was going in a different direction, but... Facebook? The alt-text says "From Facebook". I'm not even completely sure what that means, but I don't want anything "from Facebook" in here. I hope you know your post has fucked up my world. I'm going to have a hard time sleeping now. Bro... your post has a Facebook icon on it! However you managed to get that to appear, don't do it ever again! And tell all your friends not to. Together, we can make Slashdot sane again...
Re:CJK is Unicode's big failing by gustygolf · 2015-06-19 20:45 · Score: 4, Informative

In short:
To render text properly in Japanese, you need a Japanese font. To render text properly in Chinese, you need a Chinese font. It's not just because of character coverage, but because of a thing called Han unification that the consortium did.
The Unicode Consortium decided to map similar characters to the same code point. Personally, I'm not particularly bothered by this, but it leads to the technical problem that each text must be supplied with a language tag to select the correct font.
And this is problematic when two CJK languages are mixed in the same document (in the GP's case, Chinese and Japanese), or when a program must automatically decide which font to render things in.
Take a web browser for example. It reaches a random Chinese web page, encoded in UTF-8. The page's author never bothered adding a language tag. Now the web browser must guess whether to render the page in a Chinese font or a Japanese one. And a "guess" is really all that it can do.
(Typically, software bases its guesses on the user's locale. It's pretty accurate: Chinese users tend to view Chinese documents, Japanese users Japanese ones. But the problems start when someone tries viewing a 'foreign' document...)
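The guessing described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the function name pick_cjk_font, the font table, and the placeholder font family names are hypothetical and do not describe how any real browser works.

```python
from typing import Optional

# Hypothetical font table; the family names are placeholders, not real fonts.
FONT_BY_LANG = {
    "zh": "ExampleSans-SC",
    "ja": "ExampleSans-JP",
    "ko": "ExampleSans-KR",
}

def pick_cjk_font(doc_lang: Optional[str], user_locale: str,
                  default: str = "ExampleSans-SC") -> str:
    """Prefer the document's language tag; otherwise guess from the user's locale."""
    for tag in (doc_lang, user_locale):
        if tag:
            # Reduce "zh-CN" or "ja_JP" to the primary subtag "zh" or "ja".
            primary = tag.replace("_", "-").split("-")[0].lower()
            if primary in FONT_BY_LANG:
                return FONT_BY_LANG[primary]
    return default

# The same unified code point is used whichever language the text is in;
# U+76F4 is a commonly cited case where the glyph shape differs by font.
unified = "\u76f4"

tagged = pick_cjk_font("zh-CN", "ja_JP")   # language tag present: Chinese font
untagged = pick_cjk_font(None, "ja_JP")    # no tag: falls back to the locale
print(hex(ord(unified)), tagged, untagged)
```

With the language tag missing, a Japanese user visiting a Chinese page gets the Japanese font for the same code point, which is exactly the 'foreign' document problem.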
It's really quite ironic that the consortium decided on codepoint unification for the three languages that would most benefit from Unicode.

--
"Slow Down Cowboy! It's been 58 minutes since you last successfully posted a comment" -- slashdot, driving users away.