Unicode 7.0 Released, Supporting 23 New Scripts

Seriously? by newsman220 · 2014-06-16 11:49 · Score: 4, Funny

Still no Klingon?

Re:Seriously? by rubycodez · 2014-06-16 12:14 · Score: 1

there is no standard from which to make Unicode, fans have made the most popular versions of various klingon alphabets
Re:Seriously? by Anonymous Coward · 2014-06-16 14:35 · Score: 0

No standard? It's a robust computer industry standard, with strictly defined character maps. Sure, make forks if you want, but don't call them "Unicode".
Re:Seriously? by Anonymous Coward · 2014-06-16 15:30 · Score: 0

No standard? It's a robust computer industry standard, with strictly defined character maps.

Klingon? Is the robust standard defined by ISO, ANSI or an RFC?
Re:Seriously? by Anonymous Coward · 2014-06-16 19:27 · Score: 0

No. It is defined by the Unicode Consortium, which cooperates with ISO/IEC JTC1.
Re:Seriously? by NotInHere · 2014-06-17 04:55 · Score: 1

Still no Klingon?
At least the Vulcan salute.

Linear A? by J053 · 2014-06-16 11:57 · Score: 1

I'm sure there are lots of docs in that....

Re:Linear A? by Livius · 2014-06-16 12:14 · Score: 4, Insightful

There are a few, and researchers and historians would like to have them on computer.
Re:Linear A? by K.+S.+Kyosuke · 2014-06-16 15:15 · Score: 1

But why? We couldn't understand Linear B, and even Michael Ventris found it was all Greek to him. And Linear A seems even more incomprehensible.

--
Ezekiel 23:20
Re:Linear A? by kasperd · 2014-06-16 22:03 · Score: 1

But why? We couldn't understand Linear B
That shouldn't be a prerequisite for including it. After all, having the text represented on a computer would be a useful tool in getting to understand it.

--

Do you care about the security of your wireless mouse?
Re:Linear A? by K.+S.+Kyosuke · 2014-06-17 00:51 · Score: 1

Neither should a sense of humor. :-)

--
Ezekiel 23:20

Klingon in more useful by Anonymous Coward · 2014-06-16 11:58 · Score: 1

Seriously, there are Klingon speakers. I worked with three, one of whom didn't know the other two knew Klingon until he cursed in Klingon. It was surreal. Linear A is an absolutely fascinating script (with hundreds of symbols), but there just aren't enough extant samples to justify adding it to Unicode, and nobody can translate it.

(Yes, that was a weird job. I left as soon as I could, though not due to the Klingons, but management.)

Re:Klingon in more useful by LordLucless · 2014-06-16 12:32 · Score: 2

but there just aren't enough extant samples to justify adding it to Unicode, and nobody can translate it.
Unicode is supposed to be universal, and it has more than enough codepoints to spare - why is there a problem adding it? I'm sure having it in a standard encoding would prove useful to anyone who is trying to translate Linear A, or to archeologists/historians looking to digitize fragments we do have, etc.

--
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
Re:Klingon in more useful by frisket · 2014-06-16 12:45 · Score: 1

The lack (or not) of speakers isn't the reason. According to one of my moles, the official dead-pan response to the question why Klingon and Elvish aren't in Unicode is that they are not human languages :-)
Re:Klingon in more useful by Anonymous Coward · 2014-06-16 12:56 · Score: 0

Thus we need to develop Multicode.
Re:Klingon in more useful by Electricity+Likes+Me · 2014-06-16 13:07 · Score: 1

Isn't unicode already variable-length integer-ish via the UTF-8 standard?
Surely we could implement a version which accommodates an effectively infinite number of character sets.
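For reference, a minimal Python sketch of the variable-length behaviour being asked about. UTF-8 is one encoding of Unicode code points, not Unicode itself, and as currently specified (RFC 3629) it stops at four bytes. The helper name utf8_length is made up for this illustration.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# One sample from each length class: ASCII, Latin-1 range, rest of the BMP,
# and a supplementary-plane character.
for cp in (0x41, 0xE9, 0x20AC, 0x1F4A9):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

The length grows with the code point value, which is what makes the scheme "variable-length integer-ish".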
Re:Klingon in more useful by Kjella · 2014-06-16 13:42 · Score: 1

Isn't unicode already variable-length integer-ish via the UTF-8 standard? Surely we could implement a version which accommodates an effectively infinite number of character sets.
Before they gimped it to match UTF-16 it had ~2^31 combinations, now it has ~2^16. And you could have extended UTF-8 to a full ~2^42 by just continuing the scheme to fill the entire first byte, so space is really of little concern. They probably just don't want to coordinate a million different people who want to add a smiley or their imaginary fantasy language to the standard.

--
Live today, because you never know what tomorrow brings
Re:Klingon in more useful by lithis · 2014-06-16 14:00 · Score: 2

There is already at least one effort to extend Unicode beyond the current maximum of 1.1 million characters: The UCS-X Family of UCS Extensions. It defines UCS-G, which supports over two billion characters, UCS-E with over nine quintillion, and UCS-Infinity with no upper bound. They each support 8-, 16-, and 32-bit variable-byte encodings (e.g. UTF-E-32, UTF-Infinity-8). It's been a while since I read about them, but I believe they are all compatible with UTF-8, UTF-16, and UTF-32.
Re:Klingon in more useful by marcansoft · 2014-06-16 14:21 · Score: 1

Not 2^16 (Unicode already has way over 2^16 codepoints assigned). The maximum Unicode codepoint value is 1114111, which is somewhat over 2^20 (and happens to be the highest codepoint encodable in UTF-16).
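For anyone who wants to check the arithmetic, a small Python sketch. It assumes the standard UTF-16 surrogate scheme (high surrogates from 0xD800, low surrogates from 0xDC00, 10 payload bits each); the function name to_utf16_surrogates is invented for the example.

```python
from typing import Tuple

MAX_CODE_POINT = 0x10FFFF  # 1114111, the highest Unicode code point

def to_utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= MAX_CODE_POINT:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000           # 20 bits of payload
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low

print(MAX_CODE_POINT, 2**20, 2**21)
print([hex(u) for u in to_utf16_surrogates(MAX_CODE_POINT)])
```

Two 10-bit halves give 2^20 supplementary code points on top of the 2^16 in the BMP, which is where 17 x 65536 = 1114112 possible values comes from.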
Re:Klingon in more useful by craigminah · 2014-06-16 14:28 · Score: 1

Those people who spoke Klingon weren't Klingons...I think the correct term is "nerd" (as opposed to "geek").
Re:Klingon in more useful by narcc · 2014-06-16 15:15 · Score: 1

Wait, what? I was unaware there was a distinction between "nerd" and "geek". Can I get a few nerds to geek out here and argue over their definitions?

--
Required reading for internet skeptics
Re:Klingon in more useful by relyimah · 2014-06-16 15:39 · Score: 1

This link should help... www.youtube.com/watch?v=2Tvy_Pbe5NA
Re:Klingon in more useful by Anonymous Coward · 2014-06-16 19:09 · Score: 1

It has nothing to do with it being a human language or not. The reason why Klingon pIqaD failed was because nobody in the Klingon community actually uses it for writing texts to each other. A Private Use agreement that is more widely supported than almost any SMP script exists for Klingon pIqaD, but tlhIngan Hol speakers just don't use it. Tengwar and Cirth are still immature proposals, and it is more a lack of initiative within the Tolkienist community that has had these stalled before being formally developed for encoding.
Re:Klingon in more useful by Anonymous Coward · 2014-06-16 20:49 · Score: 0

How do you say "you are a very sad person" in Klingon?
Re:Klingon in more useful by Anonymous Coward · 2014-06-17 00:51 · Score: 0

Seriously, there are Klingon speakers.
Yeah, they're called "dorks."
Re:Klingon in more useful by grouchomarxist · 2014-06-17 03:17 · Score: 1

Given that Linear A hasn't been deciphered yet, I wonder how they justify putting it in unicode. They don't know for certain which glyphs are distinct characters yet.
Re:Klingon in more useful by rubycodez · 2014-06-17 04:31 · Score: 1

the problem is there is no klingon alphabet to add, just several fan made lists claiming to be that
so you're advocating adding an act of fanservice to a fictional language by adding something to unicode which the authors of the fiction themselves haven't even been arsed to make. that's beyond silly, that's like saying the next space shuttle should be shaped like the starship enterprise.

Pictographic symbols by toejam13 · 2014-06-16 11:58 · Score: 2

Good. If you do a search of Wingdings on Google, many of the top results are questions on how to use the font with browsers other than IE. Since it isn't a Unicode compliant font, you can't. This update helps correct that problem.

Re:Pictographic symbols by Kjella · 2014-06-16 14:27 · Score: 1

Wingdings is to fonts what VBA/Access is to application development, so I can't say I feel terribly sad about that.

--
Live today, because you never know what tomorrow brings
Re:Pictographic symbols by narcc · 2014-06-16 15:39 · Score: 2

Used all-over?

--
Required reading for internet skeptics
Re:Pictographic symbols by Anonymous Coward · 2014-06-16 20:47 · Score: 0

By idiots, yes.
Re:Pictographic symbols by Anonymous Coward · 2014-06-16 23:40 · Score: 0

Believe it or not, MSAccess does make a good frontend, although I certainly do prefer open source solutions where possible. You are picturing MSAccess as a backend, which of course is a suicide mission.

Why emoji? by Anonymous Coward · 2014-06-16 12:00 · Score: 2, Insightful

What's the point of adding pictographic symbols to Unicode? Is this really something we want frozen in time for eternity? What's the benefit of standardizing them anyway?

Wouldn't we be better off standardizing all characters used in written language and be done with it?

Re:Why emoji? by RyuuzakiTetsuya · 2014-06-16 12:11 · Score: 1

ðY'...ðY'ðY'©

--
Non impediti ratione cogitationus.
Re:Why emoji? by RyuuzakiTetsuya · 2014-06-16 12:25 · Score: 4, Insightful

Not everyone speaks English or Chinese or Spanish.
Everyone recognizes stop sign, airport, pile of poop and other symbols. So communicating via pictographs is actually good. Even if it was incidental.

--
Non impediti ratione cogitationus.
Re:Why emoji? by Guy+Harris · 2014-06-16 12:31 · Score: 3, Informative

Not everyone speaks English or Chinese or Spanish.
Everyone recognizes stop sign, airport, pile of poop and other symbols. So communicating via pictographs is actually good. Even if it was incidental.
And many of them recognize this as well.
Re:Why emoji? by Darinbob · 2014-06-16 13:22 · Score: 1

But they're not "standard" even if Unicode claims they are. I only heard of emoji within the last year, but there is no central body that dictates exactly what they look like, so that pile of poop symbol will vary depending upon which texting app you use it with. The apps that use emojis are not coordinating with any standards body or ensuring that the intended meaning is preserved.
Today emojis are purely a fad. We'd think it ridiculous if unicode standardized some of the 80's era desktop icons (so that future generations know what the floppy disk symbol means). Meanwhile there are existing characters that have survived a long test of time which are not yet in unicode.
Re:Why emoji? by BitZtream · 2014-06-16 15:12 · Score: 5, Interesting

But they're not "standard" even if Unicode claims they are.
They are standard in reference to Unicode because the Unicode Consortium defines the Unicode standard. Someone has to be the first to define the standard.

but there is no central body that dictates exactly what they look like, so that pile of poop symbol will vary depending upon which texting app you use it with
Yes, those are called fonts, and in case you haven't noticed, that was true before digital computers with silicon microprocessors even existed and has been true for thousands of years.

The apps that use emojis are not coordinating with any standards body or ensuring that the intended meaning is preserved.
Apple does, hence why the Messages app already matches the new code points. Google Hangouts seems to work fine as well. Both Messages and Hangouts convert even things like :) into the proper unicode code point and use standard fonts for display. Sure, some half-assed apps may not work correctly, but anyone that supports unicode and has fonts will receive them properly already.
Emoji is somewhat silly, but it's hardly new, just go ask Japan. Just because you're new to the ballgame doesn't mean it's a new ballgame.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Why emoji? by Anonymous Coward · 2014-06-16 16:42 · Score: 0

What's the point of adding pictographic symbols to Unicode?
Hear here.
When I'm sorting text, it's important to know how individual symbols relate -- is A before or after $? -- but I don't want to need to give a flying fart whether A comes before or after winky-smile and whether that comes before or after steaming turd. (No, really, there is a steaming turd character.)
Re:Why emoji? by Anonymous Coward · 2014-06-16 22:08 · Score: 0

streaming turd makes sense, but why is there a Moon viewing ceremony character. I can't think of any good reason for that.
Re:Why emoji? by Goaway · 2014-06-17 02:38 · Score: 1

Round-trip compatibility with other encodings that already have them.
Re:Why emoji? by Ark42 · 2014-06-17 02:51 · Score: 1

I think the problem most people think Apple/Emoji has with compatibility is that old versions of Apple stuff used the private-use codepoint areas for emoji, instead of the Unicode standard code points. This has since been fixed, as far as I know, but there are a TON of free Android keyboards that are supposed to type emoji, but only use the old private-use codepoints, and thus don't display anything but a blank space or a square box on Android without some special app to translate and display them.
If you look harder though, you CAN find Android keyboards that have emoji buttons that produce the proper Unicode standard codepoints. The button on the keyboard may be in full color, but the glyph produced will be monochrome. Basically a limit of the direct font rendering, but it will work in every app without any issue then, and Apple people can still see the glyphs you send them via text just fine, etc.
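A rough way to tell which kind of keyboard you have is to look at what it actually typed. The Python sketch below flags characters whose general category is "Co" (private use). The function name private_use_chars and the sample U+E001 are arbitrary choices for illustration, not a claim about which private-use values any vendor used.

```python
import unicodedata

def private_use_chars(text: str) -> list:
    """List the private-use code points found in text, as U+XXXX strings."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]

# An arbitrary private-use character next to a standard emoji code point.
message = "hi \ue001 \U0001F600"
print(private_use_chars(message))
```

Anything this reports will only render on systems that share the same private agreement about what those code points mean.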

--
Morphing Software
Re:Why emoji? by idji · 2014-06-17 21:21 · Score: 1

If the emoji are standardized in Unicode, then it will be easier for any kind of software to support them.

23 new scripts, 2834 new characters... by Anonymous Coward · 2014-06-16 12:04 · Score: 0

And it's still a big 💩

Re:23 new scripts, 2834 new characters... by Anonymous Coward · 2014-06-16 20:36 · Score: 0

U+1F595 you.

Why no Runes? by Anonymous Coward · 2014-06-16 12:16 · Score: 0

It's decipherable. :)

The larger, the less useful by Anonymous Coward · 2014-06-16 12:42 · Score: 1

The larger Unicode becomes, the more fragmented the implementations will be. The more fragmented it is, the more errors and incompatibilities will compound. It will get less and less useful, and more and more bulky, and will eventually be as useful as Flash. (well, it may not be that bad, but still, Flash was all things to all people, and almost universally installed, until it wasn't.)

Re:The larger, the less useful by Anonymous Coward · 2014-06-16 13:49 · Score: 0

The more fragmented it is, the more errors and incompatibilities will compound. It will get less and less useful, and more and more bulky, and will eventually be as useful as Flash.
As evidenced by the announcement blog entry itself, http://unicode-inc.blogspot.com.au/2014/06/announcing-unicode-standard-version-70.html, where they've mistakenly interchanged the glyphs for U+1F596 (raised hand with part between middle and ring fingers) and U+1F6E0 (hammer and wrench). If the Unicode Consortium itself can't even get it right then what hope for the rest of humanity?
Re: The larger, the less useful by Anonymous Coward · 2014-06-16 13:57 · Score: 0

I've heard breathing can eventually cause death, but I guess I'll throw your logic to the wind and do it anyway...
Re:The larger, the less useful by Anonymous Coward · 2014-06-16 14:23 · Score: 0

As evidenced by [incorrectly two placed graphic files entitled emoji-8.png and emoji-9.png] ... If the Unicode Consortium itself can't even get it right then what hope for the rest of humanity?
Nicely spotted. But evidence of fragmentation and incompatibility across the unicode space? Not so much.
Re:The larger, the less useful by Anonymous Coward · 2014-06-16 14:29 · Score: 0

Err ... 'two' was incorrectly placed %). That should read "two incorrectly placed ..." (I hope the English language will survive that mistake.)
Re:The larger, the less useful by cheater512 · 2014-06-16 15:00 · Score: 1

There is no such thing as fragmentation with Unicode. Most fonts only implement a small portion of it however.
If Microsoft and Apple both decide to implement 'Linear A' for example, they will do it with different fonts but using the same codepoints.
Re:The larger, the less useful by gwgwgw · 2014-06-16 15:55 · Score: 1

That *is* pretty funny. That and it hasn't yet been corrected.

--
That was Zen, this is Tao

Why no Runes? by Anonymous Coward · 2014-06-16 12:44 · Score: 0

http://en.wikipedia.org/wiki/Runes#Unicode

Han unification by Anonymous Coward · 2014-06-16 13:16 · Score: 0

And yet they still refuse to recognize that Chinese and Japanese are different languages.

Re:Han unification by Ark42 · 2014-06-17 02:58 · Score: 1

I ran into this problem recently. The kanji for "leader" is supposed to be like the diagram at: http://jisho.org/kanji/details... (note the 4 individual lines for the top right piece) but the fonts on my Android phone insisted on rendering this glyph using the Chinese font, that looks like http://www.hantrainerpro.com/h...
It's not just drawn differently, it's actually one less stroke in Chinese, but it's supposed to be the same glyph somehow!
Unicode has no way to indicate which language you actually want characters like this to display in. Sure for single-language documents like HTML, you can use a lang= attribute and hope the browser handles it right, but you certainly can't mix the two together very easily.

--
Morphing Software
Re:Han unification by draconx · 2014-06-17 08:33 · Score: 1

Sure for single-language documents like HTML, you can use a lang= attribute and hope the browser handles it right, but you certainly can't mix the two together very easily.
Pretty easy to mix them in HTML. For example:

The Japanese version is '将' and the Simplified Chinese version is '将'.

My browser displays the appropriate glyph in each instance.
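A rough sketch of the kind of markup this reply describes, assuming each character is wrapped in its own element carrying a lang attribute. The helper name lang_span and the tags "ja" and "zh-Hans" are choices made for illustration, not taken from the original post, and whether the two glyphs really differ on screen depends on the browser and the installed fonts.

```python
from html import escape

def lang_span(text: str, lang: str) -> str:
    """Wrap text in a <span> that declares its language."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'

ch = "将"  # one code point; the tag, not the character, selects the glyph style
fragment = (
    f"The Japanese version is '{lang_span(ch, 'ja')}' "
    f"and the Simplified Chinese version is '{lang_span(ch, 'zh-Hans')}'."
)
print(hex(ord(ch)))
print(fragment)
```

The two spans contain the identical character; only the out-of-band language tag differs, which is exactly the information plain text cannot carry.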
Re:Han unification by Ark42 · 2014-06-18 01:53 · Score: 1

And this highlights an incredibly deep flaw in Unicode... plus, unfortunately, the app I was using on Android wasn't rendering with HTML, so I was basically out of luck there.

--
Morphing Software

less useful how? Re:The larger, the less useful by Fubari · 2014-06-16 13:33 · Score: 4, Interesting

Fragmented? I haven't heard of any unicode forks. The people at the Unicode_Consortium seem like they're doing ok. Unicode seems pretty backwards compatible; have any of the newer versions overwritten or changed the meaning of older versions (e.g. caused damage)? That isn't true for various ascii encodings, which is an i18n abomination on the hi-bit characters. Or with ebcdic, which isn't self compatible. One of the things I love about unicode is the characters (glyphs) stay where you put them, and don't transmute depending on what locale a program happens to run in.

The larger Unicode becomes, the more fragmented the implementations will be.

Maybe instead of fragmented, you mean there won't be font sets that can render all of unicode's characters?
*shrug* Even if that were a problem, the underlying data is intact and undamaged and will be viewable once a suitable font library is obtained.

The more fragmented it is, the more errors and incompatibilities will compound. It will get less and less useful, and more and more bulky, and will eventually be as useful as Flash. (well, it may not be that bad, but still, Flash was all things to all people, and almost universally installed, until it wasn't.)

Can you give me an example of an incompatibility? I'm not saying there are none, just that I don't know of anything and that, in general, I've been very pleased with Unicode's stability - compared to other encodings - for doing data exchange.

Re:less useful how? Re:The larger, the less useful by Anonymous Coward · 2014-06-16 17:01 · Score: 1

BIDI is one of the weirder and more difficult parts of Unicode, and its semantics have not been 100% stable across versions.
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Unicode5QuoteMirroring
In fairness, they did attempt to limit the damage, and on the whole, having a well-thought-out standard for BIDI, even if occasionally buggy, is better than not having one.
Re:less useful how? Re:The larger, the less useful by BetterThanCaesar · 2014-06-16 18:53 · Score: 2

Unicode seems pretty backwards compatible; have any of the newer versions overwritten or changed the meaning of older versions (e.g. caused damage)?
Yes. Version 2.0 completely changed the Hangul character set. Korean texts written with Unicode 1.1 were not readable in Unicode 2.0, and vice versa. This was 17 years ago, but note that it was after ISO had accepted version 1.1 as an ISO/IEC standard.

--
"Stop failing the Turing test!" -- Dilbert
Re:less useful how? Re:The larger, the less useful by AmiMoJo · 2014-06-16 19:58 · Score: 4, Interesting

The main problem is the broken CJK (Chinese, Japanese, Korean) support that has caused numerous ad hoc work-arounds and hacks to be developed. In a nutshell all three languages shared some common characters in the past, but over time they diverged. Unfortunately these characters share the same code points in Unicode, even though they are rendered differently depending on the language. A Japanese and Chinese font will contain different glyphs for the same character.
It is therefore impossible to mix Chinese and Japanese in the same plain text document. You need extra metadata to tell the editor which parts need Chinese characters and which need Japanese. There are Japanese bands that release songs with Chinese lyrics and vice versa, and books that contain both (e.g. textbooks, dictionaries). Unicode is unable to encode this data adequately.
Even the web is somewhat broken because of this. If a random web page says it is encoded with Unicode there is no simple way for the browser to choose a Japanese, Korean or Chinese font, and all the major ones just use whatever the user's default is.
It really isn't clear how this can be fixed now. Unicode could split the code pages but a lot of existing software will carry on using the old ones. It's a bit of a disaster, but most westerners don't seem to be aware of it.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:less useful how? Re:The larger, the less useful by Anonymous Coward · 2014-06-16 20:45 · Score: 0

Fragmented? I haven't heard of any unicode forks.
There's more to "fragmentation" than "code forks". Start with noticing that over half of libc is devoted to character handling largely thanks to the complexity unicode brings. This itself carries considerable costs. The more code, the more likely implementations elsewhere won't work quite like this one and so you need even more code to make your program work the same everywhere, by working around bugs. Code breeds code. While this may seem unlikely on the surface, it gets worse quickly, much worse, very quickly. See following:

Can you give me an example of an incompatibility? I'm not saying there are none, just that I don't know of anything and that, in general, I've been very pleased with Unicode's stability - compared to other encodings - for doing data exchange.
As noted elsewhere, it isn't stable. Latin-1 is latin-1. Unicode... is not so much unicode.
In fact, there isn't even a single canonical representation. Many characters have multiple representations, and then there's accents. There are at least two ways to create any accented character: (CHARACTER WITH ACCENT) and (CHARACTER) with (ACCENT SIGN), that's two code points. Then there's the visually-similar-but-wildly-different-codepoint problem. I haven't even started with things like sneakily inserted ZERO WIDTH SPACE CHARACTER and like variants. That's multiple problems that can bite you in multiple ways.
Such as how spotify accounts (that allow unicode usernames) could be hijacked. They used a "canonicalise" function that... uh... they just pulled from somewhere and wasn't up to snuff for the unicode version they actually were using. "Big deal", you say, but yes, it is, in multiple ways. For one, different code bases will have different implementations, giving rise to obscure bugs and perhaps security breaches.
Before that there's the encoding. Both utf-8 and utf-16 have non-trivial invalid encodings that can be... replaced by "invalid code point" markers (there are at least two official ones and many unofficial options), or ignored, or cause errors to be thrown up. Different input handlers will handle this differently. Yay, more differences in implementation.
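To make the "different input handlers" point concrete, here is a small Python sketch of one decoder giving three different answers for the same bytes, depending only on the error policy. The sample bytes and the function name decode_three_ways are made up for the example.

```python
# Valid UTF-8 for "café", then a stray 0xFF byte that can never occur in UTF-8.
data = b"caf\xc3\xa9 \xff ok"

def decode_three_ways(raw: bytes) -> dict:
    """Decode the same bytes under three common error-handling policies."""
    results = {
        "replace": raw.decode("utf-8", errors="replace"),  # inserts U+FFFD
        "ignore": raw.decode("utf-8", errors="ignore"),    # silently drops it
    }
    try:
        raw.decode("utf-8")  # strict is the default policy
        results["strict"] = "ok"
    except UnicodeDecodeError as exc:
        results["strict"] = f"error at byte {exc.start}"
    return results

for policy, outcome in decode_three_ways(data).items():
    print(policy, repr(outcome))
```

Two programs that pick different policies will disagree about what the text is, even though both are "Unicode aware".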
But it gets bigger: The whole concept of a single representation is foreign to unicode. The aim is to capture all possible language, or thereabouts, but now you run into a similar thing that caused people to start to standardise language. In that sense, unicode is a de-standardisation effort, through standardisation, sure, but well-defined meaning out of possibly not so well-defined input gets that much harder, often in sneaky and counter-intuitive ways. Finding the problems might well depend on spotting minute differences between regional variant code points that may or may not show up differently in the font you're currently using.
So there may be one unicode, and it may be used to encoding a wide variety of meaning, but reliably getting the meaning back is actually getting harder for computer code, giving rise to yet more code, and added complexity, in multiple implementations, and all the fun and joy that brings.
Among other things it means that it's a poor choice to encode security-sensitive things like usernames and passwords (also assuming the input method required for all character sets used in your username and password is available everywhere you might ever want to log in), or URLs (even with the IDN subset), or security policies, or I don't know what else. Is this enough pointers for you?
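The multiple-representation problem described above is easy to reproduce. The Python sketch below shows the two spellings of an accented letter comparing unequal until normalized, plus a hypothetical canonical_username helper (NFKC, case folding, dropping format characters such as ZERO WIDTH SPACE). It is an illustration under those assumptions, not what Spotify or any other service actually runs, and it does nothing about visually similar characters from different scripts.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

def canonical_username(name: str) -> str:
    """Hypothetical canonical form: NFKC, casefold, strip format characters."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Case folding can undo normalization in rare cases, so normalize again.
    folded = unicodedata.normalize("NFKC", folded)
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")

print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(canonical_username("Jos\u00e9") == canonical_username("JOSE\u0301"))
```

Whatever scheme is chosen, every component that compares names has to apply the same one, against the same Unicode version, or the comparisons drift apart again.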
Re:less useful how? Re:The larger, the less useful by Goaway · 2014-06-16 22:57 · Score: 1

That sucks, but it does not seem to be an example of what was asked for.
Re:less useful how? Re:The larger, the less useful by ais523 · 2014-06-17 04:04 · Score: 1

One situation I was wondering about for that problem was the use of Japanese/Chinese/Korean marks/overrides, the same way that there are LTR and RTL overrides. Choice of language for a particular ideograph seems to be much the same as choice of direction for an inherently undirectional character (you're interpreting the character differently depending on context). This also has the advantage of being pretty much backwards compatible.

--
(1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
Re:less useful how? Re:The larger, the less useful by dillee1 · 2014-06-17 04:04 · Score: 1

That "divergence over time" actually occurred not that long ago. Right before WW2, everyone on the planet who used Chinese characters used one and only one set of glyphs: traditional Chinese. That includes China, Japan, Korea, Vietnam, Hong Kong, Macau, and Taiwan.
After WW2, China and Japan each tried to simplify the Chinese characters in separate efforts, resulting in completely different glyphs and the shitty state of CJK encoding we see now.
Korea and Vietnam largely abandoned Chinese characters, except perhaps for personal and place names, for clarification reasons.
Hong Kong, Macau, and Taiwan all use the same pre-WW2 traditional Chinese glyphs. Thus they have no ambiguity or trouble exchanging text at all.
FFS, just use traditional Chinese glyphs if you want to exchange text with other kanji users. It is the "true" Chinese that everyone in the Sinosphere has understood for the last 3000 years.
Re:less useful how? Re:The larger, the less useful by Fubari · 2014-06-17 06:52 · Score: 1

r.e. CJK - that is interesting, and it is something I haven't interacted with directly. The collisions in mapping to unicode sounds like a *significant* headache. Thanks for the heads up (now I'm at least aware that I'm ignorant of this; a small step forward).
Re:less useful how? Re:The larger, the less useful by Fubari · 2014-06-17 06:54 · Score: 1

Good to know; thanks.

On Earth, Klingon is written in Latin by tepples · 2014-06-16 13:43 · Score: 2

First I'll assume that you're talking about the KLI pIqaD for tlhIngan Hol, and not the Skybox pIqaD or the Mandel script. The Unicode team looked at encoding KLI pIqaD but decided against it because the Klingon-speaking community on Earth had already adopted a Latin-based script. (Reference: Klingon alphabets on Wikipedia) But it could use a slight spelling reform to make it case-insensitive.

Peso vs. Dollar by steelfood · 2014-06-16 13:46 · Score: 2

It's great they're adding new currency symbols for new currencies, but there's still a long-standing issue of the $ with one bar and $ with two bars. It's currently still considered a stylistic difference, but the scope of Unicode has evolved to account for every glyph known to man. Certainly, one- and two-bar $ can hardly be said to be the same glyph within this new context.

Especially considering that there are already stylistic duplicates (half-width and full-width latin forms vs. plain latin), I can't seem to understand the justification behind letting one- and two-bar $, which are historically separate glyphs, be underrepresented.

--
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."

Re:Peso vs. Dollar by Anonymous Coward · 2014-06-16 14:01 · Score: 0

incorrect! unicode contains *no glyphs*, only *codepoints*. there's a
reason for the linguistic tapdance. they are trying very hard to avoid
encoding font information into unicode.
the *-width latin codepoints are bizarre, and it's too bad they were
included. but it is perhaps best not to double-down on a mistake.
Re:Peso vs. Dollar by lithis · 2014-06-16 14:14 · Score: 4, Informative

Many of the stylistic duplicates, for example the half-width and full-width latin forms that you mentioned, are only in Unicode because of backwards compatibility with pre-Unicode character sets. If there hadn't been character sets that had different encodings for half- and full-width forms, Unicode never would have had them either. So you can't use them to argue for more glyph variations in Unicode. The same applies to many of the formatted numbers, such as the Unicode characters "VII" (U+2166), "7." (U+248E), "(7)" (U+247A), and "1/7" (U+2150), and units of measure ("cm^2", U+33A0).
(Oh, for Unicode support in Slashdot....)
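Since Slashdot can't show them, here is a short Python sketch of how some of these compatibility characters relate to their plain equivalents. It uses NFKC normalization from the standard unicodedata module; the function name fold_compat is just a label for this example.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Replace compatibility characters with their plain equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# Roman numeral seven, digit seven full stop, parenthesized seven, fullwidth A.
for ch in "\u2166\u248e\u247a\uff21":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {fold_compat(ch)!r}")
```

NFC leaves these characters alone; only the compatibility forms (NFKC and NFKD) fold them, which reflects their status as round-trip leftovers from older character sets.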
Re:Peso vs. Dollar by 91degrees · 2014-06-16 19:30 · Score: 1

This is strange. The UK Pound is U+00A3 and the Italian Lira is U+20A4. While the latter has two lines across, two lines is acceptable for the pound and a single line was acceptable for the Lira.

(Not that anyone still uses the Italian Lira but other countries use the symbol and people may still write about it)
Re:Peso vs. Dollar by Anonymous Coward · 2014-06-16 20:29 · Score: 0

The same symbol, Lira translates as pound, the UK pound and its symbol is taken from a Roman Lira
Re:Peso vs. Dollar by TangoMargarine · 2014-06-17 03:22 · Score: 1

And people wonder why Unicode is so hard to do...we can't even keep them straight in real life.

--
Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
Re:Peso vs. Dollar by steelfood · 2014-06-17 07:44 · Score: 1

My argument isn't that the one- and two-bar $ are variations that deserve two code points, but that they are indeed separate glyphs that deserve separate code points. There's historical as well as current cultural precedent for this. For Unicode to aspire to represent all written symbols (especially now that it's taken on emoji), this treatment of the two different $ continues to baffle me.
My point about the half- and full-width glyph variations is that they exist. I just find it odd that a character with what I think is a stronger case for a separate code point is completely marginalized.

--
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."

The irony by Anonymous Coward · 2014-06-16 13:47 · Score: 1

Slashdot celebrates new version of Unicode...

Fragmentation - Ghost of Steve Jobs, is that you? by Anonymous Coward · 2014-06-16 13:49 · Score: 0

It's a set of numbers from zero to 2^32 - 1 that map to symbols, with a well-defined way of displaying unrepresentable characters. How much more incompatibility or "fragmentation" can there be?

Latin unification too by tepples · 2014-06-16 13:49 · Score: 2

True, some characters have forms that differ between traditional Chinese and Japanese. But that's not limited to Chinese and Japanese, as Unicode also has Latin unification. For example, the letter "i" is the same whether in English or Turkish, but its capital form differs between the two languages. And in Dutch, the letter 'y' with umlaut/diaeresis is supposed to be written using the rounded form, as it's considered a ligature of "ij". Implementations are supposed to define out-of-band language markers, such as HTML's lang= attribute, to handle this.
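
A small sketch of the "i" problem in Python, whose str.upper() and str.lower() take no language argument and so apply Unicode's default, language-neutral case mappings. The turkish_upper helper is hypothetical and only covers this one rule; real code would use a locale-aware library.

```python
def default_case_pairs():
    """Show what the language-neutral mappings do with the four i's."""
    return {
        "i.upper()": "i".upper(),             # 'I': right for English, wrong for Turkish
        "I.lower()": "I".lower(),             # 'i': Turkish would expect dotless U+0131
        "dotless.upper()": "\u0131".upper(),  # U+0131 maps to plain 'I'
        "dotted.lower()": "\u0130".lower(),   # U+0130 maps to 'i' + U+0307 combining dot
    }

def turkish_upper(text):
    """Hypothetical helper: apply the Turkish i -> U+0130 rule, then uppercase."""
    return text.replace("i", "\u0130").upper()

print("istanbul".upper())         # default mapping
print(turkish_upper("istanbul"))  # with the out-of-band knowledge "this is Turkish"
```

The code point for "i" is the same in both calls; only the language information supplied from outside the text changes the result.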

Re:Latin unification too by AmiMoJo · 2014-06-16 21:06 · Score: 2

The problem with unification is that metadata is often either unavailable or inadequate. The goal should be to represent all characters in plain text, not rely on specific document formats to provide context.
How would a music player app handle a file tagged with a unified character? How would a file manager handle it? There is no context, no metadata to tell it what language is in use and what font to select. Anyone who uses both Japanese and Chinese can tell you this is a common problem, and I imagine Dutch people get it too.
Even in HTML you only get to set one language for the entire document. Good luck writing a page in Chinese about learning Japanese. The ones I have seen tend to use GIFs to represent the characters that Unicode can't differentiate, but that means you can't copy/paste them and the fonts don't match.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Latin unification too by hackertourist · 2014-06-16 22:24 · Score: 1

I imagine Dutch people get it too.
I'm Dutch. I've seen the y-dieresis just about 0 times. The ij ligature is very rare as well. Everyone just uses the non-ligatured ij (i.e. two characters).
Re:Latin unification too by draconx · 2014-06-17 02:24 · Score: 1

The problem with unification is that metadata is often either unavailable or inadequate. The goal should be to represent all characters in plain text, not rely on specific document formats to provide context.
How would a music player app handle a file tagged with a unified character? How would a file manager handle it? There is no context, no metadata to tell it what language is in use and what font to select.

Older Unicode standards included control sequences which could be inserted in plain text to indicate language. This is still supported in some applications to influence font choice. However, the feature was removed in favour of external markup, probably because it was really hard to edit (most text editors don't really handle non-printing characters very well.)

Even in HTML you only get to set one language for the entire document.
This is simply incorrect. Language in HTML can be set on any element.
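
To illustrate, here is a rough sketch with Python's standard html.parser that reports which lang value is in effect for each run of text. The sample page and the LangCollector class are made up for this example, and the stack handling assumes well-nested markup with no void elements.

```python
from html.parser import HTMLParser

class LangCollector(HTMLParser):
    """Record the inherited lang attribute for each run of text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, lang in effect) for each open element
        self.runs = []   # (lang, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        inherited = self.stack[-1][1] if self.stack else None
        # An element's own lang overrides whatever it inherited.
        self.stack.append((tag, dict(attrs).get("lang", inherited)))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang = self.stack[-1][1] if self.stack else None
            self.runs.append((lang, text))

# Made-up page: Chinese text that quotes one character as Japanese.
page = ('<html lang="zh-Hant"><body><p>日文的「直」寫作 '
        '<span lang="ja">直</span>。</p></body></html>')

collector = LangCollector()
collector.feed(page)
collector.close()
for lang, text in collector.runs:
    print(lang, text)
```

A browser can use the same inherited value to pick a Japanese font for the span and a Chinese font for the rest of the paragraph.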
Re:Latin unification too by laie_techie · 2014-06-17 04:04 · Score: 1

Even in HTML you only get to set one language for the entire document. Good luck writing a page in Chinese about learning Japanese. The ones I have seen tend to use GIFs to represent the characters that Unicode can't differentiate, but that means you can't copy/paste them and the fonts don't match.
Most elements in HTML accept the lang attribute. Please refer to the W3C

2,834 glyphs, not characters by Anonymous Coward · 2014-06-16 14:15 · Score: 0

A character is like the letter 'A', which is represented once for each language table that has a letter 'A'.

Shit in one font vs. shit in another font by tepples · 2014-06-16 14:26 · Score: 2

that pile of poop symbol will vary depending upon which texting app you use it with

So will any symbol. Though A, A, and A probably produce distinct glyphs on your machine, you can recognize them all as U+0041 LATIN CAPITAL LETTER A. Likewise, though U+1F4A9 appears different in different fonts, it'll look like shit in all of them.

Grammar of pictographs by tepples · 2014-06-16 14:31 · Score: 0

Say you're communicating with pictographs, and you have an action involving two things. Do you put the pictograph for the action before, between, or after the pictographs for the things?

(Spoiler: Speakers of Welsh or Arabic will want to put the action first, while speakers of Japanese or Finnish will want to put it last.)

Re:Grammar of pictographs by BitZtream · 2014-06-16 15:13 · Score: 0

How is that relevant to the discussion of unicode code points? Unicode doesn't define how you conjugate the verb either.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Grammar of pictographs by rossdee · 2014-06-16 16:13 · Score: 1

"Speakers of Welsh or Arabic will want to put the action first, while speakers of Japanese or Finnish will want to put it last"
Along with speakers of hsiloP
Re:Grammar of pictographs by tepples · 2014-06-17 01:52 · Score: 1

I intended to ask to what extent RyuuzakiTetsuya's concept of "communicating via pictographs" (plural) is practical.
Re:Grammar of pictographs by RyuuzakiTetsuya · 2014-06-17 22:55 · Score: 1

Emoji was an accidental feature for ntt docomo phones.
That being said, if I don't understand Portuguese and you don't understand Korean, I message you a stop sign, that's straightforward to understand.

--
Non impediti ratione cogitationus.
Re:Grammar of pictographs by Anonymous Coward · 2014-06-18 20:23 · Score: 0

I don't think it would be that big of an issue. Nobody who saw Star Wars had any problem grokking what Yoda meant. And, as a speaker (second language) of Japanese, it is interesting to note that, while Japanese grammar itself uses a subject-object-verb word order ("The boy/the ball/kicks"), the language also has a lot of vocabulary that derives from Chinese, which uses more or less the same word order as English. For example, the Japanese word for "sterilize" (as in an autoclave) is "sakkin," which is literally "kill-germ," in that order, whereas an actual Japanese sentence would have the order reversed: "Kin o korosu" (the "o" is a grammatical particle, and "korosu" is the Japanese word that corresponds to the Chinese-derived morpheme "sa-" for "kill").
As long as you're conveying simple ideas, people seem pretty good at guessing the intended meaning even if you muck around with the word order. Sure, ambiguities aren't impossible (you could imagine having a word meaning "contains germs that kill [you]"), but context generally clarifies.

Proprietary fonts by ortholattice · 2014-06-16 16:28 · Score: 5, Insightful

Over the years, I've tried to use Unicode for math symbols on various web pages and tend to revert back to GIFs or LaTeX-generating tools due to problems with symbols missing from the font used by this or that browser/OS combination, or even incorrect symbols in some cases.

IMO the biggest problem with Unicode is the lack of a public domain reference font. Instead, it is a mishmash of proprietary fonts each of which only partly implements the spec. Even the Unicode spec itself uses proprietary fonts from various sources and thus cannot be freely reproduced (it says so right in the spec), a terrible idea for a supposed "standard".

I'd love to see a plain, unadorned public-domain reference font that incorporates all defined characters - indeed, it would seem to me to be the responsibility of the Unicode Standard committee to provide such a font. Then others can use it as a basis for their own fancy proprietary font variations, and I would have a reliable font I could revert to when necessary.

Re:Proprietary fonts by SEE · 2014-06-16 16:59 · Score: 1

Why do you think an official Unicode font would solve your mathematical symbol problem any more than the already-available STIX has failed to?
Re:Proprietary fonts by Swistak · 2014-06-16 20:00 · Score: 1

Probably because then, when starting a new font design, you just fork the reference font and replace all the glyphs you want/have to. Your font will display your glyphs in the places you care about and use the standard glyphs for the ones you didn't implement?
Re:Proprietary fonts by StripedCow · 2014-06-16 21:51 · Score: 1

The problem is not with Unicode. Don't blame the character set, blame the font-specification, the software, and copyrights (!)
In my view, every font that does not specify all unicode characters should point to one or more fall-back fonts, and the search should proceed recursively. Eventually, there should be a default "unicode" font implementing all characters.
Also, fonts should not be copyrightable, because that adds greatly to the whole mess.
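
A minimal sketch of the recursive fallback search described above. The FONTS table, the font names, and find_font are all hypothetical; no real font format or API works exactly this way. The "seen" set is there because fallback chains could point back at each other.

```python
# Hypothetical font table: the code points each font covers, and the fonts
# it falls back to. None of these names describe real font files.
FONTS = {
    "Fancy": {"covers": range(0x41, 0x5B), "fallbacks": ["Maths", "Everything"]},
    "Maths": {"covers": {0x2200, 0x2203}, "fallbacks": ["Fancy"]},  # cycle on purpose
    "Everything": {"covers": range(0x110000), "fallbacks": []},     # the default font
}

def find_font(name, code_point, fonts=FONTS, seen=None):
    """Depth-first search of the fallback chain; returns a font name or None."""
    seen = set() if seen is None else seen
    if name in seen or name not in fonts:
        return None
    seen.add(name)
    font = fonts[name]
    if code_point in font["covers"]:
        return name
    for fallback in font["fallbacks"]:
        found = find_font(fallback, code_point, fonts, seen)
        if found is not None:
            return found
    return None

print(find_font("Fancy", ord("A")))  # covered by the font itself
print(find_font("Fancy", 0x2200))    # found in the first fallback
print(find_font("Fancy", 0x1F4A9))   # only the catch-all font has it
```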

--
If Pandora's box is destined to be opened, *I* want to be the one to open it.
Re:Proprietary fonts by Anonymous Coward · 2014-06-16 23:04 · Score: 0

GNU Unifont is pretty complete. Sure, it's ugly and monospaced, but it's a freely-available font.
Re:Proprietary fonts by Sarlok · 2014-06-17 03:14 · Score: 1

These folks have several open fonts that cover some lesser-used code points. They don't have a big font with everything, but the Doulos font has pretty good coverage for Latin and Cyrillic scripts.
Re:Proprietary fonts by juancnuno · 2014-06-17 05:19 · Score: 1

I agree that it's a problem but I don't think it's Unicode's. I don't think the consortium has set out to do anything but encode characters (and I think they're doing a good job). I imagine that coming up with a font for all those characters would be another massive undertaking.
And as much as I champion free software I would have no problem with a company stepping in and filling that need by selling such a font.

Middle finger by Anonymous Coward · 2014-06-16 18:57 · Score: 0

I read somewhere they are having the middle finger emoji:
emojipedia.org/reversed-hand-with-middle-finger-extended/
Real or not?

Re:Middle finger by DaphneDiane · 2014-06-16 19:37 · Score: 1

I believe you are referring to U+1F595.
Re:Middle finger by rubycodez · 2014-06-17 04:39 · Score: 1

ok, so we have a character for fuck you, but none for fucking? most of us wouldn't be here but for the fucking.

Emoji? by bradley13 · 2014-06-16 20:46 · Score: 2

Great, Unicode is already a fragmented mess, and now the standards organization justifies its existence by adding characters that do not exist.

An earlier poster asked why anyone thinks Unicode is fragmented. The answer in one word: fonts. Different fonts support different subsets of Unicode, because the whole thing is just too big. If you expect your font to mostly be used in Europe, you are unlikely to bother with Asian characters. If you have an Asian font, it probably has only English characters, not the rest of Europe. If you have a font with complete mathematical symbols, it will include the Greek alphabet, but actual language support is a crapshoot.

So the solution to this problem is to add made-up characters that no one cares about. "Man in business suit, levitating". Really?

--
Enjoy life! This is not a dress rehearsal.

Re:Emoji? by Anonymous Coward · 2014-06-16 21:08 · Score: 0

This is not what 'fragmented' means. And any decent GUI toolkit will look for other fonts on the system to fill in glyphs not present in the font it's currently using.
What's /your/ solution to this problem? Forbid people to write in Chinese?
Re:Emoji? by laie_techie · 2014-06-17 04:22 · Score: 1

Great, Unicode is already a fragmented mess, and now the standards organization justifies its existence by adding characters that do not exist.
An earlier poster asked why anyone thinks Unicode is fragmented. The answer in one word: fonts. Different fonts support different subsets of Unicode, because the whole thing is just too big. If you expect your font to mostly be used in Europe, you are unlikely to bother with Asian characters. If you have an Asian font, it probably has only English characters, not the rest of Europe. If you have a font with complete mathematical symbols, it will include the Greek alphabet, but actual language support is a crapshoot.
You are correct in the reason that most fonts only contain a subset of Unicode code points. There are thousands of code points. Most documents will only use a small subset. Why should I have to have all those Chinese or Arabic characters when I only write in English, Spanish, Portuguese, and Hawaiian? People who read and write Hawaiian have fonts which support the Hawaiian letters `okina and kahako. Chinese have fonts which support the Chinese glyphs.
As for language support, that isn't a font's problem. It's up to the writer to know how to intelligently combine glyphs into words, and words into coherent thoughts.
Re:Emoji? by rubycodez · 2014-06-17 04:40 · Score: 1

your assertion the characters don't exist is provably false. chat software produces them, hundreds of millions of people use them

Re:Fragmentation - Ghost of Steve Jobs, is that yo by kasperd · 2014-06-16 21:55 · Score: 1

It's a set of numbers from zero to 2^32 - 1 that map to symbols

Actually it only goes from 0 to 1114111 (U+10FFFF), mainly because that's the range you can achieve with UTF-16.
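
The arithmetic behind that limit, as a short Python sketch. The helper surrogate_pair is written here for illustration: UTF-16 has 1024 lead surrogates and 1024 trail surrogates, so pairs can address 1024 * 1024 code points on top of the 65536 in the Basic Multilingual Plane.

```python
import sys

LEAD_SURROGATES = 0xDC00 - 0xD800   # 1024 possible first halves
TRAIL_SURROGATES = 0xE000 - 0xDC00  # 1024 possible second halves

# 65536 BMP code points + 1024 * 1024 pairs = 1114112 values, numbered from 0.
MAX_CODE_POINT = 0x10000 + LEAD_SURROGATES * TRAIL_SURROGATES - 1

def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 lead and trail units."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

print(MAX_CODE_POINT, hex(MAX_CODE_POINT))
print([hex(unit) for unit in surrogate_pair(MAX_CODE_POINT)])
print(sys.maxunicode == MAX_CODE_POINT)
```

The last code point uses the last lead and the last trail surrogate, which is why nothing beyond it can be written in UTF-16.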

--

Do you care about the security of your wireless mouse?

Strike out by Anonymous Coward · 2014-06-16 23:36 · Score: 0

Wingdings is to fonts what VBA/Access is to application development

You would have been better off with a car analogy. How about this: "Wingdings is to fonts what bananas are to cars."

Re:Strike out by rubycodez · 2014-06-17 04:35 · Score: 1

you've obviously never typed in "bananamobile" in google image search
