Unicode 7.0 Released, Supporting 23 New Scripts
An anonymous reader writes "The newest major version of the Unicode Standard was released today, adding 2,834 new characters, including two new currency symbols and 250 emoji. The inclusion of 23 new scripts is the largest addition of writing systems to Unicode since version 1.0 was published with Unicode's original 24 scripts. Among the new scripts are Linear A, Grantha, Siddham, Mende Kikakui, and the first shorthand encoded in Unicode, Duployan."
Still no Klingon?
Good. If you do a search of Wingdings on Google, many of the top results are questions on how to use the font with browsers other than IE. Since it isn't a Unicode compliant font, you can't. This update helps correct that problem.
What's the point of adding pictographic symbols to Unicode? Is this really something we want frozen in time for eternity? What's the benefit of standardizing them anyway?
Wouldn't we be better off standardizing all characters used in written language and be done with it?
There are a few, and researchers and historians would like to have them on computer.
but there just aren't enough extant samples to justify adding it to Unicode, and nobody can translate it.
Unicode is supposed to be universal, and it has more than enough codepoints to spare - why is there a problem adding it? I'm sure having it in a standard encoding would prove useful to anyone who is trying to translate Linear A, or to archeologists/historians looking to digitize fragments we do have, etc.
The larger Unicode becomes, the more fragmented the implementations will be.
Maybe instead of fragmented, you mean there won't be font sets that can render all of Unicode's characters?
*shrug* Even if that were a problem, the underlying data is intact and undamaged and will be viewable once a suitable font library is obtained.
The more fragmented it is, the more errors and incompatibilities will compound. It will get less and less useful, and more and more bulky, and will eventually be as useful as Flash. (Well, it may not be that bad, but still, Flash was all things to all people, and almost universally installed, until it wasn't.)
Can you give me an example of an incompatibility? I'm not saying there are none, just that I don't know of anything and that, in general, I've been very pleased with Unicode's stability - compared to other encodings - for doing data exchange.
First I'll assume that you're talking about the KLI pIqaD for tlhIngan Hol, and not the Skybox pIqaD or the Mandel script. The Unicode team looked at encoding KLI pIqaD but decided against it because the Klingon-speaking community on Earth had already adopted a Latin-based script. (Reference: Klingon alphabets on Wikipedia) But it could use a slight spelling reform to make it case-insensitive.
It's great they're adding new currency symbols for new currencies, but there's still a long-standing issue of the $ with one bar and $ with two bars. It's currently still considered a stylistic difference, but the scope of Unicode has evolved to account for every glyph known to man. Certainly, one- and two-bar $ can hardly be said to be the same glyph within this new context.
Especially considering that there are already stylistic duplicates (half-width and full-width Latin forms vs. plain Latin), I can't understand the justification for leaving one- and two-bar $, which are historically separate glyphs, without separate code points.
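For what it's worth, the full-width forms are a decent illustration of how Unicode treats stylistic duplicates: they get their own code points, but compatibility normalization folds them back onto the plain characters. A minimal sketch in Python, using only the standard unicodedata module (the variable names are mine, purely for illustration):

```python
import unicodedata

plain_dollar = "$"           # U+0024 DOLLAR SIGN
fullwidth_dollar = "\uFF04"  # U+FF04 FULLWIDTH DOLLAR SIGN

# The two are distinct code points, so a plain comparison is False.
print(plain_dollar == fullwidth_dollar)

# NFKC (compatibility) normalization maps the full-width form to the plain one.
nfkc = unicodedata.normalize("NFKC", fullwidth_dollar)
print(nfkc == plain_dollar)

# NFC (canonical) normalization leaves the full-width form untouched.
nfc = unicodedata.normalize("NFC", fullwidth_dollar)
print(nfc == fullwidth_dollar)
```

As far as I know there is no comparable pair of code points for the one-bar and two-bar $, which is the complaint here.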
True, some characters have forms that differ between traditional Chinese and Japanese. But that's not limited to Chinese and Japanese, as Unicode also has Latin unification. For example, the letter "i" is the same whether in English or Turkish, but its capital form differs between the two languages. And in Dutch, the letter 'y' with umlaut/diaeresis is supposed to be written using the rounded form, as it's considered a ligature of "ij". Implementations are supposed to define out-of-band language markers, such as HTML's lang= attribute, to handle this.
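The Turkish "i" case is easy to see in code. Python's built-in str.upper() knows nothing about language, so it always gives the English-style result; anything locale-aware has to be layered on top. The sketch below uses a made-up helper, turkish_upper, that only handles the two Turkish-specific letters and is nowhere near a full locale-aware casing implementation:

```python
# Python's str.upper() is language-independent: "i" always becomes "I".
print("i".upper())

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# Hypothetical helper (not a library function): map the two Turkish-specific
# pairs first, then let the default uppercasing handle everything else.
_TURKISH_UPPER = str.maketrans({"i": DOTTED_CAPITAL_I, DOTLESS_SMALL_I: "I"})

def turkish_upper(text: str) -> str:
    return text.translate(_TURKISH_UPPER).upper()

default_result = "istanbul".upper()
turkish_result = turkish_upper("istanbul")
print(default_result)
print(turkish_result)
```

The language has to come from somewhere outside the text itself (here, the caller choosing turkish_upper), which is the same job HTML's lang= attribute does.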
There is already at least one effort to extend Unicode beyond the current maximum of 1.1 million characters: The UCS-X Family of UCS Extensions. It defines UCS-G, which supports over two billion characters, UCS-E with over nine quintillion, and UCS-Infinity with no upper bound. They each support 8-, 16-, and 32-bit variable-byte encodings (e.g. UTF-E-32, UTF-Infinity-8). It's been a while since I read about them, but I believe they are all compatible with UTF-8, UTF-16, and UTF-32.
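In case anyone wonders where the "1.1 million" figure comes from: it is 17 planes of 65,536 code points each, which is also exactly what UTF-16 can reach with surrogate pairs. A quick back-of-the-envelope check in Python (the constant names are mine; this says nothing about how the UCS-X proposals work):

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

# Size of the Unicode code space: U+0000 through U+10FFFF.
total_code_points = PLANES * CODE_POINTS_PER_PLANE

# UTF-16 reach: the 16-bit range itself plus every
# (high surrogate, low surrogate) pair.
high_surrogates = 0xDBFF - 0xD800 + 1  # 1,024
low_surrogates = 0xDFFF - 0xDC00 + 1   # 1,024
utf16_reach = CODE_POINTS_PER_PLANE + high_surrogates * low_surrogates

print(total_code_points)
print(utf16_reach)
print(sys.maxunicode + 1)  # Python 3 uses the same upper bound
```

All three numbers come out to 1,114,112, so the ceiling is really a UTF-16 design limit rather than anything inherent to UTF-8 or UTF-32.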
that pile of poop symbol will vary depending upon which texting app you use it with
So will any symbol. Though A, A, and A probably produce distinct glyphs on your machine, you can recognize them all as U+0041 LATIN CAPITAL LETTER A. Likewise, though U+1F4A9 appears different in different fonts, it'll look like shit in all of them.
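To make that concrete: the identity of a character is its code point and name, not the picture a font draws for it. A small sketch with Python's standard unicodedata module (variable names are just for illustration):

```python
import unicodedata

letter_a = "A"
poo = "\U0001F4A9"

# The character name is fixed by the standard, whatever font renders it.
name_a = unicodedata.name(letter_a)
name_poo = unicodedata.name(poo)
print(f"U+{ord(letter_a):04X} {name_a}")
print(f"U+{ord(poo):04X} {name_poo}")

# The encoded bytes are the same everywhere too; only the glyph varies.
encoded = poo.encode("utf-8")
print(encoded)
print(encoded.decode("utf-8") == poo)
```

Even on a machine with no font for U+1F4A9, the bytes round-trip unchanged, so the data is fine and only the rendering differs.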
Over the years, I've tried to use Unicode for math symbols on various web pages and tend to revert back to GIFs or LaTeX-generating tools due to problems with symbols missing from the font used by this or that browser/OS combination, or even incorrect symbols in some cases.
IMO the biggest problem with Unicode is the lack of a public domain reference font. Instead, it is a mishmash of proprietary fonts each of which only partly implements the spec. Even the Unicode spec itself uses proprietary fonts from various sources and thus cannot be freely reproduced (it says so right in the spec), a terrible idea for a supposed "standard".
I'd love to see a plain, unadorned public-domain reference font that incorporates all defined characters - indeed, it would seem to me to be the responsibility of the Unicode Standard committee to provide such a font. Then others can use it as a basis for their own fancy proprietary font variations, and I would have a reliable font I could revert to when necessary.
Great, Unicode is already a fragmented mess, and now the standards organization justifies its existence by adding characters that do not exist.
An earlier poster asked why anyone thinks Unicode is fragmented. The answer in one word: fonts. Different fonts support different subsets of Unicode, because the whole thing is just too big. If you expect your font to mostly be used in Europe, you are unlikely to bother with Asian characters. If you have an Asian font, it probably has only English characters, not the rest of Europe. If you have a font with complete mathematical symbols, it will include the Greek alphabet, but actual language support is a crapshoot.
So the solution to this problem is to add made-up characters that no one cares about. "Man in business suit, levitating". Really?