New Unicode Bug Discovered For Common Japanese Character "No"
AmiMoJo writes: Some users have noticed a rendering problem with the Japanese character "no", which is extremely common in the Japanese language (forming parts of many words, or meaning something similar to the English word "of" on its own). The Unicode standard has apparently marked the character as sometimes being used in mathematical formulae, causing it to be rendered in a different font from the surrounding text in certain applications. Similar but more widespread issues have plagued Unicode for decades due to the decision to unify dissimilar characters in Chinese, Japanese and Korean.
The character in question is Hiragana "No", codepoint U+306E. As far as I can tell, this has existed since Unicode 1.1 and there are no differences in the Unicode metadata when compared to any other Hiragana glyph. It is marked as IsAlphabetic=True, Category=Other Letter, and NumericType=None, for example. So are all the other common Hiragana glyphs. If there is a bug, it's clearly with some specific application, and not with Unicode or the Unicode metadata. Compare http://www.fileformat.info/inf... with any other Hiragana glyph, like http://www.fileformat.info/inf... (Hiragana "Ha").
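For anyone who wants to repeat that comparison locally, here is a minimal sketch using Python's standard unicodedata module. The helper names describe and metadata_only are made up for illustration, and str.isalpha() is only Python's approximation of the Alphabetic property, so treat this as a rough check rather than an authoritative reading of the Unicode Character Database.

```python
import unicodedata

NO = "\u306e"  # Hiragana "No"
HA = "\u306f"  # Hiragana "Ha", used here as the comparison character


def describe(ch):
    """Collect a few of the properties mentioned in the comment above."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # 'Lo' is the short code for the "Other Letter" general category.
        "category": unicodedata.category(ch),
        # Python's own notion of alphabetic, an approximation of IsAlphabetic.
        "is_alphabetic": ch.isalpha(),
        # None means the character has no numeric value assigned.
        "numeric": unicodedata.numeric(ch, None),
    }


def metadata_only(ch):
    """Drop the fields that necessarily differ between two characters."""
    info = describe(ch)
    return {k: v for k, v in info.items() if k not in ("codepoint", "name")}


for ch in (NO, HA):
    print(describe(ch))

# True when both characters carry the same values for the checked properties.
same_metadata = metadata_only(NO) == metadata_only(HA)
print("same metadata:", same_metadata)
```

This only covers the handful of properties the standard library exposes, so a difference in some other property would not show up here.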
Morphing Software
As you have just discovered, Slashdot cleverly avoids all Unicode bugs by not supporting Unicode at all.
This is not a "Unicode bug". It is a rendering bug exhibited by some applications.
I do not fail; I succeed at finding out what does not work.
Just write Chinese in pinyin and speak it normally. (The number of Chinese speakers does not matter; the issue is with how it is written down.) When it comes to ideograph-based languages, we would have been better off designing an entirely separate text system rather than trying to shoehorn it into a font-character paradigm derived from the needs of writing and printing Latin scripts. Indeed, having a writing system designed around the needs of calligraphy would be a useful thing, but, as with ideograph-based writing systems, it is a long way from the use case we normally see with alphabet-based writing systems.
John_Chalisque
Just write Chinese in pinyin and speak it normally. (The number of Chinese speakers does not matter; the issue is with how it is written down.)
"Chinese" is not a single spoken language. A passage written in one Chinese language, such as Mandarin, is often readable in another Chinese language, such as Cantonese, so long as they're written with Han characters. It's as if French could be read as Italian or Spanish with the same characters. In addition, different words that sound the same in a given Chinese language due to historic sound changes usually have different Han characters. They may end up sounding different in a different Chinese language whose different historic sound changes produced different homophone sets. Pinyin, on the other hand, depends on Mandarin and confuses homophones.
A lot of people complain about the idea of unification without understanding it. I can't judge if Unicode's unification is great or awful. The English-speaking media constantly says it's awful, but it's usually clear the authors don't know what unification is, who's driving it, or how Unicode's work compares to what existed beforehand, so they can only be ignored. (They're sometimes trying to spin up some clickbait about ignorant westerners imposing blah blah blah on Asia, which just shows they know nothing about the topic.)
The issue:
There's a certain number of symbols which have been copied from one East Asian language to another. They're the same symbol, so Unicode has one slot for that symbol. Then there's a second category where the symbol has been copied, but one group draws it a little differently (the Japanese might like to put a little flick at the end of one line, or the Chinese draw the line a little slantier). And a third category where one group has developed a simplified symbol, which means again the traditional and the simplified symbols are the same thing but drawn differently. The two symbols are equivalent; the new one is just a new suggestion for how to draw it.
Unification is about having one slot for the symbols in categories two and three and leaving it to the font to decide how to display it.
(Unicode uses more precise terms, but I'm calling them "symbols" and "slots" for simplicity.)
A disadvantage to this approach is that there can't be a font which would display a symbol both the way a Japanese would draw it and the way a Chinese would draw it. Fonts have to choose one style to draw each unified symbol.
An advantage of this approach is that new languages and dialects can be added without needing another 100,000 slots per language or dialect (we do all know there are more than three East Asian languages, don't we?), and it's much easier for fonts to add support for all the East Asian languages because once they've done Chinese, Japanese is automatically almost finished.
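To make the "one slot" idea concrete, here is a small Python sketch. The two example words are my own choice, not taken from any Unicode document, and the helper name slot is made up. It only shows that the shared symbol is stored as one code point in both words, so the text itself carries nothing that tells a renderer which regional drawing style to use.

```python
import unicodedata

# Illustrative words chosen for this sketch (an assumption, not an official example).
japanese_word = "直す"  # Japanese verb, "to fix"
chinese_word = "直接"   # Chinese word, "direct"

# Both words start with the same Han character.
shared_jp = japanese_word[0]
shared_zh = chinese_word[0]


def slot(ch):
    """Show the code point ("slot") and formal name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The stored character is identical in both words: one slot, no language tag.
same_slot = shared_jp == shared_zh

print(slot(shared_jp))
print(slot(shared_zh))
print("same slot:", same_slot)
```

Which shape ends up on screen is decided outside the string, by the font or by whatever language information the application passes to its renderer.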
Here are some example symbols:
https://en.wikipedia.org/wiki/...
unicode.org's FAQ also has clarifications:
If the character shapes are different in different parts of East Asia, why were the characters unified?
http://www.unicode.org/faq/han...
Isn't it true that some Japanese can't write their own names in Unicode?
http://www.unicode.org/faq/han...
(All that said, it's been years since I looked into this so there's a chance I've gotten some detail wrong, but I'm confident it's a good summary of the issue.)
Help build the anti-software-patent wiki
I have been reading the comments for 20 minutes because I don't understand Japanese, but I still don't understand the problem. There's a Japanese character called no, it looks very much like a lowercase English/Latin "e" rotated clockwise about 80 degrees and then flipped over the vertical axis. Is this being mixed up with something else or rendered wrongly? Can anybody provide examples of what it's getting mixed up with or how or where it's being rendered improperly?