Official Kanji Count Increasing Due To Electronics

Posted by timothy on Wednesday June 9, 2010 @08:42AM from the switch-to-english-it's-easy dept.

JoshuaInNippon writes "Those who have studied Japanese know how imposing kanji, or Chinese characters, can be in learning the language. There is an official list of 1,945 characters that one is expected to understand to graduate from a Japanese high school or be considered fluent. For the first time in 29 years, that list is set to change, increasing by nearly 10% to 2,136 characters. 196 are being added, and five deleted. The added characters are ones believed to be in common everyday use, but they were considered harder to write by hand and were therefore left out of previous editions of the official list. Japanese officials seem to have recognized that with the advent and spread of computers in daily life, writing in Japanese has become dramatically simpler. Changing the phonetic spelling of a word to its correct kanji only requires a couple of presses of a button, rather than memorizing an elaborate series of brush strokes. At the same time, the barrage of words that people see has increased, thereby increasing the necessity to understand them. Computers have simplified the task of writing in Japanese, but have inadvertently complicated the lives of Japanese language learners. (If you read Japanese and are interested in more details on specific changes, Slashdot.jp has some information!)"

4 of 284 comments

Re:UTF-8 by JustinOpinion · 2010-06-09 09:13 · Score: 5, Interesting

The usual explanation given is that people were injecting Unicode characters as part of trolling attempts to break Slashdot's layout. Trolls were doing things like using right-to-left control characters to spoof their comment score. See this comment, which explains the situation and links to some examples. Slashdot reacted by blocking anything not in the basic character set.

Frankly this is an unsatisfying answer. Or rather an unsatisfying solution. It seems like it wouldn't take that long for a developer to go through some of the Unicode set and build a whitelist and/or blacklist that was comprehensive enough to allow us geeks to use useful symbols (currency, micro, Greek letters, etc.) without allowing damaging characters.
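
As a rough sketch of what such a whitelist could look like, here is one way to filter by Unicode general category in Python. The category choices and the names `ALLOWED_CATEGORY_PREFIXES`, `is_allowed` and `filter_comment` are assumptions made for illustration, not anything Slashdot actually runs.

```python
import unicodedata

# Assumed policy: keep letters (L*), marks (M*), numbers (N*),
# punctuation (P*), symbols (S*) and ordinary spaces (Zs).
# Everything else is dropped, including control characters (Cc) and
# format characters (Cf), which is where the bidi overrides live.
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "N", "P", "S", "Zs")


def is_allowed(ch):
    """Return True if a single character passes the assumed whitelist."""
    if ch in "\n\t":
        return True  # keep basic layout whitespace
    return unicodedata.category(ch).startswith(ALLOWED_CATEGORY_PREFIXES)


def filter_comment(text):
    """Drop every character that is not on the whitelist."""
    return "".join(ch for ch in text if is_allowed(ch))


if __name__ == "__main__":
    sample = "5 \u00b5m costs \u20ac3 \u202eevil"
    print(filter_comment(sample))
```

A category-level whitelist is deliberately blunt. It also removes zero-width joiners that some scripts need, and it does nothing about stacked combining marks, so a real deployment would probably need finer-grained rules.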

It seems like many of Slashdot's anti-trolling features (e.g. trying to prevent allcaps or ASCII art) are somewhat misguided. Nowadays the moderation is pretty good, such that troll comments are basically buried. You may as well let regular posters with good karma post in caps or use ASCII art if that's what their post requires (e.g. a post of calculations that uses lots of symbols and few words ends up being flagged unnecessarily).

All that to say that Slashdot could presumably fix these things, but apparently they have little interest in doing so.
Re:What about Official English? by angus77 · 2010-06-09 09:36 · Score: 4, Interesting

To be pedantic, Hiragana and Katakana glyphs are the equivalent of English syllables.
To be extra pedantic, they're not necessarily syllables, but morae.
For example, "o" is a one-mora syllable on its own, whereas "oo" is also one syllable, but containing two morae (two beats to one syllable). "Oto" would then be both two morae and two syllables.
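
The mora counts above can be checked mechanically when the word is written in kana. The following is a minimal sketch that assumes pure hiragana or katakana input; `SMALL_KANA` and `count_morae` are hypothetical names. Each kana counts as one mora, except the small glide and vowel kana, which merge with the kana before them.

```python
# Small kana that combine with the preceding kana instead of adding a beat.
# The small tsu, the moraic n and the long-vowel mark are NOT in this set,
# because each of those counts as a mora of its own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana):
    """Count morae in a string assumed to contain only kana."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


if __name__ == "__main__":
    for word in ["お", "おお", "おと", "きょう"]:
        print(word, count_morae(word))
```

With this rule "お" is one mora, while "おお" and "おと" are two each, matching the examples in the comment. Counting syllables would need extra rules for long vowels and is not attempted here.
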
WTF by NemosomeN · 2010-06-09 10:07 · Score: 4, Interesting

Ok, the characters listed aren't difficult or uncommon, they just aren't "official." The real issue here is, why the hell does slashdot.jp have more features than slashdot.org? Click an external link, and there's an interstitial offering a direct, Google cache, and web archive (Wayback Machine) link. Seriously, bring this to .org. And add Coral cache to both. I know it's got an l AND an r in it, but it could still benefit .jp.

--
I hate grammar Nazi's.
Re:(5:erocS) tuoyal eht kaerb dluow 8-FTU by Hurricane78 · 2010-06-09 10:29 · Score: 4, Interesting

And it is an epic fail, that this retarded excuse is used.
The characters that cause such things are a well-known set. Like the control (<32) characters in ASCII.
If you filter them, you’re good.
And if you are smart, you can even check for RTL/LTR/etc characters, and add a character to the end that fixes it. Or do it like a pro, and just force LTR via CSS for the element surrounding UTF-8 user input. So people can comment in RTL languages too.
There. Done.
That lame excuse only works on non-professionals. If you can’t handle UTF-8 you’re not one.
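
A minimal sketch of the filter-and-fix approach in Python, under some assumptions: the function names are made up for illustration, tab and newline are kept, and only the embedding and override controls (LRE, RLE, LRO, RLO) are balanced. The newer isolate controls are not handled.

```python
# Bidi controls that open an embedding or override, and the one that closes it.
BIDI_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def strip_controls(text):
    """Drop ASCII control characters (code points below 32) except tab and newline."""
    return "".join(ch for ch in text if ord(ch) >= 32 or ch in "\t\n")


def close_bidi(text):
    """Append one PDF for every embedding or override left open at the end."""
    depth = 0
    for ch in text:
        if ch in BIDI_OPENERS:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
    return text + PDF * depth


def sanitize(text):
    """Strip control characters, then close any dangling bidi formatting."""
    return close_bidi(strip_controls(text))


if __name__ == "__main__":
    print(repr(sanitize("\u202e(5:erocS)\x07")))
```

Closing the formatting keeps a right-to-left override from leaking into the page around the comment, while still letting people write in RTL languages. Forcing the direction in CSS on the wrapping element, as suggested above, would be a complementary fix on the markup side.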

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.