Slashdot Mirror


Linux Goes Unicode

Markus Kuhn writes: "Linux and other Unices are well on their way to making UTF-8 their single main character encoding. Replacing ASCII with UTF-8 is now one of the hottest Linux developer topics. Soon to be gone are the annoying restrictions that Latin-1 currently imposes on even English-language Linux users (no en/em dashes, no smart quotes, no math symbols, etc.). Numbered are the days of the bewildering variety of regional ASCII extensions such as ISO 8859-1/2/3/5/7/9/13/15, KOI8-R/U, GBK, CP1251, VISCII, TIS-620, EUC-JP/KR, SJIS. Pioneered by the fathers of Unix in Plan 9 a decade ago, the ASCII-compatible UTF-8 encoding of Unicode / ISO 10646 (UCS) has emerged as the definitive way out of the current character-set chaos. With glibc 2.2.x and XFree86 4.x, the basic infrastructure for UTF-8 support is now well in place. To get started, read the UTF-8 FAQ and view some of the UTF-8 example files listed there with xterm, emacs, vim, etc. Then think about whether running in a UTF-8 locale and using UTF-8 files, filenames, terminals and stdin/stdout has any consequences for software that you use or maintain. Join the linux-utf8 mailing list if you need advice. Two years from now, it should be possible to recommend that every Linux user switch over to UTF-8 permanently."
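
For anyone wondering what "consequences for software" might look like in practice, here is a minimal Python sketch (not from the submission). It illustrates two points made above: pure ASCII text is byte-for-byte unchanged in UTF-8, and characters such as en dashes and smart quotes cannot be represented in Latin-1 at all. It also shows one common pitfall, namely that byte length and character count diverge once non-ASCII characters appear. The helper names utf8_lengths and fits_latin1 are made up for this illustration.

```python
def utf8_lengths(text):
    """Return (character count, UTF-8 byte count) for a string."""
    encoded = text.encode("utf-8")
    return len(text), len(encoded)


def fits_latin1(text):
    """Return True if every character in text exists in Latin-1 (ISO 8859-1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Pure ASCII is encoded identically in ASCII and UTF-8.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# En dash (U+2013) and smart quotes (U+201C, U+201D) each take 3 bytes in UTF-8.
sample = "en dash \u2013 and \u201csmart quotes\u201d"
chars, nbytes = utf8_lengths(sample)

print(same_bytes)           # ASCII compatibility check
print(chars, nbytes)        # character count vs. byte count
print(fits_latin1(sample))  # whether Latin-1 can hold the sample
```

Code that assumes one byte per character (fixed-width buffers, column alignment, truncation by byte offset) is the kind of thing worth auditing before switching to a UTF-8 locale.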
