Unicode and the Unix Console?
Phactorial asks: "In their current state, most UNIX consoles I have dealt with (not graphical terminal emulators, so mlterm is out for this) do not handle Unicode properly. This is essential when it comes to dealing with languages that require characters outside the ASCII set. I was wondering if anyone out there is developing a solution for non-Linux platforms. I know the Arabeyes project is currently working on a project called 'Akka' which provides UTF-8 (kinda) support and even shaping and bidirectional code (essential for many languages in the East; the program works fine and I am working on getting a FreeBSD port out). However, I was pondering: how are other UNIX consoles doing? Do any of them fully support Unicode, even bidirectional characters? Shaping? (A great many of today's UNIX applications lack many if not all of these ;(). If you know of such applications or are working on support for a platform, could you give feedback as to your experiences and thoughts on the current state of the UNIX console?"
RH 8's GNOME 2 terminal can deal with any TrueType Unicode font, even those that are proportionally spaced such as the luscious, but now under wraps, 'Arial Unicode MS'. Its vim is also Unicode savvy.
A major improvement for my line of work.
Already done, at least in part. Take a look at the UTF-8 and Unicode FAQ for Unix/Linux.
I've seen make work just fine with UTF-8 and other character encodings. You can build gcc with "--enable-c-mbchar" to turn on MBCS support. The kernel would need little or no modification to work properly. Take a look at the "How do I have to modify my software?" and "What is UTF-8?" entries in the FAQ mentioned above.
"Great men are not always wise: neither do the aged understand judgement." Job 32:9
People who fear that a switch from US-ASCII to UTF-8 will break their existing programs should really read the Bell Labs document linked above, section 2.3 of the Unicode spec, or RFC 2044. UTF-8 was designed very carefully to make life extremely easy for people making that exact migration. There are amazingly few circumstances where it even matters that it is variable width. Those people who are suggesting UCS-2, UCS-4, etc. as alternatives in order to solve the nonexistent problem of UTF-8's variable-width nature should really take a closer look at it.
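To make that concrete, here is a rough Python sketch (mine, not from any of the documents mentioned) of why byte-oriented code written for US-ASCII tends to keep working on UTF-8. The function names and the passwd-style sample record are made up for illustration. The property it leans on is that every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII delimiter byte can never show up inside a non-ASCII character. A fixed-width 16-bit encoding gives no such guarantee.

```python
def split_fields(raw, sep=b":"):
    """Split a record on an ASCII delimiter, looking only at bytes.

    This is the kind of code an old ASCII-only program would contain;
    it knows nothing about UTF-8.
    """
    return raw.split(sep)


def non_ascii_bytes_have_high_bit(text):
    """Check that every byte of every non-ASCII character is >= 0x80."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


# Hypothetical passwd-style record mixing ASCII, Latin and Arabic text.
record = "root:Zo\u00eb:\u0645\u0631\u062d\u0628\u0627:/bin/sh".encode("utf-8")
fields = [f.decode("utf-8") for f in split_fields(record)]

# Contrast: U+4E3A in UTF-16-LE is the bytes 0x3A 0x4E, and 0x3A is ':'.
utf16_has_fake_colon = b":" in "\u4e3a".encode("utf-16-le")
utf8_has_fake_colon = b":" in "\u4e3a".encode("utf-8")
```

So the byte-level split comes back with the four intended fields intact, while the same naive search over 16-bit data would find a colon that is not really there. Code that counts display columns or truncates strings at a byte offset still has to know about the encoding, which is where the variable width does matter.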
But my grandest creation, as history will tell,
Was Firefrorefiddle, the Fiend of the Fell.