Unicode and the Unix Console?
Phactorial asks: "In its current state, most UNIX consoles (not graphical terminal emulators, mlterm is out for this) I have dealt with do not handle Unicode properly. This is essential when it comes to dealing with languages that require characters that are not in the current ASCII set. I was wondering if anyone out there is developing a solution for non-Linux platforms. I know the Arabeyes project is currently working on a project called 'Akka' which provides UTF-8 (kinda) support and even shaping and bidirectional code (essential for many languages in the East, the program works fine and I am working on getting a FreeBSD port out). However, I was pondering, how are other UNIX consoles doing? Do any of them fully support Unicode, even bidirectional characters? Shaping? (A great many of today's UNIX applications lack many if not all of these ;(). If you know of such applications or are working on support for a platform, could you give feedback as to your experiences and thoughts on the current state of the UNIX console?"
Solaris 2.6 supports 56 "locales" and is six or so years old now. Is this what you were asking about? I don't have experience with non-USA locales, but it seems the UNIX people have realized that there are countries outside of North America and have tried to accommodate them.
You can keep the byte orientation and still have Unicode support. See this.
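For what "keeping the byte orientation" can look like in practice, here is a minimal sketch, assuming the encoding in question is UTF-8 (the comment above does not name it). The sample strings and variable names are made up for illustration. It shows that ASCII text is unchanged, that non-ASCII characters only use bytes with the high bit set, and that a byte-level split on "/" still works.

```python
# Sketch only: assumes UTF-8 is the encoding meant above; all sample
# strings and variable names are illustrative.

# ASCII text is byte-for-byte identical when encoded as UTF-8.
ascii_text = "ls -l /tmp"
ascii_is_unchanged = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Every byte of a non-ASCII character has the high bit set, so it can
# never be mistaken for an ASCII byte such as "/" (0x2F) or NUL.
non_ascii_bytes = "Ü".encode("utf-8")
high_bit_only = all(b >= 0x80 for b in non_ascii_bytes)

# A byte-oriented tool can therefore split a path on the "/" byte
# without knowing anything about Unicode.
path = "Übung/Привет"
encoded = path.encode("utf-8")
parts = [p.decode("utf-8") for p in encoded.split(b"/")]

print(ascii_is_unchanged, high_bit_only, parts)
```

The point of the sketch is only that byte-based code which treats bytes above 0x7F as opaque keeps working; it says nothing about display width, shaping, or bidirectional layout.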
As it is, I'm always hitting the limitations of those programmers who think that ASCII is good enough.
The most common example for me: In Unix consoles that do not support Unicode, I can't (easily) move between directories that were created with Unicode characters on an OS that supports it. Typically the Unicode characters are converted to unprintable, or at least, untypeable, characters.
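A small sketch of how this can happen, assuming the directory name was stored as UTF-8 and the console interprets bytes as ISO 8859-1 (both are assumptions, the comment above does not say which encodings were involved). The directory name is hypothetical.

```python
# Sketch only: the directory name and both encodings are assumptions
# chosen to illustrate the problem described above.

name = "Übung"                      # name as created on a Unicode-aware OS
raw = name.encode("utf-8")          # bytes actually stored in the directory entry

# A console that assumes ISO 8859-1 turns each byte into one character.
misread = raw.decode("iso8859-1")

# The second byte of "Ü" in UTF-8 is 0x9C, a C1 control code in
# ISO 8859-1, so the name now contains an unprintable character.
has_unprintable = not misread.isprintable()

# No information is lost: re-encoding and decoding as UTF-8 recovers the name.
recovered = misread.encode("iso8859-1").decode("utf-8")

print(repr(misread), has_unprintable, recovered)
```

The bytes on disk are intact in this scenario; what fails is the console's interpretation of them, which is why the name becomes hard to type rather than lost.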
Some programmers forget that the point of the program is to serve the user, not some idiotic notion of what the underlying implementation should be.
Also, those mods that gave dentin an Insightful might want to look at his other (recent) brilliant reasons why Unicode should not be supported:
The Internet will be broken up into cliques because some people don't know how to type an umlaut, and therefore won't have access to a site they can't read anyway
Forcing programmers to support languages that cannot use ASCII is unfair to computer science and all those programmers who have spent years investing in ASCII
"In a hundred years, there will be a global language anyway - if anything we should be vehemently refusing to pointlessly break perfectly good code to support local quirks"
Well, you could start by looking at everybody who wrote that software you mention.
People who write that software never use their "internationalization" -- they see it as a "feature" to add in the list of marketing checkboxes.
Then add everybody who has to deal with more than just Ameri^H^H^H^H^HEnglish text on a day-to-day basis.
That will be me -- and I hate Unicode.
Probably took too small a survey, then. People in my lab write them every day. We write mostly in English (sometimes German), and refer to people, locations, and events in a dozen European countries. Using some pre-Unicode technique, like "codepages", would be a nightmare.
Almost all European languages, including English, are in a single iso8859-1 charset, which happens to coincide with the beginning of the Unicode table. People who use iso8859-1 can "switch to Unicode" and continue using just the same thing with longer bytes, getting no benefit whatsoever but pretending to have "internationalized" their software. For everyone else Unicode causes nothing but trouble, waste of resources and incompatibilities.
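The overlap and the size argument above can be checked with a short sketch. The sample words and the choice of KOI8-R as the comparison charset are assumptions made for illustration; the comment does not name a specific non-Latin charset.

```python
# Sketch only: sample words and the KOI8-R comparison are illustrative
# assumptions, not taken from the comment above.

# Decoding all 256 ISO 8859-1 byte values yields Unicode code points 0..255.
latin1_text = bytes(range(256)).decode("iso8859-1")
code_points = [ord(ch) for ch in latin1_text]
overlaps = code_points == list(range(256))

# Byte counts for the same text in a legacy charset and in UTF-8.
german = "Grüße"
russian = "Привет"
sizes = {
    "german_latin1": len(german.encode("iso8859-1")),
    "german_utf8": len(german.encode("utf-8")),
    "russian_koi8r": len(russian.encode("koi8_r")),
    "russian_utf8": len(russian.encode("utf-8")),
}

print(overlaps, sizes)
```

In this sample the Russian word takes twice as many bytes in UTF-8 as in KOI8-R, which is the kind of overhead the comment is objecting to; whether that cost outweighs being able to mix both words in one string is the actual disagreement in this thread.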
As for "code pages", this is a DOS/Windows kludge that is a dumb idea in its own way -- everyone else uses _charsets_ and those can be easily displayed in pretty much everything. The only problem is, no one bothered to make a usable (that means, not XML) tagged format that can include information about languages and charsets used in a document. MIME has charset information for parts of the document, and for substrings in the header but not substrings in the document, so it isn't really usable either. However, it can be used as a proof of viability -- most mail clients have it all implemented, therefore metainformation with charsets can be easily used.
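The per-substring charset tagging in mail headers mentioned above can be sketched with Python's standard `email.header.decode_header`. The header value below is a made-up example containing two encoded words in different charsets; it is not taken from any real message.

```python
# Sketch only: the raw header value is a hypothetical example with two
# charset-tagged substrings (MIME "encoded words").
from email.header import decode_header

raw_subject = "=?iso-8859-1?q?Gr=FC=DFe?= =?koi8-r?b?8NLJ18XU?="

# decode_header returns (bytes, charset) pairs, one per tagged substring.
pieces = decode_header(raw_subject)

# Each substring is decoded with the charset named in its own tag.
decoded = [chunk.decode(charset) for chunk, charset in pieces]

print(pieces)
print(decoded)
```

This only covers headers. As the comment notes, a message body gets a single charset per MIME part, so the same trick is not available for substrings inside the text itself.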