Unicode and the Unix Console?

Posted by Cliff on Thursday December 19, 2002 @12:10PM from the moving-beyond-ASCII dept.

Phactorial asks: "In its current state, most UNIX consoles (not graphical terminal emulators, mlterm is out for this) I have dealt with do not handle unicode properly. This is essential when it comes to dealing with languages that require characters that are not in the current ASCII set. I was wondering if anyone out there is developing a solution for non-Linux platforms. I know the Arabeyes project is currently working on a project called 'Akka' which provides UTF-8 (kinda) support and even shaping and bidirectional code (essential for many languages in the East, the program works fine and I am working on getting a FreeBSD port out). However, I was pondering, how are other UNIX consoles doing? Do any of them fully support unicode, even bidirectional characters? shaping? (a great many of today's UNIX applications lack many if not all of these ;(). If you know of such applications or are working on support for a platform, could you give feedback as to your experiences and thoughts on the current state of the UNIX console?"

57 comments

RH 8.0, out of the box by Boiotos · 2002-12-19 12:53 · Score: 2, Informative

Its Gnome 2 terminal can deal with any truetype unicode font, even those that are proportionally spaced such as the luscious, but now under-wraps, 'Arial Unicode MS'. RH 8's vim is also unicode savvy.

A major improvement for my line of work.
1. Re:RH 8.0, out of the box by tokki · 2002-12-23 00:34 · Score: 1
  
  RH 8.0 handles unicode, but the implementation is awkward and doesn't display everything quite correctly. If you've ever logged into a RH 8.0 machine and run something like man, you'll see garbage for special characters.
  
  The solution is to set /etc/sysconfig/i18n from LANG="en_US.UTF-8" (I think that's what it was) to just LANG="en_US".
Uhh... by jensend · 2002-12-19 13:06 · Score: 5, Insightful

From text of question:

(not graphical terminal emulators, mlterm is out for this)
I was wondering if anyone out there is developing a solution for non-Linux platforms.

The answer "Sure, there's this graphical terminal emulator in a recent linux distro!" seems somewhat inappropriate to the question.
It's been around quite a while... by Mr.+Piddle · 2002-12-19 13:50 · Score: 3, Interesting

Solaris 2.6 supports 56 "locales" and is six or so years old now. Is this what you were asking about? I don't have experience with non-USA locales, but it seems the UNIX people have realized that there are countries outside of North America and have tried to accommodate them.

--
Vote in November. You won't regret it.
1. Re:It's been around quite a while... by zmooc · 2002-12-20 05:58 · Score: 1
  
  but it seems the UNIX people have realized that there are countries outside of North America and have tried to accommodate them.
  hehe that's not so strange since the vast majority of UNIX people live outside of the USA anyway:)
  
  --
  0x or or snor perron?!
2. Re:It's been around quite a while... by Wolfier · 2002-12-20 17:58 · Score: 1
  
  What if you want to mix different charsets in one document? Locale code pages don't work.
from the learn-to-spell-with-26-chars-first depart by Anonymous Coward · 2002-12-19 15:10 · Score: 0

etc.
Isn't by rolfwind · 2002-12-19 15:23 · Score: 1

BeOS unicode native. I'd expect that FreeBeOs (or whatever it's called) is the same, and I think it's also Unix compatible?
Use something else by dentin · 2002-12-19 16:31 · Score: 2, Insightful

(This will be considered flamebait, but someone has to say it.)

The way I see it, we shouldn't be cluttering a clear, simple and sane interface like the unix console with complexity like unicode. Unix is inherently byte based, and unix terminals are byte based. If it's not a byte, don't put it in a unix terminal.

This isn't to say that we shouldn't have other mechanisms for supporting foreign languages - but this particular path has been travelled before and it's not pretty. Look at the AS/400 - tables stored in the DB/FS are marked as being in a particular character set, and the OS tries real hard to fix up and convert from set to set as needed. This causes countless problems in the infrequent cases where there is no possible mapping between sets.

Another way to look at it - why don't we have unicode support for grep? Why aren't all files tagged with an appropriate character set, so we know what they're really supposed to look like? When you 'tail -n 20' a file, how does tail know that those line feeds and carriage returns aren't part of some unicode char?

In short, unix is byte based. All the unix tools are byte based. If you want to use unicode, build a unicode layer on top of the bytes, but don't screw with the existing stuff that already works perfectly well.

--
Alter Aeon Multiclass MUD - http://www.alteraeon.com
1. Re:Use something else by SN74S181 · 2002-12-19 18:09 · Score: 3, Insightful
  
  In short: Yikes! UNIX is a timesharing system for TTY terminals from 1979*.
  
  That's a rather depressing outlook. We need to do better. This is supposed to be a discussion about that, not just another 'UNIX is UNIX because it is UNIX' polemic.
  
  (* Just stating the facts. I connected to a SparcStation with a VT220 terminal as a serial console just last week- it was handy and it's cool that it works.)
2. Re:Use something else by Meowing · 2002-12-19 20:13 · Score: 3, Interesting
  
  You can keep the byte orientation and still have Unicode support. See this.
3. Re:Use something else by divbyzero · 2002-12-20 10:21 · Score: 4, Informative
  
  People who fear that a switch from US-ASCII to UTF-8 will break their existing programs should really read the Bell Labs document linked above, section 2.3 of the Unicode spec, or RFC 2044. UTF-8 was designed very carefully to make life extremely easy for people making that exact migration. There are amazingly few circumstances where it even matters that it is variable width. Those people who are suggesting UCS-2, UCS-4, etc. as alternatives in order to solve the nonexistent problem of UTF-8's variable width nature should really take a closer look at it.
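One property behind that claim can be checked directly: in UTF-8, every byte of a multi-byte character has its high bit set, so an ASCII byte such as a line feed can never turn up inside another character. The following is only a minimal Python sketch of that property; the sample text and the helper name `split_lines_bytewise` are made up for illustration.

```python
def split_lines_bytewise(data: bytes):
    """Split on the raw newline byte, the way a byte-oriented tool would."""
    return data.split(b"\n")

# Illustrative sample: accented Latin, hiragana and a euro sign.
text = "na\u00efve\n\u3042\u3044\u3046\nprice: \u20ac5\n"
data = text.encode("utf-8")

# Every byte that belongs to a non-ASCII character is >= 0x80,
# so none of them can be mistaken for "\n" (0x0A) or "\r" (0x0D).
for ch in text:
    encoded = ch.encode("utf-8")
    if len(encoded) > 1:
        assert all(b >= 0x80 for b in encoded)

# Splitting the raw bytes therefore gives the same lines as splitting the text.
byte_lines = split_lines_bytewise(data)
decoded_lines = [line.decode("utf-8") for line in byte_lines]
print(decoded_lines == text.split("\n"))
```

This is why a tool that only looks for the newline byte, as in the earlier `tail -n 20` question, would not cut a UTF-8 character in half.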
  
  --
  But my grandest creation, as history will tell,
  Was Firefrorefiddle, the Fiend of the Fell.
4. Re:Use something else by YU+Nicks+NE+Way · 2002-12-23 17:53 · Score: 2
  
  Well...it isn't quite that simple. UTF-8 is a fine compromise, but it has real limitations when compared to a constant width unicode encoding like UCS-2 or UCS-4.
  
  UTF-8 is much better than other MBCS systems because backspace is not O(n) in the length of the string. That's good. That said, UTF-8 is inefficient for multilingual operation. First, many characters in UCS-2 wind up three bytes long in UTF-8. That means that FE systems require 50% more memory to do string ops than they would in UCS-2, which is itself not as compact as the individual code pages are for each of the languages. UCS-2 is a better compromise, in that case.
  
  RAM is cheap, though -- cycles are not. UTF-8 is inefficient:
  
  (a) backspace through a string still involves repeated calls to back-search functions and
  (b) worse, forward space in logical order through a string requires repeated calls to multistep logical functions.
  
  Considering the frequency with which strings are searched for tokens, there is a significant performance hit to using UTF-8.
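The size and backspace points above can be made concrete with a small Python sketch. It assumes a made-up Japanese sample string, uses Python's `utf-16-le` codec as a stand-in for UCS-2 (the two are the same size for text inside the Basic Multilingual Plane), and the helper `prev_char_start` is hypothetical, not taken from any library.

```python
def prev_char_start(buf: bytes, pos: int) -> int:
    """Return the index where the character ending just before `pos` starts.

    UTF-8 continuation bytes look like 0b10xxxxxx, so step back over them.
    The loop runs at most three times for valid UTF-8.
    """
    if pos <= 0:
        return 0
    pos -= 1
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos

# Eight characters, all three bytes long in UTF-8.
sample = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"
utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-le")

print(len(utf8), len(utf16))   # 24 16, i.e. 50% more bytes in UTF-8

start = prev_char_start(utf8, len(utf8))
print(start, utf8[start:].decode("utf-8"))
```

The backward step is bounded by the longest sequence rather than by the string length, which matches the "not O(n)" remark, though it is still more work than subtracting a fixed width.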
Go all the way by Anonymous Coward · 2002-12-19 19:27 · Score: 3, Insightful

If you start handling Unicode in files, then you need unicode in file names, because users will try to name them that way. If you allow unicode in file names, then you need to have make understand unicode, because someone will name all their .c files with Cyrillic characters. The shells then need it for completion. Soon you realize that you need the C compiler to understand unicode as well, so that you can have unicode variable names, etc.

So I think it would be best to bite the bullet and go all the way. It will require some planning -- the gcc folks would have to decide gcc will read unicode in 4.0, and Linus would have to decide linux 3.0 will be in unicode. Then the various distributions will have to come out with "unstables" or "rawhides" or whatever they call them, and slowly beat thousands of little apps each with their own presumptions on the size of a text character into submission.

plan9 is unicode inside and out. I'm not advocating it over simply improving good old linux, but it can be examined for lessons and ideas.
1. Re:Go all the way by Anonymous Coward · 2002-12-20 01:49 · Score: 1
  
  I agree. But which unicode encoding? UTF-8 is a no-no, in my book - variable-length encodings suck. I vote for UCS-32 - space is cheap these days, and there's actually no reason that a byte has to be defined as 8 bits in the C spec - so why not declare bytes 32bit, use UCS-32, and be done with it?
2. Re:Go all the way by Samrobb · 2002-12-20 05:22 · Score: 4, Informative
  
  Already done, at least in part. Take a look at the UTF-8 and Unicode FAQ for Unix/Linux
  
  I've seen make work just fine with UTF-8 and other character encodings. You can build gcc with "--enable-c-mbchar" to turn on MBCS support. The kernel would need little or no modification to work properly - take a look at the "How do I have to modify my software?" and "What is UTF-8?" entries in the FAQ mentioned above:
  
  Any Unix-style kernel can do fine with soft conversion and needs only very minor modifications to fully support UTF-8.
  
  UTF-8 was originally called UTF-FSS (for "UCS transformation format, file system safe")
  
  --
  "Great men are not always wise: neither do the aged understand judgement." Job 32:9
3. Re:Go all the way by larry+bagina · 2002-12-21 12:35 · Score: 1
  
  So I think it would be best to bite the bullet and go all the way. It will require some planning -- the gcc folks would have to decide gcc will read unicode in 4.0, and Linus would have to decide linux 3.0 will be in unicode. Then the various distributions will have to come out with "unstables" or "rawhides" or whatever they call them, and slowly beat thousands of little apps each with their own presumptions on the size of a text character into submission.
  gcc is a poor place to start the unicode revolution. A more reasonable starting point would be a --utf flag for the gnu text utils, [ef]grep, etc.
  GCC does support using L".." to generate unicode/wide string constants, and has for quite a while.
  
  --
  Do you even lift?
  These aren't the 'roids you're looking for.
Nonsense by GCP · 2002-12-19 19:31 · Score: 5, Insightful

Not flamebait, just nonsense.

Unix isn't byte based, it's text based. Of course one layer deeper, it's byte based, but so is every other OS, and below that it's transistor based, etc.

What distinguishes Unix from other OSes is an emphasis on working in text with text utilities, often thru windows (telnet clients) on other machines -- windows whose only supported datatype is text.

In Unix, as in XML, text is sort of considered the ultimate data type. Bytes are just the medium used to represent the text under the surface. If the bytes were what mattered, people would usually work in a hex editor and do hex I/O, but they don't. They work at the text level of abstraction most of the time. It's the text that matters, not the bytes used to digitize it.

For text to reach its full potential, though, you have to say goodbye to grampa's ASCII and move on to a rich, universal form of text: Unicode. It's ludicrous for someone to say that speakers of non-Western languages should never have the ability to use the full range of Unix the way Westerners can. People who make comments like that are usually unaware of the problems that even English speakers have with single-byte encodings. (The second most powerful currency on earth is the Euro. Where is the Euro sign in Latin-1? Where are the curly quotes used by almost all English-language press? What happens when a press release destined for Time Magazine gets piped thru a series of single-byte Unix utilities? Undefined!)

XML, another system that considers text the universal data type, is Unicode based. They understand the concept of "universal". Same for HTML now. More and more Web pages are going to UTF-8, even for English, to avoid weird problems with Macs vs. PCs, Euro signs, curly quotes, embedded non-English text, etc. Are such pages really supposed to be out of reach of standard Unix utilities?

Java and .Net are 100% Unicode. Windows and Macintosh are now all Unicode based.

IETF and W3C have made it clear that no non-Unicode-based text protocols will be considered from now on.

Oracle is recommending Unicode as the format for all database text for new databases. So what happens when you cripple Unix so that it can't handle Oracle data in default form?

AT&T considers Unicode the future of Unix (cf. Plan 9), Sun has made the conversion to full Unicode fundamental to the future of Solaris, and as we speak the Free Standards Group is preparing to do the same for an upcoming version of the LSB (Linux Standard Base) common core that all major Linux vendors have committed to.

It's unfortunate that so many Unix users still think that ASCII was good enough for grampa, so it should be good enough for every Unix user on earth from now on, but fortunately those who drive the standards have abandoned that kind of thinking forever.

--
"Those who have never entered upon scientific pursuits know not a tithe of the poetry by which they are surrounded."
1. Re:Nonsense by Jon+Peterson · 2002-12-19 22:20 · Score: 2
  
  This is the post I've read on /. for a very long time. It looks like we (sensible people who want unicode) are slowly winning.
  May I offer this guide to all things unicode:
  Unicode terms, FAQs, and mistakes?
  It helps clear up confusion between things like 'character sets' and 'encodings' and 'code points'.
  
  --
  ----- .sig: file not found
2. Re:Nonsense by Jon+Peterson · 2002-12-19 22:22 · Score: 0, Offtopic
  
  Arg. I meant to say "This is the best post I've read on /. for a very long time."
  
  You preview your post 3 times to get the HTML right and you still forget to read your own words!
  
  --
  ----- .sig: file not found
3. Re:Nonsense by Anonymous Coward · 2002-12-20 05:23 · Score: 1, Interesting
  
  As it is, I'm always hitting the limitations of those programmers who think that ASCII is good-enough.
  The most common example for me: In Unix consoles that do not support Unicode, I can't (easily) move between directories that were created with Unicode characters on an OS that supports it. Typically the Unicode characters are converted to unprintable, or at least, untypeable, characters.
  
  Some programmers forget that the point of the program is to serve the user, not some idiotic notion of what the underlying implementation should be.
  
  Also, those mods that gave dentin an Insightful, might want to look at his other (recent) brilliant reasons why Unicode should not be supported:
  
  The Internet will be broken up into cliques because some people don't know how to type an umlaut, and therefore won't have access to a site they can't read anyway
  
  Forcing programmers to support languages that cannot use ASCII is unfair to computer science and all those programmers who have spent years investing in ASCII
  
  "In a hundred years, there will be a global language anyway - if anything we should be vehmently refusing to pointlessly break perfectly good code to support local quirks"
Looks like OS X does... by DaphneDiane · 2002-12-19 20:29 · Score: 4, Informative

I just tried a test in the standard Terminal in Jaguar and it works. (In case the characters don't display in the post... I tried typing a i u e o in hiragana.)
bash-2.05a$ echo "AãIãUãEãOãS" | perl -ne 'print join(",",map { sprintf("%04X",$_) } unpack("U*",$_))."\n";' 0041,3042,0049,3044,0055, 3046,0045,3048,004F,304A,000A
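For readers without Perl at hand, roughly the same check can be sketched in Python. The input string is reconstructed from the code points listed in the output above (Latin vowels interleaved with hiragana, plus the newline that `echo` appends), and the function name `code_points` is made up for this example.

```python
def code_points(s: str) -> str:
    """Format each character's Unicode code point as four or more hex digits."""
    return ",".join("%04X" % ord(ch) for ch in s)

# A, I, U, E, O interleaved with hiragana a, i, u, e, o, then a newline.
line = "A\u3042I\u3044U\u3046E\u3048O\u304a\n"
print(code_points(line))
# 0041,3042,0049,3044,0055,3046,0045,3048,004F,304A,000A
```

If the terminal had passed the characters through as some other encoding, the hiragana would not come out as single code points in the U+3040 block.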
1. Re:Looks like OS X does... by ThinkingGuy · 2002-12-20 02:31 · Score: 2
  
  At first my browser (IE5.5 on Win2K, with Japanese support installed) didn't display your post correctly, but after switching the encoding to "Unicode," all the hiragana displays correctly.
2. Re:Looks like OS X does... by really? · 2002-12-21 18:28 · Score: 1
  
  Sure it works. But I think you missed the point. The way I read it, the question is about a terminal before the GUI is started.
  
  --
  
  "Consistency is contrary to nature, contrary to life. The only completely consistent people are the dead." A. Huxley
3. Re:Looks like OS X does... by Anonymous Coward · 2002-12-29 05:24 · Score: 0
  
  "Not the beard!"
Good luck by sql*kitten · 2002-12-19 21:46 · Score: 4, Funny

However, I was pondering, how are other UNIX consoles doing? Do any of them fully support unicode, even bidirectional characters? shaping? (a great many of today's UNIX applications lack many if not all of these ;( ). If you know of such applications or are working on support for a platform, could you give feedback as to your experiences and thoughts on the current state of the UNIX console?"

Whoa there, cowboy. Let's work on getting the delete key to work properly before we try any of that fancy stuff! If I never have to type stty erase again, I'll be a happy bunny!
1. Re:Good luck by Jon+Peterson · 2002-12-19 22:18 · Score: 1
  
  ROFL
  
  Giggle. That's the second best thing I've read on /. for a long time (see above for best thing).
  
  --
  ----- .sig: file not found
There's only one language necessary... by Chexsum · 2002-12-19 23:14 · Score: 0, Offtopic

Shell scripting language.

--
Pixels keep you awake!
Port? Nah, base! by mirabilos · 2002-12-20 02:21 · Score: 2

I don't want a BSD port.
I want my OpenBSD to be native utf-8, nothing else.
Currently it is not locale/NLS aware (which I consider
A Good Thing(tm)), but handles eight-bit I/O as if
it was iso-8859-1. I want it to change that to utf-8
because more characters ( comes to mind) can
be handled that way.

--
My Karma isn't excellent, damn it! (And /. still does not get UTF-8 right in 2012. Wow.)
1. Re:Port? Nah, base! by Alex+Belits · 2002-12-22 01:10 · Score: 2
  
  It's not "as if it was iso-8859-1", it's "byte-value transparent handling of data", and it is a good thing to have -- software not directly involved in displaying data (what is pretty much everything in /usr/bin) should not make assumptions about it other than that it's a sequence of bytes. If UTF-8 will be declared "native" charset, it will have to be enforced/handled in every utility, and those utilities will lose the ability to pass anything else -- even binary data. Want dd to handle blocks? No, can't do that, utilities are not allowed to split multibyte sequences! Want wc to count bytes? Same problem, it will count "characters" instead. And so on -- in the end everything that Unix is built on will have to be ruined just because someone wants to enforce a poorly designed enormous charset that can't be used (or written, or remembered, or designed a complete font) by any single person anyway.
  
  I'd rather use fonts of charsets that I use, and thank the underlying layer and utilities for not messing up my data by assuming that everything they handle is a "text" (as opposed to "binary" -- hi, DOS idiocy, long time no see!) and that I love a semi-proprietary monstrosity of a charset (Unicode) in a variable-length nightmare of encoding (UTF-8).
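The distinction drawn above between byte-transparent handling and character-aware handling can be illustrated with a short Python sketch. It is not a model of how dd or wc are implemented; the sample data and the helper names `chunks` and `undecodable_blocks` are invented for the example.

```python
def chunks(buf: bytes, size: int):
    """Cut a buffer into fixed-size blocks without looking at its content."""
    return [buf[i:i + size] for i in range(0, len(buf), size)]

def undecodable_blocks(blocks):
    """Count blocks that are not valid UTF-8 when looked at in isolation."""
    bad = 0
    for block in blocks:
        try:
            block.decode("utf-8")
        except UnicodeDecodeError:
            bad += 1
    return bad

data = "wc and dd: \u3042\u3044\u3046\u3048\u304a\n".encode("utf-8")

byte_count = len(data)                    # what a byte counter reports
char_count = len(data.decode("utf-8"))    # what a character counter reports
print(byte_count, char_count)

blocks = chunks(data, 4)
print(undecodable_blocks(blocks))         # some blocks end mid-character
print(b"".join(blocks) == data)           # yet nothing is lost on reassembly
```

Splitting in the middle of a sequence only matters to code that tries to interpret a block as text; a tool that just moves bytes along stays correct whatever the encoding is.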
  
  --
  Contrary to the popular belief, there indeed is no God.
kterm by Dr.+Tom · 2002-12-20 02:35 · Score: 2

Kterm is xterm with double byte support. It's been available since before unicode, but you shouldn't have any trouble hacking it to use a unicode font. http://packages.debian.org/stable/x11/kterm.html
plan9 - unicode through and through by DrSkwid · 2002-12-20 05:43 · Score: 2

even the c source code

http://plan9.bell-labs.com/plan9

okok, It's a graphical OS but bitmap terminals are hardly hard to come by

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
unicode c by Anonymous Coward · 2002-12-20 06:56 · Score: 0

As of the C99 standard, all identifiers (even external symbols!) must be allowed to be in Unicode, even if your character set does not allow for the characters.
Sometimes it can get a little ugly, though. e.g. footnote 52 from the standard:
On systems in which linkers cannot accept extended characters, an encoding of the universal character name may be used in forming valid external identifiers. For example, some otherwise unused character or sequence of characters may be used to encode the \u in a universal character name. Extended characters may produce a long external identifier.
C99 is also the first time the extended character set is required to map to UCS, too (see Annex D). Pretty crazy.
I don't believe gcc has put in extended identifiers yet, though (it's gaining C99 compliance pretty slowly).
UCS-32 by Anonymous Coward · 2002-12-20 08:56 · Score: 0

32 bytes per character seems a bit overkill, considering that Unicode's code space only needs about 21 bits. Perhaps you mean UCS-4?
To be honest I don't know why the difference in numbering. You have UTF-32, but UCS-4? What's the deal? Oh well.
FWIW, I believe that on some Crays, bytes are 32 bits in size. They had to come up with a pretty wack C implementation to get 8-bit chars. I believe 8 bits per byte is a POSIX standard?
xterm isn't console (but does utf8 on its own) by Xtifr · 2002-12-20 18:48 · Score: 2

The question was explicitly not about terminal emulators (xterm, mlterm, kterm, whatever). However, if you want a unicode-supporting xterm, then why not use gen-oo-wine xfree86 xterm? Yes, xterm indeed supports unicode (at least since Xfree4.2), if you use the -u8 switch. Debian's xterm package even comes with a script, uxterm, to set up variables and resources as appropriate.
1. Re:xterm isn't console (but does utf8 on its own) by Wills · 2002-12-22 01:42 · Score: 2
  
  Is any version of xterm better than kterm at displaying Japanese text?
  
  --
  Scroogle
Terminal for Indian Languages by Anonymous Coward · 2002-12-21 07:27 · Score: 0

Classification 5 in http://acharya.iitm.ac.in/iitmsoft.html seems related
Unicode sucks, no one uses it by Alex+Belits · 2002-12-21 10:31 · Score: 2

Seriously, I have yet to see a person (other than Martin Duerst who apparently made a career of stuffing Unicode into everything he notices) willingly using Unicode, as opposed to being forced to do that by some software that requires it. The "internationalization" of documents is a strawman -- at this point in history no non-linguistic document contains more than two languages, local charsets handle that perfectly, and linguists went far beyond what Unicode can provide already, so they have to use different formats anyway. If and when true internationalization will be necessary people will need one simple thing -- language/charset tagging. Tagging is also important because it makes those texts "machine-readable" -- programs will know what parts of text they should interpret using rules that apply to different languages and charsets, and pass "as-is" everything that is in the languages they don't know.

XML already allows language attribute in all tags, and if charset attribute will become valid everywhere where the language is valid, problem will go away immediately and without mandatory Unicode adoption everywhere because everyone who can read a language has a font of a charset that is used with it, and everyone who doesn't shouldn't have problem with occasional

"can't display this section of text, ( ) download "klingon fixed" font to make it readable, (x) show as block, do not edit, ( ) display/edit in hex".

Obviously, a console, or any other kind of program can easily be modified to do that if necessary, and there will be no loss for people that, like myself, simply use their native language + ascii charset, and switch to all other charsets using nice xterm font menu.
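The tagging idea described above can be sketched in a few lines of Python. This is purely hypothetical: the segment format (a list of charset name and raw bytes pairs), the `render` function and the `x-klingon` charset name are all invented here, and no existing tagged-text standard is implied.

```python
# Hypothetical tagged text: each segment carries its own charset label.
segments = [
    ("ascii", b"Greeting: "),
    ("koi8_r", "\u043f\u0440\u0438\u0432\u0435\u0442".encode("koi8_r")),
    ("x-klingon", b"\xf8\xd0\xf8\xd3"),   # a charset this program does not know
]

def render(parts):
    """Decode segments with a known charset; show the rest as a hex block."""
    out = []
    for charset, raw in parts:
        try:
            out.append(raw.decode(charset))
        except LookupError:               # unknown charset name
            out.append("[" + raw.hex() + "]")
    return "".join(out)

print(render(segments))
```

The raw bytes of an unknown segment are never modified, which is the "pass as-is" behaviour the comment asks for; only the display falls back to hex.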

--
Contrary to the popular belief, there indeed is no God.
1. Re:Unicode sucks, no one uses it by Anonymous Coward · 2002-12-22 16:13 · Score: 0
  
  Seriously, I have yet to see a person ... willingly using Unicode, as opposed to being forced to do that by some software
  
  Well, you could start by looking at everybody who wrote that software you mention. Then add everybody who has to deal with more than just Ameri^H^H^H^H^HEnglish text on a day-to-day basis.
  
  Seriously, there are probably quite a few of us.
  
  at this point in history no non-linguistic document contains more than two languages
  
  Probably took too small a survey, then. People in my lab write them every day. We write mostly in English (sometimes German), and refer to people, locations, and events in a dozen European countries. Using some pre-Unicode technique, like "codepages", would be a nightmare.
2. Re:Unicode sucks, no one uses it by Alex+Belits · 2002-12-23 10:55 · Score: 3, Interesting
  
  Well, you could start by looking at everybody who wrote that software you mention.
  
  People who write that software never use their "internationalization" -- they see it as a "feature" to add in the list of marketing checkboxes.
  
  Then add everybody who has to deal with more than just Ameri^H^H^H^H^HEnglish text on a day-to-day basis.
  
  That will be me -- and I hate Unicode.
  
  Probably took too small a survey, then. People in my lab write them every day. We write mostly in English (sometimes German), and refer to people, locations, and events in a dozen European countries. Using some pre-Unicode technique, like "codepages", would be a nightmare.
  
  Almost all European languages, including English, are in a single iso8859-1 charset -- what happens to coincide with the beginning of Unicode table. People who use iso8859-1 can "switch to Unicode" and continue using just the same thing with longer bytes, getting no benefit whatsoever but pretending to have "internationalized" their software. For everyone else Unicode causes nothing but trouble, waste of resources and incompatibilities.
  
  As for "code pages" this is a DOS/Windows kludge that is a dumb idea in its own way -- everyone else uses _charsets_ and those can be easily displayed in pretty much everything. The only problem is, no one bothered to make a usable (that means, not XML) tagged format that can include information about languages and charsets used in a document. MIME has charset information for parts of the document, and substrings in the header but not substrings in the document, so it isn't really usable either, however can be used as a proof of viability -- most of mail clients have it all implemented, therefore metainformation with charsets can be easily used.
  
  --
  Contrary to the popular belief, there indeed is no God.
3. Re:Unicode sucks, no one uses it by Anonymous Coward · 2002-12-23 18:13 · Score: 0
  
  Almost all European languages, including English, are in a single iso8859-1 charset
  
  ISO 8859-1 is "West European". A quick web search seems to indicate it covers about half of the European languages.
  
  Besides, "almost all" doesn't mean shit if it doesn't support the one I want -- and it doesn't support most of the ones that I want: Greek and Turkish, for example.
  
  just the same thing with longer bytes, getting no benefit whatsoever but pretending to have "internationalized" their software. For everyone else Unicode causes nothing but trouble, waste of resources and incompatibilities.
  
  If it was in US-ASCII, in UTF-8 it's the exact same number of bytes. You know that.
  
  As a software developer, it pays off in spades because I don't have to answer any questions about languages: if they're in Unicode, they'll always be there. No more wondering how to get this language to display in that web browser; it just works. If I want a character, I can look up the bytes; if I know the bytes, I can look up the character. No more "incompatibilities" than you'd get from bugs in any other library on your system.
  
  As for performance, I'd like to see what you're doing. I don't believe for a second that Unicode encoding or decoding is a hot spot in any actual program.
  
  You seem to be changing your tune: Unicode sucks, nobody uses it. Oh, I guess people do use it. Well, nobody uses more than 2 languages in a document! Oh, I suppose they do use more than 2 languages in a document. Well, performance sucks! Oh, er, I don't have any evidence to support that, either. Get over it.
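Two of the encoding facts traded in this exchange are easy to verify: ASCII text has the same bytes in UTF-8, and ISO 8859-1 byte values coincide with the first 256 Unicode code points but do not reach Greek or Turkish. A small Python sketch, with sample words chosen only for illustration and a made-up helper `fits_latin1`:

```python
ascii_text = "plain old ASCII"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Each ISO 8859-1 byte value decodes to the code point with the same number.
latin1_matches = [ord(c) for c in bytes(range(256)).decode("latin-1")] == list(range(256))

def fits_latin1(s: str) -> bool:
    """Return True if every character of s exists in ISO 8859-1."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(same_bytes, latin1_matches)
# K\u00f6ln is German, the second word is Greek, the third starts with Turkish dotted capital I.
print(fits_latin1("K\u00f6ln"),
      fits_latin1("\u0391\u03b8\u03ae\u03bd\u03b1"),
      fits_latin1("\u0130stanbul"))
# True False False
```

Text outside ASCII does grow when moved from ISO 8859-1 to UTF-8, so the "same number of bytes" point holds only for the ASCII subset.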
4. Re:Unicode sucks, no one uses it by Alex+Belits · 2002-12-24 23:06 · Score: 3, Insightful
  
  ISO 8859-1 is "West European". A quick web search seems to indicate it covers about half of the European languages.
  
  And this is who in Europe actually "uses" it.
  
  As a software developer, it pays off in spades because I don't have to answer any questions about languages: if they're in Unicode, they'll always be there.
  
  At the expense of crippling the software.
  
  No more wondering how to get this language to display in that web browser; it just works.
  
  Software is not to "wonder how it will look in a web browser", it's to operate on data. Most of operations can be absolutely byte-value-transparent and they must never depend on charsets and languages in the first place, however ones that are dependent on them usually have to either use tags with metainformation (and there Unicode is not any better than anything else) or do some horrible guesswork -- say, which language is used in a certain chunk of text that contains some characters (what simply should never be done in the first place, but Unicode accomplishes nothing for it because characters are shared between languages). Displaying pretty characters is easy. So easy, one should never think about it. However a computer is not a typewriter, and therefore charsets and encodings should never be designed to simplify a tiny bit the simple task of displaying while turning any complex text processing into a complete hell.
  
  If I want a character, I can look up the bytes; if I know the bytes, I can look up the character.
  
  How can a computer LOOK UP the bytes? There is nothing BUT the bytes in the computer's memory so certainly it can't look them up. You can use "bytes" as an index in a font, or you can pass them to some processing routine.
  
  No more "incompatibilities" than you'd get from bugs in any other library on your system.
  
  There are no bugs in the font renderers already -- this is a non-issue. The problem is entirely in trying to force people to make huge amount of assumptions about the data's content in otherwise byte-value transparent operations, just to accommodate UTF-8 where it should not matter. It's a design issue, not implementation.
  
  --
  Contrary to the popular belief, there indeed is no God.
Wouldn't a framebuffer be needed? by vadim_t · 2002-12-22 03:50 · Score: 1

If I remember correctly, the text modes have the font stored in a 2048-byte array, with every character having a byte per line, and 8 lines per character. I don't think there's any way of squeezing more characters into a text mode, unless video card designers come up with some extension.

So probably if Linux is made to support Unicode correctly this will only work in framebuffer modes, where it's possible to have as many characters as you want. That would mean a lot of improvement is needed in this area. For example the rivafb driver/nVidia X driver would need to be fixed to coexist. Sure vesafb can be used, but it's painfully slow, and some really old cards don't support it.
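The text-mode arithmetic above works out as follows. The figures are the ones quoted from memory in the comment, not a hardware reference, and the constant names are made up for this sketch.

```python
FONT_TABLE_BYTES = 2048   # size of the font array as quoted in the comment
BYTES_PER_ROW = 1         # one byte per scan line, i.e. glyphs 8 pixels wide
ROWS_PER_GLYPH = 8        # 8 lines per character

glyph_slots = FONT_TABLE_BYTES // (BYTES_PER_ROW * ROWS_PER_GLYPH)
print(glyph_slots)        # 256

# Unicode's code space runs from U+0000 to U+10FFFF.
UNICODE_CODE_SPACE = 0x10FFFF + 1
print(UNICODE_CODE_SPACE > glyph_slots)
```

With only 256 glyph slots under these assumptions, a text-mode console can show a small repertoire at a time, which is why the comment expects full Unicode display to need a framebuffer.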
It's the future, but not without its pains by voodoo1man · 2002-12-25 16:26 · Score: 1

Here is a neat paper describing how Plan9 made the full transition to Unicode. Not exactly an easy feat, although it was harder than necessary for them because they decided to do it back when Unicode was still being standardized. And of course, ASCII isn't going away anytime soon, as there are plenty of systems that don't need it but do need all the memory they can get.

--
In the great CONS chain of life, you can either be the CAR or be in the CDR.
This is the event horizon... by dagg · 2003-01-03 12:30 · Score: 1

Can AC post a pithy reply in time?
I prefer the Unix console, myself.

--
Sex - Find It
1. Re:This is the event horizon... by Anonymous Coward · 2003-01-03 14:59 · Score: 0
  
  Sorry I'm late, some of us have better things to do on Friday night than post on slashdot, because we're not a 14-year-old gaybo. You, however, are.
2. Re:This is the event horizon... by Anonymous Coward · 2003-01-03 19:13 · Score: 0
  
  Now will you stop claiming you are a woman?
  Go drown in your own spooge, you asstonguing preteen homo.
I researched UNICODE several years ago... by dagg · 2003-01-03 13:02 · Score: 2

Back then, it was very well publicized, but hardly anyone used it. Unfortunately, I feel we are in the same boat today.
ac, you fail it.

--
sex

--
Sex - Find It
1. Re:I researched UNICODE several years ago... by Anonymous Coward · 2003-01-03 15:20 · Score: 0
  
  Several years ago? You mean when you were 11 and still only a gaybo-in-training?