An anonymous reader asks:
"Like many others these past few weeks, I took the time to download the latest RedHat (replace with your favorite distro here) and upgrade my system. Despite the usual mail hangovers (corporate mail is still Outlook through POP3, etc.), the new *Office suites are great and I can almost dump Windows. However I was amazed at the sorry state of Linux instant messaging. Before you flame me, mod me down or doom me to a lifetime of Windows usage, allow me to explain: I am not a native English speaker, and it seems that every single Windows IM client I use (except Trillian) can deal properly with accented characters. Worse, every third-party Linux client I have tried deals with them differently, resulting in garbled (vaguely Unicode-ish) junk! Does the Slashdot crowd (especially the non-English folk) have a solution for this? (Short of VMware and Win32 clients, that is. Wine doesn't work at all for me)"
"Portuguese (Continental) is my native language, and I speak French, Spanish, some German and (after spending quite a few months in Poland on a project) passable Polish. I speak those languages practically every other day, and besides e-mail, I have taken to using MSN and Yahoo to discuss work with my clients and colleagues.
I have thus far tried GAIM (both the RedHat 8.0 bundled and the CVS versions), Kopete, Everybuddy, the native Yahoo! (which crashes more often than not and does not even deserve the 'beta' moniker), and none of them are suitable. IRC does the job, sure, but most of the people I have to reach can't use it (firewalls, usually) and won't install another IM client, so the solution has got to be at my end."
The problem is lack of standardization. Windows clients assume (I'm guessing, based on my experience) an 8-bit character set; some values in the upper 128 happen to be mapped to different characters in different locales, so if the locales match there's no problem, but if they don't you get garbage.
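To make the mismatch concrete, here is a minimal Python sketch. It assumes the sender's machine uses the Western European Windows code page (cp1252) and the receiver's uses the Central European one (cp1250); the actual code pages on any given pair of machines may differ.

```python
# The same bytes, read under two different 8-bit code pages.
text = "não"                          # typed on a Western European system
wire_bytes = text.encode("cp1252")    # one byte per character

as_western = wire_bytes.decode("cp1252")  # receiver's locale matches
as_central = wire_bytes.decode("cp1250")  # receiver's locale differs

print(wire_bytes)   # b'n\xe3o'
print(as_western)   # não
print(as_central)   # năo
```

Nothing was corrupted in transit; byte 0xE3 simply means a different letter in each code page.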
Ideally, every text would either be Unicode or declare its encoding, but of course that's not going to happen. So we want an IM client that knows about different encodings and can be told that a certain person sends and receives messages in a particular encoding (and probably a way to specify a relation between encodings, keyboard bindings, and fonts, or is that the responsibility of the operating system or window manager or KDE or whatever the architecture is?).
As far as I know, no such client exists.
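For what it's worth, the per-contact idea described above is not hard to sketch. The following Python is purely illustrative: the contact addresses, the `CONTACT_ENCODINGS` table, and both function names are made up, and a real client would have to hook this into its protocol layer.

```python
# Hypothetical per-contact encoding table; the user would fill this in.
CONTACT_ENCODINGS = {
    "ana@example.org": "cp1252",    # Western European Windows client
    "piotr@example.org": "cp1250",  # Central European Windows client
}
DEFAULT_ENCODING = "utf-8"  # assumption: used for contacts not listed above


def decode_incoming(sender, raw):
    """Turn raw message bytes into text using the sender's encoding."""
    encoding = CONTACT_ENCODINGS.get(sender, DEFAULT_ENCODING)
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode(encoding, errors="replace")


def encode_outgoing(recipient, text):
    """Turn text into bytes the recipient's client is expected to understand."""
    encoding = CONTACT_ENCODINGS.get(recipient, DEFAULT_ENCODING)
    # Characters the target encoding lacks are sent as '?'.
    return text.encode(encoding, errors="replace")


reply = encode_outgoing("piotr@example.org", "Łódź")
print(decode_incoming("piotr@example.org", reply))  # Łódź
```

The fallback behaviour (replacement characters rather than errors) is a design choice for a chat window, where showing a damaged message usually beats showing none.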
I'm an English speaker, but I just opened a window in each and sent a friend a message copied and pasted from a Spanish website. The messages went through just fine with accented characters and all.
There are three levels: bytes, character sets, and glyphs. Your program receives a stream of bytes and you have to display those bytes to the user as text in the correct language. That display is done using glyphs (font characters). A character set maps bytes to glyphs.
There are tens if not hundreds of different character sets. A character set might map each byte to a different glyph (Latin-1 and Latin-2), only some bytes to glyphs (ASCII), multiple bytes to a glyph (Unicode), or varying numbers of bytes to glyphs and some bytes to no glyph (UTF-8).
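A quick way to see these differences is to encode one accented character several ways. This is a small Python sketch; the helper names `can_encode` and `can_decode` are invented for the example, and UTF-16 stands in here for the "multiple bytes" case.

```python
ch = "é"

latin1 = ch.encode("latin-1")    # b'\xe9'      one byte
utf8 = ch.encode("utf-8")        # b'\xc3\xa9'  two bytes
utf16 = ch.encode("utf-16-le")   # b'\xe9\x00'  two bytes


def can_encode(text, encoding):
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def can_decode(raw, encoding):
    """True if raw is a valid byte sequence in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(can_encode(ch, "ascii"))         # False: ASCII has no 'é'
print(can_decode(b"\xff", "utf-8"))    # False: 0xFF never appears in UTF-8
print(can_decode(b"\xff", "latin-1"))  # True: every byte is a Latin-1 character
```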
Java APIs handle this by representing all Strings internally in Unicode. Unicode is the granddaddy of all character sets. Almost every glyph has a value in Unicode. When you get a stream of bytes in Java you can use a Reader to translate that stream into Unicode. The Reader is constructed with the name of a character set. If no character set is specified, the system's default is used. The character set to be used usually comes from meta-data. In HTML, for example, the character set for the page is transmitted by the server in the data that comes before the page itself is delivered (the HTTP header).
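The same pattern exists outside Java. Below is a rough Python analogue, not the Java API itself: `io.TextIOWrapper` plays the role of the Reader, and the charset is pulled from a Content-Type header value. The function names are invented for illustration, and the ISO-8859-1 fallback is an assumption of this sketch.

```python
import io
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lower-cased, or None.
    return msg.get_content_charset() or default


def read_text(byte_stream, charset):
    """Decode a byte stream to text, like a Reader built with a charset name."""
    reader = io.TextIOWrapper(byte_stream, encoding=charset)
    return reader.read()


body = "Łódź".encode("iso-8859-2")  # bytes as they would arrive
charset = charset_from_content_type("text/html; charset=ISO-8859-2")
text = read_text(io.BytesIO(body), charset)
print(charset, text)  # iso-8859-2 Łódź
```

Skipping the header lookup and decoding with whatever the local default happens to be is exactly the mistake that produces garbled output.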
Once you have a Unicode string it is straightforward to find a glyph to display for each character. This all depends on the right fonts being installed, but usually APIs handle it for you.
I18n problems usually occur when a programmer doesn't know how to translate bytes into Unicode characters. The programmer may always use the default character set, ignoring any meta-data. Similarly, when sending data, the programmer must tell the other end which character set the data is sent in. Other problems may occur when a needed font is not installed.
Often, a system works with a specific character set that doesn't support all characters (such as Latin-1). When more characters are desired in such an instance, escape sequences are often used. There are \uXXXX style escape sequences in source code, and &XXX; escape sequences in HTML. Such escape sequences may be able to retrofit an older system in which a specific non-inclusive character set is assumed.
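As an illustration of the escape-sequence approach, Python's built-in error handlers can produce both styles when text has to pass through a Latin-1-only channel. This is a sketch of the idea rather than how any particular IM client does it.

```python
import html

text = "Łódź"  # 'Ł' and 'ź' do not exist in Latin-1; 'ó' and 'd' do

# HTML-style numeric character references for the missing characters.
as_html = text.encode("latin-1", errors="xmlcharrefreplace")
# Source-code-style \uXXXX escapes for the missing characters.
as_backslash = text.encode("latin-1", errors="backslashreplace")

print(as_html)       # b'&#321;\xf3d&#378;'
print(as_backslash)  # b'\\u0141\xf3d\\u017a'

# The receiving side can undo either form.
from_html = html.unescape(as_html.decode("latin-1"))
from_backslash = as_backslash.decode("unicode_escape")
print(from_html, from_backslash)  # Łódź Łódź
```

The receiver still has to know that escapes are in use, so this retrofits an old channel but does not remove the need for both ends to agree.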
Look at amsn. It has a large list of languages supported in the menu systems etc., although not having tried it apart from in English mode I can't really help you that much. Worth a shot tho, it's the best MSN client out there...
Feel that power? That's mah MOUSING FINGER
My experience with Licq is that it works perfectly in ISO 8859-1 (I use mainly French). In the options you can select other encodings as well (UTF-8, ISO 8859-2, ISO 8859-6, CP 1256, KOI8-R, JIS7, etc.), but I never used them. Just make sure the proper fonts are installed, and you should be good to go.
The only problem I see is that you don't mention ICQ in your possible choices...
Licq does a pretty good job both with displaying different languages and with localisation (at least it is quite well translated to Polish). The only problem there is that you can choose the encoding only on a program-wide basis, so it will be difficult to chat with different people in their native languages.
So I would recommend it for western languages, and gg or ekg (two excellent Gadu-Gadu clients) for chats with your Polish friends.
Raf
Oh ho HO! International chat certainly is a problem in the Linux community, owing to many factors; not the least of which being that the developers of the major IM forces out there seem to largely be from ISO-8859-1 locales. Thus ISO-8859-1 works pretty well, and other, more ASCII-deviant (CJK) locales work with virtually no success.
The good news is that an answer is here! I've been on a crusade to make Gaim the penguin-pimpin'est international chat machine available, and it's really paying off! For the stable series of gaim (0.59.x, currently at 0.59.4 with 0.59.5 to be released possibly as I type this) (I just looked and 0.59.5 is out), if your locale is set correctly you should be able to chat in whatever language your little heart desires... (I have personally successfully used it with English and Japanese) as long as you aren't chatting over the AOL Instant Messenger service or the ICQ service, both of which use the Oscar protocol. However, MSN and Jabber, for instance, should be substantially correct.
The fabulous news is that the development version of gaim coming at us right now has first class i18n support on the whole gamut of protocols! With a timely move to Gtk+ 2.0 and the Pango text formatting system, Gaim now has international text formatting second to none in the OSS community and hardly rivalled in the commercial world. Images like this shot of gaim displaying Japanese, Russian, and English simultaneously display what I'm talking about very nicely. Not only can we do non-English text, thanks to UTF-8 we can do all of the modern languages of the world simultaneously. In addition, support for internationalization on the troublesome Oscar (AIM and ICQ) protocol has been added and is coming along very nicely.
In short, look for the next major release of gaim to clear up these issues in a big way. For those hardy souls wanting to test the code that's currently in CVS, please note that it is NOT currently complete, and issues that you have are most likely transient.
Also, please be aware that your locale MUST be set correctly for internationalized programs to work the way you expect. Programs that only deal with your system can be more forgiving, but programs that communicate over the network absolutely must know more about your locale, including your character set. If the output of your 'locale' command lists LC_CTYPE as, for instance, "C", it's no wonder i18n isn't working! Set your LANG or LC_CTYPE correctly for your language (en_US for English with ISO-8859-1, es_ES or es_MX for Spanish, pt_PT or pt_BR for Portuguese, ja_JP for Japanese, zh_CN for Chinese, etc.) and you might see general i18n support improving dramatically.
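If you'd rather check this from a script than read the 'locale' output by eye, something like the following Python sketch works. It only inspects the environment variables in the usual order of precedence (LC_ALL, then LC_CTYPE, then LANG); the function names are invented for the example, and it does not verify that the named locale is actually installed.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE for this environment."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # empty strings count as unset
            return value
    return "C"


def codeset_of(locale_name):
    """Extract the codeset part: 'pt_PT.UTF-8@euro' -> 'UTF-8', 'pt_PT' -> ''."""
    return locale_name.partition(".")[2].partition("@")[0]


def looks_unconfigured(env):
    """True when character handling would fall back to plain C/POSIX."""
    return effective_ctype(env) in ("C", "POSIX")


current = effective_ctype(os.environ)
print(current, codeset_of(current) or "(no explicit codeset)")
if looks_unconfigured(os.environ):
    print("LC_CTYPE is C/POSIX; set LANG or LC_CTYPE for your language")
```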
* OSCAR, IRC, and Jabber file transfer support is working in the CVS version of Gaim (last I checked). We will be continuing development to add transfer support to the rest of the protocols (ICQ included).
* Re: periodically not working: We have not had any problems with oscar for nearly a year. Any problem has been immediately resolved.
---
Rob Flynn
Pidgin
It's a console MSN client and AFAIK it works perfectly in Spanish (I haven't tried other languages, but the ISO 8859-1 ones should work). Give it a try: http://pebrot.sourceforge.net