Slashdot Mirror


Software Internationalization

Anonymous Coward writes "It seems that the folks over at O'Reilly have quietly released a book entitled "Java Internationalization". The website for the book can be reached from the Java O'Reilly site. The authors also have a website dedicated to the book. I'm curious as to how developers are treating software internationalization, not just in Java, but in other programming languages like C#, C++, and Perl. For software designers out there today, are internationalization and localization a forethought or an afterthought? Is Java the only viable language for writing truly multilingual applications?"

5 of 29 comments

  1. Internal unicode and code page translation by MikeApp · · Score: 2, Interesting

    Java not only handles Unicode but also does code page translation.

    For example, I had to port an ASP app that used an Access database with Big5 Chinese data. The web pages it output were also Big5. I used Java to convert the data to UTF-8 and loaded it into Postgres. A servlet grabs the UTF-8 data from the database, Java stores it as UTF-16 internally, and the servlet produces either Big5 or UTF-8 web pages, depending on the user's preference. It only took a couple of lines of code to make this happen, because Java can convert between its internal Unicode format and other code pages. I believe the same applies to code pages for other languages (e.g., KOI8 for Cyrillic).
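
    A minimal sketch of that kind of conversion, assuming the JDK in use ships the Big5 charset (standard full JDKs do). The class and method names are hypothetical and are not taken from the poster's application; the only real APIs used are `String(byte[], Charset)` and `String.getBytes(Charset)`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcode {
    // Decode the bytes using the source charset into a Java String
    // (held as UTF-16 internally), then encode it in the target charset.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        // Two CJK characters written as Unicode escapes so the source
        // file's own encoding does not matter.
        byte[] big5Bytes = "\u4e2d\u6587".getBytes(big5);
        byte[] utf8Bytes = convert(big5Bytes, big5, StandardCharsets.UTF_8);
        System.out.println(big5Bytes.length + " Big5 bytes -> "
                + utf8Bytes.length + " UTF-8 bytes");
    }
}
```

    Swapping the two charset arguments gives the reverse direction, which is roughly what a servlet would do when the user asks for Big5 output.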

    Unicode is definitely the standard of the future, and it also allows for easier transfer of data between applications that can't handle CJKV multibyte character sets.

    Of course, I don't know Chinese, which made this a fun project :)

    1. Re:Internal unicode and code page translation by LeftHanded · · Score: 2, Interesting

      This is a nice feature of the Java API, but you can achieve the same result using the UNIX (tm) libiconv implementation. If your UNIX (tm) doesn't have one (or you use Linux, BSD, etc), then there is a Free Version. It will do all the conversions for you, for many character sets. Most current *nix distributions include this as a package.

      --
      I think...I think it's in my basement. Let me go upstairs and check. -M.C. Escher (1898-1972)
  2. I18n by Anonymous Coward · · Score: 2, Interesting

    There is so much more to i18n than translating text messages. The biggest problem is character encoding.

    I work in Japan, work only in Japanese, and we only have products for the Japanese market. Yet most of our time is spent dealing with i18n issues - converting to and from different encodings (Shift-JIS for MS and EUC-JP for Linux). There are several other encodings used in other areas as well.

    The reason Java is so good in this environment is that the internal encoding is all Unicode. Therefore we just have to translate encodings at input and output, and everything else works with very few problems. (Having said that, even though Java support for multibyte character sets is very good, there are still a few gotchas to watch out for.) The whole API and all 3rd-party software are then available for use without limitation. I don't think this can be said for many other programming environments.
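
    The "translate at input and output" pattern can be sketched roughly as follows, assuming the Shift_JIS and EUC-JP charsets are available in the JDK. The class and method names are hypothetical; the point is only that the charset is named once at each boundary and everything in between works on plain Java characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

final class BoundaryTranscoder {
    // Decode on the way in, encode on the way out. Between the Reader and
    // the Writer the text is ordinary Java chars, independent of encoding.
    // Note: both streams are closed when the copy finishes.
    static void copy(InputStream in, Charset inCharset,
                     OutputStream out, Charset outCharset) {
        try (Reader reader = new InputStreamReader(in, inCharset);
             Writer writer = new OutputStreamWriter(out, outCharset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    For instance, passing `Charset.forName("Shift_JIS")` for the input and `Charset.forName("EUC-JP")` for the output would re-encode a file moving from a Windows machine to a Linux one.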

    Slightly off-topic, but the uptake of Linux in places such as China and Japan would be greatly accelerated if flagship software, such as Nautilus, worked in a multibyte character environment at version 1.0.

  3. ...and they agree to disagree. by neonedge · · Score: 2, Interesting

    Part of the problem is that there is no agreed-upon implementation. The POSIX group could not choose between X/Open's catgets implementation and GNU's gettext, and as such, left it out of the standard entirely. Another problem with both toolsets is that neither presents a truly extensible strings database format. If you need to add additional storage fields to the strings database for a language other than C, you're out of luck if you plan to use the library and tools on the same files. Very short-sighted IMHO.

  4. Re:Non sequitur by Moridineas · · Score: 2, Interesting

    Right--sort of. Latin basically evolved further from Greek. Russian is directly from the Greek (or, from the Greek w/o the Latin intermediary), and English is pretty much the Latin script.

    Arabic is also right-to-left, which can be trouble, though it still has a small number of characters (compared to, say, Chinese).

    The problem with an alphabet like Arabic is not only the storage, but the display. Letters have different shapes depending on where they occur in a word. Vowels aren't usually written, but it's probably a good idea to store them so you can choose whether to display them. And so on. And of course it's right-to-left, when 99% of computer design is oriented left-to-right.
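
    A rough sketch of two of these points in Java: keeping the vowel marks in storage and stripping them only for display, and asking `java.text.Bidi` whether a string runs right-to-left. The class and method names are hypothetical, and treating the range U+064B to U+0652 as "the vowel marks" is a simplifying assumption (it also covers shadda and sukun). Contextual letter shaping is left to the rendering layer and is not shown.

```java
import java.text.Bidi;

final class ArabicDisplay {
    // Assumption: the marks U+064B..U+0652 are the optional ones to hide.
    private static final String VOWEL_MARKS = "[\\u064B-\\u0652]";

    // Store the fully vowelled text; strip the marks only when displaying.
    static String stripVowelMarks(String stored) {
        return stored.replaceAll(VOWEL_MARKS, "");
    }

    // True if the whole string resolves to right-to-left under the
    // Unicode bidirectional algorithm.
    static boolean isRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isRightToLeft();
    }

    public static void main(String[] args) {
        // An Arabic word with short-vowel marks, written as escapes.
        String stored = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println(stored.length() + " stored chars, "
                + stripVowelMarks(stored).length() + " displayed chars");
        System.out.println("right-to-left: " + isRightToLeft(stored));
    }
}
```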

    Scott