Slashdot Mirror


Li18nux Effort Announced

The Li18nux effort is now underway. The group seems to have ambitious goals and plenty of members (not to mention a multi-lingual press release). The effort is supposed to lead to documentation, components, etc. in many languages, including languages with different character sets. Japan and many other nations seem to be well represented.

16 of 121 comments

  1. Internationalisation is NOT easy by vlax · · Score: 2
    I'm working on a project that has to support exactly two languages - English and Japanese. It runs only in a console window on UNIX. To do this, I have to support five character sets in curses, a major headache in and of itself.

    I'm all for using Unicode across the board - this would be an extremely good thing, but it doesn't even begin to solve internationalisation problems. The major European languages use variants of the Roman or Cyrillic alphabets and place spaces between words, as do many other languages that have been alphabetised in the last two centuries. That can be handled easily enough. However, even among these languages, there are some serious issues. Alphabetical order differs from English in many European languages, such as Swedish, Spanish, Icelandic, and even French (just to name a few). In languages like Hungarian, Finnish and Turkish, complex hyphenation rules have to be taken into account. Indexing of compound characters can be a real nightmare. And in German, ü, ö and ä are sometimes indexed as ue, oe and ae for orthographic reasons. Yet another complication: books in French have the table of contents at the back. Structured publishing software has to take this kind of thing into account too.
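    A minimal Python sketch of the German indexing case, assuming a hand-written key function; german_index_key and the sample words are hypothetical, not part of any library. Plain code-point sorting gets both German and Swedish wrong; production code would use a real collation library such as ICU, or Python's locale.strxfrm for the process locale, which also covers orders like Swedish, where å, ä and ö come after z.

```python
# Hypothetical "phone book" style key: file ä/ö/ü under ae/oe/ue and ß under ss,
# then fold case so "Müller" and "mueller" compare equal.
UMLAUT_EXPANSION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def german_index_key(word: str) -> str:
    """Hypothetical helper: expand umlauts, then fold case for comparison."""
    return word.translate(UMLAUT_EXPANSION).casefold()


words = ["Müller", "Mueller", "Mulder", "Zebra", "Äpfel"]

# Code-point order puts "Äpfel" after "Zebra" because U+00C4 sorts above "Z".
print(sorted(words))
# ['Mueller', 'Mulder', 'Müller', 'Zebra', 'Äpfel']

# Phone-book order files "Äpfel" as "Aepfel" and "Müller" next to "Mueller".
print(sorted(words, key=german_index_key))
# ['Äpfel', 'Müller', 'Mueller', 'Mulder', 'Zebra']
```

    Because Python's sort is stable, "Müller" and "Mueller" (identical keys) keep their input order; a full collation would add a tie-breaking level instead.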

    Going global, this is just the beginning of the problems. Chinese and Japanese don't employ spaces between words at all, messing up line justification algorithms. Korean writers sometimes use Sino-Korean (Chinese characters) and sometimes the Korean syllabary for the same words - should they be indexed together? In Japanese, unusual kanji characters are often accompanied by hiragana characters that are either above or to the right of the kanji to aid in pronunciation and comprehension - this is called ruby (furigana), and it's an I18N nightmare. Korean sometimes does the same thing, as do the Chinese speakers in Taiwan with their bopomofo system. Many Chinese characters have two forms - a traditional one still used by many overseas Chinese and a simplified one used in the PRC. This is another indexing problem. Vietnamese uses a very complex variant of the Roman alphabet today, but only a century ago, they too still used Chinese characters, and that double system imposes still more constraints. Also, some Japanese, Korean and Chinese texts are written in vertical columns, top to bottom, with the columns running right to left, and such books are read from right to left, the reverse of Western order. Oh, and I won't bother you with the utter nightmare of alphabetical order in Chinese. Take a look at a Chinese dictionary if you want to see it in action.
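    Without spaces, a line breaker may split between almost any two characters, but Japanese typesetting still forbids certain punctuation at the start of a line (the kinsoku rules). A minimal sketch, assuming every character occupies one full-width cell and using a deliberately tiny list of forbidden characters; NO_LINE_START and wrap_cjk are hypothetical names. It resolves a forbidden character by letting it hang past the margin on the previous line, which is one traditional strategy; the other pushes the preceding character down to the next line.

```python
# Hypothetical, deliberately incomplete set of characters that must not
# begin a line in Japanese text (the real kinsoku list is longer).
NO_LINE_START = set("、。，．！？」』）")


def wrap_cjk(text: str, width: int) -> list[str]:
    """Break unspaced text into lines of `width` characters, letting a
    forbidden line-start character hang on the end of the previous line.
    Assumes one display cell per character."""
    lines: list[str] = []
    current = ""
    for ch in text:
        # Start a new line only if the next character may begin one.
        if len(current) >= width and ch not in NO_LINE_START:
            lines.append(current)
            current = ""
        current += ch
    if current:
        lines.append(current)
    return lines


print(wrap_cjk("日本語の文章です。", 4))
# ['日本語の', '文章です。']
```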

    Arabic, Farsi, Urdu, Hebrew and Yiddish (among other languages) are written right to left, and Arabic and Hebrew sometimes mark the vowels and sometimes don't. This is a major consideration when doing, for example, searches in documents. Allowing a terminal or any application to accept characters both right to left and left to right (and top to bottom for traditional Mongolian and sometimes for other Asian languages, as described above) is not very easy, and to truly internationalise, it has to be done across the board. Not to mention the frequent occasions when data in the local language has to be mixed with English or some other Western language.
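    Two small building blocks for these problems, sketched with Python's standard unicodedata module (the helper names are hypothetical): ignoring optional vowel points when searching, and detecting right-to-left characters. Hebrew niqqud and Arabic harakat are nonspacing combining marks (category Mn), so dropping that category makes pointed and unpointed spellings compare equal. Note that after NFD this also strips Latin accents, which may or may not be wanted. Actually laying out mixed-direction text requires the full Unicode Bidirectional Algorithm, which this does not attempt.

```python
import unicodedata


def strip_vowel_marks(text: str) -> str:
    """Hypothetical helper: drop nonspacing combining marks (category Mn),
    which include Hebrew niqqud and Arabic harakat."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def is_rtl(ch: str) -> bool:
    """Hypothetical helper: bidi class R (e.g. Hebrew) or AL (Arabic letters)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


pointed_hebrew = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"  # shalom, with niqqud
plain_hebrew = "\u05e9\u05dc\u05d5\u05dd"                       # shalom, unpointed
pointed_arabic = "\u0643\u064e\u062a\u064e\u0628\u064e"          # kataba, with fatha
plain_arabic = "\u0643\u062a\u0628"                              # ktb, unpointed

print(strip_vowel_marks(pointed_hebrew) == plain_hebrew)  # True
print(strip_vowel_marks(pointed_arabic) == plain_arabic)  # True
print([is_rtl(c) for c in ("a", "\u05e9", "\u0643")])     # [False, True, True]
```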

    Indian and Southeast Asian languages each have their own script, but these scripts often derive from a common source (ultimately the ancient Brahmi script). Although the letters in each language may be different, there are certain equivalencies between them that are often taken into account when indexing or transcribing. The old Indian telegraph system used these similarities to provide a unified code for data exchange. How is an I18N system to take these into account?
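    Unicode builds one such equivalence into its code charts: the major Indic blocks (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam) were laid out in parallel after India's ISCII standard, so corresponding letters usually sit at the same offset within each 128-code-point block. A rough sketch of transliteration by offset, covering only Devanagari to Bengali; devanagari_to_bengali is a hypothetical helper, and real transliteration needs per-script tables because not every slot has a counterpart.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900
BENGALI_BASE = 0x0980
BLOCK_SIZE = 0x80  # each of these Indic blocks spans 128 code points


def devanagari_to_bengali(text: str) -> str:
    """Hypothetical helper: map each Devanagari character to the Bengali
    character at the same block offset. Characters outside the Devanagari
    block, or whose Bengali slot is unassigned, are left unchanged."""
    out = []
    for ch in text:
        offset = ord(ch) - DEVANAGARI_BASE
        if 0 <= offset < BLOCK_SIZE:
            target = chr(BENGALI_BASE + offset)
            # unicodedata.name(..., None) returns None for unassigned slots.
            out.append(target if unicodedata.name(target, None) else ch)
        else:
            out.append(ch)
    return "".join(out)


# "namaste" in Devanagari, rewritten letter for letter in Bengali script.
print(devanagari_to_bengali("\u0928\u092e\u0938\u094d\u0924\u0947"))
```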

    The Unified Canadian Syllabic system (used for some dialects of Cree, Ojibwa and Inuktitut) has compound letters that have to be taken into account, as well as a unique system of rotating the letter to indicate part of its phonetic value. This matters in making data entry systems (e.g. keyboards) and in alphabetisation.

    And of course, it's a huge nightmare when one system has to provide for unknown combinations of languages, e.g. a Russian who needs to use Japanese, a Pakistani who does business in Chinese, or an Israeli linguist doing work in Cree.

    And, lastly and even less pleasantly, an I18N project has to take into account all the existing half-assed standards for encoding and working with these languages. There are at least three different systems for Japanese alone (EUC, JIS, Shift-JIS), and four for Chinese (GB, Big5, JIS, EUC). The dominant system for encoding Inuktitut is just a font the main newspaper in Iqaluit made up so it could put its articles on the web (Nunatsiaq News).
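    To see why those legacy encodings are a burden, here is a short Python sketch that encodes the same word in the three Japanese encodings mentioned above plus UTF-8, using codecs that ship with Python (euc_jp, iso2022_jp, shift_jis; gb2312 and big5 exist for Chinese too). encode_all is a hypothetical helper. The bytes differ every time, so software has to know, or guess, which encoding a file uses before it can display it.

```python
def encode_all(text: str,
               codecs=("euc_jp", "iso2022_jp", "shift_jis", "utf_8")) -> dict:
    """Hypothetical helper: encode the same text with several codecs."""
    return {codec: text.encode(codec) for codec in codecs}


text = "日本語"  # "Japanese language"
for codec, data in encode_all(text).items():
    print(f"{codec:11} {data!r}")
    assert data.decode(codec) == text  # each encoding round-trips on its own

# Decoding with the wrong codec yields replacement characters or mojibake.
print(encode_all(text)["shift_jis"].decode("euc_jp", errors="replace"))
```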

    There are no simple answers for internationalisation, and we can't just tear down all the existing software and rebuild it to work right. I hope the OSS community is up to this because this is a major project that has to be undertaken in a unified way, from top to bottom, or else it will not succeed.

  2. Re:Myth on i18n and "One World Language" by JoeBuck · · Score: 2
    If you've got 2 people whose native language is not the same, English is almost certainly what they'll speak to each other ...

    Unless one of the two is Dutch. Those folks are amazing (my old German teacher says "the Dutch have mouths shaped for languages"). Many Dutch people speak four or more languages. I was at a workshop in Italy once, watching a Dutch woman I know carry on conversations in English, German, Dutch, and Italian all at once, switching languages as she talked to each person at the table. And she doesn't think of herself as an outstanding linguist; that kind of language ability is considered typical.

  3. Re:Teaching the world English by BJH · · Score: 2

    Japan provides six years of English education during junior and senior high school; only a very small percentage of high school graduates can actually speak it at a level of proficiency that would allow communication.

    Hong Kong's population is such a small proportion of the total population of China that it's not funny. Even then, English proficiency in Hong Kong seems to be declining.

    You say that the idea is not to force everyone to speak English, but then you come up with ideas like "requiring a year or two of English", only publishing information on the Internet in English, making most software English-only, etc. etc. etc. What the hell is that if not forcing people to use English!?

    Strangely enough, the people who I know in industrial nations that don't speak English seem to get along just fine. I hope to God that you're not a software developer, because you'd no doubt drag us back to the stone age of 7-bit ASCII on teletypes or something.

    And what is it with your obsession with "superior" and "inferior" languages? There is NO SUCH THING! Ask any linguist - every language has its complex areas. In your first comment, you stated that:

    - English has no gender-specific terms.
    There are, however, significant differences in male and female speech.

    - English has no formal/informal dialects.
    Oh, and I suppose you speak to your boss the same way that you speak to your children, your friends, your dog...

    - English has no tone-dependent words.
    Maybe not, but a large portion of conversation depends on differences in emphasis. Consider "EX-tract"-"ex-TRACT", "PER-vert"-"per-VERT" or any of the hundreds of other lexical items that rely on emphasis to distinguish between noun/verb forms. Also remember that without emphasis, it becomes extremely difficult to recognise sarcasm, humor, etc. (Just look at all the misunderstandings that occur on /. because of this.)

    - English has no time-dependent words.
    Look at English's use of tense and compare it with that of Japanese, Chinese, the Polynesian languages, or many others. They are all simpler in structure.


    I'm really not sure what else I can say to someone who seems determined to ignore the right of 80% of the world's population to speak the language that they choose to speak. Grow up and try leaving your own country once in a while.

  4. AmigaOS locale system by mjg · · Score: 2

    When AmigaOS 2.1 was released, it had something called locale. This was basically a system where a small catalog of the text used in any application was stored in a file, and could fairly easily be translated to other languages. The user just selected their language of choice in the locale preferences, and any application that had the appropriate locale catalog available would load it up and be presented in the user's language.

    It seemed to work quite well... Well, it had problems; for example, large differences in the lengths of translated words would cause nasty UI mess-ups. But we're talking 1992-1993 here, so it was a fairly nice thing to have that early. From a programmer's point of view, making your applications 'locale-aware' was very simple: it involved a few minor changes to your code, and then running a tool over your code to extract the locale catalog information. You ended up with a program that was written to use a native language by default, and then would support any language for which a translated locale catalog was available on the system.

    I don't think the system had support for non-Latin character sets (I could be wrong, I only played with it briefly), so it wasn't the best solution, but it was fairly impressive for its time...
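    The catalog idea described above, reduced to a minimal Python sketch. This is not the AmigaOS locale.library API; BUILTIN_STRINGS, CATALOGS and get_string are hypothetical names. Each string has a numeric ID and a built-in default in the program's native language, and a catalog for the user's language overrides whichever IDs it translates.

```python
# Strings compiled into the program in its native language, keyed by ID.
BUILTIN_STRINGS = {1: "Open file", 2: "Quit", 3: "Cancel"}

# Hypothetical translated catalogs; a real system would load these from
# catalog files chosen by the user's language preference.
CATALOGS = {
    "deutsch": {1: "Datei öffnen", 2: "Beenden", 3: "Abbrechen"},
    "français": {1: "Ouvrir un fichier"},  # only partly translated
}


def get_string(language: str, string_id: int) -> str:
    """Hypothetical lookup: translated string if present, else the default."""
    return CATALOGS.get(language, {}).get(string_id, BUILTIN_STRINGS[string_id])


print(get_string("deutsch", 2))   # Beenden
print(get_string("français", 2))  # Quit  (missing translation falls back)
print(get_string("suomi", 1))     # Open file  (no catalog at all)
```

    Falling back one string at a time keeps a partly translated program usable, but it is also exactly how a single dialog ends up mixing languages.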

  5. How very chauvinistic! by Scurrilous+Knave · · Score: 2

    ... wouldn't it just be much easier to teach the world English?

    Spoken like a true English-speaker. Probably from the USA, right? Sigh.

    For every mod, hip, progressive netizen who thinks that English is the obvious World Interlanguage, and oh, wouldn't it be better if all those misguided little multi-colored and warring peoples in those places with unpronounceable names just stopped using that yip-yap jibber-jabber they call a language, at least in public, and started speaking The Obvious Choice, English, there's another person who thinks roughly the same thing about French, or Chinese, or Russian, or any one of a hundred other of the world's languages.

    As for what would be easier, Esperanto would be a good candidate. The Esperanto community has some pretty convincing data that it's actually easier to learn than most other languages, even for those whose native tongue is not derived from the same Latin that Eo is. But hey, that'd mean that you would have to bend your poor little brain around something new, and we couldn't have that, now, could we? Better that the wogs learn to talk proper, eh?

    Sorry, you probably deserved only about half of that. I'm calmer now. On the odd chance that you or others would care for some facts in place of my ranting, see esperanto.net for some actual details about a real alternative.

  6. Re:Teaching the world English by Jordy · · Score: 2
    I'm really not sure what else I can say to someone who seems determined to ignore the right of 80% of the world's population to speak the language that they choose to speak. Grow up and try leaving your own country once in a while.
    I guess I've been thinking about this from a practical standpoint and not from a cultural standpoint.

    English could just as easily be Spanish or French or Russian. It really doesn't matter. I chose English simply because it is used a lot in the global setting.

    I see no practical reason to maintain several hundred different languages and translate all the world's knowledge constantly between them and not just standardize on one.

    In the US, we have a whole lot of different cultures all living under one roof. Russian, Chinese, Japanese, Korean, British, African, Brazilian, Egyptian, Indian, and so on. Everyone seems to accept the fact that in order to participate in a meaningful way in our society, you must learn English. The culture doesn't go away just because you learn a second language but English becomes your primary language.

    Now, one might point out that you do lose a piece of yourself here, and it is true. I, being a white middle-class male, have pretty much no specific ethnic culture to speak of, and I'll be honest, I don't see it as a big loss.

    One more thing: during this discussion I've tried to be as sensible and practical as possible and just throw out some ideas which I'd hope people would comment on with an open mind.

    Please try and refrain from making personal attacks when criticizing posts.

    --
    The world is neither black nor white nor good nor evil, only many shades of CowboyNeal.
  7. Should /. go with the times? by T.Hobbes · · Score: 2

    Perhaps this is the time for /. to take the initiative and create the first multilingual website... Think of it: it'd be like the AltaVista translator, but done automatically on the server side, according to cookies, or by IP address for unregistered users! Hmmm... what one comes up with at such hours!
    _____________

    1. Re:Should /. go with the times? by GoRK · · Score: 2

      No need to use cookies. Browsers send the user's language preferences in the Accept-Language request header. It's a very standardized system and many, many, many websites use it, like almost all sites in France... Most people never notice it because their language preference is set to their own language only, so they can't tell what other languages are available. There are entire companies that build new multilingual sites or translate existing sites.

      The best part about the language support in HTTP is the 1st, 2nd, 3rd preferences... I have mine set to show me English first, then Japanese, then Spanish, which is the order in which I can comprehend the languages: from perfect, to something I can read most of, to something that Babelfish can machine-translate!

      ~GoRK
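    A minimal sketch of the server side of that negotiation in Python. The Accept-Language header lists language tags with optional q-values from 0 to 1 ranking them; parse_accept_language and negotiate are hypothetical helpers that ignore the * wildcard and other details of the HTTP specification, so a real site should use its framework's implementation.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Hypothetical parser: (tag, q) pairs from an Accept-Language value,
    highest q first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q-value means "fully preferred"
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep the browser's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(header: str, available: list[str], default: str = "en") -> str:
    """Hypothetical chooser: best available language, trying the full tag
    ("ja-jp") and then its primary subtag ("ja")."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        for candidate in (tag, tag.split("-")[0]):
            if candidate in by_lower:
                return by_lower[candidate]
    return default


# Roughly the setup described above: English, then Japanese, then Spanish.
print(negotiate("en;q=1.0, ja;q=0.8, es;q=0.5", ["ja", "es"]))  # ja
```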

  8. Gnome i18n by eimaj · · Score: 2

    I was pretty impressed by Gnome's i18n when I realized I could do things like:

    $ LANG=es_ES gnome-help-browser

    or even run my whole X session in another language:

    $ LANG=fr_FR startx

    Not everything is translated, but it's still pretty impressive.
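    Those commands work because programs choose their language from environment variables at startup. A minimal sketch of the POSIX lookup order in Python: LC_ALL overrides the category-specific variable (LC_MESSAGES for translated text), which overrides LANG. effective_locale is a hypothetical helper; GNU gettext additionally consults a LANGUAGE priority list for message catalogs, which this sketch ignores.

```python
import os
from typing import Mapping, Optional


def effective_locale(category: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: the locale a POSIX program would use for
    `category` (e.g. "LC_MESSAGES"): LC_ALL, then the category, then LANG."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # implementation-defined default; typically the C locale


print(effective_locale("LC_MESSAGES", {"LANG": "fr_FR"}))                     # fr_FR
print(effective_locale("LC_MESSAGES", {"LANG": "fr_FR", "LC_ALL": "es_ES"}))  # es_ES
```

    Setting LANG in front of a command, as in the examples above, changes it only for that command and the processes it starts, which is why running startx that way switches the whole X session.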

  9. Sounds interesting by Lev_Arris · · Score: 2

    It sure does sound interesting. I only hope that they do not mess it up like Micros~1 did in Windows (Have you ever installed an English program onto a German Windows which came with some French patches and then read the dialog box: Do you want to continue? Ja/Nein/Annuler ;)

    I think it will be difficult not to run into that again because a great part of the software for Linux is written by single persons, who do not necessarily have the time or the resources to translate all of their software into multiple languages. The Open Source concept of course comes in very handy here (Everybody can read the code and translate the text messages himself) but I guess there will always be some programs which won't be available in your preferred language.

    I appreciate the effort which will add even more popularity to Linux but personally I will always stick to the English version of Linux.

  10. CJK by crbill · · Score: 2

    Good idea! The support for Chinese, Japanese, and Korean languages just doesn't cut it. Sure, the programs can display text, but I've had a miserable time trying to enter text and get appropriate conversions. The only reason I use Windows is so that I can email or write documents in Japanese with OE5 and Word 2000 -- all on an English version of Win98. Unfortunately, I think MS has definitely seen the way: if people who communicate in non-latin based languages can't input characters, they're not gonna use your software! And that right there was enough to make me dump Netscape...

  11. Myth on i18n and "One World Language" by tai · · Score: 2

    It's not a direct reply to the article, but I just want to point out two myths about the subject.

    1. Unicode solves all i18n problems
      It doesn't. The original goal of Unicode was 1) to provide a virtually unlimited number of characters, enough to express text in any language, and 2) to unify the namespaces of the existing encodings, so that wherever a character is located, you can tell what that character really means (existing encoding schemes "switch" modes by context, so this can't be done).
      As it turned out, Unicode failed to accomplish BOTH. Although it is superior in some respects to the current schemes, it has solved nothing conceptually. I know most ASCII-only people, who will probably never run into these problems, don't care, but I just hope people stop using "Unicode is the land of promise"-type marketspeak.
    2. Use ONLY English - because this is the language _everyone_ speaks.
      If you're talking about population, then everyone should be forced to speak either Chinese or Spanish by now... Although English seems to be the dominant language in some parts of the world (which probably include only the USA, Australia, and part of Europe), things are different on Earth as a whole.
      I sometimes wonder what people really mean when they use the word "world"...
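    One concrete source of that complaint is Han unification: Unicode gives a Chinese character and its Japanese counterpart a single code point even where the two traditions draw it differently, so the language cannot be recovered from the characters alone. A tiny Python illustration using U+76F4 (直), a commonly cited example; carrying the language separately (as an HTML lang attribute does) is left as a comment.

```python
import unicodedata

ch = "\u76f4"  # 直: one code point, used in both Chinese and Japanese text

# The name is purely numeric and says nothing about language...
print(unicodedata.name(ch))   # CJK UNIFIED IDEOGRAPH-76F4
# ...and the UTF-8 bytes are identical whichever language the text is in,
# so a renderer needs an out-of-band language tag to pick the right glyph.
print(ch.encode("utf-8"))     # b'\xe7\x9b\xb4'
```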
    1. Re:Myth on i18n and "One World Language" by Per+Abrahamsen · · Score: 2

      1. There is no silver bullet. But a large-scale adoption of Unicode will solve more problems than it creates.

      2. You forgot to include large parts of Africa and South Asia in the part of the world where educated people speak English.

  12. No, i18n == internationalization by Zico · · Score: 3

    That is, "i" + 18 letters + "n".

    Cheers,
    ZicoKnows@hotmail.com
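    The same rule generates the other common numeronyms; a throwaway Python sketch (numeronym is a hypothetical name):

```python
def numeronym(word: str) -> str:
    """Hypothetical helper: first letter, count of letters in between, last letter."""
    if len(word) <= 3:
        return word  # nothing to abbreviate
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```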

  13. Re:Use UTF-8 by Alex+Belits · · Score: 2
    UTF-8 is inadequate for any use other than word processing, and is not used by any developed nation except those already using ASCII and ISO 8859-1 -- which happen to be the first 256 characters of Unicode, on which UTF-8 is based.

    And yes, you are a troll here.

    --
    Contrary to the popular belief, there indeed is no God.
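    The relationship described in that comment is easy to check: ISO 8859-1 does occupy the first 256 Unicode code points (U+0000 to U+00FF), but UTF-8 is byte-compatible only with the ASCII half; Latin-1 letters such as é take one byte in ISO 8859-1 and two in UTF-8. A quick Python check (byte_forms is a hypothetical helper):

```python
def byte_forms(ch: str) -> tuple:
    """Hypothetical helper: (code point, Latin-1 bytes or None, UTF-8 bytes)."""
    cp = ord(ch)
    latin1 = ch.encode("latin-1") if cp < 0x100 else None
    return f"U+{cp:04X}", latin1, ch.encode("utf-8")


for ch in ("A", "é", "日"):
    print(ch, *byte_forms(ch))
# A U+0041 b'A' b'A'
# é U+00E9 b'\xe9' b'\xc3\xa9'
# 日 U+65E5 None b'\xe6\x97\xa5'
```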
  14. Lacking in content? by Emil+Brink · · Score: 2

    Hm, I couldn't find any real content on that site... Sure, there was lots of information about how LI18NUX (darn, that is a b*ch to type) is organized, who its members are, what the role of the steering committee is, and so on. But I really couldn't care less; I want technical information and practical advice about how to go about making my application(s) I18N-aware. I think I've seen some site about that previously, but this new organization seems like the obvious place to collect all that stuff. I would like tutorials, HOW-TOs, and FAQs describing the various APIs that are used for internationalization. Of course, a clearinghouse for translators would be useful as well (hm, I think I've seen that somewhere, too). Well, I guess we'll just have to wait and see what they come up with.

    --
    main(O){10<putchar(4^--O?77-(15&5128 >>4*O):10)&&main(2+O);}
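    As a taste of the kind of HOW-TO material being asked for, this is roughly what gettext-style translation looks like from Python's standard gettext module. The domain name myapp and the locale directory are hypothetical; catalogs would live at locale/<lang>/LC_MESSAGES/myapp.mo, compiled from .po files with GNU msgfmt, and with fallback=True the program simply runs untranslated when none is found.

```python
import gettext

# Look for the German catalog, then the French one; if neither exists,
# fallback=True returns NullTranslations, which passes strings through.
translation = gettext.translation("myapp", localedir="locale",
                                  languages=["de", "fr"], fallback=True)
_ = translation.gettext


def files_message(count: int) -> str:
    # ngettext picks singular or plural (or the catalog's own plural rule
    # for the target language) based on count.
    return translation.ngettext("%d file", "%d files", count) % count


print(_("Quit"))
print(files_message(1))
print(files_message(3))
```

    Without any catalogs installed, this prints the English strings unchanged; dropping in a compiled catalog translates them without touching the code.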