Slashdot Mirror


Microsoft Forced To Translate Office Into Nynorsk

An anonymous reader writes "Beeb reports, "The main organisation working for the Nynorsk language got most of Norway's high schools to threaten to boycott all Microsoft software if they didn't come up with a New Norwegian version of Office." Which brings up questions for Open Source developers: What's involved in translating programs? Is there a process that can be followed to make the inevitable easier? Is there a group providing guidelines for this already? -- Do you work in program translation? Step up and do tell."

24 of 303 comments

  1. That's why having resources in files is helpful by swissmonkey · · Score: 5, Informative

    Most Microsoft applications use resources to separate the text from the application code; translating the application then becomes simply a matter of translating the strings in the resources and updating the binary.

    Linux has something similar in the gettext() function.

    The hardest part is really translating the text correctly, taking into account the particularities and customs of every language, and, obviously, keeping the translated version up to date.

    1. Re:That's why having resources in files is helpful by Anonymous Coward · · Score: 1, Informative

      Well, first of all, Microsoft does not translate its own titles. They pay others (like www.bowneglobal.com) to do that.
      And it's not an easy task. It's not just translating the string tables, because, for example, translating from English to Spanish "expands" the length of the text, and that makes some dialogs look pretty bad. Then you have to resize those dialogs, check all hot-keys, check all localization issues (time format, currency, etc.), translate the help files and docs, and a lot more.
      And sometimes translation breaks some functions... you can figure out the rest.

    2. Re:That's why having resources in files is helpful by bokmann · · Score: 5, Informative

      I manage a project for the U.S. State Department that is translated into about a dozen languages.

      It is not as simple as just translating Strings, but that is probably the biggest part of it. You also have to be aware of Date formats in different locales, customs for displaying large numbers (some countries separate with commas, spaces, or even periods), currency display, and if your application does something with it, Units of Measure (such as feet, meters, miles, etc).

      There are even cultural sensitivities for icons - Think how often you see an icon in an application that is based on something like a Street Sign (like a stop sign). All of these have to be localizable.

      ISO has standards on all of these things, and it is hard to go wrong by sticking with standards.

      Java has been a big win for us here. Besides being able to keep all the strings out of the application and in Resource Bundles, it is aware of a bunch of 'locales', and when you set the locale, classes like Date just Do The Right Thing. MessageFormat also helps when you want to build sentences by supplying words in the middle, but sentence structure changes from language to language.

      There are actually TWO different skills here:

      The first is called Internationalization (often abbreviated I18N), and it involves all the skills necessary to write an application so it is neutral to cultural biases: all Strings in resource files, all messages composed with MessageFormat, all Icons loaded from the filesystem with a naming convention so they can be substituted in the future, and window layouts managed so that they 'grow' nicely when a 4 letter word gets substituted by a 4 word phrase in another language.

      The second is called 'Localization', (L10N) and needs to occur for each Locale you are planning to customize your application for. This is best done by native language speakers who ALSO speak the language of the developers or domain experts. If the Internationalization was done right, then it just involves editing 'configuration', and no real coding.
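
      The point above about number customs (commas, spaces, or periods as separators) can be sketched in a few lines of Python. This is only an illustration: the SEPARATORS table and the format_number helper are hypothetical, and a real application would take this data from its platform's locale support rather than from a hand-written table.

```python
# Hypothetical per-locale separators, written out by hand for illustration.
SEPARATORS = {
    "en-US": {"group": ",", "decimal": "."},
    "de-DE": {"group": ".", "decimal": ","},
    "nb-NO": {"group": " ", "decimal": ","},
}


def format_number(value, locale_tag, decimals=2):
    """Format value using the grouping and decimal marks of locale_tag."""
    seps = SEPARATORS[locale_tag]
    # Start from the fixed English-style form, e.g. "1,234,567.89".
    text = f"{value:,.{decimals}f}"
    integer_part, _, fraction = text.partition(".")
    integer_part = integer_part.replace(",", seps["group"])
    if fraction:
        return integer_part + seps["decimal"] + fraction
    return integer_part


for tag in SEPARATORS:
    print(tag, format_number(1234567.891, tag))
```

      The calling code never mentions a separator; only the table changes per locale, which is the split between I18N and L10N described above.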

  2. String tables. by autopr0n · · Score: 5, Informative

    Generally, if a program is well-designed it's not any harder to translate than a book, I mean, beyond issues of layout and the like.

    Generally what you do is put all the text in a file or compiled-in resource called a string table. Then you reference strings by their ID in the program, rather than by their literal text. When you want to ship to a different country, you just swap the string table. (Although you would probably want to include lots of tables for switching locales on the fly.)

    I'm certain Microsoft uses this method with their software.
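
    A minimal sketch of the string-table idea in Python, assuming made-up string IDs (IDS_FILE and so on) and a hypothetical StringTable class; it is not how any particular Microsoft product stores its resources.

```python
# Hypothetical string tables: one dictionary per language, keyed by string ID.
STRING_TABLES = {
    "en": {"IDS_FILE": "File", "IDS_HELP": "Help", "IDS_YES": "Yes"},
    "nn": {"IDS_FILE": "Fil", "IDS_HELP": "Hjelp", "IDS_YES": "Ja"},
}


class StringTable:
    """Looks up user-visible text by ID for the currently selected language."""

    def __init__(self, language):
        self.language = language

    def switch(self, language):
        # Switching locale on the fly is just pointing at another table.
        self.language = language

    def text(self, string_id):
        return STRING_TABLES[self.language][string_id]


strings = StringTable("en")
print(strings.text("IDS_FILE"))
strings.switch("nn")
print(strings.text("IDS_FILE"))
```

    The program code only ever mentions IDs, so a translator can work on the table without touching the code.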

    --
    autopr0n is like, down and stuff.
    1. Re:String tables. by The+Bungi · · Score: 2, Informative
      No. "Locale" refers to localization (appropriately enough). It's "where you are". It controls things such as date formatting, currency symbols, etc. Codepage is (mostly) a fancy name for a character set. It defines how code points get translated to glyphs when they're rendered on screen or to a printer.

      The language ID (LANGID) is a combination of a primary language ID and a sublanguage ID. It defines one of the known combinations of specific locale/language pairs, like "Spanish [Nicaragua]" or "English [Australian]". Together with the SORTID (sorting identifier) and the LCID (locale ID) you can pretty much tell Windows what country you're in and what language you speak (and optionally if you use a different one for writing, as in EUC or Big5) and have everything look and behave correctly.

      I'm not sure how it works on Unix-ish OSes, but I assume it's pretty much the same.

  3. i18n for java by Anonymous Coward · · Score: 1, Informative
  4. Re:Boycotts work by snillfisk · · Score: 5, Informative

    While the point about buying from whoever decides to provide the software in their language is quite valid, there really isn't much software that supports Nynorsk - and even less that supports the third language used in some northern parts of Norway, 'Lappish' or 'Samisk'. The main point here is that schools weren't even going to *CONSIDER* buying MS software unless they got support for 'Nynorsk' in the software packages, and while it still remains up to each individual school to choose what software it wants to use, it will still make sure that the 'Nynorsk' language gets preserved in those cases where they DO select to use Microsoft software. As the article also states, this may give other "small" languages hope of a bit more acceptance and usage, with Catalan given as an example.

    The trend in Norway is however quite the opposite: more and more schools are realizing that there are several good alternatives, Linux being one of them. Norway is (afaik) one of the few countries that has its own Linux distro just for schools - which supports regular Norwegian, Nynorsk ("New Norwegian") and Samisk (Lappish). Read more about it (in Norwegian! :-)) here. It has gotten support from the department of education and science, and all the work is done on a voluntary basis. It's quite amazing to see that several schools are now switching and several others are considering the same.

    --
    mats
    One man's ceiling is another man's floor.
  5. Microsoft Linguistic Expertise by otisaardvark · · Score: 2, Informative
    Microsoft Research is pumping hefty money and brainpower into automated translation.

    For an example of the scale and progress of their projects, see here.

    It's all part of their huge research drive into Natural Language Processing. They do world-class research and have some great innovations to their name. Perhaps the one which will prove most useful is MindNet.

    Computational Linguistics is the BIG growth area, and it seems that Microsoft isn't going to miss the party.

  6. Re:My success... by Anonymous Coward · · Score: 3, Informative

    This may do for a small and simple program, but in the general case it is not good.

    Translating single words only works for buttons.
    You cannot translate words one-to-one and then use them in different places in the program.
    For example, where English uses "yes" and "no" in many different places, other languages may in certain contexts need words like "on" and "off" or "enabled" and "disabled", and your table will not be able to translate them unless you use a separate entry for each use.

    The use of %s etc. will not work when more than one argument is present and the order of the arguments depends on the language you translate to.
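
    One common way around the argument-order problem is to use named placeholders, so each translation can place the arguments wherever its grammar needs them. A small Python sketch, with hypothetical TEMPLATES and a hypothetical render helper:

```python
# Hypothetical message templates. With bare "%s" the argument order is fixed
# by the code; with named fields each translation chooses its own order.
TEMPLATES = {
    "en": "{user} copied {count} files to {folder}.",
    "de": "{count} Dateien wurden von {user} nach {folder} kopiert.",
}


def render(language, **arguments):
    """Fill the template for language with the named arguments."""
    return TEMPLATES[language].format(**arguments)


print(render("en", user="Anna", count=3, folder="Backup"))
print(render("de", user="Anna", count=3, folder="Backup"))
```

    The calling code is identical for both languages; only the template decides that the count comes first in German.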

  7. translations & OS by pamri · · Score: 5, Informative
    Which brings up questions for Open Source developers: What's involved in translating programs? Is there a process that can be followed to make the inevitable easier? Is there a group providing guidelines for this already? -- Do you work in program translation? Step up and do tell.

    Yes to all. Translating OS s/w is no big deal & doesn't require any programming skills. KDE & GNOME have great documentation and resources, all neatly organised. So, I will let them do the talking:

    The GNOME Translation Project

    KDE i18n project

    Translation howto for Kannada - This is a howto I wrote yesterday for people wanting to translate software into Kannada (an Indian language spoken in Karnataka). But the concept applies to all Indian languages & to other languages too, to a certain extent. [OK, I confess some self interest is involved here :-)]

    Actually, Kannada support came first on Windows XP, thanks to the Karnataka govt's support & since MS & Adobe developed OpenType fonts (a must for the complexity of Indian languages), but thanks to the Pango team, we hope to have support before MS does. And many state govts in India are also pressuring MS to bring Win XP to their languages, and Bengali, Hindi & Tamil (KDE is fully translated into Tamil) are already in the works. But we hope to set it right, soon.

  8. Your AWN Editor... by Niscenus · · Score: 2, Informative

    I would like to remind you that Karl Ove Hufthammer has been translating AbiWord into Nynorsk for some time.... Why doesn't someone point these things out much earlier!?

    --
    "Yeah...it was the numbers that were irrational, not the murderous cult of vegetarians...." -- Hippasus of Metapontum
  9. Re:Most Scandavians already speak good English by KjetilK · · Score: 5, Informative

    Having worked with many Scandanavians, I am truly impressed by their command of English

    Thanks! :-) (I'm Norwegian)

    But in any case, not having Norwegian Office is not as a big of a cripple to productivity as the article may lead you to think.

    Actually, this is bigger or smaller than that, depending on how you think about it.

    Norway has two official written languages, "Bokmål" and "Nynorsk" (nb and nn in iso639 (?)). I would say that neither of them is spoken; we have an incredible richness of dialects here. A huge majority of the population writes nb. Office and the rest of Windows have always been translated and available in nb upon launch.

    nn and nb are almost identical. nb was highly influenced by Danish, as Norway was pretty much a colony under Denmark for a few hundred years, and the official language among the elites was Danish. So, a guy named Ivar Aasen collected dialects from certain parts of the country which he believed were less influenced by Danish and constructed a written language from them. This became the foundation for nn. The controversy over these two languages was high, I can tell you, but currently there are laws that keep nn alive. For example, all books in public schools must be available in both languages, and if you write a letter to a public office, that public office must respond in the same language.

    That may sound reasonable, but these two languages are so similar, that while high-school-students bitch and moan about how difficult the other is to learn, nobody with a minimum of intelligence can honestly claim to have difficulties reading the other.

    But MS has never found it commercially viable to translate Office to nn. That is quite understandable; my father is an author, and one of his books was translated to nn. That cost NOK 100000 (that's about $16000), and it sold two copies... (he wasn't the one who lost all this money; it was a public office that had to obey this law).

    So while I think that this law causes huge wastes of money, we free software geeks have been very happy about the events so far. We can point out that KDE and Mozilla have been available for nn before nb, I believe, because there are many good developers who write nn. So, it has given us a lot of good publicity, and some regional governmental offices have funded translation of OpenOffice to nn, and hopefully, the translation will be available before MS Office, again a big win.

    I think it is a part of the story that MS was becoming quite scared of the prospect of OO eating quite a lot of marketshare because of this. They have to keep a tight grip on the market, because if they lose some of the market to OO, and reports are positive, they will lose a lot more.

    Also, the figures quoted by MS for the cost of translating Office to nn have been huge. This has also given us some good publicity, because the funds we require to translate free software are far from that big. For one thing, this has illustrated that it is free as in speech that is the important aspect of free software, but experience has shown that usually, free as in speech software is cheaper to work with. Once people get experience with alternatives, things are sliding our way.

    To avoid flames by the Norwegian nn crowd, let me say that I have nothing against nn myself. I don't write it, but I appreciate reading it and I acknowledge that much of the finest Norwegian literature is written in nn. I'm opposed to laws that require people to write either of the languages however, but I think that if you write a letter in the language of your choice, you are entitled to expect the receiver to be so well educated that he can understand it.

    --
    Employee of Inrupt, Project Release Manager and Community Manager for Solid
  10. Re:Boycotts work by Dionysus · · Score: 3, Informative

    The Norwegian Linux distribution for school is still not released, though. They were planning to release it this year, but now it has been pushed back to second quarter of 2003 (if I remember correctly). I think http://www.digi.no/ had some more information about it.

    --
    Je ne parle pas francais.
  11. OpenOffice is already translated to Nynorsk by knuty · · Score: 2, Informative

    The user interface in OpenOffice[1] has already been translated to
    Nynorsk by The Linux for School project and three regions in Norway. The
    total translation effort with quality assurance will take around 4500
    hours. (some older project-info in English
    http://developer.skolelinux.no/projectinfo.html.en )

    Microsoft Norway tells one of the major newspapers[2] that The Linux
    for School project has nothing to do with the fact that the user
    interface in Office 11 will be translated into Nynorsk by the summer
    2003.

    MS Norway told Norsk mållag (an organisation which promotes the
    Norwegian language) in April 2000 that translating would cost
    30.000.000 Norwegian kroner (4.100.000 Euro). After some debate MS
    said that translating would cost 10.000.000 NOK (1.370.000 Euro). By
    the time Microsoft announced on 5 Nov 2002 that it would translate
    the user interface in Office 11 to Nynorsk, the message was that
    translation would cost around 2-3.000.000 NOK (275.000-412.000 Euro).

    Gaute Hvoslef Kvalnes, the main translator of KDE to Nynorsk, is
    also working full time on translating OpenOffice to Nynorsk. In
    May 2000 Gaute was awarded a prize (Flower of Dialect) by Norsk
    mållag for his voluntary work for the Norwegian language.

    [1] http://www.openofficeorg.no/
    [2] http://www.aftenposten.no/nyheter/nett/article.jhtml?articleID=429959
    [3] http://developer.skolelinux.no/openoffice/

  12. .NET ASP i18n by HawaiianGeek · · Score: 3, Informative

    If you have used the Visual Studio method of resource strings for i18n and you are moving to .NET I would strongly recommend you review how i18n resources work in .NET before you get into your project. The paradigm has changed, especially if you have multiple threads in a worker pool.
    (Stop Reading because the Microsoft sales force has now taken over my brain...)
    Resource strings in .NET always have fallbacks. So in the above case the user's thread would first ask for the Bokmål (nb-NO) version of the resource, and if it wasn't there it would fall back to the Norwegian (no) version of the string and then to my default resource file (English, en, for me).
    (more marketing BS...)
    If this were my .Net app and I already had a Norwegian (NO) resource file (resmain.no.resx - a plain text XML file) I would copy the file to resmain.nb-NO.resx (Bokmal) and another copy as resmain.nn-NO.resx (Nynorsk). You can then pick and choose which resources you actually want to be different between them.
    FYI:
    no = Norwegian (x0014) (20)
    nb-NO = Norwegian Bokmal (x0414) (1044)
    nn-NO = Norwegian Nynorsk (x0814) (2068)
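
    The fallback idea can be sketched outside .NET as well. The Python below is only an illustration of the lookup order described above (specific culture, then "no", then the English default); the PARENT and RESOURCES tables and the lookup function are hypothetical and do not reproduce the actual .NET resource manager.

```python
# Hypothetical parent chain following the order described in the comment:
# nb-NO and nn-NO fall back to "no", which falls back to the English default.
PARENT = {"nb-NO": "no", "nn-NO": "no", "no": "en"}

# Each table only holds the strings that differ from its parent.
RESOURCES = {
    "en": {"yes": "Yes", "not": "not", "help": "Help"},
    "no": {"yes": "Ja", "not": "ikke"},
    "nn-NO": {"not": "ikkje"},
}


def lookup(culture, key):
    """Walk from the most specific culture to the default until key is found."""
    while culture is not None:
        table = RESOURCES.get(culture, {})
        if key in table:
            return table[key]
        culture = PARENT.get(culture)
    raise KeyError(key)


print(lookup("nn-NO", "not"))   # found in the Nynorsk table
print(lookup("nb-NO", "not"))   # no nb-NO table, so the "no" table answers
print(lookup("nn-NO", "help"))  # falls all the way back to English
```

    This is why copying the "no" file and changing only a few entries works: anything left out of the specific table is simply picked up further down the chain.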

  13. Re:Boycotts work by vidnet · · Score: 3, Informative
    it will still make sure that the 'Nynorsk' language gets preserved in those cases where they DO select to use Microsoft software

    Indeed. They're required by law to do so, by § 9-4 of the law on education:

    § 9-4. Books and other teaching aids
    In subjects other than Norwegian, one can only use books and other teaching aids that are available in bokmål ["norwegian"] and nynorsk ["new norwegian"] at the same time and same price.

  14. A similar thing happened in Iceland... by neoptik · · Score: 2, Informative
    I seem to recall that a few years back the Icelandic government petitioned Microsoft to translate Office, IE, and Windows into Icelandic, and that Microsoft basically didn't give a hoot. This doesn't really surprise me, because the population of Iceland is under 300,000. In response to the lack of action taken by Microsoft, I think the KDE team went ahead and translated most of KDE and the KApps into Icelandic.


    Here's the first google result on the Microsoft refusal to translate:
    http://www.informationcity.org/telecom-cities/archive/old/0885.html

    --
    I dont have a .sig just yet.
  15. Re:My success... by VZ · · Score: 2, Informative

    Congratulations, you have just reinvented (a small part of) GNU gettext package! Seriously, why not just use existing and much better solutions? For the record, gettext works just fine under Win32 and Mac and you don't have any licensing issues with using its message catalogs.

  16. There ARE excellent OS alternatives by Tete-a-tete · · Score: 2, Informative

    The Skolelinux project is a major effort to provide office and other software in both versions of Norwegian as well as in the minority language of Northern Sami.

    In addition it will provide a very ambitious Debian Woody based thin client school network with a lot of network services. Somewhat similar to the K12LTSP project.

  17. Mozilla Project by asdavis · · Score: 2, Informative

    Take a look at Mozilla i18n & L10n Guidelines and Netscape ToolCool. These projects allow Mozilla to be localized without recompiling binaries. Local language data is kept in a separate data store that the application can pull from. Translating the app is just a matter of adding the language to the database. Seems logical and simple.

    --
    TECMATIC - Intelligent Technology News
  18. Re:Amount of work involved? by jkroll · · Score: 2, Informative

    Just out of curiousity, how much work is involved in translating, say, KDE? Looking at the stats for translation status in the KDE GUI [kde.org], it looks as though there are about 53,300 phrases (?) that need to be translated into any given language. Now, my question is, how many of those are repeats?

    One other important thing to realize, is that just because the word "File" is used in several places, in some other languages a different word may be required based on the usage context.

    As far as work effort goes, it is very tedious and difficult even for a human translator unless the development team has put a lot of effort into it. For example, if all you have is an isolated word which has several different meanings in English, you really need to see how it is being used in the application to make the correct translation. What really needs to be provided to translators is the text to translate, limits as to how long the translated string can be (if applicable), and a description of how/when the phrase or word is used.

    Then the next problem is that virtually all of your developers/testers are not fluent in the translated language and have no way of determining the accuracy of the translated text. Another problem is that there are numerous differing dialects of several common languages. Both of these problems can make your product look bad in the eyes of a customer who uses it in a different region / language than the original development team used.

  19. Re:The problems I encountered with a translation by Anonymous Coward · · Score: 1, Informative

    The translation of strings on LiveJournal.com had some similar problems.

    One I remember specifically is the text labels used on time quantities. LiveJournal has a function which returns a pretty time specification based on a number of seconds, such as "1 minute", "2 years" and so on. That particular function became a bit of a translation nightmare, with languages with three different kinds of plurals and other complications.
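
    A rough Python sketch of why that function is hard: each language needs its own rule for picking a plural form. Russian is used here only as a well-known example of a language with three forms (the rule is the one commonly used in gettext catalogs); the comment does not say which languages were involved, and time_label is a hypothetical stand-in, not LiveJournal's code.

```python
def plural_index_en(n):
    # English: singular for exactly 1, plural otherwise.
    return 0 if n == 1 else 1


def plural_index_ru(n):
    # Russian: three forms, chosen by the last one or two digits.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# Each language supplies its rule plus the word forms the rule indexes into.
LANGUAGES = {
    "en": (plural_index_en, ["minute", "minutes"]),
    "ru": (plural_index_ru, ["минута", "минуты", "минут"]),
}


def time_label(n, lang):
    """Return a label such as '2 minutes' using the language's plural rule."""
    index_for, forms = LANGUAGES[lang]
    return f"{n} {forms[index_for(n)]}"


for n in (1, 3, 5, 11, 21):
    print(time_label(n, "en"), "|", time_label(n, "ru"))
```

    A translation layer that can carry its own logic, as opposed to a flat table of strings, is what makes rules like these possible.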

    Also, many of the forms on LiveJournal return a "Success!" message when they are complete, but at least one language didn't have a generic word for success, so the word "Success" has been included in the translation system lots of times so that such languages can say "Entry Posted Successfully!" and other such things.

    More recently we've been working on the new style/template system which, unlike the old one, is designed with multiple languages in mind. In this case, since the new style system uses a procedural programming language, the translation support was significantly easier since the translation layers can include their own logic where necessary.

    Software translation poses some interesting problems, indeed.

  20. Brief Background on Nynorsk... by Joey7F · · Score: 5, Informative

    IANAN (I am not a Norwegian):

    Til Nordmenn: Fordi jeg er ikke en nordmenn rettelse alt at er feil :) [To Norwegians: since I am not Norwegian, please correct anything that is wrong :)]

    For those that aren't up on Norwegian linguistics, (not that I am a scholar or anything ;)) Norway has two languages that are almost identical: Bokmaal and Nynorsk. The first is practically a clone of Danish. Nynorsk rose from Norwegian Nationalism and Ivar Aasen when they received independence from Sweden in the early 20th century. It is like someone made a language out of English dialects. It is supposed to be closer to what Vikings spoke (though Icelandic would be a better representation). Most Norwegians write in Bokmaal but the Nynorsk contingent is very adamant about official and equal representation of their brand of Norwegian.

    What is ironic is most of the words are exactly the same or so similar that anyone who is proficient can read both. A few examples follow:

    Bokmaal   Nynorsk
    Norge     Noreg
    Jeg       Eg

    It is important because both languages are treated equally, but it is mostly irrelevant because they are so similar.

    --Joey

  21. Re:Missing the point by dvdeug · · Score: 3, Informative

    Unicode is a MBCS, and is entirely suitable for Japanese. You have to set up input and output filters (using iconv or some similar application) for the locale charset, just like any other locale. If you use GTK or Qt or Java, you don't have an option - you can use Unicode without much problem, and can't use other charsets.
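
    The input/output filter idea amounts to decoding from the locale's charset on the way in and encoding on the way out. A small Python sketch, using the built-in codecs in place of iconv; the convert helper is hypothetical and Shift_JIS is just one example of a legacy Japanese charset.

```python
def convert(data, source_charset, target_charset="utf-8"):
    """Decode bytes from the locale charset, then re-encode them."""
    return data.decode(source_charset).encode(target_charset)


# Stand-in for text arriving in a legacy locale charset.
legacy_bytes = "日本語".encode("shift_jis")

# Input filter: everything inside the program is Unicode from here on.
utf8_bytes = convert(legacy_bytes, "shift_jis")
print(utf8_bytes.decode("utf-8"))

# Output filter: convert back when the locale expects the legacy charset.
round_trip = convert(utf8_bytes, "utf-8", "shift_jis")
print(round_trip == legacy_bytes)
```

    Only the edges of the program need to know about the locale charset; the toolkit in the middle sees Unicode throughout.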

    If you use GTK, it will automatically flip dialog boxes R-to-L. As for vertical presentation - who's doing vertical presentation? I don't know of anyone who supports vertical presentation on dialog boxes; the Chinese, who traditionally write vertically, are happy with L-to-R in a computer situation, and the Mongolians, the only other people I know of who write vertically, tend to use Cyrillic or at least write traditional Mongolian horizontally in a computer situation. With all due respect to the Mongolians, if they're the only people who may use vertical writing systems in computers, I don't think vertical presentation is the most important thing to worry about.

    You don't have to rewrite the world; a lot of this stuff's already been done in the standard toolkits.