Microsoft Forced To Translate Office Into Nynorsk
An anonymous reader writes "The Beeb reports: 'The main organisation working for the Nynorsk language got most of Norway's high schools to threaten to boycott all Microsoft software if they didn't come up with a New Norwegian version of Office.' This brings up questions for Open Source developers: What's involved in translating programs? Is there a process that can be followed to make the inevitable easier? Is there a group already providing guidelines for this? Do you work in program translation? Step up and do tell."
Most Microsoft applications use resources to separate the text from the code, so translating an application becomes simply a matter of translating the strings in the resources and updating the binary.
Linux has something similar in the gettext() function.
The hardest part is really translating the text correctly, taking into account the particularities of every language, local customs and so on, and, obviously, keeping the translated version up to date.
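As a rough illustration of the gettext approach, here is a minimal sketch using Python's standard `gettext` module. The domain name `myapp` and the `locale` directory are hypothetical. With `fallback=True`, a missing catalog yields a `NullTranslations` object that simply returns the English source strings, so the program still runs untranslated.

```python
import gettext

# Look up the (hypothetical) "myapp" catalog for Nynorsk, i.e.
# locale/nn/LC_MESSAGES/myapp.mo. If it is missing, fallback=True
# returns a NullTranslations object instead of raising OSError.
translation = gettext.translation(
    "myapp", localedir="locale", languages=["nn"], fallback=True
)
_ = translation.gettext

def file_menu_labels():
    # The source code contains only the English strings; translators
    # supply the Nynorsk text in the .po/.mo catalog.
    return [_("Open"), _("Save"), _("Quit")]

print(file_menu_labels())
```

In a real project, the strings marked with `_()` are extracted into a `.po` file (for example with `xgettext`), translated, and compiled to `.mo` with `msgfmt`. The program itself is not touched.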
Generally, if a program is well designed, it's no harder to translate than a book, beyond issues of layout and the like.
Typically, what you do is put all the text in a file or compiled-in resource called a string table. Then you reference strings by their ID in the program, rather than by their literal text. When you want to ship to a different country, you just swap the string table. (Although you would probably want to include lots of tables for switching locales on the fly.)
I'm certain Microsoft uses this method in their software.
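A string table of this kind can be sketched in a few lines. This is a minimal illustration, not how any particular vendor implements it. The numeric IDs, the `IDS_*` names and the `StringTable` class are hypothetical, and the Norwegian strings are illustrative only.

```python
# Hypothetical string tables keyed by numeric ID, one table per locale.
STRING_TABLES = {
    "en": {100: "File", 101: "Open", 102: "Save"},
    "nb": {100: "Fil", 101: "Åpne", 102: "Lagre"},
    "nn": {100: "Fil", 101: "Opne", 102: "Lagre"},
}

# The program refers to these IDs, never to the literal text.
IDS_FILE, IDS_OPEN, IDS_SAVE = 100, 101, 102

class StringTable:
    def __init__(self, locale: str) -> None:
        self.switch(locale)

    def switch(self, locale: str) -> None:
        # Switching locales on the fly just means pointing at another table.
        self.table = STRING_TABLES[locale]

    def load(self, string_id: int) -> str:
        return self.table[string_id]

strings = StringTable("nb")
print(strings.load(IDS_OPEN))  # Åpne
strings.switch("nn")
print(strings.load(IDS_OPEN))  # Opne
```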
autopr0n is like, down and stuff.
For Java developers.
While the point about buying from whoever decides to provide the software in their language is quite valid, there really isn't much software that supports Nynorsk, and even less that supports the third language used in some northern parts of Norway, 'Lappish' or 'Samisk'. The main point here is that the schools weren't even going to *CONSIDER* buying MS software unless they got support for 'Nynorsk' in the software packages. While it is still up to each and every school to choose what software it wants to use, this will still make sure that the 'Nynorsk' language is preserved in those cases where they DO choose Microsoft software. As the article also states, this may give other "small" languages hope of a bit more acceptance and usage, with Catalan given as an example.
The trend in Norway, however, is quite the opposite: more and more schools are realizing that there are several good alternatives, Linux being one of them. Norway is (afaik) one of the few countries that has its own Linux distro just for schools, which supports regular Norwegian, Nynorsk ("New Norwegian") and Samisk (Lappish). Read more about it here (in Norwegian! :-)). It has gotten support from the Department of Education and Science, and all the work is done on a voluntary basis. It's quite amazing to see that several schools are now switching and several others are considering the same.
mats
One man's ceiling is another man's floor.
For an example of the scale and progress of their projects, see here.
It's all part of their huge research drive into Natural Language Processing. They do world-class research and have some great innovations to their name. Perhaps the one that will prove most useful is MindNet.
Computational Linguistics is the BIG growth area, and it seems that Microsoft isn't going to miss the party.
This may do for a small and simple program, but in the general case it is not good.
Translating single words only works for buttons.
You cannot translate words one-to-one and then use them in different places in the program.
In English, the words "yes" and "no" can be reused in many different places, but in certain contexts other languages need words like "on" and "off" or "enabled" and "disabled". Your table will not be able to handle this unless you use a separate entry for each use.
Using %s and the like breaks down when a message has more than one argument and the order of the arguments depends on the language you are translating into.
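One common way around the argument-order problem is to use named placeholders, so each translation can put the arguments wherever its grammar needs them. The sketch below uses Python's `str.format`. The `TEMPLATES` table and `copied_message` function are hypothetical, and the Japanese template is only an example of a target language whose word order differs from English.

```python
# Illustrative message templates; the Japanese one names the folder first.
TEMPLATES = {
    "en": "Copied {count} files to {folder}",
    "ja": "{folder}に{count}個のファイルをコピーしました",
}

def copied_message(lang: str, count: int, folder: str) -> str:
    # Named placeholders let each translation reorder the arguments;
    # the calling code is the same for every language.
    return TEMPLATES[lang].format(count=count, folder=folder)

print(copied_message("en", 3, "Docs"))
print(copied_message("ja", 3, "Docs"))
```

In C, the POSIX printf family offers the same escape hatch through positional conversions such as `%2$s %1$s`, which translators working with GNU gettext catalogs can use to reorder arguments.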
The GNOME Translation Project
KDE i18n project
Translation howto for Kannada - This is a howto I wrote yesterday for people wanting to translate software into Kannada (an Indian language spoken in Karnataka). The concept applies to all Indian languages, and to other languages too, to a certain extent. [OK, I confess some self-interest is involved here :-)]
Actually, Kannada support came first on Windows XP, thanks to support from the Karnataka government and because MS and Adobe developed OpenType fonts (a must for the complexity of Indian languages), but thanks to the Pango team, we hope to have support before MS does. Many state governments in India are also pressuring MS to bring out Windows XP in their languages, and Bengali, Hindi and Tamil are already in the works (KDE is fully translated into Tamil). But we hope to set things right soon.
I would like to remind you that Karl Ove Hufthammer has been translating AbiWord into Nynorsk for some time. Why doesn't someone point these things out much earlier?!
"Yeah...it was the numbers that were irrational, not the murderous cult of vegetarians...." -- Hippasus of Metapontum
Thanks! :-) (I'm Norwegian)
Actually, this is bigger or smaller than that, depending on how you think about it.
Norway has two official written languages, "Bokmål" and "Nynorsk" (nb and nn in ISO 639). I would say that neither of them is spoken; we have an incredible richness of dialects here. A huge majority of the population writes nb. Office and the rest of Windows have always been translated into nb and available at launch.
nn and nb are almost identical. nb was highly influenced by Danish, as Norway was pretty much a colony under Denmark for a few hundred years, and the official language among the elites was Danish. So a guy named Ivar Aasen collected dialects from certain parts of the country which he believed were less influenced by Danish, and constructed a written language from them. This became the foundation for nn. The controversy over these two languages was fierce, I can tell you, but currently there are laws that keep nn alive. For example, all books in public schools must be available in both languages, and if you write a letter to a public office, that office must respond in the same language.
That may sound reasonable, but the two languages are so similar that, while high-school students bitch and moan about how difficult the other one is to learn, nobody with a minimum of intelligence can honestly claim to have difficulty reading the other.
But MS has never found it commercially viable to translate Office into nn. That is quite understandable: my father is an author, and when one of his books was translated into nn, it cost NOK 100,000 (about $16,000) and sold two copies... (He wasn't the one who lost all this money; it was a public office that had to obey this law.)
So while I think this law causes a huge waste of money, we free software geeks have been very happy about the events so far. We can point out that KDE and Mozilla were available in nn before nb, I believe, because there are many good developers who write nn. So it has given us a lot of good publicity, and some regional government offices have funded translation of OpenOffice into nn; hopefully the translation will be available before MS Office, again a big win.
I think part of the story is that MS was becoming quite scared of the prospect of OO eating a lot of market share because of this. They have to keep a tight grip on the market, because if they lose some of it to OO and the reports are positive, they will lose a lot more.
Also, the figures MS has quoted for the cost of translating Office into nn have been huge. This has also given us some good publicity, because the funds we need to translate free software are far smaller. For one thing, this has illustrated that while free as in speech is the important aspect of free software, experience has shown that free-as-in-speech software is usually cheaper to work with. Once people get experience with the alternatives, things slide our way.
To avoid flames from the Norwegian nn crowd, let me say that I have nothing against nn myself. I don't write it, but I appreciate reading it, and I acknowledge that much of the finest Norwegian literature is written in nn. However, I'm opposed to laws that require people to write either of the languages; I think that if you write a letter in the language of your choice, you are entitled to expect the receiver to be educated enough to understand it.
Employee of Inrupt, Project Release Manager and Community Manager for Solid
The Norwegian Linux distribution for schools has still not been released, though. They were planning to release it this year, but it has now been pushed back to the second quarter of 2003 (if I remember correctly). I think http://www.digi.no/ had some more information about it.
I don't speak French.
The user interface in OpenOffice[1] has already been translated into Nynorsk by The Linux for School project and three regions in Norway. The total translation effort, with quality assurance, will take around 4500 hours. (Some older project info in English: http://developer.skolelinux.no/projectinfo.html.en )
Microsoft Norway tells one of the major newspapers[2] that The Linux for School project has nothing to do with the fact that the user interface in Office 11 will be translated into Nynorsk by summer 2003.
In April 2000, MS Norway told Norsk mållag (an organisation that promotes the Norwegian language) that translating would cost 30,000,000 Norwegian kroner (4,100,000 euros). After some debate, MS said that translating would cost 10,000,000 NOK (1,370,000 euros). When Microsoft announced on 5 November 2002 that it would translate the user interface in Office 11 into Nynorsk, the message was that translation would cost around 2,000,000-3,000,000 NOK (275,000-412,000 euros).
Gaute Hvoslef Kvalnes, the main translator of KDE into Nynorsk, is also working full time on translating OpenOffice into Nynorsk. In May 2000, Gaute was awarded a prize (Flower of Dialect) by Norsk mållag for his voluntary work for the Norwegian language.
[1] http://www.openofficeorg.no/
[2] http://www.aftenposten.no/nyheter/nett/article.jhtml?articleID=429959
[3] http://developer.skolelinux.no/openoffice/
If you have used the Visual Studio method of resource strings for i18n and you are moving to .NET, I would strongly recommend that you review how i18n resources work in .NET before you get into your project. The paradigm has changed, especially if you have multiple threads in a worker pool.
(Stop Reading because the Microsoft sales force has now taken over my brain...)
Resource strings in .NET always have fallbacks. So in the above case, the user's thread would first ask for the Bokmål (nb-NO) version of the resource; if it wasn't there, it would fall back to the Norwegian (no) version of the string, and then to my default resource file (English, en, for me).
(more marketing BS...)
If this were my .NET app and I already had a Norwegian (no) resource file (resmain.no.resx, a plain-text XML file), I would copy the file to resmain.nb-NO.resx (Bokmål) and make another copy as resmain.nn-NO.resx (Nynorsk). You can then pick and choose which resources you actually want to be different between them.
FYI, culture names and their LCIDs (hex, decimal):
no = Norwegian (0x0014) (20)
nb-NO = Norwegian Bokmål (0x0414) (1044)
nn-NO = Norwegian Nynorsk (0x0814) (2068)
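The fallback behaviour described above can be modelled in a few lines. This is not the .NET `ResourceManager` implementation, only a minimal sketch of the chain the comment describes (specific culture, then neutral Norwegian `no`, then the default resources). The table contents, the `PARENT` map and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical resource tables, one per culture, modelled on the
# resmain.*.resx files described above. "" is the default (English) set.
RESOURCES: Dict[str, Dict[str, str]] = {
    "": {"Hello": "Hello", "Country": "Norway"},
    "no": {"Hello": "Hei", "Country": "Norge"},
    "nb-NO": {},                     # nothing differs from "no" yet
    "nn-NO": {"Country": "Noreg"},   # only the strings that differ
}

# Hypothetical parent map: specific culture -> neutral Norwegian -> default.
PARENT: Dict[str, str] = {"nb-NO": "no", "nn-NO": "no", "no": ""}

def culture_chain(culture: str) -> List[str]:
    """Return the cultures to search, most specific first, ending with ""."""
    chain = [culture]
    while chain[-1] != "":
        # Unknown cultures fall straight back to the default resources.
        chain.append(PARENT.get(chain[-1], ""))
    return chain

def get_string(key: str, culture: str) -> str:
    for name in culture_chain(culture):
        table = RESOURCES.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)

print(get_string("Country", "nn-NO"))  # Noreg
print(get_string("Hello", "nn-NO"))    # Hei (from the "no" table)
print(get_string("Country", "en-US"))  # Norway (default)
```

The payoff is the one the comment points out: the nb-NO and nn-NO files only need to contain the strings that actually differ.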
Indeed. They're required to do so by section 9-4 of the law on education:
9-4. Books and other teaching aids
In subjects other than Norwegian, one can only use books and other teaching aids that are available in bokmål ["Norwegian"] and nynorsk ["New Norwegian"] at the same time and for the same price.
Here's the first Google result on Microsoft's refusal to translate:
http://www.informationcity.org/teleco
I don't have a
Congratulations, you have just reinvented (a small part of) GNU gettext package! Seriously, why not just use existing and much better solutions? For the record, gettext works just fine under Win32 and Mac and you don't have any licensing issues with using its message catalogs.
The Skolelinux project is a major effort to provide office and other software in both versions of Norwegian as well as in the minority language of Northern Sami.
In addition, it will provide a very ambitious Debian Woody-based thin-client school network with a lot of network services, somewhat similar to the K12LTSP project.
Take a look at the Mozilla i18n & L10n Guidelines and Netscape ToolCool. These projects allow Mozilla to be localized without recompiling the binaries. Local language data is kept in a separate data store that the application can pull from, so translating the app is just a matter of adding the language to the database. Seems logical and simple.
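The idea of keeping translations in a data file that the application reads at runtime can be sketched as follows. The parser handles only simple `key = value` lines in the spirit of Java-style `.properties` files. Escapes and line continuations are deliberately ignored. The keys and the file name in the comment are hypothetical.

```python
import io
from typing import Dict, TextIO

def load_properties(stream: TextIO) -> Dict[str, str]:
    # Minimal parser for "key = value" lines; skips blank lines and
    # lines starting with # or !. Real .properties files also support
    # escapes and continuation lines, which this sketch ignores.
    messages: Dict[str, str] = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            messages[key.strip()] = value.strip()
    return messages

# In a real application this would be a file such as nn/app.properties
# (hypothetical path), shipped and updated separately from the binary.
nn_file = io.StringIO("# Nynorsk\nopenCmd.label = Opne\nsaveCmd.label = Lagre\n")
messages = load_properties(nn_file)
print(messages["openCmd.label"])
```

Adding a language then means shipping one more data file, with no rebuild of the program.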
TECMATIC - Intelligent Technology News
Just out of curiosity, how much work is involved in translating, say, KDE? Looking at the translation status stats for the KDE GUI [kde.org], it looks as though there are about 53,300 phrases (?) that need to be translated into any given language. My question is, how many of those are repeats?
One other important thing to realize is that the word "File" being used in several places doesn't mean it can be translated the same way everywhere: in some other languages, a different word may be required depending on the context.
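gettext handles this with message contexts (`msgctxt`): the same English string gets a separate entry for each usage. Python 3.8 added `gettext.pgettext()` for real catalogs. The sketch below models the idea with a plain dictionary keyed by (context, message). The catalog, the context names and the Bokmål strings are illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical Bokmål catalog keyed by (context, msgid), the same idea
# as gettext's msgctxt.
CATALOG_NB: Dict[Tuple[str, str], str] = {
    ("button", "Open"): "Åpne",   # verb: open a file
    ("status", "Open"): "Åpen",   # adjective: the file is open
}

def pgettext(context: str, msgid: str) -> str:
    # Fall back to the English source string when no entry exists,
    # as gettext does.
    return CATALOG_NB.get((context, msgid), msgid)

print(pgettext("button", "Open"))
print(pgettext("status", "Open"))
print(pgettext("menu", "File"))
```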
As for the effort required, translation is very tedious and difficult, even for a human translator, unless the development team has put a lot of effort into supporting it. For example, if all you have is an isolated word with several different meanings in English, you really need to see how it is used in the application to translate it correctly. What translators really need is the text to translate, limits on how long the translated string can be (if applicable), and a description of how and when the phrase or word is used.
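The metadata listed above can travel with each message and be checked automatically. GNU gettext, for instance, can carry developer comments for translators as `#.` lines in `.po` files. The sketch below is a hypothetical structure rather than a standard format. It pairs each source string with a usage note and an optional length limit. Note that `len()` counts code points, not on-screen width.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    msgid: str                     # English source text
    note: str                      # how/when the string is used
    max_len: Optional[int] = None  # hypothetical UI limit, in characters

def check_translation(msg: Message, translation: str) -> List[str]:
    # Return a list of problems; an empty list means the translation passes.
    problems: List[str] = []
    if not translation.strip():
        problems.append("empty translation")
    if msg.max_len is not None and len(translation) > msg.max_len:
        problems.append(f"too long: {len(translation)} > {msg.max_len}")
    return problems

msg = Message("Quit", note="File menu item; exits the application", max_len=8)
print(check_translation(msg, "Avslutt"))
print(check_translation(msg, "Avslutt programmet"))
```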
The next problem is that virtually none of your developers/testers are fluent in the target language, so they have no way of judging the accuracy of the translated text. Another problem is that several common languages have numerous differing dialects. Both problems can make your product look bad to a customer who uses it in a different region or language than the one the original development team used.
The translation of strings on LiveJournal.com had some similar problems.
One I remember specifically is the text labels used on time quantities. LiveJournal has a function that returns a pretty time specification based on a number of seconds, such as "1 minute", "2 years" and so on. That particular function became a bit of a translation nightmare, since some languages have three different kinds of plurals, among other complications.
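LiveJournal's own code is not shown here. As a Python sketch of why a singular/plural pair is not enough, Polish needs three forms for "minute". With gettext you would call `ngettext(singular, plural, n)`, and the catalog's `Plural-Forms` header would pick the right form. The function below hard-codes the rule usually given for Polish to make the logic visible.

```python
# Polish has three plural forms: "1 minuta", "2 minuty", "5 minut".
MINUTE_FORMS_PL = ("minuta", "minuty", "minut")

def polish_plural_index(n: int) -> int:
    # Mirrors the Plural-Forms expression commonly used for Polish:
    # n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2

def minutes_pl(n: int) -> str:
    return f"{n} {MINUTE_FORMS_PL[polish_plural_index(n)]}"

for n in (1, 2, 5, 12, 22):
    print(minutes_pl(n))
```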
Also, many of the forms on LiveJournal return a "Success!" message when they are complete, but at least one language didn't have a generic word for success, so the word "Success" has been included in the translation system lots of times so that such languages can say "Entry Posted Successfully!" and other such things.
More recently we've been working on the new style/template system which, unlike the old one, is designed with multiple languages in mind. In this case, since the new style system uses a procedural programming language, the translation support was significantly easier since the translation layers can include their own logic where necessary.
Software translation poses some interesting problems, indeed.
IANAN (I am not a Norwegian) :)
To Norwegians: since I am not Norwegian, please correct anything that is wrong.
For those that aren't up on Norwegian linguistics (not that I am a scholar or anything ;)): Norway has two written languages that are almost identical, Bokmaal and Nynorsk. The first is practically a clone of Danish. Nynorsk grew out of Norwegian nationalism and the work of Ivar Aasen in the 19th century, decades before Norway gained full independence from Sweden in 1905. It is as if someone made a language out of English dialects. It is supposed to be closer to what the Vikings spoke (though Icelandic would be a better representation). Most Norwegians write Bokmaal, but the Nynorsk contingent is very adamant about official and equal representation of their brand of Norwegian.
What is ironic is that most of the words are exactly the same, or so similar that anyone who is proficient can read both. A few examples (Bokmaal / Nynorsk) follow:
Norge / Noreg (Norway)
Jeg / Eg (I)
It is important because both languages are treated equally, but it is mostly irrelevant because they are so similar.
--Joey
Unicode, in a multibyte encoding such as UTF-8, is entirely suitable for Japanese. You have to set up input and output filters (using iconv or some similar library) for the locale charset, just like any other locale. If you use GTK or Qt or Java, you don't have an option: you can use Unicode without much problem, and can't use other charsets.
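The input/output filtering mentioned above amounts to decoding bytes from the locale's legacy charset into Unicode, and encoding them back out as needed. Here is a minimal Python sketch of what an iconv-style filter does, using the standard `shift_jis` and `euc_jp` codecs. The `to_utf8` helper is hypothetical.

```python
# Convert text from a legacy Japanese encoding to UTF-8, roughly what an
# iconv-based input filter does.
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    text = data.decode(source_encoding)   # legacy bytes -> Unicode str
    return text.encode("utf-8")           # Unicode str -> UTF-8 bytes

sample = "日本語"
for enc in ("shift_jis", "euc_jp"):
    legacy = sample.encode(enc)
    converted = to_utf8(legacy, enc)
    # Each of these kanji takes 2 bytes in the legacy encodings, 3 in UTF-8.
    print(enc, len(legacy), "->", len(converted))
```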
If you use GTK, it will automatically flip dialog boxes R-to-L. As for vertical presentation - who's doing vertical presentation? I don't know of anyone who supports vertical presentation on dialog boxes; the Chinese, who traditionally write vertically, are happy with L-to-R in a computer situation, and the Mongolians, the only other people I know of who write vertically, tend to use Cyrillic or at least write traditional Mongolian horizontally in a computer situation. With all due respect to the Mongolians, if they're the only people who may use vertical writing systems in computers, I don't think vertical presentation is the most important thing to worry about.
You don't have to rewrite the world; a lot of this stuff's already been done in the standard toolkits.