Microsoft Forced To Translate Office Into Nynorsk
An anonymous reader writes "Beeb reports, "The main organisation working for the Nynorsk language got most of Norway's high schools to threaten to boycott all Microsoft software if they didn't come up with a New Norwegian version of Office." Which brings up questions for Open Source developers: What's involved in translating programs? Is there a process that can be followed to make the inevitable easier? Is there a group providing guidelines for this already? -- Do you work in program translation? Step up and do tell."
So I'm boycotting Microsoft, too, until they release Office for *nix.
In a sense, though, this is kind of what is supposed to happen with big customers.
But it is sad that the emphasis seemed to be on getting MS software. They should have bought from whoever decided to provide the software in their language.
Oh well.
Most Microsoft applications use the concept of resources to separate the text from the application; translating the application then becomes simply a matter of translating the strings in the resource and updating the binary.
Linux has something similar in the gettext() function.
The hardest part is really translating the text correctly, taking into account the particularities of every language, the customs, and so on, and obviously keeping the translated version up to date.
Quite simply, keep all your text in a separate file which can be compiled completely separately from the rest of your project. The same goes for Dialogs, Menus, and Labels. This primarily makes it easier to allow users to switch from one language to another. :)
There really isn't that much that can be done other than that. What do you want us to say? Break your descriptions into simple enough language that some automatic translator can spit something out? I don't think so. Your best bet is to just keep all your text in one place [aside from debugging messages or other things that the user is never supposed to see] so you won't have to go looking around for [and potentially miss] it when the time comes. Don't you hate it when the whole program is translated except for the one error message that it keeps giving you?
Of course documentation is a different story. Nothing you can do there except keep everything very well documented so that there will be less confusion in translation. If it's a complete idea instead of a quick phrase thrown out, it's more likely to be translated correctly.
-- 'The' Lord and Master Bitman On High, Master Of All
Generally, if a program is well designed, it's not any harder to translate than a book, I mean, beyond issues of layout and the like.
Generally what you do is put all the text in a file or compiled-in resource called a string table. Then you reference strings by their ID in the program, rather than by their literal text. When you want to ship to a different country, you just swap the string table. (Although you would probably want to include lots of tables for switching locales on the fly.)
I'm certain Microsoft uses this method with their software.
autopr0n is like, down and stuff.
I wrote a program that had to be translated into 5 languages. Fortunately, all were based on the ASCII set, so no multi-byte char issues were present.
I came up with an enum file that held lines like:
enum phrases {
    IDL_YES = 0,
    IDL_NO,
    IDL_MAX_PHRASES
};
Then a file for each language:
English.dic:
Yes
No
Spanish.dic:
Sí
No
etc... At runtime it loaded the last language configured or defaulted to English.
I also added support so you could use %s, %d, %x etc., so you can use them in sprintfs. It worked damn well. No need to re-compile. Just drop another .dic file in, have a dialog that at runtime looks for .dic files, and you're done.
It worked extremely well. The only thing it could ever need was multi-byte support, but as I said before that was not a requirement.
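To make the .dic scheme concrete, here is a rough Python sketch of the same idea, not the original C++ class. The names Phrase, load_dic and available_languages are hypothetical, the GREETING entry is an added example to show a %s placeholder, and the files are written to a temporary folder purely for the demo.

```python
from enum import IntEnum
from pathlib import Path
import tempfile


class Phrase(IntEnum):
    # Mirrors the enum file: each value is a line index into a .dic file.
    YES = 0
    NO = 1
    GREETING = 2  # hypothetical extra entry showing a %s placeholder


def load_dic(path):
    """Read one phrase per line; line N belongs to the Phrase with value N."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < len(Phrase):
        raise ValueError(f"{path} is missing {len(Phrase) - len(lines)} phrase(s)")
    return lines


def available_languages(folder):
    """Find every dropped-in .dic file, so no recompile is needed."""
    return sorted(p.stem for p in Path(folder).glob("*.dic"))


# Demo with throwaway files; real .dic files would ship next to the program.
folder = Path(tempfile.mkdtemp())
(folder / "English.dic").write_text("Yes\nNo\nHello, %s\n", encoding="utf-8")
(folder / "Spanish.dic").write_text("Sí\nNo\nHola, %s\n", encoding="utf-8")

table = load_dic(folder / "Spanish.dic")
print(available_languages(folder))
print(table[Phrase.YES])
print(table[Phrase.GREETING] % "Ana")
```

The length check in load_dic is there because a short file would otherwise only fail later, at the first lookup of a missing line.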
PLEASE PLEASE stay away from the way that MS Dev Studio does it. It sucks ass.
Incidentally, the same class (I used a class when I could use C++) also works well for handling various dialects of SQL: MSSQLServer.dic, PostgreSQL.dic, etc.
Very simple and fast.
The only pain is that you have to come up with a unique IDL_name for each string. I'd like to have an associative array so you could say
IDL("Yes") and have that translated. That was the next step for me, but I never got the time to do that.
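One way that next step might look, sketched in Python: pair line N of the English file with line N of the target file, and the English text itself becomes the key. The helper names build_lookup and make_idl and the sample phrases are assumptions for illustration, not part of the original program.

```python
def build_lookup(source_lines, target_lines):
    """Pair line N of the source-language file with line N of the target."""
    if len(source_lines) != len(target_lines):
        raise ValueError("language files must have the same number of lines")
    return dict(zip(source_lines, target_lines))


def make_idl(lookup):
    def idl(text):
        # Unknown phrases fall through untranslated instead of raising.
        return lookup.get(text, text)
    return idl


english = ["Yes", "No", "Cancel"]
spanish = ["Sí", "No", "Cancelar"]

IDL = make_idl(build_lookup(english, spanish))
print(IDL("Yes"))
print(IDL("Retry"))
```

One trade-off: two identical English phrases that need different translations would collide on the same key, which the numbered IDL_ names avoid.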
Hope that helps!
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Norway has two official languages.. the one used by the majority of the people, called bokmål, and then another one called nynorsk. Not that they are two separate languages or anything.. sort of like the difference between British English and American English, only a little more. This is because we were for quite a time, many years ago, in a union with Denmark, and when the union broke, many Norwegians felt they needed something that would separate them a little from Denmark (as Denmark had been the bigger brother in the union, so to speak). Ivar Aasen roamed the countryside and created a new language on the basis of the many dialects Norwegians spoke throughout the country.. this was the birth of nynorsk. However, nynorsk never prevailed, and now we're stuck with two languages.. much to the dismay of many Norwegian students, because although very, very few speak nynorsk in the big cities, you still have to take exams in both languages.. in some areas though, many speak nynorsk.. or at least close to it.. no one really speaks as they write bokmål and nynorsk. Close, but not quite.
Think about it... they want software in their language, and it's not available. So...
- If it's closed source (MS Office), don't buy something you don't want, and tell the company what you do want. It's called "market pressure."
- If the software is open source, you can translate it yourself -- and likely have working, native language software faster than a closed-source solution.
This is news because they managed to get Microsoft to support a language (spoken | written | read) by (relatively) few people. Probably the only reason Microsoft even paid any attention to them was the threat that they'd teach the children anything but Microsoft products. Would this have happened in the absence of open source? I doubt it. I guess that means open source is working. (Strange way for it to 'work' though...)
"...America's great minds of today, teaching America's great minds of tomorrow. Poor bastards." -- A Beautiful Min
"Microsoft Forced To Translate Office Into Nynorsk"
Did anyone else read this and instantly think that some judge on the antitrust case had been hitting the eggnog way hard when he handed out this 'penalty'?
What's involved in translating programs?
:(
It's not as simple as translating from English to some other language. It involves new character sets, input methods and association helpers, language-specific formatting, etc. In the case of the Chinese version, they even have to support several different encoding methods in one product.
As a developer I always find that I18N support alone in Linux is not enough to deal with all the language-specific problems. We have very little choice here. I can understand that without a commercial drive it's very difficult to develop a language-specific product. E.g. the majority of the font sets we need are not free.
I read some years ago that Microsoft refused to make a 'nynorsk' version due to the high development cost: $3 million, they claimed. A high price compared to the income they could expect from the small minority that uses 'nynorsk' in Norway.
This price seemed a bit too much to me. The two written Norwegian languages differ little in grammar and sentence structure, so word-by-word replacement should do most of the trick.
KDE and Gnome and their office like replacement apps have been available in both languages for a long time.
Guess the threat of working open source alternatives has forced MS into submission.
An open source project called Skolelinux (School Linux) is on its way to creating a replacement for Windows for use in Norwegian schools, threatening the current MS monopoly in Norway's educational system.
I'm sure the Norwegians can handle the English version of Office just fine.
Having worked with many Scandinavians, I am truly impressed by their command of English -- many people from Norway, Sweden, Denmark, speak it better than many US people do, and definitely better than people from any other (non-native English speaking) country.
I think the fluency in English of Scandinavians arises from the similarity of English to the Scandinavian languages, so picking it up is natural, much more so than other European languages, and of course, better than any non-Western language.
But in any case, not having a Norwegian Office is not as big a blow to productivity as the article may lead you to think.
There's 10 types of people in this world, those who understand binary and those who don't.
The GNOME Translation Project
KDE i18n project
Translation howto for Kannada - This is a howto I wrote yesterday for people wanting to translate software into Kannada (an Indian language spoken in Karnataka). But the concept applies to all Indian languages, and to other languages too to a certain extent. [OK, I confess some self interest is involved here :-)]
Actually, Kannada support came first on Windows XP, thanks to the Karnataka govt's support and because MS & Adobe developed OpenType fonts (a must for the complexity of Indian languages), but thanks to the Pango team, we hope to have support before MS does. And many state govts in India are also pressurising MS to bring Win XP out in their languages, and already Bengali, Hindi & Tamil (KDE is fully translated into Tamil) are in the works. But we hope to set it right soon.
Microsoft are ignoring a very large part of their users, mainly script kiddies.
All 13 year olds should boycott them until windoze is translated into 313375p34k!
(At least that'd get rid of the DDoS attacks on IRC Networks)
Well the situation in Norway is quite interesting, because there is already a switch from Microsoft licenses to Linux in the education system. In fact, the state has sponsored a project called "Skolelinux" (SchoolLinux), where Norwegian/Nynorsk/Same language editions are being made based on the Debian operating system. One of the reasons why it was started was obviously the lowered costs, but also the ability to have more native language output. The site is at www.skolelinux.no but I think it's only in Norwegian...
The Welkin: Online Music Reviews
To fully support all languages, including Asian ones, there really is no alternative to Unicode. That, and sticking to the use of tables for strings, menus etc.
One of the major correct things Microsoft did some time ago was realize this - hence for most of their products a different resource file is all that's needed to support another language (I'm ignoring help files etc.). IMHO, it's a great pity that the Linux system didn't realize this earlier (especially as it was written in a non-English-speaking country).
Since I'm currently working in China, this has become a very important issue, more so to me because I am designing a natural language scripting tool that has to understand both Chinese characters and syntax. Whilst we may find some translations by the Chinese into English funny, it's just because English (to them) is as foreign as Chinese is to us. All of us English speakers should realize that just because C/C++/Python etc. make sense to us, they don't to others. It's just not reasonable to say, well, if you want to learn programming, then you must learn English first.
When you translate an application it is not just translating text strings in it. You also obviously need to update documentation, online help, etc. This, as a lot of people have pointed out, is "simply" a matter of changing text strings that are external to the main source code and referenced by the application throughout the code.
However, as well as translating text to another language, there is a lot more work to be done. Images in the interface may need to be changed, sounds used in the application, etc, may also need to be modified for the appropriate localisation. The entire user interface must be examined for culturally specific items and they need to be modified for the appropriate target market.
To allow for localisation, an application should be internationalised as it is written. How this is best accomplished is determined by the Operating System you're writing for. Most operating systems will have internationalisation features to some extent.
For example, applications written using Cocoa for Mac OS X are easily designed for localisation at a later date. Looking inside any Mac OS X Cocoa (and some Carbon applications that use packages) you will see folders named "English.lproj", "French.lproj", etc (inside Contents/Resources). These folders are how Mac OS X can automatically localise things. Any application written using the guidelines posted by Apple is ready to be localised without any changes to the code. All that needs to happen is the modifications to the interface resource files, this can include changing the complete layout of dialog boxes, as well as simple translation of text strings.
Overall, any application should be coded as if it will be internationalised. Even if you do not intend to do internationalisation, it enforces separation between the code and the interface and resources, which is almost always a good idea.
Language packs. Have each prompt and piece of text be dynamically linked to an external language pack. Either make it integratable at compile time, in which case simply copying in a new language pack and recompiling will do, or just have it load on the fly. I know this is being done on several projects, including the emulator Kawaks...
Non impediti ratione cogitationus.
If you have used the Visual Studio method of resource strings for i18n and you are moving to .NET, I would strongly recommend you review how i18n resources work in .NET before you get into your project. The paradigm has changed, especially if you have multiple threads in a worker pool.
(Stop Reading because the Microsoft sales force has now taken over my brain...)
Resource strings in .NET always have fallbacks. So in the above case the user's thread would first ask for the Bokmal (nb-NO) version of the resource, and if it wasn't there it would fall back to the Norwegian (no) version of the string and then fall back to my default resource file (English, en, for me).
(more marketing BS...)
If this were my .NET app and I already had a Norwegian (no) resource file (resmain.no.resx - a plain text XML file), I would copy the file to resmain.nb-NO.resx (Bokmal) and make another copy as resmain.nn-NO.resx (Nynorsk). You can then pick and choose which resources you actually want to be different between them.
FYI:
no = Norwegian (x0014) (20)
nb-NO = Norwegian Bokmal (x0414) (1044)
nn-NO = Norwegian Nynorsk (x0814) (2068)
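The fallback idea described in this comment can be sketched outside .NET too. The Python below is only an illustration of the lookup order (specific culture, then "no", then the "en" default); the PARENTS table, the RESOURCES dictionaries and their strings are hypothetical stand-ins for the resmain.*.resx files, not anything read from .NET itself.

```python
# Parent cultures as described in the comment above; this table is an
# illustrative assumption, not data read from .NET.
PARENTS = {"nb-NO": "no", "nn-NO": "no"}
DEFAULT = "en"

# Hypothetical resource sets, standing in for resmain.*.resx files.
RESOURCES = {
    "en": {"greeting": "Hello", "week": "week", "save": "Save"},
    "no": {"greeting": "Hei", "week": "uke"},
    "nn-NO": {"week": "veke"},
}


def lookup(key, culture):
    """Try the requested culture, then its parent, then the default."""
    current = culture
    while True:
        value = RESOURCES.get(current, {}).get(key)
        if value is not None:
            return value
        if current == DEFAULT:
            raise KeyError(key)
        # Cultures without a listed parent go straight to the default.
        current = PARENTS.get(current, DEFAULT)


print(lookup("week", "nn-NO"))   # found in the Nynorsk set
print(lookup("week", "nb-NO"))   # no Bokmal set, falls back to "no"
print(lookup("save", "nn-NO"))   # falls all the way back to English
```

This is why the Nynorsk file only needs the strings that actually differ: everything else is picked up from the shared Norwegian set.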
I translated Uropa 2 - The Ulterior Colony, an Amiga game, to Swedish on behalf of Vulcan Software.
One thing that I seem to remember causing problems was that occasionally, there were individual words in the separate translation file that were sometimes reused in multiple places, with assumptions being made about where that could happen based on what works in the English language. That is a definite no-no. Don't assume that an English word which can mean several things also has an identical word in a foreign language.
Also, don't assume that foreign languages have an easy way to change between singular and plural or that as in English, there is only one article for all nouns.
In conclusion, always give the translator the option to choose the exact wording based on the context -- even if that means that the English (or whichever is the original language of your software) version of the resource file has many words duplicated. What works in one place may not work in another, even if that is the case with your language.
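A small sketch of what "wording based on the context" can mean in practice: key each string by a context label plus the English text, so the same English word can get two translations. The catalog contents, the context labels and the translate helper are hypothetical, written in Python for illustration.

```python
# Hypothetical Swedish catalog keyed by (context, English text).
SWEDISH = {
    ("menu command", "Open"): "Öppna",  # verb: open a file
    ("door status", "Open"): "Öppen",   # adjective: the door is open
}


def translate(context, text, catalog):
    """Look up text within a context; fall back to the source text."""
    return catalog.get((context, text), text)


print(translate("menu command", "Open", SWEDISH))
print(translate("door status", "Open", SWEDISH))
print(translate("door status", "Locked", SWEDISH))  # not in the catalog
```

The English resource file ends up with "Open" listed twice, which is exactly the duplication the comment argues for. Python's gettext module has pgettext (3.8 and later) for the same purpose.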
Comments from GNOME-knowledgeable people are also welcome -- does GNOME have a similar page of statistics on translations as KDE?
I was a tester on Ericsson's first smart phone project.
Although they approached the problem of enabling easy translation of displayed strings by using resource files, etc. (this was enabled by the Symbian OS, which strongly encourages such practice), we ran into two major problems:
1. Buffer over/underruns -- if a programmer had created a string (e.g., menu), they would allocate four characters to store that string, but often the German equivalent would be, say, 50 characters, which would cause a crash.
2. The smart phone had a relatively small screen (compared to a PC). The UI designers were working in English and designed the entire UI using English words. They didn't pay enough attention to the fact that translation would be required. For languages that tend to have longer words than English (e.g. German), this caused significant problems. These translations wouldn't fit in the allocated space, and the screen would be cluttered with text.
It would be nice to see software engineers working on UI toolkits take problems like this into account. Ideally, applications (and GUI toolkits) should be designed in a language-neutral way. Application programmers, who typically think in terms of logic and who strive for elegance, aren't really the best sort of people to be considering language translation. It would be desirable for GUI toolkits to degrade gracefully when presented with text that doesn't fit the UI design, and not to let programmers make the buffer over/underrun mistake. It would seem likely that such a framework exists, but it doesn't seem to be ubiquitous.
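Both problems can be caught fairly cheaply. The Python sketch below shows one possible shape: a label helper that degrades by truncating instead of overflowing, and a check that flags translations over budget before they ship. The function names, the character limits and the sample strings are assumptions for illustration, and counting characters is only a rough proxy for rendered width.

```python
def fit_label(text, max_chars):
    """Shorten text to at most max_chars, marking the cut with an ellipsis."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 1] + "…"


def find_overflows(translations, limits):
    """Return the keys whose translated text exceeds the UI's character budget."""
    return sorted(k for k, text in translations.items() if len(text) > limits[k])


# Hypothetical UI budget designed around the English words.
limits = {"settings": 8, "ok": 4}
german = {"settings": "Einstellungen", "ok": "OK"}

print(find_overflows(german, limits))
print(fit_label(german["settings"], limits["settings"]))
```

Running find_overflows over every language file as part of the build would have turned the crashes into a to-do list for the UI designers.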
"The noble art of losing face will one day save the human race"---Hans Blix
Except for software that actually processes words where the algorithms are geared for English, e.g., word processors (word selection for non-Roman languages or those that go right-to-left), search engines (the Porter word-stemming algorithm).
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
Consider my ongoing project to translate Nethack into something resembling Spanish. Gettext offers support for plurals, but not for gender; it provides no way to make sure that a blessed sword (espada) is bendita, but a blessed helmet (yelmo) is bendito. Languages with noun cases, such as German, Finnish, and Russian, have an additional problem: a monster is a subject when it hits you and a direct object when you hit it. Furthermore, sometimes Nethack must parse user input, as when making a wish, and differing word order and words with more than one meaning create lots of pitfalls there. Finally, Nethack is laden with jokes and puns, and many of these don't survive translation.
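To show what the gender problem needs beyond a plain string lookup, here is a minimal Python sketch: the noun carries its gender and the adjective picks the matching form. The tables and the describe helper are hypothetical and cover only singular forms; this is not how Nethack or gettext actually work.

```python
# Hypothetical mini-lexicon: each noun carries its grammatical gender.
NOUNS = {
    "sword": ("espada", "f"),
    "helmet": ("yelmo", "m"),
}

# Each adjective lists one form per gender.
ADJECTIVES = {
    "blessed": {"m": "bendito", "f": "bendita"},
    "cursed": {"m": "maldito", "f": "maldita"},
}


def describe(adjective, noun):
    """Build a Spanish noun phrase with the adjective agreeing in gender."""
    word, gender = NOUNS[noun]
    # In Spanish these adjectives normally follow the noun.
    return f"{word} {ADJECTIVES[adjective][gender]}"


print(describe("blessed", "sword"))
print(describe("blessed", "helmet"))
```

Plurals and noun cases would each add another dimension to these tables, which is why a flat catalog of strings runs out of steam for this kind of game.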
Ooh, moderator points! Five more idjits go to Minus One Hell!
Delendae sunt RIAA, MPAA et Windoze
IANAN (I am not a Norwegian):
To Norwegians: since I am not a Norwegian, please correct anything that is wrong :)
For those that aren't up on Norwegian linguistics (not that I am a scholar or anything ;)), Norway has two languages that are almost identical: Bokmaal and Nynorsk. The first is practically a clone of Danish. Nynorsk rose from Norwegian Nationalism and Ivar Aasen when they received independence from Sweden in the early 20th century. It is like someone made a language out of English dialects. It is supposed to be closer to what Vikings spoke (though Icelandic would be a better representation). Most Norwegians write in Bokmaal but the Nynorsk contingent is very adamant about official and equal representation of their brand of Norwegian.
What is ironic is most of the words are exactly the same or so similar that anyone who is proficient can read both. A few examples follow:
Bokmaal    Nynorsk
Norge      Noreg
Jeg        Eg
It is important because both languages are treated equally, but it is mostly irrelevant because they are so similar.
--Joey
Most people would also be surprised to know that the largest english speaking country is China.
Not in any meaningful sense. Chinese speak Chinese to each other. Even if over 25% of the Chinese population speaks some English, that doesn't mean they speak fluent English, or that they could read or write something of moderate complexity without a dictionary.
America makes up a very small part of the total english speaking world.
Well, America makes up almost 300 million people. Even assuming everyone in the world speaks English, that's still 5%; and while a lot of the world speaks English, a country isn't really part of the "English-speaking world" until it primarily speaks and writes English. So Australia, New Zealand, U.K., Ireland, U.S., Canada, and to some extent India and Africa. Of the solidly English-speaking countries, the U.S. is the largest.
Unicode is a MBCS, and is entirely suitable for Japanese. You have to set up input and output filters (using iconv or some similar application) for the locale charset, just like any other locale. If you use GTK or Qt or Java, you don't have an option - you can use Unicode without much problem, and can't use other charsets.
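A rough sketch of the input/output filter idea, in Python rather than iconv: text is decoded from the locale charset on the way in, handled as Unicode inside the program, and encoded back on the way out. The function names are made up for illustration, and Shift_JIS is just one example of a Japanese locale charset.

```python
def read_legacy(raw_bytes, charset):
    """Input filter: decode bytes from the locale charset into Unicode text."""
    return raw_bytes.decode(charset)


def write_legacy(text, charset):
    """Output filter: encode Unicode text back to the locale charset."""
    return text.encode(charset)


greeting = "こんにちは"

# The same five characters take different byte lengths in each encoding.
sjis_bytes = write_legacy(greeting, "shift_jis")
utf8_bytes = greeting.encode("utf-8")
print(len(greeting), len(sjis_bytes), len(utf8_bytes))

# Round trip: whatever comes in through the filter is ordinary Unicode inside.
print(read_legacy(sjis_bytes, "shift_jis") == greeting)
```

Everything between the two filters can then ignore the locale charset entirely, which is the point being made about toolkits that only speak Unicode.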
If you use GTK, it will automatically flip dialog boxes R-to-L. As for vertical presentation - who's doing vertical presentation? I don't know of anyone who supports vertical presentation on dialog boxes; the Chinese, who traditionally write vertically, are happy with L-to-R in a computer situation, and the Mongolians, the only other people I know of who write vertically, tend to use Cyrillic or at least write traditional Mongolian horizontally in a computer situation. With all due respect to the Mongolians, if they're the only people who may use vertical writing systems in computers, I don't think vertical presentation is the most important thing to worry about.
You don't have to rewrite the world; a lot of this stuff's already been done in the standard toolkits.