Open-Source Language Translator Opens For Beta
mind21_98 writes "A new machine translator has officially opened for public testing. GPLTrans is a translator similar to Babelfish. Pre-alpha testing has shown that it is the most accurate of the major Web-based machine translators. More information can be found here. "
It's common to hear the pundits opine that "open source may be good at improving 30-year-old operating systems, but the open-source model just doesn't work when it comes to large-scale applications." Various reasons are given, for example: "open source programmers only do what is fun and interesting, and applications aren't interesting". But here we see yet another large-scale application falling to the barbarian hordes.
Those pundits are wrong: there is no genre of software that the open-source model will never absorb, simply because the open-source model results in better software, for reasons that are well known. And no, there is no software application so uninteresting that no volunteer anywhere in the world will touch it. On the contrary: the longer an application area remains untouched, the more interesting it becomes to open-source programmers, simply because it's virgin territory.
This is the "stamp collector" syndrome: when you already have a goodly number of stamps in your collection, adding the missing ones becomes an obsession.
Life's a bitch but somebody's gotta do it.
What most of these language translation programs need is a better understanding of context. I was surprised to find that AltaVista's Babelfish utility has very poor analysis of context (possibly none at all). For example, when translating from English to French, "run" always translates to "exécute". For a sentence like you get which is reasonable, but if you translate you get which doesn't make any sense. More incredibly, "store" always translates to "mémoire". You would think that, if they were going to force every word to be interpreted in one sense, they would choose the most common meaning. But this choice leads to insanity where translates to
With knowledge of context, a more advanced system could notice situations in which it was more reasonable for "run" to have a particular meaning. In the last example, "run" is followed by a prepositional phrase indicating a direction, which would imply that the meaning involving physical movement is appropriate, and so on.
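The rule just described can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Systran, Babelfish or GPLTrans actually work: the sense labels (`EXECUTE`, `MOVE_FAST`), the tiny French lexicon and the set of directional prepositions are all hypothetical, and a lowercase token list stands in for a real parser.

```python
# Hypothetical sense inventory: English word -> {sense label: French word}.
# Insertion order matters: the first sense is the "default" one.
SENSES = {
    "run": {"EXECUTE": "exécuter", "MOVE_FAST": "courir"},
    "store": {"MEMORY": "mémoire", "SHOP": "magasin"},
}

# Hypothetical cue list; a real system would detect prepositional phrases
# with a parser instead of looking only at the next token.
DIRECTIONAL_PREPS = {"to", "into", "toward", "towards", "across", "through", "down", "up"}


def naive_sense(word):
    """Context-free lookup: every occurrence gets the same (first) sense."""
    return next(iter(SENSES[word]))


def contextual_sense(tokens, i):
    """Pick a sense for tokens[i] by looking at its neighbour."""
    word = tokens[i]
    if word == "run":
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # "run" followed by a directional preposition suggests physical movement.
        return "MOVE_FAST" if nxt in DIRECTIONAL_PREPS else "EXECUTE"
    # No rule for this word yet: fall back to the context-free choice.
    return naive_sense(word)


def translate_word(tokens, i):
    """Return the French dictionary form for tokens[i] in its context."""
    return SENSES[tokens[i]][contextual_sense(tokens, i)]


print(translate_word("run the program".split(), 0))      # exécuter
print(translate_word("i run to the station".split(), 1))  # courir
```

A single look-ahead rule is obviously crude, but it is already enough to keep "run to the station" from coming out as "execute to the station".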
Even more revealing is the fact that the confusion of meaning happens differently for different languages. If you translate
into Spanish, you get the hilarious result: For translation software that has multiple language targets, I would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French. This would be evidence that the Systran software has no such representation -- or at least that their representation is too weak to indicate the difference between "store" as in "memory" and "store" as in a warehouse.-- ?!ng
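The design expected above, analysis into a language-neutral representation followed by separate generation per target, can be sketched as follows. Everything here is a hypothetical illustration: the concept IDs (`STORE_MEMORY`, `STORE_SHOP`), the two-language lexicon and the "computing cue" heuristic are assumptions, not Systran's internals.

```python
# Hypothetical interlingua lexicon: language-neutral concept -> target word.
LEXICON = {
    "fr": {"STORE_MEMORY": "mémoire", "STORE_SHOP": "magasin"},
    "es": {"STORE_MEMORY": "memoria", "STORE_SHOP": "tienda"},
}

# Hypothetical cue words suggesting the computing ("memory") sense.
COMPUTING_CUES = {"data", "file", "disk", "computer", "program", "save"}


def analyze_store(tokens):
    """Analysis step: resolve English 'store' to a concept exactly once."""
    if COMPUTING_CUES & set(tokens):
        return "STORE_MEMORY"
    return "STORE_SHOP"


def generate(concept, target):
    """Generation step: render a concept in one target language."""
    return LEXICON[target][concept]


def translate_store(tokens, targets=("fr", "es")):
    # The concept is chosen before any target language is considered,
    # so French and Spanish output cannot disagree about the meaning.
    concept = analyze_store(tokens)
    return {t: generate(concept, t) for t in targets}


print(translate_store("we walked to the store".split()))
print(translate_store("save the file in the store".split()))
```

Because the sense is fixed during analysis, a wrong choice would at least be wrong in the same way for every target language; outputs that confuse the meaning differently in French and Spanish point to a system that skips this step.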