Slashdot Mirror


Open-Source Language Translator Opens For Beta

mind21_98 writes "A new machine translator has officially opened for public testing. GPLTrans is a translator similar to Babelfish. Pre-alpha testing has shown that it is the most accurate of the major Web-based machine translators. More information can be found here."

2 of 155 comments

  1. It's the Stamp Collector syndrome by SurfsUp · Score: 5

    It's common to hear the pundits opine that "open source may be good at improving 30-year-old operating systems, but the open-source model just doesn't work when it comes to large-scale applications." Various reasons are given, for example: "open source programmers only do what is fun and interesting, and applications aren't interesting". But here we see yet another large-scale application falling to the barbarian hordes.

    Those pundits are wrong: there is no genre of software that the open-source model will never absorb, simply because the open-source model results in better software, for reasons that are well-known. And no, there is no software application so uninteresting that no volunteer anywhere in the world will touch it. On the contrary: the more an application area remains untouched, the more interesting it becomes to open-source programmers, simply because it's virgin territory.

    This is the "stamp collector" syndrome: when you already have a goodly number of stamps in your collection, adding the missing ones becomes an obsession.

    --
    Life's a bitch but somebody's gotta do it.
  2. Better Context Analysis by Pingster · Score: 5

    What most of these language translation programs need is a better understanding of context. I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all). For example, when translating from English to French, "run" always translates to "exécute". For a sentence like
    The computer ran the program.
    you get
    L'ordinateur a exécuté le programme.
    ("The computer executed the program.")
    which is reasonable, but if you translate
    I ran home.
    you get
    J'ai exécuté à la maison.
    ("I executed at the house.")
    which doesn't make any sense. More incredibly, "store" always translates to "mémoire". You would think that, if they were going to force every word to be interpreted in one sense, they would choose the most common meaning. But this choice leads to insanity where
    Tom ran to the store.
    translates to
    Tom a exécuté à la mémoire.
    ("Tom executed to the memory.")

    With knowledge of context, a more advanced system could notice situations in which it was more reasonable for "run" to have a particular meaning. In the last example, "run" is followed by a prepositional phrase indicating a direction, which would imply that the meaning involving physical movement is appropriate, and so on.
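    To make that concrete, here is a minimal sketch of such a rule in Python. It is a toy under stated assumptions, not how Babelfish or any real translator works: the cue list, the sense names, and the function `sense_of_run` are all made up for illustration, and the rule only looks at the single word following "run".

```python
# Hypothetical cue words: if one of these directly follows a form of "run",
# assume the physical-movement sense. The list is illustrative, not exhaustive.
MOTION_CUES = {"to", "into", "toward", "towards", "home", "away"}
RUN_FORMS = {"run", "ran", "runs", "running"}

# French infinitive for each (made-up) sense label.
FRENCH_VERB = {"execute": "exécuter", "move": "courir"}


def sense_of_run(sentence):
    """Return 'move' or 'execute' for the first form of "run", or None if absent."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    for i, word in enumerate(words):
        if word in RUN_FORMS:
            following = words[i + 1] if i + 1 < len(words) else ""
            return "move" if following in MOTION_CUES else "execute"
    return None


for example in ("The computer ran the program.", "I ran home.", "Tom ran to the store."):
    sense = sense_of_run(example)
    print(example, "->", sense, "->", FRENCH_VERB[sense])
```

    A one-word lookahead like this is obviously fragile ("I ran the program to completion" still works, but "I ran quickly home" does not), which is the point: even a crude amount of context already separates the three example sentences.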

    Even more revealing is the fact that the confusion of meaning happens differently for different languages. If you translate

    Tom ran to the store.
    into Spanish, you get the hilarious result:
    Tom se ejecutó al almacén.
    ("Tom executed himself to the warehouse.")
    For translation software that has multiple language targets, I would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French. The above would be evidence that the Systran software has no such representation -- or at least that their representation is too weak to indicate the difference between "store" as in "memory" and "store" as in a warehouse.
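    As a rough sketch of what I mean by an internal representation (in Python, and purely hypothetical: the sense labels, the cue words, the tiny lexicon, and the function names are my own assumptions, not anything taken from Systran), the English word is first resolved to a language-neutral sense, and only then is each target language asked for its word for that sense:

```python
# Hypothetical language-neutral senses, each with a word per target language.
LEXICON = {
    "shop": {"fr": "magasin", "es": "tienda"},
    "computer_memory": {"fr": "mémoire", "es": "memoria"},
}

# Illustrative cue words suggesting a computing context; a real system would
# need far more than a word list.
COMPUTING_CUES = {"data", "byte", "bytes", "cpu", "register"}


def resolve_store_sense(sentence):
    """Step 1: pick a sense for "store" from the English context alone."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return "computer_memory" if words & COMPUTING_CUES else "shop"


def emit(sense, lang):
    """Step 2: look up the target-language word for an already-resolved sense."""
    return LEXICON[sense][lang]


def translate_store(sentence, lang):
    return emit(resolve_store_sense(sentence), lang)


print(translate_store("Tom ran to the store.", "fr"))
print(translate_store("Tom ran to the store.", "es"))
```

    Because the sense is decided once, before any target language is involved, French and Spanish cannot disagree about which "store" was meant; they can only differ in how they say it.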


    -- ?!ng