Slashdot Mirror


Open-Source Language Translator Opens For Beta

mind21_98 writes "A new machine translator has officially opened for public testing. GPLTrans is a translator similar to Babelfish. Pre-alpha testing has shown that it is the most accurate of the major Web-based machine translators. More information can be found here."

11 of 155 comments

  1. Machine translators by theSheep · · Score: 3

    While machine translation is very practical, it can also provide entertainment. I remember a story about scientists testing an English-Russian-English translator by translating phrases to Russian and back. Input: "The spirit is willing, but the flesh is weak." Output: "The vodka is good, but the meat is rotten."

    --
    -- The Sheep --
    1. Re:Machine translators by Arjen · · Score: 4
      This is an urban legend. According to MACHINE TRANSLATION: An Introductory Guide:

      The `spirit is willing' story is amusing, and it really is a pity that it is not true. However, like most MT `howlers' it is a fabrication. In fact, for the most part, they were in circulation long before any MT system could have produced them (variants of the `spirit is willing' example can be found in the American press as early as 1956, but sadly, there does not seem to have been an MT system in America which could translate from English into Russian until much more recently --- for sound strategic reasons, work in the USA had concentrated on the translation of Russian into English, not the other way round). Of course, there are real MT howlers. Two of the nicest are the translation of French avocat (`advocate', `lawyer' or `barrister') as avocado, and the translation of Les soldats sont dans le café as The soldiers are in the coffee. However, they are not as easy to find as the reader might think, and they certainly do not show that MT is useless.

      BTW, since this book is no longer available in stores, its whole contents have been placed online. I recommend reading it to anyone who is interested in the subject of MT. It really is a nice introduction to the subject.

  2. I need to try this at work by MonkeyPaw · · Score: 3

    I'll give this a test at the office, because half the time I don't understand half of what the customers are saying.

    Perhaps I can use it to translate my words to the customer, so when I say "Ok, click on My Computer" they don't hear "restart the computer and click on the first icon you see while hitting the esc key and pulling on the power cord".

    --
    My studio - www.graylands.ca
  3. AI&Babelfish by T.Hobbes · · Score: 3

    I'm not sure if it has been done yet, but it would be quite helpful if an AI could 'evolve' along with the language (because, as we all know, language changes all the time) by monitoring how users edit the translated text. For example, if at time 'a' it was programmed to translate 'cool' to 'froid' in French, it would (after monitoring the changes made by users) learn to translate 'cool' to the French equivalent of 'hip', or something. 'Cause, dammit, I can't wait until the AIs take over ;)
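
    A minimal sketch of that feedback loop, assuming a plain word-for-word lexicon and a simple vote threshold. The class name, the threshold, and the replacement word 'branché' are illustrative assumptions, not anything taken from GPLTrans or Babelfish:

```python
from collections import Counter, defaultdict


class AdaptiveLexicon:
    """Word-for-word lexicon that drifts toward what users correct it to."""

    def __init__(self, seed):
        # seed: mapping of source word -> initial (programmed) translation
        self.seed = dict(seed)
        self.corrections = defaultdict(Counter)

    def record_edit(self, source_word, corrected):
        """Log one user's replacement for the machine's output."""
        self.corrections[source_word][corrected] += 1

    def translate(self, source_word, min_votes=3):
        """Prefer the most common correction once enough users agree."""
        votes = self.corrections.get(source_word)
        if votes:
            best, count = votes.most_common(1)[0]
            if count >= min_votes:
                return best
        # Fall back to the programmed translation, or the word itself.
        return self.seed.get(source_word, source_word)


lexicon = AdaptiveLexicon({"cool": "froid"})
before = lexicon.translate("cool")          # programmed sense
for _ in range(3):
    lexicon.record_edit("cool", "branché")  # three users fix the output
after = lexicon.translate("cool")           # learned sense
```

    A real system would have to work on phrases rather than single words and guard against bad edits, which this sketch ignores.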

  4. Make it a standard desktop component! by Anonymous Coward · · Score: 4

    It would be nice if someone were to make a CORBA translation service and add this to one or more of the linux desktops. Then it could be used for email, documentation, irc, coding, etc, not just for the occasional web page. It would also be good if the data at gpltrans was snapshotted regularly and pushed around, ideally so that everyone would have their own copy.

  5. It's the Stamp Collector syndrome by SurfsUp · · Score: 5

    It's common to hear the pundits opine that "open source may be good at improving 30-year-old operating systems, but the open-source model just doesn't work when it comes to large scale applications." Various reasons are given, for example: "open source programmers only do what is fun and interesting, and applications aren't interesting". But here we see yet another large-scale application falling to the barbarian hordes.

    Those pundits are wrong: there is no genre of software that the open-source model will never absorb, simply because the open-source model results in better software, for reasons that are well known. And no, there is no software application that is so uninteresting that no volunteer anywhere in the world will touch it. On the contrary: the more an application area remains untouched, the more interesting it becomes to open-source programmers, simply because it's virgin territory.

    This is the "stamp collector" syndrome: when you already have a goodly number of stamps in your collection, adding the missing ones becomes an obsession.

    --
    Life's a bitch but somebody's gotta do it.
  6. Better Context Analysis by Pingster · · Score: 5

    What most of these language translation programs need is a better understanding of context. I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all). For example, when translating from English to French, "run" always translates to "exécute". For a sentence like
    The computer ran the program.
    you get
    L'ordinateur a exécuté le programme.
    ("The computer executed the program.")
    which is reasonable, but if you translate
    I ran home.
    you get
    J'ai exécuté à la maison.
    ("I executed at the house.")
    which doesn't make any sense. More incredibly, "store" always translates to "mémoire". You would think that, if they were going to force every word to be interpreted in one sense, they would choose the most common meaning. But this choice leads to insanity where
    Tom ran to the store.
    translates to
    Tom a exécuté à la mémoire.
    ("Tom executed to the memory.")

    With knowledge of context, a more advanced system could notice situations in which it was more reasonable for "run" to have a particular meaning. In the last example, "run" is followed by a prepositional phrase indicating a direction, which would imply that the meaning involving physical movement is appropriate, and so on.
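
    A rough sketch of that rule in Python, assuming a hand-written sense table keyed on the word that follows the verb. The table, the cue words, and the function names are hypothetical; this is not how Babelfish or Systran work, only an illustration of picking a sense from the next word:

```python
import re

# Hypothetical cue words suggesting physical movement after "ran".
DIRECTION_CUES = {"to", "home", "into", "toward", "towards", "away"}

# Each verb maps to (cue words, French past participle) pairs, tried in order.
# An empty cue set marks the default sense.
SENSES = {
    "ran": [
        (DIRECTION_CUES, "couru"),  # physical movement
        (set(), "exécuté"),         # default: execute a program
    ],
}


def tokens(sentence):
    """Lowercase the sentence and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def pick_sense(words, index):
    """Choose a translation for words[index] by looking at the next word."""
    word = words[index]
    following = words[index + 1] if index + 1 < len(words) else ""
    for cues, translation in SENSES.get(word, []):
        if not cues or following in cues:
            return translation
    return word  # unknown word: leave it untranslated


moving = pick_sense(tokens("Tom ran to the store."), 1)
computing = pick_sense(tokens("The computer ran the program."), 2)
```

    Here `moving` comes out as "couru" and `computing` as "exécuté". One word of lookahead is obviously far too little for real text, but it is enough to separate the examples above.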

    Even more revealing is the fact that the confusion of meaning happens differently for different languages. If you translate

    Tom ran to the store.
    into Spanish, you get the hilarious result:
    Tom se ejecutó al almacén.
    ("Tom executed himself to the warehouse.")
    For translation software that has multiple language targets, I would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French. The above would be evidence that the Systran software has no such representation -- or at least that their representation is too weak to indicate the difference between "store" as in "memory" and "store" as in a warehouse.


    -- ?!ng

    1. Re:Better Context Analysis by moore · · Score: 3

      The problem is that most, if not all, of
      these systems know nothing about meaning at all.
      All they do is try to match one set of strings to
      a different set of strings.
      GPLTrans works by the substitution method.

      >from: Mooneer Salem
      >
      > It is a system where words in a phrase that
      > can be substituted are
      > marked by %phrase%
      > For example:
      >
      > English: My name is %phrase1%.
      > Spanish: Me llamo %phrase1%.
      >
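
      A small Python sketch of how such %phrase% patterns could be matched and filled in. Only the pattern format comes from the quoted description; the function names and the regex-based matching are assumptions, not the actual GPLTrans code:

```python
import re

# Matches slots such as %phrase1% and captures the slot name.
PLACEHOLDER = re.compile(r"%(phrase\d+)%")


def compile_pattern(source):
    """Turn 'My name is %phrase1%.' into a regex with one named group per slot."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(source):
        parts.append(re.escape(source[pos:m.start()]))  # literal text
        parts.append(f"(?P<{m.group(1)}>.+?)")          # the slot itself
        pos = m.end()
    parts.append(re.escape(source[pos:]))
    return re.compile("".join(parts))


def translate(sentence, pairs):
    """Return the first matching target pattern with its slots filled, else None."""
    for source, target in pairs:
        match = compile_pattern(source).fullmatch(sentence)
        if match:
            return PLACEHOLDER.sub(lambda m: match.group(m.group(1)), target)
    return None


PAIRS = [("My name is %phrase1%.", "Me llamo %phrase1%.")]
result = translate("My name is Mooneer.", PAIRS)
```

      Note that the slot content is copied over untouched, so anything inside %phrase1% that itself needs translating would require another pass.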

      This general system can be extended into a
      phrase structure grammar with pairs of rules for
      each language, e.g.:
      english: S -> NP1 V NP2
      irish: S -> V NP1 NP2

      These rules would model sentences like:
      english: the cat chased the dog.
      irish: chased the cat the dog.
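
      A toy Python version of the paired rules, assuming the sentence has already been split into its constituents (which is the hard part). The names are made up for illustration, and the words are only reordered, not translated, just as in the example above:

```python
# Paired rules: the same constituents, ordered per language.
RULES = {
    "english": ["NP1", "V", "NP2"],  # S -> NP1 V NP2
    "irish": ["V", "NP1", "NP2"],    # S -> V NP1 NP2
}


def reorder(constituents, source, target):
    """Map constituents given in source-rule order into target-rule order."""
    labelled = dict(zip(RULES[source], constituents))
    return [labelled[label] for label in RULES[target]]


english = ["the cat", "chased", "the dog"]
irish_order = reorder(english, "english", "irish")
back_again = reorder(irish_order, "irish", "english")
```

      Because both rules name the same constituents, the mapping works in either direction.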

      All this is oversimplified, but you get the point.
      The real problem is that you need to be trained
      as a linguist to understand what the structure of
      many sentences is, and even linguists argue a
      LOT. The phrase-structure approach is probably what
      Altavista and such do. Although I really like the
      idea of GPLTrans, I do not think their approach will
      get them too far; but it will be fun to see what
      they can do.

  7. Translation methods by Y · · Score: 4

    Although the site has been slashdotted, it would be interesting to see what sort of algorithms it uses to perform the translations. Mmm, open source.

    I would be inclined to say that if it is based on grammar rules, the project won't make much headway - machine translation has been butting its head against this brick wall for forty years. The problem with hard-and-fast grammar rules, e.g.,

    S = NP VP
    NP = Det (Adj)* N
    VP = V (Adv)

    is that they don't account for rapid linguistic change, and people have this nasty habit of twisting grammar to express themselves in new and creative ways. :) In addition to this, it's very difficult to write simple, lucid grammar rules that also account for the myriad exceptions found in language.
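
    To make the brittleness concrete, here is a small Python recognizer for exactly those three rules, run over part-of-speech tags. The toy lexicon and the function name are assumptions for illustration only:

```python
import re

# S = NP VP;  NP = Det (Adj)* N;  VP = V (Adv)
NP = r"Det( Adj)* N"
VP = r"V( Adv)?"
SENTENCE = re.compile(rf"{NP} {VP}")

# Hypothetical toy lexicon mapping words to part-of-speech tags.
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "lazy": "Adj",
    "dog": "N", "cat": "N",
    "sleeps": "V", "runs": "V",
    "quickly": "Adv",
}


def accepts(sentence):
    """True if the sentence's tag sequence is derivable from S."""
    tags = [LEXICON.get(word) for word in sentence.lower().split()]
    if None in tags:
        return False  # unknown word: a fixed lexicon cannot cope
    return SENTENCE.fullmatch(" ".join(tags)) is not None


ok = accepts("the big lazy dog runs quickly")
new_verb = accepts("the dog googles")
```

    Here `ok` is True and `new_verb` is False: a single newly coined verb is enough to make the rules reject an otherwise ordinary sentence.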

    I imagine GPLTrans would probably be using some sort of probability frame of phrases and words occurring together, but one can't be sure without looking at the source. I think the best way to do translation software would be to convert the text into syntax, then into a more abstract semantic form, and from the semantic form, translate back into the target language's syntax, and then into the target language's text. Of course, the trick is to figure out just exactly how to do this. :) The parsing itself is a hefty (and not terribly exciting) task. I attempted to make a term project of a fairly basic English parser and ended up changing the project.

    My 2 cents/Pfennig/lire/pesos,
    Y

    --
    "There is no culture in computer science, only cults." - M. Felleisen
  8. This stuff is hard by moore · · Score: 3

    I posted this as a reply to a comment, but then thought maybe it should be its own thread.

    The problem is that most, if not all, of
    these systems know nothing about meaning at all.
    All they do is try to match one set of strings to
    a different set of strings.
    GPLTrans works by the substitution method.

    >from: Mooneer Salem
    >
    > It is a system where words in a phrase that
    > can be substituted are
    > marked by %phrase%
    > For example:
    >
    > English: My name is %phrase1%.
    > Spanish: Me llamo %phrase1%.
    >

    This general system can be extended into a
    phrase structure grammar with pairs of rules for
    each language, e.g.:
    english: S -> NP1 V NP2
    irish: S -> V NP1 NP2

    These rules would model sentences like:
    english: the cat chased the dog.
    irish: chased the cat the dog.

    All this is oversimplified, but you get the point.
    The real problem is that you need to be trained
    as a linguist to understand what the structure of
    many sentences is, and even linguists argue a
    LOT. The phrase-structure approach is probably what
    Altavista and such do. Although I really like the
    idea of GPLTrans, I do not think their approach will
    get them too far; but it will be fun to see what
    they can do.

  9. Context and internal semantic representations by Arjen · · Score: 3
    I was surprised to find that Altavista's Babelfish utility has very poor analysis of context (possibly none at all).

    While contextual knowledge can increase the quality of a translation, the amount of world knowledge necessary to translate a typical web page is simply astounding. Most users of a translation system simply do not want to wait for hours to translate a simple sentence.

    And, there is the problem of linguistic knowledge. Most web pages are not written in "proper" English, but in some Web-speak-lingo. This requires the system to be very robust.

    The most successful uses of MT in corporations today are situations where a very simple grammar and lexicon are used, and very little world knowledge is required. For instance, the Xerox corporation has its own translation system that translates component manuals. The technical writers that write the original version of the manual are required to use very simple English only, without any ambiguities and with very simple constructions.

    For translation software that has multiple language targets, I would have expected it to first resolve the meaning of the English sentence into an internal semantic representation before using it to emit Spanish or French.

    This "internal semantic representation" is called an Interlingua. It has been used in various MT systems, with varying degrees of success.

    The most important advantage of an Interlingua-based MT system is that it does not require a translation engine for each language pair. For instance, if you create a system for English, French, Dutch and German texts, you only need to create four analysis engines:

    1. English -> interlingua
    2. French -> interlingua
    3. German -> interlingua
    4. Dutch -> interlingua
    And four generation engines:
    1. interlingua -> English
    2. interlingua -> French
    3. interlingua -> German
    4. interlingua -> Dutch
    With a non-interlingua system (which is called a Transfer system), you'd have to create one engine for every ordered pair of languages, 4*3=12 engines in all:
    1. English -> French
    2. English -> German
    3. English -> Dutch
    4. French -> English
    5. French -> German
    6. French -> Dutch
    etc.

    Clearly, it is easier to integrate new languages into an interlingua system than into a transfer system: adding a fifth language takes two new engines in the former, but eight in the latter.
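
    The engine counts can be checked by simply enumerating them. A short Python sketch (function names are illustrative):

```python
from itertools import permutations


def interlingua_engines(languages):
    """One analysis and one generation engine per language: 2n in total."""
    analysis = [(lang, "interlingua") for lang in languages]
    generation = [("interlingua", lang) for lang in languages]
    return analysis + generation


def transfer_engines(languages):
    """One engine per ordered pair of distinct languages: n(n-1) in total."""
    return list(permutations(languages, 2))


langs = ["English", "French", "German", "Dutch"]
n_interlingua = len(interlingua_engines(langs))  # 8
n_transfer = len(transfer_engines(langs))        # 12
```

    The gap widens quickly: with ten languages the same functions give 20 engines against 90.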