Slashdot Mirror


Romancing The Rosetta Stone

Roland Piquepaille writes "Not only this news release from the University of Southern California has a fantastic title, it also has a great content. This story is about one of their scientists, Franz Josef Och, whose software ranks very high among translation systems. "Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, paraphrasing Archimedes. His approach relies on two concepts, gathering huge amounts of data, and applying statistical models to this data. It completely ignores grammar rules and dictionaries. "Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones." Read my summary for more details."
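For readers wondering what "applying statistical models" to parallel text can look like, here is a minimal sketch. It is not Dr. Och's system, which the release does not describe in detail. It uses IBM Model 1, a classic textbook word-alignment model, and the function names and the three-sentence German-English toy corpus are assumptions made up for illustration.

```python
def train_ibm_model1(pairs, iterations=20):
    """Estimate word-translation probabilities t(e|f) with EM (IBM Model 1, no NULL word).

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (english_word, foreign_word) to a probability.
    """
    english_vocab = {e for _, es in pairs for e in es}
    uniform = 1.0 / len(english_vocab)
    # Start from a uniform guess for every word pair seen in the same sentence pair.
    t = {(e, f): uniform for fs, es in pairs for f in fs for e in es}

    for _ in range(iterations):
        count = {}  # expected count of e being aligned to f
        total = {}  # expected count of f being aligned to anything
        for fs, es in pairs:
            for e in es:
                # E-step: spread one unit of credit for e over the foreign words.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] = count.get((e, f), 0.0) + delta
                    total[f] = total.get(f, 0.0) + delta
        # M-step: renormalise the expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, foreign_word, english_vocab):
    """Pick the English word with the highest t(e|f); unseen pairs score 0."""
    return max(english_vocab, key=lambda e: t.get((e, foreign_word), 0.0))


# Hypothetical toy parallel corpus, no grammar rules or dictionary involved.
toy_pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
toy_table = train_ibm_model1(toy_pairs)
toy_vocab = {e for _, es in toy_pairs for e in es}
print(best_translation(toy_table, "Haus", toy_vocab))
```

On this toy corpus the model pairs "Haus" with "house" purely from co-occurrence statistics. A real system would need far more data and would also have to handle word order and multi-word phrases.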

30 of 486 comments

  1. Where can I download his software? by georgeha · · Score: 1, Funny

    Since I mistakenly borrowed some undubbed Cowboy BeBop.

  2. Let me know by gazuga · · Score: 5, Funny

    when it's in the form of a fish, and can fit in my ear...

    --
    "I turn away with fright and horror from the lamentable evil of functions which do not have derivatives."
    1. Re:Let me know by Cruciform · · Score: 2, Funny

      That's not a real Babelfish though, it's just a Beta

  3. First on-topic post! by CubeDude213 · · Score: 1, Funny

    This could be an amazing improvement to search engines, if they could instantly translate a page before showing it in the results.

  4. What if the two texts don't match? by Anonymous Coward · · Score: 1, Funny

    We'll have a supercharged Babelfish?

  5. Imagine a beowulf cluster of these... by mjmalone · · Score: 1, Funny

    No really... what if it used a shared database and there were hundreds, or thousands, of the systems around the world... Seems like it could become a pretty sophisticated system. And maybe one day it will be available in the form of a small fish which you place in your ear?

  6. Oh god... by gerf · · Score: 4, Funny

    The uber-geeks are going to have a field day with Klingon...

    1. Re:Oh god... by laughing_badger · · Score: 3, Funny

      Yay! We can finally finish translating all of Shakespeare into English.

      --
      Help children born unable to swallow - www.tofs.org.uk
    2. Re:Oh god... by Jeremi · · Score: 4, Funny

      I'm pretty sure you can have your throat slit for saying "Yay!" near a Klingon. Do be careful. ;)

      Having your throat slit is nothing compared to what Klingons do to people who put smiley-faces in their text messages...

      --
      I don't care if it's 90,000 hectares. That lake was not my doing.
    3. Re:Oh god... by daeley · · Score: 4, Funny

      Having your throat slit is nothing compared to what Klingons do to people who put smiley-faces in their text messages...

      You're telling me! My emoticons used to have noses! Now look:

      :(

      Such a tragedy.

      --
      I watched C-beams glitter in the dark near the Tannhauser gate.
  7. Re:Obsolete? by StressedEd · · Score: 0, Funny

    So you think translation is becoming obsolete do you? Perhaps you need to "get out more".

    --
    Be nice to people on the way up. You will meet them again on your way down!
  8. Re:Obsolete? by BlackHawk-666 · · Score: 0, Funny

    You done good be rite is. We fckuing enGlish good nows.

    --
    All those moments will be lost in time, like tears in rain.
  9. The Magic Eight Ball Says: by Anonymous Coward · · Score: 1, Funny

    Am I the only one who thinks that translation is quickly becoming obsolete?

    Yes.

  10. Re:Obsolete? by Anonymous Coward · · Score: 2, Funny

    DARPA actually proposed that a forced-conversion-to-English policy, implemented through military invasion, would be more cost-effective for the Defense Department than some complicated translation scheme. Hence Congress's support for the translation project.

  11. Was this article translated? by Alton_Brown · · Score: 3, Funny

    From the article: his software scored highest among 23 Arabic- and Chinese-to-English translatio systems

    Oops - guess we need some more parallel data (or a few more gigs of rosetta stones).

  12. Re:Obsolete? by DG · · Score: 5, Funny

    A man who speaks three languages is trilingual.

    A man who speaks two languages is bilingual.

    A man who speaks one language is American.

    DG

    --
    Want to learn about race cars? Read my Book
  13. Damn Babelfish! by Zog+The+Undeniable · · Score: 5, Funny
    "Most the bay only of news of the college of southern extremity California it knows an all big contents all there is this emission annular subject, it also there is a RolandPiquepaille and it writes. The Franz taxes where his software height one lyel with lines up between the translation system quite phu the Och and this history are the summary thing their scientist. The Och "it gave the data which is parallel is sufficient in me, it spread out," inside questioning the hour 2 specialties the language which it does not do of the multi Archimedes which is the possibility which there will be a hazard translation system the doctor repulsively it talked. It approach collects the sheep which data is enormous, apply the statistical model in this data a foundation in 2 concepts which it puts. It is complete and the wool of rule lu the dictionary of grammar "the m3ethode of the Och the duplex language original and the Rosetta which agree one equivalent with computer password of noble and wise pebble epitaph adopts. Or, rather, the gigaoctets and pebble gigaoctets of the Rosetta." Detail fact compared to read the hazard my synopsis.

    English --> French --> English --> Korean --> English. Of course, it helps that the first sentence is munged anyway ;-)

    --
    When I am king, you will be first against the wall.
  14. Well, so? by k98sven · · Score: 3, Funny

    What is the novelty of this?

    It's hardly news that you can always find correlations in two sufficiently large sets of data.

    Reminds me of the Steve Martin joke:

    "Chicks go for the intellectual types. I figured the best way to impress 'em was to read a lot of books. But hey, do you know how many books there are? Why, there must be, hundreds of them. But I was already a pretty smart guy. I didn't waste my time reading all those books. Heck no.
    I read, the dictionary. Hey--I figure it's got all the other books in it."

  15. translatio? by Lady+Jazzica · · Score: 2, Funny

    University of Southern California computer scientist Franz Josef Och echoed one of the most famous boasts in the history of engineering after his software scored highest among 23 Arabic- and Chinese-to-English translatio systems, commercial and experimental, tested in in recently concluded Department of Commerce trials.

    Maybe what Dr. Och should do next is write some software to double-check the work of whoever translates his press releases from the original Latin. The translator seems to have missed a few words here and there.

  16. "The vodka is strong, but the meat is rotten" by quantum+bit · · Score: 5, Funny

    You know, that actually does sound like something that would be a Russian aphorism...

  17. Re:Obsolete? by notcreative · · Score: 2, Funny
    A man who speaks no known language is Dubya.

    I don't think this translation program would be able to deal with his Texan affectations.

  18. What about C++? by MobyDisk · · Score: 4, Funny

    So, can I train this program with a bunch of requirements documents, and a bunch of implementations, and have it learn how to code? :-) If so, I think I am obsolete. *poof*

  19. Re:Obsolete? by Anonymous Coward · · Score: 1, Funny

    Then you "knew him", not "know him".

    I'm dead too.

  20. Re:The Law of Eventuality by Abcd1234 · · Score: 2, Funny

    And that's not what's being done, which is why there is interesting science going on here, hence the poster not understanding what the press release is actually about.

  21. Give me enough Slashdot entries... by Pac · · Score: 5, Funny

    ...and I will make pseudo-insightful comments based on the headline text without reading any of the source articles, until my karma is excellent?

  22. How dare you ask by Anonymous Coward · · Score: 4, Funny
    But Can It Do Klingon?
    How dare you question the honor of this program! I should kill you where you stand!
  23. Re:The vodka is strong but the meat is rotten by iastor · · Score: 3, Funny

    Let's see what Google has to say:

    English: The spirit is willing but the flesh is weak.

    German: Der Geist ist bereit, aber das Fleisch ist schwach.
    back: The spirit is ready, but the flesh is weak.

    French: L'esprit est disposé mais la chair est faible.
    back: The spirit is laid out but the flesh is weak.

    Italian: Lo spirito è disposto ma la carne è debole.
    back: The spirit is arranged but the meat is weak person.

    Portuguese: O espírito é disposto mas a carne é fraca.
    back: The spirit is made use but the meat is weak.

    All I can say is this spirit person needs a better pimp!

  24. Another neat application for this technique. by attaboy · · Score: 2, Funny

    1: Create a set of "Rosetta Stone" data by taking thousands of recorded phone calls to customer service/operators, etc.
    2: For each call, track what the customer service rep/operator typed into their computer terminal.

    The result would be natural language voice-recognition that would probably achieve a high degree of accuracy because it would be limited in scope (e.g. asking for a credit line increase, reporting a lost card, checking your balance, etc.) and be based on real queries from real customers.

    Since the vast majority of calls are for very simple problems ("I forgot my password" is the most common tech support call we get), this should be pretty useful. You could probably automate "Level 1 Tech support"!

    --
    The facts have a liberal bias. --The Daily Show
  25. Article text (in Babel-German-back-to-English) by Wraithlyn · · Score: 4, Funny

    I just had to. Besides, I think it's proving a point, or something.

    --

    Romancing of the Rosetta stone

    ' you give me sufficient parallel data, and you can have translation a system in the hours '

    University southern California of the computer scientist Franz Josef, which Och of most famous against-resounded, praises itself in the history of the technology, after its software counted the Arab strongly under 23 and Chinese English translatio systems, commercially and experimentally, examined inside in recently concluded Ministry of Trade of attempts.

    "you indicate a place to me to the location, and I shift the world,", after to to order a mathematical explanation for the lever said the large Greek scientist Archimedes place.

    "you give me sufficient parallel data, and you can have translation a system for all possible two languages in an affair of hours,", said Dr. Och, a computer scientist in the USC school of the institute for information science of the technology.

    Och spoke after the benchmark tests 2003 for the machine translation, which was accomplished in the May and June of this yearly by the National Institute of Standards and Technology United States of the trade department.

    Translations Ochs examined well into the 2003 head ton head tests against 7 Arab systems (5 research and 2 commercial away dregal products) and 14 Chinese systems (9 research and 5 from stock). In preceding 2002 evaluations had examined it similarly superior.

    The researcher discussed his methods held at a NIST Postmortemseminar over the Benchmarking July 22-23 of John Hopkins at the university in Baltimore, Maryland.

    Och is an outstanding exponent of a newer method of using the computers to touch in order to translate a language into other one, which became more successful in the last years, while the ability of the computers grew, large bodies of the information, and the volume of the text and the brought together translations in the digital form has, on (for example) multilingual newspaper or government net places of assembly explodes.

    Method Ochs uses brought together bilingual texts, the computer-coded equivalents of the famous Rosetta descriptions of stone. Or rather gigabytes and gigabyte Rosetta of stones.

    "our approximation uses statistic models, in order to find the most probable translation for a given entrance," Och avowedly

    "it is rather different to the older, symbolic approximations for the machine translation, which in most existing the commercial systems is used, which try, to code the grammar and the encyclopedia of a foreign language in a computer program the grammatical structure of the strange text analyzed, and produced then English, which on hard guidelines," it is based, continued.

    "employs, explaining from the computer, how one, we left it it out explains translated. First we draw the system it with a parallel korpus i.e. an accumulation of texts in the foreign language and their translations into English.

    "the computer uses these information, in order to co-ordinate the parameters of a statistic model translation of the process. During the translation of the new text, the system tries to find English sentence which is the most probable translation strange entrance of the sentence, be based in these statistic models."

    This method ignores or rolls over rather, finds express grammatical guidelines and even traditional dictionary lists of the vocabulary in favor of leaving the computer matchup samples between given Chinese or Arab (or any another language) texts and English translations.

    Such abilities grew, while computers improved, by making possible for them, from using the individual words as the fundamental unit on using the groups of words to move -- cliches.

    Versions of the different human translators of the same text change frequently considerably. Another key improvement was the use of repeated English human translations to permit the computer too its transmission by an ana

    --
    "Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
    1. Re:Article text (in Babel-German-back-to-English) by fehlschlag · · Score: 2, Funny

      Wow, that reads very similar to a lot of the /. posts I see... but with better spelling.

      Ouch, stop throwing things at me!