Slashdot Mirror


Translation Software That Learns by Reading

redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article "The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.""

16 of 308 comments

  1. Harry Potter and the Bible by MikeFM · · Score: 4, Interesting

    I remember hearing about this a couple years ago. They were using translations of Harry Potter and the Bible to teach this software to translate. It seems to work well. I wonder what it'd make of different translations of technical documentation. That'd probably be even more interesting than what it'd make out of 'quidditch'.

    This could be great if it were open-sourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish for. The fish is nice but not that efficient to add to other programs and its translations aren't usually that great.

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  2. Neural Nets and Machine Learning by MyIS · · Score: 2, Interesting

    In one way or another this is similar to training neural nets to recognize images, or spam filters to mark junk mail. It's a great way to put the number-crunching power of computers to direct work.

    --
    http://zero-to-enterprise.blogspot.com/
  3. Google definitely would buy into this... by egyber · · Score: 5, Interesting

    Don't remember exactly where I read this, but Google apparently has long believed that there is enough data on the internet alone to be able to intelligently translate... What these guys claim to have done is, it would seem, the missing piece of the puzzle for Google. I wouldn't be surprised if Google gets in on this.

    1. Re:Google definitely would buy into this... by MikeFM · · Score: 2, Interesting

      I tried that in about 1997. It did work pretty well, but the biggest problem was the limited supply of copies of the same document in different languages. There are quite a few, but they were dwarfed by the number of single-language documents. Also, the fact is that most text on the Internet is written the way that I write - badly. This can lead to translations that are written the way real people write, which can be good for conversational bots but is probably bad for translation software.

      Some of the more interesting things about these bots of mine were that they weren't programmed to translate but they learned to do so anyway. If you spoke to them in English they might respond in French or German but the response would be correct. That was really a very surprising finding.

      I expect that these guys have built a much more robust dictionary and that their algorithms are worked out better than mine were. They probably have taken texts off the Internet to train their dictionary but I doubt they'd want to submit random findings off the Internet.

      I'd like to see what they could come up with for simplifying language. Take some source documents written in full geek jargon and take the same documents rewritten to be for the lay person. Train the program on that. Then us geeks could translate our docs into stuff normal people could read. THAT I'd buy.

      I wonder if it'd be good enough to learn to translate source code into English or even into other programming languages? It'd seem that the same abilities would apply to this task.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  4. Re:Philosophical caveat by MikeFM · · Score: 3, Interesting

    Does anybody understand the tax code? Why should software be any different?

    I think that software that can learn can be said to understand a problem just as much as a human can. The difference between understanding and just doing is having the ability to learn from new data and to change your actions as required.

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  5. Arabic to English by Caseyscrib · · Score: 4, Interesting

    I'd like to see an Arabic-to-English translator. I was interested in reading news from the Middle East, because I don't particularly trust our media to translate it properly. A good example of this is Bin Laden's transcript.

    After a quick web search, all I was able to find was this site, which has a pretty sketchy TOS agreement.

  6. Dragon Naturally Speaking by headkase · · Score: 3, Interesting

    Using statistical methods to predict the next item in a sequence is still not true hard AI, though. The technique is used in the voice recognition software "Dragon NaturallySpeaking", creating in effect pattern chains. What Dragon did at the character level, this software appears to do at the word level. This is still not true AI, however, as the statistics only map to probabilistic sequences rather than mapping abstractly to the concepts. What would really impress me is if they came up with a mapping algorithm that, instead of using probability, used a function like mini-max fitness testing on a neural-network substrate.
    It would be interesting to see the results of analysing large sections of languages, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However, the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
    Or the brief explanation in the article did not make it clear enough how this differs from what was previously state-of-the-art, e.g. Dragon.
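
    For anyone wondering what "predicting the next item in a sequence" looks like at the word level, here is a minimal sketch of a bigram predictor. It is only an illustration of the general idea: the function names and the toy sentence are made up for this example, and it is not a claim about how Dragon or the software in the article actually works.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for every word, which words were seen directly after it."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy training text (hypothetical, just for illustration).
tokens = "the cat sat on the mat the cat ran".split()
model = train_bigrams(tokens)
```

    With this toy text, predict_next(model, "the") picks "cat" because "cat" follows "the" twice and "mat" only once. The model only knows which sequences are likely, and has no notion of what a cat is.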

    --
    Shh.
    1. Re:Dragon Naturally Speaking by Anonymous Coward · · Score: 1, Interesting

      The Berger-Liaw speech recognition system has been found to be not only much better at detecting speech than other systems, it is also very much better than people at understanding speech. In very noisy environments where the noise is 1000 times as loud as the speech, the system was accurate at a rate of 75%. 1000 test subjects who had 'normal' hearing (no hearing aids or auditory problems) could only get a collective recognition rate of 15%. The recognition system was five times as accurate as the human population. Mind you, the US military (particularly the Navy Submarine Fleet) picked up the tech... Hey! You just can't kick my door like.... You're hurting my arm sir! You're hurting my arm sir!!

  7. Scanning Audio Files by BobPaul · · Score: 2, Interesting

    Why didn't I have this software during High School Spanish?

    It says it can scan through audio files as an input source. I wonder if this causes it to "learn" the auditory signatures (and thus only knows the translation when given audio input), or if it relies on speech-to-text software to convert the audio to text first?

    If it does the latter, then based on the quality of current speech-to-text software, this probably wouldn't do much good in a total immersion classroom situation...

    Sure would have helped with my German homework, though ;)

  8. How is that news? Research was done 10 years ago. by Anonymous Coward · · Score: 4, Interesting

    The basic approach was developed over 10 years ago by IBM: The Mathematics of Statistical Machine Translation. And even free software has been available for a while, see
    http://www.fjoch.com/GIZA++.html.
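
    To give a feel for how word translations can be learned from nothing but aligned sentence pairs, here is a minimal sketch of EM training in the style of IBM Model 1, the simplest model in that line of work. Assumptions: the function names and the three-sentence German/English toy corpus are invented for this example, there is no NULL word, and this is neither GIZA++'s code nor the system described in the article.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(f, e)] = P(f | e) with EM.

    `pairs` is a list of (source_words, target_words) sentence pairs.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f being aligned to e
        total = defaultdict(float)  # expected count of e being aligned to anything
        # E-step: spread each source word over the target words of its sentence.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_source_word(t, e):
    """Return the source word f with the highest t[(f, e)] for target word e."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus: (German words, English words).
toy_corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(toy_corpus)
```

    Nobody tells the program that "Haus" means "house". It works that out because "das" also shows up next to "the" in another sentence, which leaves "Haus" as the better explanation for "house". Real systems add word order, phrase handling and a language model on top of this.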

  9. No samples? by Guspaz · · Score: 3, Interesting

    Sounds interesting, but I couldn't find a single sample translation on their site, i.e. a block of text in language A (say, French) and language B (say, English), translated from A to B by their software.

    Without even the simplest of examples or samples we have only their word on how well this works.

  10. DOOMED by FoXDie · · Score: 3, Interesting

    Recently robots have been made that can Run, Wield shotguns, and Recognize faces. Now they can read. [DOOMED I SAY]

  11. Re:Philosophical caveat by Anonymous Coward · · Score: 1, Interesting

    Prove your brain is not just a Chinese Room. Prove that you actually understand something, that you're not just an unconscious simulation able to produce the correct output.

    It cannot be done. All you can provide as evidence of your consciousness is the output of whatever goes on in your brain; but the whole premise of the Chinese Room assumes it is possible for a non-conscious process to duplicate that output. After all, if the output of conscious and non-conscious processes can be distinguished, then the Turing Test has not been passed by the non-conscious process.

    Accordingly, since we cannot show that a human being can "understand" in any meaning distinct from the way the Chinese Room can "understand", we can either declare that we cannot tell the difference between things that can and cannot "understand" (in which case it is an act of faith, not reason, to declare a Chinese Room doesn't "understand"), or we can declare that the Chinese Room does "understand".

    Either way, the distinction makes no difference, and therefore has no consequences, philosophical or otherwise. And Searle, by harping on a distinction that makes no difference, is indistinguishable from a fool.

  12. Tests by headkase · · Score: 2, Interesting

    The biggest test of the translator is converting from one language to another and then back again multiple times. If the content doesn't get corrupted then it works as advertised.
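
    A rough sketch of that round-trip test, for the curious. Everything here is an assumption made for illustration: the two word-for-word dictionary "translators" are toy stand-ins for a real system, and word-level similarity from Python's difflib is just one possible way to score the drift.

```python
from difflib import SequenceMatcher


def round_trip_drift(text, forward, backward, rounds=3):
    """Translate there and back `rounds` times, scoring similarity to the original.

    Returns one score per round: 1.0 means the words survived unchanged.
    """
    original = text.split()
    scores = []
    current = text
    for _ in range(rounds):
        current = backward(forward(current))
        scores.append(SequenceMatcher(None, original, current.split()).ratio())
    return scores


# Toy word-for-word dictionaries (hypothetical stand-ins for a real translator).
EN_FR = {"the": "le", "big": "grand", "large": "grand", "house": "maison"}
FR_EN = {"le": "the", "grand": "big", "maison": "house"}


def to_french(text):
    return " ".join(EN_FR.get(word, word) for word in text.split())


def to_english(text):
    return " ".join(FR_EN.get(word, word) for word in text.split())


stable = round_trip_drift("the big house", to_french, to_english)
lossy = round_trip_drift("the large house", to_french, to_english)
```

    Here "the big house" survives every round, while "the large house" comes back as "the big house" and scores below 1.0. Note that a translator could pass this test while still being wrong, as long as it is wrong consistently in both directions, so a stable round trip is a sanity check rather than proof of quality.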

    --
    Shh.
  13. Re:High school Spanish by Servants · · Score: 2, Interesting

    Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.

    Heh. Then there is nothing that will make you believe, etc., etc.

    Certainly you can't do good translation without understanding syntax (which influences meaning and underlies word order) and context (to disambiguate synonyms and phrases with multiple interpretations). Machines aren't especially good at either one yet; ergo, machine translation will continue to be pretty crappy for the foreseeable future.

    Funny thing is, though, even a crappy translation turns out to be tremendously useful in most practical contexts, and worlds better than none at all; a simple word-for-word translation is typically hard to read but still conveys the proper gist. That's why I don't get excited about automatic translation "advances" these days: there are really two purposes for machine translation. One is figuring out what a piece of speech or text is trying to say, and the current technology is usually good enough for that. The other is making a translation of sufficient quality to save a human translator some work, and I think that won't happen for quite a few years yet. Anything in between adds very little.

    (By the way, everything in natural language processing these days uses corpus learning techniques. Now if an improved technology had been developed manually by bilingual programmers who pulled the design out of their collective hats, then that would be a man-bites-dog story!)

  14. The first such system was built in 1993. by Dulimano · · Score: 3, Interesting

    This is news of '93, when Brown et al. at IBM built their famous statistical machine translation system. It does exactly what is described in the article. I myself work on such a system (for Hungarian-to-English translation).

    The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.