Translation Software That Learns by Reading

Posted by samzenpus on Wednesday February 23, 2005 @01:54PM from the it-is-fundamental dept.

redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article, 'The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.'"

32 of 308 comments

High school Spanish by KaSkA101 · 2005-02-23 13:56 · Score: 3, Funny

Why didn't I have this software during High School Spanish?
1. Re:High school Spanish by xtrvd · 2005-02-23 14:25 · Score: 4, Informative
  
  Fortunately I had the next best thing in High School Spanish. The trick is simply going to the #spain channel on efnet and talking nicely to some people. You'd be amazed at how often my teacher would fail my fellow students because they attempted to use the primitive babelfish.altavista.com to do their work for them; she could easily spot the syntax errors and the misspelled English words, which were never translated.
  
  Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.
2. Re:High school Spanish by Temposs · 2005-02-23 19:15 · Score: 3, Insightful
  
  Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.
  
  "Better" is an ambiguous term. For what these researchers made the program for, it is better than humans for one reason: speed. Sure, they want the translations to be reliable, but more important is that a computer can do in a few days what would take a human a month, for this application at least. The NSA and the like want to have translations of huge swathes of text, and fast! The sooner they can understand things that are written, the faster they can react to threats. Human translation for this purpose is very slow and expensive in comparison. For your Spanish HW, the best is a native speaker giving you feedback, because the amount of work is small and the translations will be very accurate.
  
  --
  Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card
technical texts by Olaserov · 2005-02-23 13:57 · Score: 4, Funny

I wonder if we could train it to translate a EULA ;)

--
* Olaserov is in the process of thinking up a signature.
translate to American please by Anonymous Coward · 2005-02-23 13:58 · Score: 3, Funny

Can someone translate that article from British English to American English please?

Thanks.
1. Re:translate to American please by Grey+Ninja · 2005-02-23 14:26 · Score: 4, Funny
  
  Here's a couple of suggestions for you:
  
  r3Ð(0n3 wr173$ "N3w $(13n71$7 1$ r3p0r71n9 7h47 7r4n$£4710n $07w4r3 7h47 Ð3v3£0p$ 4n nÐ3r$74nÐ1n9 0 £4n9493$ b¥ $(4nn1n9 7hr09h 7h0$4nÐ$ 0 pr3v10$£¥ 7r4n$£473Ð Ð0(m3n7$ h4$ b33n r3£34$3Ð b¥ .$. r3$34r(h3r$. 4((0rÐ1n9 70 7h3 4r71(£3 "7h3 7r4n$£473Ð Ð0(m3n7$ $3Ð 70 734(h 7h3 7r4n$£4710n 4£90r17hm$ (4n b3 3£3(7r0n1(, 0n p4p3r, 0r 3v3n 4Ð10 1£3$. 7h3 $¥$73m 1$ n07 0n£¥ 4$73r 7h4n 07h3r m37h0Ð$, b7 4£$0 b3773r $173Ð 70 74(|{£1n9 £3$$ (0mm0n £4n9493$ 4nÐ 7h3 n$4£ v0(4b£4r¥ 0nÐ 1n $p3(14£1$3Ð 0r 73(hn1(4£ 73x7$.""
  
  And translation #2:
  
  REDCONE WRIETS NU SCEINTIST IS R3PORTNG TAHT TRANSLATION R TAHT D3V3LOPS AN UNDERSTANDNG OF LANGUAEGS BY SCANNG THROUGH THOUSANDS OF PREVIOUSLY TRANSLAETD DOCUMENTS HAS B3N REL3AESD BY US!!!! OMG R3S3ARCHARS!!1!1!! LOL ACORDNG 2 DA ARTICL3 TEH TRANSLAETD DOCUMENTS US3D 2 T3ACH TEH TRANSLATION ALGORITHMS CAN B 3LECTRONIC ON PAEPR OR 3V3N AUDIO FIELS!!1111 TEH SYSTEM IS NOT ONLY FASTER THAN OTH3R M3THODS BUT ALSO BT3R SUIETD 2 TAKLNG LAS COMON LANGUAEGS AND TEH UNUSUAL VOCABULARY FOUND IN SPACIALIESD OR TECHNICAL TEXTS!1!! WTF
Yay! by gardyloo · 2005-02-23 13:58 · Score: 3, Funny

Hope for slashdot. I've always wondered if we only have artificially intelligent editors...
Harry Potter and the Bible by MikeFM · 2005-02-23 13:59 · Score: 4, Interesting

I remember hearing about this a couple years ago. They were using translations of Harry Potter and the Bible to teach this software to translate. It seems to work well. I wonder what it'd make of different translations of technical documentation. That'd probably be even more interesting than what it'd make out of 'quidditch'.

This could be great if it were open-sourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish for. The fish is nice but not that efficient to add to other programs, and its translations aren't usually that great.

--
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
1. Re:Harry Potter and the Bible by obeythefist · 2005-02-23 14:45 · Score: 5, Funny
  
  I never read that one. I thought the next book title was going to be "Harry Potter and the Half-Blood Prince".
  
  Or did JK Rowling suddenly become pious?
  
  --
  I am government man, come from the government. The government has sent me. -- G.I.R.
Turing test by OneArmedMan · 2005-02-23 13:59 · Score: 3, Insightful

I wonder if something similar to this could be used for AI, for, say, Turing tests?
Wow! Does a much better job... by bigtallmofo · 2005-02-23 14:01 · Score: 5, Funny

Teach Software translating on scanning up

Not hard wares that sticks an comprehension of talks by scanning on thousands of fish translated papers has been vomited by US scientists.

Many existing translation not hard wares uses palm rules for botching words and phrases. But the new software, snarked by Kevin Knight and Daniel Marcu at the Information Sciences[...]

Read More...

--
I'm a big tall mofo.
That's great.... by Frodo+Crockett · 2005-02-23 14:02 · Score: 4, Funny

...bu7 (4n 17 unÐ3r$74nÐ £337?

--
"The newly born animals are then whisked off for a quick run through a giant baking oven." --heard on Food Network
That sounds like a good approach by FunWithHeadlines · 2005-02-23 14:03 · Score: 3, Insightful

I wish them luck (cuz they'll need it), but if anything is going to produce translation software that really works it will have to include learning elements of this nature. It's one thing to get dictionary translations. That's been around for decades, with its laughable results. Humans speak in metaphor and simile and slang and contractions and abbreviations of thought all the time. We're the cat's meow of language (try that, computer!).
But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.
Philosophical caveat by Raindance · 2005-02-23 14:03 · Score: 4, Insightful

As a caveat, we should be wary of saying the system "understands" a language.

I would say that humans able to translate between languages generally understand both languages, but to say that a statistical, probabilistic model based on correlations understands a language might be a stretch.

Further reading: Searle's Chinese Room argument (http://en.wikipedia.org/wiki/Chinese_room).

This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?

Are these silly questions to ask?

Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett but the entry is pretty sparse).

RD
1. Re:Philosophical caveat by MikeFM · 2005-02-23 14:08 · Score: 3, Interesting
  
  Does anybody understand the tax code? Why should software be any different?
  
  I think that software that can learn can be said to understand a problem just as much as a human can. The difference between understanding and just doing is having the ability to learn from new data and to change your actions as required.
  
  --
  At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
2. Re:Philosophical caveat by back_pages · 2005-02-23 14:47 · Score: 4, Insightful
  
  Great example of this:
  Mom baked for three hours.
  The pie baked for three hours.
  "Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.
  A man walked into a bar. Ouch!
  A man wanted to win a pun contest in the local newspaper, so he entered 10 times in order to increase the chances that one of his entries would win. Unfortunately, no pun in ten did.
  You can translate that 50 ways from Sunday but without understanding the language - understanding what makes those statements interesting - the machine will lose all their meaning.
Google definitely would buy into this... by egyber · 2005-02-23 14:03 · Score: 5, Interesting

Don't remember exactly where I read this, but Google apparently has long believed that there is enough data on the internet alone to be able to translate intelligently... What these guys claim to have done is, it would seem, the missing piece of the puzzle for Google. I wouldn't be surprised if Google gets in on this.
Translating specialised texts ... by rkmath · 2005-02-23 14:03 · Score: 4, Insightful

The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.

The main reason (I think) is that tech documents have specialised vocabulary and idioms, but these are far fewer than the idioms one has to master in order to understand the editorial page in a newspaper.

With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages, than reading any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese for instance is an entirely different ballgame ...)
1. Re:Translating specialised texts ... by Anonymous Coward · 2005-02-23 14:08 · Score: 4, Informative
  
  The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.
  
  Of course that is true, for a human translator. Your knowledge of the technical field itself is a resource you can use to aid in your translation of technical texts. For machines, it's usually necessary to use a translator specifically geared to the subject matter. For instance, you would definitely want to use a different machine translator for a newspaper article as opposed to a biomedical research journal.
  
  This new approach is supposed to mitigate these problems. If they can do a good job of it, they may be able to bring machine translation to areas where previously human translators have been required or greatly preferred.
DadaDodo by Tripax · 2005-02-23 14:06 · Score: 4, Informative

This reminds me of Jamie Zawinski's hack DadaDodo, which used probability trees to create new texts from old texts by examining the probability that any given word follows the previous word or string of words. I always thought his program was cool, in that his description of it involved Markov chains and William S. Burroughs.
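
For anyone curious how that kind of generator works, here is a minimal Python sketch of the general idea (a word-level Markov chain). It is not DadaDodo's actual code; the function names `build_chain` and `generate` and the `order` parameter are made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)

def generate(chain, length=20, seed=None):
    """Random-walk the chain. Repeated followers in a list make that word
    proportionally more likely, so the lists double as probabilities."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))      # pick a starting state
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:                  # dead end: nothing ever followed this state
            break
        out.append(rng.choice(followers))
    return " ".join(out)
```

Feed `build_chain` a large text and call `generate` on the result; a higher `order` gives output that stays closer to the source, a lower one gives more scrambled text.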
Microsoft Research already does this by drdink · 2005-02-23 14:07 · Score: 4, Informative

I did a presentation for an AI class a while ago and discovered that Microsoft already does this with their MSR-MT project. Apparently the Spanish entries in their Knowledge Base were translated by this as well.

--
Beware, Nugget is watching... See?
Arabic to English by Caseyscrib · 2005-02-23 14:12 · Score: 4, Interesting

I'd like to see an Arabic-to-English translator. I was interested in reading news from the Middle East, because I don't particularly trust our media to translate it properly. A good example of this is Bin Laden's transcript.
After a quick web search, all I was able to find was this site, which has a pretty sketchy TOS agreement.
Dragon Naturally Speaking by headkase · 2005-02-23 14:12 · Score: 3, Interesting

Using statistical methods to predict the next item in a sequence is still not true hard AI, though. This technique is used in the voice recognition software "Dragon Naturally Speaking", creating in effect pattern chains. What Dragon did on the character level, this software appears to do on the word level. This is still not true AI, however, as the statistics will only map to probabilistic sequences rather than map abstractly to the concepts. What would really impress me is if they came up with a mapping algorithm that, instead of using probability, used a function like mini-max fitness testing on a neural-network substrate.
It would be interesting to see the results of analysing large sections of languages however, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
Or the brief explanation in the article did not make it clear enough how this differs from what was previously state-of-the-art, e.g. Dragon.

--
Shh.
so how can they grade you in school? by cheekyboy · 2005-02-23 14:16 · Score: 3, Insightful

One has to wonder: if the language of choice, English or whatever, is so structured and rule-ridden and not just made up on the fly, then how come it's so difficult to determine all the rules? Are there too many of them? Too many contexts? Or is it just that we're trying to translate bad grammar, which fails the rules but which any human can decipher?

Sometimes brute force, i.e. lookup tables for 100000000 translated versions, can be better, so much for logic eh :-)

--
Liberty freedom are no1, not dicks in suits.
Time flies like an arrow... by Secret+Agent+99 · 2005-02-23 14:17 · Score: 5, Funny

...and fruit flies like a banana.

When an automated translator can handle that one without bursting into flames, I'll start to believe.
How is that news? Research was done 10 years ago. by Anonymous Coward · 2005-02-23 14:43 · Score: 4, Interesting

The basic approach was developed over 10 years ago by IBM: The Mathematics of Statistical Machine Translation. And even free software has been available for a while; see http://www.fjoch.com/GIZA++.html.
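
To give a flavor of what "statistical" means here, below is a toy Python sketch of the simplest model in that IBM line of work (usually called Model 1): it estimates word translation probabilities from sentence pairs with expectation-maximization. It is only an illustration under simplifying assumptions (no NULL word, no alignment or language model), not GIZA++ and not the system in the article; the name `train_model1` and the three-sentence corpus are made up.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] ~ P(foreign word f | English word e) with EM."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # start from a uniform guess
    for _ in range(iterations):
        count = defaultdict(float)   # expected (f, e) co-translation counts
        total = defaultdict(float)   # expected counts per English word
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of credit for f among the English
                # words in the same sentence pair, weighted by current t.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy parallel corpus: (foreign sentence, English sentence).
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_model1(pairs)
```

Nobody tells the program that "haus" means "house"; it works that out because "das" also shows up next to "the" in a sentence pair that has no "house" in it. Scale the corpus up by a few million sentence pairs and you get the general shape of the approach.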
It's only a matter of time before... by gkwok · 2005-02-23 14:47 · Score: 4, Funny

Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14am Eastern Time....
No samples? by Guspaz · 2005-02-23 14:47 · Score: 3, Interesting

Sounds interesting, but I couldn't find a single sample translation on their site, i.e. a block of text in language A (say, French) and the same text in language B (say, English), translated from A to B by their software.

Without even the simplest of examples or samples we have only their word on how well this works.
DOOMED by FoXDie · 2005-02-23 14:48 · Score: 3, Interesting

Recently robots have been made that can Run, Wield shotguns, and Recognize faces. Now they can read. [DOOMED I SAY]

--
Support Liberty, Support Ron Paul
Too bad about the times it needs to think by Timbotronic · 2005-02-23 14:54 · Score: 3, Insightful

I like the approach they've taken, but machine translation can only ever go so far.
A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job, and she's a professional translator.
Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.

--
One of these days I'm moving to Theory - everything works there
efnet spanish by Garabito · 2005-02-23 15:41 · Score: 5, Funny

k apr3ndist3 3sp4ni0l en IRC?
q w3n0! 3so si está 1337!
The first such system was built in 1993. by Dulimano · 2005-02-24 01:02 · Score: 3, Interesting

This is news of '93, when Brown et al. at IBM built their famous statistical machine translation system. It does exactly what is described in the article. I myself work on such a system (for Hungarian-to-English translation).

The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.