Translation Software That Learns by Reading

Posted by samzenpus on Wednesday February 23, 2005 @01:54PM from the it-is-fundamental dept.

redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article, 'The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.'"
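
To make the idea of learning from previously translated documents concrete, here is a minimal sketch in Python. It is not the researchers' system, which the summary does not describe in any detail. It only assumes a tiny hand-made parallel corpus (the hypothetical `PAIRS` list) and picks, for each source word, the target word it co-occurs with most consistently, scored with the Dice coefficient. The function names `learn_word_table` and `translate` are made up for this illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus: (English, Spanish) sentence pairs.
PAIRS = [
    ("the house", "la casa"),
    ("the table", "la mesa"),
    ("red house", "casa roja"),
    ("red table", "mesa roja"),
]

def learn_word_table(pairs):
    """Map each source word to the target word it co-occurs with best."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)                # sentences containing each source word
        tgt_count.update(t_words)                # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentence pairs containing both words
    best = {}
    for (s, t), c in joint.items():
        # Dice coefficient: 1.0 means the two words always appear together.
        score = 2 * c / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}

def translate(sentence, table):
    """Word-by-word substitution; unknown words pass through unchanged."""
    return " ".join(table.get(w, w) for w in sentence.lower().split())

table = learn_word_table(PAIRS)
print(translate("the table", table))  # prints "la mesa"
```

Note that this sketch knows nothing about word order or idiom: "red house" comes out as "roja casa" rather than "casa roja". Real statistical systems add far more than a word table, and much of the discussion below is about how far that can go.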

Turing test by OneArmedMan · 2005-02-23 13:59 · Score: 3, Insightful

I wonder if something similar to this could be used for AI, say for Turing Tests?
That sounds like a good approach by FunWithHeadlines · 2005-02-23 14:03 · Score: 3, Insightful

I wish them luck (cuz they'll need it), but if anything is going to produce translation software that really works it will have to include learning elements of this nature. It's one thing to get dictionary translations. That's been around for decades, with its laughable results. Humans speak in metaphor and simile and slang and contractions and abbreviations of thought all the time. We're the cat's meow of language (try that, computer!).
But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.
Philosophical caveat by Raindance · 2005-02-23 14:03 · Score: 4, Insightful

As a caveat, we should be wary of saying the system "understands" a language.

I would say that humans able to translate between languages generally understand both languages, but saying that a statistical, probabilistic model based on correlations understands a language might be a stretch.

Further reading: Searle's Chinese Room argument (http://en.wikipedia.org/wiki/Chinese_room).

This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?

Are these silly questions to ask?

Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett but the entry is pretty sparse).

RD
1. Re:Philosophical caveat by back_pages · 2005-02-23 14:47 · Score: 4, Insightful
  
  Great example of this:
  Mom baked for three hours.
  The pie baked for three hours.
  "Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.
  A man walked into a bar. Ouch!
  A man wanted to win a pun contest in the local newspaper, so he entered 10 times in order to increase the chances that one of his entries would win. Unfortunately, no pun in ten did.
  You can translate that 50 ways from Sunday but without understanding the language - understanding what makes those statements interesting - the machine will lose all their meaning.
Translating specialised texts ... by rkmath · 2005-02-23 14:03 · Score: 4, Insightful

The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.

The main reason (I think) is that tech documents have specialised vocabulary and idioms, but these are much fewer than the idioms one has to master in order to understand the editorial page in a newspaper.

With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages than to read any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese, for instance, is an entirely different ballgame ...)
so how can they grade you in school? by cheekyboy · 2005-02-23 14:16 · Score: 3, Insightful

One has to wonder: if the language of choice, English or whatever, is so structured and rule-ridden and not just made up on the fly, then how come it's so difficult to determine all the rules? Are there too many of them? Too many contexts? Or is it just a matter of trying to translate bad grammar, which fails the rules but which any human can decipher?

Sometimes brute force, i.e. lookup tables for 100000000 translated versions, can be better. So much for logic, eh :-)

--
Liberty freedom are no1, not dicks in suits.
Too bad about the times it needs to think by Timbotronic · 2005-02-23 14:54 · Score: 3, Insightful

I like the approach they've taken, but machine translation can only ever go so far.
A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job, and she's a professional translator.
Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.

--
One of these days I'm moving to Theory - everything works there
Re:High school Spanish by Temposs · 2005-02-23 19:15 · Score: 3, Insightful

Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.

"Better" is an ambiguous term. For what these researchers made the program for, it is better than humans for one reason: speed. Sure, they want the translations to be reliable, but more important is that a computer can do in a few days what would take a human a month, for this application at least. The NSA and the like want to have translations of huge swathes of text, and fast! The sooner they can understand things that are written, the faster they can react to threats. Using human translators for this purpose is very slow and expensive in comparison. For your Spanish HW, the best is a native speaker giving you feedback, because the amount of work is small and the translations will be very accurate.

--
Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card