Translation Software That Learns by Reading
redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article "The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.""
I remember hearing about this a couple years ago. They were using translations of Harry Potter and the Bible to teach this software to translate. It seems to work well. I wonder what it'd make of different translations of technical documentation. That'd probably be even more interesting than what it'd make out of 'quidditch'.
This could be great if it were open-sourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish for. The fish is nice, but it's not that efficient to add to other programs, and its translations aren't usually that great.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
In one way or another this is similar to training neural nets to recognize images, or spam filters to mark junk mail. Great way to put the number-crunching power of computers to direct work.
http://zero-to-enterprise.blogspot.com/
Don't remember exactly where I read this, but Google apparently has long believed that there is enough data on the internet alone to be able to intelligently translate... What these guys claim to have done is, it would seem, the missing piece of the puzzle for Google. I wouldn't be surprised if Google gets in on this.
Does anybody understand the tax code? Why should software be any different?
I think that software that can learn can be said to understand a problem just as much as a human can. The difference between understanding and just doing is having the ability to learn from new data and to change your actions as required.
After a quick web search, all I was able to find was this site, which has a pretty sketchy TOS agreement.
Using statistical methods to predict the next item in a sequence is still not true hard AI, though. The same technique is used in the voice recognition software "Dragon NaturallySpeaking", creating what are in effect pattern chains. What Dragon did on the character level, this software appears to do on the word level. This is still not true AI, however, as the statistics only map to probabilistic sequences rather than mapping abstractly to concepts. What would really impress me is if they came up with a mapping algorithm that, instead of using probability, used a function like mini-max fitness testing on a neural-network substrate.
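For anyone wondering what a word-level "pattern chain" looks like in practice, here is a minimal sketch of a bigram predictor in Python. It is only an illustration of the general idea of predicting the next word from counts; the function names and the toy corpus are made up for this example, and nothing here is taken from Dragon or from the system in the article.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "sat"))      # most common word seen after "sat"
print(predict_next(model, "unicorn"))  # never seen, so no prediction
```

The model only knows which word tended to follow which; it has no notion of what a cat or a mat is, which is exactly the "probabilistic sequences, not concepts" complaint.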
It would be interesting to see the results of analysing large sections of languages, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However, the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
Or the brief explanation in the article did not make it clear enough how this differs from what was previously state-of-the-art, e.g. Dragon.
Shh.
Why didn't I have this software during High School Spanish?
;)
It says it can scan through audio files as an input source. I wonder if this causes it to "learn" the auditory signatures (and thus only knows the translation when given audio input), or if it relies on speech-to-text to convert the audio to text first?
If it does the latter, then based on the quality of current speech-to-text software, this probably wouldn't do much good in a total immersion classroom situation...
Sure would have helped with my German homework, though
The basic approach was developed over 10 years ago by IBM: The Mathematics of Statistical Machine Translation. And even free software has been available for a while, see
http://www.fjoch.com/GIZA++.html.
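To give a rough feel for how word translation probabilities can be learned from nothing but sentence pairs, here is a simplified sketch in the spirit of IBM Model 1, trained with EM. Assumptions: it leaves out the NULL word and everything beyond the simplest lexical model, the function names are hypothetical, and the three-sentence German/English corpus is invented. It is not the code of GIZA++ or of the system in the article.

```python
def train_model1(pairs, iterations=10):
    """Estimate t(f | e) from (foreign, english) sentence pairs via EM."""
    corpus = [(f.split(), e.split()) for f, e in pairs]
    f_vocab = {w for f_words, _ in corpus for w in f_words}

    # Start with a uniform guess for every co-occurring word pair.
    t = {}
    for f_words, e_words in corpus:
        for f in f_words:
            for e in e_words:
                t[(f, e)] = 1.0 / len(f_vocab)

    for _ in range(iterations):
        count = {}  # expected count of f being aligned to e
        total = {}  # expected count of e being aligned to anything
        # E-step: split each foreign word's mass over the English words.
        for f_words, e_words in corpus:
            for f in f_words:
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    frac = t[(f, e)] / norm
                    count[(f, e)] = count.get((f, e), 0.0) + frac
                    total[e] = total.get(e, 0.0) + frac
        # M-step: renormalise the expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def best_source_word(t, e):
    """Return the foreign word with the highest t(f | e), or None."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get) if candidates else None

# Tiny invented parallel corpus (German, English).
pairs = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]
t = train_model1(pairs)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_source_word(t, word))
```

Nobody tells the program that "buch" means "book"; the pairing falls out because it is the only explanation consistent with all three sentence pairs. Real systems add alignment, phrase and language models on top of this.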
Sounds interesting, but I couldn't find a single sample translation on their site, i.e. a block of text in language A (say, French) and in language B (say, English), translated from A to B by their software.
Without even the simplest of examples or samples we have only their word on how well this works.
Recently robots have been made that can run, wield shotguns, and recognize faces. Now they can read. [DOOMED I SAY]
Support Liberty, Support Ron Paul
Prove your brain is not just a Chinese Room. Prove that you actually understand something, that you're not just an unconscious simulation able to produce the correct output.
It cannot be done. All you can provide as evidence of your consciousness is the output of whatever goes on in your brain; but the whole premise of the Chinese Room assumes it is possible for a non-conscious process to duplicate that output. After all, if the output of conscious and non-conscious processes can be distinguished, then the Turing Test has not been passed by the non-conscious process.
Accordingly, since we cannot show that a human being can "understand" in any meaning distinct from the way the Chinese Room can "understand", we can either declare that we cannot tell the difference between things that can and cannot "understand" (in which case it is an act of faith, not reason, to declare a Chinese Room doesn't "understand"), or we can declare that the Chinese Room does "understand".
Either way, the distinction makes no difference, and therefore has no consequences, philosophical or otherwise. And Searle, by harping on a distinction that makes no difference, is indistinguishable from a fool.
The biggest test of the translator is converting from one language to another and then back again multiple times. If the content doesn't get corrupted then it works as advertised.
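A round-trip test like that is easy to script. Below is a minimal Python sketch; the "translators" are hypothetical word-for-word dictionary lookups invented for the example, standing in for whatever real system you want to test.

```python
def make_translator(table):
    """Build a toy word-for-word translator from a lookup table."""
    def translate(text):
        # Words missing from the table are passed through unchanged.
        return " ".join(table.get(word, word) for word in text.split())
    return translate

def round_trip_history(text, forward, backward, rounds=3):
    """Return the text after each A -> B -> A round trip, original first."""
    history = [text]
    for _ in range(rounds):
        text = backward(forward(text))
        history.append(text)
    return history

# Invented toy tables: "big" and "large" both collapse to one German word.
en_to_de = make_translator(
    {"the": "das", "big": "groß", "large": "groß", "house": "haus"}
)
de_to_en = make_translator({"das": "the", "groß": "big", "haus": "house"})

for step in round_trip_history("the large house", en_to_de, de_to_en):
    print(step)
```

Note what the toy example shows: the text changes on the first trip and is stable after that. So surviving repeated round trips tells you the translator is consistent, which is not quite the same as correct. A system that is consistently wrong in both directions would pass too.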
Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context you are implying.
Heh. Then there is nothing that will make you believe, etc., etc.
Certainly you can't do good translation without understanding syntax (which influences meaning and underlies word order) and context (to disambiguate synonyms and phrases with multiple interpretations). Machines aren't especially good at either one yet; ergo, machine translation will continue to be pretty crappy for the foreseeable future.
Funny thing is, though, even a crappy translation turns out to be tremendously useful in most practical contexts, and worlds better than none at all; a simple word-for-word translation is typically hard to read but still conveys the proper gist. That's why I don't get excited about automatic translation "advances" these days: there are really two purposes for machine translation. One is figuring out what a piece of speech or text is trying to say, and the current technology is usually good enough for that. The other is making a translation of sufficient quality to save a human translator some work, and I think that won't happen for quite a few years yet. Anything in between adds very little.
(By the way, everything in natural language processing these days uses corpus learning techniques. Now if an improved technology had been developed manually by bilingual programmers who pulled the design out of their collective hats, then that would be a man-bites-dog story!)
This is news of '93, when Brown et al. at IBM built their famous statistical machine translation system. It does exactly what is described in the article. I myself work on such a system (for Hungarian-to-English translation).
The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.