Translation Software That Learns by Reading
redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article "The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.""
I don't recall where I read it exactly off-hand, but this had been done for Chinese already. The only news here is that some people are trying to sell software to do that as a commercial product.
This reminds me of Jamie Zawinski's hack DadaDodo, which used probability trees to create new texts from old texts by examining the probability that any given word follows the previous word or string of words. I always thought his program was cool, in that his description of it involved Markov chains and William S. Burroughs.
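For anyone curious, the word-follows-word idea can be sketched in a few lines of Python. This is only a rough illustration of a Markov chain text generator, not DadaDodo's actual code; the function names `build_chain` and `generate` and the `order` parameter are made up for this example.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Repeated followers stay in the list, so common ones get picked more often.
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Walk the chain, choosing each next word in proportion to how often it followed."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random known state
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat"
    print(generate(build_chain(sample), length=12, seed=1))
```

A larger `order` makes the output stick closer to the source text; `order=1` gives the most scrambled results.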
I did a presentation for an AI class a while ago and discovered that Microsoft already does this with their MSR-MT project. Apparently the Spanish entries in their Knowledge Base were translated by this as well.
The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.
Of course that is true, for a human translator. Your knowledge of the technical field itself is a resource you can use to aid in your translation of technical texts. For machines, it's usually necessary to use a translator specifically geared to the subject matter. For instance, you would definitely want to use a different machine translator for a newspaper article as opposed to a biomedical research journal.
This new approach is supposed to mitigate these problems. If they can do a good job of it, they may be able to bring machine translation to areas where previously human translators have been required or greatly preferred.
I was thinking the same thing - I don't have time to investigate how it works, but if you created one that translated symbolically-represented phonemes (languages other than Germanic and Eastern probably know this concept as "spelling") you'd have a pretty good system going. From the article lead-in here on Slashdot, it sounds as if it will take the basic rules of a language and maybe some "seed" data, and from there learn by comparing text in language A and language B that have the same meaning.
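To make "learn by comparing text in language A and language B" a bit more concrete, here is a toy sketch of one classic way to do it: IBM Model 1, which uses expectation-maximization to estimate word translation probabilities from sentence pairs. The article does not say which algorithm the researchers actually use, so treat this as an assumption-laden illustration rather than their method. The names `train_model1` and `best_translation` are invented here, and the sketch leaves out the NULL word and everything about word order.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word).

    `pairs` is a list of (source_words, target_words) sentence pairs.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start with a uniform guess for every word pair that ever co-occurs.
    t = {}
    for src, tgt in pairs:
        for w in tgt:
            for s in src:
                t[(w, s)] = 1.0 / len(tgt_vocab)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: split each target word's "credit" among the source words
        # in the same sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize the collected credit into probabilities.
        t = {(w, s): c / total[s] for (w, s), c in count.items()}
    return t

def best_translation(t, source_word):
    """Return the target word with the highest probability for `source_word`."""
    candidates = {w: p for (w, s), p in t.items() if s == source_word}
    return max(candidates, key=candidates.get)

if __name__ == "__main__":
    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]
    table = train_model1(corpus)
    for word in ("das", "haus", "buch", "ein"):
        print(word, "->", best_translation(table, word))
```

No dictionary or grammar is given up front: "das" ends up paired with "the" only because the two keep showing up together, which in turn frees "haus" and "buch" to pair with "house" and "book". A real system needs far more data and much more than word-for-word tables.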
Fortunately I had the next best thing in High School Spanish. The trick is simply going to the #spain channel on efnet and talking nice to some people. You'd be amazed at how often my teacher would fail my fellow students because they attempted to use the primitive babelfish.altavista.com to do their work for them; she could easily spot the syntax errors and the misspelled English words which were never translated.
Until I see this new process at work, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context you are implying.
I never said I trusted either source. But when you can read Arabic propaganda and contrast it with your own media's propaganda, it helps you to understand what the underlying causes for war are. It is also key to recognizing the true aggressor, because in every war both governments play the "good guys" role to their citizens. Direct translation helps you to understand the culture of your enemy. Things as simple as webpage advertisements, editorials, personals, etc, are lost in translation by CNN and the other alphabet news networks.
Go to http://www.systransoft.com, choose Arabic to English
The sentence "the pie was baked for three hours" differs in meaning, because it implies that someone was there, actively baking the pie.