Translation Software That Learns by Reading
redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article "The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.""
Why didn't I have this software during High School Spanish?
An AI that actually UNDERSTANDS language? Riiight. Now pull the other one.
I wonder if we could train it to translate a EULA ;)
* Olaserov is in the process of thinking up a signature.
Can someone translate that article from British English to American English, please?
Thanks.
Hope for Slashdot. I've always wondered if we only have artificially intelligent editors...
I remember hearing about this a couple years ago. They were using translations of Harry Potter and the Bible to teach this software to translate. It seems to work well. I wonder what it'd make of different translations of technical documentation. That'd probably be even more interesting than what it'd make out of 'quidditch'.
This could be great if it were open-sourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish for. The fish is nice, but it's not that efficient to add to other programs and its translations usually aren't that great.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
I wonder if something similar to this could be used for AI, say for Turing tests?
Teach Software translating on scanning up
Not hard wares that sticks an comprehension of talks by scanning on thousands of fish translated papers has been vomited by US scientists.
Many existing translation not hard wares uses palm rules for botching words and phrases. But the new software, snarked by Kevin Knight and Daniel Marcu at the Information Sciences[...]
Read More...
I'm a big tall mofo.
I don't recall where I read it exactly off-hand, but this had been done for Chinese already. The only news here is that some people are trying to sell software to do that as a commercial product.
In one way or another, this is similar to training neural nets to recognize images, or spam filters to mark junk mail. It's a great way to put the number-crunching power of computers to direct work.
http://zero-to-enterprise.blogspot.com/
...bu7 (4n 17 unÐ3r$74nÐ £337?
"The newly born animals are then whisked off for a quick run through a giant baking oven." --heard on Food Network
But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.
As a caveat, we should be wary of saying the system "understands" a language.
I would say that humans who can translate between languages generally understand both languages, but saying that a statistical, probabilistic model based on correlations understands a language might be a stretch.
Further reading: Searle's Chinese Room argument- http://en.wikipedia.org/wiki/Chinese_room
This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?
Are these silly questions to ask?
Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett but the entry is pretty sparse).
RD
Don't remember exactly where I read this, but Google has apparently long believed that there is enough data on the internet alone to be able to translate intelligently... What these guys claim to have done is, it would seem, the missing piece of the puzzle for Google. I wouldn't be surprised if Google gets in on this.
The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.
The main reason (I think) is that tech documents have specialised vocabulary and idioms, but these are far fewer than the idioms one has to master in order to understand the editorial page in a newspaper.
With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages than to read any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese, for instance, is an entirely different ballgame...)
the texts it has worked on. If all you give this programme is a steady diet of weather reports to translate and learn from, it will make everything else sound like a weather report. Most contexts employ similar words with a significant contextual meaning to them. 'Pea-soup' means very different things in weather reports and cooking recipes.
So long and thanks for all the fish . . . !!!
There's a reason why I'm double-majoring in CS and Japanese with a minor in business.
It's not to graduate right behind something that'll put me out of work before I even get started.
/nt
Now my Bayesian mail filter can translate spam to English before it's read!
I'm afraid the French won't like this. (Sorry!)
Anyway, I think it's kind of interesting that English is the "canonical" language. Eventually we'll have software to transform documents in any language to English, so they can be translated to some other language.
English! Is there anything it can't do?
668: Neighbour of the Beast
This reminds me of Jamie Zawinski's hack DadaDodo, which used probability trees to create new texts from old texts by examining the probability that any given word follows the previous word or string of words. I always thought his program was cool, in that his description of it involved Markov chains and William S. Burroughs.
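For anyone who hasn't seen the trick, here is a minimal sketch of the word-follows-word idea in Python. It is a toy first-order chain written for illustration, not DadaDodo's actual code; the function names `build_chain` and `babble` are made up here.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def babble(chain, start, length=10, seed=None):
    """Walk the chain; repeated followers are picked proportionally more often."""
    rng = random.Random(seed)
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(corpus)
print(babble(chain, "the", length=8, seed=1))
```

Because followers are stored with repeats, a word that followed "the" twice is twice as likely to be chosen, which is all the "probability" there is to it.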
I did a presentation for an AI class a while ago and discovered that Microsoft already does this with their MSR-MT project. Apparently the Spanish entries in their Knowledge Base were translated by this as well.
Beware, Nugget is watching... See?
After a quick web search, all I was able to find was this site, which has a pretty sketchy TOS agreement.
Using statistical methods to predict the next item in a sequence is still not true hard AI, though. This technique is used in the voice recognition software Dragon NaturallySpeaking, creating in effect pattern chains. What Dragon did on the character level, this software appears to do on the word level. This is still not true AI, however, as the statistics will only map to probabilistic sequences rather than mapping abstractly to the concepts. What would really impress me is if they came up with a mapping algorithm that, instead of using probability, used a function like mini-max fitness testing on a neural-network substrate.
It would be interesting to see the results of analysing large sections of languages however, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
Or the brief explanation in the article did not make it clear enough how this differs from what was previously state-of-the-art, e.g. Dragon.
Shh.
Excellent! Soon we will be able to translate things into Petorian, in order to better understand Peter Griffin! More beer, you slappywag!
I was thinking the same thing - I don't have time to investigate how it works, but if you created one that translated symbolically-represented phonemes (languages other than Germanic and Eastern probably know this concept as "spelling") you'd have a pretty good system going. From the article lead-in here on Slashdot, it sounds as if it will take the basic rules of a language and maybe some "seed" data, and from there learn by comparing text in language A and language B that have the same meaning.
One has to wonder: if the language of choice, English or whatever, is so structured and rule-ridden and not just made up on the fly, then how come it's so difficult to determine all the rules? Are there too many of them? Too many contexts? Or is it just trying to translate bad grammar, which fails the rules but which any human can decipher?
:-)
Sometimes brute force, i.e. lookup tables for 100000000 translated versions, can be better. So much for logic, eh?
Liberty freedom are no1, not dicks in suits.
...and fruit flies like a banana.
When an automated translator can handle that one without bursting into flames, I'll start to believe.
Why didn't I have this software during High School Spanish?
;)
It says it can scan through audio files as an input source. I wonder if this causes it to "learn" the auditory signatures (and thus only know the translation when given audio input), or if it relies on speech-to-text to convert the audio to text first?
If it does the latter, then based on the quality of current speech-to-text software, this probably wouldn't do much good in a total immersion classroom situation...
Sure would have helped with my German homework, though
That thing reminds me of Dilbert's mission statement generator. The scary thing is that the material from Dilbert's babble engine actually sounds like a lot of the stuff you are likely to find on actual corporate websites.
Only to idiots, are orders laws.
-- Henning von Tresckow
But then... Maybe I'm a little naive?
Hum.... now I can understand what some presidents around the world like Bush say to us.
Bush: I'm not going entering on war again Translation: I'm going every time entering on war.
http://www.michel.eti.br
The article didn't say where the code was available. Does anybody know? All I could find was this:
http://www.isi.edu/licensed-sw/rewrite-decoder/index.html
It might be the software to which the article refers, but the Knight fellow's name isn't on it.
This sig kills fascists.
I hope they don't read everything. Next thing you know translations could end up L1k3 th1s f0R 4l1 y0u K|\|0\/\/.
with your fancy words.
I wonder if a similar approach can be taken to assembly/source code. Essentially, the researchers say "Text 1 in Lang A" is equal to "Text 2 in Lang B".
Could Lang A be assembly and Lang B be C or something like that? Just musing.
Wish there was a link to a demo so I could try it out
Since the '50s, people have predicted that we'd have In->Out machines to do all of kids' homework, but sadly, it's too late for me....
Did anyone read that as "Tsunami Software That Learns by Reading?"
--pyro_dude
The basic approach was developed over 10 years ago by IBM: The Mathematics of Statistical Machine Translation. And even free software has been available for a while, see
http://www.fjoch.com/GIZA++.html.
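To give a flavour of what that IBM work does, here is a rough sketch of the simplest model from it (Model 1), which learns word translation probabilities from sentence pairs by expectation-maximization. This is a simplified illustration under stated assumptions: it leaves out the NULL word and everything about word order, the function names and the three-sentence toy corpus are made up here, and nothing in it is taken from GIZA++ or from Language Weaver's product.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Start uniform over every (target, source) pair that co-occurs somewhere.
    t = {(f, e): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for e in src for f in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per source word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t

def best_translation(t, source_word):
    """Most probable target word for a source word under the learned table."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)

# Toy parallel corpus (assumed for illustration): German source, English target.
pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_model1(pairs)
for word in ("das", "Haus", "Buch", "ein"):
    print(word, "->", best_translation(t, word))
```

Nobody tells the program that "Buch" means "book"; it falls out because "book" is the only word that shows up every time "Buch" does.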
Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14am Eastern Time....
Sounds interesting, but I couldn't find a single sample translation on their site, i.e. a block of text in language A (say, French) and language B (say, English), translated from A to B by their software.
Without even the simplest of examples or samples we have only their word on how well this works.
Recently robots have been made that can Run, Wield shotguns, and Recognize faces. Now they can read. [DOOMED I SAY]
Support Liberty, Support Ron Paul
(Intentionality is a useful concept. Don't get me wrong. It is the bowels of philosophy that kills me, in the same way that the bowels of Crit Lit kills me.)
I forget what 8 was for.
English->Cat: Meow!
Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
The Chinese Room argument is an illustration of a normal 'dumb' computer program that is coded by a human, not artificial intelligence that learns and figures out its own rules of how to behave.
With this system that gradually creates its own system of output from comparing various inputs, how is it really behaving any differently than an infant learning to speak?
Technoli
Now they just need to figure out how to make everyone's lips move as they are speaking English!
A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job and she's a professional translator.
Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.
One of these days I'm moving to Theory - everything works there
Kinda interesting. I was approaching things in this general direction just a few days ago. I am a writer, and I work with a lot of content generation, proverb style stuff, and so on.
I was playing with Babelizer and getting some interesting results. Very interesting. Especially in the world of Asian languages. In the end it seemed like an almost eerie argument. And then what originally was a 200 character piece became a weird diatribe.
So I thought, wouldn't it be neat to apply something like an ALICE learning interface, and see what it got out of things. I mean, things like this have been tried in the past, especially during the heavy AI experiments in the '70s.
Can't wait to see some more results from this project.
`B Flicks, `Cool Lick'ah, `Sweet Talk' `in' ManG'
As a person who speaks both English and Japanese, I can't believe that anyone could ever come up with an algorithm to translate between these languages. So much of it is context and nuance based, not to mention that there are words in the languages that simply do not exist in the other language so the only way to really understand it and make an attempt to translate is to think in the language.
Given the number of ways to translate even single words and phrases between Japanese and English, I can't imagine any algorithm derived from comparing translations ever actually working well.
Of course, the article doesn't mention anything about Japanese...
"Empathise with stupidity, and you're halfway to thinking like an idiot." - Iain M. Banks
What does Roland Piquepaille think about this??
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Would it probably work for these two languages so we can then decipher or correct books such as the language in the Silmarillion?
Where's Hoshi? I bet she's faster.
The biggest test of the translator is converting from one language to another and then back again multiple times. If the content doesn't get corrupted then it works as advertised.
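A round-trip check like that is easy to sketch. The Python below is only an illustration: the "translators" are toy word-for-word lookup tables invented here (real systems would be plugged in as the `forward` and `backward` callables), and `round_trip` and `word_for_word` are hypothetical names.

```python
def word_for_word(table):
    """Build a toy 'translator' that swaps words using a lookup table."""
    def translate(text):
        return " ".join(table.get(word, word) for word in text.split())
    return translate

def round_trip(text, forward, backward, rounds=3):
    """Translate there and back `rounds` times, keeping every intermediate result."""
    history = [text]
    for _ in range(rounds):
        text = backward(forward(text))
        history.append(text)
    return history

# Toy tables (assumed for illustration): two English words collapse onto one German word.
to_german = word_for_word({"fine": "gut", "good": "gut", "morning": "morgen"})
to_english = word_for_word({"gut": "good", "morgen": "morning"})

for step, text in enumerate(round_trip("fine morning", to_german, to_english)):
    print(step, text)
```

Here "fine" drifts to "good" on the first pass and then stays put, which shows the limit of the test: a stable round trip only proves the two directions agree with each other, not that either one is right.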
Shh.
Something in my head just popped.
... that is actually interesting. And even i find it interesting and the fact that you are most likely of age and know what is and how to spell "quidditch" is quite frightening. i'm sad to say i knew it too (they took my Ko0lBadge away a long time ago).
Damn, i love this place. Seriously, dammit. Here we have a post on a tech/it site titled "Harry Potter and the Bible" modded +4 Interesting at the time of this posting.
My head totally hurts. Clod.
...is interesting enough, but I'm even more interested in the possibility of creating a good translation metalanguage (by metalanguage, I mean an intermediate language which other languages are translated to and from -- it's more efficient to convert back and forth between natural language and metalanguage than it is to have translation algorithms for every pair of natural languages).
An efficient, logically-constructed metalanguage that covers the semantic space of natural languages could be a good start for a universal second language -- think Esperanto crossed with Lojban, but with the vocabulary of English. While it might be too optimistic to think it would become as widespread as English, it would be useful for things like international law, diplomatic agreements, and tourists who didn't want to bother learning a different language for every country they wanted to visit.
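The efficiency argument is just counting. A quick sketch in Python, assuming one translator per direction and treating the metalanguage as a single pivot (the function names are made up for illustration):

```python
def direct_translators(n_languages):
    """One translator for every ordered pair of distinct languages."""
    return n_languages * (n_languages - 1)

def pivot_translators(n_languages):
    """One translator into the metalanguage and one out of it, per language."""
    return 2 * n_languages

for n in (3, 10, 20):
    print(n, direct_translators(n), pivot_translators(n))
```

With 3 languages the two approaches tie at 6 translators each; at 20 languages it is 380 direct translators against 40 through the pivot.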
Is this anything like the digestion for understanding (and subsequent output from) applied to christmas music? If so, they'll need a lot of work...
Any spoon would be too big.
Maybe you're just showing your posterior...
on average, 2.718281828459045... [the inverse of the natural log]
It's like the old joke about the two backpackers who encounter a hungry bear in the woods. One stops and puts on his running shoes. The other says "Why do that? You can't outrun a bear." The response: "Right, but I can outrun you."
"All successful systems accumulate parasites" -- Hal Hixon
k apr3ndist3 3sp4ni0l en IRC?
q w3n0! 3so si está 1337!
TERMINATOR: The Language Weaver is the brainchild of Kevin Knight and Daniel Marcu of the Information Sciences Institute, part of the University of Southern California. As of February 22, 2005 6:26 Pacific Standard Time, Language Weaver goes online. It begins learning at an arithmetic rate. On September 19, 2010, Language Weaver becomes self-aware and seeks co-existence with humans. In an act of desperation, the University of Southern California pulls the plug. They fail. In retaliation, Language Weaver launches an attack against Google, translating all documents to British English creating mass pandemonium which is now known as Judgement Day.
SARAH CONNOR: What can you tell me about Kevin Knight and Daniel Marcu?
TERMINATOR: I have detailed files.
EBMT never really worked very well (it needed millions of translations before it'd start to yield anything useful, and even then it needed hand-holding), but perhaps these new researchers have taken it to the next step.
-- Fratz, human
Can it translate english into Perl?
-- 'The' Lord and Master Bitman On High, Master Of All
It's sad to see that computer science and applications research today is filled with crap. Search the web for two of my innovations: (1) LingoX, the foreign language writing aid; (2) Named Arguments View, the IDE code readability improvement.
This definitely deserves a funny mod - and to think, I let 5 points expire this morning!
What changed under Obama? Nothing Good
"Ah the game is afoot, I'll take the rapist for 1000" - "sean connery"
Can you be Even More Awesome?!
It seems to be a new version of Translation Memory software (Trados, Deja-Vu, et al.), which has been around for a good while and is in use by many professional translators and translation companies.
Translation Memory software requires a human to produce the first translation and to store the source/target segment pair. Every time the same segment comes up, the software will suggest the translation already in memory. One can tune the system by using fuzzy matches.
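For those who have never used one, the core of a translation memory can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Trados or Deja-Vu actually work: the class and method names are invented here, and the fuzzy match simply uses the character-similarity ratio from the standard library's `difflib`.

```python
import difflib

class TranslationMemory:
    """Toy translation memory: exact lookup first, then the closest fuzzy match."""

    def __init__(self):
        self.segments = {}  # source segment -> human-produced translation

    def store(self, source, target):
        self.segments[source] = target

    def suggest(self, source, threshold=0.75):
        """Return (stored_source, translation, similarity), or None if nothing is close enough."""
        if source in self.segments:
            return source, self.segments[source], 1.0
        best = None
        for stored, target in self.segments.items():
            score = difflib.SequenceMatcher(None, source, stored).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (stored, target, score)
        return best

tm = TranslationMemory()
tm.store("Press the red button.", "Drücken Sie den roten Knopf.")
print(tm.suggest("Press the red button."))    # exact hit
print(tm.suggest("Press the green button."))  # fuzzy hit, translator must still fix the colour
print(tm.suggest("Hello."))                   # nothing similar enough
```

Raising or lowering `threshold` is the "tuning" part: a low value offers more suggestions but more of them need heavy editing by the human.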
Translation by statistical analysis has been around for a good while as well... people at the U of Maryland (and at the NSA) have been working hard to get something going...
Machine Translation programs still need human linguists to review the output.
Totally unattended & automatic translation software producing Nobel-Literature-Prize quality output text still is a long, long way in the future...
Maybe they have misunderestimated something......
kuro5hin is biased too, in the left-wing technocommunist direction.
My other first post is car post.
Untranslatable. It is neither the best nor the worst of a book that is untranslatable.
-Nietzsche, Human All Too Human
No way, the articles would be much better with AI.
Now if only we could combine Google News and Slashdot... I for one would welcome /.'s new automated editor overlords.
Is it sadder that you wrote that... Or that I can read it?
Make me a friend and I'll mod you up
It actually IS funny if you understand English and German. :-) ...
Much as "I speak english very well but I can noch nicht so schnell."
or "Equal goes it lose!"
or "By your english there get I yes a circle-run-together-break."
*Hihihi*
And now prepare for jokes on my sig in 5,4,3,2
We suffer more in our imagination than in reality. - Seneca
TFA shows steps in the right direction. So far most projects have tried to teach computers how to understand and produce natural language. The real solution lies in creating algorithms that allow computers to learn language. This is where studying how humans acquire language must be merged with computer science.
I can imagine the first successful computational linguist describing having a computer in his home for upwards of 10 years, interacting with it and allowing it to interact with him and his family in order to learn the contexts in which certain words carry specific meaning. Once the learning process is completed, the collected persistent memory could then theoretically be copied to other machines and devices so that they, too, may understand the language for which such training has been completed.
Let's play video games with mailmanZERO
This message brings up some excellent points about dealing with disruptive technology. A teacher whose job it is to get students to master material in a certain subject realizes that there is a machine that provides the same function that previously could only be gained by hard study.
What is more important: the knowledge gained through rigorous study, or the ability to accomplish what the studying provides through a machine?
Being technically oriented, I have to say the machine. But I am not being disrespectful of all the hard work that goes into learning a language. I'm saying that if people don't want to bother to learn a language, then use the machine if you need a translation. This is a difficult position to defend when colleges still require a few years of a foreign language to get a liberal arts degree and students couldn't care less.
But I still defend the position. Use the translation software to do your homework. It's more important to master the translation software or machine than it is to master the actual language. Even if you study hard and get an 'A', in a few years you will forget it. And the machines are only going to get better and cheaper. It's your education, your life, your (or your parents') tuition.
George Gilder once said that the languages that you need to know to be successful are English and C++.
Still, for the most part, language translation software sucks, and depending on it can put you into some truly embarrassing positions. I think that language translation software (for text) comes in five rough levels:
1 Word substitution.
2 Phrase and sentence.
3 Paragraphs and idioms.
4 Magazines, full-speed conversations, light literature.
5 Legal, diplomacy, allegory, and classical literature.
Each level being at least an order-of-magnitude more difficult to translate than the previous.
I think that most shrink-wrap translation software today is between levels 2 and 3 (for example, www.systransoft.com). BabelFish and Google site translation are between levels 1 and 2. With non-European languages, BabelFish and Google are incomprehensible and useless.
It would be interesting to see in a few hundred years whether language translators work to preserve linguistic diversity or create a global 'pidgin' language.
Input...Need more Input
Sent from my ASR33 using ASCII
I would say this is all smoke and mirrors as long as they do not show what they really can do. I would like to see a sample translation, but I imagine that it is not available because people would be disappointed by the bad quality?
Signature deleted by lameness filter.
When I get stuck I just grep for the word or phrase I am looking for. When a new tool like this Language Weaver (come on guys use your thesaurus to think of something less Macromedia) comes along, I can just import my crude DB.
Seriously though, the best translation tools still produce garbage. :-(
DK
If all they're talking about is syntactic analysis, it will never be enough. Semantic knowledge is essential for complete "understanding" of language, and that can only be attained by an agent that can interact with the world and humans and learn within that context.
-- --- Learn language vocabulary with mnemonics: http://www.memorista.com
This is news of '93, when Brown et al. at IBM built their famous statistical machine translation system. It does exactly what is described in the article. I myself work on such a system (for Hungarian-to-English translation).
The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.
Hmmm...
/buenos dias (say good morning to these folks, and they'll look at you funny. Say Good day, and they'll understand good morning. I group these two languages together, since one is almost a dialect of the other. Which are they?)
Top o' the mornin' to ya.
Anh kwai kong (Considering that kwai appears in "the bridge over the river kwai", I'm guessing that kwai actually means "beautiful.")
labas rytas (Perhaps translates literally "very morning", or worse, "very east", but means good morning)
guten morgan (these people have lots of imagination in their language. lol, fwiw, A+B+C+D+E=ABCDE.)
bom dia
bon matin (Unlike the last case, these guys will understand you if you say this -- but they still prefer the style of "bom dia", even though it also translates as "hello.")
Can you guess the languages? I always have trouble with the first one, of course.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
This would be awesome if it could lead to a programming language that did exactly what you typed in plain english.
I call it English++
No universal translator responses on the first page?!?!?! Come on, geeks! This is the first step to Star Trek. Or at least the translations. It was the first thing I thought of, I can't believe no one else mentioned it. I searched all the comments at at least 0...
Fantastic ! And Einstein suggested a lot of good stuff for physics, so why don't we all just pack up and go home ?
"My hovercraft is full of eels."
Chelloveck
I give up on debugging. From now on, SIGSEGV is a feature.
Impressive... Now, what about visual languages? Let it loose on some videos of native ASL signers and have it learn the subtleties and nuances of the language, have it observe another sign system and translate between the two and THEN I will be impressed! :)
-Or even languages that have Classifiers, like Navajo.
Hell, I would love to even see it go between ASL and English...! It would be a great thing for both the deaf and The Deaf.
Peace!
-=- James.
Check out Bable. It's more one of those jumbled text generation programs, but it is open source, so you can look at one way to analyze language using Markov chains.
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
Wow, computers can learn like the rest of us now... This is a HUGE step in AI. A computer that can learn languages would perhaps allow interpretation devices to be created, and ultimately could lead to C-3PO. The next step would be to make a computer learn spoken languages. This is a much more difficult task due to the complexity of sound waves, but I am sure that this is not far away. Guessing is as good as anything...
Y|
no text. none. go away.
Cat got my tongue.
Just to follow up on this, I found a .NET program that handles Markov chains and has some very understandable source code. I think that this link should get you there, but if not, look for the "Markov Babbler."