Translation Software That Learns by Reading

Turing test by OneArmedMan · 2005-02-23 13:59 · Score: 3, Insightful

I wonder if something similar to this could be used for AI, say for Turing tests?

That sounds like a good approach by FunWithHeadlines · 2005-02-23 14:03 · Score: 3, Insightful

I wish them luck (cuz they'll need it), but if anything is going to produce translation software that really works it will have to include learning elements of this nature. It's one thing to get dictionary translations. That's been around for decades, with its laughable results. Humans speak in metaphor and simile and slang and contractions and abbreviations of thought all the time. We're the cat's meow of language (try that, computer!).

But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.

Philosophical caveat by Raindance · 2005-02-23 14:03 · Score: 4, Insightful

As a caveat, we should be wary of saying the system "understands" a language.

I would say that humans able to translate between languages generally understand both languages, but saying that a statistical, probabilistic model based on correlations understands a language might be a stretch.

Further reading: Searle's Chinese Room argument (http://en.wikipedia.org/wiki/Chinese_room).

This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?

Are these silly questions to ask?

Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett but the entry is pretty sparse).

RD

Re:Philosophical caveat by back_pages · 2005-02-23 14:47 · Score: 4, Insightful

Great example of this:
Mom baked for three hours.
The pie baked for three hours.
"Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.
A man walked into a bar. Ouch!
A man wanted to win a pun contest in the local newspaper, so he entered 10 times in order to increase the chances that one of his entries would win. Unfortunately, no pun in ten did.
You can translate that 50 ways from Sunday but without understanding the language - understanding what makes those statements interesting - the machine will lose all their meaning.

Re:Philosophical caveat by idlake · 2005-02-23 15:04 · Score: 2, Insightful

You are right: this software does not understand language; it works out statistical correspondences, but it has no understanding of the physical correlates of words. That also means that it has intrinsic limitations.

Note also that such statistical approaches are nothing new, it's just that computers are finally getting powerful enough that people can use them.

None of that has anything to do with Searle. Searle wouldn't admit that the system understands language even if it knew things about the real world. Searle's argument is arbitrary and ad hoc. You are free to believe in it if you like, but sooner or later, you will have to defend your position against a computer that will insist that if you don't admit the possibility that it is self-aware and intelligent, well, it is just going to assert that you aren't either.

Re:Philosophical caveat by lakeland · 2005-02-23 15:05 · Score: 2, Insightful

NO! NO! NO! Not the Searle argument again. That guy is an absolute nutter and should be banned! Actually, on second thoughts, as long as I never have to hear his drivel again, I don't mind what happens to him.

His argument essentially boils down to: "The computer doesn't understand because all it does is manipulate symbols. Even if it does exactly the same steps as a human, the human understood and the computer was just being a mimic. Giving the computer a body wouldn't make it any less of a mimic".

The #1 flaw in his argument is that it would result in humans being classified as non-intelligent. He constantly spouts on about how machines aren't intelligent but then says "Except that the human understands what they are doing". I think that's a close contender to the Wookie defense for world's worst argument.

Re:Philosophical caveat by Coulson · 2005-02-23 15:22 · Score: 2, Insightful

Searle's Chinese Room argument is hogwash.

In his scenario, Searle claims that neither the people moving the Chinese tokens, nor the book of instructions telling them what to do "understands" what is being said. That is obviously true, but it misses the point. That's like saying that the neurons in your head don't understand what you are saying, and so neither do you.

The workers in the Chinese Room argument are just hardware. They're akin to neurons in the brain, or chips in a computer. They're blindly executing instructions (software) from a book and recording results on blank pages (working memory). No AI proponent is arguing that the chips in the computer "understand" anything. Chips just dumbly execute instructions. What's interesting is the combination of software and persistent memory, and the apparent consciousness that can arise therefrom.

Searle's argument must either be considered void, or one is compelled to admit that humans don't "understand" anything either. As such, it's hogwash.

Re:Philosophical caveat by Anonymous Coward · 2005-02-23 16:42 · Score: 1, Insightful

It's true that humor is based on understanding, and people generally don't claim that translation programs understand a language... But even a human who understands both languages usually can't translate a pun.

Also, a sufficiently complex probability-based model of language should be able to distinguish between "Mom baked for three hours" and "The pie baked for three hours", since it will have 'pie' associated with 'bake' with a high probability of being its object, and with the proper rule would conclude that the change in order indicated that 'the pie baked' was a passive construction.
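To make that concrete, here is a minimal sketch of the idea, not anything from TFA. The observation list, the function names and the smoothing value are all invented for illustration; a real system would estimate these counts from a large parsed corpus.

```python
from collections import Counter

# Toy, hand-made observations of the verb "bake": (agent, thing baked).
# These pairs are invented for illustration, not drawn from a real corpus.
OBSERVATIONS = [
    ("mom", "pie"), ("mom", "bread"), ("baker", "pie"),
    ("dad", "cake"), ("baker", "bread"), ("mom", "cake"),
]

agent_counts = Counter(agent for agent, _ in OBSERVATIONS)
patient_counts = Counter(patient for _, patient in OBSERVATIONS)


def p_subject_is_baked(noun: str, smoothing: float = 1.0) -> float:
    """Estimate P(noun is the thing being baked | noun appears with 'bake').

    Additive smoothing keeps unseen nouns at 0.5 instead of dividing by zero.
    """
    as_patient = patient_counts[noun] + smoothing
    as_agent = agent_counts[noun] + smoothing
    return as_patient / (as_patient + as_agent)


def read_subject(noun: str) -> str:
    """Label the subject of '<noun> baked for three hours'."""
    return "thing being baked" if p_subject_is_baked(noun) > 0.5 else "baker"


if __name__ == "__main__":
    for subject in ("mom", "pie"):
        print(subject, "->", read_subject(subject))
```

With these toy counts 'pie' is only ever seen as the thing baked and 'mom' only as the one baking, so the two sentences come apart. A noun the model has never seen sits at exactly 0.5 and falls back to "baker", which is the kind of gap a bigger corpus is supposed to close.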

Re:Philosophical caveat by evilmousse · 2005-02-23 18:34 · Score: 2, Insightful

that's exactly why i like my anime fansubbed instead of sanitized.

Re:Philosophical caveat by tgv · 2005-02-23 19:07 · Score: 2, Insightful

The translation depends on the semantic class of the subject (is the subject a potential baker -> use translation nr. 1, is the subject something that's usually being baked -> use translation nr. 2). So, ignoring other issues, this particular problem is easy to solve, it only takes a lot of work.

BTW, the fact that it isn't entirely grammatical has nothing to do with it (if we understand it, we can translate it, so any MT faces the same challenge).
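The rule above can be sketched in a few lines. This is only an illustration under assumptions: the lexicon entries, the class names and the function name are made up, and the two return labels stand in for whatever the target language's two verbs would be.

```python
# Hypothetical hand-built lexicon mapping nouns to a semantic class.
# Filling in a table like this for a whole language is the "lot of work".
SEMANTIC_CLASS = {
    "mom": "potential_baker",
    "chef": "potential_baker",
    "pie": "usually_baked",
    "bread": "usually_baked",
}


def choose_translation(subject: str) -> str:
    """Pick a translation of 'baked' from the semantic class of its subject."""
    semantic_class = SEMANTIC_CLASS.get(subject.lower())
    if semantic_class == "potential_baker":
        return "translation nr. 1"
    if semantic_class == "usually_baked":
        return "translation nr. 2"
    # No entry: the rule cannot decide, so report the gap instead of guessing.
    return "unknown: needs a lexicon entry"


if __name__ == "__main__":
    for subject in ("Mom", "pie", "oven"):
        print(subject, "->", choose_translation(subject))
```

The logic is trivial; the cost is all in the table, which has to cover every noun that could plausibly be the subject of every ambiguous verb.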

Translating specialised texts ... by rkmath · 2005-02-23 14:03 · Score: 4, Insightful

The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.

The main reason (I think) is that tech documents have specialised vocabulary and idioms, but these are much fewer than the idioms one has to master in order to understand the editorial page in a newspaper.

With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages than any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese for instance is an entirely different ballgame ...)

Re:Translating specialised texts ... by Beryllium+Sphere(tm) · 2005-02-23 15:55 · Score: 2, Insightful

Absolutely true. One of Beryllium Sphere's partners is a computational linguist. For quite a while her bread and butter was building representations of knowledge about heavy equipment maintenance to support automatic translation of Caterpillar Tractor technical manuals.

There's more to help you than just the specialized vocabulary. It's good that "crankshaft" is unambiguous but it also helps to know in advance that "bolt" will be a noun and not a verb.

Also, to be blunt, nobody expects technical prose to sound as good as normal prose. In fact they're happy if it's just functional. A fluent French speaker thought the automatic translation output was a bit stilted, but the first time one French mechanic read it he jumped up and down for joy and said "If only we'd had this earlier, I wouldn't have almost lost my arm!".

so how can they grade you in school? by cheekyboy · 2005-02-23 14:16 · Score: 3, Insightful

One has to wonder: if the language of choice, English or whatever, is so structured and rule-ridden and not just made up on the fly, then how come it's so difficult to determine all the rules? Is it that there are too many of them? Too many contexts? Or is it just the problem of translating bad grammar, which breaks the rules but which any human can decipher?

Sometimes brute force, i.e. lookup tables for 100000000 translated versions, can be better. So much for logic, eh :-)

--
Liberty freedom are no1, not dicks in suits.

Mission statements by Savage-Rabbit · 2005-02-23 14:24 · Score: 2, Insightful

That thing reminds me of Dilbert's mission statement generator. The scary thing is that the material from Dilbert's babble engine actually sounds like a lot of the stuff you are likely to find on actual corporate websites.

--
Only to idiots, are orders laws.
-- Henning von Tresckow

Too bad about the times it needs to think by Timbotronic · 2005-02-23 14:54 · Score: 3, Insightful

I like the approach they've taken, but machine translation can only ever go so far.

A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job and she's a professional translator.

Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.

--

One of these days I'm moving to Theory - everything works there

Oh fuck. by hot_Karls_bad_cavern · 2005-02-23 15:08 · Score: 2, Insightful

Something in my head just popped.

Damn, i love this place. Seriously, dammit. Here we have a post on a tech/it site titled "Harry Potter and the Bible" modded +4 Interesting at the time of this posting ... that is actually interesting. And even i find it interesting, and the fact that you are most likely of age and know what "quidditch" is and how to spell it is quite frightening. i'm sad to say i knew it too (they took my Ko0lBadge away a long time ago).

My head totally hurts. Clod.

Can it run faster than you? by ibn_khaldun · 2005-02-23 15:38 · Score: 2, Insightful

The critical issue is not whether this system will produce translations comparable to those done by a translator fluent in both languages -- it won't. However, it may do as well as or better than translations by someone barely competent in one of the languages (or who is essentially just doing dictionary-based translation). English speakers have lots of examples of nearly incomprehensible technical translations from Chinese and Korean, and the Chinese and Koreans would probably have comparable examples of bad translations from English except for the fact the US doesn't make anything they want to buy that requires a manual (soybeans and car-chase movies don't require manuals). Okay, maybe software -- there is a story (probably apocryphal) that an early Spanish version of the manual for the DOS operating system (command line Windows without the viruses, for you young'uns) translated every instance of DOS as "two".

It's like the old joke about the two backpackers who encounter a hungry bear in the woods. One stops and puts on his running shoes. The other says "Why do that? You can't outrun a bear." The response: "Right, but I can outrun you."

--

"All successful systems accumulate parasites" -- Hal Hixon

Re:Time flies like an arrow... by finiteSet · 2005-02-23 16:26 · Score: 2, Insightful

"Time flies like an arrow" is a simile, and is idiomatic. There is a finite set of idioms, and they should be fine as "memorized" exceptions in a speech system (they are often memorized exceptions in humans). Most language is rule based, but I think many underestimate the number of idioms that humans encounter and have difficulty "parsing."

"Time flies like an arrow, fruit flies like a banana" is a joke. Translating it into other languages would be neither funny nor especially meaningful, as the whole point is to play the idiom for a joke.

Humans are imperfect speech systems - every day people hear things wrong, misinterpret sentences, etc. Humans just typically have lower error rates than machine systems, especially for language systems. Building a system that understands jokes, metaphors, etc. will take an extensive knowledge to draw from, which is one of the big advantages humans have in disambiguating language. Without a large knowledge-base and efficient ways of getting feedback to update that knowledge-base, computers will still have difficulty disambiguating novel phrases and words. Even then it is unrealistic for them to be able to always "understand" idioms, which rarely retain a meaning that can be deduced, just as it is unrealistic for humans to always understand an idiom when they first encounter it. Language systems should do what humans do - memorize the meaning and move on. You're welcome to wait for systems that understand jokes, and you'll probably be waiting for a while. I don't think, however, that is a useful prerequisite for "believing" in language systems.
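A rough sketch of "memorize the meaning and move on", assuming a small hand-written idiom table. The entries, the glosses and the function name are illustrative only; a real system would need a far larger list and a proper rule-based or statistical stage after this step.

```python
# Hypothetical table of memorized idioms and their plain-language meanings.
IDIOMS = {
    "raining cats and dogs": "raining very heavily",
    "time flies like an arrow": "time passes quickly",
    "the cat's meow": "something excellent",
}


def expand_idioms(sentence: str) -> str:
    """Swap memorized idioms for their meanings before any further parsing.

    Longest idioms are tried first so a short entry cannot clobber a longer
    one that contains it. Matching is a plain lowercase substring search.
    """
    text = sentence.lower()
    for idiom in sorted(IDIOMS, key=len, reverse=True):
        text = text.replace(idiom, IDIOMS[idiom])
    return text


if __name__ == "__main__":
    print(expand_idioms("It is raining cats and dogs"))
    print(expand_idioms("Fruit flies like a banana"))
```

The second sentence passes through untouched, which is the point: the table handles the memorized exception and everything else is left for the rules. It does nothing for the joke, of course.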

--
If we start buying CDs then the terrorists have already won.

Re:High school Spanish by Temposs · 2005-02-23 19:15 · Score: 3, Insightful

"Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying."

"Better" is an ambiguous term. For what these researchers made the program for, it is better than humans for one reason: speed. Sure, they want the translations to be reliable, but more important is that a computer can do in a few days what would take a human a month, for this application at least. The NSA and the like want to have translations of huge swathes of text, and fast! The sooner they can understand things that are written, the faster they can react to threats. Human translators are very slow and expensive for this purpose in comparison. For your Spanish HW, the best is a native speaker giving you feedback, because the amount of work is small and the translations will be very accurate.

--
Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card

Computers must learn like humans by mailman-zero · 2005-02-23 20:39 · Score: 2, Insightful

TFA shows steps in the right direction. So far most projects have tried to teach computers how to understand and produce natural language. The real solution lies in creating algorithms that allow computers to learn language. This is where studying how humans acquire language must be merged with computer science.

I can imagine the first successful computational linguist describing having a computer in his home for upwards of 10 years, interacting with it and allowing it to interact with him and his family in order to learn the contexts in which certain words carry specific meaning. Once the learning process has been completed once, the collected persistent memory could then theoretically be copied to other machines and devices so that they, too, may understand the language for which such training has been completed.

--
Let's play video games with mailmanZERO

Re:Computers must learn like humans by trufflemage · 2005-02-24 02:10 · Score: 2, Insightful

"having a computer in his home for upwards of 10 years...."

One thing computers are is fast. Why make it sit through ten years absorbing input at human speeds when the digital content of the web is available for scanning as fast as the machine can?

Dealing with Disruptive Technology by Simonetta · 2005-02-23 20:41 · Score: 2, Insightful

This message brings up some excellent points about dealing with disruptive technology. A teacher whose job it is to get students to master material in a certain subject realizes that there is a machine that provides the same function that previously could only be gained by hard study.
What is more important, the knowledge gained through rigorous study or the ability to accomplish what the studying provides through a machine?
Being technically oriented, I have to say the machine. But I am not being disrespectful of all the hard work that goes into learning a language. I'm saying that if people don't want to bother to learn a language, then use the machine if you need a translation. This is a difficult position to defend when colleges still require a few years of a foreign language to get a liberal arts degree and students couldn't care less.
But I still defend the position. Use the translation software to do your homework. It's more important to master the translation software or machine than it is to master the actual language. Even if you study hard and get an 'A', in a few years you will forget it. And the machines are only going to get better and cheaper. It's your education, your life, your (or your parents') tuition.
George Gilder once said that the languages that you need to know to be successful are English and C++.

Still, for the most part, language translation software sucks, and depending on it can put you into some truly embarrassing positions. I think that language translation software (for text) comes in five rough levels:
1 Word substitution.
2 Phrase and sentence.
3 Paragraphs and idioms.
4 Magazines, full-speed conversations, light literature.
5 Legal, diplomacy, allegory, and classical literature.

Each level is at least an order of magnitude more difficult to translate than the previous one.
I think that most shrink-wrap translation software today is between levels 2 and 3 (for example, www.systransoft.com). BabelFish and Google site translation are between levels 1 and 2. With non-European languages, BabelFish and Google are incomprehensible and useless.
It would be interesting to see whether, in a few hundred years, language translators work to preserve linguistic diversity or create a global 'pidgin' language.

Re:Tests by Bugmaster · 2005-02-23 22:44 · Score: 2, Insightful

Ha! I'd like to see a human translator (or a team of human translators) that could do that.

--
>|<*:=

Need to be embedded "in the world" by franoreilly · 2005-02-24 00:22 · Score: 2, Insightful

If all they're talking about is syntactic analysis, it will never be enough. Semantic knowledge is essential for complete "understanding" of language, and that can only be attained by an agent that can interact with the world and humans and learn within that context.

--
-- --- Learn language vocabulary with mnemonics: http://www.memorista.com

Slashdot Mirror
