New Algorithm for Learning Languages

← Back to Stories (view on slashdot.org)

New Algorithm for Learning Languages

Posted by samzenpus on Wednesday August 31, 2005 @04:01PM from the universal-translator dept.

An anonymous reader writes "U.S. and Israeli researchers have developed a method for enabling a computer program to scan text in any of a number of languages, including English and Chinese, and autonomously and without previous information infer the underlying rules of grammar. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences."

8 of 454 comments (clear)

Min score:

Reason:

Sort:

Didn't Google already do this? by powerline22 · 2005-08-31 16:08 · Score: 5, Interesting

Google apparently has a system like this in their labs, and entered it into some national competetion, where it pwned everyone else. Apparently, the system learned how to translate to/from chinese extremely well, without any of the people working on the project knowing the language.
1. Re:Didn't Google already do this? by spisska · 2005-08-31 17:19 · Score: 5, Interesting
  
  IIRC, Google's translator works from a source of documents from the UN. By cross referencing the same set of documetents in all kinds of different languages, it is able to do a pretty solid translation built on the work of goodness knows how many professional translators.
  
  What is a little more confusing to me is how machine translation can deal with finer points in language, like different words in a target language where the source language has only one. English for example has the word "to know" but many languages use different words depending on whether it is a thing or a person that is known. Or words that relate to the same physical object but carry very different cultural connotations -- the word for female dog is not derogatory in every language, for example, but some other animals can be extremely profane depending on who you talk to.
  
  Or situations where two entirely different real-world concepts mean similar things in their respective language -- in English, for example, you're up shit creek, but in Slavic languages you're in the pussy.
  
  I've done translation work before (Slovak -> English), and there's much more going on than differences in words and grammar. There are whole conceptual frameworks in languages that just don't translate, and this is frustrating for anyone learning a language, let alone trying to translate. English is very precise (when used as directed) in matters of time and sequence -- we have more than 20 verb tenses where most languages get away with three.
  
  Consider this:
  
  I was having breakfast when my sister, whom I hadn't seen in five years, called and asked if I was going to the county fair this weekend. I told her I wasn't because I'm having the painters come on Saturday. They'll have finished by 5:00, I told her, so we can get together afterwords.
  
  These three sentences use six different tenses: past continuous, past perfect, past simple, present continuous, future perfect, and present simple, and are further complicated by the fact that you have past tenses refering to the future, present tenses refering to the future, and the wonderful future perfect tense that refers to something that will be in the past from an arbitrary future perspective, but which hasn't actually happened yet. Still following?
  
  On the other hand, English is much less precise in things like prepositions and objects, and utterly inexplicable when it comes to things like articles, phrasal verbs, and required word order -- try explaining why:
  
  I'll pick you up after work
  
  I'll pick the kids up after work
  
  I'll pick up the kids after work
  
  are all OK, but
  
  I'll pick up you after work
  
  is not.
  Machine translation will be a wonderful thing for a lot of reasons, but because of these kinds of differences in languages, it will be limited to certain types of writing. You may be able to get a computer to translate the words of Shakespeare, but a rose, by whatever name, is not equally sweet in every language.
SCIgen by OverlordQ · 2005-08-31 16:09 · Score: 5, Interesting

SCIgen anyone?

--
Your hair look like poop, Bob! - Wanker.
Speaking as someone working on NLP by OO7david · 2005-08-31 16:12 · Score: 4, Interesting

IAALinguist doing computational things and my BA focused mainly on syntax and language acquisition, so here're my thoughts on the matter.

It's not going to be right. The algorithm is stated as being statistically based which while is similar to the way children learn languages is not exactly it. Children learn by hearing correct native languages from their parents, teachers, friends, etc. The statistics come in when children produce utterances that either do not conform to speech they hear or when people correct them. However, statistics does not come in at all with what they hear.

With respect to the learning of the algorithm the underlying grammar of a language, I am dubious enough to call it a grand, untrue claim. Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chompsky dies. Think about the word "do" in english. No view of syntax describes from where that comes. Rather languages are shoehorned into our constructs.

So, either they're using a flawed view of syntax or they have a new view of syntax and for some reason aren't releasing it in any linguistics journal as far as I know.
1. Re:Speaking as someone working on NLP by OO7david · 2005-08-31 16:55 · Score: 4, Interesting
  
  It is in effect two parted:
  
  Chomsky is to linguistics as Freud to psych. He had great ideas for the time (many still stand), and the science would be nowhere close to where it is without him. However, A) he's backed off alot of supporting his own theories and B) he's published papers contradicting his original ideas so that is some question there for their veracity. Since so many linguistics undergrads hold him as the pinnical of syntax none are really deviating drastically from him.
  
  WRT the unscientificness, to make his view fit English, there has to be "do-support" which basically is that when forming an interrogative "do" just comes in to make things work without any explanation. In other words, it is in our grammar, but our view of syntax does not account for it.
2. Re:Speaking as someone working on NLP by PurpleBob · 2005-08-31 17:08 · Score: 4, Interesting
  
  You're right about Chomsky holding back linguistics. (There are all kinds of counterarguments against his Universal Grammar, but people defend it because Chomsky Is Always Right, and Chomsky himself defends it with vitriolic, circular arguments that sound alarmingly like he believes in intelligent design.)
  
  And I agree that this algorithm doesn't seem that it would be entirely successful in learning grammar. But this is not because it's statistical. I don't understand how you can look at something as complicated as the human brain and say "statistics does not come in at all".
  
  If this algorithm worked, then it could be statistical, symbolic, Chomskyan, or magic voodoo and I wouldn't care. There's no reason that computers have to do things the same way the brain does, and I doubt they'll have enough computational power to do so for a long time anyway.
  
  No, the flaws in this algorithm are that it is greedy (so a grammar rule it discovers can never be falsified by new evidence), and it seems not to discover recursive rules, which are a critical part of grammar. Perhaps it's learning a better approximation to a grammar than we've seen before, but it's not really doing the amazing, adaptive, recursive thing we call language.
  
  --
  Win dain a lotica, en vai tu ri silota
No the didn't by Ogemaniac · 2005-08-31 16:53 · Score: 5, Interesting

I played around with the Google translator for a while. I work in Japan and am half-way fluent. Google couldn't even turn my most basic Japanese emails into comprehensible English. Same is true for the other translation programs I have seen.

I will believe this new program when I see it.

Translation, especially from extremely different languages, is absurdly difficult. For example, I was out with a Japanese woman the other night, and she said "aitakatta". Literally translated, this means "wanted to meet". Translated into native English, it means "I really wanted to see you tonight". It is going to take one hell of a computer program to figure that out from statistical BS. I barely could with my enormous meat-computer and a whole lot of knowledge of the language.
Re:just thought.. by jaavaaguru · 2005-08-31 20:37 · Score: 4, Interesting

Perhaps it the algorithm could be used to identify spam more accurately. If it can understand the text, then it's got a reasonable chance of know if the text is junk.

--
Follow me