New Algorithm for Learning Languages

just thought.. by thegoogler · 2005-08-31 16:06 · Score: 3, Interesting

what if this could be integrated into a small plugin for your browser (or any program) of choice, that would then generate its own dictionary in your language.

would probably help with the problem of either downloading a small, incomplete dictionary, a dictionary with errors, or a massive dictionary file.

Re:just thought.. by Bogtha · 2005-08-31 16:49 · Score: 4, Insightful

This algorithm works with sample data. Where is the sample data going to come from? If you have to download it, then that negates the whole point of using it. If you use what you see online, well that's just rediculous, for obvious reasons :).

--
Bogtha Bogtha Bogtha
Re:just thought.. by Hikaru79 · 2005-08-31 17:26 · Score: 2, Insightful

This algorithm works with sample data. Where is the sample data going to come from? If you have to download it, then that negates the whole point of using it. If you use what you see online, well that's just rediculous, for obvious reasons :).

It's going to come from large bodies of text that exist in multiple languages. Things like the Bible, the constitution, etcetera. The whole point of this technology is that by drawing conclusions from those texts, the program infers the underlying rules of the language and can therefore translate other things. Google was doing something similar. An online dictionary is completely different. First, it has to be compiled by someone. Second, it only helps for translating words verbatim. This technology would teach itself to translate languages, even if none of the researchers working on the project could even speak those languages themselves. That's the beauty of it.
Re:just thought.. by Bogtha · 2005-08-31 17:38 · Score: 1

Most of the times you see it spelt that way on Slashdot, it's because the person can't spell. When I spelt it that way, I was being ironic to make a point.

My point was basically that you can't use what you see online as sample data to drive this algorithm, because you'll end up with mistakes like "rediculous" in the dictionary because of the widespread mispellings on sites like Slashdot.
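Bogtha's objection can be made concrete. The following is a minimal sketch, assuming a naive word-list builder that keeps any token seen at least `min_count` times. The function name `build_wordlist`, the threshold, and the sample sentences are all hypothetical and are not part of the algorithm under discussion. The point is that a misspelling common enough in scraped text clears a frequency threshold just as easily as a real word does.

```python
import re
from collections import Counter

def build_wordlist(texts, min_count=2):
    """Hypothetical naive dictionary builder: keep every lowercase token
    that appears at least min_count times across the scraped texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return {word for word, n in counts.items() if n >= min_count}

# Hypothetical "scraped" forum text.
scraped = [
    "That is rediculous.",
    "Rediculous, really.",
    "This is ridiculous.",
]
words = build_wordlist(scraped)
print("rediculous" in words)  # True: the misspelling is frequent enough
print("ridiculous" in words)  # False: the correct spelling appears only once
```

Raising the threshold does not fix this. It only changes which misspellings survive, because frequency measures popularity, not correctness.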

--
Bogtha Bogtha Bogtha
Re:just thought.. by Angst+Badger · 2005-08-31 17:41 · Score: 0

No, I believe that would be ludicrist.

--
Proud member of the Weirdo-American community.
Re:just thought.. by Bogtha · 2005-08-31 17:45 · Score: 1

It's going to come from large bodies of text that exist in multiple languages.

The parent to my comment was suggesting that this algorithm be used in lieu of a large dictionary download. I was pointing out that you'd have to download said "large bodies of text" to make it work, and so the whole exercise would be pointless.

The whole point of this technology is that by drawing conclusions from those texts, the program infers the underlying rules of the language and can therefore translate other things.

This isn't about machine translation, although it will probably help in those efforts. The reason multiple languages were mentioned in the summary was because this is a language-independent way of deriving grammar rules from sample data.

--
Bogtha Bogtha Bogtha
Re:just thought.. by Mac+Degger · 2005-08-31 17:46 · Score: 5, Informative

What they've developed is something which interprets grammar: the ruleset behind the organisation of building blocks, apparently agnostic as to what the building blocks are.

A dictionary is just words. This algorithm can't assign meaning to the building blocks; it can only decide how and in what order the building blocks go together.

--
-- Waht? Tehr's a preveiw buottn?
Re:just thought.. by Bogtha · 2005-08-31 17:52 · Score: 0

I'd rather people understand what I am saying than be funny.

--
Bogtha Bogtha Bogtha
Re:just thought.. by FhnuZoag · 2005-08-31 19:14 · Score: 1

Where is the sample data going to come from?
Maybe we can use spam emails v!@gra eafsdfjuas
Oh wait.
Re:just thought.. by Wizdumb · 2005-08-31 19:30 · Score: 1

Now to make it fit into a fish-looking earplug that translates sound waves into your own language
Re:just thought.. by jaavaaguru · 2005-08-31 20:37 · Score: 4, Interesting

Perhaps the algorithm could be used to identify spam more accurately. If it can understand the text, then it's got a reasonable chance of knowing whether the text is junk.

--
Follow me
Re:just thought.. by Anonymous+Writer · 2005-08-31 20:38 · Score: 0

"rediculous" is a perfectly cromulent word.
Re:just thought.. by psm321 · 2005-08-31 20:42 · Score: 2, Informative

From what I understand, google's thing was using purely statistics (i.e. a matches with b all the time in translations so when you see a, translate it to b), while this one actually "understands" the underlying grammer.
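The purely statistical idea described here ("a matches with b all the time") can be sketched as co-occurrence counting over a sentence-aligned parallel corpus. This is a minimal sketch, assuming a tiny hypothetical English/Spanish corpus and Dice-coefficient scoring. Dice is a textbook association measure swapped in for illustration. This is not Google's actual system, and it does not model grammar at all.

```python
from collections import Counter
from itertools import product

def dice_scores(pairs):
    """Score each (source word, target word) pair by the Dice coefficient:
    2 * (#sentence pairs containing both) / (#containing src + #containing tgt)."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return {(s, t): 2 * n / (src_count[s] + tgt_count[t])
            for (s, t), n in joint.items()}

def best_translation(word, scores):
    """Pick the target word that co-occurs most consistently with `word`."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus of aligned sentence pairs.
corpus = [
    ("the house", "la casa"),
    ("the red house", "la casa roja"),
    ("a house", "una casa"),
    ("the car", "el coche"),
]
scores = dice_scores(corpus)
print(best_translation("house", scores))  # casa
print(best_translation("the", scores))    # la
```

Note that "the" maps to "la" only because "la" happens to dominate this tiny corpus. That is exactly the kind of statistical shortcut that contrasts with learning the structure of a single language.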
Re:just thought.. by qurk · 2005-08-31 20:49 · Score: 1

I agree with you. Unfortunately, I think you fell prey to an anonymous coward troll who was being deliberately obtuse :) I mean a computer comparing words to a dictionary may have a hard time comprehending...but a person who reads English? There have been studies that you just need like the first letter and the last letter and then you can like leave the vowels out and most people can still comprehend the word, except for boring people who will reply just to complain about your inadequate use of the English langauge. Too much bitchslapping by high school english teachers!!! :)
Re:just thought.. by maxwell+demon · 2005-08-31 20:53 · Score: 1

Indeed, it would help the problem that you can only get one of them, instead of directly getting a massive, incomplete dictionary with errors :-)

massive: clear, I think.
incomplete: because it will only cover the words on pages you have visited. So it won't contain a lot of words (given the type of pages you usually surf, I guess the missing words would be those which are used primarily in literature).
with errors: Wel, their haas bean enuff discusion off thhe probblem een othre ansers two yoor poust. :-)

Ok, there's one pair which it won't reconcile: Even with that technology you won't get a dictionary which is both small and massive :-)

--
The Tao of math: The numbers you can count are not the real numbers.
Re:just thought.. by thomasa · 2005-08-31 22:51 · Score: 1

I wish this could be used with a small plugin in my brain. My non-native language learning skills are abysmal. Actually from the title of the article I thought it was talking about human learning not computer learning.
Re:just thought.. by KDR_11k · 2005-08-31 23:27 · Score: 1

What about non-native speakers?

--
Justice is the sheep getting arrested while an impartial judge declares the vote void.
Re:just thought.. by Mr+Z · 2005-09-01 00:40 · Score: 0, Offtopic

You're joke looses all its humor when you explain it. Try explaining grammer instead. Their more likely too get that then this.

--
Program Intellivision!
Re:just thought.. by Pollardito · 2005-09-01 02:04 · Score: 1

what if this could be integrated into a small plugin for your browser(or any program) of choice, that would then generate its own dictionary in your language.
right, because computers speaking l33t is just what we need
Re:just thought.. by Anonymous Coward · 2005-09-01 02:22 · Score: 0

I think he was making another joke about how words are spelt.
Re:just thought.. by igny · 2005-09-01 02:31 · Score: 1

If it can understand the text, then it's got a reasonable chance of knowing whether the text is junk.

It does not have to understand the text to see that it is junk.

--
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Re:just thought.. by Anonymous Coward · 2005-09-01 07:21 · Score: 0

It doesn't understand the text; it understands the grammatical structure behind the text.

How will it be able to differentiate between "never seen before - must be new language/character" and "never seen before - must be junk in SPAM"? If you're going by quantity of data, then feeding it multiple SPAM messages will simply mean it treats them as a new language. No, I haven't RTFA because it's /.ed, and no, I won't be checking back on this comment.
Re:just thought.. by Hognoxious · 2005-09-09 21:30 · Score: 1

"this one actually "understands" the underlying grammer"
Yes, that's due to his clear Bostonian accent.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:just thought.. by psm321 · 2005-09-15 10:14 · Score: 1

While we're at it...a string, a bareword, and another string without anything to join them? I'm not aware of any language that would be able to parse that (including human languages). :-)

Sucks to be a support tech in India by HeLLFiRe1151 · 2005-08-31 16:07 · Score: 5, Funny

Their jobs be outsourced to computers.

--
I've got 101 mod points and you can't have them!

Re:Sucks to be a support tech in India by Anonymous Coward · 2005-08-31 16:16 · Score: 2, Funny

Their jobs be outsourced to computers.

They is?
Re:Sucks to be a support tech in India by doxology · 2005-08-31 19:13 · Score: 2, Funny

Still a few bugs in this grammar interpretation software ;-)

--
sigfault. core dumped.
Re:Sucks to be a support tech in India by meadowsp · 2005-08-31 22:20 · Score: 0

Y'arghh matey, that they be.
Re:Sucks to be a support tech in India by A.Chwunbee · 2005-08-31 23:30 · Score: 0

be outsourced = are being outsourced. Are you not speaking the queens english, you foolish old chap?

--
select * from base where originalOwner = 'you' and currentOwner != 'us'. 0 rows returned.

Didn't Google already do this? by powerline22 · 2005-08-31 16:08 · Score: 5, Interesting

Google apparently has a system like this in their labs, and entered it into some national competition, where it pwned everyone else. Apparently, the system learned how to translate to/from Chinese extremely well, without any of the people working on the project knowing the language.

Re:Didn't Google already do this? by Anonymous Coward · 2005-08-31 16:45 · Score: 2, Informative

Here's the link at Google's blog.
Re:Didn't Google already do this? by spisska · 2005-08-31 17:19 · Score: 5, Interesting

IIRC, Google's translator works from a source of documents from the UN. By cross-referencing the same set of documents in all kinds of different languages, it is able to do a pretty solid translation built on the work of goodness knows how many professional translators.

What is a little more confusing to me is how machine translation can deal with finer points in language, like different words in a target language where the source language has only one. English for example has the word "to know" but many languages use different words depending on whether it is a thing or a person that is known. Or words that relate to the same physical object but carry very different cultural connotations -- the word for female dog is not derogatory in every language, for example, but some other animals can be extremely profane depending on who you talk to.

Or situations where two entirely different real-world concepts mean similar things in their respective language -- in English, for example, you're up shit creek, but in Slavic languages you're in the pussy.

I've done translation work before (Slovak -> English), and there's much more going on than differences in words and grammar. There are whole conceptual frameworks in languages that just don't translate, and this is frustrating for anyone learning a language, let alone trying to translate. English is very precise (when used as directed) in matters of time and sequence -- we have more than 20 verb tenses where most languages get away with three.

Consider this:

I was having breakfast when my sister, whom I hadn't seen in five years, called and asked if I was going to the county fair this weekend. I told her I wasn't because I'm having the painters come on Saturday. They'll have finished by 5:00, I told her, so we can get together afterwards.

These three sentences use six different tenses: past continuous, past perfect, past simple, present continuous, future perfect, and present simple, and are further complicated by the fact that you have past tenses referring to the future, present tenses referring to the future, and the wonderful future perfect tense that refers to something that will be in the past from an arbitrary future perspective, but which hasn't actually happened yet. Still following?

On the other hand, English is much less precise in things like prepositions and objects, and utterly inexplicable when it comes to things like articles, phrasal verbs, and required word order -- try explaining why:

I'll pick you up after work

I'll pick the kids up after work

I'll pick up the kids after work

are all OK, but

I'll pick up you after work

is not.
Machine translation will be a wonderful thing for a lot of reasons, but because of these kinds of differences in languages, it will be limited to certain types of writing. You may be able to get a computer to translate the words of Shakespeare, but a rose, by whatever name, is not equally sweet in every language.
Re:Didn't Google already do this? by AJWM · 2005-08-31 17:36 · Score: 2, Insightful

but

I'll pick up you after work

is not.

It can be, depending on context or emphasis. "I'll pick up the kids after lunch. I'll pick up you after work."

--
-- Alastair
Re:Didn't Google already do this? by Anonymous Coward · 2005-08-31 18:12 · Score: 0

This paper isn't at all about machine translation. It's about a technique for learning the grammar of any one particular language.
Re:Didn't Google already do this? by plumby · 2005-08-31 22:31 · Score: 1

From what I could get from the article, I'm not sure this is attempting to do translation, or even real comprehension.

It's looking for common patterns in sentences (e.g., they often start with "The") and then creating its own, following the same structure.

My wife's a teacher, and she's got a kids' game called Silly Sentences, which is basically a simple jigsaw puzzle that only allows you to put words in a particular order (adverbs will only fit before verbs etc), and this allows you to make sentences that are pretty much guaranteed to make grammatical sense, even if you don't understand what the words on the cards mean.

My guess is that the program is looking to get the equivalent of the jigsaw shapes of the words (i.e., where they are allowed to be placed relative to other words), not their actual meaning.
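The "jigsaw shape" intuition can be sketched as simple distributional grouping: words that appear between the same neighbours are treated as candidates for the same slot. This is a minimal sketch, assuming a hypothetical toy corpus and exact (left, right) neighbour matching. The algorithm in the paper is considerably more elaborate, and this is not its implementation.

```python
from collections import defaultdict

def slot_classes(sentences):
    """Map each (left neighbour, right neighbour) context to the set of words
    seen in it; words sharing a context are candidates for the same slot."""
    by_context = defaultdict(set)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            by_context[(tokens[i - 1], tokens[i + 1])].add(tokens[i])
    # Keep only contexts that actually license more than one filler.
    return {ctx: words for ctx, words in by_context.items() if len(words) > 1}

# Hypothetical toy corpus.
corpus = [
    "the cat sleeps",
    "the dog sleeps",
    "the cat runs",
    "a dog runs",
]
for (left, right), words in slot_classes(corpus).items():
    print(left, "_", right, sorted(words))
```

On this toy corpus, `cat` and `dog` share the slot between `the` and `sleeps`, `the` and `a` share the slot before `dog`, and `sleeps` and `runs` share the sentence-final slot. The program finds these "shapes" without knowing what any of the words mean, which is plumby's point.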
Re:Didn't Google already do this? by Jerf · 2005-09-01 02:08 · Score: 1

Just to be clear, that makes the problem worse, not better.

(I think you just meant to point out the possibility, AJWM; I don't mean this to be targeted at you. But it's worth mentioning that this increases the number of decisions a translator has to make.)
Re:Didn't Google already do this? by ephemeraleuphoria · 2005-09-01 02:35 · Score: 1

... the emphasis just allows it to sound right for a second, but I don't think that's grammatically correct at all.

--
michael greene

yeah by wesman83 · 2005-08-31 16:08 · Score: 1

only 999,000 more components and we'll have ourselves a positronic net

SCIgen by OverlordQ · 2005-08-31 16:09 · Score: 5, Interesting

SCIgen anyone?

--
Your hair look like poop, Bob! - Wanker.

PDF of paper by mattjb0010 · 2005-08-31 16:10 · Score: 5, Informative

Paper here for those who have PNAS access.

Re:PDF of paper by Anonymous Coward · 2005-08-31 16:12 · Score: 0

I have penis access but it's not helping me get to your site.
Re:PDF of paper by ksw2 · 2005-08-31 16:36 · Score: 5, Funny

Paper here for those who have PNAS access.
HEH! funniest meant-to-be-serious acronym ever.
Re:PDF of paper by downbad · 2005-08-31 17:10 · Score: 2, Informative

The project also has a website where you can download crippled implementations of the algorithm for Linux and Cygwin.
Re:PDF of paper by Anonymous Coward · 2005-09-01 00:15 · Score: 0

http://neuron.tau.ac.il/~horn/publications/pnas.pdf - link directly to the paper, in PDF format and free to view.
Re:PDF of paper by jafac · 2005-09-01 05:04 · Score: 1

Heh - reminds me of a time when I used to work at a startup back in the early 90's, P?????????? Software.

When we decided to put together a training/certification program for our resellers and field reps, it was decided to call it the P????????? Enterprise Network Integration Specialist program. When you completed it, you were a PENIS. It was a joke, made in a meeting, but the Tech Writer who took the minutes didn't get it, and drafted the whitepaper and plan under that title, until she presented it to the CEO.

The program eventually ended up just copying Novell, and the certification was "PNE".

Ah, the good old days.

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Woah by SpartanVII · 2005-08-31 16:11 · Score: 4, Funny

Imagine if the editors started using this, what would everyone have to bitch about on Slashdot?

Re:Woah by Anonymous Coward · 2005-08-31 16:16 · Score: 2, Funny

dupes...
Re:Woah by Anonymous Coward · 2005-08-31 16:36 · Score: 0, Offtopic

dupes...
Re:Woah by uberdave · 2005-08-31 16:57 · Score: 0, Flamebait

Problem is that dupes and typos are so common here, that the algorithm would consider them part of the language.

--
"I'm not impatient. I just hate waiting." - My Dad

Noam Chomsky by MasterOfUniverse · 2005-08-31 16:12 · Score: 2, Informative

This is a perfect opportunity to remind everyone that it's Chomsky's contribution to linguistics which enabled this amazing (if true) achievement. For those of you who don't know Chomsky, he is the father of modern linguistics. Many would also know him as a political activist. An amazing character. http://www.sk.com.br/sk-chom.html

--
"There is no flag large enough to cover the shame of killing innocent people."--Howard Zinn

Re:Noam Chomsky by myowntrueself · 2005-08-31 16:48 · Score: 1

What will be really interesting will be when it's exposed to actual natural languages as used by actual, normal, human beings (as opposed to pedants and linguists).

Many English speakers (and writers) appear to actively avoid using the actual *official* rules of English grammar (and I'm sure this is true of other languages and their native speakers too).

I've always assumed that natural language comprehension (as it happens in the human brain) is mostly massively parallel guesswork based on context, since I've worked with non-native speakers whose English is *appalling* and yet whom I manage to understand pretty well most of the time.

Presumably, this 'gadget' will barf if there really are no rules in such natural language usage...

Or will it just 'pretend' that it found rules? (which is what I imagine that linguists like Chomsky do).

(I don't think I count as a 'linguist' but I did it to stage 3 at university).

--
In the free world the media isn't government run; the government is media run.
Re:Noam Chomsky by Jeremi · 2005-08-31 17:08 · Score: 1

Presumably, this 'gadget' will barf if there really are no rules in such natural language usage...

There must be some rules in natural language, otherwise how would anyone be able to understand what anyone else was saying? The rules used may not be the "official" rules of the language, and they may not even be clearly/consciously understood by the speakers/listeners themselves, but that doesn't mean they aren't rules.

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Noam Chomsky by Anonymous Coward · 2005-08-31 17:15 · Score: 0

lets not forget the world's best comic:
www.postmodernhaircut.com
Re:Noam Chomsky by venicebeach · 2005-08-31 17:18 · Score: 5, Insightful

Perhaps a linguist could weigh in on this, but it seems to me that this kind of research is quite contrary to the Chomskian view of linguistics.

Instead of a language module with specialized abilities tuned to learn rule-based grammar, we have an unsupervised learning system that has surmised the grammar of the language merely from the patterns inherent in the data it is given. That a system can do this is evidence against the notion that an innate grammar module in the brain is necessary for language.
Re:Noam Chomsky by hunterx11 · 2005-08-31 17:29 · Score: 4, Insightful

Linguistics has nothing to do with prescriptive grammar, except perhaps studying what influence it has on language. Something like "don't split infinitives" is not a rule in linguistics. Something like "size descriptors come before color descriptors in English" is a rule, because it's how people actually speak. Incidentally, most people are not even aware of these rules in their native language, despite obviously having mastery over them.
If there were no rules, I could write a post using random letters for random sounds in a random order, or just using a bunch of non-letters. That wouldn't convey anything. Saying "I'm writing on slashdot" is more effective than writing "(*&$@(&^$)(#*$&"

--
English is easier said than done.
Re:Noam Chomsky by Michael+Woodhams · 2005-08-31 17:35 · Score: 1

IANAL either, but I had the same reaction.

--
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
Re:Noam Chomsky by SparksMcGee · 2005-08-31 17:53 · Score: 4, Insightful

I took a linguistics class this previous year with a professor who absolutely disagreed with the Chomskyan view of linguistics (though she did acknowledge that he had contributed a great deal to the field). Some of the arguments against Chomsky include objections to the Chomskyan view of "universal grammar"--that essentially a series of neural "switches" determine what language a person knows and that these in turn are purely grammatical in nature (the lexicon of different languages qualifying as "superficial"--in and of itself a somewhat tenable argument). While this holds reasonably well for English and closely related languages (English grammar in particular depends a tremendous amount upon word order and syntax, and thus lends itself well to this sort of computational model), in many languages the lines between nominally "superficial" categories--e.g. phonology, lexicon and syntax--become blurred, especially in, for instance, case languages. Whereas you can break down the grammatical elements of an English sentence fairly easily into "verb phrases", "noun phrases" and so on, this is largely because of English syntactical conventions. When a system of prefixes and suffixes can turn a base morpheme from a noun phrase to a verb phrase or any of various parts of speech, the kinds of categories to which English morphemes and phrases lend themselves become much harder to apply. Add to this the fact that there exist languages (e.g. Chinese) in which grammatically superficial categories (in English) like phonology become syntactically and grammatically significant, and the sheer variety of linguistic grammars either seriously undermines the theory in general or forces upon one the Socratic assumption that everyone knows every language and every possible grammar from birth and simply needs to be exposed to the rules of whatever their native language is and to pick up superficialities like lexicon to become a fluent speaker.
It's not all complete nonsense, but if it were truly correct then presumably computerized translation software (with the aid of large dictionary files for lexicons) would have been perfected some time ago.

Sorry about the rant, but like I said, my prof did *not* like the Chomskyan view of linguistics.
Oh, and as far as the notion of the "language module" goes, it might be premature to call it a module, but there *is* neurophysiological evidence to suggest that humans are physically predisposed towards learning language from birth, so that much at the very least is tenable.
Re:Noam Chomsky by lupin_sansei · 2005-08-31 18:38 · Score: 1

I don't see how this contradicts the theory of an innate ability to learn language that Chomsky put forth. This program itself has an innate grammar: the program itself, which the programmers endowed with the ability to learn from a text. Besides, a human actually understands the meaning of the text once it learns it, whereas this program merely discovers pattern rules statistically and generates new sentences from those patterns. The program doesn't actually understand what it generates.

--
http://www.perthonline.net
Re:Noam Chomsky by pjpII · 2005-08-31 18:41 · Score: 2

Of more interest might be that it probably disproves Chomsky's theories of language acquisition, which rest on a basis of a prior/innate facility for language acquisition grounded in prior knowledge of some sort (i.e. Universal Grammar, his most famous contribution to linguistics) in the brains of language learners, while this program works with no prior knowledge and only a statistical framework.

So Chomsky might not be too happy, as this program could potentially disprove his life's work.
Re:Noam Chomsky by stephentyrone · 2005-08-31 19:20 · Score: 2, Interesting

Actually, this fits very tidily in a Chomskian context. The program has an internal, predetermined notion of "what a grammar looks like" (i.e. a class of allowable grammars sharing certain properties), and adapts that to the source text. The way all this is presented makes it seem like unsupervised learning that can find any pattern, but the best you can hope to do with a method like this is capture an arbitrary (possibly probabilistic) context free grammar (CFG).

Even then, Gold showed a long, long time ago (1967) that the task of inducing an arbitrary CFG using only generated strings from the language is basically hopeless [Gold, E. Mark. 1967. Language Identification in the Limit. Information and Control, 10:447-474].

That said, this doesn't even seem to be that novel (to me). Andreas Stolcke wrote a very nice PhD dissertation in 1994 on learning arbitrary PCFGs from language strings [Stolcke, Andreas. 1994. Bayesian Learning of Probabilistic Language Models. PhD Dissertation. University of California at Berkeley.]

This is probably a better, more efficient method than Stolcke produced back in '94, but I would be *very* surprised if it revolutionized the way computers interact with language, or anything else of the sort. People working in computational linguistics have a nasty habit of making grand pronouncements, only to fall far short of what they claimed.

For the record: IANAL, but i play one on TV, by which i mean i'm an applied mathematician with a couple published papers in computational linguistics.
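For readers unfamiliar with the term used above: a probabilistic context-free grammar is a set of rewrite rules with a probability attached to each alternative. Generating a sentence means recursively expanding symbols until only words remain. This is a minimal sketch, assuming a hypothetical toy grammar whose rules and probabilities are invented purely for illustration. It shows only the kind of object stephentyrone says such learners can at best recover. It does not show how to induce one, which is the hard part Gold and Stolcke address.

```python
import random

# Hypothetical toy PCFG: each nonterminal maps to (probability, expansion) pairs.
GRAMMAR = {
    "S":   [(1.0, ["NP", "VP"])],
    "NP":  [(0.7, ["Det", "N"]), (0.3, ["Det", "Adj", "N"])],
    "VP":  [(0.6, ["V"]), (0.4, ["V", "NP"])],
    "Det": [(0.5, ["the"]), (0.5, ["a"])],
    "Adj": [(1.0, ["red"])],
    "N":   [(0.5, ["cat"]), (0.5, ["dog"])],
    "V":   [(0.5, ["sees"]), (0.5, ["sleeps"])],
}

def generate(symbol, rng):
    """Expand `symbol` recursively; anything not in GRAMMAR is a terminal word."""
    if symbol not in GRAMMAR:
        return [symbol]
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    # Pick one expansion for this nonterminal according to its rule probabilities.
    _, expansion = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for child in expansion:
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(" ".join(generate("S", rng)))
```

Gold's result concerns the reverse direction: given only strings like these, without negative examples, identifying the grammar that produced them is in general not possible in the limit.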
Re:Noam Chomsky by Anonymous Coward · 2005-08-31 19:27 · Score: 0

Saying "I'm writing on slashdot" is more effective than writing "(*&$@(&^$)(#*$&"
Unfortunately, (*&$@(&^$)(#*$& is a perl program which outputs "I am writing on Slashdot."
Re:Noam Chomsky by taioankok · 2005-09-01 00:45 · Score: 1

I speak Chinese, and wonder what exactly you mean. Phonology is not exactly syntactically or grammatically significant, more semantically significant-- it being a tonal language and such. And the variety of grammars does not undermine this theory of Chomsky per se. That is to say, if Chomsky were right we wouldn't necessarily have very good computerized translation software. For one, we really have no basic understanding of the language sections of the brain, or how humans really deal with language, or even how much of our ability is something other mammals can grasp. The proof or disproof of these theories seems far off.

--
JC "What"
Re:Noam Chomsky by edibleplastic · 2005-09-01 01:25 · Score: 0, Redundant

This won't disprove Chomsky's theories, at most it will serve as evidence that language can be learned through statistical means. The reason it won't disprove anything is because we're ultimately interested in the way that *humans* learn language. Whether or not it's possible to learn a language solely through statistical means doesn't change the fact of the matter for humans, which may or may not have a genetic endowment for learning language. It's entirely possible that it's possible in principle to learn language this way, but we do it with some priors (the universal grammar).

There have been basically two prongs of the arguments in favor of the Universal Grammar debate. The first is that the task of learning an infinite grammar from a finite subset of sentences (and then only from positive evidence) appears to be too difficult to accomplish solely through statistical means. The second is an effort to show that language learning is biologically- rather than experience-based. This is the effort to show that there is a critical period in language development, which would suggest that there is a strong biological (i.e., genetic) component to language learning.

In my opinion, the first prong isn't very strong, since it relies on assumptions about statistical learning to make its claims. Their claims to me seem to stem more from a lack of imagination than from anything we can pin down as logically necessary. Shimon Edelman's work would count against this prong, showing that yes, it is possible to learn a language via statistical means. (It would still have to be shown that the knowledge the computer possesses is qualitatively similar to that learned by humans... it may learn languages in a completely different way).

His findings wouldn't affect the second prong at all, though, which to my mind is the stronger of the two approaches. There have been lots of studies which suggest that there is a biological timecourse for language acquisition, suggesting that we do have an innate capacity for it.

So to sum up, while I find it a very exciting and important finding, I don't believe it will disprove the theory of Universal Grammar.
Re:Noam Chomsky by edibleplastic · 2005-09-01 01:29 · Score: 2, Interesting

This won't disprove Chomsky's theories, at most it will serve as evidence that language can be learned through statistical means. The reason it won't disprove anything is because we're ultimately interested in the way that *humans* learn language. Whether or not it's possible to learn a language solely through statistical means doesn't change the fact of the matter for humans, which may or may not have a genetic endowment for learning language. It's entirely possible that it's possible in principle to learn language this way, but we do it with some priors (the universal grammar).

There have been basically two prongs of arguments in favor of the existence of a Universal Grammar in the debate. The first is that the task of learning an infinite grammar from a finite subset of sentences (and then only from positive evidence) appears to be too difficult to accomplish solely through statistical means. The second is an effort to show that language learning is biologically- rather than experience-based. This is the effort to show that there is a critical period in language development, which would suggest that there is a strong biological (i.e., genetic) component to language learning.

In my opinion, the first prong isn't very strong, since it relies on assumptions about statistical learning to make its claims. Their claims to me seem to stem more from a lack of imagination than from anything we can pin down as logically necessary. Shimon Edelman's work would count against this prong, showing that yes, it is possible to learn a language via statistical means. (It would still have to be shown that the knowledge the computer possesses is qualitatively similar to that learned by humans... it may learn languages in a completely different way).

His findings wouldn't affect the second prong at all, though, which to my mind is the stronger of the two approaches. There have been lots of studies which suggest that there is a biological timecourse for language acquisition, suggesting that we do have an innate capacity for it.

So to sum up, while I find it a very exciting and important finding, I don't believe it by itself will disprove the theory of Universal Grammar.
Re:Noam Chomsky by lysergic.acid · 2005-09-01 01:33 · Score: 1

maybe the algorithm is the language module?

i don't know much about linguistics, but from what i gather, he proposed that there were universal linguistic rules/constructs that underlie all human languages. so wouldn't this unaided translation algorithm only strengthen that notion by providing a unifying formula that can interpret all languages through common structural/syntactic underpinnings?
Re:Noam Chomsky by Anonymous Coward · 2005-09-01 01:53 · Score: 0

You wrote:

This won't disprove Chomsky's theories; at most it will serve as evidence that language can be learned through statistical means. The reason it won't disprove anything is that we're ultimately interested in the way that *humans* learn language. Whether or not it's possible to learn a language solely through statistical means doesn't change the fact of the matter for humans, who may or may not have a genetic endowment for learning language. It's entirely possible that language can in principle be learned this way, but that we do it with some priors (the universal grammar).
------

Well, remember that the "language organ" is supposed to shut off when you turn about eighteen. So Chomsky has said that second languages, esp. if learned as an adult, are mastered without the benefit of the language organ. He has also claimed that learning the language this way isn't really learning it, for some reason, but this seems a philosophical bias on his part.

Cheers,

JHVH1
Re:Noam Chomsky by Etienne+Steward · 2005-09-01 02:36 · Score: 1

There have been basically two prongs of the arguments in favor of the Universal Grammar debate. The first is that the task of learning an infinite grammar from a finite subset of sentences (and then only from positive evidence) appears to be too difficult to accomplish solely through statistical means. The second is an effort to show that language learning is biologically- rather than experience-based. This is the effort to show that there is a critical period in language development, which would suggest that there is a strong biological (i.e., genetic) component to language learning.

Chomsky's work was important because it re-oriented the way we look at language acquisition (much like Freud changed the way we look at psychology). It doesn't mean that his model is accurate.

The fact that cognitive and sociolinguistic strategies work well to teach and learn languages after age 12 (which, if I remember correctly, is considered the "cut-off" for first language acquisition) is proof that Chomsky's theories of language acquisition are flawed. The implication here is that once the biology (in the brain) is there, cognitive skills and the ability to analyze (and sensitivity to) cultural and pragmatic cues become more important than biology. Chomsky, if I remember my reading right, would contend that a non-native speaker of a language would never learn to speak "accent free" in a language he or she were attempting to acquire. This is false, as there have been many cases (not just with native English speakers going to, say, Russian, but also with Russian speakers going to German, Putin being a famous example, or Chinese speakers going to English) in which adult speakers of one language have acquired a second language without an accent, or with an "appropriate", "place-able" accent. German exchange students to the US appear to be examples of the latter, as every one I have ever met sounded like he or she came from Nebraska.

Chomsky's contribution was real -- in terms of provoking us to look at language in a different way -- and for that he should go down in history...But his model is, in fact, flawed.
Re:Noam Chomsky by ChrisA90278 · 2005-09-01 04:12 · Score: 1

You are exactly right. I doubt there is a "grammar module" in the brain. Note that we can learn sign language, music and reading. I think these other abilities are due to a general-purpose ability to recognise patterns. Language itself must have evolved from something that was not language, likely from some general ability to recognise patterns and make abstractions.
Re:Noam Chomsky by edibleplastic · 2005-09-01 04:17 · Score: 1

What you say is entirely true, with the subtle modification that recent research has suggested that it's not quite an on-off switch, rather a general decline in ability after the onset of puberty (e.g., ~12).

The situation with post-critical-period language learning is a bit tricky. One possibility is that, like you said, you no longer have access to any genetic predisposition and you have to learn languages via other (i.e., statistical) means. But this probably isn't what actually happens. If it were strictly a matter of performing statistical learning, then individuals who have not learned a first language by puberty should be able to learn one subsequently. Evidence is clearly weak on this matter (very few people are placed in situations where they aren't exposed to a first language), but evidence from people who weren't exposed to language suggests that they can't do so after puberty. The case of Genie, a girl who was deprived of linguistic input until she was 14, is one of these cases. When she was rescued, intensive efforts were made to teach her to speak, but she made relatively poor progress and ultimately failed to learn English to any substantial degree.

An alternative is that while you may not have contact with Universal Grammar after puberty, your language learning systems have been shaped by their earlier contact with it. This would explain why people learning their second language after puberty (and the supposed critical period) manage to learn the language while people like Genie (and deaf children born to hearing parents who aren't exposed to sign language) don't.

I would agree with Chomsky that post-critical-period language learning is different. For one thing, it is massively more effortful. Infants learn language automatically, with no explicit teaching. Adults, however, require extensive training and even then don't fare so well. Language ability is much more homogeneous among children (everybody learns their language to an expert level), whereas adults show a wide range of abilities. Of course adult language learning is modulated by a billion different factors, including motivation, exposure, practice, etc., but even the fact that those matter suggests the two processes are somehow different.

good point, though.
Re:Noam Chomsky by Anonymous Coward · 2005-09-01 05:01 · Score: 0

You wrote:

I would agree with Chomsky that post-critical-period language learning is different. For one thing, it is massively more effortful. Infants learn language automatically, with no explicit teaching. Adults, however, require extensive training and even then don't fare so well. Language ability is much more homogeneous among children (everybody learns their language to an expert level), whereas adults show a wide range of abilities. Of course adult language learning is modulated by a billion different factors, including motivation, exposure, practice, etc., but even the fact that those matter suggests the two processes are somehow different.

----

Also of interest is that the ability to learn, for example, how to play a musical instrument or complex games like chess is supposed to decline around the same time.

Although you do get exceptions. Blackburne learned the moves as an adult; Nabokov picked up English in his 20s; Hendrix apparently came relatively late to the guitar.

Not sure how these facts impact Chomsky's original contention, although I remember one of my old linguistics teachers seemed to feel they were connected in subtle ways (do we learn musical/chess "syntax" all through one organ?).

Cheers,

JHVH1
Re:Noam Chomsky by wackybrit · 2005-09-01 08:37 · Score: 1

£@4343@@£$$, %$£12%$£HJwx? D9 2jdh8d£$£ !@^&.

--
mogorific carpentry experiments
Re:Noam Chomsky by s388 · 2005-09-01 10:03 · Score: 1

you must not have noticed:

1) you didn't learn to read by walking through a library and picking up books. no, you had to be explicitly taught. nobody learns to read spontaneously, but people learn to speak and use language expertly without teaching (and, no, mothers saying "honey, it's CAUGHT, not CATCHED... say CAUGHT!" doesn't have anything to do with it). the only way anybody could possibly learn to read in a way comparable to the way people learn to use language is if written sentences/text were immediately present in the context to which they refer (at least present to the same degree that situational context is present for the vocal utterances that refer to it).

2) music isn't anything like a spoken human language. most people are explicitly taught to read music, whereas children spontaneously learn languages, quite automatically, just from exposure to the ambient language. secondly, practically all human beings learn their native language (easily) to a proficiency that only a small few musicians achieve (instrumentally/compositionally). i still don't agree with the language/music relationship you're throwing out there [as two cents....], but if you actually think about them, the facts about music versus language tend to negate your point.

3) sign languages are as grammatically complex as (sometimes even richer than) spoken languages like english. sign languages have syntactic properties just like spoken languages. and interestingly, deaf babies make hand motions that are equivalent to a hearing baby's vocal babble. sign languages have a "sensitive" period for learning: an adult cannot learn a sign language to fluency, just as [almost] no adults can learn a spoken language to fluency in adulthood. please notice that sign language, when used, is exactly parallel to spoken language, but READING/text/written language is not.

the real thrust of this is: if it was A GENERAL ABILITY OF PATTERN RECOGNITION THAT ALLOWED YOU TO LEARN LANGUAGE, why is there a sensitive period? why can't people in adulthood learn new languages to fluency? anyway, even if you tried you couldn't possibly identify all the "patterns" that are present in the sentences you are writing. your knowledge is unconscious. so, no, language doesn't rest on a "general purpose ability" to recognize patterns. there are language-impaired people who can recognize all kinds of patterns just fine, but they can't form a sentence (even when they can speak separate words perfectly fine). your post is half-ass and pretty miserable. why am i even bothering?
Re:Noam Chomsky by rikai · 2005-09-01 11:19 · Score: 1

Yes, but if it's shown that syntax can be learned with general learning abilities rather than UG, then Occam's razor comes into effect and I call bulls**t on any kind of innate UG. The biological arguments for innate-language-learning abilities are red-herrings. Sure, of course we have some biology to help with language--Chimps don't learn to speak, after all. But that's a far cry from UG and any of Chomsky's grammar systems (Plus, they're always overstated. The fact is, adults who are *completely* immersed learn language in 6 months or so--i.e., FAR faster than children)
Re:Noam Chomsky by SparksMcGee · 2005-09-14 10:42 · Score: 1

You're quite right about the distinction between semantics and syntax. I honestly don't know enough about Chinese to know whether or not different tones actually represent different parts of speech (as with English "thought" the noun and "thought" the verb form), in which case they would be syntactically significant. However, like I said, I don't know enough about Chinese to know whether or not this actually happens.
Re:Noam Chomsky by taioankok · 2005-09-14 15:28 · Score: 1

Tones only occasionally change when the word has a different part of speech. You could count the words in modern use with such patterns on one hand. Most words have no phonetic change even if part of speech changes.

--
JC "What"

Speaking as someone working on NLP by OO7david · 2005-08-31 16:12 · Score: 4, Interesting

IAALinguist doing computational things and my BA focused mainly on syntax and language acquisition, so here're my thoughts on the matter.

It's not going to be right. The algorithm is described as being statistically based, which, while similar to the way children learn languages, is not exactly how they learn. Children learn by hearing correct native language from their parents, teachers, friends, etc. The statistics come in when children produce utterances that do not conform to the speech they hear, or when people correct them. However, statistics does not come in at all with what they hear.

With respect to the algorithm learning the underlying grammar of a language, I am dubious enough to call it a grand, untrue claim. Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chompsky dies. Think about the word "do" in english. No view of syntax describes from where that comes. Rather languages are shoehorned into our constructs.

So, either they're using a flawed view of syntax or they have a new view of syntax and for some reason aren't releasing it in any linguistics journal as far as I know.

Re:Speaking as someone working on NLP by tepples · 2005-08-31 16:16 · Score: 2, Insightful

However, statistics does not come in at all with what they hear.

Utterance in pattern A is heard more often than utterance in pattern B; utterances in patterns C and D are not heard at all. How is that not statistics?
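tepples's point can be made concrete with a minimal Python sketch. The toy utterances are hypothetical, and counting adjacent word pairs is just the simplest possible statistic, not the method ADIOS uses: merely tallying what is heard, and noticing what never is, already gives frequencies a learner could exploit.

```python
from collections import Counter

def bigram_counts(utterances):
    """Count adjacent word pairs across all heard utterances."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.split()
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical input a child might hear.
heard = ["are you hungry", "are you tired", "is he hungry", "are you hungry", "is she tired"]
counts = bigram_counts(heard)

print(counts[("are", "you")])  # 3: pattern A, heard often
print(counts[("am", "you")])   # 0: never heard
print(counts[("are", "he")])   # 0: never heard
```

Looking up an unseen pair in a `Counter` returns 0 without adding it, so "never heard" falls out of the same table as "heard three times".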
Re:Speaking as someone working on NLP by OO7david · 2005-08-31 16:29 · Score: 2, Interesting

Insofar as only utterance A is heard. A kid will always hear "Are you hungry" but never "Am you hungry" or "Are he hungry".

Native speakers by definition speak correctly, and that is all the child is hearing.
Re:Speaking as someone working on NLP by mveloso · 2005-08-31 16:30 · Score: 1

However, have you read the paper? Looked at the data? It seems a bit early for you to dismiss their conclusions based on a blog entry.
Re:Speaking as someone working on NLP by Comatose51 · 2005-08-31 16:39 · Score: 2, Informative

Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chompsky dies.
I really don't understand that. How are modern views of syntax unscientific? Also, if Chomsky is such an influence on linguistics, then maybe he's right about it. Aren't you essentially saying that we have no way of arguing with him so let's wait til he dies so he can't argue back? I would think the correct view should win out regardless of the speaker.
Other than what I've studied in cognitive science, I am not in any way, shape, or form a linguist. However, what you say really confuses me and contradicts what I've learned. I can only assume that what you say makes sense given your deeper knowledge. So can you please explain what you mean for the rest of us?
Thanks.

--
EvilCON - Made Famous by /.
Re:Speaking as someone working on NLP by Saanvik · 2005-08-31 16:39 · Score: 1

It's not just statistical. From the article (emphasis mine),
ADIOS relies on a statistical method for pattern extraction and on structured generalization -- two processes that have been implicated in language acquisition.
And while their paper is not being published in a linguistic journal, it is being published in the Proceedings of the National Academy of Sciences (PNAS, Vol. 102, No. 33), which is a well respected cross-discipline journal.
Although I, along with you, am skeptical of this, it sounds like it could be a very interesting article. Don't poo-poo it until you read it.
Re:Speaking as someone working on NLP by FireFlie · 2005-08-31 16:41 · Score: 1

You have never heard someone speak English using poor grammar? Perhaps this is true for smaller subsets of native speakers, but not for a society as a whole (America, at least).
Re:Speaking as someone working on NLP by NitsujTPU · 2005-08-31 16:43 · Score: 1

Almost all of modern NLP relies on statistical approaches.

I'm not going to chime in and start a flame-war, but since your view is rather iconoclastic, I think it only fair to point this out to the Slashdot audience, who are probably not as informed on the topic as you or I.
Re:Speaking as someone working on NLP by OO7david · 2005-08-31 16:47 · Score: 1

That is prescriptive grammar. Descriptive grammar is what linguists study, and that is what people actually speak.

Consider this: who is in charge of language, an institute or the speakers? Natives cannot be wrong about their own language; they can be wrong about the standard, but A) that standard is always changing and B) given A, who then is correct?
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-08-31 16:48 · Score: 0

Perhaps you should read the man's papers (Edelman). I've read perhaps a dozen or two and it's clear that he is a thinker exploring possibilities. That doesn't mean that he's correct (in all instances) rather he formalizes his thoughts via computational and sometimes behavioural models.

That said, I have not read this paper...
Re:Speaking as someone working on NLP by OO7david · 2005-08-31 16:53 · Score: 1

Exactly, I can't be completely quick to dismiss this, but based upon the data given and the fact that I'm working on almost the exact problem as what the algorithm is supposed to solve, it really doesn't mesh.
Re:Speaking as someone working on NLP by OO7david · 2005-08-31 16:55 · Score: 4, Interesting

There are in effect two parts to it:

Chomsky is to linguistics as Freud is to psych. He had great ideas for the time (many still stand), and the science would be nowhere close to where it is without him. However, A) he's backed off a lot from supporting his own theories and B) he's published papers contradicting his original ideas, so there is some question about their veracity. Since so many linguistics undergrads hold him up as the pinnacle of syntax, none are really deviating drastically from him.

WRT the unscientificness: to make his view fit English, there has to be "do-support", which basically means that when forming an interrogative, "do" just comes in to make things work without any explanation. In other words, it is in our grammar, but our view of syntax does not account for it.
Re:Speaking as someone working on NLP by OO7david · 2005-08-31 16:58 · Score: 1

Right, I think that it will be a fascinating read and will ultimately help the project I'm currently doing, but my claim is that if it is linguistic then I highly doubt that it will be fully correct given the flawed (IMO) assumptions.

My argument is based solely upon this blog entry, and what it says doesn't quite seem to add up to me.
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-08-31 16:59 · Score: 0

Statistics are used in NLP because they describe the raw facts, not because language is statistically based. Recurrent pattern processing is the most promising approach to describing how language works.
Re:Speaking as someone working on NLP by lawpoop · 2005-08-31 17:05 · Score: 2, Informative

IIRC, the part of Chomsky's theory that is relevant to this application is that universal grammar is a series of choices about grammar -- i.e. adjectives either come before or after nouns, there are or are not postpositions, etc. I think the actual 'choices' are more obscure, but I'm trying to make this understandable ;)
According to the theory, children come with this universal grammar built into their minds (for some reason, Chomsky seems against genetic arguments, but good luck understanding his reasoning), and they only need to hear a little bit of language in order to choose the proper alternatives in the mind and start building grammatically correct sentences; the rest is just building vocabulary. What seems like a child learning language is actually the language part of the brain growing during development. I believe that these choices are called 'switches' by Chomsky.
(An easy argument for universal grammar is that children make mistakes that are more rule-following than the accepted grammar: words such as 'breaked', 'speaked', 'foots' or 'mouses' are in a sense rule-based corrections of exceptions in the spoken language. So the children follow the rules more closely than the adults; they certainly didn't learn these forms from adults, so they must be applying the rules in their minds.)
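A toy Python sketch of the overregularization pattern described above, assuming a learner that applies one regular rule to every word unless it has already memorized the exception. The word lists and function names are hypothetical and illustrative only; this is not a model of real acquisition.

```python
# Hypothetical adult exceptions the child may not have memorized yet.
IRREGULAR_PAST = {"break": "broke", "speak": "spoke", "go": "went"}
IRREGULAR_PLURAL = {"foot": "feet", "mouse": "mice"}

def past_tense(verb, memorized=frozenset()):
    """Regular -ed rule, overridden only by exceptions the learner has memorized."""
    if verb in memorized and verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    return verb + ("d" if verb.endswith("e") else "ed")

def plural(noun, memorized=frozenset()):
    """Regular -s rule, overridden only by memorized exceptions."""
    if noun in memorized and noun in IRREGULAR_PLURAL:
        return IRREGULAR_PLURAL[noun]
    return noun + "s"

print(past_tense("break"), past_tense("speak"))  # breaked speaked
print(plural("foot"), plural("mouse"))           # foots mouses
print(past_tense("break", {"break"}))            # broke
```

The "errors" are exactly what a rule applied without exceptions produces, which is the sense in which the child is more rule-following than the adults.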
Anywho, to make a program like this, you would just have to put together the switches of universal grammar and then feed in sample data -- probably text spelled with those linguistic homophonic characters, instead of the horrendous English spelling non-system. Chinese characters might be a better sample data in that respect; I don't know.
Note that, contrary to other posters, this would not be a system for building grammatical rules for any 'language' or formal system, such as C++ or XML. It is based on universal grammar, which is a set of options for constructing a human language. So there are built-in assumptions that there will be subjects, verbs, objects, indirect objects, etc. in the language that this program is decoding.
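The "switches, then sample data" idea can be sketched in Python with a single hypothetical switch: do adjectives come before or after nouns? This assumes the learner is already told which words are adjectives and nouns (a real learner would have to work that out too), and the function names and examples are illustrative, not from Chomsky or the ADIOS paper.

```python
def set_adjective_switch(examples, adjectives, nouns):
    """Flip one binary switch according to the adjective/noun order seen most often."""
    before = after = 0
    for sentence in examples:
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            if left in adjectives and right in nouns:
                before += 1
            elif left in nouns and right in adjectives:
                after += 1
    # Ties default to adjective-before-noun (an arbitrary choice for this sketch).
    return "ADJ-NOUN" if before >= after else "NOUN-ADJ"

def noun_phrase(adj, noun, switch):
    """Build a new phrase using whichever setting the switch ended up in."""
    return f"{adj} {noun}" if switch == "ADJ-NOUN" else f"{noun} {adj}"

english = set_adjective_switch(["the red car", "a big dog"], {"red", "big"}, {"car", "dog"})
spanish = set_adjective_switch(["el coche rojo", "un perro grande"],
                               {"rojo", "grande"}, {"coche", "perro"})
print(english, spanish)                       # ADJ-NOUN NOUN-ADJ
print(noun_phrase("verde", "gato", spanish))  # gato verde
```

Two example sentences are enough to set the switch, after which the learner produces word orders it never heard, which is the flavor of "only need to hear just a little bit of language".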

--
Computers are useless. They can only give you answers.
-- Pablo Picasso
Re:Speaking as someone working on NLP by PurpleBob · 2005-08-31 17:08 · Score: 4, Interesting

You're right about Chomsky holding back linguistics. (There are all kinds of counterarguments against his Universal Grammar, but people defend it because Chomsky Is Always Right, and Chomsky himself defends it with vitriolic, circular arguments that sound alarmingly like he believes in intelligent design.)

And I agree that this algorithm doesn't seem that it would be entirely successful in learning grammar. But this is not because it's statistical. I don't understand how you can look at something as complicated as the human brain and say "statistics does not come in at all".

If this algorithm worked, then it could be statistical, symbolic, Chomskyan, or magic voodoo and I wouldn't care. There's no reason that computers have to do things the same way the brain does, and I doubt they'll have enough computational power to do so for a long time anyway.

No, the flaws in this algorithm are that it is greedy (so a grammar rule it discovers can never be falsified by new evidence), and it seems not to discover recursive rules, which are a critical part of grammar. Perhaps it's learning a better approximation to a grammar than we've seen before, but it's not really doing the amazing, adaptive, recursive thing we call language.
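Why recursion matters can be shown with a small Python sketch. Both "grammars" here are hypothetical toys, not what ADIOS learns: a rule that refers to itself covers noun phrases of any depth, while a learner that only memorized the flat shapes it has seen fails as soon as the embedding goes one level deeper.

```python
NOUNS = {"cat", "dog", "rat"}

def is_np(words):
    """Recursive rule: NP -> 'the' N  |  'the' N 'that' 'chased' NP."""
    if len(words) < 2 or words[0] != "the" or words[1] not in NOUNS:
        return False
    rest = words[2:]
    if not rest:
        return True
    return rest[:2] == ["that", "chased"] and is_np(rest[2:])

# A flat learner that only memorized the shapes it saw (depth 0 and depth 1).
FLAT_TEMPLATES = {("the", "N"), ("the", "N", "that", "chased", "the", "N")}

def flat_match(words):
    shape = tuple("N" if w in NOUNS else w for w in words)
    return shape in FLAT_TEMPLATES

deep = "the cat that chased the dog that chased the rat".split()
print(is_np(deep))       # True
print(flat_match(deep))  # False
```

Adding more templates only pushes the failure one level further out; only the self-referential rule generalizes.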

--
Win dain a lotica, en vai tu ri silota
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-08-31 17:12 · Score: 0

You say:

"Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chompsky dies. Think about the word "do" in english. No view of syntax describes from where that comes. Rather languages are shoehorned into our constructs."

What exactly is so puzzling about the word "do"? Its use in English is complex, but hardly counts as a great puzzle of linguistics.

I'm a linguist, and I'm pretty sure that you're either (1) Lying about "working on NLP" or (2) a bad mistake on the part of your employers. You seem to have only the vaguest idea what you're talking about.

Also, while normally I wouldn't consider a spelling glitch very significant, you misspell "Chomsky" in a way that doesn't make sense as a typo, but is a common misspelling among people who aren't very familiar with the name or have heard it spoken more than they've seen it written. Again, I'm pretty sure you have no clue what you're talking about.
Re:Speaking as someone working on NLP by fenodyree · 2005-08-31 17:36 · Score: 1

Could you explain "do-support" further? Google has _very_ few results, and any explanation seems nonsensical. As one example I found stated the question "Do you like me"? as being a "do-support" sentence. Now, this sentence seems fine, do is simply the verb. Take for example: "Be you man or mouse?" now, Be is the verb, you can stick it on the end if you want, "You man or mouse be?" Okay, it is old english, however, "do" is the same way. Do you like? a simple question...So, I am misunderstanding I assume, or linguists are all idiots, and having read some Chomsky, the latter I am not ruling out ;-) ... Do you explain further?

-----------
No grammatical errors are in contained this post here, me native speaker define language thus.
Re:Speaking as someone working on NLP by iabervon · 2005-08-31 17:41 · Score: 1

What exactly is so puzzling about the word "do"? Its use in English is complex, but hardly counts as a great puzzle of linguistics.

Presumably, the distribution of do-support: question inversion, not, and VP ellipsis. I don't really think it's a great mystery, but it's a pain to characterize in Chomsky's formulation of chains.
Re:Speaking as someone working on NLP by Mac+Degger · 2005-08-31 17:52 · Score: 1

But that's not the point, is it? We're talking syntax, grammar, not meaning or form.
'Do' might be an irregular verb (i.e., utterly irrational and not based on logical transformation), but the position of the verb (in any of its forms) will be consistent with the rules of grammar. And with the statistical analysis thrown in, the algorithm might also be able to pick up on the irrational placements a verb sometimes gets put in.

--
-- Waht? Tehr's a preveiw buottn?
Re:Speaking as someone working on NLP by AhtirTano · 2005-08-31 18:19 · Score: 1

Also a linguist.
I read the paper quickly, and skipped some of the more technical details, but nothing in there blew my mind.
Some things I was not impressed by: They gauge the accuracy of the learning process by dividing the text in half, learning from the first half, then testing against the second. Success for a particular sentence is measured by whether or not there is a perfect match between the sentence and a path through the machine. If the grammar their algorithm generates is a superset of the grammar of the language, they can still pass, despite the fact that they did not learn the language correctly. There is also no way to be sure that the paths found are what a human parser would agree to. I realize finding reliable metrics for these things is a genuinely hard problem, but this was not among the more impressive attempts I have seen.
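AhtirTano's superset objection can be illustrated with a deliberately trivial Python sketch, assuming a learned grammar is modeled as a yes/no acceptance function. The two learners and the toy corpus are hypothetical, not the paper's algorithm or data. A "grammar" that accepts every string passes the held-out test perfectly; only a check against non-sentences exposes it.

```python
def split_half(corpus):
    """Learn from the first half, test on the second."""
    mid = len(corpus) // 2
    return corpus[:mid], corpus[mid:]

def acceptance_rate(accepts, sentences):
    """Fraction of the given sentences that the learned grammar accepts."""
    return sum(accepts(s) for s in sentences) / len(sentences)

def learn_memorizer(train):
    seen = set(train)
    return lambda s: s in seen   # undergenerates: only what it has seen

def learn_accept_everything(train):
    return lambda s: True        # the ultimate overgenerating "grammar"

corpus = ["the dog runs", "a cat sleeps", "the cat runs", "a dog sleeps"]
train, test = split_half(corpus)
scrambled = ["runs dog the", "sleeps a cat"]  # hypothetical non-sentences

for name, learn in [("memorizer", learn_memorizer), ("accept-all", learn_accept_everything)]:
    grammar = learn(train)
    print(name, acceptance_rate(grammar, test), acceptance_rate(grammar, scrambled))
# memorizer 0.0 0.0
# accept-all 1.0 1.0
```

Scoring only the held-out half rewards the accept-all grammar; the second number (acceptance of non-sentences) is what reveals the overgeneration.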
Another thing, more of a pet peeve than anything else. They claim wonderful success in learning a variety of languages, but they chose five closely related languages (English, Swedish, Danish, Spanish, and Italian), and one that is not related, but still not too dissimilar in the big scheme of things (Chinese). (And to anyone who thinks Chinese is really different from English, spend a week with a Chinese grammar, and then another one with a grammar from an Amazonian language. That'll change your mind.)
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-08-31 18:28 · Score: 0

I'm supposedly a linguist. (Dropped out of a Ph.D., met all requirements except dissertation.)
I think we need to qualify your objection. Essentially, you're right that it's possible that the grammar that the program learns overgenerates, and thus the system fails to learn the "right" grammar.
However, this is in fact desirable in a text comprehension system. Why? Because you want to be able to extract meaning from text even if it's ungrammatical. (Of course, a language generation application can't get away with this, but as it turns out, it's OK for those to undergenerate.)
However, as a final point, I'll lay aside the engineering requirements of different kinds of NLP applications, and point out that the sort of grammatical theory that you have in mind (the kind that posits a clear distinction between grammatical and ungrammatical sentences) offers little aid (if any) in explaining how people can actually manage to extract meaning from, or even completely understand, expressions that the theory classifies as "ungrammatical."
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-08-31 18:41 · Score: 0

"Do you like me"? ... Now, this sentence seems fine, do is simply the verb.

I have not heard the term "do-support," but the verb in the sentence "Do you like me?" is "like." My French is fading fast, but if I haven't lost it, the same sentence should be "aimez-vous moi?" (literally: "like-you me?"). French lacks the "dummy do" which props up the English verb in question (and certain other special cases, I believe). Observe the four following sentences:
1) "You like me"
2) "Vous m'aimez" ("You me like")
3) "Do you like me?"
4) "Aimez-vous moi?" ("Like you me?")
Only one sentence out of these four contains "do" (or a "do"-equivalent): #3, the English question. I believe that this is the phenomenon of "do-support" that you two were discussing.
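The contrast in those four sentences can be sketched as a toy Python rule: if the statement has an auxiliary (or "be"), invert it with the subject; otherwise insert a dummy "do" and leave the main verb in place. This is a simplification under stated assumptions (one-word subject, non-third-person, present tense, so no "does"/"did"), and the auxiliary list is a hypothetical partial one.

```python
# Hypothetical, incomplete set of English auxiliaries (including forms of 'be').
AUXILIARIES = {"am", "are", "is", "was", "were", "can", "will", "should", "must"}

def yes_no_question(statement):
    """Turn a simple declarative into a yes/no question.

    Assumes a one-word, non-third-person subject in the present tense,
    so agreement ('does') and tense ('did') never arise.
    """
    subject, first, *rest = statement.split()
    if first in AUXILIARIES:
        words = [first, subject, *rest]        # subject-auxiliary inversion
    else:
        words = ["do", subject, first, *rest]  # no auxiliary: dummy "do" is inserted
    words[0] = words[0].capitalize()
    return " ".join(words) + "?"

print(yes_no_question("you like me"))          # Do you like me?
print(yes_no_question("you are OK"))           # Are you OK?
print(yes_no_question("you do OK in school"))  # Do you do OK in school?
```

French, as in sentence 4, would instead invert the lexical verb itself; English only allows that for auxiliaries and "be", which is why lexical "do" still needs a dummy "do" in front of it.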

The significance of this phenomenon in the greater scheme of automatic syntactic parsing? I'm not sure. I've strayed from syntax because, honestly, it bores me.
Re:Speaking as someone working on NLP by lupin_sansei · 2005-08-31 18:42 · Score: 1

> Children learn by hearing correct native languages from their parents, teachers, friends, etc

This simply isn't true. Hundreds of corpora of children learning language have shown that the input from the parents is not grammatically correct, but in baby talk, or partial utterances. This is actually one of the arguments for Chomsky's theory: How do children who only hear a degenerate, incomplete form of the language end up learning the grammar correctly?

--
http://www.perthonline.net
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-08-31 19:01 · Score: 0

>As one example I found stated the question "Do you like me"? as being a "do-support" sentence. Now, this sentence seems fine, do is simply the verb.

Well, the "real" verb is like. Do is just kind of there for support.

Why do we say "Are you OK?" but "Do you feel OK?" Why not "Feel you OK?" like they do in other Germanic languages? (Or to be more exotic: "Do you be OK?")

Follow you me?

Apparently, we can't simply use (most) verbs by themselves in constructions like this. The only exception I can think of off the top of my head is "be". Not even "do" itself. (We don't say "Do you OK in school?" but "Do you do OK in school?")
Re:Speaking as someone working on NLP by Estanislao+Martínez · 2005-08-31 19:10 · Score: 1

Hundreds of corpora of children learning language have shown that the input from the parents is not grammatically correct, but in baby talk, or partial utterances.
Have you actually looked through hundreds of corpora of parent-child interaction, or are you just parroting something you heard, and embellishing it in the process?
Sure, there is definitely such a thing as "motherese." And it's a commonplace Chomskian claim that children are exposed to many ungrammatical utterances. But I think you've said a third, blended thing that nobody actually claims: that motherese utterances are, in general, ungrammatical.
I'll point out that there are a few commonplace, parroted claims about child language that are actually wrong.

--
Are you adequate?
Re:Speaking as someone working on NLP by lupin_sansei · 2005-08-31 19:12 · Score: 1

Yeah I have seen them actually, they are published, and extracts are available in many text books also. Can you point me to some evidence that children hear grammatically correct sentences and then repeat and learn them?

--
http://www.perthonline.net
Re:Speaking as someone working on NLP by AhtirTano · 2005-08-31 20:01 · Score: 1

However, this is in fact desirable in a text comprehension system. Why? Because you want to be able to extract meaning from text even if it's ungrammatical.
Agreed, but that is not really relevant for the issue at hand. They make a point of stating that they can avoid certain kinds of overgeneration that a text comprehension system would not want to avoid: long distance gender agreement in pronouns. So clearly they are trying to avoid the kind of overgeneration you have in mind.
They claim the grammar learned can be used for recognition or production of novel sentences. Overgeneration is bad for the latter use (as you noted).
the sort of grammatical theory that you have in mind (the kind that posits a clear distinction between grammatical and ungrammatical sentences) offers little aid (if any) in explaining how people can actually manage to extract meaning from, or even completely understand, expressions that the theory classifies as "ungrammatical."
Wow. You read far more into my comments than I intended to put in them. They set the criterion for success as the ability to match a string to a path through the machine, and they don't want to overgenerate. That suggests a binary distinction between grammatical and ungrammatical to me. Perhaps I read too much into them.
Incidentally, I agree with you that maintaining a clear grammatical/ungrammatical distinction is undesirable in a good theory of comprehension (and production). I don't know any serious, modern syntacticians who think there is such a distinction in the real world. It is a useful idealization for many questions, however, such as formal learning theory.
Re:Speaking as someone working on NLP by Michael+Spencer+Jr. · 2005-08-31 20:55 · Score: 1

My BS was plain old Computer Science, but I had Programming Languages (which made heavy use of BNF grammars, which also touch your field) and Intro to AI. So I'm no expert either...

The article made me think this system was building a 'grammar' in the strictest sense of the word, but definitely one without any mapping back to real world concepts. They mention statistical significance, so that makes me think they're using machine learning algorithms to guess an "optimal" set of rules they can stitch realistic-sounding sentences together with, without actually processing any of the meaning.

Aha, yes, the article contained some Google fodder, which pointed to this academic paper: http://www.tau.ac.il/~zsolan/papers/soletalb2002.pdf Quote from the PDF:

Equation 1 balances two opposing "forces" in pattern formation: (1) the length of the pattern, and (2) the number and the cohesiveness of the set of examples that support it. On the one hand, shorter patterns are likely to be supported by more examples; on the other hand, they are also more likely to lead to over-generalization, because shorter patterns mean less context.
(end PDF quote)

So while I don't think this system can translate unknown language into meaningful human language any time soon, it does seem like this system can help a team of humans develop a more reliable way to machine-translate natural language.

For example, spoken American English is full of common idioms and set phrases. Without a system like this, a Japanese translation system developer might translate a common idiom literally because she didn't realize it was an idiom, and would then need to find some way to resolve a pigs-flying or happy-as-a-pig-in statement. A system like this would identify these common bits of language, so the developer would know to parse that set of words as if it were one word.

Does that make sense? Or is the article touting this method as a major breakthrough, when actually this pattern recognition system is already used in your field?
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-09-01 01:41 · Score: 0

arguments that sound alarmingly like he believes in intelligent design.

Gasp! Perish the thought! No, stone the fool!

Although, most of the nation probably agrees with him in some form or another... Oops.
Re:Speaking as someone working on NLP by lysergic.acid · 2005-09-01 01:45 · Score: 1

i think it'd be interesting to see someone actually provide some of these counter-arguments against chomskyan linguistics, but so far it seems all those opposed to his views can summon are ad hominem/red herring arguments with little factual support.
Re:Speaking as someone working on NLP by sjb21043 · 2005-09-01 02:52 · Score: 1

Which was kinda his point. Especially in today's world, there are many children who will hear 'incorrect' examples while they're learning. Young children of recent immigrants, for instance. Children with poorly educated parents, as well.

It's statistics that let the child learn to distinguish the contexts in which "He be running" and "He is running" are each "correct".
Re:Speaking as someone working on NLP by Flagran · 2005-09-01 04:03 · Score: 1

IAAL, and I am strongly biased towards a descriptivist methodology. However, even perfectly fluent native speakers make mistakes in the things they say. People don't always utter exactly what they intended to utter. Even the most staunch descriptivist linguist is well aware of the high rate of errors that occur in natural speech. Add to this the fact that a child may grow up being exposed to several different dialects/styles/registers of a given language, and will likely be able to learn the subtleties of each. Separating this data into the related chunks is quite difficult (recognizing that a particular person was using a particular speech style at a particular time, etc.). So, in the end, we must still confront the problem of dealing with noisy data.

--
Make love, not sigs
Re:Speaking as someone working on NLP by Anonymous Coward · 2005-09-01 04:12 · Score: 1, Interesting

I won't put a detailed explanation why he is very wrong either. However his methodology and presentation are clearly unscientific. He fudges his data, he asserts without proof, his proofs are always circular. He assumes things that need no assuming, easily established to be true or false by simple experiments. Nevertheless he just assumes and goes on. Some are contradicted by experimental evidence but he doesn't care. He sweeps stuff under the rug when it doesn't fit his model as "outside of domain of this theory" for an unspecified "domain." So whatever the truth value of his claims are, his ideas can not be scientific truth.
This much you can easily prove yourself. But being unscientific doesn't mean wrong; you could reason that the Earth must be a sphere because the most beautiful shape is a sphere, and you would be right wrt the shape of the Earth, even though your reasoning is unscientific junk. If you want proof that Chomsky is wrong, rather than just that he uses a useless methodology, I'm afraid you won't find it without spending a lot of time on it.
It takes quite a bit of time to give background information on developmental psychology, psycholinguistics, neuroscience and biology in general to someone outside the field (not that I'm an expert, but I have a degree in CogSci). It takes at least as much time to establish which flavor of Chomskian linguistics is rubbish and why (Chomsky made so many contradictory models of language and mind that almost everything you say against him can be countered with a simple "ah, but you don't know his X theory"). So you very probably won't get any such response unless you claim a specific Chomskian theory is true and sound like you know what you are talking about.
Re:Speaking as someone working on NLP by PurpleBob · 2005-09-01 06:35 · Score: 1

Geoffrey Sampson is good at giving overviews of the points against nativism. See his page on the topic.

--
Win dain a lotica, en vai tu ri silota

Re:Not Chomsky by Anonymous Coward · 2005-08-31 16:13 · Score: 0

s/Chomsky/Markov

Wow! by the_skywise · 2005-08-31 16:13 · Score: 4, Funny

They've rediscovered the Eliza program!

Input: "For example, the sentences I would like to book a first-class flight to Chicago, I want to book a first-class flight to Boston and Book a first-class flight for me, please may give rise to the pattern book a first-class flight -- if this candidate pattern passes the novel statistical significance test that is the core of the algorithm."

How does it feel to "book a first-class flight"?
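A naive way to see how the quoted example could surface "book a first-class flight" is to look for the longest run of consecutive words shared by all three sentences. This is a simplified stand-in, not the ADIOS algorithm or its statistical significance test; the function names are made up for illustration.

```python
# Naive illustration only: find the longest run of consecutive words that
# occurs in every sentence. This is NOT the ADIOS significance test; it just
# shows how a shared pattern can emerge from partially overlapping sentences.

def tokenize(sentence):
    # Lowercase and strip simple punctuation so "Book" and "book" match.
    return [w.strip(",.!?").lower() for w in sentence.split()]

def ngrams(words, n):
    # All contiguous word sequences of length n, as a set of tuples.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def longest_shared_pattern(sentences):
    token_lists = [tokenize(s) for s in sentences]
    shortest = min(len(t) for t in token_lists)
    # Try the longest candidate length first and stop at the first hit.
    for n in range(shortest, 0, -1):
        common = set.intersection(*(ngrams(t, n) for t in token_lists))
        if common:
            return min(common)  # deterministic pick if several tie
    return ()

sentences = [
    "I would like to book a first-class flight to Chicago",
    "I want to book a first-class flight to Boston",
    "Book a first-class flight for me, please",
]
print(" ".join(longest_shared_pattern(sentences)))  # book a first-class flight
```

Note that raw overlap alone would happily promote junk like "to" or "a" given enough sentences; that is exactly why the paper puts a significance test at the core.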

Re:Wow! by Anonymous Coward · 2005-09-01 00:30 · Score: 0

The emacs/doctor version of Eliza actually replies
Why do you say you might want to book?

Isn't This the Universal Translator Idea by TastyWheat · 2005-08-31 16:13 · Score: 0, Redundant

From Star Trek???

Grammar depends on the input by Tsaac · 2005-08-31 16:14 · Score: 3, Interesting

If fed with a heap of decent grammar, what happens when it's fed with bad grammar and spelling? Will it learn, and incorporate, the tripe or reject it? That's the sort of problem with natural language apps, it's quite hard to sort the good from the bad when it's learning. Take the MegaHAL library http://megahal.alioth.debian.org/ for example. Although possibly not as complex, it does a decent job at learning, but when fed with rubbish it will output rubbish. I don't think it's the learning that will be the hard part, but rather the recognition of the good vs. the bad that will prove how good the system is.

--
eXemplary Abstract

Re:Grammar depends on the input by innerweb · 2005-08-31 16:39 · Score: 0

That would be called slang.
InnerWeb

--
Freud might say that Intelligent Design is religion's ID.
Re:Grammar depends on the input by Anonymous Coward · 2005-08-31 16:47 · Score: 0

If fed with a heap of decent grammar, what happens when it's fed with bad grammar and spelling? Will it learn, and incorporate, the tripe or reject it? That's the sort of problem with natural language apps

That's also the problem with slashdot

But seriously, what you're describing is the evolution of language or, on a smaller scale, accents and regional dialects, etc. It only seems natural that the language app would also evolve and adapt to this 'tripe'.

the recognition of the good vs. the bad that will prove how good the system is.

Not as easy as it sounds; even people (with too much time on their hands) quibble about these sorts of things. Often both are correct with respect to the language they have learnt. The language I've learnt is quite different from the language my grandparents learnt (yes, it's both English). Quebec French is different from Parisian French. Here are some examples:

"Did you get your invite?" Is that acceptable English?

What's the French word for a female professor? (In Quebec it's professeuse, while in France it's professeur.)

"Long time, no see" Is that acceptable English?

The answers to the questions are, of course, relative.
Re:Grammar depends on the input by jim_v2000 · 2005-08-31 17:16 · Score: 2, Interesting

The problem with this program is that you could input the most grammatically correct sentences you can into it, and it'll still spew out senseless garbage. For this to be of any worth, the computer will need to understand the meaning of each word, and how each meaning relates to what the other words in the sentence mean. And you can't teach a computer what something is just by putting words into it. Like if I tell the machine that mice squeak, it has to know what a squeak sounds like and what a mouse is. How do you define a mouse to a computer? A small fuzzy rodent. Well, how do you define fuzzy? Or small? Or a rodent? You have to keep using more and more words... and still the computer will have no idea what you're talking about, other than just more word relationships.

I guess the missing thing is that a human can envision the meaning of the words as a concept or image, while the computer simply sees the words as, well, just words (or binary, to be specific).

--
Don't take life so seriously. No one makes it out alive.
Re:Grammar depends on the input by foonf · 2005-08-31 17:31 · Score: 1

If the system works correctly (ie, it is really capable of learning language syntax), it will learn the "bad grammar" presented to it. Can you really expect an algorithm to automatically figure out the social conventions that mark one system of communication as "good" and one as "bad", and reject or correct "bad" ones?

Actually it would be quite remarkable if this was possible, given that the reasons some dialects become privileged and others don't have nothing to do with the formal properties of those dialects, and everything to do with history and geography.

--

"(Man) tries to live his own life as if he were telling a story. But you have to choose: live or tell." --Sartre
Re:Grammar depends on the input by proteonic · 2005-08-31 17:44 · Score: 2, Informative

If you take young children and expose them to rubbish for four or five years while they're learning to speak, they'll speak rubbish too. That's the problem with young children, they can't sort the good from the bad.

But if you expose them to well-structured language, they'll learn to speak it, without being EXPLICITLY TAUGHT THE RULES. Which is exactly what this paper is about: unsupervised natural language learning. That's what makes the system good. It's able to build equivalence classes of verbs, nouns, adjectives, etc., with relatively few examples. The paper gives an example of the algorithm using 8 sentences to train and then producing over 500 new, sensible sentences. Even a 4th-order Markov chain (MegaHAL) can't do that. The algorithm is really quite impressive.

Your comment raises the question: why would you train a system on garbage? Finding good-quality written language is a non-issue. Train it on good data and it'll probably do as well as a Markov model at distinguishing good vs. bad language.
Re:Grammar depends on the input by Tsaac · 2005-08-31 18:28 · Score: 1

But at some point you must supervise a child's learning a language. It may be possible for a child to learn to speak a language correctly after exposure, but would they be using correct grammar? For the most part probably, but chances are they'd be corrected by others. I doubt it would be possible for an algorithm without intervention or some form of real AI. They would have to know, at some point, correct uses of grammar to abide by it, wouldn't they?
If it does develop decent grammar rules, which by the sounds of it it does, they'd be just that: its own. Sure, it may be similar to what the language would define as good grammar, but it wouldn't necessarily be correct.
If I'm not mistaken, there is also a difference between good grammar and correct grammar.
Now, I don't purport to be a language or programming expert, but in English there are exceptions to lots of different rules which I don't think it would be possible to learn without being told (i.e., intervention). Again, the difference between good and correct grammar could come into play here.

--
eXemplary Abstract
Re:Grammar depends on the input by Krach42 · 2005-08-31 18:35 · Score: 1

That's just the thing, though. Children exposed to bad grammar will actually CORRECT that bad grammar.

Now, if you're going to start complaining about colloquial speech, and calling it bad, you're misunderstanding the meaning of grammar.

Functionally, there are two types of grammar: formalized grammar and real grammar. Formalized grammar is the stuff they teach you in school, and it defines an ideal language usage. Real grammar is what is actually there, and explains why the vast majority of people don't have to be told how to use the reflexive (myself/yourself/hisself/herself/ourselves/theirselves).

So, children exposed to grammatical garbage (for instance, a pidgin language) will produce a grammatical structure (for instance, a creole)

Children are literally Garbage in, Corrected Information out.

If this system being fed a pidgin does not formulate a proper creole from it, then it's easy to say that it does not use a process similar to what humans use.

--

I am unamerican, and proud of it!
Re:Grammar depends on the input by Anonymous Coward · 2005-08-31 23:34 · Score: 0

Do you mean
myself/yourself/himself/herself/ourselves/themselves?

- I was once a kid. -
Re:Grammar depends on the input by Krach42 · 2005-09-01 03:34 · Score: 1

I'm not quite sure why I picked up hisself, and theirselves. They sounded right at the time.

I guess my mind latched onto the whole possessive+self pattern and didn't pay attention to how odd hisself and theirselves sound. Oddly, I can still hear a bit of correctness in them, but this is why a linguist should never base their ideas solely on their own ear: you can expose yourself to an idea so much that you accept too much, or accept too little.

--

I am unamerican, and proud of it!

Protein sequences? by Anonymous Coward · 2005-08-31 16:15 · Score: 1, Interesting

Let's see what human DNA really says and means!

Re:Protein sequences? by Concerned+Onlooker · 2005-08-31 17:13 · Score: 1

That's easy. kill, eat, mate, sleep, kill, eat, mate, sleep....

--
http://www.rootstrikers.org/
Re:Protein sequences? by Safe+Sex+Goddess · 2005-08-31 17:18 · Score: 1

This sounds like a good idea. DNA just seems so much like Code of the Life Maker.
Not knowing anything about science, I sometimes wonder if DNA was brought to the solar system by some sort of seeding program.
Any bio geeks out there care to enlighten a bio-knowledge impoverished pupil?
Oh, and whip up a universal microbiocide to keep sex safe while you're at it:-)

--
Abstinence is a government conspiracy. www.SafeSexZone.co
Re:Protein sequences? by FLAGGR · 2005-08-31 17:36 · Score: 1

*sigh*

I hate it when people talk like DNA is this big all-encompassing thing. There's nothing in my DNA that tells me to reproduce, etc. So you can't just translate DNA into English. All of your cells, and the handful of brain cells, work together to unbelievably create the walking chemical reaction that you are; it's a whole big picture, and DNA is just one of the tiny factors in it.

Finally some progress by drjimmy42 · 2005-08-31 16:16 · Score: 2, Funny

I know we all feel like we've been screwed by the conspicuous lack of flying cars around these days, but at least some progress is being made on the Universal Translator front...

--
If you're not part of the solution, you are part of the precipitate

Re:Finally some progress by TastyWheat · 2005-08-31 16:18 · Score: 2, Insightful

I'm starting to get the feeling that there's nothing in sci-fi that won't occur in reality. Except for the dorky guy getting to nail the hot busty alien babe, that is. heh.
Re:Finally some progress by Infinityis · 2005-08-31 18:57 · Score: 0

Maybe you know how to have fun with Universal Translators in ways that I don't. As for me, wake me up when we get phasers and/or lightsabers.

Re:Finally! by HTTP+Error+403+403.9 · 2005-08-31 16:16 · Score: 2, Funny

Using this software, I can finally win the 'Summarize Proust Competition'!

--
I'm not a Troll, it's reverse psychology.

How about Klingon? by irving47 · 2005-08-31 16:17 · Score: 1

Or better yet, start feeding it images of crop circles that haven't been proven to be fakes (yet.)

--
I had a sucky sig.

Useful in games? by .DS_Store · 2005-08-31 16:20 · Score: 1

How long before this technology makes its way into the field of game AI? Imagine a game such as Deus Ex or SW:KotOR where you don't merely choose your response to NPCs from a predefined list; you type in your answer! Such technology combined with the simple contextual inferences that drive such oldies as Dr. Sbaitso and its Mac equivalent Eliza could potentially take interactivity in games to a whole new level.

Sure it's a long shot, but I can dream can't I?
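The "simple contextual inferences" behind ELIZA-style programs are mostly keyword matching plus pronoun swapping ("reflection"). A hypothetical, minimal sketch of that idea follows; the two rules and the reflection table are invented for this example, and real ELIZA/doctor scripts are much larger.

```python
import re

# Hypothetical pronoun "reflection" table: swaps the speaker's viewpoint.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "me", "your": "my"}

# Hypothetical keyword rules: (pattern, response template).
RULES = [
    (re.compile(r"\bi (?:would like|want) to (.*)", re.IGNORECASE),
     "Why do you want to {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE),
     "How long have you been {0}?"),
]

def reflect(fragment):
    # Drop trailing punctuation, then swap pronouns word by word,
    # leaving every other word (including capitalization) untouched.
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in words)

def respond(text):
    # The first matching rule wins; otherwise fall back to a stock prompt.
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please tell me more."

print(respond("I would like to book a first-class flight to Chicago."))
```

Nothing here understands the input; the program only rearranges the user's own words, which is why such bots feel clever for about three exchanges.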

Re:Useful in games? by Anonymous Coward · 2005-08-31 17:07 · Score: 0

Or better yet, talk to the guy! with a microphone and everything.
Re:Useful in games? by Anonymous Coward · 2005-08-31 17:34 · Score: 0

.DS_Store, you hacked my mac mini! Files with your name keep showing up in every folder I visit in Finder, fuck you!
Re:Useful in games? by Anonymous Coward · 2005-08-31 19:38 · Score: 0

You can do this with bots in UT2004 (seriously)

Eg: alpha dance

and bot alpha will dance

or alpha cover me,
alpha take their flag,
alpha attack node A,
alpha suicide

etc etc

Length? by Parham · 2005-08-31 16:20 · Score: 1

I read the article and had a few questions. How long does the analyzed text have to be for the algorithm/program to pick up the grammar rules? I mean if it takes long documents and a ton of time, is it really worth it? Also, if it can only recognize languages we already know (and can only read those characters), how useful is this thing? Why not just hardcode grammar rules then? (probably a stupid question, but it's an exaggeration of what I was thinking).

Re:Length? by Joe+Random · 2005-08-31 16:38 · Score: 1

I mean if it takes long documents and a ton of time, is it really worth it?
Probably. It's not like there's a shortage of long documents. I mean, pretty much all areas of publishing use electronic documents at some point during the process. I'm sure that the researchers could feed the algorithm a few dozen novels fairly quickly.

As for the running time, it'd probably take a while, but computing power gets cheaper every day.
Also, if it can only recognize languages we already know (and can only read those characters), how useful is this thing?
Who says it can only recognize languages we already know? I suspect that it would be fairly simple to adapt the algorithm to just look at visual input and find patterns in an unknown language.

Even better, if you started feeding it equivalence rules between that language and a known one, I'd imagine that it could eventually start to translate words whose meaning you hadn't explicitly taught it. How cool would that be?

Hieroglyphics? by Hamster+Of+Death · 2005-08-31 16:21 · Score: 2, Interesting

Can it decipher these things too?

Let's see what it thinks of this by MillionthMonkey · 2005-08-31 16:22 · Score: 1, Funny

A few years ago, thousands of posts like this were flooding alt.religion.scientology:

His thirteenth unambiguity was to equate Chevy Meyer all my nations. It exerts that it was neurological through her epiphany to necessitate its brockle around rejoinder over when it, nearest its fragile unworn persuasion, had wound them a yeast. Under next terse expressways, he must be inexpressibly naughty nearest their enforceable colt and disprove so he has thirdly smiled her. Nor have you not secede beneath quite a swamp? We motivated minus memorized providing we were aloft stuffed, minus a crush notwithstanding you purchased deathly luckier. Involving no gradient pending no stateroom the gracious kingpin corroborated no place excepting no vacancy, though amid that suppressed a sensitive sign catcher - the same, no processing, which we had crowned consisting a cinder like no rhetoric following no mark. Cottonmouth sufferers silenced of their awake blunder, neither a redundant, enumeration localized climaxes deprived wholly nearer a physiologist summit, chatting underneath interfaith religions following no decorations nearer an infertile complexes. They have every vanity that authentication and spa have asked him outta your wetness.

After a time the attacks stopped. The algorithm that generated them was a slightly better author than L. Ron Hubbard and left a.r.s. to found its own religion.

Re:Let's see what it thinks of this by One+Div+Zero · 2005-08-31 16:28 · Score: 2, Insightful

That's just a Markov model that "learned" from what looks like religious mumbo jumbo in the first place.

Markov models are perhaps the easiest language acquisition model to implement, but also one of the worst at coming up with valid speech or text.

Interestingly, they do much, much better as recommender systems.
Re:Let's see what it thinks of this by proteonic · 2005-08-31 17:09 · Score: 1

It's not that interesting, since they're usually first-order systems. Every (n+1)th word will make sense with the nth word, but there's no long-range (second-order or higher) dependency in most of those models (it requires too much space, usually). That's why they do well as recommender systems but suck at generating comprehensible sentences.
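To make the "first order" point concrete, here is a minimal word-level Markov chain sketch with a configurable order k. The toy corpus and function names are illustrative assumptions, not MegaHAL's or the linked C program's code. With k=1 each word is chosen using only the word before it, so nothing constrains agreement across longer spans.

```python
import random
from collections import defaultdict

def build_model(words, k):
    # Map each k-word state to the list of words observed right after it.
    # Repeats are kept, so frequent successors are sampled more often.
    model = defaultdict(list)
    for i in range(len(words) - k):
        state = tuple(words[i:i + k])
        model[state].append(words[i + k])
    return model

def generate(model, start, length, rng):
    out = list(start)
    k = len(start)
    for _ in range(length):
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break  # dead end: this k-word state never had a successor
        out.append(rng.choice(followers))
    return out

corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = build_model(corpus, k=1)
print(" ".join(generate(model, ("the",), 8, random.Random(0))))
```

Raising k gives more coherent output but the number of possible states grows roughly as (vocabulary size)^k, which is the space problem mentioned above; with a small corpus a high-order model mostly just copies its training text back.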
Re:Let's see what it thinks of this by scovetta · 2005-09-01 01:00 · Score: 1

Amen, my brother!

--
He who fights with monsters should see to it that he does not become a monster in the process. --Nietzsche

What is actually new about this by NitsujTPU · 2005-08-31 16:23 · Score: 1

What is actually new about this is the unsupervised approach. Graph parsers are quite popular in Natural Language Processing applications, but they use supervised methods.

Re:What is actually new about this by J.+Random+Luser · 2005-08-31 17:46 · Score: 0

where new != better ;-)
With an unsupervised approach it may well (de)construct the rules of grammar, but even with the best thesaurus, will it ever know the difference between green/sour/shut : eyes/apples/doors?
Re:What is actually new about this by NitsujTPU · 2005-08-31 20:02 · Score: 1

Well, before we judge too quickly, it's wise to remember that most of these tasks are done with separate taggers in application. I've been busy with other tasks tonight, and so haven't read the paper yet.

For the most part, a word sense disambiguation system would be able to sort out the conflicts that you point out there... the more difficult case being subtle shades of meaning in polysemous words. Words like bank, which could mean a financial institution or the edge of a river, for instance, are much more difficult.
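A common baseline for this kind of word sense disambiguation is a simplified Lesk approach: pick the sense whose dictionary gloss shares the most words with the sentence. The sketch below assumes two hand-written glosses for "bank" and a tiny stopword list, both invented for this example; a real system would draw glosses from a lexicon such as WordNet.

```python
# Hypothetical glosses for two senses of "bank", written for this example.
SENSES = {
    "bank/finance": "an institution that accepts deposits of money and makes loans",
    "bank/river": "the sloping land beside a river or stream where water meets the shore",
}

# Tiny illustrative stopword list so function words don't count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "to", "at", "on",
             "where", "beside", "i", "my"}

def content_words(text):
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def disambiguate(sentence, senses=SENSES):
    context = content_words(sentence)
    # Score each sense by gloss/context word overlap; max() keeps the
    # first-listed sense when scores tie.
    return max(senses, key=lambda s: len(content_words(senses[s]) & context))

print(disambiguate("I deposited money at the bank"))         # bank/finance
print(disambiguate("We fished from the bank of the river"))  # bank/river
```

Exact word overlap is brittle ("deposited" does not match "deposits" here), which is one reason subtle shades of meaning remain hard.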
Re:What is actually new about this by xenocide2 · 2005-09-01 06:58 · Score: 1

So how wrong would it be to supervise this thing via the old double translations English->spanish->`English? My guess is that completely unsupervised learning by checking that English == `English would eventually create two identity functions =( But with even only a little bit of supervision, I'd wager it would work out well.
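The degenerate solution predicted above is easy to demonstrate: if the only training signal is that English -> Spanish -> English reproduces the input, a pair of identity "translators" scores perfectly without translating anything. A hypothetical sketch (function names and sentences are illustrative), with one labeled pair showing how a little supervision exposes the problem:

```python
def round_trip_loss(to_spanish, to_english, sentences):
    # Count sentences that do not survive the round trip unchanged.
    return sum(to_english(to_spanish(s)) != s for s in sentences)

def supervised_errors(to_spanish, labeled_pairs):
    # Count English sentences whose output differs from the known Spanish.
    return sum(to_spanish(en) != es for en, es in labeled_pairs)

def identity(s):
    return s

sentences = ["the cat is black", "I would like a flight to Boston"]
labeled = [("the cat is black", "el gato es negro")]

print(round_trip_loss(identity, identity, sentences))  # 0: a "perfect" score
print(supervised_errors(identity, labeled))            # 1: exposed by one pair
```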

--
I Browse at +4 Flamebait
Open Source Sysadmin
Re:What is actually new about this by NitsujTPU · 2005-09-01 11:25 · Score: 1

Supervised implies something a little different here.

A supervised algorithm has to be given all of the answers. An unsupervised algorithm is assumed to come up with the correct answers, without an initial list of correct answers.

Patent Pending by Anonymous Coward · 2005-08-31 16:23 · Score: 1, Insightful

Unlike all the ridiculous patents being granted lately to IT companies, the one these guys are filing for, to me, seems legitimate. It's a nice change in my mind.

Re:Isn't This the Universal Translator Idea by biryokumaru · 2005-08-31 16:25 · Score: 2, Interesting

In Star Trek 4, the universal translator was little help when the humpback whale armada arrived... No, seriously, that was one f**ked up movie.

But for this, I have one word: Dolphins.

--
When you're afraid to download music illegally in your own home, then the terrorists have won!

Incredible by Anonymous Coward · 2005-08-31 16:26 · Score: 1, Interesting

I hope the material is (or will be made) accessible to laypersons. I'd love to be able to use this algorithm for my own music experiments.

Dupe by fsterman · 2005-08-31 16:27 · Score: 2, Informative

We just had an article on this. There was a shootout by NIST. At least I think so; the /. search engine blows, hard. Either way, here's a link to the tests. This one wasn't covered by the tests, so I guess it's front-page news.

--
Is there anything better than clicking through Microsoft ads on Slashdot?

The Ultimate Test... by __aaclcg7560 · 2005-08-31 16:27 · Score: 0

They should play an old country record backwards and see if the computer can confirm that the poor chump gets his truck, dog, shotgun and woman back. Could also try playing an old rock record backwards but computer might get possessed by a demon.

Only a matter of time. by webby123 · 2005-08-31 16:29 · Score: 1, Funny

Scientists: Wow joe, the computer just went out and read the entire slashdot archive. Computer: Futile humans, I will p0wn you! *Lifting a bionic hand and swiping the scientist to the side* Scientists: Mac turn it off! Turn it offff argghh! Joe I just don't havvee teh power! Reaching for the cord. Computer: Woa.. wohaahaa. All your base is mine. Make your time.. *The lights go out*

--
Linux Video Tutorial Project, Tutoring the masses.

From TFL... (your link) by PaulBu · 2005-08-31 16:31 · Score: 1

It uses a hand-written context-free grammar to form all elements of the papers.

I know you were aiming for funny, but there is a big difference between following a hand-written grammar and deducing it from the text...

Paul B.
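For contrast, generating text from a hand-written context-free grammar takes only a few lines; the hard part the paper addresses is inducing such rules from raw text. A toy sketch with a made-up grammar (not the grammar used by the linked project):

```python
import random

# Hypothetical hand-written CFG: nonterminals map to lists of productions.
# Any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "ADJ", "N"]],
    "VP": [["V", "NP"]],
    "N": [["algorithm"], ["grammar"], ["corpus"]],
    "ADJ": [["novel"], ["stochastic"]],
    "V": [["learns"], ["generates"], ["parses"]],
}

def expand(symbol, rng):
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for sym in production:
        words.extend(expand(sym, rng))
    return words

rng = random.Random(42)
for _ in range(3):
    print(" ".join(expand("S", rng)))
```

Every sentence this produces is grammatical by construction, because a human wrote the rules; ADIOS is aimed at inferring rules like NP -> the ADJ N from unannotated text instead.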

Markov Chains anyone? by ImaLamer · 2005-08-31 16:33 · Score: 5, Informative

http://en.wikipedia.org/wiki/Markov_chain

Used this (easy to compile) C program:

http://www.eblong.com/zarf/markov/

to create these:

http://www.mintruth.com/mirror/texts/

Mod points to whomever can tell us what texts they use. (No mod points can actually be given)

--
Get your Unix fortune now!

Re:Markov Chains anyone? by ImaLamer · 2005-08-31 16:36 · Score: 1

(I now realize that many of the titles give it away)

--
Get your Unix fortune now!
Re:Markov Chains anyone? by dapyx · 2005-08-31 22:01 · Score: 1

I once used Markov chains to fool some guys on a newsgroup, using phrases from some kook. :-) See this

--
I'm sorry, the number you have dialed is an imaginary number. Please rotate your phone 90 degrees and dial again.

Full article for non-PNAS subscribers by dmaduram · 2005-08-31 16:33 · Score: 4, Informative

Unsupervised learning of natural languages

Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman
School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

Many types of sequential symbolic data possess structure that is (i) hierarchical and (ii) context-sensitive. Natural-language text and transcribed speech are prime examples of such data: a corpus of language consists of sentences defined over a finite lexicon of symbols such as words. Linguists traditionally analyze the sentences into recursively structured phrasal constituents (1); at the same time, a distributional analysis of partially aligned sentential contexts (2) reveals in the lexicon clusters that are said to correspond to various syntactic categories (such as nouns or verbs). Such structure, however, is not limited to the natural languages; recurring motifs are found, on a level of description that is common to all life on earth, in the base sequences of DNA that constitute the genome. We introduce an unsupervised algorithm that discovers hierarchical structure in any sequence data, on the basis of the minimal assumption that the corpus at hand contains partially overlapping strings at multiple levels of organization. In the linguistic domain, our algorithm has been successfully tested both on artificial-grammar output and on natural-language corpora such as ATIS (3), CHILDES (4), and the Bible (5). In bioinformatics, the algorithm has been shown to extract from protein sequences syntactic structures that are highly correlated with the functional properties of these proteins.
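The idea of finding word clusters through "a distributional analysis of partially aligned sentential contexts" can be illustrated with a naive sketch that groups words filling the same slot between the same left and right neighbours. This is a simplification under stated assumptions, not the ADIOS procedure, which relies on a statistical significance test for pattern extraction; the corpus is invented in the style of the flight examples quoted elsewhere in the thread.

```python
from collections import defaultdict

def slot_classes(sentences):
    # For every (left neighbour, right neighbour) context, collect the
    # words seen between them. <s> and </s> mark sentence boundaries.
    fillers = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(words) - 1):
            context = (words[i - 1], words[i + 1])
            fillers[context].add(words[i])
    # Keep only contexts that more than one distinct word can fill.
    return {ctx: ws for ctx, ws in fillers.items() if len(ws) > 1}

corpus = [
    "book a flight to Chicago",
    "book a flight to Boston",
    "book a flight to Denver",
    "show me flights",
]
for context, words in slot_classes(corpus).items():
    print(context, sorted(words))
# ('to', '</s>') ['boston', 'chicago', 'denver']
```

The resulting set {boston, chicago, denver} is a candidate equivalence class; a real learner would then need to decide, statistically, whether such a class generalizes beyond the sentences it came from.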

The ADIOS Algorithm for Grammar-Like Rule Induction

In a machine learning paradigm for grammar induction, a teacher produces a sequence of strings generated by a grammar G0, and a learner uses the resulting corpus to construct a grammar G, aiming to approximate G0 in some sense (6). Recent evidence suggests that natural language acquisition involves both statistical computation (e.g., in speech segmentation) and rule-like algebraic processes (e.g., in structured generalization) (7-11). Modern computational approaches to grammar induction integrate statistical and rule-based methods (12, 13). Statistical information that can be learned along with the rules may be Markov (14) or variable-order Markov (15) structure for finite state (16) grammars, in which case the EM algorithm can be used to maximize the likelihood of the observed data. Likewise, stochastic annotation for context-free grammars (CFGs) can be learned by using methods such as the Inside-Outside algorithm (14, 17).

We have developed a method that, like some of those just mentioned, combines statistics and rules: our algorithm, ADIOS (for automatic distillation of structure) uses statistical information present in raw sequential data to identify significant segments and to distill rule-like regularities that support structured generalization. Unlike

Re:Isn't This the Universal Translator Idea by Punboy · 2005-08-31 16:35 · Score: 1

The Universal Translator is tuned specifically for the sounds put out by standard humanoid lifeforms. Humpback whales use both much higher and much lower pitched sounds. The Universal Translator was not designed to handle such sounds, and so would not be able to translate them.

--
If you like what I've said here, and want to read more, go to http://www.krillrblog.com

Programming Language by jmlsteele · 2005-08-31 16:37 · Score: 2, Interesting

How long until we see something like this applied to ?

Re:Programming Language by mikael · 2005-08-31 23:57 · Score: 1

One application might be to automatically generate a set of rules from a standard text document, and convert these into parsing code.

But I'm sure this has already been done.

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads

Chinese room by pan_filler · 2005-08-31 16:39 · Score: 1

Great, now we can actually build a Chinese room!

Re:Chinese room by Frogbert · 2005-08-31 18:44 · Score: 1

Sure we will build it in this box...

Once I get that damn cat out of it.
Re:Chinese room by pan_filler · 2005-09-01 00:45 · Score: 1

Be sure to wear some protective gear. I hear that box might be full of radioactive particles and deadly acid.

what happens when it's fed with bad grammar... by Anonymous Coward · 2005-08-31 16:42 · Score: 1, Funny

Let's feed it slashdot and find out.

Re:Isn't This the Universal Translator Idea by Tontoman · 2005-08-31 16:48 · Score: 1

A real universal translator (artificial intelligence) would have to have many thousands of words of text to use as examples, so a language could be learned. http://www.cse.unsw.edu.au/~billw/mldict.html That is why many mechanical translation systems start with word lists and dictionaries to give the learning process a head start.

Not really by Anonymous Coward · 2005-08-31 16:50 · Score: 1, Informative

Markov chains aren't the same as context free grammars.
(CFGs can generate ((multiply) nested) bracket structures (and are like finite automata with stacks).) Markov chains are just finite automata without stacks, that generate random walks through vocabulary space.
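A small sketch of the difference the AC is pointing at, assuming Python: a sampler for the bracket grammar S -> ( S ) S | empty, a recognizer that keeps a stack (just a counter here, since there is only one bracket type), and a finite-state version with a fixed number of states that has to give up once nesting gets deeper than it can remember. The depth cap and the 0.4 stopping probability are arbitrary choices that keep the sampler finite.

```python
import random

def generate(rng, depth=0, max_depth=4):
    """Sample from the CFG  S -> '(' S ')' S | ''  (depth-capped so it always stops)."""
    if depth >= max_depth or rng.random() < 0.4:
        return ""  # the empty production
    inner = generate(rng, depth + 1, max_depth)
    rest = generate(rng, depth + 1, max_depth)
    return "(" + inner + ")" + rest

def balanced(s):
    """Pushdown-style recognizer: the stack only ever holds '(' so a counter suffices."""
    height = 0
    for ch in s:
        if ch == "(":
            height += 1
        elif ch == ")":
            height -= 1
            if height < 0:
                return False  # a ')' with nothing to match
        else:
            return False
    return height == 0

def balanced_with_k_states(s, k):
    """Finite automaton over '(' and ')': it can only remember depths 0..k-1."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0 or depth >= k:
            return False  # ran out of states (or unmatched ')')
    return depth == 0

rng = random.Random(0)
samples = [generate(rng) for _ in range(5)]
print(samples)
print(balanced("((((()))))"), balanced_with_k_states("((((()))))", 3))  # True False
```

Whatever finite k you pick, some nesting depth defeats the k-state machine, while the stack version keeps working.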

No the didn't by Ogemaniac · 2005-08-31 16:53 · Score: 5, Interesting

I played around with the Google translator for a while. I work in Japan and am half-way fluent. Google couldn't even turn my most basic Japanese emails into comprehensible English. Same is true for the other translation programs I have seen.

I will believe this new program when I see it.

Translation, especially from extremely different languages, is absurdly difficult. For example, I was out with a Japanese woman the other night, and she said "aitakatta". Literally translated, this means "wanted to meet". Translated into native English, it means "I really wanted to see you tonight". It is going to take one hell of a computer program to figure that out from statistical BS. I barely could with my enormous meat-computer and a whole lot of knowledge of the language.

Re:No the didn't by Anonymous Coward · 2005-08-31 17:03 · Score: 0

Ah, nice tale, but there is one flaw which shows you made it all up. A woman wanted to meet you!? No, we're not that gullible.
Welcome to /.
Re:No the didn't by lawpoop · 2005-08-31 17:12 · Score: 2, Interesting

The example you are using is from conversation, which contains a lot of mutually shared assumptions and information. Take this example from Steven Pinker:
"I'm leaving you."
"Who is she?"
However, in written text, where the author must assume that the reader brings no shared assumptions and cannot rely on any feedback, 'speakers' usually do a good job of including all necessary information in one way or another -- especially in texts meant to convince or promote a particular viewpoint. I'll bet these kinds of texts are more easily translatable than conversation.

--
Computers are useless. They can only give you answers.
-- Pablo Picasso
Re:No the didn't by superpulpsicle · 2005-08-31 17:25 · Score: 4, Informative

Try this free website out. http://www.freetranslation.com/

I know it is fairly accurate because I once fooled my Spanish-speaking friends in an IM conversation. I told them I had learned Spanish via hypnosis and basically just copy/pasted everything Spanish into IM. The conversation went on for like 15 minutes in full Spanish before I told them I was using the website. They were pissing their pants.
Re:No the didn't by QuantumG · 2005-08-31 17:29 · Score: 0

A Japanese woman who lives in Japan wanted to go out with a guy who wasn't Japanese? Riiiight.

--
How we know is more important than what we know.
Re:No the didn't by a.different.perspect · 2005-08-31 17:33 · Score: 2, Interesting

Or was it "chinko wo nametakatta"? It's just as easy for me to believe, you hot Slashdot nerd, you.

Being more serious, how do you think humans learn the rudiments of language? It's pattern analysis, i.e. precisely the technique this algorithm tries to replicate. It is true that the algorithm won't then progress onto the next stage, which is using that rudimentary grasp of the language to be taught its finer points, but if you genuinely doubt the capacity of this method to produce an understanding of language you are contesting the experiences of every human on the planet.

Returning to your example, "I really wanted to see you tonight" is what you discerned that sentence meant from its context. You can hardly expect a machine translator to know that it was a woman you were out with at night who said it (which seems to be the basis for your insertion of "tonight", "really" and "you"); fortunately, this algorithm is intended to translate written, not spoken, language. Since writing would have to include that detail (in order to be independent of its context), the problem you identified is not even relevant.
Re:No the didn't by barawn · 2005-08-31 17:35 · Score: 1

I played around with the Google translator for a while. I work in Japan and am half-way fluent. Google couldn't even turn my most basic Japanese emails into comprehensible English. Same is true for the other translation programs I have seen.

You haven't seen the Google translator he's talking about. It isn't public yet, I don't believe.

Here was the original article on it.

Old: "Alpine white new presence tape registered for coffee confirms Laden"
New: "The White House Confirmed the Existence of a new Bin Laden tape"
Re:No the didn't by Anonymous Coward · 2005-08-31 17:41 · Score: 0

I suppose his point was that, even in written Japanese (or most other languages), many things are left out.

For example, any human can understand that if a sentence at the beginning of a paragraph refers to a future date, then a following one meaning "we go to the beach" may mean "we will go to the beach". But if you don't remember or understand the previous sentences, how do you distinguish between present and future tense in Japanese in that case?

It seems to me that a good translation program must understand what it translates. It seems unlikely that any program can do that without help nowadays.
Re:No the didn't by burns210 · 2005-08-31 17:49 · Score: 3, Interesting

There was a program that tried to use the language of Esperanto (a made-up language designed specifically to be very consistent and guessable with regards to how syntax and words are used, very easy to learn and understand quickly) to be a middleman for translation.

The idea being that you take any input language, Japanese for instance, and get a working Japanese-to-Esperanto translator. Since Esperanto is so consistent and reliable in how it is designed, that should be easier to build than a straight Japanese-to-English translator.

To finish, you write an Esperanto-to-English translator. By leveraging the consistent language of Esperanto, researchers thought they could write a true universal translator of sorts.

Don't know what ever came of it, but it was an interesting idea.
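A toy version of the pivot idea, assuming Python and three-word dictionaries invented for the example (French rather than Japanese, so that word-for-word order happens to work). Real interlingua systems need far more than dictionary lookup. The part that does carry over is the arithmetic: with a pivot, each language needs one translator into it and one out of it, instead of one per ordered language pair.

```python
# Hypothetical toy dictionaries, for illustration only.
FR_TO_EO = {"le": "la", "chien": "hundo", "dort": "dormas"}
EO_TO_EN = {"la": "the", "hundo": "dog", "dormas": "sleeps"}

def word_for_word(sentence, dictionary):
    # Unknown words are passed through unchanged.
    return " ".join(dictionary.get(w, w) for w in sentence.split())

def pivot_translate(sentence, src_to_pivot, pivot_to_tgt):
    # Source -> pivot (Esperanto here) -> target.
    return word_for_word(word_for_word(sentence, src_to_pivot), pivot_to_tgt)

def translators_needed(n_languages):
    direct = n_languages * (n_languages - 1)  # one per ordered language pair
    via_pivot = 2 * n_languages  # into and out of the pivot for each language
    return direct, via_pivot

print(pivot_translate("le chien dort", FR_TO_EO, EO_TO_EN))  # the dog sleeps
print(translators_needed(10))  # (90, 20)
```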
Re:No the didn't by sillybilly · 2005-08-31 17:53 · Score: 1

A computer will fully master language when it has equal intelligence to those speaking the language. Language is a conveyor of thoughts and meaning, and unless you know how to pick up and interpret meaning, you can't fully translate, especially the nuances. A human who can't understand the meaning of what he's told can be just as much an idiot when it comes to translation as a computer.

In the meantime we'll keep having dictionaries and crude grammatical programs.
Re:No the didn't by Anonymous Coward · 2005-08-31 18:30 · Score: 1, Interesting

The word you're looking for to describe the intermediate is "interlingua", and it need not be real, just structure meaning somehow -- eg some weird XML ;-).

My tutor got his doctorate in machine translation, and that was erm mid-early 80s? His "not for a long time" prediction (as seems to apply in general to AI) likely remains correct --- I'll believe the techniques (as AI in general) bring us more than extremely specialised uses when I see more than press releases and claims of software that isn't available for me to test.

In fact, fellow nerds, just give me a link to ONE impressive piece of AI software (that isn't a chess player) and I'll be bowled over. PS I'm posting this using Dragon NaturallySpeaking, which is one of the only examples of vaguely AI research reaching the home/office...
Re:No the didn't by lupin_sansei · 2005-08-31 18:32 · Score: 1

Yeah, and how about Japanese words like natsukashii (a kind of nostalgic/homesick/old-fashioned charm) or omoshiroi (funny and/or interesting), for which there are no exact English equivalents?

--
http://www.perthonline.net
Re:No the didn't by MochaMan · 2005-08-31 19:19 · Score: 1

Natsukashii is a good example, not so much because there isn't a word for it in English, but more because one uses it in situations where we wouldn't say anything in English - or nothing related anyway; at best "oh man, I remember that!" or "hey check this old picture out!"

Other really simple examples are "gochisou-sama", "otsukare-sama", and "itadakimasu" (when used before eating), not to mention the entire avoidance of personal pronouns, which leads to ambiguity - eg. "Nakayama-san wa dou omou?" could mean either "What does Mr. Nakayama think?" or "Mr. Nakayama, what do you think?". Of course, this could be shortened to either "dou omou?" or "Nakayama-san wa dou?" with the same meaning in context (but different meanings in other contexts).

Along the line of being able to drop parts of sentences almost arbitrarily, try getting a computer to translate this:
"Okaasan-tte?"
Which literally means:
"Mum?"

But which could mean:
"Did you say mum?"
"Whose mum do you mean?"
"You/he/I/we/they said 'mum'?"
or a plethora of other things.

Lastly - cool login :) Was in Kyoto a couple weekends ago and filled out my Lupin collection a bit at a furu-hon-matsuri near Shimogamo shrine.
Re:No the didn't by Anonymous Coward · 2005-08-31 19:54 · Score: 0

Your response tells me you are not too good at Japanese at all.
Re:No the didn't by krunk4ever · 2005-08-31 20:21 · Score: 2, Interesting

Being more serious, how do you think humans learn the rudiments of language? It's pattern analysis, i.e. precisely the technique this algorithm tries to replicate. It is true that the algorithm won't then progress onto the next stage, which is using that rudimentary grasp of the language to be taught its finer points, but if you genuinely doubt the capacity of this method to produce an understanding of language you are contesting the experiences of every human on the planet.

There's one flaw in your analysis: humans learn language and grammar faster when they're young, and it becomes a lot harder when they get older. There are many different speculations on why that happens, from children starting with a clean slate to children learning languages better as their brains develop. I mean, pattern analysis would definitely be an advantage for grown-ups, no? Why is children's pattern analysis better in this case, if what you're saying is true?

From what I've seen, to actually learn the grammar of a foreign language, there are two requirements. One is that you must have a passion for it. The second is that you must be constantly practicing. I've noticed that if you attend classes but never use the language in your real life, you'll never learn it. Find a group of people who are also learning and try communicating only in that language, and you'll see how much faster you pick it up. It also helps to have a friend who's fluent in the language to correct you (though it might not be that good for your pride). What I've noticed is that grammar nazis are the best for learning a new grammar. They pick on EVERY SINGLE MISTAKE YOU MAKE, so you'd think twice before making the same mistake again.

At college, I've actually seen flyers from people asking for help with English and offering help in return with the language they're fluent in, be it French, German, Chinese, Japanese, etc. Those people would meet maybe three times a week and spend an hour in each language each time, which I thought was a really neat idea. Here you're helping a foreigner with English, and there they are helping you with a foreign language you want to learn.

--
HD Trailers
Re:No the didn't by Anonymous Coward · 2005-08-31 20:26 · Score: 0

anta wa fail it. bukakke is another bit of bullshit orientalist exoticism. to the japanese, bukakke is a kind of noodle first, a kind of sex fiftieth. but hey, let's use the phrase "toss salad" in japanese, and pretend like english speakers are all perverts, too!
Re:No the didn't by Anonymous Coward · 2005-08-31 20:53 · Score: 0

No, that would be "Ah Itaitaitai!"
Re:No the didn't by 91degrees · 2005-08-31 21:14 · Score: 1

In fact, fellow nerds, just give me a link to ONE impressive piece of AI software (that isn't a chess player) and I'll be bowled over.

I quite like 20 Questions. Actually, this sort of thing is not too hard. I once wrote a simple game that did this (animals only, just searching down a tree) and people I showed it to were quite impressed.
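For the curious, the tree-search version of 20 Questions can be sketched like this, assuming Python. Leaves hold animals, internal nodes hold yes/no questions, and when the program guesses wrong it replaces the leaf with a new question that separates the two animals. The names are hypothetical, not taken from the poster's game.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    text: str                       # a yes/no question, or an animal at a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self):
        return self.yes is None and self.no is None

def guess(node, answer):
    """Walk down the tree; `answer(question) -> bool` stands in for the player."""
    while not node.is_leaf():
        node = node.yes if answer(node.text) else node.no
    return node

def learn(leaf, new_animal, question, yes_for_new):
    """Turn a wrong leaf into a question that tells the old and new animals apart."""
    old, new = Node(leaf.text), Node(new_animal)
    leaf.text = question
    leaf.yes, leaf.no = (new, old) if yes_for_new else (old, new)

tree = Node("Does it live in water?", yes=Node("fish"), no=Node("dog"))
cat = {"Does it live in water?": False, "Does it bark?": False}
wrong = guess(tree, cat.get)
print(wrong.text)                   # dog
learn(wrong, "cat", "Does it bark?", yes_for_new=False)
print(guess(tree, cat.get).text)    # cat
```

Each wrong guess adds one question, so the tree grows only where the players actually stump it.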
Re:No the didn't by lupin_sansei · 2005-08-31 21:14 · Score: 1

Yeah Lupin Sansei rocks! The theme tune remix CDs are nice too.

--
http://www.perthonline.net
Re:No the didn't by Anonymous Coward · 2005-08-31 21:24 · Score: 0

You fail to notice the usage of "ruzo" after bukkake, hence any native Japanese speaker knows exactly what is meant by the phrase -- and it has nothing to do with noodles! chin kasu yaro!
Re:No the didn't by Anonymous Coward · 2005-08-31 22:08 · Score: 0

To add to this, this is the whole point of the original poster about machine translation: doing a literal translation is meaningless. This is why you fail to understand the usage of bukakke.

Here's another example: take the phrase "shakai no mado ga aiteru". A literal translation is "society's window is open". In English this doesn't make much sense at all. However, a contextual translation is "the fly on someone's pants is down"; it's got nothing to do with windows.

Take your response "anta wa fail it!", and let me correct your spelling here: "annta ha". Your intention is to insult me, yet you address me politely by referring to me as "annta". You don't even understand the correct usage of a simple word such as "you" in Japanese, yet you are trying to teach me something about contextual usage of metaphores! Thanks for your advice!
Re:No the didn't by Peter+La+Casse · 2005-08-31 23:48 · Score: 1

In fact, fellow nerds, just give me a link to ONE impressive piece of AI software (that isn't a chess player) and I'll be bowled over. PS I'm posting this using Dragon NaturallySpeaking, which is one of the only examples of vaguely AI research reaching the home/office...
Here you go: Dragon NaturallySpeaking.
It's really impressive how this piece of AI software can, like, understand what you say and stuff.
Re:No the didn't by KDR_11k · 2005-08-31 23:55 · Score: 1

Saying "Mum?" in English has the same effect. It can mean you are asking for your mother or you want someone to elaborate on the word "mum" he just used.

--
Justice is the sheep getting arrested while an impartial judge declares the vote void.
Re:No the didn't by 91degrees · 2005-09-01 00:36 · Score: 0

Possibly. But if it was it was a copy of a copy.
Re:No the didn't by thecoolestcow · 2005-09-01 01:06 · Score: 0

You, I think, are confused. The current Google translator is no better than freetranslation.com or babelfish. However, Google has been working on a completely new type of translator which (it seems) this article talks about.

You can find that other article here.
Re:No the didn't by Anonymous Coward · 2005-09-01 01:14 · Score: 0

Yeah whatever man. Tell that to him http://www.rasterman.com/index.php?page=Me
Re:No the didn't by lysergic.acid · 2005-09-01 01:23 · Score: 1

idiomatic expressions are exceptions to the rule, so they typically have to be hardcoded into the translator to be effectively implemented.

but if that isn't an idiomatic expression, then perhaps there just isn't an equivalent tense or part of speech in english to translate it to--i think german has some parts of speech that are not found in english as well.

if the latter is the case, then the algorithm should be able to pick up on it and extrapolate the meaning of such sentences into a different sentence structure with a similar meaning.
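One way to "hardcode" idioms as described above is a phrase table consulted before falling back to word-for-word lookup, trying the longest match first. A minimal sketch, assuming Python and tiny made-up English-to-Spanish tables built around the "your fly is open" example that comes up later in this thread; real systems are far more involved.

```python
# Hypothetical English -> Spanish tables, for illustration only.
PHRASES = {("your", "fly", "is", "open"): "tienes la cremallera abierta"}
WORDS = {"your": "su", "fly": "mosca", "is": "está", "open": "abierta"}

def translate(sentence, use_idioms=True):
    words = sentence.lower().split()
    table = PHRASES if use_idioms else {}
    longest = max((len(p) for p in table), default=0)
    out, i = [], 0
    while i < len(words):
        # Try the longest hardcoded idiom starting at word i first.
        for size in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + size])
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No idiom matched here: fall back to word-for-word lookup.
            out.append(WORDS.get(words[i], words[i]))
            i += 1
    return " ".join(out)

print(translate("your fly is open"))                    # tienes la cremallera abierta
print(translate("your fly is open", use_idioms=False))  # su mosca está abierta
```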
Re:No the didn't by Muad'Dave · 2005-09-01 01:24 · Score: 1

...metaphores...
Is that some kind of abstract signalling device? A meta-semaphore?

--
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
Re:No the didn't by StopSayingYouSir · 2005-09-01 01:47 · Score: 1

I work in Japan and am half-way fluent. Google couldn't even turn my most basic Japanese emails into comprehensible English.
Hmmm... Maybe the problem is that you're not as fluent as you think you are :-)
Re:No the didn't by jcr · 2005-09-01 02:17 · Score: 1

A computer will fully master language, when it has equal intelligence to those speaking the language.

There is just that little hitch of getting a computer to have any intelligence in the first place...

-jcr

--
The only title of honor that a tyrant can grant is "Enemy of the State."
Re:No the didn't by Greedo · 2005-09-01 02:36 · Score: 2, Insightful

You overestimate some speakers, methinks.

--
Tuus crepidae innexilis sunt.
Re:No the didn't by Anonymous Coward · 2005-09-01 02:58 · Score: 0

The current prevailing theory suggests that children are equipped with a proto-grammar that constrains all the possible human grammars: the Universal Grammar.

Up until some age (still undetermined, probably around age six or so), the brain calibrates itself according to what's in the environment. When you're an adult, you do the same thing, but because your brain is pre-calibrated, you need to try a lot harder.

The grammar and whatnot is probably the easier part of picking up a language. When trying to learn Mandarin, I found that the hardest problem for me was simply to learn how to hear, and then replicate, the phonemes that aren't in any of my 'native' languages.
Re:No the didn't by TapeCutter · 2005-09-01 03:08 · Score: 1

"A computer will fully master language, when it has equal intelligence to those speaking the language."

Even when people have grown up together they can have a hard time understanding each other. I often witness, and sometimes participate in, conversations where both parties are talking about completely different things, yet every word makes sense. These types of conversations are the bread and butter of sitcoms such as "Frasier".

AI to one side, I would say that humans need to "fully master language" before we can expect to see computers compose text in a "natural" manner. If we don't we will spend even more time arguing with them.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:No the didn't by dansan · 2005-09-01 04:05 · Score: 0

Oh boy.... It is obvious how bad these machine translations are if you are proficient in both languages.

From that site:

(Original) English text:
Unfortunately, one of the realities of being a real live teacher is that I can't take time to write fun emails to everyone anymore. At least not now. If this week is like last week, I'm staring at +70 hours of time at school. (Last week is, by the way, the last time I'm ever counting my hours.) So, the anecdotes from Miss Harlow will have to wait, BUT, I want you to know my new address and number. My house is only 15-20 minutes from I-81, so if you ever are driving through on your way north or south, PLEASE stop by!

Translated into Spanish:
Desgraciadamente, uno de las realidades de es un maestro vivo verdadero es que yo no puedo tomar tiempo de escribirles divertidos correos electrónicos a todos ya. Por lo menos no ahora. Si esta semana está como la semana pasada, yo miro fijamente en +70 horas de tiempo al colegio. (Es la semana pasada, a propósito, la última vez yo cuento jamás mis horas.) Así, las anécdotas de Señorita Harlow tendrán que esperar, PERO, quiero que usted sepa mi dirección y el número nuevos. Mi casa es sólo 15-20 minutos de yo-81, tan si usted maneja jamás por en su norte de la manera o al sur, POR FAVOR parada por!

(Which is completely terrible Spanish.)

Spanish translation back to English:
Unfortunately, one of the realities of is a teacher I live true is that I cannot take amusing time to write them e-mails to all already. At least not now. If this week is as last week, I I look at fixedly in + 70 hours of time al school. (Is last week, by the way, the last time I story never my hours.) Thus, the anecdotes of Young lady Harlow will have to expect, BUT, I want that you know my direction and the new number. My house is only 15-20 minutes of I-81, so if you handle never by in its north in the way or al south, PLEASE stop by!

All quite amusing, if you ask me.

--
The shortest distance between two points is a chord.
Re:No the didn't by darkshadow · 2005-09-01 04:25 · Score: 1

Tiffany - I think that we are now young single we behaved that it is what they say when we are together and I watch how you play them does not understand and so we are Chorus: To work just as quickly as we broad Holdin ' ignited one of another person Tryin hand ' to be able far in then night and you to put his arm around me and we to fall then earth and you opinion I to think we to be single now not to seem there for being any person around I to think we to be single now strike our healthy unique heart to be watched in way we to be able to hide which we to be doin ' ' cause what to say if always to know how and so to be repetition chorus to we I to think we to be single now not to seem there for being any person around I to think we to be single now strike our heart to be only sound

--
-Darkshadow (There was a thing called Heaven; but all the same they used to drink enormous quantities of alcohol.)
Re:No the didn't by krunk4ever · 2005-09-01 05:12 · Score: 1

That's because Chinese/Mandarin technically doesn't have much grammar. There is no sense of conjugation. Present tense, past tense, future tense, etc. all use the same word (character). To denote the time frame, you add today, yesterday, tomorrow, etc. Grammar in Chinese, I've gotta say, is incredibly easy. The hardest part of Chinese, like you said, is the actual pronunciation, since tons of words sound alike, and the writing of the Chinese characters. When you're dealing with a language with a complex grammar such as English, grammar becomes a lot harder. I took Spanish back in high school. The hardest part for me wasn't the vocabulary but actually remembering how to conjugate the verbs in different situations. Spanish doesn't have that many "special" cases, but English has A LOT.

--
HD Trailers
Re:No the didn't by smithmc · 2005-09-01 06:05 · Score: 1

For example, I was out with a Japanese woman the other night, and she said "aitakatta". Literally translated, this means "wanted to meet". Translated into native English, it means "I really wanted to see you tonight"
Hey, quit bragging!

--
Downmodding is the refuge of the weak. Don't downmod, make a better argument!
Re:No the didn't by CapnGrunge · 2005-09-01 09:05 · Score: 1

Try for fun "your fly is open" ;)

English to Spanish, as rendered by the translator: "su mosca está abierta".

The real Spanish phrase is "tienes la cremallera abierta". And that's quite uncolloquial.

--
I see 57005 people
Re:No the didn't by Anonymous Coward · 2005-09-01 10:40 · Score: 0

In fact, fellow nerds, just give me a link to ONE impressive piece of AI software (that isn't a chess player) and I'll be bowled over.

*shrug* Like most things, what the answer is depends on your definitions: "A.I." is particularly poorly defined. Mine is, roughly, "Can we build a machine that solves a problem that would formerly take humans to solve?". If we can, then I call that machine an A.I.

I'd say by that definition that all of the following are good applications of "A.I.":

* Automatic braking systems (outdo most humans at hitting the brakes at the right times)

* Autopilots (outdo human pilots in many cases)

* Automated call centres (only route my calls with 80% accuracy, which is about what I get from foreigners working call centres)

* Spell checkers (better vocabulary than humans)

* Pocket calculators (can compute logarithms and sine tables much faster than a professional mathematician)

* Dragon Naturally Speaking (nearly as accurate as a professional typist)

* Automated cell counting. (My favourite, 'cause my friend worked on it, and it looked cool. :-)

The problem: given microscope slides, determine what percentage of the cells on the slide stained. Different stains darken different proteins.

My friend built a little custom system out of a microscope, stepper motors, a CCD camera, and Linux that does a job that humans used to do. You put a slide under the microscope, and the system does the rest. It focuses in at various zoom levels to automatically look for an optimum focus for that slide, scans it, pans over automatically for the next scan, and carefully stitches all the scans into a composite image of the entire slide. Then it takes the composite image, figures out how many cells look the right colour, and comes up with a percentage of coloured cells.

Doing all this manually involved a biologist squinting through the microscope trying to count every coloured cell he saw. It was so much work that the human researcher would often just look at one or two sections of the slide, and assume that the sample was representative.
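The counting step could look roughly like this. It is a guess, not the friend's actual code: it assumes a grayscale image stored as a 2D list, treats connected pixels darker than a background threshold as one cell (found by flood fill), and counts a cell as stained when its mean intensity falls below a second threshold. Both threshold values are made up, and a real system would need proper segmentation to separate touching cells.

```python
from collections import deque

def stained_percentage(image, background=200, stain=100):
    """Percentage of cells whose mean intensity is below `stain` (0 = black, 255 = white)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = stained = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= background:
                continue  # already counted, or background pixel
            # Flood-fill one cell (4-connectivity), summing its intensities.
            total, size = 0, 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                total += image[y][x]
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and image[ny][nx] < background:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cells += 1
            if total / size < stain:
                stained += 1
    return 100.0 * stained / cells if cells else 0.0

W = 255  # white background
slide = [
    [W, 50, W, 150, W],
    [W, 60, W, 150, W],
    [W, W, W, W, W],
    [40, W, W, W, 150],
]
print(stained_percentage(slide))  # 50.0 (2 of 4 cells are dark)
```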

"Teaching" a computer to be "smart" enough to solve this problem was pretty cool. :-)
--
AC
Re:No the didn't by synthespian · 2005-09-01 11:00 · Score: 1

There was a program that tried to use the language of Esperanto

Esperanto is a dreadful auxlang (auxiliary language). Look here: Learn Not to Speak Esperanto. The nice thing about it is that it's probably the only auxlang with a substantial community. They even have radio: http://radioarkivo.org/. Interlingua is immediately comprehensible to a native speaker of one or more of: Spanish, French, Portuguese, Romanian, Italian, Catalan and some others I forget.

I always thought glosa would be ideal for what you proposed http://www.glosa.org/

Here's what they claim: Glosa is the most advanced one of the type that linguists call isolating. That means that in Glosa there are no inflections. Words always remain in their original form, no matter what function they actually have in the sentence. A conventional grammar is missing, although Glosa is a full language. Grammatical functions are taken over by some operator words and the word order (syntax). This disposition relates Glosa to many languages around the world: East Asian languages like Chinese, Creole languages from Africa, pidgin languages and, with reservation, the most important natural language, English.
(In my personal opinion this way is not only interesting, but also the best for an auxlang.)

A Glosa word represents an idea, but no part of speech. The same word can function as a verb, noun, adjective or preposition within reason. The Glosa words are taken from Latin and Greek, so they are known to many people through foreign words or the Romance languages. A limited vocabulary (Glosa 1000 or Centra Glosa), easy to learn for beginners, should suffice for everyday situations. For higher demands (science, art, poetry) an extension (Glosa 6000 or Mega Glosa) is available.

I've always thought you could at least achieve an approximate translation by going Language1 -> Glosa -> Language2. At least vocabulary and some approximation of grammatical tense would be achieved. Modality and aspects would be missed, but I imagine they could be perfected by matching context to huge word lists in Language1 and Language2. In fact, verbal aspects are just enhancements; I wouldn't say they're essential. Apparently, the dominant languages (which is not the same as saying "90% of languages"), both from East and West, seem to have the notion of past/present/future.

IMHO, Glosa is very well thought out. At one time I proposed that the Debian project use Glosa. Right now, I've got some Maxima docs (open source CAS) to translate to Portuguese and I keep thinking: if we at least used an auxlang like Interlingua, we would shorten the efforts of the translation teams for all Latin languages, but people don't know Interlingua (but, in fact, they do, they just don't know they do.)

Here's a little Glosa translation (from a website linked from glosa.org) of Walt Whitman:
O kapitana! Mi kapitana!
O CAPTAIN! my Captain!
Na fobo viagia nu-pa es ge-fini.
our fearful trip is done; (fobo=fear, ge=particle indicating now, IIRC)
U navi pa dura dia panto turba.
The ship has weather'd every rack,
U premi; na pa cerka, pa gene gania.
the prize we sought is won; (pa=past, cerka=seek, gania=win, pa gania=win in the past, won)
Un asilu-lo nu es proxi.
The port is near
Mi audi plu kampani.
the bells I hear (pretty obvious: "audi", "kampani" is obvious for any Italian/Portuguese/etc speaker)
Panto homi voci lauda.
the people all exulting
(panto=pan, as in "pan-american"; homi=men, voci=voice, lauda=praise (cf. Latin "laudamus"))

Another sample of Glosa:
u feli: a cat, the cat
plu feli; poli feli: cats; many cats
tri feli: three cats
u feli tri: the third cat
u-ci feli; u-la feli

--
Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
Re:No the didn't by ocie · 2005-09-01 13:48 · Score: 1

I had a similar experience where my boss in Japan used software to translate things into English for me. He told me about a business trip I was to go on, and the printout said "This trip is not a trip". A friend was asked by her boss "Is it ready?"; "it" was her, and they meant "Are you ready?"

--
JET Program: see Japan, meet intere
Re:No the didn't by Anonymous Coward · 2005-09-01 13:56 · Score: 0

The most effective insults are delivered politely, but are totally against the receiver once loaded into cognition.
Re:No the didn't by MochaMan · 2005-09-01 14:34 · Score: 1

Saying "Mum?" in English has the same effect.

True, although this addresses an entirely different point than the one I was making. By your comment, I'm guessing you don't speak Japanese, so I probably should have been more clear.

First, "Okaasan-tte?" could never be used to ask for your mother. "Okaasan" means "mother", and calling to your mum you'd use "Okaasan?"

It's the "-tte" that was the focus of what I wrote, and it cannot be translated into English; it serves a purely grammatical purpose -- indicating that the preceding clause refers to something someone said.

Japanese is full of these functional hints; it's quite possible they developed in the language to provide some sort of help in disambiguating what's being said. In Japanese, particularly in conversation, you have an almost unlimited ability to drop bits and pieces of a sentence -- subjects, verbs, objects, anything goes, but these functional particles are generally preserved (though not always in conversation).

To fill out the example I previously gave:

"Okaasan-tte?" - speaker is asking about what someone means by the word "mum"

"Okaasan wa?" - speaker is referring to a mum as a general topic, or possibly asking his own mum a question, or maybe asking someone else how the previous conversation relates to that person's mum, though not to his own mum (unless the person he's speaking to happens to be immediate family, otherwise he'd use "haha" instead of "okaasan").

"Okaasan ga?" - speaker is specifically asking how the previous conversation relates to 'mum' as opposed to someone else.

"Okaasan wo?" - speaker is specifically asking how the previous conversation relates to "mum" as the direct object of some verb.

etc. etc.

Every single one of these would translate literally to "mum?" in English, but they have vastly different meanings, meanings that are almost impossible to identify without the context.

English is very redundant compared to Japanese.
Re:No the didn't by Jerry+Coffin · 2005-09-02 13:54 · Score: 1

In fact, fellow nerds, just give me a link to ONE impressive piece of AI software (that isn't a chess player) and I'll be bowled over. PS I'm posting this using Dragon NaturallySpeaking, which is one of the only examples of vaguely AI research reaching the home/office...

It can't be done -- but not because the software can't be written. The problem is that as soon as something even comes close to working well, it no longer qualifies (in most people's minds) as anything approaching AI.
If you look at AI research from the '50s (for example) you'll find that not only was speech recognition considered hardcore AI at one time (at one point, a box that simply blinked a light when anybody said "watermelon" within range of its microphone was hailed as a breakthrough in AI), but so also were handwriting recognition and even OCR. These have certainly been made to work (quite well in the case of OCR) but as soon as they even came close to working, they were re-classified as distinctly not AI.
--
The universe is a figment of its own imagination.

Finaly by Trigulus · 2005-08-31 16:53 · Score: 2, Interesting

Something that can make sense of the Voynich manuscript (http://www.voynich.nu/). They should have tested their system on it.

--
If something exists that does not need a creator (god) then why must the cosmos need one?

Re:Finaly by HishamMuhammad · 2005-08-31 17:44 · Score: 2, Insightful

Just because the program can extract grammar, it doesn't mean it can extract meaning. If I give you this sentence:

Ov brug termat akti mak lejna trovterna.

And tell you that "termat" and "lejna" are nouns, "akti mak" is a 'composite' verb, "brug" and "trovterna" are adjectives... it still doesn't say anything about the actual meaning.

--
The filesystem is the package manager
Re:Finaly by fgb · 2005-09-01 01:12 · Score: 1

How dare you insult my mother like that!

Og termat lejna kai trovterna poof!

Take that! Now we're even!

Um... by uberdave · 2005-08-31 16:54 · Score: 0

Wouldn't it be easier just to point to an online dictionary?

--
"I'm not impatient. I just hate waiting." - My Dad

Universal Translator? by mwilli · 2005-08-31 16:58 · Score: 2, Interesting

Could this be integrated into a handheld device to be used as a universal translator, much like a hearing aid?

Electronic babelfish anyone?

--
My sig beat up your sig.

Re:Universal Translator? by code65536 · 2005-09-01 03:41 · Score: 1

First thing that came to my mind when I saw this article was Star Trek's Universal Translator. This is neat...

Speaking as someone working on bioinformatics CFGs by Anonymous Coward · 2005-08-31 17:02 · Score: 0

I'm inclined to agree. IAA computational biologist doing bioinformatics-y algorithm things, and I am skeptical of automated grammar discovery. Automatic motif discovery with HMMs is one thing --- that works well, and I suspect that's basically what their bioinformatics results are yielding here (since SCFGs are a superset of HMMs). CFG-related algorithms are great for RNA analysis (I've written a few of them). I haven't read the article in detail, but CFGs aren't overwhelmingly well suited to proteins (which lack the nested-clause structure typical of RNA, for example (and programming languages too, as it happens)). One question I might ask is "how well does this perform when applied to a particular task?" --- the authors mention (in the context of proteins) automated functional classification; I'd be curious to see if this is basically reproducing the results of HMM-like approaches.
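For readers wondering why nested structure favours CFG-style methods for RNA, the classic Nussinov dynamic program is about the smallest example. This is a textbook illustration, not code from the article or from the poster; it only maximizes the number of nested base pairs (Watson-Crick plus G-U wobble), and the minimum hairpin loop length of 3 is an assumed parameter.

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in an RNA sequence."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # dp[i][j] = best number of pairs inside seq[i..j] (inclusive); short spans stay 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j around an inner structure
            for k in range(i + 1, j):  # bifurcation: two structures side by side
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1] if n else 0
```

The recursion (wrap a pair around an inner structure, or split into two adjacent structures) mirrors a grammar like S -> a S b | S S, which a left-to-right HMM cannot capture for arbitrarily deep nesting.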

Been there, done that by Toloran · 2005-08-31 17:04 · Score: 1

How long before this technology makes its way into the field of game AI? Imagine a game such as Deus Ex or SW:KoToR where you don't merely choose your response to NPC's from a predefined list, you type in your answer!

*cough*Zork*cough*

--
Speaking is NOT communication

Re:Been there, done that by Compaq_Hater · 2005-08-31 17:45 · Score: 1

No, not really. Zork did not learn anything; it merely drew its responses from a predefined database and spat out the correct response to the user's input. It did, through the use of keywords in the user's input string, "decide what to do", but for the most part it would fail with a "could not understand the word 'beat'" or similar message. If you were to feed it a command like "go north for thirty feet, turn left", its most likely response would be "I cannot understand the word(s) thirty feet". So yes, it could separate out most common-sense words and limited phrases, but not very well, and it is nowhere near the level of interactivity that would impress today's gamers (read: they want an ElCARS-style system). CH
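A toy version of the keyword matching described above might look like this. It is a hypothetical sketch, not Zork's actual parser (which handled verbs, objects and prepositions more elaborately); the vocabulary and the messages are invented for illustration.

```python
# Hypothetical fixed vocabulary; a real game would have hundreds of entries.
KNOWN_VERBS = {"go", "take", "open", "turn"}
KNOWN_WORDS = KNOWN_VERBS | {"north", "south", "left", "right", "lamp", "door", "the"}


def parse(command: str) -> str:
    """Match a command against a fixed vocabulary; nothing is ever learned."""
    words = command.lower().replace(",", " ").split()
    unknown = [w for w in words if w not in KNOWN_WORDS]
    if unknown:
        return "I don't know the word(s): " + " ".join(unknown)
    if not words or words[0] not in KNOWN_VERBS:
        return "I don't understand that sentence."
    return f"OK: {words[0]} {' '.join(words[1:])}".strip()
```

For example, `parse("go north for thirty feet, turn left")` rejects "for thirty feet" instead of learning them, which is the contrast with a system that infers structure from data.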

Is it SEQUITUR? by Anonymous Coward · 2005-08-31 17:12 · Score: 0

Isn't this what SEQUITUR (http://sequitur.info/) is supposed to do?
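SEQUITUR builds a hierarchical grammar by replacing repeated digrams (adjacent symbol pairs) with rules as it reads the input. The sketch below is not SEQUITUR itself but a simpler offline relative in the Re-Pair style: repeatedly replace the most frequent adjacent pair with a new rule. Rule names R1, R2, ... are arbitrary, and input tokens are assumed not to collide with them.

```python
from collections import Counter


def repair(tokens: list[str]) -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Greedily replace the most frequent adjacent pair until no pair repeats."""
    seq = list(tokens)
    rules: dict[str, tuple[str, str]] = {}
    while True:
        # Overlapping pairs are counted naively, a simplification of Re-Pair's bookkeeping.
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)  # replace the pair with the new nonterminal
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol: str, rules: dict[str, tuple[str, str]]) -> list[str]:
    """Rewrite a symbol back into the original tokens."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

On "a b c a b c" this yields R1 -> a b, R2 -> R1 c and the top-level sequence R2 R2: repetition in the data has become grammar rules, with no notion of meaning involved.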

Run it on the bible and get... by Trigulus · 2005-08-31 17:14 · Score: 2, Funny

God loves you. God will burn you in hell for all eternity. God wants more foreskins.

--
If something exists that does not need a creator (god) then why must the cosmos need one?

How "intricate"? by P0ldy · 2005-08-31 17:14 · Score: 2, Insightful

Our experiments show that it can acquire intricate structures from raw data, including transcripts of parents' speech directed at 2- or 3-year-olds. This may eventually help researchers understand how children, who learn language in a similar item-by-item fashion and with very little supervision, eventually master the full complexities of their native tongue."

In addition to child-directed language, the algorithm has been tested on the full text of the Bible in several languages

I hardly would consider transcripts of parents' speech directed at 2- or 3-year-olds "intricate". And while the algorithm may have "been tested on the full text of the Bible", it doesn't say with what percentage of accuracy or what translation. King James version or the Teen Magazine Bible?

And the "rules" of a language are NOT what children "learn". First of all, children acquire a language, they do not "learn" it. That is a large attribute to the child's ability to speak it--not whether or not they understand gerunds and the pluperfect.

Second, in a language such as English whose words for the most part lack any necessity to the order in which they're placed to understand they're meaning and, even worse, lack declension forms to distinguish subject from object of the preposition, with what success can a language recognition program have "learning" such a language when prepositions themselves mainly can be omitted? To teach a computer Latin is easy.

Third, what's the hope of the computer ever understanding something like Shakespeare, Joyce, or Dante, whose uses of language rely extensively on erudition for word placement as opposed to typical usage? While a computer might be able to learn Latin because of its rigorous rules, I doubt it could faithfully render a text from Ovid.

Re:How "intricate"? by proteonic · 2005-08-31 18:05 · Score: 1

I hardly would consider transcripts of parents' speech directed at 2- or 3-year-olds "intricate".

I speak to my kid like a human being. How do you speak to yours? Actually, don't take that personally, I just have serious issues with people that "dumb down" their speech to children. Kids aren't dumb. They understand a lot more than adults give them credit for, including language. Often children understand the majority of what is said to them before they begin to speak comprehensibly.

And the "rules" of a language are NOT what children "learn". First of all, children acquire a language, they do not "learn" it. That is a large attribute to the child's ability to speak it--not whether or not they understand gerunds and the pluperfect.

If you read the paper, you'll notice that the algorithm doesn't really learn the "RULES" either. It just identifies patterns, which is what children do too, really.

I'm sure teaching a computer Latin is easy. Any time there are plenty of well-defined rules, you can program them. But with that alone you won't be able to get the computer to make sense in its output. For that you need to train (using some algorithm) on a sufficiently large body of data. This algorithm, when trained on a large body of data, can produce output that "makes sense" and follows grammatical rules (at least more often than a Markov model, for example). That is pretty innovative. It's been a shortfall of Markov models that you need a high-order Markov model to do language learning. These guys use something like suffix trees. It's pretty neat stuff.
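For comparison with the Markov-model baseline mentioned above, here is a minimal word-level order-k Markov chain. It is not ADIOS and uses no suffix trees; it just records which word follows each k-word context and samples from those observations. The function names and the choice of whitespace-separated words as tokens are assumptions.

```python
import random
from collections import defaultdict


def train(words: list[str], k: int = 2) -> dict[tuple[str, ...], list[str]]:
    """Map each k-word context to the list of words observed right after it."""
    model: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - k):
        model[tuple(words[i:i + k])].append(words[i + k])
    return dict(model)


def generate(model: dict[tuple[str, ...], list[str]], start: tuple[str, ...],
             length: int, seed: int = 0) -> list[str]:
    """Sample up to `length` more words; stop early at a context never seen in training."""
    rng = random.Random(seed)
    out = list(start)
    k = len(start)
    for _ in range(length):
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break
        out.append(rng.choice(followers))  # repeated followers act as frequency weights
    return out
```

A larger k gives more grammatical-looking output but needs far more training text to cover its contexts, which is the trade-off the poster is pointing at.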
Re:How "intricate"? by eobanb · 2005-08-31 20:09 · Score: 1

How intricate?

Second, in a language such as English whose words for the most part lack any necessity to the order in which they're placed to understand they're meaning and, even worse, lack declension forms to distinguish subject from object of the preposition, with what success can a language recognition program have "learning" such a language when prepositions themselves mainly can be omitted?

How about what you just said?

--
Take off every sig. For great justice.
Re:How "intricate"? by P0ldy · 2005-08-31 20:33 · Score: 1

That was the point.

Better link for PDF by Anonymous Coward · 2005-08-31 17:14 · Score: 2, Informative

PNAS wants you to subscribe to download the PDF.

Or you could just go to the authors' page and download it for free: http://www.cs.tau.ac.il/~ruppin/pnas_adios.pdf

Re:Better link for PDF by mattjb0010 · 2005-08-31 18:13 · Score: 1

PNAS wants you to subscribe to download the PDF.

I know, they're being a bit of a dick. (Sorry, just milking this for all it's worth ;)

Not really new. by www.sorehands.com · 2005-08-31 17:15 · Score: 1

While working for a nutcase, I spoke with Philip Resnik about his project of building a parallel corpus (http://www.umiacs.umd.edu/users/resnik/parallel/bible.html) as a tool to build a language translation system. This seems like the next logical step.

--
Fight Spammers!

Same as everything computer-based by scdeimos · 2005-08-31 17:18 · Score: 0

Garbage in, garbage out.

Hoshi Sato by Spodie! · 2005-08-31 17:19 · Score: 1

Hoshi would've said, "Well, duh!". (i love her)

Oysters oysters oysters split split split. by Anonymous Coward · 2005-08-31 17:22 · Score: 0

Recursive grammars always seemed a natural 'inbuilt' character of all languages to me, but then I took NLP as part of a CS unit on parsing and compiler design. AFAIK nobody has ever come up with a non-exhaustive way of analysing structure for inherent grammar. And you're right, you can't extract these features from a single given piece of plaintext. As Gödel would say, your whole formal system is going to get smashed to bits by the first counterexample outside your set. If it were possible we wouldn't need programming languages: so long as the programmer was self-consistent, they could make up any symbolic garbage and a compiler could say "hmm, I know what you MEAN" and turn out valid machine code. I didn't RTFA, but I'm guessing we are looking at Markov chains self-clustered by a self-organising map type affair. That trick can lead to some impressive pseudo-intelligent behaviour, like the Perl Poet scripts, but do they understand the language? No.

This is not new for protein sequence functionality by t35t0r · 2005-08-31 17:22 · Score: 2, Informative

In analyzing proteins, for example, the algorithm was able to extract from amino acid sequences patterns that were highly correlated with the functional properties of the proteins.

NCBI BlastP already does this for proteins. Similarities and rules for things can be found but if the meaning of the sequence is not known then what good is it? In the end you need to do experiments involving biology/biochemistry/structural biology to determine the function of a protein or nucleotide sequence. Furthermore in language as well as in biology/chemistry things which have similar vocabulary (chemical formula) may in the end be structurally very different (enantiomers), which leads to vastly different functionality.
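BLASTP finds local similarities between protein sequences with fast heuristics; the exact dynamic program for the same local-alignment problem is Smith-Waterman. A minimal scoring-only sketch follows. The scoring scheme (match +2, mismatch -1, linear gap -2) is an assumption chosen for simplicity, not the substitution matrices (such as BLOSUM62) that real protein searches use.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a new local alignment start anywhere instead of carrying a penalty
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

A high score says two sequences share a similar stretch; as the poster notes, it says nothing about what that stretch does until someone runs the experiments.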

Sponsored by Sanford Wallace? by The+Clockwork+Troll · 2005-08-31 17:25 · Score: 1

It is comforting to know that when we make first contact with the aliens, we might not be able to communicate, but we'll definitely be able to fool their spam filters.

--

There are no karma whores, only moderation johns

Dolphins? by Stripsurge · 2005-08-31 17:29 · Score: 2, Interesting

Seems like that'd be a good place to test the system out. While talking with extraterrestrials would be pretty awesome, having a chat with a dolphin would be pretty cool too. Remember: "The second most intelligent [species] were of course dolphins"

Re:Dolphins? by Compaq_Hater · 2005-08-31 20:30 · Score: 1

What do you mean "were dolphins"? Last time I checked, dolphins "are" a very intelligent species. CH
Re:Dolphins? by SimilarityEngine · 2005-08-31 21:34 · Score: 1

You haven't read the Guide?

--
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Re:Dolphins? by Anonymous Coward · 2005-09-02 01:14 · Score: 0

Yes, and I am planning to test the method and use it in a future version of my system for chatting with dolphins, Leafy Sea Dragon, which currently works for simple series of whistles: https://leafy.dev.java.net/ and http://sf.net/projects/c2h/

I have been planning to find and use such a technique for over 25 years. The timing is excellent as my first complete version of the program was released earlier this year and is ready for such a grammar discovery technique.

You are invited to test the advertised technique and write a Java package that the c2h/leafy project(s) could use.

I must also state the following:

- very few scientists are interested in dolphin communication research involving 2-way acoustic interaction. Why? I don't really know, except that I suspect that most are scared of looking bad, off-the-wall. I think that real progress in this field will first come from non-scientists, maybe from serious whale watchers and maybe from geeks like us.

- no computer system (at least no non-military system) can reliably understand human communication; language understanding can only be done by brains so far (not necessarily a human brain though).

- grammar discovery is helpful for trying to research a human or non-human language but grammar is not understanding.

- other types of animals use complex 2-way acoustic communication, for example, elephants.

- different species of cetaceans have different types of 2-way communication, for example, blue whales communicate much differently than dolphins, and I would not be surprised if dolphins do not understand blue whales.

cheers

Universal Translator by unidyneVII · 2005-08-31 17:33 · Score: 1

Interesting. This is certainly a step toward the Star Trek universal translator. On the other hand, though, I wonder if this sort of technology only applies to human languages. Is such an algorithm picking up on some sort of common human brain wiring and taking advantage of that commonality to translate accurately? Or, say, if it were applied to animal language, would it work too? In short, is human language a unique form of communication, or is there some underlying, perhaps mathematically "optimal" communication method which animals (or, to think Star Trek, aliens) use too? If the latter, imagine what such an algorithm could do...

Perhaps an analogy is in order. Metabolic pathways in living beings, as far as science is concerned, evolved from random chemical reactions thanks to millennia of natural selection. This slow, progressive optimization of complex chemical pathways allows all life to process material, mobilize the energy stored in food, and run everything that anything alive could possibly need to run. In short, evolution-optimized metabolic pathways are why things live.

Now, in computer programming, there is a technique called genetic programming. Basically, if a programmer wants to create a program that accomplishes task X, all he has to do is let loose a different program which will create mountains of random code. Over time (many, many processor cycles), the program will select which of the random jumbles of code accomplishes task X most efficiently, and report back to the programmer with the finished product. Voilà. Completed program.

Note, though, that genetic programming has also turned out solutions for complex electrical pathways that engineers had trouble solving. Yes, genetic programming has the ability to make complex electrical circuits. When genetic programming was applied to metabolic pathways, it actually hit one (I forget exactly which one) straight on, complete with features such as feedback loops and enzymatic inhibition. Yes.

There's the analogy: some type of underlying mechanism is at work here. Genetic programming. Evolution. Electrical circuits and metabolic pathways. The question is, does the same apply to all communication?
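The "random variation plus selection" loop described above can be sketched in a few lines. Strictly speaking this is a genetic algorithm over fixed-length strings, not genetic programming (which evolves program trees), and the target string, population size, mutation rate and seed are arbitrary assumptions.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "


def fitness(candidate: str, target: str) -> int:
    """Count positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def evolve(target: str, pop_size: int = 100, rate: float = 0.05,
           seed: int = 1, max_gens: int = 10_000) -> tuple[int, str]:
    """Evolve a random string toward `target`; return (generations used, best string)."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in target)
    for gen in range(max_gens):
        if best == target:
            return gen, best
        # Variation: many mutated copies. Selection: keep the fittest, parent included,
        # so fitness never decreases from one generation to the next.
        offspring = [mutate(best, rate, rng) for _ in range(pop_size)]
        best = max(offspring + [best], key=lambda s: fitness(s, target))
    return max_gens, best
```

Note that the fitness function already encodes the answer; the hard part in real genetic programming is writing a fitness measure for task X without knowing the solution in advance.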

I'll be impressed when it can by 2Bits · 2005-08-31 17:36 · Score: 3, Funny

- translate some posts on /. into comprehensible contents
- figure out it is a dupe and kill it before it even appears
- RTFA for me and just give me a good summary (by the rate of articles posted here, there's probably not much to summarize either)
- translate "IANAL" into something else that does not make me think of ANAL thing
- figure that articles on Google and Apple are just speculations by some dude living in his (can't be her, for sure) parent's basement, and not really news worth posting
- translate my suggestions into something acceptable to the (kernel) hackers that good hygiene is a good thing
- understand that I'm just ranting, and not take it personally.

MOD PARENT UP by HishamMuhammad · 2005-08-31 17:38 · Score: 1

I know what the grandparent poster meant was something more advanced than Zork, but the fact that he used Deus Ex and sw:kotor as "examples of games with textual interaction" totally called for the parent poster's response. Background research, people!

--
The filesystem is the package manager

Give it a real challenge by pugugly · 2005-08-31 17:40 · Score: 3, Interesting

Feed it the entries in the "obfuscated C" competition - if it works for that, it oughta work for anything.

Pug

--
An Invisible Entity of Vast Power whose existence must be taken on faith alone: Liberal Media

Finally! by Druox · 2005-08-31 17:42 · Score: 2, Funny

Finally! Engrish for the masses!

--
~ slashdot.org - Where some of the world's greatest minds come together to scrutinize grammar.

Feel & Meaning Still Beyond Computers by gbulmash · 2005-08-31 17:47 · Score: 1

I tried to translate my poetry into Spanish as an extra-credit project in Spanish class in college (I was pursuing a Creative Writing degree with an emphasis in Fiction and Essays, but couldn't escape a couple of poetry workshops). I also took a class in the philosophy of language.

Basic grammar is one thing, but when you get down to the meaning of words and phrases, you start to get into places computers have a much harder time with: abstraction and contextual meaning.

Think about "running a program". That can mean booting an application on a PC, or coordinating a group of lectures, or being executive producer of a television program. And does an executive producer make executive decisions in the production of a movie... or produce executives?

So when I translated the poetry, there were often times I had to stop and consider which phrase in Spanish best conveyed the meaning I was presenting in English. Furthermore, even with the best translation for meaning, I had to consider the rhythms of both poems and keeping their poetic sensibilities. Sometimes, I'd choose something less exact for the sake of the rhythm. I learned that good translation is F'ing hard!

We may be one step closer to Star Trek's "universal translator", but I have a strong belief that computers won't be putting the better flesh-and-blood translators out of work anytime soon.

For anyone who is interested, here's one of the poems. It was inspired by watching a poet taking what seemed to be a rather long time, hunting through a book of his own poetry, trying to find the next poem he wanted to read.

Finding Your place

You would think you know where
each of your poems are in this book. So familiar
are you with your own work and this organization
your mind has imposed on it. One poem extends
past the other, like feet moving
along a path that has been travelled

back and forth numerous times. Flipping along,
you hope you might know just when to stop
this procession of pages, automatically
locating the point you desire, rather than having
to stop in confusion and check
the signpost numbers and titles,
because you have become lost.

Hallar Su paraje

pensarías que conoces donde está
cada poema dentro de este libro. Tan familiar
estás con tu propia obra y esa organización
que la mente le ha impuesto. Un poema se extiende
más allá del otro, movimiento de pies
sobre el camino transitado

de aquí hacia allá sin cesar. Hojeando,
esperando que sepas cuando detener
este desfile de páginas, automáticamente
localiza el punto deseado, en vez de tener
que parar confundida y verifica
los números y títulos, como los postes indicadores,
porque te has perdido.

- Greg

--
Start a happiness pandemic

Patent docs? by Angst+Badger · 2005-08-31 17:50 · Score: 1

The FA says that patent applications have been filed. Are those available anywhere online?

I'm curious partly because this sounds very similar to a couple of pieces of prior art, but mostly because the description of how they go from basic structural recognition to translation between two unrelated languages reminds me a bit of that famous cartoon where two blocks of equations are separated by a little balloon containing the words, "And here, a miracle happens."

--
Proud member of the Weirdo-American community.

I don't think we disagree much by Ogemaniac · 2005-08-31 17:57 · Score: 2, Insightful

Yes, pattern recognition is a major part of the process. However, there are other fundamental parts that are also extremely important, and lacking them you get nonsense. In particular, context matters. "aitakatta" in the middle of a business letter probably does mean "wanted to meet". By itself, said by one member of a couple to the other over drinks at a bar, it does not.

In order for a program to translate accurately, it needs to know who is speaking/writing, who the audience is, what their relationship is, and their location. Some of this may be given to the computer explicitly, or easily found in the text/speech (for a human at least), but some of it may not. This is not going to be an easy problem to solve.

Writing is never free from its context. I know before I even start whether I am reading a fiction novel, a satire, a scientific journal, an email from my boss, or a text message from my date this Saturday. The meaning of the words can change a lot in those cases.

Even Google translator, which was trained on multi-lingual UN reports, could not produce comprehensible English from simple Japanese business emails.

As for my chinko, that's a long story.

Can it decipher ancient languages? by Anonymous Coward · 2005-08-31 17:57 · Score: 1, Interesting

For example, the lost Iberian language, spoken in Spain before Latin. There are texts, but nobody understands them.

Spam filter? by goMac2500 · 2005-08-31 18:00 · Score: 2, Interesting

Could this be used to make a smarter spam filter?
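The thread doesn't answer this. For context, the usual statistical baseline for spam filtering is a word-based naive Bayes classifier; the minimal sketch below (add-one smoothing, invented training messages) is unrelated to ADIOS and only shows what a smarter filter would have to beat.

```python
import math
from collections import Counter


class NaiveBayes:
    """Bag-of-words naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text: str, label: str) -> None:
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def score(self, text: str, label: str) -> float:
        """Log P(label) + sum of log P(word | label); assumes both classes were trained."""
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total = sum(self.words[label].values())
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for w in text.lower().split():
            logp += math.log((self.words[label][w] + 1) / (total + len(vocab)))
        return logp

    def classify(self, text: str) -> str:
        return max(("spam", "ham"), key=lambda label: self.score(text, label))
```

The model treats each message as an unordered bag of words, so any gain from grammar discovery would have to come from structure this baseline throws away.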

Here is one from an email I wrote last month by Ogemaniac · 2005-08-31 18:04 · Score: 1

As I told her, you don't have to tell them to me for me to figure them out

Perfectly sensible to exactly one person, at one moment in time, and complete nonsense at any other. I laughed when I wrote it because I realized how absurd this sentence would appear to one of my poor Japanese friends. Of course, they throw the reverse mind twisters at me.

I agree, some kinds of texts are simpler than others. Texts that are factual, and distant from personal human interactions, are probably easier to translate because context matters much less. Either way, I have yet to see a translator that can come close to turning even the most simple Japanese into comprehensible English and vice versa.

Re:Dupe by iomanip · 2005-08-31 18:09 · Score: 1

Well, I guess that since this paper deals with unsupervised learning of natural languages and this NIST shootout was about machine translation, maybe they are just a little different. I admit I haven't fully digested the paper in the article, but it seems to me that this is very different. But hey, that's just me.

Bird Languages by Anonymous Coward · 2005-08-31 18:12 · Score: 0

Having studied Chinese for quite some time, I seriously doubt the claims made. First, word segmentation in Chinese is not easy because in the printed script there are no spaces. And no, single characters do not necessarily represent complete words. In modern Chinese many words consist of two syllables. So there is simply no statistical way of bootstrapping how to identify a word. You have to learn it, but from specially prepared texts, usually called "Learn Chinese", dictionaries, et cetera. Well, the authors claim to learn that from ordinary text, but you see, you need specially prepared texts. So their claims are not exactly wrong, but not true either.
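The point that segmentation depends on a lexicon can be illustrated with forward maximum matching, a classic dictionary-based heuristic for Chinese: at each position take the longest dictionary word that matches, otherwise a single character. The tiny dictionary is a hypothetical example, and this is not what the ADIOS authors did.

```python
def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy forward maximum matching; unknown characters become one-character words."""
    longest = max(map(len, lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words
```

With 打的 (the taxi phrase above) in the lexicon, "我们打的去机场" splits as 我们 / 打的 / 去 / 机场; remove it and the same text splits into 打 / 的, so the output is only as good as the dictionary supplied.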

A second problem is that for many important words used day-to-day there is no simple way of inferring the meaning from the two syllables. For instance, "da3" has many meanings (the number indicates the tone); one of its most prominent meanings is literally "to hit something/someone". When you are taking a taxi you say "da3 di5", but you are surely not hitting the taxi.

Sorry, but the whole "it has even learned Chinese" thing is just wishful thinking on the part of a bunch of people who, as they admit themselves, have not the slightest clue of what is going on in the Chinese language.

And to finally resolve the riddle in the subject line: if Chinese speakers find some language completely incomprehensible, they call it a "Birds' Language". But they never apply this to their own language...

Nothing (that) new, move along by drgonzo59 · 2005-08-31 18:13 · Score: 1

Their algorithm doesn't seem as revolutionary as it claims to be. Stuff similar to this has been done before in natural language recognition and processing research.

One should note that generating meaningful grammar doesn't mean generating meaningful sentences. It might be grammatically correct to say "blue bingo bats" but it doesn't mean anything. The machine has to have some common sense and "understand" concepts the way we do, to produce human language.

But they do seem to be using the well-known university researcher's approach, namely:

1. Repackage some previously done stuff under a cute acronym -- "ADIOS" in their case, but 10+ points for recursive ones.
2. Patent it
3. ...?
4. *Success and fortune.

*Most never get here.

Does it understand... by Mechcozmo · 2005-08-31 18:13 · Score: 0, Offtopic

What if we had a Beowulf cluster of these?

Ten days, ten dates, ten women by Ogemaniac · 2005-08-31 18:19 · Score: 0

God I love Japan! Though these triple-date Sundays are starting to tire me out.

It's actually a new language study by Sycraft-fu · 2005-08-31 18:27 · Score: 3, Insightful

Called Pragmatics. It can be somewhat oversimplified as saying it's the study of how context affects meaning or as figuring out what we really mean, as opposed to what we say.

For example, a classical Pragmatics scenario:

John is interested in a co-worker, Anna, but is shy and doesn't want to ask her out if she's taken. He asks his friend Dave if he knows whether Anna is available, to which Dave replies "Anna has two kids."

Now, taken literally, Dave did not answer John's question. What he literally said is that Anna has at least two children, and presumably exactly two children. That says nothing of her availability for dating. However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: that Anna is married, with children.

So that's a major problem computers hit when trying to really understand natural language. You can write a set of rules that completely describes all the syntax and grammar. However, that doesn't do it; that doesn't get you to meaning, because meaning occurs at a higher level than that. Even when we are speaking literally and directly, there's still a whole lot of context that comes into play. Since we are quite often speaking at least partially indirectly, it gets to be a real mess.

Your example is a great one of just how bad it gets between languages. The literal meaning in Japanese was not the same as the intended meaning. So first you need to decode that; however, even if you know that, a literal translation of the intended meaning may not come out right in another language. To really translate well you need to be able to decode the intended meaning of a literal phrase, translate that into an appropriate meaning in the other language, and then encode that in a phrase that conveys that intended meaning accurately, and in the appropriate way.

It's a bitch, and not something computers are even near capable of.

Re:It's actually a new language study by iGN97 · 2005-08-31 18:40 · Score: 1

That says nothing of her availability for dating. However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.

Which doesn't necessarily mean "stay away". It could mean "go for it; some like the mother, some like the daughter, and some like both at the same time". Giggity.

There's no need to be a machine to consistently misinterpret.
Re:It's actually a new language study by NichG · 2005-08-31 19:13 · Score: 2, Insightful

I'd say this is the first step to it, though. Let's forget about natural language for a second and look at computer algebra systems, proof generators, etc. How is the inference that you talk about any different from a computerized proof system proving something based on bits of information it has stored away? I think it's pretty similar really, except for the part about knowing what thing you want to prove/confirm.

So how does that sort of thing work? Well, in mathematics you can have something like y=f(x) and substitute f(x) whenever you see y or vice versa. You also know various other rules that are of the same form, e.g. a(b+c) = ab+ac. Then, you can brute-force trying different combinations (or be smart about it and modularize some set of translations to create a new compound rule which is true, e.g. a lemma).
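The substitution idea can be sketched in a few lines of Python. This is a minimal toy, not how any particular computer algebra system is implemented: expressions are small trees (the `Sym`, `Add` and `Mul` classes are made up for the example), and a single rewrite rule, a(b+c) = ab+ac, is applied bottom-up until it no longer fires. A real system would hold many such rules and search over which ones to apply.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Sym, Add, Mul]


def distribute(e: Expr) -> Expr:
    """Rewrite a(b+c) -> ab+ac and (a+b)c -> ac+bc bottom-up until the rule no longer applies."""
    if isinstance(e, Sym):
        return e
    left, right = distribute(e.left), distribute(e.right)
    if isinstance(e, Add):
        return Add(left, right)
    if isinstance(right, Add):
        return distribute(Add(Mul(left, right.left), Mul(left, right.right)))
    if isinstance(left, Add):
        return distribute(Add(Mul(left.left, right), Mul(left.right, right)))
    return Mul(left, right)


def show(e: Expr) -> str:
    """Print an expression, writing products by juxtaposition and bracketing sums inside them."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Add):
        return f"{show(e.left)} + {show(e.right)}"
    return "".join(f"({show(x)})" if isinstance(x, Add) else show(x) for x in (e.left, e.right))


a, b, c = Sym("a"), Sym("b"), Sym("c")
expr = Mul(a, Add(b, c))
print(show(expr), "=>", show(distribute(expr)))  # a(b + c) => ab + ac
```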

It may not be so easy in languages, but there are transformations you can apply to sentences. For instance, you can do some rearrangements like:

A is under B -> Under B, there is A.

And there are ways that these relations (spatial relations especially) distribute:

A is in B, B is under C -> A is under C.
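Distributing relations like this amounts to forward chaining: combine two known facts into a third, and repeat until nothing new appears. In the sketch below the composition table is a hypothetical stand-in covering only a few spatial relations. A usable system would need many more relations, plus the exceptions that real spatial language is full of.

```python
from itertools import product

# Hypothetical composition table: (relation of X to Y, relation of Y to Z) -> relation of X to Z.
COMPOSE = {
    ("in", "in"): "in",
    ("in", "under"): "under",
    ("under", "under"): "under",
}


def infer(facts):
    """Forward-chain over (subject, relation, object) triples until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that new facts can be added safely inside the loop.
        for (x, r1, y), (y2, r2, z) in product(list(known), repeat=2):
            derived = COMPOSE.get((r1, r2))
            if y == y2 and derived and (x, derived, z) not in known:
                known.add((x, derived, z))
                changed = True
    return known


facts = {("coin", "in", "purse"), ("purse", "in", "drawer"), ("drawer", "under", "desk")}
for fact in sorted(infer(facts) - facts):
    print(*fact)
```

Note that "coin under desk" needs two rounds: first "coin in drawer" is derived, and only then can it be combined with "drawer under desk".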

So to understand 'Anna has two kids' you have to know: 1. That you want to evaluate the truth/falseness of 'is Anna available to go out' and 2. Various pieces of social information about 'going out', people who are married, people who have kids, etc.

If you have 2 you should be able to use a method in the same vein as a computer algebra system to determine how what was just said applies to your question.
Re:It's actually a new language study by Anonymous Coward · 2005-09-01 06:41 · Score: 0

However that doesn't do it, that doesn't get you to meaning, because meaning occurs at a higher level than that. Even when we are speaking literally and directly, there's still a whole lot of context that comes into play. Since we are quite often at least speaking partially indirectly, it gets to be a real mess.

It's something of a stretch to say that "meaning" occurs at any "level"; it's all a probabilistic gamble, at each "level", as to whether a given expression is interpreted in the same way as it was intended to be.

Remember, an often voiced complaint and haunting fear for many, many people across the ages is that "nobody ever really understands me!"

Human brains are a tangled soup brimming with complex probabilities, for various contradictory notions. Trying to make "sense" of human communications is excessively optimistic; it assumes far greater order in the human mind than probably really exists.
--
AC
Re:It's actually a new language study by ANeufeld · 2005-09-01 07:41 · Score: 2, Insightful

However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.

Funny, I read that answer as a "yes, she's available," but with additional information: don't ask her out unless you are willing to accept the entire package.

In a different language, I could still see a literal translation of the question and answer as communicating the same information. The "higher level meaning" is not embedded in the words or language. The exchange, "available?" "kids." does not mean "not available," but is more of a trinary response.

A logical extension by Anonymous Coward · 2005-08-31 18:28 · Score: 0

"The rules can then be used to generate new and meaningful sentences".... Why not start with a computer language? Why not machine code? Somebody might even be able to generate new and meaningful Windows OS bits.

Universal Translator? by frinkacheese · 2005-08-31 18:31 · Score: 0

Cool, now we can talk to the aliens! Take off your tin foil hats, they don't need to scan your brain any more.

grammar isn't enough by JoeBuck · 2005-08-31 18:32 · Score: 4, Informative

The classic problem example is:

Time flies like an arrow.
Fruit flies like a banana.

There are other, similar examples. Computer systems tend to deduce either that there's a type of insect called "time flies", or that the latter sentence refers to the aerodynamic properties of fruit.

Re:grammar isn't enough by megrims · 2005-08-31 19:04 · Score: 0

Ha. No wonder computer systems have trouble deciphering that.

I also thought that it was referring to flying fruit.
(For about a moment, anyway.)

I wonder if there's any logical way to divine the correct meaning without having information beforehand which can be used to decide which is most likely.
Probably not.
Re:grammar isn't enough by Antiocheian · 2005-08-31 19:46 · Score: 1

This algorithm is not really based on grammar, but seemingly on a pool of data, statistically evaluated. While I can't find an application of your problem example (perhaps a right or wrong situation like syntax checking?), the specs of the algorithm would allow it to cope in this case. Provided that it has some appropriate data in its pools.

Of course until I see it working I won't believe. I don't think human learning and perception is just statistics.
Re:grammar isn't enough by arrrrg · 2005-08-31 20:05 · Score: 1

Sure, if these two examples are all your system has to work with... but then, what could you expect? With a decent corpus (database of example text), the system should be able to learn the two different senses of "like", as well as that "time flies" is generally noun-verb whereas "fruit flies" is a noun phrase.
Re:grammar isn't enough by Scarblac · 2005-08-31 20:26 · Score: 1

There's even the third meaning, which is that you should time flies like you would time an arrow, and not e.g. like you would time 100m runners.

Each of the first three words could be the verb in that sentence.
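You can see this ambiguity by brute force: give each word every part of speech it can take, then keep only the tag sequences that fit a simple sentence pattern. Both the lexicon and the one-line "grammar" below are hypothetical toys (N = noun, V = verb, P = preposition, DET = determiner), but they are enough to produce all three readings of the first sentence and both readings of the second.

```python
import re
from itertools import product

# Hypothetical toy lexicon: every part of speech each word is allowed to take.
LEXICON = {
    "time": ["N", "V"],
    "fruit": ["N"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "an": ["DET"],
    "a": ["DET"],
    "arrow": ["N"],
    "banana": ["N"],
}

# Toy sentence pattern: optional one- or two-noun subject, exactly one verb,
# an optional noun object, then either "P DET N" or "DET N".
GRAMMAR = re.compile(r"^(N( N)? )?V( N)?( P DET N| DET N)$")


def parses(sentence):
    """Return every tagging of the sentence that the toy grammar accepts."""
    words = sentence.lower().rstrip(".").split()
    results = []
    for tags in product(*(LEXICON[w] for w in words)):
        if GRAMMAR.match(" ".join(tags)):
            results.append(list(zip(words, tags)))
    return results


def verb_choices(sentence):
    """Which words end up as the verb in at least one accepted parse."""
    return {w for parse in parses(sentence) for w, tag in parse if tag == "V"}


for s in ("Time flies like an arrow.", "Fruit flies like a banana."):
    print(s, sorted(verb_choices(s)))
```

The syntax alone accepts every reading; choosing between them needs the kind of world knowledge discussed elsewhere in this thread.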

--
I believe posters are recognized by their sig. So I made one.
Re:grammar isn't enough by aziraphale · 2005-08-31 20:32 · Score: 1

And of course the computer is right to identify the ambiguity. If I give you the sentences in these two (hastily constructed) contexts, the 'incorrect' interpretations would both be correct:

'Buzzing around the Doctor were millions of tiny insects. "Time flies!" he shouted over the rising buzzing, "we must get out of here." "Doctor! Over here! There's a signpost!" "Keep away from it - it has an arrow on it. Time flies like an arrow. You'll be eaten alive!"'

'Having conducted aerodynamic tests with the trebuchet and a variety of fruit, the scientists concluded that variations based on type of fruit were minor, and the more detailed measurements could be taken by using a single fruit as the test subject and extrapolating. The banana was selected as the median fruit, since the tests had shown that, in a very general sense, all fruit flies like a banana.'

We just have a 'nonsense filter' that evaluates the possible interpretations for the two sentences and chooses the one that makes most sense in the current context.
Re:grammar isn't enough by pandrijeczko · 2005-08-31 22:37 · Score: 1

There are other, similar examples.
Don't tell me - we're about to go into the "swallows carrying coconuts" argument from Monty Python & The Holy Grail, aren't we?

--
Gentoo Linux - another day, another USE flag.
Re:grammar isn't enough by g2devi · 2005-09-01 00:05 · Score: 3, Interesting

Even better. The meaning of words can flip back and forth depending on the ever widening context.

* The clown threw a ball.

(Probably a tennis ball or basketball)

* The clown threw a ball,....for charity.

(Okay, sorry, a ball as in a party.)

* The clown threw a ball,....for charity...., and hit the target.

(Okay, sorry again, the tennis ball hit the dunking target and someone fell in the water. Got it. We're in a carnival.)

* The clown threw a ball,....for charity...., and hit the target....of 1 million dollars.

(Scratch that. It really is a charity party and we've collected 1 million in donations. There's no way the meaning can change again.)

* The clown threw a ball,....for charity...., and hit the target....of 1 million dollars....by striking out Babe Ruth.

(Oops again. The clown got 1 million dollars in pledges if he could strike out Babe Ruth, and he succeeded. We're talking about a baseball again. I give up.)
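One crude way to mimic this flip-flopping is to re-score the ambiguous word every time the context grows. The sketch below uses a deliberately simple, hypothetical heuristic: the sense whose cue word appeared most recently wins. The cue lists are invented for this example; a real system would learn sense indicators from a large sense-tagged corpus rather than a hand-written list.

```python
import re

# Hypothetical cue words for two senses of "ball".
CUES = {
    "thrown ball": {"threw", "hit", "target", "striking"},
    "dance party": {"charity", "dollars", "gala"},
}


def guess_sense(text, default="thrown ball"):
    """Return the sense whose cue word appeared most recently in the text so far."""
    words = re.findall(r"[a-z]+", text.lower())
    for word in reversed(words):
        for sense, cues in CUES.items():
            if word in cues:
                return sense
    return default


stages = [
    "The clown threw a ball",
    "The clown threw a ball for charity",
    "The clown threw a ball for charity and hit the target",
    "The clown threw a ball for charity and hit the target of 1 million dollars",
    "The clown threw a ball for charity and hit the target of 1 million dollars by striking out Babe Ruth",
]
for text in stages:
    print(guess_sense(text), "<-", text)
```

The heuristic happens to track the reader's guesses here only because the cue lists were picked for this example; it illustrates why meaning keeps shifting with context, not how to resolve it in general.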
Re:grammar isn't enough by anthony_dipierro · 2005-09-01 01:15 · Score: 1

Your example seems pretty easy to resolve. Compare http://en.wikipedia.org/wiki/Fruit_flies and http://en.wikipedia.org/wiki/Time_flies. It would be trivial to include a dictionary of this size and match on a "largest phrase" basis. You could in theory even build the dictionary automagically with a large enough corpus, by simply recognizing the number of times the two words appear together.

Of course, for the record, CMU's parser fails on this one. I thought for sure they'd pass, but apparently they don't have "fruit flies" in their dictionary.
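Both halves of that suggestion fit in a short Python sketch: count how often adjacent word pairs occur in a corpus, treat frequent pairs as fixed phrases, then segment new sentences by preferring the longest known phrase. The corpus and the `min_count` threshold are made up for illustration. Raw co-occurrence counts would also promote pairs like "of the" on real text, so practical collocation finders usually normalise the counts (for example with pointwise mutual information).

```python
from collections import Counter


def learn_phrases(corpus, min_count=2):
    """Treat any adjacent word pair seen at least min_count times as a fixed phrase."""
    pairs = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        pairs.update(zip(words, words[1:]))
    return {pair for pair, n in pairs.items() if n >= min_count}


def longest_match(sentence, phrases):
    """Greedy left-to-right segmentation that prefers a known two-word phrase over single words."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in phrases:
            out.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


# Hypothetical toy corpus.
corpus = [
    "fruit flies breed quickly",
    "researchers study fruit flies",
    "the fruit flies swarmed the bowl",
    "time passes slowly",
]
phrases = learn_phrases(corpus)
print(longest_match("fruit flies like a banana", phrases))
print(longest_match("time flies like an arrow", phrases))
```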
Re:grammar isn't enough by anthony_dipierro · 2005-09-01 01:28 · Score: 1

The second link is bad, of course. Should be http://en.wikipedia.org/wiki/Fruit_flies and http://en.wikipedia.org/wiki/Time_flies. But I just noticed, this whole matching of the longest phrases thing is probably something our brain does as well. Does anyone else have the song "Fly Like An Eagle" in their head now?
Re:grammar isn't enough by Anonymous Coward · 2005-09-01 02:02 · Score: 0

All I know is that every time I read these examples I wonder when computers will learn to wreck a nice beach.
Re:grammar isn't enough by argStyopa · 2005-09-01 02:06 · Score: 1

That's a great example, but what I don't understand is the expectation that there is some "shortcut" to learning language.

1) it's arguably the sole thing that really separates humans from animals (behavior of ostensibly language-capable New Orleans looters notwithstanding)

2) it takes YEARS for a nominally capable human infant to learn language (granted, they are learning a lot of OTHER stuff at the same time).

The example's pair of sentences are syntactically and contextually complex, despite being few words. Would even a (relatively) sophisticated kindergartner or 1st grader be able to parse them meaningfully and explain the real differences? Probably, but not without some thought.

I just see so many people laying down these language algorithms and expecting that they are "right" - that seems like a foolish expectation from the start. The human wetware (particularly as regards language) is massively parallel and probably by adulthood has redacted itself to a host of rules that are neither discrete nor efficient, but they work despite the redundancies and gaps. I don't see a simplistic formula ever accomplishing this.

In fact, I personally think that language parsing will only be successful when we're able to write some very simple baby-language rules, and 'hothouse' the development of a system with processing power equivalent to an infant brain long enough to equate with a similar time of the human development of language cognition.

--
-Styopa
Re:grammar isn't enough by underworld · 2005-09-01 03:16 · Score: 1

Interestingly enough, I've seen this same quote bastardized thus:

Truth flies like an arrow.
Fruit flies like a banana.
Re:grammar isn't enough by Anonymous Coward · 2005-09-01 03:25 · Score: 0

The Subject line says it all, and I think you left some currency on the table before you played your entire hand. See what I did there? Used a metaphor to say that your argument was forceful but incomplete. Computers are not capable of this kind of loose association--it requires something no algorithm can provide: STYLE. By style, I mean more than a mere sense of cool--it's more like "elocution" (Google the five canons of rhetoric). Computers need a discrete sense of personality in order to achieve more colorful (read: metaphoric) uses of language. They may be able to "fake it", but without some sort of cognitive filter, they'll never be able to actively interpret or create metaphors. They'll only be able to remember and repeat them, or churn out some vanilla "mad-lib" algorithm. <Vocative Verb> <Article> <Adjective> <Noun> Have a great day. Be the smart doughnut. See an orange cow.
Re:grammar isn't enough by Anonymous Coward · 2005-09-01 04:14 · Score: 0

You're completely missing the point. This is about language translation. Clearly, it's not going to take the place of humans in translating text that was not specifically crafted to be translated, such as books. However, if I know for a fact that I am talking to someone in China, and I am going to use a program to translate what I write, I will avoid using "Americanisms" or idioms, or even "crappy excuses for language that you call style".
Re:grammar isn't enough by RoadWarriorX · 2005-09-01 04:35 · Score: 1

Time flies like an arrow.

Computer systems tend to deduce either that there's a type of insect called "time flies"...

I doubt it, because that would mean your first sentence does not have a verb in it. I would think that such a system would pick up patterns like "subject verb object", which is a core pattern in many Latin-based languages. If so, the word "flies" would be a verb, thus providing a context of "aerodynamic properties".

--

Coderz 4 Life
Re:grammar isn't enough by Anonymous Coward · 2005-09-01 05:18 · Score: 0

If "time flies" is the noun in the first sentence, then "like" is the verb. This is directly analogous to "fruit flies" and "like" in the second sentence.
Re:grammar isn't enough by misterpies · 2005-09-01 05:20 · Score: 1

sources, please. I believe the time/fruit example is courtesy of Groucho Marx, so credit where it's due.

--
The author of this post asserts his moral rights.
Re:grammar isn't enough by Anonymous Coward · 2005-09-01 05:46 · Score: 0

Much real world knowledge, culture and context is needed for understanding.

"The problem is as large as an ant trying to survey America and draw a map of it using an Etch-o-Sketch."

"The fountains opened up like umbrellas"
Re:grammar isn't enough by Phroggy · 2005-09-01 06:33 · Score: 1

Don't tell me - we're about to go into the "swallows carrying coconuts" argument from Monty Python & The Holy Grail, aren't we?

Well, we weren't, but which type of swallow did you mean? Because, you know, African swallows don't migrate...

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:grammar isn't enough by aafiske · 2005-09-01 06:37 · Score: 1

Computer systems aren't the only things that think the latter sentence is about flying fruit... On first and second and third readings, I was chuckling at the thought of flying bananas. Because, I mean, bananas and other banana-shaped fruit do indeed fly like a banana when thrown. Or maybe I just have a habit of flinging bananas around.

I think my more on-topic point is that a computer that sometimes makes mistakes like this, but for the most part is doing okay is still a pretty big advance.
Re:grammar isn't enough by rkww · 2005-09-01 10:17 · Score: 1
And there's always
- Time flies? I can't; they're too fast.
Re:grammar isn't enough by megrims · 2005-09-01 14:12 · Score: 1

What the heck is with this moderation?
Overrated indeed.
Re:grammar isn't enough by al21 · 2005-09-01 14:13 · Score: 1

You are right about the lack of a verb. But 'Subject verb object' (SVO) is not a core property of Latin-based languages, or if it is, this is irrelevant, as English is not a 'Latin-based language' but a Germanic language. Latin itself, by the way, is verb-final, so SOV.

O(n^n^n...)????? by mosel-saar-ruwer · 2005-08-31 18:34 · Score: 3, Interesting

From TFA: The algorithm discovers the patterns by repeatedly aligning sentences and looking for overlapping parts.

If you take just a single string [of length n] and rotate it against itself in a search for matches, then you've got to do n^2 byte comparisons just to find all singleton matches, and then gosh only knows how many comparisons thereafter to find all contiguous stretches of matches.

But if you were to take some set of embedded strings, and rotate them against a second set of global strings [where, in a worst case scenario, the set of embedded strings would consist of the set of all substrings of the set of global strings], then you would need to perform a staggeringly large [for all intents and purposes, infinite] number of byte comparisons.

What did they do to shorten the total number of comparisons? [I've got some ideas of my own in that regard, but I'm curious as to their approach.]

PS: Many languages are read backwards, and I assume they re-oriented those languages before feeding them to the algorithm [it would be damned impressive if the algorithm could learn the forwards grammar by reading backwards].

Re:O(n^n^n...)????? by psmears · 2005-08-31 23:22 · Score: 3, Insightful

If you take just a single string [of length n] and rotate it against itself in a search for matches, then you've got to do n^2 byte comparisons just to find all singleton matches,...

No you don't :-)
If you want to find all singleton matches, it's enough to sort the string into ascending order (order n.log(n)), and then scan through for adjacent matches (order n). For example, sorting "the cat sat on the mat" gives "cat mat on sat the the"—where the two "the"s are now adjacent and so easily discovered.
For finding longer matches the sorting method still works, except that you sort fragments of the sentence rather than individual words. Clearly there is more work involved, but (depending on exactly what you're counting) there are still order n.log(n) comparisons to be performed.
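Both ideas can be sketched in Python. `repeated_words` is the sort-then-scan trick on whole words. `longest_repeat` sorts every suffix of the string and compares only neighbours in sorted order; whatever substring occurs twice must show up as a shared prefix of two adjacent suffixes. The suffix array here is built naively (each comparison may copy a long slice) purely for readability; proper constructions run in O(n log n) or even linear time.

```python
def repeated_words(text):
    """Sort the words (O(n log n)) so equal words become neighbours, then scan once (O(n))."""
    words = sorted(text.split())
    return sorted({a for a, b in zip(words, words[1:]) if a == b})


def longest_repeat(text):
    """Longest substring occurring at least twice, using a naively built suffix array."""
    suffixes = sorted(range(len(text)), key=lambda i: text[i:])
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        # Length of the common prefix of two suffixes that are adjacent in sorted order.
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        if k > len(best):
            best = text[a:a + k]
    return best


print(repeated_words("the cat sat on the mat"))
print(repr(longest_repeat("the cat sat on the mat")))
```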

This means that searching for substring matches can be performed relatively efficiently. I don't know about how the language-learning algorithm works, but you may be interested to know that the compression algorithm used by "bzip2" works in exactly this way (google for "Burrows-Wheeler transform" for more details!)
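For the curious, the transform itself is short. The sketch below sorts all rotations of the input (with an end marker, `$` here, assumed not to occur in the text) and keeps the last column; characters that precede similar contexts end up next to each other, which is what a later compression stage exploits. The inverse shown is the textbook quadratic-time reconstruction, not the efficient version a real compressor uses.

```python
def bwt(text, end="$"):
    """Burrows-Wheeler transform: sort all rotations of text+end and keep the last column."""
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, end="$"):
    """Naive inverse: repeatedly prepend the last column and re-sort to rebuild the rotation table."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the end marker.
    return next(row for row in table if row.endswith(end))[:-1]


encoded = bwt("banana")
print(encoded)               # annb$aa
print(inverse_bwt(encoded))  # banana
```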

--
Need to type accents and special characters in Windows? Use FrKeys
Re:O(n^n^n...)????? by volsung · 2005-09-01 03:22 · Score: 2, Interesting

Right-to-left languages (which I assume you mean as "backwards") are displayed that way to the user, but it does not affect their digital storage, which is still forwards (in the numerical offset sense).
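A quick Python check illustrates this: Unicode text is stored in logical (reading) order, and each character only carries a bidirectional category that the renderer uses when laying out the line. Index 0 of the Hebrew word below is its first letter as read, even though it is drawn rightmost on screen.

```python
import unicodedata

word = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew "shalom", stored in logical (reading) order

for index, ch in enumerate(word):
    # "R" is the Unicode bidirectional category for strong right-to-left characters.
    print(index, unicodedata.name(ch), unicodedata.bidirectional(ch))
```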
Re:O(n^n^n...)????? by Anonymous Coward · 2005-09-01 04:47 · Score: 0

Never take lay-person article explanations of technical algorithms or equations at face value. If the article mentioned that they were comparing unique identifiers of each scanned word in a red-black tree representing all encountered sentence representations, some people might not understand. I doubt they took the simplified/naive approach described in the article.
Re:O(n^n^n...)????? by Anonymous Coward · 2005-09-01 06:02 · Score: 0

There are a number of string matching algorithms. Some of them are used in searching for genes in the human genome. Refer to this book:
http://www.amazon.com/exec/obidos/tg/detail/-/0521585198/qid=1125597656/sr=8-1/ref=pd_bbs_1/104-8843605-0119140?v=glance&s=books&n=507846

Nothing novel to see here, move along by schestowitz · 2005-08-31 18:36 · Score: 1

This blog item does not mention Markov models at all, which, in my blunt opinion, proves that the author fails to grasp the real merit of the method.

The ability to look at sequences of words up to a certain depth (as much as brute force permits) could get you nice textures in graphics and a good flow of coherent text (SCIgen from MIT comes to mind).
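A word-level Markov chain of this kind is only a few lines of Python: record which word follows each sequence of `order` words, then generate by repeatedly sampling a follower of the last `order` words. This is a generic sketch of the technique, not the method from the article; the training text is a toy.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, start, length=10, rng=None):
    """Extend `start` (a tuple of `order` words) by sampling followers until stuck or done."""
    rng = rng or random.Random()
    out = list(start)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(start):]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)


chain = build_chain("the cat sat on the mat and the cat ran", order=2)
print(generate(chain, ("the", "cat"), length=8, rng=random.Random(0)))
```

A higher `order` gives more coherent but less novel output, since longer contexts have fewer distinct followers in the training text.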

--
My Linux - (L)ove (I)s (N)ever (U)tterly eXPensive

"Commander Blood" by Boronx · 2005-08-31 18:38 · Score: 0

"Commander Blood"

--
Play Command HQ online

Do-support, in brief by Anonymous Coward · 2005-08-31 18:43 · Score: 1, Informative

You're making several mistakes there:

First, you have to distinguish between what I'll call lexical verbs and auxiliary verbs. A lexical verb is the verb in the sentence that actually tells you what sort of action (or experience, or state, or whatever) the sentence is describing. An auxiliary verb is a verb that doesn't do that, but expresses some combination of information about modality (possibility, necessity, obligation), tense (past, future), aspect (progressive, perfect), negation and agreement.
In the sentence Do you like me?, like is the lexical verb, and do is the auxiliary.
The sentence Be you man or mouse? is archaic, and thus does not count as part of the data when one is analyzing the grammar of contemporary English.

Now, what's do-support? It's essentially that the grammar of contemporary English is such that the only verbs that can be negated with -n't, or inverted with the subject (e.g., to form a question), are auxiliaries. You don't negate the sentence You like John in contemporary English by saying *You like not John, nor form the yes-no question as *Like you not John? (The asterisks in front of the sentences are linguistese for "the following is not a grammatical sentence." As another note, in more archaic English, on the other hand, do-support did not exist, so that did use to be the normal way of forming the negation and the question.)

So, essentially, to form the negated sentence or the question, you need some auxiliary. If the basic declarative sentence already has one, you can just use that: from You will like John, you can form You won't like John or Will you like John? If the basic declarative doesn't have an auxiliary, then you need to use the auxiliary do in order to "support" the negation or question. In these sentences, the auxiliary do is otherwise a dummy word.
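The rule as stated can be written down directly. The Python below is a toy under heavy assumptions: the sentence arrives already split into subject and predicate, the auxiliary list is a hypothetical short one (plus forms of the copula be, which also inverts), negation uses full "not" to sidestep contractions like won't, and the dummy is always plain "do", so agreement (does he like John?) is ignored.

```python
# Hypothetical, incomplete list of English auxiliaries (plus forms of the copula "be").
AUXILIARIES = {
    "will", "would", "can", "could", "shall", "should", "may", "might", "must",
    "is", "are", "was", "were", "has", "have", "had", "do", "does", "did",
}


def yes_no_question(subject, predicate):
    """Invert an existing auxiliary with the subject; otherwise insert the dummy auxiliary 'do'."""
    first, *rest = predicate.split()
    if first in AUXILIARIES:
        return " ".join([first.capitalize(), subject, *rest]) + "?"
    return " ".join(["Do", subject, predicate]) + "?"


def negate(subject, predicate):
    """Place 'not' after an existing auxiliary; otherwise 'do' has to carry the negation."""
    first, *rest = predicate.split()
    if first in AUXILIARIES:
        return " ".join([subject, first, "not", *rest])
    return " ".join([subject, "do", "not", predicate])


print(yes_no_question("you", "will like John"))  # Will you like John?
print(yes_no_question("you", "like John"))       # Do you like John?
print(negate("You", "like John"))                # You do not like John
```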

Re:Do-support, in brief by fenodyree · 2005-08-31 19:28 · Score: 1

Ah, thank you, that makes sense. So "do-support" is another syntactic rule, another if-then statement in our minds. Now my question is: if there is no current theory that explains the existence of an arbitrary rule of syntax, what _do_ the current theories explain, given that almost all rules in languages are utterly arbitrary?

We can say, Earlier you educated me. but not Earlier you teached me. Why? Why is teach an exception? Is it because one is Latin and one is not? Doubtful, or there would be more exceptions. So why? And how do modern theories explain that rule?
Re:Do-support, in brief by aziraphale · 2005-08-31 20:14 · Score: 2

> We can say, Earlier you educated me. but not Earlier you teached me. Why?

We say 'earlier you taught me' instead. What is your point?

In terms of language evolution, the word 'taught' has the same relationship to 'teach' as 'wrought' has to 'wreak', and similar relationships to 'thought'-'think', 'brought'-'bring' and (less so) 'bought'-'buy'. The preterite form of each of these verbs is actually formed by a very similar linguistic rule to the one that forms 'educated' from 'educate' - the basic rule in Germanic languages being that you stick a dental plosive 't' or 'd' sound on the end of the verb (ignore how the words are spelled, as that's really an irrelevance to the evolution of the words in the first place - we're talking about sounds here). Once this form has been created, however, it can create an awkward sound at the end of the word - 'ct', 'ngd', 'nct', etc. Language users don't like awkward sounds, they change them, preserving the distinctiveness, but losing some of the closeness to the original word. Also bear in mind that 'ch' was not always the sound at the end of the word 'teach' - it was once a much harder sound.

Add to this general rule the tendency in Germanic languages for certain verbs ('strong verbs') to change their vowel sound in the past tense (cf: 'run'-'ran', 'sing'-'sang', etc.), and you can see roughly where 'taught' came from. It's not really an 'exception', just a very old word that's had time to be moulded into a more comfortable shape through usage.

When trying to reduce a living language to a syntax, you miss out on the richness imparted to languages by the conventions that they gather through continual usage. English has simple syntax rules - I can coin a new verb and use it in grammatical sentences without anybody having any doubt about what syntactic role it is playing - look at the rise of 'google' as a verb - nobody had to teach you the words 'googles', 'googled' and 'googling', but you would happily use them. But once words are accepted into the language and used, they move over time, sometimes not in the same direction as their near relatives (as 'teach' and 'taught'). To explain where these words come from you need to look at the syntax rules prevailing at the time the derivative word was coined, and the pressures and modifications the words have been subjected to since. This is exactly what we mean by a 'living language'.
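The split between a productive rule and a stored list of old, worn-down forms can be sketched in Python. This works on spelling rather than sound, the irregular table is a hypothetical handful of verbs mentioned above, and the regular rule ignores cases like consonant doubling (stop, stopped). It shows why a freshly coined verb like "google" gets the regular form automatically while "teach" has to be looked up.

```python
# Hypothetical, tiny table of old verbs whose past forms have drifted from the regular rule.
IRREGULAR_PAST = {
    "teach": "taught", "think": "thought", "bring": "brought",
    "buy": "bought", "run": "ran", "sing": "sang",
}


def past_tense(verb):
    """Look up a stored irregular form, otherwise apply the productive dental-suffix rule."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    if verb.endswith("y") and verb[-2:-1] not in "aeiou":
        return verb[:-1] + "ied"
    return verb + "ed"


for v in ("educate", "teach", "google", "copy", "play"):
    print(v, "->", past_tense(v))
```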
Re:Do-support, in brief by Zixia · 2005-09-01 00:33 · Score: 0

In terms of language evolution, the word 'taught' has the same relationship to 'teach' as 'wrought' has to 'wreak',

What, you mean 'none whatsoever'?

The past tense and past participle of 'wreak' is 'wreaked'.

Re:Isn't This the Universal Translator Idea by Anonymous Coward · 2005-08-31 18:48 · Score: 0

Just reverse the polarity, or ask engineering for more power. Duh!

The (non)significance of do-support by Estanislao Martínez · 2005-08-31 18:57 · Score: 1

The significance of this phenomenon in the greater scheme of automatic syntactic parsing?

None at all.

On the other hand, if one happens to care about the minute details of insane Chomskian syntactic theory, then it matters enormously, because insane Chomskian syntactic theorists are forbidden from just stating the rule as-is in their theory. I mean, it just doesn't sound impressive enough. They're supposed to derive it as some sort of complicated "theorem" from deep principles of Universal Grammar, and thus, to gloriously prove to the world that Plato was right about innate ideas.

Did I just say "gloriously prove to the world that Plato was right about innate ideas"? My apologies. I meant "gloriously prove that Chomsky is right about Plato being right about innate ideas."

(People reading this might guess that if you're simply trying to state the rules of the grammar of the language, without any ulterior Chomskian motives, you might not really think that do-support is an especially thorny problem...)

--

Are you adequate?

Ah, you don't know Chomsky. by Estanislao Martínez · 2005-08-31 19:00 · Score: 1

I don't see how this contradicts the innate ability to learn language theory that Chomsky put forth. Chomsky is repeatedly on the record saying that general-purpose statistical methods are not sufficient to learn a language.

--

Are you adequate?

Re:Ah, you don't know Chomsky. by lupin_sansei · 2005-08-31 19:10 · Score: 2, Insightful

Yeah, and this didn't learn the language in any meaningful sense. It just found a statistical pattern, and then generates possible sentences from that pattern. That's a whole lot different from you and me understanding the language and generating intentional, meaningful sentences.

--
http://www.perthonline.net

voynich by Anonymous Coward · 2005-08-31 19:05 · Score: 0

Let them try the Voynich manuscript and see what they can do; I doubt it...

It gets worse by Ogemaniac · 2005-08-31 19:09 · Score: 1

You will always have the problem of words with more than one meaning. Omoshiroi is a perfect example - how is the computer going to know whether to choose "funny" or "interesting" without knowing the context?

I have helped my Japanese colleagues write a number of papers in English. The number one mistake is always a/an/the. Why? Because Japanese does not even have this concept, making it difficult for them to understand. So what happens when a machine tries to translate Japanese into English? It must literally insert a/an/the in locations where there is no word in Japanese. How does it know which one? Context. This context can be extremely subtle, but make big differences in meaning. I often have to ask my Japanese colleagues about their research in order to decide whether a/an or the (or neither) is correct, because using any of these words provides information that they have not already included.

Is that really a big problem? by Anonymous Coward · 2005-08-31 19:16 · Score: 1, Interesting

I am not sure that is really that big a problem. With mobile text messaging, people have started changing their sentences into a form that can be understood by their phones' dictionaries.

Say, if I normally would have typed "stroll" to say "walk" and I would notice that when I press 787655 on my phone's keyboard, the T9 dictionary misunderstands me, I would just start typing 9255 for "walk" instead. I think the same would happen here. If somehow the person typing the messages would get instantaneous feedback from the system about a "commonly misunderstood" structure, he would quickly learn to avoid these structures while typing.
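The key sequences above check out against the standard phone keypad layout, and the same mapping shows why T9 has to guess: many words share one sequence. In the sketch below, `collisions` is a hypothetical helper that groups a word list by key sequence; a real T9 dictionary ranks the candidates for each sequence by frequency.

```python
from collections import defaultdict

# Standard phone keypad letter layout.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def t9_keys(word):
    """Digit sequence typed for a word in T9 (one key press per letter)."""
    return "".join(LETTER_TO_KEY[c] for c in word.lower())


def collisions(words):
    """Group words by key sequence and keep only the sequences that are ambiguous."""
    groups = defaultdict(list)
    for w in words:
        groups[t9_keys(w)].append(w)
    return {keys: ws for keys, ws in groups.items() if len(ws) > 1}


print(t9_keys("stroll"), t9_keys("walk"))           # 787655 9255
print(collisions(["home", "good", "gone", "walk"]))  # {'4663': ['home', 'good', 'gone']}
```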

On a related note, things like "fly like an arrow" are, in my opinion, the most difficult things to learn in a language, and thus foreign speakers do not use or know them. And still, "badly spoken English" can be comprehensible among the people speaking it. One thing I have noticed myself is that it is the British who have the most problems understanding a foreigner speaking English badly. Other foreigners would understand the same person just fine. Something to do with the way the brain is wired to expect certain words after others, I guess.

Of course, the problem is that we would get rid of all the things that make language "alive". But here I am typing a message in a language other than my own, and still many people can to some extent understand what I mean...

Re:Is that really a big problem? by caluml · 2005-08-31 19:49 · Score: 1

it is the british who have most problems understanding a foreigner speaking english badly
That's not what I notice. We're so used to hearing other nationalities trying to speak English that we do it automatically. Maybe it's just me....

--
Get your own free personal location tracker
Re:Is that really a big problem? by lisaparratt · 2005-08-31 21:31 · Score: 1

Yes, but us Brits have problems understanding foreigners who supposedly speak English as their first language.

If only people would speak English, rather than one of the bastardised variations >_<
Re:Is that really a big problem? by Anonymous Coward · 2005-09-01 01:19 · Score: 0

I can't confirm that the British have the most problems understanding badly spoken English, but studying as a German in California I was often surprised that my American counterparts had great difficulties with the pronunciation of other foreigners (including native English speakers with "weird accents"), while at the time I had no problems at all.

Works on music? by jswalter9 · 2005-08-31 19:20 · Score: 1

It would be great if songwriters used this tech for foundational musical and lyrical ideas. It seems like every piece of music I hear these days "strongly reminds me" of music I've heard before.

--
Retired from software... maybe. Sort of.

Re:Works on music? by Vintermann · 2005-08-31 19:48 · Score: 1

Recognize this one?

Under the spreading chestnut tree
I sold you and you sold me:
There lie they, and here lie we
Under the spreading chestnut tree

--
xkcd is not in the sudoers file. This incident will be reported.
Re:Works on music? by rolandog · 2005-08-31 21:21 · Score: 1

Well, I dunno. As you say, it could be used to predict whether two songs are related or not... At last we would know the real nature of "Ice Ice Baby" =D
Re:Works on music? by ChocoBean · 2005-09-01 03:52 · Score: 1

judging by the state of the pop music scene, I'd say we are getting closer to being able to make that wonderful song from Big Brother.
And sometimes musical tastes are habit formed anyway. IE: if you listen to a song you don't instantly hate often enough, perhaps because it sounds similar enough to others you keep hearing anyway, you grow to like it somewhat.

English only has two tenses. by ericbg05 · 2005-08-31 19:27 · Score: 5, Informative

I've done translation work before (Slovak -> English), and there's much more going on than differences in words and grammar. There are whole conceptual frameworks in languages that just don't translate, and this is frustrating for anyone learning a language, let alone trying to translate.

Yes! I'd have thrown a mod point at you just for this paragraph if I could.

English is very precise (when used as directed) in matters of time and sequence -- we have more than 20 verb tenses where most languages get away with three.

Not really. Firstly, English only has two or three tenses. (Depending upon which linguist you ask, English either has a past/non-past distinction or past/present/future distinctions. See [1], [2]. The general consensus seems to be in favor of the former, although I humbly disagree with the general consensus.) It maintains a variety of aspect distinctions (perfective vs imperfective, habitual vs continuous, nonprogressive vs progressive). See [3]. Its verbs also interact with modality, albeit slightly less strongly.

It's a very common mistake to count the combinations of tense, aspect, and modality in a language and arrive at some astronomical number of "tenses". It's an even more common mistake (for native English speakers, anyway) to think that English is special or different or strange compared to other languages. In most cases, it's not -- especially when compared with other Indo-European languages.

Secondly, and more interestingly IMHO, most languages do not have three distinct tenses. The most common cases are either to have a future/non-future distinction or a past/non-past distinction. In any case, the future tense, if it exists, is normally derived from modal or aspectual markers and is diachronically weak (which is linguist-babble meaning "future tense forms don't stick around for very long"). See [3].

English is a perfect example: will, of course, used to refer to the agent's desire (his or her will) to do something. Only recently has it shifted to have a more temporal sense, and it still maintains some of its modal flavor. In fact, the least marked way of making the future (in the US, at least) is to use either gonna or a present progressive form: I'm having dinner with my boss tonight. I'm gonna ask him for a raise. See Comrie [1] again.

So as not to be anglo-centric, I'll give another example. Spanish has three widespread means of forming the future tense. Two of these are periphrastic and are exemplified by he de cantar 'I've gotta sing' and voy a cantar 'I'm gonna sing'. The last is the synthetic form, cantaré 'I'll sing'.

Most high school or college Spanish teachers would tell you that the "pure" future is cantaré. Actually, it's historically derived from the phrase cantar he 'I have to sing' (from Latin cantáre habeo), and is being displaced by the other two forms all across the Spanish-speaking world. I'm told, for example, that cantaré has been largely lost in Argentina and southern Chile (see [4]).

In any case, the parent's main point still holds. It's a b?tch to deal with cross-linguistic differences in major semantic systems computationally. But good lord, it's fun to try. :)

References:

[1] Comrie, Bernard. Tense. Cambridge, UK: Cambridge University Press, 1985.
[2] Davidsen-Nielsen, Niels. "Has English a Future?" Acta Linguistica Hafniensia 21 (1987): 5-20.
[3] Frawley, William.

Re:English only has two tenses. by bhiestand · 2005-09-01 19:41 · Score: 1

If only I had more mod points and you weren't already at 5.

Excuse my non-linguistic-babble, but I'm sick as hell and not feeling technical. I don't know all the languages of the planet, but, of the ones I know, English seems to have the most adjectives. I could be wrong, but I've had a hell of a time trying to translate documents from English and think of words that carry the same meaning as words like "exquisite".

Speaking of tenses, though, Tagalog is a rather interesting language in this aspect. They mainly use syllable duplication, then use words basically equivalent to "already" and "still" for the aspect.

A simple set of examples:
Pumunta si Eric sa dagat. (Eric went to the beach)
Pumupunta si Eric sa dagat. (Eric goes to the beach).
Pupunta si Eric sa dagat. (Eric will go to the beach).
Pupunta na si Eric sa dagat bago alas 12 na. (Eric will have gone to the beach by the time it is noon already).
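The aspect pattern in those examples (pumunta / pumupunta / pupunta) can be sketched as simple string rules: the completed aspect infixes -um- into the root, the contemplated aspect reduplicates the first syllable, and the incompleted aspect does both. This is a minimal sketch assuming actor-focus -um- verbs whose root starts with either a vowel or a single consonant plus a vowel; loanwords with consonant clusters and the other verb classes (mag-, -in, and so on) are not handled, and the function names are made up for illustration.

```python
from typing import Dict

VOWELS = set("aeiou")


def first_syllable(root: str) -> str:
    """Return the syllable that gets reduplicated: 'pu' from 'punta', 'a' from 'alis'.

    Assumes the root starts with a vowel or with a single consonant plus a vowel.
    """
    return root[0] if root[0] in VOWELS else root[:2]


def infix_um(word: str) -> str:
    """Insert -um-: before an initial vowel (alis -> umalis), else after the first consonant (punta -> pumunta)."""
    if word[0] in VOWELS:
        return "um" + word
    return word[0] + "um" + word[1:]


def um_verb_aspects(root: str) -> Dict[str, str]:
    """Toy generator for the three aspects of an actor-focus -um- verb."""
    reduplicated = first_syllable(root) + root  # e.g. punta -> pupunta
    return {
        "completed": infix_um(root),            # e.g. pumunta   ('went')
        "incompleted": infix_um(reduplicated),  # e.g. pumupunta ('goes / is going')
        "contemplated": reduplicated,           # e.g. pupunta   ('will go')
    }


if __name__ == "__main__":
    for root in ("punta", "kain", "alis"):
        print(root, um_verb_aspects(root))
```

With "kain" (eat) the same rules give kumain, kumakain, kakain.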

Anyways, *I* find it interesting...

--
SWM seeks new sig for a brief fling

Uh Hmm... by red990033 · 2005-08-31 19:29 · Score: 1

I'd 1ik3 70 533 7h47 d4mn 41g0ri7hm w0rk 0n 7hiz 5hi7!

--
Do what I say, cuz I said it.
-Meatwad

Random test ... by Mostly+a+lurker · 2005-08-31 19:34 · Score: 5, Funny

I know it is fairly accurate because I have fooled my spanish speaking friends once in an IM conversation. I told them I learned spanish via hypnosis and basically just copy/pasted everything spanish into IM. The conversation went on for like 15 minutes full spanish before I told them I was using the website. They were pissing their pants.

English to German produces:

Ich weiß, dass es ziemlich genau ist, weil ich mein Spanisch getäuscht habe, Freunde einmal in einer IM Konversation zu sprechen. Ich habe sie erzählt, dass ich Spanisch über Hypnose und im Grunde nur Kopie gelernt habe/hat eingefügt alles Spanisch in IM. Die Konversation ist weitergegangen für wie 15 Minuten volles Spanisch, bevor ich sie erzählt habe, dass ich die Website benutzte. Sie pissten ihre Hose

Then, German to English:

I know that it rather exactly is, because I deceived my Spanish to speak friends once in one IN THE conversation. I told it, learned would have inserted that I Spanish over hypnosis and in the reason only copy all Spanish in IN THAT. The conversation is gone on for Spanish full like 15 minutes before I told it, that I the websites used. You pissten its pair of pants

My conclusion is that there is still a place for human translators.

Re:Random test ... by kerrle · 2005-08-31 19:39 · Score: 1

That's actually impressively correct in my opinion. I speak English and German, so I can see a bit more of how it got the results it did.
I've used the site before, and really, it is actually fairly readable most of the time - enough that I've used it to read foreign news fairly often.
Re:Random test ... by dunkelfalke · 2005-08-31 19:56 · Score: 2, Funny

in my opinion it is not impressive at all (i speak english and german myself). try translate.ru

english to german

Ich weiß, dass es ziemlich genau ist, weil ich meine spanischen sprechenden Freunde sobald in einem IM Gespräch zum Narren gehalten habe. Ich sagte ihnen, dass ich Spanisch über Hypnose lernte und grundsätzlich gerade alles Spanisch in IM kopieren/aufkleben. Das Gespräch ging seit ähnlichen 15 Minuten volles Spanisch weiter, bevor ich ihnen sagte, dass ich die Website verwendete. Sie waren pissing ihre Hosen.

the result back to english

I know that it is quite precise because I have held my Spanish speaking friends as soon as in one in the conversation to the fool. I said to them that I learned Spanish about hypnosis and basically just all Spanish in IN copy / stick. The conversation went on since similar 15 minutes of full Spanish, before I said to them that I used the website. They were pissing her trousers.

--
Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.
Re:Random test ... by Mostly+a+lurker · 2005-08-31 22:21 · Score: 1

My German is limited, but sufficient to recognise the ambiguities it failed to correctly resolve. As computer translations go, it was not terrible: especially the first sentence. However, much was still not only horribly incorrect but actually incomprehensible.
Until a computer translation can from the context correctly differentiate, for instance, between "Instant Messenger" and "IN THAT"; between "they" and "you"; and between "them" and "it", human translators will be indispensable for most serious purposes.
Re:Random test ... by aug24 · 2005-08-31 22:22 · Score: 1

Surely it's only fair to use Spanish, as the GP did:

I know is quite exact because I have deceived my Spaniard that speaks friends once in a conversation of IM. I said I learned him Spaniard way the hypnosis and every Spaniard basically barely copies/hit in IM. The conversation passed so that wants 15 full Spanish minutes before I said utilized them the website. They meaban its pants.

Still bollocks ;-)

Justin.

--
You're only jealous cos the little penguins are talking to me.
Re:Random test ... by Anonymous Coward · 2005-08-31 22:26 · Score: 0

Ok, try this:

1) Visit www.google.com

2) Click on "Language Tools"

3) Type "Britney's mom is very nice" in the Translate Text box.

4) Select "English to Spanish" in the combo below.

5) Press Translate and wait for the translation.

6) Now copy the translated text from the above text and paste it in the Translate text box below.

7) Select "Spanish to English" in the combo below.

HAR HAR HAR! (Mods: -1, Beavis & Butthead humour)
Re:Random test ... by idokus · 2005-08-31 22:36 · Score: 1

Considering it is automatically translated, it is quite cool: it is not correct, but you do get the general idea. There are some errors in both the first and the second translation, so it's not perfect, but close enough for informal translation.

The mistakes made here are less important than the mistakes in a certain Sony manual, which replaces random Dutch words with their German equivalents, rendering it unreadable for those who don't speak German. Those mistakes do occur less often than in this translation, though.

For a bigger audience, or an extremely impatient audience, I'd still get a human translator, but for private messages, it'll work just fine.
Re:Random test ... by Godwin+O'Hitler · 2005-08-31 23:11 · Score: 3, Interesting

I AM a professional human translator, and believe me, if a machine translation did even a half decent job of producing intelligible, natural text, I would use it to get a jump start and save a lot of time.

But as things stand, I'd spend more time knocking the bad translation into shape than if I translated the whole thing from scratch.

Translators are often asked to copy edit other translators' work (customers tend to call this "proofreading", presumably to devalue it and get it done on the cheap, but it involves much more than hunting typos). That's fair enough if you want a quality check. But some smart-arse people try sending machine translations for copy editing. And you can bet they get sent straight back!

--
No, your children are not the special ones. Nor are your pets.
Re:Random test ... by erkulikondrio · 2005-08-31 23:28 · Score: 1

"Britney's mom is very nice" --> "La mama de Britney es muy agradable"
The translation is correct, but perhaps 'agradable' is not the best translation for 'nice' in this context. The problem comes with the reverse translation:
"La mama de Britney es muy agradable" --> "The breast of Britney is very pleasant"
The reason for that behaviour is that Google translates words while ignoring accents. In Spanish, the diminutive for mother, "mamá", and one of the words for breast, "mama", are identical except for the accent. If you add the accent to mama, this is the result:
"La mamá de Britney es muy agradable"-->"The mother of Britney is very pleasant"
Here we can also see that Google can't correctly translate the Saxon genitive.
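The mamá/mama collision is easy to reproduce. The sketch below assumes, as the comment suggests but cannot confirm, that the translator looks words up after stripping accents. It uses Python's standard unicodedata module to decompose each character (NFD) and drop the combining marks; the two-entry lexicon is made up for illustration.

```python
import unicodedata
from typing import Dict, List


def strip_accents(text: str) -> str:
    """Remove diacritics: decompose to NFD, then drop combining marks (category 'Mn')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical mini-lexicon; only the accent distinguishes the two entries.
lexicon = {"mamá": "mother", "mama": "breast"}

# Index the lexicon by accent-stripped key, as an accent-blind lookup would.
folded: Dict[str, List[str]] = {}
for word, gloss in lexicon.items():
    folded.setdefault(strip_accents(word), []).append(gloss)

print(folded)  # {'mama': ['mother', 'breast']}
```

Once the keys collide, the lookup has to guess between the two glosses, which is the kind of error shown above.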

--

Let me apologize for my poor level of English...
Re:Random test ... by Anonymous Coward · 2005-09-01 01:41 · Score: 0

Sie pissten ihre Hose

Jawohl! :-)
Re:Random test ... by Pollardito · 2005-09-01 02:01 · Score: 1

i would expect a double translation to lose more fidelity than a single translation, so maybe it's not as bad going in one direction
Re:Random test ... by jcr · 2005-09-01 02:14 · Score: 1

It's interesting to see where the translator made its mistakes. I notice that the acronym "IM" was taken as the German word "im". Interesting also that it came out capitalized.

I wonder how this new algorithm would work with languages where the word order isn't that important.

-jcr

--
The only title of honor that a tyrant can grant is "Enemy of the State."
Re:Random test ... by Anonymous Coward · 2005-09-01 03:42 · Score: 0

I'm studying German now, and that's a decent loose translation. It can't catch nuances (like how "sagen", "to say/tell", would make more sense than "erzählen", "to tell/narrate", in a few cases), but that's because there's no equivalent in one language.
Re:Random test ... by ChaosDiscord · 2005-09-01 04:15 · Score: 1

The English to German to English looks just like a depressingly large number of instant messages and email messages I receive. So I guess it is good enough.

--
Search 2010 Gen Con events
Re:Random test ... by qeveren · 2005-09-01 05:04 · Score: 1

Seemed to do almost okay with the Litany Against Fear...
I should not fear. The fear is the mind-murderous one. The fear is the small-death that brings the obliteración total. I will face my fear. I will permit to pass over give me and by me. And when it has gone past, I will rotate the interior eye to see its path. Where the fear has gone there will not be anything. Only I will remain.
(That was English > Spanish > English)

--
Don't just stand there, get that other dog!
Re:Random test ... by smithmc · 2005-09-01 06:08 · Score: 1

My conclusion is that there is still a place for human translators.
Maybe Spanish is easier to translate than German?

--
Downmodding is the refuge of the weak. Don't downmod, make a better argument!
Re:Random test ... by CommandoB · 2005-09-01 06:30 · Score: 1

That's actually not too bad. I think much more was lost in the German -> English translation than English -> German.

What's particularly fun is to take sentences like these, and just go back and forth English -> German -> English -> German -> English ... to see if it ever converges on a final translation.

Often (in the case of Babelfish a while back) it does not, and instead just keeps inserting instances of the passive voice wherever it gets confused, until eventually, the English sentences are just filled with "to be to be to be"
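The back-and-forth experiment above amounts to iterating a function until it reaches a fixed point or falls into a cycle. A minimal sketch follows; no real translation API is assumed, so the forward and backward translators are hypothetical word-for-word dictionaries, deliberately lossy in the same way as the German example (both "they" and "you" map to "sie").

```python
from typing import Callable, Dict, List, Optional, Tuple

Translator = Callable[[str], str]


def word_for_word(table: Dict[str, str]) -> Translator:
    """Toy translator: map each word through `table`, leaving unknown words unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(w, w) for w in text.split())
    return translate


def round_trip(text: str, forward: Translator, backward: Translator,
               max_rounds: int = 20) -> Tuple[List[str], Optional[int]]:
    """Repeat backward(forward(text)) until a sentence recurs.

    Returns every distinct sentence seen, in order, plus the index of the first one
    that recurred (None if max_rounds ran out). If that index is the last entry,
    the process reached a fixed point; an earlier index means it is cycling.
    """
    history = [text]
    seen = {text: 0}
    for _ in range(max_rounds):
        text = backward(forward(text))
        if text in seen:
            return history, seen[text]
        seen[text] = len(history)
        history.append(text)
    return history, None


# Hypothetical dictionaries, not a real translator.
en_de = word_for_word({"they": "sie", "you": "sie", "told": "sagten", "them": "ihnen"})
de_en = word_for_word({"sie": "you", "sagten": "told", "ihnen": "them"})

history, repeat_at = round_trip("they told them", en_de, de_en)
print(history, repeat_at)  # ['they told them', 'you told them'] 1
```

Here the text settles after one round trip because the "they"/"you" distinction lost in the first pass cannot be recovered. A system whose output keeps drifting, as described above for Babelfish, would instead show the history growing until max_rounds runs out.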

--
Not that I post on slashdot or anything.
Re:Random test ... by xenocide2 · 2005-09-01 06:49 · Score: 1

That is a pretty shitty german to english translation; in particular it doesn't seem to recognize the difference between sie (plural third person) and sie formal. Usually one must derive this from the 'context,' which I'm told is "German for you're screwed."

The german doesn't seem half bad, through the lens of a guy who took four years of it in high school some time ago, given that the english source isn't grammatically correct.

--
I Browse at +4 Flamebait
Open Source Sysadmin
Re:Random test ... by Anonymous Coward · 2005-09-01 07:59 · Score: 0

The English-German is fairly decent. The German-English isn't - likely because everyone with a decent education in Germany already can speak and read conversational English as a requirement for graduation....

Necessity being the mother of invention.

No, you didn't by Spy+Hunter · 2005-08-31 19:44 · Score: 1

I played around with the Google translator for a while

No, you didn't. At least, not the one he's talking about. It only translates Arabic and Chinese to English so far AFAIK, and it's not available to the public. The one on their website is not theirs; it's licensed from Systran, same as every other internet translator. The new research one looks very impressive: in a comparison to old automatic translators, a sentence previously translated "alpine white new presence tape registered for coffee confirms laden" was correctly rendered as "the white house confirmed the existence of a new bin laden tape."

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}

And you say you're "Halfway fluent" ? by Ray+Alloc · 2005-08-31 19:54 · Score: 0

Yeah sure, you can't even understand immediately the "aitakatta" a shivering young japanese lady tells you, whereas it is probably one of the most frequent sentences you hear on dating Japanese...

Believe me, young man, the other "half way" to fluency is going to be extremely long, at that rate.

Same test results via Japanese and Babelfish by Ogemaniac · 2005-08-31 20:03 · Score: 1

English to Japanese and back:

I I a certain thing which one time my Spanish friend of IM conversation is deceived being, have known that that considerably is accurate. I Spanish called to IM to Spanish which is learned by I in those due to fair copy/pasted in hypnosis and the basis entirely. As for conversation the Spanish of 15 parts way I before I said to those, continued the fact that the web sight is used sufficiently because. They had upset their pants.

The Japanese version was also utterly incomprehensible. I can't post it here because of character issues.

About the only part the thing seems to have gotten close enough to understand deals with underpants.

Re:Same test results via Japanese and Babelfish by maxwell+demon · 2005-08-31 21:08 · Score: 1, Funny

Same through Lost in Translation:

I know that it is to something exact, due to half I a time the friends
the Spanish language in an argument in betrog. I they has said to that
Hypnose and copy/pasted Spanish ends in the right of the general in
some Spanish of instructd of the way inside inside. The argument is
stopped continued during the 15 Spaniards of minuteren completely,
before this one that you said that that I used the place of the Web
its trousers has the orinato.

And if you include Chinese, Japanese and Korean, you get:

The mine INSIDE converged my Spanish friend, preg-reprehending, is
certain that the material, knew, that it is considerable demands.
Academic the Spanish flame too much INSIDE with in buttock due to
copy/pasted also in the circumstance and the Spanish basic types of
the Hypnose is the complete company. That the Spanish ways if the
preoccupation of the 15 opposites some continues in me with the
buttock emit very famous like this, the place of the Web the situation
sufficient, ordered excess to be used because. They had given return
to his trousers.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Same test results via Japanese and Babelfish by arkanes · 2005-09-01 03:12 · Score: 1

What the hell kind of translation tool has "preg-reprehending" in its dictionary?

The development -- which has a patent pending by davro · 2005-08-31 20:13 · Score: 1

Algorithm for learning languages its not learning/feeling the free/open source language/movement.

Yes, it is going to take a long time by Ogemaniac · 2005-08-31 20:18 · Score: 1

I understood it right away, but I brought it up because of the mental gymnastics you have to go through to turn it into an English thought (which we talked about after I responded properly enough in Japanese).

If you want to quibble, I'd say I was closer to 1/3 of the way to fluency.

Re:Yes, it is going to take a long time by Ray+Alloc · 2005-09-01 10:30 · Score: 0

I'm sorry for you, but you're at most 1/30 of the way to fluency. Maybe even less than that. You're merely a beginner, and you still think in English, which is very bad if you aim at fluency.

No the didn't-Meat-Ball Computers. by Anonymous Coward · 2005-08-31 20:19 · Score: 0

"I barely could with my enormous meat-computer and a whole lot of knowledge of the language."

That's thinking with your gland!

Seriously maybe we could use this research on computer languages.

How about Dolphinese? by Tatarize · 2005-08-31 20:28 · Score: 2, Insightful

Klingon has simple grammar.

How about Dolphinese? Research shows that they seem to be able to scout and transfer information from one individual to his/her pod. If there's some grammar, it would be a pretty good nut to crack.

--

It is no longer uncommon to be uncommon.

Japanese? by qurk · 2005-08-31 20:34 · Score: 1

Japanese is markedly different from Chinese, other than some of the same letters :) Generally google or babelfish.altavista or anything has a hard time translating Japanese to English. I mean you get the idea, but the trolls on Slashdot would cut you to pieces in their meaningless, retarded, airheaded corrections if you spoke with 1/10 of the grammatical errors that the current Japanese>English engines put out. ANY translation is good though, especially like this algorithm in the article, as most Americans think that 26 letters is a lot. Chinese has how many thousands?

Re:Japanese? by sagenumen · 2005-08-31 23:04 · Score: 1

The thing is that the "letters" of which you speak are not actually letters. Each character in Chinese represents a word (so to speak) with combinations of characters that were created to have verbal representations of things as they were discovered or invented (e.g. "electric" + "brain" = "computer").

There *is* a sort of alphabet that is used to teach young children how to pronounce the characters, but it is not used once the characters are learned.

The last figure I heard was that the average college student in China knows ~5000 characters upon graduation. How accurate that is, I don't know.
Re:Japanese? by windowpain · 2005-09-01 00:52 · Score: 1

"but the trolls on Slashdot would cut you to pieces in their meaningless, retarded, airheaded corrections"

I've seen many corrections to poorly written articles and comments on Slashdot but I must have missed the "meaningless, retarded, [and] airheaded" corrections.

Grammar and spelling matter.

--
Insert witty sig here.

to their credit... by Garridan · 2005-08-31 20:38 · Score: 1

They aren't claiming to have learned the language. They've claimed that they can statistically analyze the grammar to the point that they can produce further text which could make sense in some context. There's a huge difference.

Does it work on Lance Armstrong? by Anonymous Coward · 2005-08-31 20:46 · Score: 0

someone should have told Lance: http://www.livestrong.org/ about this before he came up with "I Live Strong"

How did Chomsky get such influence? by CemeteryWall · 2005-08-31 20:46 · Score: 1

Having done a bit of phrase structured syntax analysis back in the 60s, and having learned from a paperback adulation of Chomsky what transformational grammars were, I thought "Is that all it is?"

From time to time I get to speak to people in the academic linguistics racket: professors, students etc. Unwittingly they impress on me the uselessness of linguistics over the past 30 years or so. But how did it get so influential? This would be an interesting topic for research.

Instead of hundreds of universities world-wide lining up students to be force-fed Chomsky grammars, why not let them gain some real research skills and find out how this transformational pixie dust grew into the academic industry it is today?

I started in computing, in 1967, I remember the promise "English to Chinese by next year".

Phrasinator by Hanno · 2005-08-31 20:47 · Score: 1

I wrote a program that made a statistical analysis of the party platform for the upcoming German national elections. The Phrasinator is able to write new nonsense texts based on the original material.

It's more satire than science, making fun of political blabla.

The idea is more than 20 years old and based on an old article by Brian Hayes, "A progress report on the fine art of turning literature into drivel".

This "new method" by Edelman and his colleagues sounds rather similar. I'm really curious what they did to improve it.
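Generators of this kind are usually built as a Markov chain: record which words follow each run of k words in the source, then walk those transitions at random. The sketch below is a word-level version with k = 2. It only illustrates the general idea; it is not the Phrasinator's code or the new method from the article, and the sample "platform" text is made up.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_chain(words: List[str], order: int = 2) -> Dict[Tuple[str, ...], List[str]]:
    """Map each run of `order` consecutive words to the words that follow it in the source."""
    chain: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def babble(words: List[str], order: int = 2, length: int = 30,
           seed: Optional[int] = None) -> str:
    """Generate up to `length` words by randomly walking the chain."""
    rng = random.Random(seed)
    chain = build_chain(words, order)
    if not chain:  # source too short to learn any transitions
        return " ".join(words)
    state = rng.choice(list(chain))  # random starting run of words
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # this run only occurs at the very end of the source
            break
        out.append(rng.choice(followers))  # repeated followers weight common continuations
        state = tuple(out[-order:])
    return " ".join(out)


source = ("we will lower taxes and we will raise spending and we will "
          "lower spending and we will raise taxes").split()
print(babble(source, order=2, length=20, seed=42))
```

Higher orders copy longer stretches of the source verbatim and read more fluently; lower orders produce stranger combinations.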

--

------------------
You may like my a cappella music

Im using it now!!! by Gentlewhisper · 2005-08-31 21:19 · Score: 0

Wow, its really great! Their still sum issues wif it though, like teh times when it corrects moi based on spelling as learnt from teh comments on /., but generally I think it will be a necceity in the future!

It really begs the question, why isn't every one using it now?

--
Online backup with Mozy, sounds like Ozzie, but more!

Hormones are not proteins? by drgonzo59 · 2005-08-31 21:51 · Score: 1

DNA will be expressed into proteins such as hormones. Pump yourself with testosterone and other such chemicals that control your brain chemistry and you'll want to reproduce with the nearest tree.

You might or might not be able to resist. If the chemicals control a specific organ, it might shut down or go into overdrive without you 'wanting' it, just by having a mutation in the DNA that will upregulate that particular chemical.

But ultimately you are right, the DNA can only create predispositions (to cancer, to heart disease etc...). The debate is then where does one end and the other begin? Can/should I kill because I am predisposed to violent behavior and thus not be found guilty because of it?

protein sequences : function/structure prediction by digitalderbs · 2005-08-31 21:53 · Score: 1

The article claims that the program can correlate protein sequence to function. I don't doubt that it can find small regions of contiguous amino-acid sequences that are common between a few proteins of the same function, but I highly doubt that it can predict function from a protein sequence. Predicting a protein structure, which is a prerequisite for studying function, is already a very difficult problem for computational biophysicists. For example, the CASP4 competition compares various programs that predict structure from an amino-acid sequence. Understanding function from a structure is even more difficult because it involves identifying the active site or functional regions as well as protein dynamics.

Comparative sequence searching, known as homology alignment, is not foolproof either. See the PSI-BLAST tool for homology alignments. This is a very difficult problem for biophysicists because of insertion mutations, functional mutations, and many other reasons. Two sequences with low homology may or may not have similar structures (folds) and/or function. Likewise, homologous sequences may have very different functions.

Protein structure prediction, which precedes function prediction, is already quite a difficult problem for biophysicists to tackle.
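For readers who haven't seen alignment scoring, the core idea can be sketched with the classic Needleman-Wunsch dynamic program for a global alignment. This is a deliberately simpler technique than PSI-BLAST, which builds position-specific profiles and searches for local alignments; the +1/-1/-2 scores below are arbitrary assumptions, not a real substitution matrix such as BLOSUM62.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: best score for aligning all of `a` against all of `b`."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diagonal,            # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,  # a[i-1] against a gap
                           dp[i][j - 1] + gap)  # b[j-1] against a gap
    return dp[-1][-1]


print(global_alignment_score("ACDEFG", "ACDEFG"))   # 6
print(global_alignment_score("ACDEFG", "ACDXEFG"))  # 4: one insertion costs a gap
```

As the comment says, a score like this only measures sequence similarity; by itself it says nothing about whether two proteins share a fold or a function.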

Wow! by ChrisZermatt · 2005-08-31 21:54 · Score: 0

Would you guys look at the quality of the spelling in these comments!

Thought I was in /. there for a second...

"Backwards??" by Anonymous Coward · 2005-08-31 22:16 · Score: 0

Many languages are read backwards

What languages are read backwards? I can't think of a single one. (I know of many that are read from right-to-left, and even some that are read from top-to-bottom, but those aren't "backwards".)

Re:"Backwards??" by Anonymous Coward · 2005-09-01 02:55 · Score: 0

-1, Cultural pedant.

I JUST got it. by Poromenos1 · 2005-08-31 22:31 · Score: 1

Seriously, I have been seeing this example for ages and I just realised that "flies" in the second sentence is a noun :( I was all like, "Fruit doesn't fly like a banana! Bananas don't fly!" What chance does a computer stand? :P

--
Send email from the afterlife! Write your e-will at Dead Man's Switch.

Re:I JUST got it. by TwistedSquare · 2005-09-01 01:14 · Score: 1

I agree. It took me ages to get as well, even when people were saying it aloud to me I never got the point. I think the problem is that we tolerate mistakes in humans understanding "sorry?" "pardon?" "No I didn't mean it like that" but we seem to expect machines to be perfect, even in nigh on impossible situations. One of my favourite quotes is from somebody questioning Babbage about his machine; "Tell me, if you give the machine the wrong inputs, does it still give the correct answer?". Ever since the birth of computing people have expected machines to be perfect. And as we all know, there is no such magic to them.
Re:I JUST got it. by Anonymous Coward · 2005-09-01 03:13 · Score: 0

Dude, researchers totally need to hire you to go up against their AI in a Turing test. They'd come out much better.

Mad Libs by LEPP · 2005-08-31 22:54 · Score: 1

This sounds suspiciously like Mad Libs. For those of you who don't remember Mad Libs, check this out.

Lepp

More Press Release Hooey by gvc · 2005-08-31 23:18 · Score: 1

I wish Slashdot would not parrot press releases that contain no information.

Grammatical inference is a hard problem. A couple of researchers, who appear to have legitimate credentials, have published a paper (that's what they do for a living) and acquired a patent (see any number of threads for how much novelty that may or may not require).

I would be much more impressed if there were at least some whiff of a scientific claim, like "after processing 1GB of English text for 10000 hours, the prototype implementation generated text that 51.7% of observers could not distinguish from a transcript of English spoken by a 6-year-old."

It is impossible to determine what, if any, contribution to natural language processing is included in this paper. Therefore the press release is not news.

If somebody wants to *read* the paper and tell us what it is about, I'll be all ears.

http://www.mt-archive.info/ by Anonymous Coward · 2005-08-31 23:22 · Score: 0

Automatic generation of language pairs has been done since the early 70's.

DNA Analysis by da5idnetlimit.com · 2005-08-31 23:28 · Score: 2, Interesting

Anyone else thinking about using the tech to learn something about "the grammar of DNA"?

If they can use it for analysing protein sequences, maybe they can tackle "the grammar of Life" and kickstart the whole bioengineering sector into a new life...

OTOH, fundamentalist Christians will probably denounce this as an evil thing...

--
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker

Oversimplification will put you back on trees by Anonymous Coward · 2005-08-31 23:48 · Score: 0

Just how much more simple and blank will your language be after you've learnt to speak so that computer translation programs understand you?
Imagination, gibberish, idioms and emotion-driven deviations from standard grammar make a language 'alive'. It was never for sharing pure information only. How do you render the several words for "I" from Japanese into English? How do you hear/how do you _feel_ about Southern American English (I know it's old, but... read Adventures of Huckleberry Finn)? How do you translate those into foreign languages? Are you sure you don't lose a thing when words with distorted spelling are translated into some foreign language?
Mother tongue is about subtlety and emotions.
Enter the human translation. ;-)

GIGO by hummassa · 2005-08-31 23:51 · Score: 1

(garbage in, garbage out)
have you noticed that the original is not in "correct" English?
I went to the site and tried with:

I know it's fairly accurate because I have fooled my Spanish-speaking friends once in an IM conversation. I told them that I learned Spanish via hypnosis and basically just copied and pasted everything Spanish into IM. The conversation went on fully in Spanish for approximately 15 minutes before I told them I was using the website. They were pissing in their pants.

and I had:

Sé es bastante exacto porque he engañado a mis amigos Hispanohablantes una vez en una conversación de IM. Dije ellos que aprendí español vía el hipnosis y todo español básicamente apenas copié y pegué en IM. La conversación pasó completamente en español para aproximadamente 15 minutos antes yo los dije utilizaba el sitio web. Ellos meaban en sus pantalones.

which is nice translation (some pronouns are missing, ok), and putting it back to English brought me:

I know is quite exact because I have deceived my Spanish-speaking friends once in a conversation of IM. I said they that I learned Spaniard way the hypnosis and every Spaniard basically barely I copied and I hit in IM. The conversation passed completely in spanish for approximately 15 minutes before I said utilized them the website. They meaban in its pants.

(what's up with it not knowing that "mear" is "to piss" and "Spaniard"???)

--
It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048

Re:GIGO by Al+Dimond · 2005-09-01 00:29 · Score: 1

The "Spaniard" part is because the word "español" when used as a noun can refer to anything that's Spanish.

The "mear" thing is weird; also I would think they woulda wanted to conjugate "mear" in the preterite rather than the imperfect (although knowing that really requires knowledge of the author's intention... perhaps a translator that asks questions to the user to find out stuff like that would be useful...). But I haven't spoken or studied much Spanish in the last four years or so, so I may be missing something.

The other thing is the "sus" at the end automatically going to "its" rather than trying to get some context on it. Obviously it reads some context to figure out the subject of English verbs, so in a simple sentence like that it may have been able to figure out that "sus" was "de ellos". Although that, also, does require knowledge of the intentions of the Spanish author.
Re:GIGO by aug24 · 2005-09-01 00:51 · Score: 1

Good point, not too bad in the end. A few too many Spaniards and not enough piss is hardly a problem ;-)

--
You're only jealous cos the little penguins are talking to me.
Re:GIGO by zakharin · 2005-09-01 06:23 · Score: 0

So does the word "Spanish." So why not use it?
Re:GIGO by rca66 · 2005-09-01 06:42 · Score: 1

(garbage in, garbage out) have you noticed that the original is not in "correct" English?

Very good point. I tried it out with Personal Translator (disclaimer: product from the German company Linguatec, which pays my salary...) into German:

Ich weiß, dass es ziemlich genau ist, weil ich meine spanischsprechenden Freunde hereingelegt habe, einmal in einem IM Gespräch. Ich sagte ihnen, dass ich Spanisch über Hypnose lernte und grundsätzlich gerade alles spanisch in IM kopierte und einklebte. Das Gespräch ging vollständig etwa 15 Minuten auf Spanisch weiter, bevor ich ihnen sagte, dass ich die Website verwendete. Sie pissten in ihrer Unterhose.

As a native German speaker I can assure you: from this text alone I could hardly tell it wasn't written by a German. It's not 100% perfect, but nearly.

Garbage in, ??? out? by Junior+J.+Junior+III · 2005-08-31 23:58 · Score: 1

So what happens if you feed this learning algorithm "bad" grammar? Consider the following scenarios:

You feed it a bunch of grammatically correct "Standard" English, and then make a mistake or two on a few sentences somewhere in the mix. Can the algorithm discern what is incorrect and what is an uncommon, but valid, construction?
You feed it a bunch of grammatically correct Olde or Middle English.
You feed it English from a nonstandard dialect such as Welsh or Black American English.
You feed it English that's been filtered through the shizzolator.

If the learning algorithm can handle all of these scenarios, I'll be impressed.

--
You see? You see? Your stupid minds! Stupid! Stupid!

Crashing their program by coinreturn · 2005-09-01 00:16 · Score: 1

It worked great except on teenage IM and SMS abbreviations, then the scientists got the blue screen of death. It seems teenage speak is more indecipherable than Klingon.

Of course in Japanese, there is nothing by Ogemaniac · 2005-09-01 00:21 · Score: 1

equivalent to "the". So if a computer sees "watashi wa hon wo katta" it has to figure out whether it means "I bought a book" or "I bought the book" based on whether the particular book that was bought had already been established to the listener, or whether there was only one book in the world. Actually, native Japanese would usually drop the "watashi wa" (i.e., I) and therefore the computer would have to guess the subject from context, too.

Re:Of course in Japanese, there is nothing by plumby · 2005-09-01 02:22 · Score: 1

The thing is, my understanding of what it's doing suggests that it probably doesn't care at all whether it's "the book" or "a book" or even that a book is involved at all.

It cares that the word "wa" often appears after the word "watashi" at the start of sentences (presumably - I don't speak any Japanese), so when generating sentences, it will often put these together.
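The statistic plumby describes is essentially a bigram count: how often one word immediately follows another in the sample text. A minimal sketch in C, assuming romanized, whitespace-separated input; `count_bigram` is a hypothetical helper, and this only illustrates the intuition, not the researchers' own algorithm.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts how often word `second` immediately
   follows word `first` in a whitespace-separated corpus.
   A plain bigram count, nothing more. */
int count_bigram(const char *corpus, const char *first, const char *second)
{
    char buf[4096];
    char *prev = NULL;
    int count = 0;

    assert(strlen(corpus) < sizeof buf);  /* keep the sketch simple */
    strcpy(buf, corpus);                  /* strtok modifies its input */

    for (char *tok = strtok(buf, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (prev != NULL && strcmp(prev, first) == 0 &&
            strcmp(tok, second) == 0)
            count++;
        prev = tok;  /* still valid: strtok only writes '\0' terminators */
    }
    return count;
}
```

On the toy corpus "watashi wa hon wo katta watashi wa sushi wo tabeta", `count_bigram(corpus, "watashi", "wa")` returns 2, while the reverse pair "wa watashi" never occurs. A generator that prefers high-count pairs will keep putting "wa" after "watashi" without knowing what either word means.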

Apply it to SETI data... by Circlotron · 2005-09-01 00:25 · Score: 1

...and see if it finds anything quicker than what they are using right now.

Long Chinko(joke) by oddmake · 2005-09-01 00:47 · Score: 1

>As for my chinko, that's a long
Oh, you have a long chinko, don't you?
Envy of Japanese man would be everywhere!

Grammar on teh internets! by Aphoric · 2005-09-01 00:54 · Score: 1

and what part of speech is "develloped?" I would hazard a guess that most people do not use proper grammar in daily speech, and even fewer do on the internet. This is also true for foreign languages. In my experience, if you learn proper grammar, there will still be difficulty communicating with native speakers until you learn all the improper uses and bad habits that are common among native speakers.

--
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf.

Nonsense by jandersen · 2005-09-01 00:57 · Score: 1

This has to be a hoax - or written by somebody who hasn't had a lot of exposure to languages. Try to dip into a few books about the subject - languages and grammars are much weirder than what you'd think. Translating from Chinese to English is fairly straightforward in that context, and even then there are many examples of things that don't translate easily.

But have a look at, e.g., a language called Piraha; here's a link to what Daniel L. Everett has to say: (http://lings.ln.man.ac.uk/Info/staff/DE/DEHome.html)

Or read something about Papuan languages (spoken in Papua New Guinea) - there are some that are seriously different.

The real test of this is compression by Baldrson · 2005-09-01 01:04 · Score: 1

People are arguing about the significance of this grammar generator but they don't have a metric to compare it to anything else.

That metric is compression.

If there were funding for the C-Prize these guys might have walked away with a large chunk of it but then they might not have been able to acquire the monopoly rights they're pursuing via the patent application. The C-Prize description follows:

Since all technology prize awards are geared toward solving crucial problems, the most crucial technology prize award of them all would be one that solves the rest of them:

The C-Prize -- A prize that solves the artificial intelligence problem.

The C-Prize award criterion is as follows:

Let anyone submit a program that produces, with no inputs, one of the major natural language corpora as output.

S = size of uncompressed corpus
P = size of program outputting the uncompressed corpus
R = S/P (the compression ratio).

Award monies in a manner similar to the M-Prize:

Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at noon GMT on day of new record
Winner receives: $Z * (X/(R0+X))
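The award rule can be written out directly. A sketch in C, assuming sizes and ratios are plain doubles; `compression_ratio` and `cprize_payout` are hypothetical names. Since R1 = R0 + X, the winner's share X/(R0+X) is the same as (R1 - R0)/R1, i.e. the fraction of the new record that is improvement.

```c
#include <assert.h>

/* R = S / P: corpus size divided by the size of the program
   that regenerates it (both in bytes). */
double compression_ratio(double corpus_bytes, double program_bytes)
{
    assert(program_bytes > 0.0);
    return corpus_bytes / program_bytes;
}

/* Hypothetical payout helper following the rule above:
   previous record R0, new record R1 = R0 + X, fund Z.
   Winner receives Z * X / (R0 + X), i.e. Z * (R1 - R0) / R1. */
double cprize_payout(double fund, double old_ratio, double new_ratio)
{
    assert(old_ratio > 0.0 && new_ratio > old_ratio);
    double x = new_ratio - old_ratio;
    return fund * (x / (old_ratio + x));
}
```

For example, a 1,000,000-byte corpus regenerated by a 200,000-byte program gives R = 5. If the previous record was 4 and the fund holds $10,000, the winner receives 10,000 × 1/5 = $2,000, and the remaining $8,000 stays in the fund for the next record.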

Compression program and decompression program are made open source.

Explanation

A very severe meta-problem with artificial intelligence is the question of how one can define the quality of an artificial intelligence.

Fortunately there is an objective technique for ranking the quality of artificial intelligence:

Kolmogorov Complexity

Kolmogorov Complexity is a mathematically precise formulation of Ockham's Razor, which basically just says "Don't over-simplify or over-complicate things." More formally, the Kolmogorov Complexity of a given bit string is the minimum size of a Turing machine program required to output, with no inputs, the given bit string.

Any set of programs which purport to be the standards of artificial intelligence can be compared by simply comparing their Artificial Intelligence Quality. Their AIQs can be precisely measured as follows:

Take an arbitrarily large corpus of writings sampled from the world wide web. This corpus will establish the equivalent of an IQ test. Give the AIs the task of compressing this corpus into the smallest representation. This representation must be a program that, taking no outside inputs, produces the exact sample it compressed. The AIQ of an AI is simply the ratio of the size of the uncompressed writings to the size of the program that, when executed, produces the uncompressed writings.

In other words, the AIQ is the compression ratio achieved by the AI on the AIQ test.

The reason this works as an AI quality test is that compression requires predictive modeling. If you can predict what someone is going to say, you have modeled their mental processes and by inference have a superset of their mental faculties.

Mechanics

The C-Prize is to be modeled after the Methuselah Mouse Prize, or M-Prize, where people make pledges of money to the prize fund. If you would like to help with the setup and/or administration of this prize award similar to the M-Prize, let me know by email.

--
Seastead this.

In Hungarian it's the same... by Danuvius · 2005-09-01 01:30 · Score: 1

For example, I was out with a Japanese woman the other night, and she said "aitakatta". Literally translated, this means "wanted to meet". Translated into native English, it means "I really wanted to see you tonight".

I guess that would be an idiom? Coincidentally Hungarian has the same one: "Találkozni akartam." (To-meet I-wanted. [reversing the word order is no problem, but would emphasize the "I-wanted" as opposed to "I-did-not-want" part--whereas the current word order emphasizes the "to-meet" as opposed to "to-jump-into-bed" part]). So if you translated it from Japanese to Hungarian literally, it would still make sense quite the same way.

I guess a big part of successfully translating and interpreting languages has to do with knowing the idioms in both the source and the target languages.

Given the considerable semantic ambiguities involved, I think it may take quite some time before we have capable machine translation software... not to mention interpretation, which is more difficult still. (Think incomplete words, redundant 'like's, grammatically non-question intonation-based questions, etc.)

--
Want a Hungarian Gentoo forum? Then

Not that new by Martin+Spamer · 2005-09-01 01:30 · Score: 1

Apparently nothing NEW here; all they've described is a straightforward abstract syntax tree. These were taught in undergrad CS courses 15 years ago.

IAA Linguist... by INT+21h · 2005-09-01 01:41 · Score: 1

And my worry is that these hundreds (linguistics-departments aren't exactly the golden goose) of students and academics spend their time wanking over abstractions instead of finding out what is in fact possible and not possible in language, by studying specimens. Or in English: go forth and study unstudied languages to see what features they have and what features they don't have, in order to later build abstractions that aren't blown out of the water each time an actual field-linguist comes back from the bush with a "... but this language does it this way and your framework can't account for it".

The fact-collectors haven't finished their part of the job, and the abstraction-lovers are actively hindering the fact-collectors from doing their job, as (per Chomsky) all languages are equally complex, therefore all languages can be studied by only studying English...

Lexicalizing by tepples · 2005-09-01 01:42 · Score: 1

A kid will always hear "Are you hungry" but never "Am you hungry" or "Are he hungry".

A child hears "Aryoo hungry" or "Izzy hungry" and lexicalizes "Aryoo" and "Izzy". Later stages of language acquisition separate out the sound patterns conventionally denoted as "are", "you", "is", and "he", in "Are you hungry" and "Is 'e hungry".

Can it parse pre-computer h4x0r? by Ranger · 2005-09-01 01:43 · Score: 1

"MR ducks."

"MR not."

"MR2."

"MR not."

"CM wangs."

"LIBMR ducks."

--
"You'll get nothing, and you'll like it!"

Paragraph of life by roberto0 · 2005-09-01 01:44 · Score: 1

Unfortunately, we've been hacking away at this problem for quite some time now. (Learning the Grammar of DNA). We're actually quite good at understanding the grammar. It's decoding the meaning of the individual elements that seems to be the hard part.

Biologists have been, for about 10 years now, very very good at decoding the grammar of raw DNA sequences. The crux of the problem is figuring out how exactly those gene products function in the body, what elements of their sequence/structure cause them to behave in such a way, and how much does each element contribute to the overall picture.

It's one thing to be able to put together a well-formed sentence, but another thing entirely to make that sentence communicate something worthwhile, as part of a greater whole.

Indeed, biologists are now trying to put together the "paragraphs" and "chapters" of life, rather than the sentences.

--
Those who can, do. Those who can't, simulate.

two-way street by mbius · 2005-09-01 01:50 · Score: 3, Funny

It works the other way too:

"I'm leaving you." What? "I'm leaving you, Alice." I don't understand what you're trying to do. "I've met someone." What do you mean 'met'? "Look...just read the pamphlet." I don't have the pamphlet. "I have to go." Which way do you want to go? "Uh...west." You would need a machete to head further west.

I can't tell you how many of my break-ups have ended with needing a machete.

--
you can have my violent video games when you pry them from my cold, dead hands.
Prime UID Club

I do not think that word means what you think... by bubbaD · 2005-09-01 01:55 · Score: 1

it means! -"Chomskyan"
Seriously, though, I'm not terribly familiar with linguistics, although I've enjoyed Steven Pinker's books, and apparently he is a "Chomskyan." It seems to me like a controversy for which evidence from neurologists and neurobiologists is very helpful. Haven't read Pinker?
http://www.amazon.com/exec/obidos/tg/detail/-/0670031518/qid=1125582276/sr=8-2/ref=pd_bbs_2/102-0573459-7462505?v=glance&s=books&n=507846

obligitory penis joke by Anonymous Coward · 2005-09-01 02:02 · Score: 0

I have access to my PNAS but I've been warned before for playing with it at work while looking at websites.

New algorithm understands slashdot comments! by mrogers · 2005-09-01 02:02 · Score: 1

I trained my gibberish generator, er, statistical linguistic analysis tool Sprong on the text of this discussion and here's what it came out with:

on http www computer and the speaking and on it human into some or examplete altern as so it are of a going is not a google of a finding on in learning as and pare to so the learning inten it computern learning is pare alter relation in the formation http www east are a simplete the first score senter and produce into shold the what human does the alter and of language language what have i was score net the cal to alter relation in the which intern a probable cal algorite the programmar from language to the the the which is it the programmar do with and as the to beneath you was befor a mean does structive befor altern as score a from language working the pare alter the working in the a from language stand a man the for from a

In the natural language processing business they call this "the same level of understanding as a two-year-old child". ;-)

Re:New algorithm understands slashdot comments! by TapeCutter · 2005-09-01 02:31 · Score: 2, Funny

In the natural language processing business they call this "the same level of understanding as a two-year-old child".

Can you teach it to take a breath?

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:New algorithm understands slashdot comments! by mrogers · 2005-09-01 02:46 · Score: 1

if (drand48() < 0.1) putchar (',');
Re:New algorithm understands slashdot comments! by 2short · 2005-09-01 11:03 · Score: 1

"In the natural language processing business they call this 'the same level of understanding as a two-year-old child'"

Not if they know what they are talking about. That would be very impressive.

At my daughter's 2-year-old checkup, she was feeling shy. The doctor wanted her to say something, so he could confirm she was capable of putting two words together meaningfully. After some coaxing she said, "Dad, I don't want to be at the doctor! I want to go home. This is poopy!"

It is worrisome if a child cannot combine multiple words together in meaningful combinations on their second birthday. Your gibberish generator does not appear to be anywhere close. Heck, my 11 month old can understand and answer questions like "Do you want to eat more or go nigh-nigh?"
Re:New algorithm understands slashdot comments! by mrogers · 2005-09-01 23:04 · Score: 1

I'm not claiming that my gibberish generator has any level of competence - in fact it doesn't consider word order at all - I was just poking fun at some of the claims made by a company I used to work for.
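A generator that ignores word order entirely can be sketched as a unigram sampler: every output word is drawn independently from the training tokens, so common words dominate but no sequence is ever learned. This is a hypothetical reconstruction, not mrogers' actual Sprong code; `unigram_babble` is an invented name, and the comma rule reuses the `drand48` one-liner from earlier in the thread (POSIX, not ISO C).

```c
#define _XOPEN_SOURCE 600   /* for drand48/srand48 on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical order-free ("bag of words") babbler. Each word is picked
   independently from `tokens`; repeated tokens are picked more often,
   so frequencies are roughly preserved but word order is not modelled.
   A comma follows roughly 10% of words, as in the one-liner. */
void unigram_babble(const char *tokens[], size_t ntokens, size_t nwords,
                    char *out, size_t outsize)
{
    assert(ntokens > 0 && outsize > 0);
    out[0] = '\0';
    for (size_t i = 0; i < nwords; i++) {
        /* drand48() is in [0, 1), so the index stays below ntokens */
        const char *w = tokens[(size_t)(drand48() * ntokens)];
        /* Stop rather than overflow: space + word + comma + '\0' */
        if (strlen(out) + strlen(w) + 3 > outsize)
            break;
        if (i > 0)
            strcat(out, " ");
        strcat(out, w);
        if (drand48() < 0.1)
            strcat(out, ",");
    }
}
```

Seeding with `srand48` makes runs repeatable. Because consecutive picks are independent, the output keeps roughly the right word frequencies and none of the structure, which is why it reads like the sample posted above.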

Only needing first and last letters. by SeanDuggan · 2005-09-01 02:03 · Score: 1

Och... wish I had the time to find the proper citation, but that little tidbit has been more or less disproven. Basically, the article often bandied about scrambles just the right letters to keep it still readable.

The qsoietun rinmeas, wulod stmhnieg lkie tihs slitl be rdlaebae?
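The effect being debated is easy to reproduce: keep each word's first and last letters and shuffle the ones in between. A sketch in C, assuming ASCII text where a "word" is a run of letters; `scramble_interior` is a hypothetical name, and the shuffle is a standard Fisher-Yates pass driven by `rand()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Shuffle the letters of s[start+1 .. end-2] in place (Fisher-Yates),
   leaving the word's first and last letters where they are. */
static void shuffle_interior(char *s, size_t start, size_t end)
{
    if (end - start < 4)
        return;  /* fewer than two interior letters: nothing to shuffle */
    for (size_t i = end - 2; i > start + 1; i--) {
        /* pick j uniformly from [start+1, i] */
        size_t j = start + 1 + (size_t)rand() % (i - start);
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}

/* Scramble every run of letters in `text`; punctuation and spaces stay put. */
void scramble_interior(char *text)
{
    assert(text != NULL);
    size_t n = strlen(text);
    size_t i = 0;
    while (i < n) {
        if (!isalpha((unsigned char)text[i])) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < n && isalpha((unsigned char)text[i]))
            i++;
        shuffle_interior(text, start, i);
    }
}
```

Running it on "The question remains, would something like this still be readable?" produces text in the style of the line above. Short words such as "The" and "be" come through unchanged, which is part of why such samples stay readable.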

--
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.

Re:Only needing first and last letters. by arkanes · 2005-09-01 03:02 · Score: 1

Yes, because humans are amazingly capable of inferring meaning from contextual clues. In fact, it's how pretty much all of our senses work.
Re:Only needing first and last letters. by Anonymous Coward · 2005-09-01 09:12 · Score: 0

Yes it is quite readable still
Re:Only needing first and last letters. by coopex · 2005-09-02 02:18 · Score: 1

You misspelled stmohnieg.

--
The road to hell is paved with good intentions.

Hip Hop Interpretation by IceSabre · 2005-09-01 02:14 · Score: 1

I hooked up a collection of hip hop CDs to the program to try to figure out what the heck they are trying to say and left it running overnight. I came back in the morning to find my computer with gold chains hanging off the cd drive, the case cover slipped down so I could see the edge of the hard drive and when it booted it called me "beyatch".

Grammar eh by ChocoBean · 2005-09-01 03:33 · Score: 1

sure this sounds like a good thing for recognizing patterns and heck i'll even give it credit for being a possible new Grammar Checker... But generating new, useful and meaningful content? I think not. For one thing, anything that it can learn from past text is not wholly new, and the material generated will still need *real* human beings to make sense of it, interpret the implications of the data, and decide what to do with it. How is that different from a million monkeys typing on a million typewriters? You still need someone to read all those potential texts of Hamlet before you find one that's useful. And as to meaningful, all I can say is that colorless green ideas sleep furiously.

Somehow I doubt that this is a panacea... by Leadhyena · 2005-09-01 03:43 · Score: 1

Issues of this translation dictionary finder in order:

How does this algorithm handle exceptions to the normal word patterns from sentences that don't exhibit the pattern? For example, if not for the word receive one wouldn't know that there are exceptions to the i before e rule.
This will create a translation based upon a supposedly authoritative source. Many times in language translation there is an argument as to how things are translated. How are multiple authorities handled?
Along with the other posts I read I doubt the efficacy of the algorithm in both time and space measurements. Unless there are some serious restrictions, I doubt the problem is even tractable.
Finally, if this code were that good it could be used to crack any code that isn't a one-time pad system from enough consecutive examples of ciphertext to plaintext. For example, it could be used to find the private key of a 2-key system by encrypting everything you could using a public key and finding the reverse translation once the dictionary has been deciphered.
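The first question, how a rule learned from the majority pattern copes with exceptions, can be illustrated with the spelling rule itself. A sketch in C, assuming lowercase ASCII words; `breaks_ie_rule` is a hypothetical helper. Under the full rule, "receive" is the "except after c" case and passes; words such as "weird" and "science" are the ones the rule gets wrong.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a lowercase ASCII word breaks the spelling rule
   "i before e, except after c", 0 otherwise. */
int breaks_ie_rule(const char *word)
{
    assert(word != NULL);
    size_t n = strlen(word);
    for (size_t i = 0; i + 1 < n; i++) {
        int after_c = (i > 0 && word[i - 1] == 'c');
        if (word[i] == 'e' && word[i + 1] == 'i' && !after_c)
            return 1;  /* "ei" not after c, e.g. "weird" */
        if (word[i] == 'i' && word[i + 1] == 'e' && after_c)
            return 1;  /* "ie" after c, e.g. "science" */
    }
    return 0;
}
```

A learner that only sees "believe", "receive" and "ceiling" would induce the rule and then mis-spell "weird"; it needs either counterexamples in the sample data or a way to store per-word exceptions alongside the general pattern.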

There are not enough published details to make a judgement on this one way or the other. It's a shame the algorithm is patented; otherwise peer review would take place and possibly find some holes or some other interesting uses of this seemingly black-box algorithm.

Just an FYI about child language learning by Acy+James+Stapp · 2005-09-01 03:52 · Score: 1

Children learn from hearing correct utterances. Whether their utterances are corrected or not makes essentially no difference in their learning.

--
-- Too lazy to get a lower UID.

The problem with Chomsky isn't Chomsky by Anonymous Coward · 2005-09-01 04:08 · Score: 0

When and whether modern views of syntax are going to advance has little to do with Chomsky being alive or dead. It's not the person Chomsky that has much relevance, here, it's the Chomskyan paradigm.
Typically in science, any old successful paradigm will die hard, any new paradigm has a hard time becoming mainstream. That's a general fact, and would not be any different, if Chomsky had died 20 years ago.
Chomsky did indeed found the new paradigm, but he really can't be blamed in person for stagnation in linguistics. Don't expect a scientist (any more than any other human being) to radically change their own views in old age.
BTW, if you ask me, just why Chomsky was so influential - that may or may not have much to do with his theory per se. More important, I think, is that the question "Hey, let's try to find stuff that's common to all languages" triggered a lot of (novel and exciting) research.

This only makes sense if... by Nom+du+Keyboard · 2005-09-01 04:10 · Score: 1

This only makes sense if you write -- or scan -- good grammar to start with.

Otherwise, GIGO.

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."

Can it be used on Etruscan? by skintigh2 · 2005-09-01 04:16 · Score: 1

There are hundreds of thousands of Etruscan artifacts with writing on them that pre-date the Greeks and Romans, but nobody has ever been able to translate them. They basically used (invented?) the modern alphabet, but used no spacing and they may have written right to left. I wonder if this software could work on it...

Re:Can it be used on Etruscan? by cnerd2025 · 2005-09-01 05:28 · Score: 1

Actually, the Etruscans did not invent the modern alphabet. If we were being technical, the Phoenicians invented the modern alphabet, gave it to Greece, and then Alexander spread it throughout the world. Of course, the Etruscans were one of the civilizations prior to Rome, but were by no means the only. The Latins inhabited the Italian Peninsula, and the Etruscans, who had migrated presumably from Greece, formed their northern border. The last three Kings of Rome in the Monarchy were actually Etruscan. The last, Tarquinius Superbus, was overthrown because of his harsh despotism, and the Republic was founded (509 BC).

Interesting... by cr0sh · 2005-09-01 04:40 · Score: 1

Where do you live, where are you from, or how old are you?

I am serious here. It is pretty conclusive that our brains are composed of a system of neural nets which learn patterns, and these neural nets likely are connected to one another in such a manner that patterns beget patterns. Jeff Hawkins in his book "On Intelligence" describes his theory of the structure of the cerebral cortex (he doesn't discount the other portions of the brain, but in that work he focuses on the portion that controls high-level thought and reasoning) as being a hierarchical structure, which has built-in feedback loops among its own parts, as well as back to the sensory and other parts - that is, we learn a pattern, and during that learning we "play it back" at the same time, so that in the end, that is all we are doing - playing back patterns in a similar manner to see if it matches with the pattern we already know.

Imagine you know how to hit a baseball thrown with a regular pitch. A baseball is round, it moves through the air in a certain way - or so you think. Now imagine you are thrown a curve ball or something - a pitch which causes the ball to move differently from the way you learned and know how to hit. You swing, likely just like you learned before, but the pattern doesn't match, and you miss. But seeing the motion of the ball and how the bat missed sets up new patterns connected to the original that modify the original for the new pattern of "curve ball pitch". If you get thrown enough curve balls, you will likely at some point have a pattern that does connect with the ball, and as you refine it, you now have a pattern subset of the original pitch, overlaid with the original pattern.

Eventually, over a lifetime, the number of patterns you have and how you "play them back" to fit your mind-model of the world to interpret it becomes astounding. It all starts out simply as "flailing" as a baby (try throwing a wad of tissue at a baby to see them react - do it over and over and you will see the pattern matching and buildup occur. Then change what you throw and watch the chaos as the pattern no longer fits, but continue and watch it change to fit, then switch back to the wad of tissue paper - both will continue to work, the overlay and hierarchical linkage is complete), and as time continues, we build up tons of patterns...

I recently experienced a mild form of "synesthesia" (sp?) - where I looked at the color purple and thought "grape flavor" - that is, I could "taste" the "grape soda" flavor. The pattern between the color and the flavor of grape soda was so strong that the sight of one triggered the feeling and flavor of the other. The feeling of "deja vu" is the same thing: patterns and partial patterns play off one another and trigger each (and the feelings associated - other inputs, you see), causing things to "seem similar" (well, in a way they are!). Another similar pattern playback response: smells triggering memories and feelings. Just about everybody has had this experience in one manner or another...

So, your case is interesting in that most people in the United States (particularly those in the western half of the country) recognize the term "fruit flies" as a noun - a type of insect which has caused massive infestation in fruit-growing areas. It was all over the news in the 1980's - at one time it seemed like every broadcast had some kind of reference to fruit flies in it. A lot of genetic engineering advances came from experimentation with fruit flies to try to figure out how to eradicate and/or control them. They caused a lot of damage economically and so they were a "news-worthy" item. For people my age (I am 32), we were inundated as kids with the term "fruit flies" and what they are and what they meant (especially if you lived at the time, as I did then, in California). The pattern was quickly set up, and now these people can't help but think of the term "fruit flies" as a noun.

If you didn't live in the United States at the time, or you are younger than 20-25 years old - this pattern

--
Reason is the Path to God - Anon

Babble? by 1001011010110101 · 2005-09-01 05:08 · Score: 1

Does anyone else remember an old program called Babble that did something like this? :)

OMG by aminorex · 2005-09-01 05:08 · Score: 1

"U.S. and Israeli researchers"

Oh my! Now that they have this technology, they will be able to enslave and pillage the whole world!

Oh... I guess they already did. Nevermind.

--
-I like my women like I like my tea: green-

Universes are like languages by Anonymous Coward · 2005-09-01 08:25 · Score: 0

As soon as someone has a theory that explains the whole universe, it turns into something weirder and more complex.

Same goes for languages.

cryptography? by xpyr · 2005-09-01 09:40 · Score: 1

I wonder how it would do at decrypting data.

--
My Gawd WTF...

Unfortunately, so does "watashi ga" by Ogemaniac · 2005-09-01 10:22 · Score: 1

I still can't figure that one out. The difference can be extraordinarily subtle.

Cultural bias by Anonymous Coward · 2005-09-01 11:31 · Score: 0

However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.

Hey! I'm not a nobody!!! :-(

You're making a cultural assumption, not a linguistic one. It's particularly interesting because it's an old cultural assumption that isn't valid today. It might hold true in the Bible Belt or in Islamic countries, but in the free world, many women have children, yet are not married.

Dave might really mean "I'm prudish, and disapprove of unwed mothers. Don't date Anna, because she has two kids out of wedlock, and that's proof she's a bad person.", or maybe "Don't date Anna: I watched her break two guys' hearts by having their kids and then kicking them out the door". We don't know what Dave means; and to infer that Anna is married isn't realistic.

It's a bitch, and not something computers are even near capable of.

By your very example, humans aren't capable of solving this task in a uniform way. I didn't assume Anna was married; I just assume Dave didn't want John to date her. [1]

And if we humans get things "wrong", there isn't really a right answer, is there? Computers will never be "capable" of solving this sort of problem until we pin down what our expectations really are. Until you say what "right" is, you can't blame the computer for constantly getting things "wrong".
--
AC

Re:Cultural bias by bhiestand · 2005-09-01 19:27 · Score: 1

Hell, I didn't even read that as Dave not wanting John to date her!

I figured it meant "Well, she's already had two kids, so you know she'll put out!"

Or possibly a "Go for the kids instead" or a "you're going to have to deal with her kids too" or any of those choices. And yes, you're very correct.

--
SWM seeks new sig for a brief fling

Definitely more than 1/30th by Ogemaniac · 2005-09-01 12:28 · Score: 1

I have studied five university level semesters and have lived in Japan for a total of six months (where language has not been my primary focus). I read about 800 kanji and write about half that. I understand about 40% of native Japanese that is not directed at me, and about 90% of that which is. I have been on a number of dates entirely in Japanese, can give and receive directions and instructions over the phone, and generally understand the point of technical discussions in my own field.

Obviously, this is a long process but I would say 1/3 is a fair estimate.

Re:Definitely more than 1/30th by Ray+Alloc · 2005-09-01 16:25 · Score: 0

Fluency is not a matter of the proportion of the minimal set of characters you believe you know, fluency is being able to think in the language. And judging by the amount of inner-translating process you use in order to communicate, you are not even an inch close to that state of mind.

Oh, and 6 months is ridiculous, come back to brag when you have lived at least 10 years in the country.

New Algorithm for Learning Languages by PigIronBob · 2005-09-01 15:43 · Score: 1

Being born and raised in the Netherlands, I had to learn 3 languages besides Dutch (English, German and French), with English being my preferred. I thought I was pretty good at it until I moved to Australia in 1984, only to discover that all the things they teach you in school (syntax etc.) are only HALF of what you need to truly MASTER the language. After 21 years I can (sometimes) manage to pass for a dinkum Aussie, only because I seldom speak Dutch, married an ozzie girl and have ozzie kids.

--
You never catch me alive

I doubt I ever will by Ogemaniac · 2005-09-01 17:52 · Score: 1

I sure don't want to live here permanently or attempt to raise a family here. I fail to see where I have been "bragging", or why you have such a hostile attitude. Half the people on earth speak a second language. Virtually all of my coworkers can speak English as well or better than I can speak Japanese, which means I am losing to all of them on that count. Hardly something to brag about, nor anywhere near my highest levels of accomplishment. Heck, there is an 18-year-old American living in my complex who speaks better Japanese than I do. Also, I am a bit baffled as to why spending 10 years in a country is something to brag about. Even people dumb as rocks have done that countless times before.

This is Great! Chinese can now visit my site. by newpath4comVersion2 · 2005-09-02 01:52 · Score: 0

This is great! Now the Chinese can visit my site and join the rest of you in not understanding it. http://www.newpath4.com/ Which, if extrapolated, should induce so much RAW PAIN into Chinese Society they'll never drink the water here. Nor want our land. hehehehe They'll figure it's some new kind of radiation poisoning. So, since we needn't fear the RED CHINESE ARMY anymore, we should be able to safely scale back our military and use the money to help cure whatever it is my website does to people... (translated for the new software: sihT si taerg! woN eht esenihC nac tisiv ym etis dna nioj eht tser fo ouy ni ton gindnatsrednu ti. eheheheheheheh)

Contrived Example by zakharin · 2005-09-02 03:59 · Score: 0

Nobody mentioned that both of these sentences are gibberish, if not completely incorrect. Time does not fly like an arrow. There is absolutely no point in comparing time with an arrow. The second one is even worse. It fails the subject/object agreement test. Fruit flies like bananas. At best fruit flies like a particular banana (fruit flies like the banana or that banana). It is not very hard to tell the difference once the sentences make sense.

No, you're just illiterate. by Anonymous Coward · 2005-09-15 03:48 · Score: 0

The example of the parent was comparing a metaphoric use of "like" (as a preposition) to an active use of "like" (as a transitive verb).

My point was to emphasize this difference, which is the key to unlocking the puzzle from an analytical standpoint. My own conclusion was that computers aren't capable of this nuance, because there is no contextual evidence that will support the computer's ability to determine usage based on metaphor versus usage based on concrete allusion.

You really need to take a grammar class. This has nothing to do with idiomatic expressions. The problem as emphasized by the parent correctly addresses the great problem in a computer's ability to create grammatical abstractions based solely on textual artifacts, as the posted article purports to do.

And another thing... by Anonymous Coward · 2005-09-15 04:10 · Score: 0

This is about language translation.

Really? Then why does TFA make not one single mention of language translation? TFA is all about grammar. Quoth the article:

The algorithm -- the computational method -- for language learning and processing that we have developed can take a body of text, abstract from it a collection of recurring patterns or rules and then generate new material.

454 comments