New Algorithm for Learning Languages
An anonymous reader writes "U.S. and Israeli researchers have developed a method for enabling a computer program to scan text in any of a number of languages, including English and Chinese, and autonomously and without previous information infer the underlying rules of grammar. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences."
would probably help with the problem of either downloading a small, incomplete dictionary, a dictionary with errors, or a massive dictionary file.
Their jobs be outsourced to computers.
I've got 101 mod points and you can't have them!
Google apparently has a system like this in their labs, and entered it into some national competition, where it pwned everyone else. Apparently, the system learned how to translate to/from Chinese extremely well, without any of the people working on the project knowing the language.
only 999,000 more components and we'll have ourselves a positronic net
SCIgen anyone?
Your hair look like poop, Bob! - Wanker.
Paper here for those who have PNAS access.
Imagine if the editors started using this, what would everyone have to bitch about on Slashdot?
This is a perfect opportunity to remind everyone that it's Chomsky's contribution to linguistics which enabled this amazing (if true) achievement. For those of you who don't know Chomsky, he is the father of modern linguistics. Many would also know him as a political activist. Very amazing character. http://www.sk.com.br/sk-chom.html
"There is no flag large enough to cover the shame of killing innocent people."--Howard Zinn
IAALinguist doing computational things and my BA focused mainly on syntax and language acquisition, so here're my thoughts on the matter.
It's not going to be right. The algorithm is stated as being statistically based, which, while similar to the way children learn languages, is not exactly it. Children learn by hearing correct native language from their parents, teachers, friends, etc. The statistics come in when children produce utterances that do not conform to the speech they hear, or when people correct them. However, statistics does not come in at all with what they hear.
With respect to the algorithm learning the underlying grammar of a language, I am dubious enough to call it a grand, untrue claim. Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chomsky dies. Think about the word "do" in English. No view of syntax describes where that comes from. Rather, languages are shoehorned into our constructs.
So, either they're using a flawed view of syntax or they have a new view of syntax and for some reason aren't releasing it in any linguistics journal as far as I know.
s/Chomsky/Markov
They've rediscovered the Eliza program!
Input: "For example, the sentences I would like to book a first-class flight to Chicago, I want to book a first-class flight to Boston and Book a first-class flight for me, please may give rise to the pattern book a first-class flight -- if this candidate pattern passes the novel statistical significance test that is the core of the algorithm."
How does it feel to "book a first-class flight"?
From Star Trek???
If fed with a heap of decent grammar, what happens when it's fed with bad grammar and spelling? Will it learn, and incorporate, the tripe or reject it? That's the sort of problem with natural language apps, it's quite hard to sort the good from the bad when it's learning. Take the megahal library (http://megahal.alioth.debian.org/) for example. Although possibly not as complex, it does a decent job at learning, but when fed with rubbish it will output rubbish. I don't think it's the learning that will be the hard part, but rather the recognition of the good vs. the bad that will prove how good the system is.
eXemplary Abstract
Let's see what human DNA really says and means!
I know we all feel like we've been screwed by the conspicuous lack of flying cars around these days, but at least some progress is being made on the Universal Translator front...
If you're not part of the solution, you are part of the precipitate
Using this software, I can finally win the 'Summarize Proust Competition'!
I'm not a Troll, it's reverse psychology.
Or better yet, start feeding it images of crop circles that haven't been proven to be fakes (yet.)
I had a sucky sig.
How long before this technology makes its way into the field of game AI? Imagine a game such as Deus Ex or SW:KoToR where you don't merely choose your response to NPC's from a predefined list, you type in your answer! Such technology combined with the simple contextual inferences that drive such oldies like Dr. Sbaitso and its Mac-equivalent Eliza could potentially launch interactivity in games to a whole new level.
Sure it's a long shot, but I can dream can't I?
I read the article and had a few questions. How long does the analyzed text have to be for the algorithm/program to pick up the grammar rules? I mean if it takes long documents and a ton of time, is it really worth it? Also, if it can only recognize languages we already know (and can only read those characters), how useful is this thing? Why not just hardcode grammar rules then? (probably a stupid question, but it's an exaggeration of what I was thinking).
Can it decipher these things too?
After a time the attacks stopped. The algorithm that generated them was a slightly better author than L. Ron Hubbard and left a.r.s. to found its own religion.
Actually, what is actually new about this is the unsupervised approach. Graph parsers are quite popular in Natural Language Processing applications, but they use supervised methods.
Unlike all the ridiculous patents being granted lately to IT companies, the one these guys are filing for, to me, seems legitimate. It's a nice change in my mind.
But for this, I have one word: Dolphins.
When you're afraid to download music illegally in your own home, then the terrorists have won!
I hope the material is (or will be made) accessible to laypersons. I'd love to be able to use this algorithm for my own music experiments.
We just had an article on this. There was a shootout by NIST. At least I think so; the /. search engine blows, hard. Either way, here's a link to the tests.
This is one that wasn't covered by the tests, so I guess it's front page news.
Is there anything better than clicking through Microsoft ads on Slashdot?
They should play an old country record backwards and see if the computer can confirm that the poor chump gets his truck, dog, shotgun and woman back. Could also try playing an old rock record backwards but computer might get possessed by a demon.
Scientists: Wow joe, the computer just went out and read the entire slashdot archive. Computer: Futile humans, I will p0wn you! *Lifting a bionic hand and swiping the scientist to the side* Scientists: Mac turn it off! Turn it offff argghh! Joe I just don't havvee teh power! Reaching for the coord. Computer: Woa.. wohaahaa. All your base is mine. Make your time.. *The lights go out*
Linux Video Tutorial Project, Tutoring the masses.
It uses a hand-written context-free grammar to form all elements of the papers.
I know you were aiming for funny, but there is a big difference between following a hand-written grammar and deducing it from the text...
Paul B.
http://en.wikipedia.org/wiki/Markov_chain
Used this (easy to compile) C program:
http://www.eblong.com/zarf/markov/
to create these:
http://www.mintruth.com/mirror/texts/
Mod points to whomever can tell us what texts they use. (No mod points can actually be given)
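For anyone who doesn't want to compile the C program, the idea behind a Markov chain text generator fits in a few lines of Python. This is only a rough sketch of the general technique, not a port of the program linked above: the function names, the word-level order-2 state, and the tiny corpus (borrowed from the flight-booking example quoted from the article) are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Random walk over the chain, starting from a randomly chosen state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state was only seen at the end of the text
            break
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)

# Hypothetical toy corpus; a real run would read a long text file instead.
corpus = (
    "I would like to book a first-class flight to Chicago . "
    "I want to book a first-class flight to Boston . "
    "Book a first-class flight for me , please ."
)
chain = build_chain(corpus, order=2)
print(generate(chain, length=12, seed=0))
```

Every word in the output follows its two predecessors somewhere in the corpus, which is why the result looks locally plausible and globally aimless.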
Get your Unix fortune now!
Unsupervised learning of natural languages
Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman
School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
Many types of sequential symbolic data possess structure that is (i) hierarchical and (ii) context-sensitive. Natural-language text and transcribed speech are prime examples of such data: a corpus of language consists of sentences defined over a finite lexicon of symbols such as words. Linguists traditionally analyze the sentences into recursively structured phrasal constituents (1); at the same time, a distributional analysis of partially aligned sentential contexts (2) reveals in the lexicon clusters that are said to correspond to various syntactic categories (such as nouns or verbs). Such structure, however, is not limited to the natural languages; recurring motifs are found, on a level of description that is common to all life on earth, in the base sequences of DNA that constitute the genome. We introduce an unsupervised algorithm that discovers hierarchical structure in any sequence data, on the basis of the minimal assumption that the corpus at hand contains partially overlapping strings at multiple levels of organization. In the linguistic domain, our algorithm has been successfully tested both on artificial-grammar output and on natural-language corpora such as ATIS (3), CHILDES (4), and the Bible (5). In bioinformatics, the algorithm has been shown to extract from protein sequences syntactic structures that are highly correlated with the functional properties of these proteins.
The ADIOS Algorithm for Grammar-Like Rule Induction
In a machine learning paradigm for grammar induction, a teacher produces a sequence of strings generated by a grammar G0, and a learner uses the resulting corpus to construct a grammar G, aiming to approximate G0 in some sense (6). Recent evidence suggests that natural language acquisition involves both statistical computation (e.g., in speech segmentation) and rule-like algebraic processes (e.g., in structured generalization) (7-11). Modern computational approaches to grammar induction integrate statistical and rule-based methods (12, 13). Statistical information that can be learned along with the rules may be Markov (14) or variable-order Markov (15) structure for finite state (16) grammars, in which case the EM algorithm can be used to maximize the likelihood of the observed data. Likewise, stochastic annotation for context-free grammars (CFGs) can be learned by using methods such as the Inside-Outside algorithm (14, 17).
We have developed a method that, like some of those just mentioned, combines statistics and rules: our algorithm, ADIOS (for automatic distillation of structure) uses statistical information present in raw sequential data to identify significant segments and to distill rule-like regularities that support structured generalization. Unlike
The Universal Translator is tuned specifically for the sounds put out by standard humanoid lifeforms. Humpback whales use both much higher and much lower pitched sounds. The universal translator was not designed to translate such things, and would not be able to translate them.
If you like what I've said here, and want to read more, go to http://www.krillrblog.com
How long until we see something like this applied to ?
Great, now we can actually build a chinese room!
Let's feed it slashdot and find out.
A real universal translator (artificial intelligence) would have to have many thousands of words of text to use as examples, so a language could be learned. http://www.cse.unsw.edu.au/~billw/mldict.html That is why many mechanical translation systems start with word lists and dictionaries to give the learning process a head start.
Markov chains aren't the same as context free grammars.
(CFGs can generate ((multiply) nested) bracket structures (and are like finite automata with stacks).) Markov chains are just finite automata without stacks, that generate random walks through vocabulary space.
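To make the stack point concrete, here is a small sketch. The generator expands the single context-free rule S -> "(" S ")" S | "" and the checker tracks nesting depth, which is exactly the unbounded memory a finite automaton lacks. The function names, the 0.4 stopping probability and the depth cap are made up for the example.

```python
import random

def generate_brackets(rng, depth=0, max_depth=4):
    """Expand S -> "(" S ")" S | "" with a depth cap so recursion terminates."""
    if depth >= max_depth or rng.random() < 0.4:
        return ""
    inner = generate_brackets(rng, depth + 1, max_depth)
    rest = generate_brackets(rng, depth + 1, max_depth)
    return "(" + inner + ")" + rest

def is_balanced(s):
    """Recognize balanced brackets; the counter stands in for the stack."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with nothing open
                return False
        else:
            return False
    return depth == 0

rng = random.Random(0)
for _ in range(5):
    sample = generate_brackets(rng)
    print(repr(sample), is_balanced(sample))
```

With a single bracket type a counter is enough; with several bracket types you need a real stack. Either way, a machine with a fixed number of states cannot keep track of arbitrarily deep nesting.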
I played around with the Google translator for a while. I work in Japan and am half-way fluent. Google couldn't even turn my most basic Japanese emails into comprehensible English. Same is true for the other translation programs I have seen.
I will believe this new program when I see it.
Translation, especially from extremely different languages, is absurdly difficult. For example, I was out with a Japanese woman the other night, and she said "aitakatta". Literally translated, this means "wanted to meet". Translated into native English, it means "I really wanted to see you tonight". It is going to take one hell of a computer program to figure that out from statistical BS. I barely could with my enormous meat-computer and a whole lot of knowledge of the language.
something that can make sense of the voynich manuscript http://www.voynich.nu/. They should have tested their system on it.
If something exists that does not need a creator (god) then why must the cosmos need one?
Wouldn't it be easier just to point to an online dictionary?
"I'm not impatient. I just hate waiting." - My Dad
Electronic babelfish anyone?
My sig beat up your sig.
I'm inclined to agree. IAAcomputational biologist doing bioinformatics-y algorithm things, and I am skeptical of automated grammar discovery. Automatic motif discovery with HMMs is one thing --- that works well, and I suspect that's basically what their bioinformatics results are yielding here (since SCFGs are a superset of HMMs). CFG-related algorithms are great for RNA analysis (I've written a few of them). I haven't read the article in detail, but CFGs aren't overwhelmingly well suited to proteins (which lack the nested-clause structure typical of RNA, for example (and programming languages too, as it happens)). One question I might ask is "how well does this perform when applied to a particular task?" --- the authors mention (in the context of proteins) automated functional classification; I'd be curious to see if this is basically reproducing the results of HMM-like approaches.
Speaking is NOT communication
Isn't this what SEQUITUR (http://sequitur.info/) is supposed to do?
God loves you. God will burn you in hell for all eternity. God wants more foreskins.
And the "rules" of a language are NOT what children "learn". First of all, children acquire a language, they do not "learn" it. That is a large attribute to the child's ability to speak it--not whether or not they understand gerunds and the pluperfect.
Second, in a language such as English whose words for the most part lack any necessity to the order in which they're placed to understand their meaning and, even worse, lack declension forms to distinguish subject from object of the preposition, what success can a language recognition program have "learning" such a language when prepositions themselves mainly can be omitted? To teach a computer Latin is easy.
Third, what's the hope of the computer ever understanding something like Shakespeare, Joyce, or Dante, whose uses of language rely extensively on erudition for word placement as opposed to typical usage? While a computer might be able to learn Latin because of its rigorous rules, I doubt it could faithfully render a text from Ovid.
PNAS wants you to subscribe to download the PDF.
Or you could just go to the authors' page and download it for free: http://www.cs.tau.ac.il/~ruppin/pnas_adios.pdf
While working for a nutcase, I spoke with Philip Resnik about his project of building a parallel corpus (http://www.umiacs.umd.edu/users/resnik/parallel/bible.html) as a tool to build a language translation system. This seems like the next logical step.
Fight Spammers!
Garbage in, garbage out.
Hoshi would've said, "Well, duh!". (i love her)
Recursive grammars always seemed a natural 'inbuilt' character of all languages to me, but then I took NLP as part of a CS unit on parsing and compiler design. AFAIK nobody has ever come up with a non-exhaustive way of analysing structure for inherent grammar. And you're right, you can't extract these features from a single given piece of plaintext. Like Godel would say, your whole formal system is going to get smashed to bits by the first counterexample outside your set. If it were possible we wouldn't need programming languages; so long as the programmer was self-consistent they could make up any symbolic garbage and a compiler could say "hmm I know what you MEAN" and turn out valid machine code. I didn't RTFA but I'm guessing we are looking at Markov chains self-clustered by a self-organising map type affair. That trick can lead to some impressive pseudo-intelligent behaviour, like the Perl Poet scripts, but do they understand the language? No.
In analyzing proteins, for example, the algorithm was able to extract from amino acid sequences patterns that were highly correlated with the functional properties of the proteins.
NCBI BlastP already does this for proteins. Similarities and rules for things can be found but if the meaning of the sequence is not known then what good is it? In the end you need to do experiments involving biology/biochemistry/structural biology to determine the function of a protein or nucleotide sequence. Furthermore in language as well as in biology/chemistry things which have similar vocabulary (chemical formula) may in the end be structurally very different (enantiomers), which leads to vastly different functionality.
It is comforting to know that when we make first contact with the aliens, we might not be able to communicate, but we'll definitely be able to fool their spam filters.
There are no karma whores, only moderation johns
Seems like that'd be a good place to test the system out. While talking with extraterrestrials would be pretty awesome, having a chat with a dolphin would be pretty cool too. Remember: "The second most intelligent [species] were of course dolphins"
Interesting. This is certainly a step toward the Star Trek universal translator. On the other hand, though, I wonder if this sort of technology only applies to human languages. Is such an algorithm picking up on some sort of common human brain wiring and taking advantage of that commonality to accurately translate? Or, say, if it were applied to animal language would it work too? In short, is human language a unique form of communication, or is there some underlying, perhaps mathematical "optimal" communication method which animals (or, to think Star Trek, aliens) use too? If the latter, imagine what such an algorithm could do...
Perhaps an analogy is in order. Metabolic pathways in living beings, as far as science is concerned, evolved from random chemical reactions thanks to millennia of natural selection. Its slow, progressive optimization of complex chemical pathways allows all life to process material, mobilize the energy stored in food, and run everything that anything alive could possibly need to run. In short, evolution-optimized metabolic pathways are why things live.
Now, in computer programming, there is a technique called genetic programming. Basically, if a programmer wants to create a program that accomplishes task X, all he has to do is let loose a different program which will create mountains of random code. Over time (many many processor cycles), the program will select which of the random jumbles of code accomplishes task X most efficiently, and report back to the programmer with the finished product. Voila. Completed program.
Note, though, that genetic programming has also turned out solutions for complex electrical pathways that engineers had trouble solving. Yes, genetic programming has the ability to make complex electrical circuits. When genetic programming was applied to metabolic pathways, it actually hit one (I forget exactly which one) straight on, complete with features such as feedback loops and enzymatic inhibition. Yes.
There's the analogy: some type of underlying mechanism is at work here. Genetic programming. Evolution. Electrical circuits and metabolic pathways. The question is, does the same apply to all communication?
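The select-and-mutate loop described above can be sketched in a few lines. Strictly speaking this is a genetic algorithm over fixed-length strings rather than genetic programming, which evolves program trees, but the random-variation-plus-selection mechanism is the same. The target string, alphabet, population size and mutation rate are all made up for illustration.

```python
import random

TARGET = "universal translator"          # hypothetical "task X"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate):
    """Number of positions that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng, rate=0.05):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in candidate)

def crossover(a, b, rng):
    """Splice the head of one parent onto the tail of another."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=200, generations=500, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            return gen, population[0]
        parents = population[: pop_size // 4]   # survivors are kept unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    return generations, population[0]

gen, best = evolve()
print(gen, repr(best), fitness(best))
```

Nothing in the loop knows how to spell; it only knows how to score. That is the part of the analogy that carries over to circuits and pathways, where the scoring function is a simulation instead of a string comparison.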
- translate some posts on /. into comprehensible contents
- figure out it is a dupe and kill it before it even appears
- RTFA for me and just give me a good summary (by the rate of articles posted here, there's probably not much to summarize either)
- translate "IANAL" into something else that does not make me think of ANAL thing
- figure that articles on Google and Apple are just speculations by some dude living in his (can't be her, for sure) parent's basement, and not really news worth posting
- translate my suggestions into something acceptable to the (kernel) hackers that good hygiene is a good thing
- understand that I'm just ranting, and it should not take it personal.
I know what the grandparent poster meant was something more advanced than Zork, but the fact that he used Deus Ex and sw:kotor as "examples of games with textual interaction" totally called for the parent poster's response. Background research, people!
The filesystem is the package manager
Feed it the entries in the "obfuscated C" competition - if it works for that, it oughta work for anything.
Pug
An Invisible Entity of Vast Power whose existence must be taken on faith alone: Liberal Media
Finally! Engrish for the masses!
~ slashdot.org - Where some of the world's greatest minds come together to scrutinize grammar.
Basic grammar is one thing, but when you get down to the meaning of words and phrases, you start to get into places computers have a much harder time with: abstraction and contextual meaning.
Think about "running a program". That can mean booting an application on a PC, or coordinating a group of lectures, or being executive producer of a television program. And does an executive producer make executive decisions in the production of a movie... or produce executives?
So when I translated the poetry, there were often times I had to stop and consider which phrase in Spanish best conveyed the meaning I was presenting in English. Furthermore, even with the best translation for meaning, I had to consider the rhythms of both poems and keeping their poetic sensibilities. Sometimes, I'd choose something less exact for the sake of the rhythm. I learned that good translation is F'ing hard!
We may be one step closer to Star Trek's "universal translator", but I have a strong belief that computers won't be putting the better flesh-and-blood translators out of work anytime soon.
For anyone who is interested, here's one of the poems. It was inspired by watching a poet taking what seemed to be a rather long time, hunting through a book of his own poetry, trying to find the next poem he wanted to read.
Finding Your place
You would think you know where
each of your poems are in this book. So familiar
are you with your own work and this organization
your mind has imposed on it. One poem extends
past the other, like feet moving
along a path that has been travelled
back and forth numerous times. Flipping along,
you hope you might know just when to stop
this procession of pages, automatically
locating the point you desire, rather than having
to stop in confusion and check
the signpost numbers and titles,
because you have become lost.
Hallar Su paraje
pensarías que conoces donde está
cada poema dentro de este libro. Tan familiar
estás con tu propia obra y esa organización
que la mente le ha impuesto. Un poema se extiende
más allá del otro, movimiento de pies
sobre el camino transitado
de aqui hacia allá sin cesar. Hojeando,
esperando que sepas cuando detener
este desfile de páginas, automáticamente
localiza el punto deseado, en vez de tener
que parar confundida y verifica
los números y títulos, como los postes indicadores,
porque te has perdido.
- Greg
Start a happiness pandemic
The FA says that patent applications have been filed. Are those available anywhere online?
I'm curious partly because this sounds very similar to a couple of pieces of prior art, but mostly because the description of how they go from basic structural recognition to translation between two unrelated languages reminds me a bit of that famous cartoon where two blocks of equations are separated by a little balloon containing the words, "And here, a miracle happens."
Proud member of the Weirdo-American community.
Yes, pattern recognition is a major part of the process. However, there are other fundamental parts that are also extremely important, and lacking them you get nonsense. In particular, context matters. "aitakatta" in the middle of a business letter probably does mean "wanted to meet". By itself, said by one member of a couple to the other over drinks at a bar, it does not.
In order for a program to translate accurately, it needs to know who is speaking/writing, who is the audience, what their relationship is, and their location. Some of this may be given to the computer explicitly, or easily found in the text/speech (for a human at least) but some of it may not. This is not going to be an easy problem to solve.
Writing is never free from its context. I know before I even start whether I am reading a fiction novel, a satire, a scientific journal, an email from my boss, or a text message from my date this Saturday. The meaning of the words can change a lot in those cases.
Even Google translator, which was trained on multi-lingual UN reports, could not produce comprehensible English from simple Japanese business emails.
As for my chinko, that's a long story.
For example, the lost Iberian language, spoken in Spain before Latin. There are texts, but nobody understands them.
Could this be used to make a smarter spam filter?
As I told her, you don't have to tell them to me for me to figure them out
Perfectly sensible to exactly one person, at one moment in time - and complete nonsense at any other. I laughed when I wrote it because I realized how absurd this sentence would appear to one of my poor Japanese friends. Of course, they throw the reverse mind twisters at me.
I agree, some kinds of texts are simpler than others. Texts that are factual, and distant from personal human interactions, are probably easier to translate because context matters much less. Either way, I have yet to see a translator that can come close to turning even the most simple Japanese into comprehensible English and vice versa.
Well, I guess that since this paper deals with unsupervised learning of natural languages and this NIST shootout was about Machine Translation that maybe they are just a little different. I admit I haven't fully digested the paper in the article but it seems to me that this is by far different. But hey, that's just me.
Having studied Chinese for quite some time, I seriously doubt the claims made. First, word segmentation in Chinese is not easy because in the printed script there are no spaces. And no, single characters do not necessarily represent complete words. In modern Chinese many words now consist of two syllables. So there is simply no statistical way of bootstrapping word identification. You have to learn it, but from specially prepared texts -- usually called "Learn Chinese", dictionaries, et cetera. Well, the authors claim to learn that from ordinary text -- but you see, you need specially prepared texts. So their claims are not exactly wrong, but not true either.
A second problem is that for many important words that are used day-to-day there is no simple way of inferring the meaning from the two syllables. For instance, "da3" has many meanings (the number indicates the tone); one of its most prominent meanings is literally "to hit something/someone". When you are taking a taxi you say "da3 di5", but you are surely not hitting the taxi.
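To show what dictionary-driven segmentation looks like, here is a minimal sketch of greedy forward maximum matching, a classic baseline for text written without spaces. It is not what the paper does; the function name, the tiny lexicon and the sample sentence (roughly "I take a taxi to the airport") are made up for the example.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical hand-made lexicon; a real one would hold tens of thousands of entries.
lexicon = {"我", "打", "打的", "去", "机场"}
print(max_match("我打的去机场", lexicon))

# Without the two-character entries every character becomes its own "word".
print(max_match("我打的去机场", {"我"}))
```

The second call shows the problem in miniature: with no prior word list, the segmenter has nothing to tell it that two adjacent characters belong together.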
Sorry, but the whole "it has even learned Chinese" thing is just wishful thinking of a bunch of people who -- as they admitted for themselves -- have not even the slightest clue of what is going on in the Chinese language.
And to finally resolve the riddle in the subject line: if Chinese find some language to be completely incomprehensible they call it a "Birds' Language". But they never apply this to their own language...
One should note that generating meaningful grammar doesn't mean generating meaningful sentences. It might be grammatically correct to say "blue bingo bats" but it doesn't mean anything. The machine has to have some common sense and "understand" concepts the way we do, to produce human language.
But they do seem to be using the well-known university researcher's approach, namely:
1. Repackage some previously done stuff under a cute acronym -- "ADIOS" in their case, but 10+ points for recursive ones.
2. Patent it
3. ...?
4. *Success and fortune.
*Most never get here.
What if we had a Beowulf cluster of these?
God I love Japan! Though these triple-date Sundays are starting to tire me out.
Called Pragmatics. It can be somewhat oversimplified as saying it's the study of how context affects meaning or as figuring out what we really mean, as opposed to what we say.
For example, a classical Pragmatics scenario:
John is interested in a co worker Anna, but is shy and doesn't want to ask her out if she's taken. He asks his friend Dave if he knows if Anna is available to which Dave replies "Anna has two kids."
Now, taken literally, Dave did not answer John's question. What he literally said is that Anna has at least two children, and presumably exactly two children. That says nothing of her availability for dating. However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.
So that's a major problem computers hit when trying to really understand natural language. You can write a set of rules that completely describes all the syntax and grammar. However that doesn't do it, that doesn't get you to meaning, because meaning occurs at a higher level than that. Even when we are speaking literally and directly, there's still a whole lot of context that comes into play. Since we are quite often at least speaking partially indirectly, it gets to be a real mess.
Your example is a great one of just how bad it gets between languages. The literal meaning in Japanese was not the same as the intended meaning. So first you need to decode that, however even if you know that, a literal translation of the intended meaning may not come out right in another language. To really translate well you need to be able to decode the intended meaning of a literal phrase, translate that into an appropriate meaning in the other language, and then encode that in a phrase that conveys that intended meaning accurately, and in the appropriate way.
It's a bitch, and not something computers are even near capable of.
"The rules can then be used to generate new and meaningful sentences".... Why not start with a computer language? Why not Machine code? Somebody might even be able to generate new and meaningful Windows OS bits.
Cool, now we can talk to the aliens! Take off your tin foil hats, they don't need to scan your brain any more.
- Time flies like an arrow.
- Fruit flies like a banana.
There are other, similar examples. Computer systems tend to deduce either that there's a type of insect called "time flies", or that the latter sentence refers to the aerodynamic properties of fruit.
From TFA: The algorithm discovers the patterns by repeatedly aligning sentences and looking for overlapping parts.
If you take just a single string [of length n] and rotate it against itself in a search for matches, then you've got to do n^2 byte comparisons just to find all singleton matches, and then gosh only knows how many comparisons thereafter to find all contiguous stretches of matches.
But if you were to take some set of embedded strings, and rotate them against a second set of global strings [where, in a worst case scenario, the set of embedded strings would consist of the set of all substrings of the set of global strings], then you would need to perform a staggeringly large [for all intents and purposes, infinite] number of byte comparisons.
What did they do to shorten the total number of comparisons? [I've got some ideas of my own in that regard, but I'm curious as to their approach.]
PS: Many languages are read backwards, and I assume they re-oriented those languages before feeding them to the algorithm [it would be damned impressive if the algorithm could learn the forwards grammar by reading backwards].
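To put rough numbers on the rotate-and-compare idea above, here is a minimal Python sketch. It is only the naive scheme the comment describes, not whatever the researchers actually do, and the function name `self_alignment_matches` is made up for illustration. It tries every non-zero cyclic offset, counts byte comparisons, and records contiguous runs of matches in the same pass.

```python
def self_alignment_matches(s):
    """Compare s with every cyclic rotation of itself (naive sketch).

    Returns (comparisons, runs), where each run is a tuple
    (offset, start, length) describing a contiguous stretch of positions i
    with s[i] == s[(i + offset) % n]. Runs are not joined across the end
    of the string.
    """
    n = len(s)
    comparisons = 0
    runs = []
    for offset in range(1, n):          # offset 0 would match trivially
        run_start, run_len = None, 0
        for i in range(n):
            comparisons += 1
            if s[i] == s[(i + offset) % n]:
                if run_len == 0:
                    run_start = i
                run_len += 1
            else:
                if run_len:
                    runs.append((offset, run_start, run_len))
                run_len = 0
        if run_len:                     # close a run that reaches the end
            runs.append((offset, run_start, run_len))
    return comparisons, runs


count, found = self_alignment_matches("abcabc")
print(count)   # n * (n - 1) comparisons for a string of length n
print(found)
```

In this naive version the cost is n(n-1) comparisons, so on the order of n^2, and the contiguous stretches fall out of the same pass. The blow-up the comment worries about comes from comparing many strings against many others, which this sketch does not attempt.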
This blog item makes no mention of Markov models, which, in my blunt opinion, shows that the author fails to grasp the real merit of the method.
The ability to look at sequences of words up to a certain depth (as much as brute force permits) could get you nice textures in graphics and a good flow of coherent text (SCIgen from MIT comes to mind).
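For anyone who hasn't seen the word-sequence idea in action, here is a minimal sketch of an order-k Markov text generator in Python. It illustrates the general technique the comment refers to, not the method from the article, and the names `build_model` and `generate` are my own.

```python
import random
from collections import defaultdict


def build_model(words, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model


def generate(model, order=2, length=20, seed=None):
    """Random walk over the model, starting from a randomly chosen state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(model))   # sorted so a fixed seed is repeatable
    out = list(state)
    for _ in range(length - order):
        followers = model.get(tuple(out[-order:]))
        if not followers:               # dead end: state only seen at the end
            break
        out.append(rng.choice(followers))
    return " ".join(out)


# Hypothetical toy corpus; any tokenized text would do.
corpus = "the cat sat on the mat and the cat ran".split()
model = build_model(corpus, order=2)
print(generate(model, order=2, length=8, seed=1))
```

A larger `order` copies longer stretches of the source verbatim, and a smaller one drifts into nonsense faster. That trade-off is the "depth" the comment mentions.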
My Linux - (L)ove (I)s (N)ever (U)tterly eXPensive
"Commander Blood"
Play Command HQ online
- First, you have to distinguish between what I'll call lexical verbs and auxiliary verbs. A lexical verb is the verb in the sentence that actually tells you what sort of action (or experience, or state, or whatever) the sentence is describing. An auxiliary verb is a verb that doesn't do that, but expresses some combination of information about modality (possibility, necessity, obligation), tense (past, future), aspect (progressive, perfect), negation and agreement.
- The sentence Be you man or mouse? is archaic, and thus does not count as part of the data when one is analyzing the grammar of contemporary English.
Now, what's do-support? It's essentially that the grammar of contemporary English is such that the only verbs that can be negated with -n't, or inverted with the subject (e.g., to form a question), are auxiliaries. You don't negate the sentence You like John in contemporary English by saying *You like not John, nor form the yes-no question as *Like you not John? (The asterisks in front of the sentences are linguistese for "the following is not a grammatical sentence." As another note, in more archaic English, do-support did not exist, so that did use to be the normal way of forming the negation and the question.)
In the sentence Do you like me?, like is the lexical verb, and do is the auxiliary.
So, essentially, to form the negated sentence or the question, you need some auxiliary. If the basic declarative sentence already has one, you can just use that: from You will like John, you can form You won't like John or Will you like John? If the basic declarative doesn't have an auxiliary, then you need to use the auxiliary do in order to "support" the negation or question. In these sentences, the auxiliary do is otherwise a dummy word.
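Stated as a plain rule, the above fits in a few lines. This is a toy Python sketch under heavy assumptions: the clause is given pre-split into subject, optional auxiliary and bare verb phrase, the dummy auxiliary is always "do" (no agreement or tense), and the function name and contraction table are hypothetical.

```python
# Contracted negative forms for a handful of auxiliaries (illustrative only).
NEGATED = {
    "do": "don't", "does": "doesn't", "did": "didn't",
    "will": "won't", "can": "can't",
    "is": "isn't", "are": "aren't", "have": "haven't", "has": "hasn't",
}


def capitalize_first(sentence):
    """Uppercase only the first character, leaving names like 'John' alone."""
    return sentence[0].upper() + sentence[1:]


def negate_and_question(subject, verb_phrase, auxiliary=None):
    """Return (negation, yes-no question) for a simple declarative clause.

    If the clause has no auxiliary, insert the dummy auxiliary "do"
    to carry the negation or the inversion (do-support).
    """
    aux = auxiliary if auxiliary is not None else "do"
    negation = capitalize_first(f"{subject} {NEGATED[aux]} {verb_phrase}")
    question = capitalize_first(f"{aux} {subject} {verb_phrase}?")
    return negation, question


print(negate_and_question("you", "like John"))           # no auxiliary: do-support
print(negate_and_question("you", "like John", "will"))   # reuse the auxiliary
```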
Just reverse the polarity, or ask engineering for more power. Duh!
None at all.
On the other hand, if one happens to care about the minute details of insane Chomskian syntactic theory, then it matters enormously, because insane Chomskian syntactic theorists are forbidden from just stating the rule as-is in their theory. I mean, it just doesn't sound impressive enough. They're supposed to derive it as some sort of complicated "theorem" from deep principles of Universal Grammar, and thus, to gloriously prove to the world that Plato was right about innate ideas.
Did I just say "gloriously prove to the world that Plato was right about innate ideas"? My apologies. I meant "gloriously prove that Chomsky is right about Plato being right about innate ideas."
(People reading this might guess that if you're simply trying to state the rules of the grammar of the language, without any ulterior Chomskian motives, you might not really think that do-support is a specially thorny problem...)
Are you adequate?
I don't see how this contradicts the innate ability to learn language theory that Chomsky put forth. Chomsky is repeatedly on the record saying that general-purpose statistical methods are not sufficient to learn a language.
Let them try the Voynich manuscript and see what they can do; I doubt it...
You will always have the problem of words with more than one meaning. Omoshiroi is a perfect example - how is the computer going to know whether to choose "funny" or "interesting" without knowing the context?
I have helped my Japanese colleagues write a number of papers in English. The number one mistake is always a/an/the. Why? Because Japanese does not even have this concept, making it difficult for them to understand. So what happens when a machine tries to translate Japanese into English? It must literally insert a/an/the in locations where there is no word in Japanese. How does it know which one? Context. This context can be extremely subtle, but make big differences in meaning. I often have to ask my Japanese colleagues about their research in order to decide whether a/an or the (or neither) is correct, because using any of these words provides information that they have not already included.
I am not sure if that is really that big a problem. With mobile text messaging, people have started changing their sentences into a form that can be understood by the phone's dictionary.
Say, if I normally would have typed "stroll" to say "walk" and I would notice that when I press 787655 on my phone's keyboard, the T9 dictionary misunderstands me, I would just start typing 9255 for "walk" instead. I think the same would happen here. If somehow the person typing the messages would get instantaneous feedback from the system about a "commonly misunderstood" structure, he would quickly learn to avoid these structures while typing.
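The key sequences above can be checked with a small Python sketch of the standard phone keypad layout. The helper names `t9_encode` and `t9_candidates` and the tiny word list are made up for illustration; a real T9 dictionary is of course much larger.

```python
# Standard letter layout on a phone keypad.
T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in T9_KEYS.items() for ch in letters}


def t9_encode(word):
    """Digits you would press to type `word` with one key press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def t9_candidates(digits, dictionary):
    """All dictionary words that share the given key sequence."""
    return [w for w in dictionary if t9_encode(w) == digits]


print(t9_encode("stroll"), t9_encode("walk"))
# Hypothetical mini-dictionary showing why the phone can "misunderstand" you.
print(t9_candidates("4663", ["good", "home", "gone", "walk"]))
```

Several words can share one key sequence, which is exactly the kind of ambiguity that pushes people toward words the dictionary guesses right the first time.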
On a related note, things like "fly like an arrow" are, in my opinion, the most difficult thing to learn in a language, and thus foreign speakers do not use or know them. And still, "badly spoken English" can be comprehensible among the people speaking it. One thing I have noticed myself is that it is the British who have the most problems understanding a foreigner speaking English badly. Other foreigners would understand the same person just fine. Something to do with the way the brain is wired to wait for certain words after another, I guess.
Of course, the problem is that we would get rid of all the things that make language "alive". But here I am typing a message in a language other than my own, and still many people can to some extent understand what I mean...
It would be great if songwriters used this tech for foundational musical and lyrical ideas. It seems like every piece of music I hear these days "strongly reminds me" of music I've heard before.
Retired from software... maybe. Sort of.
Yes! I'd have thrown a mod point at you just for this paragraph if I could.
English is very precise (when used as directed) in matters of time and sequence -- we have more than 20 verb tenses where most languages get away with three.
Not really. Firstly, English only has two or three tenses. (Depending upon which linguist you ask, English either has a past/non-past distinction or past/present/future distinctions. See [1], [2]. The general consensus seems to be in favor of the former, although I humbly disagree with the general consensus.) It maintains a variety of aspect distinctions (perfective vs imperfective, habitual vs continuous, nonprogressive vs progressive). See [3]. Its verbs also interact with modality, albeit slightly less strongly.
It's a very common mistake to count the combinations of tense, aspect, and modality in a language and arrive at some astronomical number of "tenses". It's an even more common mistake (for native English speakers, anyway) to think that English is special or different or strange compared to other languages. In most cases, it's not -- especially when compared with other Indo-European languages.
Secondly, and more interestingly IMHO, most languages do not have three distinct tenses. The most common cases are either to have a future/non-future distinction or a past/non-past distinction. In any case, the future tense, if it exists, is normally derived from modal or aspectual markers and is diachronically weak (which is linguist-babble meaning "future tense forms don't stick around for very long"). See [3].
English is a perfect example: will, of course, used to refer to the agent's desire (his or her will) to do something. Only recently has it shifted to have a more temporal sense, and it still maintains some of its modal flavor. In fact, the least marked way of making the future (in the US, at least) is to use either gonna or a present progressive form: I'm having dinner with my boss tonight. I'm gonna ask him for a raise. See Comrie [1] again.
So as not to be anglo-centric, I'll give another example. Spanish has three widespread means of forming the future tense. Two of these are periphrastic and are exemplified by he de cantar 'I've gotta sing' and voy a cantar 'I'm gonna sing'. The last is the synthetic form, cantaré 'I'll sing'.
Most high school or college Spanish teachers would tell you that the "pure" future is cantaré. Actually, it's historically derived from the phrase cantar he 'I have to sing' (from Latin cantáre habeo), and is being displaced by the other two forms all across the Spanish-speaking world. I'm told, for example, that cantaré has been largely lost in Argentina and southern Chile (see [4]).
In any case, the parent's main point still holds. It's a b?tch to deal with cross-linguistic differences in major semantic systems computationally. But good lord, it's fun to try. :)
References:
I'd 1ik3 70 533 7h47 d4mn 41g0ri7hm w0rk 0n 7hiz 5hi7!
Do what I say, cuz I said it.
-Meatwad
No, you didn't. At least, not the one he's talking about. It only translates Arabic and Chinese to English so far AFAIK, and it's not available to the public. The one on their website is not theirs; it's licensed from Systran, same as every other internet translator. The new research one looks very impressive: in a comparison to old automatic translators, a sentence previously translated "alpine white new presence tape registered for coffee confirms laden" was correctly rendered as "the white house confirmed the existence of a new bin laden tape."
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Yeah sure, you can't even understand immediately the "aitakatta" a shivering young Japanese lady tells you, whereas it is probably one of the most frequent sentences you hear when dating Japanese...
Believe me, young man, the other "half way" to fluency is going to be extremely long, at that rate.
English to Japanese and back:
I I a certain thing which one time my Spanish friend of IM conversation is deceived being, have known that that considerably is accurate. I Spanish called to IM to Spanish which is learned by I in those due to fair copy/pasted in hypnosis and the basis entirely. As for conversation the Spanish of 15 parts way I before I said to those, continued the fact that the web sight is used sufficiently because. They had upset their pants.
The Japanese version was also utterly incomprehensible. I can't post it here because of character issues.
About the only part the thing seems to have gotten close enough to understand deals with underpants.
Algorithm for learning languages its not learning/feeling the free/open source language/movement.
I understood it right away, but I brought it up because of the mental gymnastics you have to go through to turn it into an English thought (which we talked about after I responded properly enough in Japanese).
If you want to quibble, I'd say I was closer to 1/3 of the way to fluency.
"I barely could with my enormous meat-computer and a whole lot of knowledge of the language."
That's thinking with your gland!
Seriously maybe we could use this research on computer languages.
Klingon has simple grammar.
How about Dolphinese? Research shows that they seem to be able to scout and transfer information from one individual to his/her pod. If there's some grammar, it would be a pretty good nut to crack.
It is no longer uncommon to be uncommon.
Japanese is markedly different from Chinese, other than some of the same letters :) Generally google or babelfish.altavista or anything has a hard time translating Japanese to English. I mean you get the idea, but the trolls on Slashdot would cut you to pieces in their meaningless, retarded, airheaded corrections if you spoke with 1/10 of the grammatical errors that the current Japanese>English engines put out. ANY translation is good though, especially like this algorithm in the article, as most Americans think that 26 letters is a lot. Chinese has how many thousands?
They aren't claiming to have learned the language. They've claimed that they can statistically analyze the grammar to the point that they can produce further text which could make sense in some context. There's a huge difference.
someone should have told Lance: http://www.livestrong.org/ about this before he came up with "I Live Strong"
Having done a bit of phrase-structure syntax analysis back in the 60s and learned from a paperback adulation of Chomsky what transformational grammars were, I thought, "Is that all it is?"
From time to time I get to speak to people in the academic linguistics racket: professors, students etc. Unwittingly they impress on me the uselessness of linguistics over the past 30 years or so. But how did it get so influential? This would be an interesting topic for research.
Instead of hundreds of universities world-wide lining up students to be force-fed Chomsky grammars, why not let them gain some real research skills and find out how this transformational pixie dust grew into the academic industry it is today?
When I started in computing, in 1967, I remember the promise "English to Chinese by next year".
I wrote a program that made a statistical analysis of the party platform for the upcoming German national elections. The Phrasinator is able to write new nonsense texts based on the original material.
It's more satire than science, making fun of political blabla.
The idea is more than 20 years old and based on an old article by Brian Hayes, "A progress report on the fine art of turning literature into drivel".
This "new method" by Edelman and his colleagues sounds rather similar. I'm really curious what they did to improve it.
------------------
You may like my a cappella music
Wow, its really great! Their still sum issues wif it though, like teh times when it corrects moi based on spelling as learnt from teh comments on /., but generally I think it will be a necceity in the future!
It really begs the question, why isn't every one using it now?
You might or might not be able to resist. If the chemicals control a specific organ, it might shut down or go into overdrive without you 'wanting' it, just by having a mutation in the DNA that will upregulate that particular chemical.
But ultimately you are right, the DNA can only create predispositions (to cancer, to heart disease etc...). The debate is then where does one end and the other begin? Can/should I kill because I am predisposed to violent behavior and thus not be found guilty because of it?
The article claims that the program can correlate protein sequence to function. I don't doubt that it can find small regions of contiguous amino-acid sequences that are common between a few proteins of the same function, but I highly doubt that it can predict function from a protein sequence. Predicting a protein structure, which is a prerequisite for studying function, is already a very difficult problem for computational biophysicists. For example, the CASP4 competition compares various programs that predict structure from an amino-acid sequence. Understanding function from a structure is even more difficult because it involves identifying the active site or functional regions as well as protein dynamics.
Comparative sequence searching, known as homology alignment, is not fool proof either. See the PSI-BLAST tool for homology alignments. This is a very difficult problem for biophysicists because of insertion mutations, functional mutations, and many other reasons. Two sequences with low homology may or may not have similar structures (folds) and/or function. Likewise, homologous sequences may have very different functions.
Protein structure prediction, which precedes function prediction, is already quite a difficult problem for biophysicists to tackle.
Would you guys look at the quality of the spelling in these comments!
Thought I was in /. there for a second...
Many languages are read backwards
What languages are read backwards? I can't think of a single one. (I know of many that are read from right-to-left, and even some that are read from top-to-bottom, but those aren't "backwards".)
Seriously, I have been seeing this example for ages and I just realised that "flies" in the second sentence is a noun :( I was all like, "Fruit doesn't fly like a banana! Bananas don't fly!" What chance does a computer stand? :P
This sounds suspiciously like Mad libs. For those of you who don't remember Mad Libs, check this out.
Lepp
I wish Slashdot would not parrot press releases that contain no information.
Grammatical inference is a hard problem. A couple of researchers, who appear to have legitimate credentials, have published a paper (that's what they do for a living) and acquired a patent (see any number of threads for how much novelty that may or may not require).
I would be much more impressed if there were at least some whiff of a scientific claim, like "after processing 1GB of English text for 10000 hours, the prototype implementation generated text that 51.7% of observers could not distinguish from a transcript of English spoken by a 6-year-old."
It is impossible to determine what, if any, contribution to natural language processing is included in this paper. Therefore the press release is not news.
If somebody wants to *read* the paper and tell us what it is about, I'll be all ears.
Automatic generation of language pairs has been done since the early 70's.
Anyone else thinking about using the tech to learn something about "the grammar of DNA"?
If they can use it for analysing protein sequences, maybe they can tackle "the grammar of Life" and kickstart the whole bioengineering sector into a new life...
OTOH, the integrist christians will probably denounce this as an evil thing...
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
Just how much more simple and blank will your language be after you've learnt to speak so that computer translation programs understand you? ;-)
Imagination, gibberish, idioms and emotion-driven deviations from standard grammar make a language 'alive'. It was never for sharing pure information only. How do you render several words for "I" from Japanese into English? How do you hear/how do you _feel_ about Southern American English (I know it's old but.. read Huckleberry Finn's Adventures)? How do you translate those into foreign languages? Are you sure you don't lose a thing when words with distorted spelling are translated into some foreign language?
Mother tongue is about subtlety and emotions.
Enter the human translation.
have you noticed that the original is not in "correct" English?
I went to the site and tried with: and I had: which is nice translation (some pronouns are missing, ok), and putting it back to English brought me: (what's up with it not knowing that "mear" is "to piss" and "Spaniard"???)
It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
If the learning algorithm can handle all of these scenarios, I'll be impressed.
You see? You see? Your stupid minds! Stupid! Stupid!
It worked great except on teenage IM and SMS abbreviations, then the scientists got the blue screen of death. It seems teenage speak is more indecipherable than Klingon.
equivalent to "the". So if a computer sees "watashi wa hon wo katta" it has to figure out whether it means "I bought a book" or "I bought the book" based on whether the particular book that was bought had already been established to the listener, or whether there was only one book in the world. Actually, native Japanese would usually drop the "watashi wa" (ie, I) and therefore the computer would have to guess the subject from context, too.
...and see if it finds anything quicker than what they are using right now.
>As for my chinko, that's a long
Oh, you have a long chinko,don't you?
Envy of Japanese man would be everywhere!
and what part of speech is "develloped?" I would hazard a guess that most people do not use proper grammar in daily speech, and even fewer do on the internet. This is also true for foreign languages. In my experience, if you learn proper grammar, there will still be difficulty communicating with native speakers until you learn all the improper uses and bad habits that are common among native speakers.
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf.
This has to be a hoax - or written by somebody who hasn't had a lot of exposure to languages. Try to dip into a few books about the subject - languages and grammars are much weirder than what you'd think. Translating from Chinese to English is fairly straightforward in that context, and even then there are many examples of things that don't translate easily.
But have a look at e.g. a language called Piraha; here's a link to what Daniel L. Everett has to say: (http://lings.ln.man.ac.uk/Info/staff/DE/DEHome.html)
Or read something about Papuan languages (spoken in Papua New Guinea) - there are some that are seriously different.
That metric is compression.
If there were funding for the C-Prize these guys might have walked away with a large chunk of it but then they might not have been able to acquire the monopoly rights they're pursuing via the patent application. The C-Prize description follows:
Since all technology prize awards are geared toward solving crucial problems, the most crucial technology prize award of them all would be one that solves the rest of them:
The C-Prize -- A prize that solves the artificial intelligence problem.
The C-Prize award criterion is as follows:
Let anyone submit a program that produces, with no inputs, one of the major natural language corpora as output.
S = size of uncompressed corpus
P = size of program outputting the uncompressed corpus
R = S/P (the compression ratio).
Award monies in a manner similar to the M-Prize:
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at noon GMT on day of new record
Winner receives: $Z * (X/(R0+X))
Compression program and decompression program are made open source.
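As a worked example of the payout rule above, here is a minimal Python sketch. The function name and the dollar figures are hypothetical; only the formula Z * (X/(R0+X)) comes from the description. Since R0 + X is just the new record R1, the winner's share is the fraction of the new ratio that is improvement.

```python
def c_prize_payout(fund, previous_ratio, new_ratio):
    """Payout Z * X / (R0 + X) for raising the record ratio from R0 to R1 = R0 + X.

    Returns 0.0 if the submission does not beat the previous record.
    """
    if new_ratio <= previous_ratio:
        return 0.0
    improvement = new_ratio - previous_ratio          # X
    return fund * improvement / (previous_ratio + improvement)


# Hypothetical numbers: a $100,000 fund, record improved from 4.0 to 5.0.
print(c_prize_payout(100_000, 4.0, 5.0))
```

With these made-up figures, X = 1.0 and R0 + X = 5.0, so the winner would take one fifth of the fund and the rest stays in the pot for the next record.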
Explanation
A very severe meta-problem with artificial intelligence is the question of how one can define the quality of an artificial intelligence.
Fortunately there is an objective technique for ranking the quality of artificial intelligence:
Kolmogorov Complexity
Kolmogorov Complexity is a mathematically precise formulation of Ockham's Razor, which basically just says "Don't over-simplify or over-complicate things." More formally, the Kolmogorov Complexity of a given bit string is the minimum size of a Turing machine program required to output, with no inputs, the given bit string.
Any set of programs which purport to be the standards of artificial intelligence can be compared by simply comparing their Artificial Intelligence Quality. Their AIQs can be precisely measured as follows:
Take an arbitrarily large corpus of writings sampled from the world wide web. This corpus will establish the equivalent of an IQ test. Give the AIs the task of compressing this corpus into the smallest representation. This representation must be a program that, taking no outside inputs, produces the exact sample it compressed. The AIQ of an AI is simply the ratio of the size of the uncompressed writings to the size of the program that, when executed, produces the uncompressed writings.
In other words, the AIQ is the compression ratio achieved by the AI on the AIQ test.
The reason this works as an AI quality test is that compression requires predictive modeling. If you can predict what someone is going to say, you have modeled their mental processes and by inference have a superset of their mental faculties.
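To make the ratio concrete, here is a rough Python sketch using zlib as a stand-in compressor. This is a simplification and an assumption on my part: the criterion above counts the size of a self-contained program that prints the corpus, so the decompressor would count too, and zlib is nowhere near a serious language model. The sample inputs are made up.

```python
import random
import zlib


def compression_ratio(corpus: bytes, level: int = 9) -> float:
    """Uncompressed size divided by zlib-compressed size (decompressor not counted)."""
    compressed = zlib.compress(corpus, level)
    # Lossless round trip: the compressed form must reproduce the corpus exactly.
    assert zlib.decompress(compressed) == corpus
    return len(corpus) / len(compressed)


# Hypothetical sample inputs, not a real web corpus.
predictable = b"time flies like an arrow. fruit flies like a banana. " * 100
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(predictable)))

print(compression_ratio(predictable))   # highly predictable text compresses well
print(compression_ratio(noise))         # unpredictable bytes barely compress
```

The gap between the two ratios is the point: the better a model predicts what comes next, the fewer bytes it needs to reproduce the text.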
Mechanics
The C-Prize is to be modeled after the Methuselah Mouse Prize, or M-Prize, where people make pledges of money to the prize fund. If you would like to help with the setup and/or administration of this prize award similar to the M-Prize, let me know by email.
Seastead this.
I guess a big part of successfully translating and interpreting languages has to do with knowing the idioms in both the source and the target languages.
Given the considerable semantic ambiguities involved, I think it may take quite some time before we have capable machine translation software... not to mention interpretation, which is more difficult still. (Think incomplete words, redundant 'like's, grammatically non-question intonation based questions, etc.)
Akarsz Magyar Gentoo fórumot? Akkor
Apparently nothing NEW here; all they've described is a straightforward abstract syntax tree. These were taught in undergrad CS courses 15 years ago.
The fact-collectors haven't finished their part of the job, and the abstraction-lovers are actively hindering the fact-collectors from doing their job, as (per Chomsky) all languages are equally complex, therefore all languages can be studied by only studying English...
A kid will always hear "Are you hungry" but never "Am you hungry" or "Are he hungry".
A child hears "Aryoo hungry" or "Izzy hungry" and lexicalizes "Aryoo" and "Izzy". Later stages of language acquisition separate out the sound patterns conventionally denoted as "are", "you", "is", and "he", in "Are you hungry" and "Is 'e hungry".
"MR ducks."
"MR not."
"MR2."
"MR not."
"CM wangs."
"LIBMR ducks."
"You'll get nothing, and you'll like it!"
Unfortunately, we've been hacking away at this problem for quite some time now. (Learning the Grammar of DNA). We're actually quite good at understanding the grammar. It's decoding the meaning of the individual elements that seems to be the hard part.
Biologists have been, for about 10 years now, very very good at decoding the grammar of raw DNA sequences. The crux of the problem is figuring out how exactly those gene products function in the body, what elements of their sequence/structure cause them to behave in such a way, and how much does each element contribute to the overall picture.
It's one thing to be able to put together a well-formed sentence, but another thing entirely to make that sentence communicate something worthwhile, as part of a greater whole.
Indeed, biologists are now trying to put together the "paragraphs" and "chapters" of life, rather than the sentences.
Those who can, do. Those who can't, simulate.
It works the other way too:
"I'm leaving you."
What?
"I'm leaving you, Alice."
I don't understand what you're trying to do.
"I've met someone."
What do you mean 'met'?
"Look...just read the pamphlet."
I don't have the pamphlet.
"I have to go."
Which way do you want to go?
"Uh...west."
You would need a machete to head further west.
I can't tell you how many of my break-ups have ended with needing a machete.
you can have my violent video games when you pry them from my cold, dead hands.
Prime UID Club
it means! -"Chomskyan"
Seriously, though, I'm not terribly familiar with linguistics, although I've enjoyed Steven Pinker's books, and apparently he is a "Chomskyan." It seems to me like a controversy for which evidence from neurologists and neurobiologists is very helpful. Haven't read Pinker?--
http://www.amazon.com/exec/obidos/tg/detail/-/0670031518/qid=1125582276/sr=8-2/ref=pd_bbs_2/102-0573459-7462505?v=glance&s=books&n=507846
I have access to my PNAS but I've been warned before for playing with it at work while looking at websites.
on http www computer and the speaking and on it human into some or examplete altern as so it are of a going is not a google of a finding on in learning as and pare to so the learning inten it computern learning is pare alter relation in the formation http www east are a simplete the first score senter and produce into shold the what human does the alter and of language language what have i was score net the cal to alter relation in the which intern a probable cal algorite the programmar from language to the the the which is it the programmar do with and as the to beneath you was befor a mean does structive befor altern as score a from language working the pare alter the working in the a from language stand a man the for from a
In the natural language processing business they call this "the same level of understanding as a two-year-old child". ;-)
The qsoietun rinmeas, wulod stmhnieg lkie tihs slitl be rdlaebae?
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
I hooked up a collection of hip hop CDs to the program to try to figure out what the heck they are trying to say and left it running overnight. I came back in the morning to find my computer with gold chains hanging off the cd drive, the case cover slipped down so I could see the edge of the hard drive and when it booted it called me "beyatch".
sure this sounds like a good thing for recognizing patterns and heck i'll even give it credit for being a possible new Grammar Checker... But generating new, useful and meaningful content? I think not. For one thing, anything that it can learn from past text is not wholly new, and the material generated will still need *real* human beings to make sense of it, interpret the implications of the data, and decide what to do with it. How is that different from a million monkeys typing on a million type writers? You still need someone to read all those potential texts of Hamlet before you find one that's useful. And as to meaningful, all I can say is that colorless green ideas sleep furiously
- How does this algorithm handle exceptions to the normal word patterns from sentences that don't exhibit the pattern? For example, if not for the word receive one wouldn't know that there are exceptions to the i before e rule.
- This will create a translation based upon a supposedly authoritative source. Many times in language translation there is an argument as to how things are translated. How are multiple authorities handled?
- Along with the other posts I read I doubt the efficacy of the algorithm in both time and space measurements. Unless there are some serious restrictions, I doubt the problem is even tractable.
- Finally, if this code were that good it could be used to crack any code that isn't a one-time pad system from enough consecutive examples of ciphertext to plaintext. For example, it could be used to find the private key of a 2-key system by encrypting everything you could using a public key and finding the reverse translation once the dictionary has been deciphered.
There are not enough published details to make a judgement on this one way or the other. It's a shame the algorithm is patented; otherwise peer review would take place and possibly find some holes or some other interesting uses of this seemingly black-box algorithm.
Children learn from hearing correct utterances. Whether their utterances are corrected or not makes essentially no difference in their learning.
-- Too lazy to get a lower UID.
When and whether modern views of syntax are going to advance has little to do with Chomsky being alive or dead. It's not the person Chomsky that has much relevance, here, it's the Chomskyan paradigm.
Typically in science, any old successful paradigm will die hard, any new paradigm has a hard time becoming mainstream. That's a general fact, and would not be any different, if Chomsky had died 20 years ago.
Chomsky did indeed found the new paradigm, but he really can't be blamed in person for stagnation in linguistics. Don't expect a scientist (any more than any other human being) to radically change their own views in old age.
BTW, if you ask me, just why Chomsky was so influential - that may or may not have much to do with his theory per se. More important, I think, is that the question "Hey, let's try to find stuff that's common to all languages" triggered a lot of (novel and exciting) research.
Otherwise, GIGO.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
There are hundreds of thousands of Etruscan artifacts with writing on them that pre-date the Greeks and Romans, but nobody has ever been able to translate them. They basically used (invented?) the modern alphabet, but used no spacing and they may have written right to left. I wonder if this software could work on it...
I am serious here. It is pretty conclusive that our brains are composed of a system of neural nets which learn patterns, and these neural nets likely are connected to one another in such a manner that patterns beget patterns. Jeff Hawkins in his book "On Intelligence" describes his theory of the structure of the cerebral cortex (he doesn't discount the other portions of the brain, but in that work he focuses on the portion that controls high-level thought and reasoning) as being a hierarchical structure, which has built in feedback loops among its own parts, as well as back to the sensory and other parts - that is, we learn a pattern, and during that learning we "play it back" at the same time, so that in the end, that is all we are doing - playing back patterns in similar manner to see if it matches with the pattern we already know.
Imagine you know how to hit a baseball thrown with a regular pitch. A baseball is round, it moves through the air in a certain way - or so you think. Now imagine you are thrown a curve ball or something - a pitch which causes the ball to move differently from the way you learned and know how to hit. You swing, likely just like you learned before, but the pattern doesn't match, and you miss. But seeing the motion of the ball and how the bat missed sets up new patterns connected to the original that modify the original for the new pattern of "curve ball pitch". If you get thrown enough curve balls, you will likely at some point have a pattern that does connect with the ball, and as you refine it, you now have a pattern subset of the original pitch, overlaid with the original pattern.
Eventually, over a lifetime, the number of patterns you have and how you "play them back" to fit your mind-model of the world to interpret it becomes astounding. It all starts out simply as "flailing" as a baby (try throwing a wad of tissue at a baby to see them react - do it over and over and you will see the pattern matching and buildup occur. Then change what you throw and watch the chaos as the pattern no longer fits, but continue and watch it change to fit, then switch back to the wad of tissue paper - both will continue to work, the overlay and hierarchical linkage is complete), and as time continues, we build up tons of patterns...
I recently experienced a mild form of "synesthesia" (sp?) - where I looked at the color purple and thought "grape flavor" - that is, I could "taste" the "grape soda" flavor. The pattern between the color and the flavor of grape soda was so strong that the sight of one triggered the feeling and flavor of the other. The feeling of "deja vu" is the same thing: patterns and partial patterns play off one another and trigger each other (and the feelings associated - other inputs, you see), causing things to "seem similar" (well, in a way they are!). Another similar pattern playback response: smells triggering memories and feelings. Just about everybody has had this experience in one manner or another...
So, your case is interesting in that most people in the United States (particularly those in the western half of the country) recognize the term "fruit flies" as a noun - a type of insect which has caused massive infestation in fruit growing areas. It was all over the news in the 1980's - at one time it seemed like every broadcast had some kind of reference to fruit flies in it. A lot of genetic engineering advances came from experimentation with fruit flies to try to figure out how to eradicate and/or control them. They caused a lot of damage economically and so they were a "news-worthy" item. For people my age (I am 32), we were inundated as kids with the term "fruit flies" and what they are and what they meant (especially if you lived at the time, as I did then, in California). The pattern was quickly set up, and now these people can't help but think of the term "fruit flies" as a noun.
If you didn't live in the United States at the time, or you are younger than 20-25 years old - this pattern
Reason is the Path to God - Anon
Does anyone else remember an old program called Babble that did something like this? :)
"U.S. and Israeli researchers"
Oh my! Now that they have this technology, they will be able to enslave and pillage the whole world!
Oh... I guess they already did. Never mind.
-I like my women like I like my tea: green-
As soon as someone has a theory that explains the whole universe, it turns into something weirder and more complex.
Same goes for languages.
I wonder how it would do at decrypting data.
My Gawd WTF...
I still can't figure that one out. The difference can be extraordinarily subtle.
However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.
:-(
Hey! I'm not a nobody!!!
You're making a cultural assumption, not a linguistic one. It's particularly interesting because it's an old cultural assumption that isn't valid today. It might hold true in the Bible Belt or in Islamic countries, but in the free world, many women have children, yet are not married.
Dave might really mean "I'm prudish, and disapprove of unwed mothers. Don't date Anna, because she has two kids out of wedlock, and that's proof she's a bad person.", or maybe "Don't date Anna: I watched her break two guys' hearts by having their kids and then kicking them out the door". We don't know what Dave means; and to infer that Anna is married isn't realistic.
It's a bitch, and not something computers are even near capable of.
By your very example, humans aren't capable of solving this task in a uniform way. I didn't assume Anna was married; I just assume Dave didn't want John to date her. [1]
And if we humans get things "wrong", there isn't really a right answer, is there? Computers will never be "capable" of solving this sort of problem until we pin down what our expectations really are. Until you say what "right" is, you can't blame the computer for constantly getting things "wrong".
--
AC
I have studied five university level semesters and have lived in Japan for a total of six months (where language has not been my primary focus). I read about 800 kanji and write about half that. I understand about 40% of native Japanese that is not directed at me, and about 90% of that which is. I have been on a number of dates entirely in Japanese, can give and receive directions and instructions over the phone, and generally understand the point of technical discussions in my own field.
Obviously, this is a long process but I would say 1/3 is a fair estimate.
Being born and raised in the Netherlands, I had to learn 3 languages besides Dutch (English, German and French), with English being my preferred. I thought I was pretty good at it until I moved to Australia in 1984, only to discover that all the things they teach you in school (syntax etc.) are only HALF of what you need to truly MASTER the language. After 21 years I can (sometimes) manage to pass for a dinkum Aussie, only because I seldom speak Dutch, married an ozzie girl and have ozzie kids.
You never catch me alive
I sure don't want to live here permanently or attempt to raise a family here. I fail to see where I have been "bragging", or why you have such a hostile attitude. Half the people on earth speak a second language. Virtually all of my coworkers can speak English as well or better than I can speak Japanese, which means I am losing to all of them on that count. Hardly something to brag about, nor anywhere near my highest levels of accomplishment. Heck, there is an 18-year-old American living in my complex who speaks better Japanese than I do. Also, I am a bit baffled as to why spending 10 years in a country is something to brag about. Even people dumb as rocks have done that countless times before.
This is great! Now the Chinese can visit my site and join the rest of you in not understanding it. http://www.newpath4.com/ Which, if extrapolated, should induce so much RAW PAIN into Chinese Society they'll never drink the water here. Nor want our land. hehehehe They'll figure it's some new kind of radiation poisoning. So, since we needn't fear the RED CHINESE ARMY anymore, we should be able to safely scale back our military and use the money to help cure whatever it is my website does to people... (translated for the new software: sihT si taerg! woN eht esenihC nac tisiv ym etis dna nioj eht tser fo ouy ni ton gindnatsrednu ti. eheheheheheheh)
Nobody mentioned that both of these sentences are gibberish, if not completely incorrect. Time does not fly like an arrow. There is absolutely no point in comparing time with an arrow. The second one is even worse. It fails the subject/object agreement test. Fruit flies like bananas. At best fruit flies like a particular banana (fruit flies like the banana or that banana). It is not very hard to tell the difference once the sentences make sense.
The example of the parent was comparing a metaphoric use of "like" (as a preposition) to an active use of "like" (as a transitive verb).
My point was to emphasize this difference, which is the key to unlocking the puzzle from an analytical standpoint. My own conclusion was that computers aren't capable of this nuance, because there is no contextual evidence that will support the computer's ability to determine usage based on metaphor versus usage based on concrete allusion.
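For anyone who wants to see the ambiguity spelled out, here is a rough Python sketch. It is not the algorithm from the article; the lexicon and the sentence templates are made up by hand purely for illustration. It assigns each word every part of speech the toy lexicon allows and keeps the tag sequences that fit one of the templates. Both sentences come out with two structurally valid readings, so picking the sensible one has to rely on something beyond the tags.

```python
from itertools import product

# Hypothetical toy lexicon: the parts of speech each word may take.
LEXICON = {
    "time": {"NOUN"},
    "fruit": {"NOUN"},
    "flies": {"NOUN", "VERB"},
    "like": {"PREP", "VERB"},
    "an": {"DET"},
    "arrow": {"NOUN"},
    "bananas": {"NOUN"},
}

# Hypothetical sentence templates (tag sequences) and what each one means.
PATTERNS = {
    ("NOUN", "VERB", "PREP", "DET", "NOUN"): "'like' as preposition (simile)",
    ("NOUN", "NOUN", "VERB", "DET", "NOUN"): "'like' as verb, compound-noun subject",
    ("NOUN", "VERB", "PREP", "NOUN"): "'like' as preposition (simile)",
    ("NOUN", "NOUN", "VERB", "NOUN"): "'like' as verb, compound-noun subject",
}


def readings(sentence):
    """Return every (tags, description) pair that fits a known template."""
    words = sentence.lower().split()
    # Sort the tag sets so the enumeration order is deterministic.
    options = [sorted(LEXICON[word]) for word in words]
    found = []
    for tags in product(*options):
        if tags in PATTERNS:
            found.append((tags, PATTERNS[tags]))
    return found


if __name__ == "__main__":
    for s in ("Time flies like an arrow", "Fruit flies like bananas"):
        print(s)
        for tags, description in readings(s):
            print("   ", " ".join(tags), "->", description)
```

The tags alone cannot tell you that "time flies" is not a kind of insect, which is exactly the nuance in question.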
You really need to take a grammar class. This has nothing to do with idiomatic expressions. The problem as emphasized by the parent correctly addresses the great problem in a computer's ability to create grammatical abstractions based solely on textual artifacts, as the posted article purports to do.
Really? Then why does TFA make not one single mention of language translation? TFA is all about grammar. Quoth the article: