Slashdot Mirror


New Algorithm for Learning Languages

An anonymous reader writes "U.S. and Israeli researchers have developed a method for enabling a computer program to scan text in any of a number of languages, including English and Chinese, and autonomously and without previous information infer the underlying rules of grammar. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences."

85 of 454 comments

  1. just thought.. by thegoogler · · Score: 3, Interesting
    what if this could be integrated into a small plugin for your browser(or any program) of choice, that would then generate its own dictionary in your language.

    would probably help with the problem of either downloading a small, incomplete dictionary, a dictionary with errors, or a massive dictionary file.

    1. Re:just thought.. by Bogtha · · Score: 4, Insightful

      This algorithm works with sample data. Where is the sample data going to come from? If you have to download it, then that negates the whole point of using it. If you use what you see online, well that's just ridiculous, for obvious reasons :).

      --
      Bogtha Bogtha Bogtha
    2. Re:just thought.. by Hikaru79 · · Score: 2, Insightful

      This algorithm works with sample data. Where is the sample data going to come from? If you have to download it, then that negates the whole point of using it. If you use what you see online, well that's just ridiculous, for obvious reasons :).

      It's going to come from large bodies of text that exist in multiple languages. Things like the Bible, the constitution, etcetera. The whole point of this technology is that by drawing conclusions from those texts, the program infers the underlying rules of the language and can therefore translate other things. Google was doing something similar. An online dictionary is completely different. First, it has to be compiled by someone. Second, it only helps for translating words verbatim. This technology would teach itself to translate languages, even if none of the researchers working on the project could even speak those languages themselves. That's the beauty of it.

    3. Re:just thought.. by Mac+Degger · · Score: 5, Informative

      What they've developed is something which interprets grammar; the ruleset behind the organisation of building blocks, apparently building-block agnostic.

      A dictionary is just words. This algorithm can't assign meaning to the building blocks, it can only decide how and in what order the building blocks go together.

      --
      -- Waht? Tehr's a preveiw buottn?
    4. Re:just thought.. by jaavaaguru · · Score: 4, Interesting

      Perhaps the algorithm could be used to identify spam more accurately. If it can understand the text, then it's got a reasonable chance of knowing whether the text is junk.

    5. Re:just thought.. by psm321 · · Score: 2, Informative

      From what I understand, Google's thing was using pure statistics (i.e. a matches with b all the time in translations so when you see a, translate it to b), while this one actually "understands" the underlying grammar.
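The "a matches with b all the time" idea can be sketched roughly as below. This is a toy illustration only, not Google's system: the sentence pairs are invented, `best_translations` is a hypothetical helper, and scoring by the Dice coefficient is an assumption chosen so that very frequent words do not win every match.

```python
from collections import Counter
from itertools import product

def best_translations(pairs):
    """For each source word, pick the target word with the highest Dice score."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)                   # sentences containing each source word
        tgt_count.update(tgt_words)                   # sentences containing each target word
        joint.update(product(src_words, tgt_words))   # sentence pairs containing both words
    best = {}
    for (a, b), n in joint.items():
        score = 2 * n / (src_count[a] + tgt_count[b])  # Dice coefficient
        if a not in best or score > best[a][1]:
            best[a] = (b, score)
    return {a: b for a, (b, _) in best.items()}

# Invented aligned sentence pairs (English, French).
pairs = [
    ("the dog sleeps", "le chien dort"),
    ("the cat sleeps", "le chat dort"),
    ("the dog eats", "le chien mange"),
]
table = best_translations(pairs)
```

Nothing here knows any grammar: the table falls out of counting alone, which is the contrast the comment draws.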

  2. Sucks to be a support tech in India by HeLLFiRe1151 · · Score: 5, Funny

    Their jobs be outsourced to computers.

    --
    I've got 101 mod points and you can't have them!
    1. Re:Sucks to be a support tech in India by Anonymous Coward · · Score: 2, Funny

      Their jobs be outsourced to computers.

      They is?

    2. Re:Sucks to be a support tech in India by doxology · · Score: 2, Funny

      Still a few bugs in this grammar interpretation software ;-)

      --
      sigfault. core dumped.
  3. Didn't Google already do this? by powerline22 · · Score: 5, Interesting

    Google apparently has a system like this in their labs, and entered it into some national competition, where it pwned everyone else. Apparently, the system learned how to translate to/from Chinese extremely well, without any of the people working on the project knowing the language.

    1. Re:Didn't Google already do this? by Anonymous Coward · · Score: 2, Informative

      Here's the link at Google's blog.

    2. Re:Didn't Google already do this? by spisska · · Score: 5, Interesting

      IIRC, Google's translator works from a source of documents from the UN. By cross referencing the same set of documents in all kinds of different languages, it is able to do a pretty solid translation built on the work of goodness knows how many professional translators.

      What is a little more confusing to me is how machine translation can deal with finer points in language, like different words in a target language where the source language has only one. English for example has the word "to know" but many languages use different words depending on whether it is a thing or a person that is known. Or words that relate to the same physical object but carry very different cultural connotations -- the word for female dog is not derogatory in every language, for example, but some other animals can be extremely profane depending on who you talk to.

      Or situations where two entirely different real-world concepts mean similar things in their respective language -- in English, for example, you're up shit creek, but in Slavic languages you're in the pussy.

      I've done translation work before (Slovak -> English), and there's much more going on than differences in words and grammar. There are whole conceptual frameworks in languages that just don't translate, and this is frustrating for anyone learning a language, let alone trying to translate. English is very precise (when used as directed) in matters of time and sequence -- we have more than 20 verb tenses where most languages get away with three.

      Consider this:

      I was having breakfast when my sister, whom I hadn't seen in five years, called and asked if I was going to the county fair this weekend. I told her I wasn't because I'm having the painters come on Saturday. They'll have finished by 5:00, I told her, so we can get together afterwards.

      These three sentences use six different tenses: past continuous, past perfect, past simple, present continuous, future perfect, and present simple, and are further complicated by the fact that you have past tenses referring to the future, present tenses referring to the future, and the wonderful future perfect tense that refers to something that will be in the past from an arbitrary future perspective, but which hasn't actually happened yet. Still following?

      On the other hand, English is much less precise in things like prepositions and objects, and utterly inexplicable when it comes to things like articles, phrasal verbs, and required word order -- try explaining why:

      I'll pick you up after work

      I'll pick the kids up after work

      I'll pick up the kids after work

      are all OK, but

      I'll pick up you after work

      is not.

      Machine translation will be a wonderful thing for a lot of reasons, but because of these kinds of differences in languages, it will be limited to certain types of writing. You may be able to get a computer to translate the words of Shakespeare, but a rose, by whatever name, is not equally sweet in every language.
    3. Re:Didn't Google already do this? by AJWM · · Score: 2, Insightful

      but

      I'll pick up you after work

      is not.


      It can be, depending on context or emphasis. "I'll pick up the kids after lunch. I'll pick up you after work."

      --
      -- Alastair
  4. SCIgen by OverlordQ · · Score: 5, Interesting

    SCIgen anyone?

    --
    Your hair look like poop, Bob! - Wanker.
  5. PDF of paper by mattjb0010 · · Score: 5, Informative

    Paper here for those who have PNAS access.

    1. Re:PDF of paper by ksw2 · · Score: 5, Funny
      Paper here for those who have PNAS access.

      HEH! funniest meant-to-be-serious acronym ever.

    2. Re:PDF of paper by downbad · · Score: 2, Informative

      The project also has a website where you can download crippled implementations of the algorithm for Linux and Cygwin.

  6. Woah by SpartanVII · · Score: 4, Funny

    Imagine if the editors started using this, what would everyone have to bitch about on Slashdot?

    1. Re:Woah by Anonymous Coward · · Score: 2, Funny

      dupes...

  7. Noam Chomsky by MasterOfUniverse · · Score: 2, Informative

    This is a perfect opportunity to remind everyone that it's Chomsky's contribution to linguistics which enabled this amazing (if true) achievement. For those of you who don't know Chomsky, he is the father of modern linguistics. Many would also know him as a political activist. Very amazing character. http://www.sk.com.br/sk-chom.html

    --
    "There is no flag large enough to cover the shame of killing innocent people."--Howard Zinn
    1. Re:Noam Chomsky by venicebeach · · Score: 5, Insightful

      Perhaps a linguist could weigh in on this, but it seems to me that this kind of research is quite contrary to the Chomskian view of linguistics.

      Instead of a language module with specialized abilities tuned to learn rule-based grammar, we have an unsupervised learning system that has surmised the grammar of the language merely from the patterns inherent in the data it is given. That a system can do this is evidence against the notion that an innate grammar module in the brain is necessary for language.

    2. Re:Noam Chomsky by hunterx11 · · Score: 4, Insightful
      Linguistics has nothing to do with prescriptive grammar, except perhaps studying what influence it has on language. Something like "don't split infinitives" is not a rule in linguistics. Something like "size descriptors come before color descriptors in English" is a rule, because it's how people actually speak. Incidentally, most people are not even aware of these rules in their native language, despite obviously having mastery over them.

      If there were no rules, I could write a post using random letters for random sounds in a random order, or just using a bunch of non-letters. That wouldn't convey anything. Saying "I'm writing on slashdot" is more effective than writing "(*&$@(&^$)(#*$&"

      --
      English is easier said than done.
    3. Re:Noam Chomsky by SparksMcGee · · Score: 4, Insightful
      I took a linguistics class this previous year with a professor who absolutely disagreed with the Chomskyan view of linguistics (though she did acknowledge that he had contributed a great deal to the field). Some of the arguments against Chomsky include objections to the Chomskyan view of "universal grammar"--that essentially a series of neural "switches" determine what language a person knows and that these in turn are purely grammatical in nature (the lexicon of different languages qualifying as "superficial"--in and of itself a somewhat tenable argument). While this holds reasonably well for English and closely related languages (English grammar in particular depends a tremendous amount upon word order and syntax, and thus lends itself well to this sort of computational model), in many languages the lines between nominally "superficial" categories--e.g. phonology, lexicon and syntax--become blurred, especially in, for instance, case languages. Whereas you can break down the grammatical elements of an English sentence fairly easily into "verb phrases" "noun phrases" and so on, this is largely because of English syntactical conventions. When a system of prefixes and suffixes can turn a base morpheme from a noun phrase to a verb phrase or any of various parts of speech, the kind of categories to which English morphemes and phrases lend themselves become much harder to apply. Add to this the fact that there exist languages (e.g. Chinese) in which grammatically superficial categories (in English) like phonology become syntactically and grammatically significant, and the sheer variety of linguistic grammars either seriously undermines the theory in general or forces upon one the Socratic assumption that everyone knows every language and every possible grammar from birth and simply needs to be exposed to the rules of whatever their native language is and to pick up superficialities like lexicon to become a fluent speaker.

      It's not all complete nonsense, but if it were truly correct then presumably computerized translation software (with the aid of large dictionary files for lexicons) would have been perfected some time ago.


      Sorry about the rant, but like I said, my prof did *not* like the Chomskyan view of linguistics.

      Oh, and as far as the notion of the "language module" goes, it might be premature to call it a module, but there *is* neurophysiological evidence to suggest that humans are physically predisposed towards learning language from birth, so that much at the very least is tenable.

    4. Re:Noam Chomsky by pjpII · · Score: 2

      Of more interest might be that it actually probably disproves Chomsky's theories of language acquisition, which rest on a basis of prior/innate facility for language acquisition which is based on prior knowledge of some sort (i.e. Universal Grammar, his most famous contribution to linguistics) in the brains of language learners, while this program works with no prior knowledge and only a statistical framework.

      So Chomsky might not be too happy, as this program could potentially disprove his life's work.

    5. Re:Noam Chomsky by stephentyrone · · Score: 2, Interesting

      Actually, this fits very tidily in a Chomskian context. The program has an internal, predetermined notion of "what a grammar looks like" (i.e. a class of allowable grammars sharing certain properties), and adapts that to the source text. The way all this is presented makes it seem like unsupervised learning that can find any pattern, but the best you can hope to do with a method like this is capture an arbitrary (possibly probabilistic) context free grammar (CFG).

      Even then, Gold showed a long, long time ago (1967) that the task of inducing an arbitrary CFG using only generated strings from the language is basically hopeless [Gold, E. Mark. 1967. Language Identification in the Limit. Information and Control, 10:447-474].

      That said, this doesn't even seem to be that novel (to me). Andreas Stolcke wrote a very nice PhD dissertation in 1994 on learning arbitrary PCFGs from language strings [Stolcke, Andreas. 1994. Bayesian Learning of Probabilistic Language Models. PhD Dissertation. University of California at Berkeley.]

      This is probably a better, more efficient method than Stolcke produced back in '94, but I would be *very* surprised if it revolutionized the way computers interact with language, or anything else of the sort. People working in computational linguistics have a nasty habit of making grand pronouncements, only to fall far short of what they claimed.

      For the record: IANAL, but i play one on TV, by which i mean i'm an applied mathematician with a couple published papers in computational linguistics.
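For readers unfamiliar with the term, a probabilistic context-free grammar (PCFG) of the kind discussed above can be sketched in a few lines. The toy grammar and the names `GRAMMAR` and `generate` are assumptions made up for illustration; nothing here is taken from the paper or from Stolcke's dissertation.

```python
import random

# Toy PCFG: each nonterminal maps to a list of (probability, right-hand side).
# Symbols that are not keys of GRAMMAR are treated as terminal words.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["the", "N", "that", "VP"])],
    "VP": [(0.6, ["V"]), (0.4, ["V", "NP"])],
    "N": [(0.5, ["dog"]), (0.5, ["cat"])],
    "V": [(0.5, ["sleeps"]), (0.5, ["sees"])],
}

def generate(symbol, rng):
    """Expand a symbol into a list of words by sampling rules recursively."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word
    weights = [p for p, _ in GRAMMAR[symbol]]
    options = [rhs for _, rhs in GRAMMAR[symbol]]
    rhs = rng.choices(options, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)
sentence = generate("S", rng)
print(" ".join(sentence))
```

Generating strings from a known grammar is the easy direction. The induction problem the comment describes runs the other way: recovering something like `GRAMMAR` from the generated strings alone.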

    6. Re:Noam Chomsky by edibleplastic · · Score: 2, Interesting

      This won't disprove Chomsky's theories, at most it will serve as evidence that language can be learned through statistical means. The reason it won't disprove anything is because we're ultimately interested in the way that *humans* learn language. Whether or not it's possible to learn a language solely through statistical means doesn't change the fact of the matter for humans, which may or may not have a genetic endowment for learning language. It's entirely possible that it's possible in principle to learn language this way, but we do it with some priors (the universal grammar).

      There have been basically two prongs of arguments in favor of the existence of a Universal Grammar in the debate. The first is that the task of learning an infinite grammar from a finite subset of sentences (and then only from positive evidence) appears to be too difficult to accomplish solely through statistical means. The second is an effort to show that language learning is biologically- rather than experience-based. This is the effort to show that there is a critical period in language development, which would suggest that there is a strong biological (i.e., genetic) component to language learning.

      In my opinion, the first prong isn't very strong, since it relies on assumptions about statistical learning to make its claims. Their claims to me seem to stem more from a lack of imagination than from anything we can pin down as logically necessary. Shimon Edelman's work would work against this prong, showing that yes, it is possible to learn a language via statistical means. (It would still have to be shown that the knowledge the computer possesses is qualitatively similar to that learned by humans... it may learn languages in a completely different way).

      His findings wouldn't affect the second prong at all, though, which to my mind is the stronger of the two approaches. There have been lots of studies which suggest that there is a biological timecourse for language acquisition, suggesting that we do have an innate capacity for it.

      So to sum up, while I find it a very exciting and important finding, I don't believe it by itself will disprove the theory of Universal Grammar.

  8. Speaking as someone working on NLP by OO7david · · Score: 4, Interesting

    IAALinguist doing computational things and my BA focused mainly on syntax and language acquisition, so here're my thoughts on the matter.

    It's not going to be right. The algorithm is stated as being statistically based, which, while similar to the way children learn languages, is not exactly it. Children learn by hearing correct native languages from their parents, teachers, friends, etc. The statistics come in when children produce utterances that either do not conform to speech they hear or when people correct them. However, statistics does not come in at all with what they hear.

    With respect to the algorithm learning the underlying grammar of a language, I am dubious enough to call it a grand, untrue claim. Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chomsky dies. Think about the word "do" in English. No view of syntax describes from where that comes. Rather languages are shoehorned into our constructs.

    So, either they're using a flawed view of syntax or they have a new view of syntax and for some reason aren't releasing it in any linguistics journal as far as I know.

    1. Re:Speaking as someone working on NLP by tepples · · Score: 2, Insightful

      However, statistics does not come in at all with what they hear.

      Utterance in pattern A is heard more often than utterance in pattern B; utterances in patterns C and D are not heard at all. How is that not statistics?

    2. Re:Speaking as someone working on NLP by OO7david · · Score: 2, Interesting

      Insofar as only utterance A is heard. A kid will always hear "Are you hungry" but never "Am you hungry" or "Are he hungry".

      Native speakers by definition speak correctly, and that is all the child is hearing.

    3. Re:Speaking as someone working on NLP by Comatose51 · · Score: 2, Informative
      Basically all modern views of syntax are unscientific and we're not going to get anywhere until Chomsky dies.

      I really don't understand that. How are modern views of syntax unscientific? Also, if Chomsky is such an influence on linguistics, then maybe he's right about it. Aren't you essentially saying that we have no way of arguing with him so let's wait til he dies so he can't argue back? I would think the correct view should win out regardless of the speaker.

      Other than what I've studied in cognitive science, I am not in any way or form a linguist. However, what you say really confuses me and contradicts what I've learned. I can only assume that what you say makes sense because of your deeper knowledge. So can you please explain what you mean for the rest of us?

      Thanks.

      --
      EvilCON - Made Famous by /.
    4. Re:Speaking as someone working on NLP by OO7david · · Score: 4, Interesting

      It is in effect two parted:

      Chomsky is to linguistics as Freud to psych. He had great ideas for the time (many still stand), and the science would be nowhere close to where it is without him. However, A) he's backed off a lot from supporting his own theories and B) he's published papers contradicting his original ideas, so there is some question as to their veracity. Since so many linguistics undergrads hold him as the pinnacle of syntax none are really deviating drastically from him.

      WRT the unscientificness, to make his view fit English, there has to be "do-support" which basically is that when forming an interrogative "do" just comes in to make things work without any explanation. In other words, it is in our grammar, but our view of syntax does not account for it.

    5. Re:Speaking as someone working on NLP by lawpoop · · Score: 2, Informative
      IIRC, the part of Chomsky's theory that is relevant to this application is that universal grammar is a series of choices about grammar -- i.e. adjectives either come before or after nouns, there are or are not postpositions, etc. I think the actual 'choices' are more obscure, but I'm trying to make this understandable ;)

      According to the theory, children come with this universal grammar built-in to their mind (for some reason, Chomsky seems against genetic arguments, but good luck understanding his reasoning), and they only need to hear just a little bit of language in order to choose the proper alternatives in the mind, and start building grammatically correct sentences -- the rest is just building vocabulary. What seems like a child learning language is actually the language part of the brain growing during development. I believe that these choices are called 'switches' by Chomsky.

      (An easy argument for universal grammar is that children make mistakes that are more rule-following than the accepted grammar -- words such as 'breaked', 'speaked', 'foots' or 'mouses' are in a sense rule-based corrections of exceptions in the spoken language. So the children follow the rules more closely than the adults -- they certainly didn't learn them from adults, so they must be applying the rules in their minds.)

      Anywho, to make a program like this, you would just have to put together the switches of universal grammar and then feed in sample data -- probably text spelled with those linguistic homophonic characters, instead of the horrendous English spelling non-system. Chinese characters might be a better sample data in that respect; I don't know.

      Note that, contrary to other posters, this would not be a system for building grammatical rules for any 'language' or formal system, such as C++ or xml. It is based on universal grammar, which is a set of options for constructing a human language. So there are built in assumptions that there will be subjects, verbs, objects, indirect objects, etc. in the language that this program is decoding.

      --
      Computers are useless. They can only give you answers.
      -- Pablo Picasso
    6. Re:Speaking as someone working on NLP by PurpleBob · · Score: 4, Interesting

      You're right about Chomsky holding back linguistics. (There are all kinds of counterarguments against his Universal Grammar, but people defend it because Chomsky Is Always Right, and Chomsky himself defends it with vitriolic, circular arguments that sound alarmingly like he believes in intelligent design.)

      And I agree that this algorithm doesn't seem that it would be entirely successful in learning grammar. But this is not because it's statistical. I don't understand how you can look at something as complicated as the human brain and say "statistics does not come in at all".

      If this algorithm worked, then it could be statistical, symbolic, Chomskyan, or magic voodoo and I wouldn't care. There's no reason that computers have to do things the same way the brain does, and I doubt they'll have enough computational power to do so for a long time anyway.

      No, the flaws in this algorithm are that it is greedy (so a grammar rule it discovers can never be falsified by new evidence), and it seems not to discover recursive rules, which are a critical part of grammar. Perhaps it's learning a better approximation to a grammar than we've seen before, but it's not really doing the amazing, adaptive, recursive thing we call language.

      --
      Win dain a lotica, en vai tu ri silota
  9. Wow! by the_skywise · · Score: 4, Funny

    They've rediscovered the Eliza program!

    Input: "For example, the sentences I would like to book a first-class flight to Chicago, I want to book a first-class flight to Boston and Book a first-class flight for me, please may give rise to the pattern book a first-class flight -- if this candidate pattern passes the novel statistical significance test that is the core of the algorithm."

    How does it feel to "book a first-class flight"?
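The quoted example can be reproduced with a very crude stand-in: find the longest run of words shared by all three sentences. This sketch only produces a *candidate* pattern. The statistical significance test that the quote calls the core of the algorithm is not implemented here, and `longest_shared_phrase` is a hypothetical helper name.

```python
def longest_shared_phrase(sentences):
    """Return the longest contiguous word sequence present in every sentence."""
    # Lowercase and drop commas so "Book" matches "book" and "me," matches "me".
    tokenized = [s.lower().replace(",", "").split() for s in sentences]
    first = tokenized[0]

    def contains(words, phrase):
        n = len(phrase)
        return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

    # Try the longest spans of the first sentence first, then shorter ones.
    for length in range(len(first), 0, -1):
        for start in range(len(first) - length + 1):
            phrase = first[start:start + length]
            if all(contains(words, phrase) for words in tokenized[1:]):
                return " ".join(phrase)
    return ""

sentences = [
    "I would like to book a first-class flight to Chicago",
    "I want to book a first-class flight to Boston",
    "Book a first-class flight for me, please",
]
candidate = longest_shared_phrase(sentences)
print(candidate)
```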

  10. Grammar depends on the input by Tsaac · · Score: 3, Interesting

    If fed with a heap of decent grammar, what happens when it's fed with bad grammar and spelling? Will it learn, and incorporate, the tripe or reject it? That's the sort of problem with natural language apps, it's quite hard to sort the good from the bad when it's learning. Take the megahal library http://megahal.alioth.debian.org/ for example. Although possibly not as complex, it does a decent job at learning, but when fed with rubbish it will output rubbish. I don't think it's the learning that will be the hard part, but rather the recognition of the good vs. the bad that will prove how good the system is.

    --
    eXemplary Abstract
    1. Re:Grammar depends on the input by jim_v2000 · · Score: 2, Interesting

      The problem with this program is that you could input the most grammatically correct sentences you can into it, and it'll still spew out senseless garbage. For this to be of any worth, the computer will need to understand the meaning of each word, and how each meaning relates to what the other words in the sentence mean. And you can't program it into a computer what something is just by putting words into it. Like if I tell the machine that mice squeak, it has to know what a squeak sounds like and what a mouse is. How do you define a mouse to a computer? A small fuzzy rodent. Well, how do you define fuzzy? Or small? Or a rodent? You have to keep using more and more words...and still the computer will have no idea what you're talking about, other than just more word relationships.

      I guess the missing thing is that a human can envision the meaning of the words as a concept or image, while the computer simply sees the words as, well, just words (or binary, to be specific).

      --
      Don't take life so seriously. No one makes it out alive.
    2. Re:Grammar depends on the input by proteonic · · Score: 2, Informative

      If you take young children and expose them to rubbish for four or five years while they're learning to speak, they'll speak rubbish too. That's the problem with young children, they can't sort the good from the bad.

      But if you expose them to well structured language, they'll learn to speak it, without being EXPLICITLY TAUGHT THE RULES. Which is exactly what this paper is about. Unsupervised natural language learning. That's what makes the system good. It's able to build equivalency classes of verbs, nouns, adjectives, etc, with relatively few examples. The paper gives an example of the algorithm using 8 sentences to train and be able to produce over 500 new, sensible sentences. Even a 4th order Markov chain can't do that (Megahal). The algorithm is really quite impressive.

      Your comment begs the question.. why would you train a system on garbage? Finding good quality written language is a non issue. Train it on good data and it'll probably do as well as a Markov model for distinguishing good vs bad language.

  11. Finally some progress by drjimmy42 · · Score: 2, Funny

    I know we all feel like we've been screwed by the conspicuous lack of flying cars around these days, but at least some progress is being made on the Universal Translator front...

    --
    If you're not part of the solution, you are part of the precipitate
    1. Re:Finally some progress by TastyWheat · · Score: 2, Insightful

      I'm starting to get the feeling that there's nothing in sci fi that won't occur in reality. Except for the dorky guy getting to nail the hot busty alien babe that is. heh.

  12. Re:Finally! by HTTP+Error+403+403.9 · · Score: 2, Funny

    Using this software, I can finally win the 'Summarize Proust Competition'!

    --
    I'm not a Troll, it's reverse psychology.
  13. Hieroglyphics? by Hamster+Of+Death · · Score: 2, Interesting

    Can it decipher these things too?

  14. Re:Isn't This the Universal Translator Idea by biryokumaru · · Score: 2, Interesting
    In Star Trek 4, the universal translator was little help when the humpback whale armada arrived... No, seriously, that was one f**ked up movie.

    But for this, I have one word: Dolphins.

    --
    When you're afraid to download music illegally in your own home, then the terrorists have won!
  15. Dupe by fsterman · · Score: 2, Informative

    We just had an article on this. There was a shootout by NIST. At least I think, /. search engine blows, hard. Either way, here's a link to the tests. This is one that wasn't covered by the tests, so I guess it's front page news.

    --
    Is there anything better than clicking through Microsoft ads on Slashdot?
  16. Re:Let's see what it thinks of this by One+Div+Zero · · Score: 2, Insightful

    That's just a Markov Model that "learned" from what looks like religious mumbo jumbo in the first place.

    Markov models are perhaps the easiest language acquisition model to implement, but also one of the worst at coming up with valid speech or text.

    Interestingly, they do much, much better as recommender systems.

  17. Markov Chains anyone? by ImaLamer · · Score: 5, Informative

    http://en.wikipedia.org/wiki/Markov_chain

    Used this (easy to compile) C program:

    http://www.eblong.com/zarf/markov/

    to create these:

    http://www.mintruth.com/mirror/texts/

    Mod points to whomever can tell us what texts they use. (No mod points can actually be given)
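For anyone who does not want to compile the C program, a word-level Markov chain text generator of the general kind linked above can be sketched as follows. This is not the linked program: the corpus is invented, and `build_chain` and `generate_text` are hypothetical names.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate_text(chain, seed, length, rng):
    """Extend `seed` by up to `length` words, sampling from observed followers."""
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break  # dead end: this context never had a follower in the corpus
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the fish on the mat"
chain = build_chain(corpus, order=2)
text = generate_text(chain, ("the", "cat"), 10, random.Random(1))
print(text)
```

Every three-word window in the output already occurs somewhere in the corpus, which is why the results look locally plausible and globally meaningless.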

  18. Full article for non-PNAS subscribers by dmaduram · · Score: 4, Informative

    Unsupervised learning of natural languages

    Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman
    School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853

    We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

    Many types of sequential symbolic data possess structure that is (i) hierarchical and (ii) context-sensitive. Natural-language text and transcribed speech are prime examples of such data: a corpus of language consists of sentences defined over a finite lexicon of symbols such as words. Linguists traditionally analyze the sentences into recursively structured phrasal constituents (1); at the same time, a distributional analysis of partially aligned sentential contexts (2) reveals in the lexicon clusters that are said to correspond to various syntactic categories (such as nouns or verbs). Such structure, however, is not limited to the natural languages; recurring motifs are found, on a level of description that is common to all life on earth, in the base sequences of DNA that constitute the genome. We introduce an unsupervised algorithm that discovers hierarchical structure in any sequence data, on the basis of the minimal assumption that the corpus at hand contains partially overlapping strings at multiple levels of organization. In the linguistic domain, our algorithm has been successfully tested both on artificial-grammar output and on natural-language corpora such as ATIS (3), CHILDES (4), and the Bible (5). In bioinformatics, the algorithm has been shown to extract from protein sequences syntactic structures that are highly correlated with the functional properties of these proteins.
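
    To make the "distributional analysis of partially aligned sentential contexts" idea concrete, here is a toy sketch. It is not the ADIOS algorithm, only an illustration under simple assumptions: words that fill the same (left neighbour, right neighbour) slot are grouped together. The function name context_classes and the three-sentence corpus are hypothetical.

```python
from collections import defaultdict

def context_classes(sentences):
    """Group words that occur in an identical (left word, right word) slot."""
    slots = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for left, word, right in zip(words, words[1:], words[2:]):
            slots[(left, right)].add(word)
    # Only slots filled by more than one word hint at a shared category.
    return {slot: members for slot, members in slots.items() if len(members) > 1}

# Toy corpus (an assumption, purely for illustration).
corpus = [
    "the cat sees the dog",
    "the cat hears the dog",
    "the bird sees the dog",
]
for slot, members in context_classes(corpus).items():
    print(slot, sorted(members))
```

    On this corpus the slot between "the" and "sees" collects cat and bird, and the slot between "cat" and "the" collects sees and hears, which loosely resembles a noun cluster and a verb cluster.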

    The ADIOS Algorithm for Grammar-Like Rule Induction

    In a machine learning paradigm for grammar induction, a teacher produces a sequence of strings generated by a grammar G0, and a learner uses the resulting corpus to construct a grammar G, aiming to approximate G0 in some sense (6). Recent evidence suggests that natural language acquisition involves both statistical computation (e.g., in speech segmentation) and rule-like algebraic processes (e.g., in structured generalization) (7-11). Modern computational approaches to grammar induction integrate statistical and rule-based methods (12, 13). Statistical information that can be learned along with the rules may be Markov (14) or variable-order Markov (15) structure for finite state (16) grammars, in which case the EM algorithm can be used to maximize the likelihood of the observed data. Likewise, stochastic annotation for context-free grammars (CFGs) can be learned by using methods such as the Inside-Outside algorithm (14, 17).

    We have developed a method that, like some of those just mentioned, combines statistics and rules: our algorithm, ADIOS (for automatic distillation of structure) uses statistical information present in raw sequential data to identify significant segments and to distill rule-like regularities that support structured generalization. Unlike

  19. Programming Language by jmlsteele · · Score: 2, Interesting

    How long until we see something like this applied to ?

  20. No they didn't by Ogemaniac · · Score: 5, Interesting

    I played around with the Google translator for a while. I work in Japan and am half-way fluent. Google couldn't even turn my most basic Japanese emails into comprehensible English. Same is true for the other translation programs I have seen.

    I will believe this new program when I see it.

    Translation, especially from extremely different languages, is absurdly difficult. For example, I was out with a Japanese woman the other night, and she said "aitakatta". Literally translated, this means "wanted to meet". Translated into native English, it means "I really wanted to see you tonight". It is going to take one hell of a computer program to figure that out from statistical BS. I barely could with my enormous meat-computer and a whole lot of knowledge of the language.

    1. Re:No they didn't by lawpoop · · Score: 2, Interesting
      The example you are using is from conversation, which contains a lot of mutually shared assumptions and information. Take this example from Steven Pinker:

      "I'm leaving you."

      "Who is she?"

      However, in written text, where the author can assume that the reader brings no shared assumptions, nor can the author rely on any feedback, 'speakers' usually do a good job of including all necessary information in one way or another -- especially in texts meant to convince or promote a particular viewpoint. I'll bet these kinds of texts are more easily translatable than conversation.

      --
      Computers are useless. They can only give you answers.
      -- Pablo Picasso
    2. Re:No they didn't by superpulpsicle · · Score: 4, Informative

      Try this free website out. http://www.freetranslation.com/

      I know it is fairly accurate because I have fooled my Spanish-speaking friends once in an IM conversation. I told them I learned Spanish via hypnosis and basically just copy/pasted everything Spanish into IM. The conversation went on for like 15 minutes in full Spanish before I told them I was using the website. They were pissing their pants.

    3. Re:No they didn't by a.different.perspect · · Score: 2, Interesting

      Or was it "chinko wo nametakatta"? It's just as easy for me to believe, you hot Slashdot nerd, you.
       
      Being more serious, how do you think humans learn the rudiments of language? It's pattern analysis, i.e. precisely the technique this algorithm tries to replicate. It is true that the algorithm won't then progress onto the next stage, which is using that rudimentary grasp of the language to be taught its finer points, but if you genuinely doubt the capacity of this method to produce an understanding of language you are contesting the experiences of every human on the planet.
       
      Returning to your example, "I really wanted to see you tonight" is what you discerned that sentence meant from its context. You can hardly expect a machine translator to know that it was a woman you were out with at night who said it (which seems to be the basis for your insertion of "tonight", "really" and "you"); fortunately, this algorithm is intended to translate written, not spoken, language. Since writing would have to include that detail (in order to be independent of its context), the problem you identified is not even relevant.

    4. Re:No they didn't by burns210 · · Score: 3, Interesting

      There was a program that tried to use the language of Esperanto (a made-up language designed specifically to be very consistent and guessable with regards to how syntax and words are used, very easy to learn and understand quickly) to be a middleman for translation.

      The idea being that you take any input language, Japanese for instance, and get a working Japanese-to-Esperanto translator. Being as Esperanto is so consistent and reliable in how it is designed, it should be easier to do than a straight Japanese-to-English translator.

      To finish, you write an Esperanto-to-English translator. By leveraging the consistent language of Esperanto, researchers thought they could write a true universal translator of sorts.

      Don't know what ever came of it, but it was an interesting idea.

    5. Re:No they didn't by krunk4ever · · Score: 2, Interesting

      Being more serious, how do you think humans learn the rudiments of language? It's pattern analysis, i.e. precisely the technique this algorithm tries to replicate. It is true that the algorithm won't then progress onto the next stage, which is using that rudimentary grasp of the language to be taught its finer points, but if you genuinely doubt the capacity of this method to produce an understanding of language you are contesting the experiences of every human on the planet.

      There's one flaw in your analysis: humans learn language/grammar faster when they're young, and it becomes a lot harder when they get older. There are many different speculations on why that happens, from children starting from a clean slate to children learning languages better as their brains develop. I mean, pattern analysis would definitely be an advantage for grown-ups, no? Why is children's pattern analysis better in this case if what you're saying is true?

      From what I've seen, to actually learn grammar and a foreign language, there's 2 requirements. One is you must have a passion for it. 2nd is that you must be constantly practicing. I've noticed if you attend classes but never use it in your real life, you'll never learn it. Find a group of people who are also learning and try communicating only with that language and you'll see how much faster you'll pick up. It also helps to have a friend who's fluent in the language to correct you (though it might not be that good for your pride). What I've noticed is that grammar nazis are the best for learning a new grammar. They pick on EVERY SINGLE MISTAKE YOU MAKE, so you'd think twice before making the same mistake again.

      At college, I've actually seen flyers asking for help in English and in return they'll help you with the language they're fluent in, be it French, German, Chinese, Japanese, etc. So those people would meet maybe 3x a week and spend an hour in each language each time, which I thought was a really neat idea. Here you're helping a foreigner with English and there they are helping you with a foreign language you want to learn.

    6. Re:No they didn't by Greedo · · Score: 2, Insightful

      You over-estimate some speakers, me-thinks.

      --
      Tuus crepidae innexilis sunt.
  21. Finally by Trigulus · · Score: 2, Interesting

    something that can make sense of the Voynich manuscript http://www.voynich.nu/. They should have tested their system on it.

    --
    If something exists that does not need a creator (god) then why must the cosmos need one?
    1. Re:Finally by HishamMuhammad · · Score: 2, Insightful

      Just because the program can extract grammar, it doesn't mean it can extract meaning. If I give you this sentence:

      Ov brug termat akti mak lejna trovterna.

      And tell you that "termat" and "lejna" are nouns, "akti mak" is a 'composite' verb, "brug" and "trovterna" are adjectives... it still doesn't say anything about the actual meaning.

  22. Universal Translator? by mwilli · · Score: 2, Interesting
    Could this be integrated into a handheld device to be used as a universal translator much like a hearing aid?

    Electronic babelfish anyone?

    --
    My sig beat up your sig.
  23. Run it on the bible and get... by Trigulus · · Score: 2, Funny

    God loves you. God will burn you in hell for all eternity. God wants more foreskins.

    --
    If something exists that does not need a creator (god) then why must the cosmos need one?
  24. How "intricate"? by P0ldy · · Score: 2, Insightful
    Our experiments show that it can acquire intricate structures from raw data, including transcripts of parents' speech directed at 2- or 3-year-olds. This may eventually help researchers understand how children, who learn language in a similar item-by-item fashion and with very little supervision, eventually master the full complexities of their native tongue."

    In addition to child-directed language, the algorithm has been tested on the full text of the Bible in several languages
    I hardly would consider transcripts of parents' speech directed at 2- or 3-year-olds "intricate". And while the algorithm may have "been tested on the full text of the Bible", it doesn't say with what percentage of accuracy or what translation. King James version or the Teen Magazine Bible?

    And the "rules" of a language are NOT what children "learn". First of all, children acquire a language, they do not "learn" it. That is a large attribute to the child's ability to speak it--not whether or not they understand gerunds and the pluperfect.

    Second, in a language such as English whose words for the most part lack any necessity to the order in which they're placed to understand their meaning and, even worse, lack declension forms to distinguish subject from object of the preposition, what success can a language recognition program have "learning" such a language when prepositions themselves mainly can be omitted? To teach a computer Latin is easy.

    Third, what's the hope of the computer ever understanding something like Shakespeare, Joyce, or Dante, whose uses of language rely extensively on erudition for word placement as opposed to typical usage? While a computer might be able to learn Latin because of its rigorous rules, I doubt it could faithfully render a text from Ovid.
  25. Better link for PDF by Anonymous Coward · · Score: 2, Informative

    PNAS wants you to subscribe to download the PDF.

    Or you could just go to the authors' page and download it for free: http://www.cs.tau.ac.il/~ruppin/pnas_adios.pdf

  26. This is not new for protein sequence functionality by t35t0r · · Score: 2, Informative

    In analyzing proteins, for example, the algorithm was able to extract from amino acid sequences patterns that were highly correlated with the functional properties of the proteins.

    NCBI BlastP already does this for proteins. Similarities and rules for things can be found but if the meaning of the sequence is not known then what good is it? In the end you need to do experiments involving biology/biochemistry/structural biology to determine the function of a protein or nucleotide sequence. Furthermore in language as well as in biology/chemistry things which have similar vocabulary (chemical formula) may in the end be structurally very different (enantiomers), which leads to vastly different functionality.

  27. Dolphins? by Stripsurge · · Score: 2, Interesting

    Seems like that'd be a good place to test the system out. While talking with extraterrestrials would be pretty awesome, having a chat with a dolphin would be pretty cool too. Remember: "The second most intelligent [species] were of course dolphins"

  28. I'll be impressed when it can by 2Bits · · Score: 3, Funny

    - translate some posts on /. into comprehensible contents
    - figure out it is a dupe and kill it before it even appears
    - RTFA for me and just give me a good summary (by the rate of articles posted here, there's probably not much to summarize either)
    - translate "IANAL" into something else that does not make me think of ANAL thing
    - figure that articles on Google and Apple are just speculations by some dude living in his (can't be her, for sure) parent's basement, and not really news worth posting
    - translate my suggestions into something acceptable to the (kernel) hackers that good hygiene is a good thing
    - understand that I'm just ranting, and it should not take it personal.

  29. Give it a real challenge by pugugly · · Score: 3, Interesting

    Feed it the entries in the "obfuscated C" competition - if it works for that, it oughta work for anything.

    Pug

    --
    An Invisible Entity of Vast Power whose existence must be taken on faith alone: Liberal Media
  30. Finally! by Druox · · Score: 2, Funny

    Finally! Engrish for the masses!

    --
    ~ slashdot.org - Where some of the world's greatest minds come together to scrutinize grammar.
  31. I don't think we disagree much by Ogemaniac · · Score: 2, Insightful

    Yes, pattern recognition is a major part of the process. However, there are other fundamental parts that are also extremely important, and lacking them you get nonsense. In particular, context matters. "aitakatta" in the middle of a business letter probably does mean "wanted to meet". By itself, said by one member of a couple to the other over drinks at a bar, it does not.

    In order for a program to translate accurately, it needs to know who is speaking/writing, who is the audience, what their relationship is, and their location. Some of this may be given to the computer explicitly, or easily found in the text/speech (for a human at least) but some of it may not. This is not going to be an easy problem to solve.

    Writing is never free from its context. I know before I even start whether I am reading a fiction novel, a satire, a scientific journal, an email from my boss, or a text message from my date this Saturday. The meaning of the words can change a lot in those cases.

    Even Google translator, which was trained on multi-lingual UN reports, could not produce comprehensible English from simple Japanese business emails.

    As for my chinko, that's a long story.

  32. Spam filter? by goMac2500 · · Score: 2, Interesting

    Could this be used to make a smarter spam filter?

  33. It's actually a new language study by Sycraft-fu · · Score: 3, Insightful

    Called Pragmatics. It can be somewhat oversimplified as saying it's the study of how context affects meaning or as figuring out what we really mean, as opposed to what we say.

    For example, a classical Pragmatics scenario:

    John is interested in a co-worker, Anna, but is shy and doesn't want to ask her out if she's taken. He asks his friend Dave if he knows if Anna is available, to which Dave replies "Anna has two kids."

    Now, taken literally, Dave did not answer John's question. What he literally said is that Anna has at least two children, and presumably exactly two children. That says nothing of her availability for dating. However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.

    So that's a major problem computers hit when trying to really understand natural language. You can write a set of rules that completely describes all the syntax and grammar. However that doesn't do it, that doesn't get you to meaning, because meaning occurs at a higher level than that. Even when we are speaking literally and directly, there's still a whole lot of context that comes into play. Since we are quite often at least speaking partially indirectly, it gets to be a real mess.

    Your example is a great one of just how bad it gets between languages. The literal meaning in Japanese was not the same as the intended meaning. So first you need to decode that, however even if you know that, a literal translation of the intended meaning may not come out right in another language. To really translate well you need to be able to decode the intended meaning of a literal phrase, translate that into an appropriate meaning in the other language, and then encode that in a phrase that conveys that intended meaning accurately, and in the appropriate way.

    It's a bitch, and not something computers are even near capable of.

    1. Re:It's actually a new language study by NichG · · Score: 2, Insightful

      I'd say this is the first step to it though. Let's forget about natural language for a second and look at computer algebra systems, proof generators, etc. How is the inference that you talk about any different than a computerized proof system proving something based on bits of information it has stored away? I think it's pretty similar really, except for the part about knowing what thing you want to prove/confirm.

      So how does that sort of thing work? Well, in mathematics you can have something like y=f(x) and substitute f(x) whenever you see y or vice versa. You also know various other rules that are of the same form, e.g. a(b+c) = ab+ac. Then, you can brute-force trying different combinations (or be smart about it and modularize some set of translations to create a new compound rule which is true, e.g. a lemma).

      It may not be so easy in languages, but there are transformations you can apply to sentences. For instance, you can do some rearrangements like:

      A is under B -> Under B, there is A.

      And there are ways that these relations (spatial relations especially) distribute:

      A is in B, B is under C -> A is under C.

      So to understand 'Anna has two kids' you have to know: 1. That you want to evaluate the truth/falseness of 'is Anna available to go out' and 2. Various pieces of social information about 'going out', people who are married, people who have kids, etc.

      If you have 2 you should be able to use a method in the same vein as a computer algebra system to determine how what was just said applies to your question.
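
      A rough sketch of what I mean, using the spatial rules above. This is naive forward chaining over (relation, x, y) triples; the function names and the key/box/table facts are made up for illustration, and a real system would need far more rules plus a way to pick the goal.

```python
def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Each rule returns a finished list, so mutating `known` here is safe.
            for new_fact in rule(known):
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known

def in_then_under(known):
    """A is in B, B is under C -> A is under C."""
    return [
        ("under", a, c)
        for (r1, a, b) in known if r1 == "in"
        for (r2, b2, c) in known if r2 == "under" and b2 == b
    ]

def under_transitive(known):
    """A is under B, B is under C -> A is under C."""
    return [
        ("under", a, c)
        for (r1, a, b) in known if r1 == "under"
        for (r2, b2, c) in known if r2 == "under" and b2 == b
    ]

# Hypothetical facts, purely for illustration.
facts = {
    ("in", "key", "box"),
    ("under", "box", "table"),
    ("under", "table", "roof"),
}
derived = forward_chain(facts, [in_then_under, under_transitive])
print(("under", "key", "roof") in derived)
```

      The 'Anna has two kids' case would work the same way in principle, except that the rules would encode social knowledge instead of spatial relations, and that knowledge is the hard part to write down.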

    2. Re:It's actually a new language study by ANeufeld · · Score: 2, Insightful
      However, there's nobody who reads that scenario who doesn't get what Dave actually meant to communicate: That Anna is married, with children.

      Funny, I read that answer as a "yes, she's available," but added additional information: don't ask her out unless you are willing to accept the entire package.

      In a different language, I could still see a literal translation of the question and answer as communicating the same information. The "higher level meaning" is not embedded in the words or language. The exchange, "available?" "kids." does not mean "not available," but is more of a trinary response.

  34. grammar isn't enough by JoeBuck · · Score: 4, Informative
    The classic problem example is:
    • Time flies like an arrow.
    • Fruit flies like a banana.
    There are other, similar examples. Computer systems tend to deduce either that there's a type of insect called "time flies", or that the latter sentence refers to the aerodynamic properties of fruit.
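
    A small sketch shows why syntax alone cannot settle this. Below is a CYK-style parse counter over a toy grammar; the grammar, the lexicon, and the name count_parses are assumptions made up for this example, not anything from the article. Both sentences come out with the same two parses, so nothing in the grammar prefers one reading over the other.

```python
from collections import defaultdict

# Toy lexicon: "time" and "fruit" may stand alone as a noun phrase.
LEXICON = {
    "time": {"N", "NP"}, "fruit": {"N", "NP"},
    "flies": {"N", "V"}, "like": {"V", "P"},
    "an": {"Det"}, "a": {"Det"},
    "arrow": {"N"}, "banana": {"N"},
}
# Binary rules: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "N", "N"),   # compound noun, e.g. "fruit flies"
    ("VP", "V", "NP"),  # "like a banana" as verb + object
    ("VP", "V", "PP"),  # "flies like an arrow" as verb + comparison
    ("PP", "P", "NP"),
]

def count_parses(sentence):
    """Count the distinct parse trees that derive the sentence from S."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[i][j][X] = number of ways symbol X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n]["S"]

print(count_parses("Time flies like an arrow."))
print(count_parses("Fruit flies like a banana."))
```

    Picking the right parse for each sentence takes knowledge about arrows, bananas and insects, which is exactly what the grammar does not contain.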
    1. Re:grammar isn't enough by g2devi · · Score: 3, Interesting

      Even better. The meaning of words can flip back and forth depending on the ever widening context.

      * The clown threw a ball.

      (Probably, a tennis or basket ball)

      * The clown threw a ball,....for charity.

      (Okay, sorry, a ball as in a party.)

      * The clown threw a ball,....for charity...., and hit the target.

      (Okay, sorry again, the tennis ball hit the dunking target and someone fell in the water. Got it. We're in a carnival.)

      * The clown threw a ball,....for charity...., and hit the target....of 1 million dollars.

      (Scratch that. It really is a charity party and we've collected 1 million in donations. There's no way the meaning can change again.)

      * The clown threw a ball,....for charity...., and hit the target....of 1 million dollars....by striking out Babe Ruth.

      (Oops again. The clown got 1 million dollars in pledges if he could strike out Babe Ruth, and he succeeded. We're talking about a base ball again. I give up.)

  35. O(n^n^n...)????? by mosel-saar-ruwer · · Score: 3, Interesting

    From TFA: The algorithm discovers the patterns by repeatedly aligning sentences and looking for overlapping parts.

    If you take just a single string [of length n] and rotate it against itself in a search for matches, then you've got to do n^2 byte comparisons just to find all singleton matches, and then gosh only knows how many comparisons thereafter to find all contiguous stretches of matches.

    But if you were to take some set of embedded strings, and rotate them against a second set of global strings [where, in a worst case scenario, the set of embedded strings would consist of the set of all substrings of the set of global strings], then you would need to perform a staggeringly large [for all intents and purposes, infinite] number of byte comparisons.

    What did they do to shorten the total number of comparisons? [I've got some ideas of my own in that regard, but I'm curious as to their approach.]

    PS: Many languages are read backwards, and I assume they re-oriented those languages before feeding them to the algorithm [it would be damned impressive if the algorithm could learn the forwards grammar by reading backwards].

    1. Re:O(n^n^n...)????? by psmears · · Score: 3, Insightful

      If you take just a single string [of length n] and rotate it against itself in a search for matches, then you've got to do n^2 byte comparisons just to find all singleton matches,...

      No you don't :-)

      If you want to find all singleton matches, it's enough to sort the string into ascending order (order n.log(n)), and then scan through for adjacent matches (order n). For example, sorting "the cat sat on the mat" gives "cat mat on sat the the"—where the two "the"s are now adjacent and so easily discovered.

      For finding longer matches the sorting method still works, except that you sort fragments of the sentence rather than individual words. Clearly there is more work involved, but (depending on exactly what you're counting) there are still order n.log(n) comparisons to be performed.

      This means that searching for substring matches can be performed relatively efficiently. I don't know about how the language-learning algorithm works, but you may be interested to know that the compression algorithm used by "bzip2" works in exactly this way (google for "Burrows-Wheeler transform" for more details!)
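
      A quick sketch of the sorting trick, in case it helps. The function names are made up, and this says nothing about how the language-learning algorithm itself is implemented. Note that for fragments, each comparison looks at up to `length` words, so the cost depends on what you count.

```python
def repeated_words(text):
    """Sort the words, then scan neighbours for equal pairs."""
    words = sorted(text.split())
    return sorted({a for a, b in zip(words, words[1:]) if a == b})

def repeated_runs(text, length):
    """Same trick on fragments: sort every run of `length` consecutive words."""
    words = text.split()
    runs = sorted(
        tuple(words[i:i + length]) for i in range(len(words) - length + 1)
    )
    return sorted({a for a, b in zip(runs, runs[1:]) if a == b})

print(repeated_words("the cat sat on the mat"))         # ['the']
print(repeated_runs("the cat sat and the cat ran", 2))  # [('the', 'cat')]
```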

    2. Re:O(n^n^n...)????? by volsung · · Score: 2, Interesting

      Right-to-left languages (which I assume you mean as "backwards") are displayed that way to the user, but it does not affect their digital storage, which is still forwards (in the numerical offset sense).
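
      You can check this with Python's standard unicodedata module. The Hebrew word below is just an example I picked: index 0 holds the first letter a reader would read, and only the rendering layer draws it on the right.

```python
import unicodedata

# Hebrew "shalom", stored in logical (reading) order.
word = "\u05e9\u05dc\u05d5\u05dd"
for index, ch in enumerate(word):
    # Bidirectional class "R" marks a strong right-to-left character.
    print(index, unicodedata.name(ch), unicodedata.bidirectional(ch))
```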

  36. Re:Ah, you don't know Chomsky. by lupin_sansei · · Score: 2, Insightful

    Yeah and this didn't learn the language in any meaningful sense. It just found a statistical pattern, and then generates possible sentences from that pattern. That's a whole lot different to you and I understanding the language and generating intentional, meaningful sentences.

  37. English only has two tenses. by ericbg05 · · Score: 5, Informative
    I've done translation work before (Slovak -> English), and there's much more going on than differences in words and grammar. There are whole conceptual frameworks in languages that just don't translate, and this is frustrating for anyone learning a language, let alone trying to translate.

    Yes! I'd have thrown a mod point at you just for this paragraph if I could.

    English is very precise (when used as directed) in matters of time and sequence -- we have more than 20 verb tenses where most languages get away with three.

    Not really. Firstly, English only has two or three tenses. (Depending upon which linguist you ask, English either has a past/non-past distinction or past/present/future distinctions. See [1], [2]. The general consensus seems to be in favor of the former, although I humbly disagree with the general consensus.) It maintains a variety of aspect distinctions (perfective vs imperfective, habitual vs continuous, nonprogressive vs progressive). See [3]. Its verbs also interact with modality, albeit slightly less strongly.

    It's a very common mistake to count the combinations of tense, aspect, and modality in a language and arrive at some astronomical number of "tenses". It's an even more common mistake (for native English speakers, anyway) to think that English is special or different or strange compared to other languages. In most cases, it's not -- especially when compared with other Indo-European languages.

    Secondly, and more interestingly IMHO, most languages do not have three distinct tenses. The most common cases are either to have a future/non-future distinction or a past/non-past distinction. In any case, the future tense, if it exists, is normally derived from modal or aspectual markers and is diachronically weak (which is linguist-babble meaning "future tense forms don't stick around for very long"). See [3].

    English is a perfect example: will, of course, used to refer to the agent's desire (his or her will) to do something. Only recently has it shifted to have a more temporal sense, and it still maintains some of its modal flavor. In fact, the least marked way of making the future (in the US, at least) is to use either gonna or a present progressive form: I'm having dinner with my boss tonight. I'm gonna ask him for a raise. See Comrie [1] again.

    So as not to be anglo-centric, I'll give another example. Spanish has three widespread means of forming the future tense. Two of these are periphrastic and are exemplified by he de cantar 'I've gotta sing' and voy a cantar 'I'm gonna sing'. The last is the synthetic form, cantaré 'I'll sing'.

    Most high school or college Spanish teachers would tell you that the "pure" future is cantaré. Actually, it's historically derived from the phrase cantar he 'I have to sing' (from Latin cantáre habeo), and is being displaced by the other two forms all across the Spanish-speaking world. I'm told, for example, that cantaré has been largely lost in Argentina and southern Chile (see [4]).

    In any case, the parent's main point still holds. It's a b?tch to deal with cross-linguistic differences in major semantic systems computationally. But good lord, it's fun to try. :)

    References:

    1. Comrie, Bernard. Tense. Cambridge, UK: Cambridge University Press, 1985.
    2. Davidsen-Nielsen, Niels. "Has English a Future?" Acta Linguistica Hafniensia 21 (1987): 5-20.
    3. Frawley, William.
  38. Random test ... by Mostly+a+lurker · · Score: 5, Funny
    I know it is fairly accurate because I have fooled my Spanish-speaking friends once in an IM conversation. I told them I learned Spanish via hypnosis and basically just copy/pasted everything Spanish into IM. The conversation went on for like 15 minutes in full Spanish before I told them I was using the website. They were pissing their pants.
    English to German produces:
    Ich weiß, dass es ziemlich genau ist, weil ich mein Spanisch getäuscht habe, Freunde einmal in einer IM Konversation zu sprechen. Ich habe sie erzählt, dass ich Spanisch über Hypnose und im Grunde nur Kopie gelernt habe/hat eingefügt alles Spanisch in IM. Die Konversation ist weitergegangen für wie 15 Minuten volles Spanisch, bevor ich sie erzählt habe, dass ich die Website benutzte. Sie pissten ihre Hose
    Then, German to English:
    I know that it rather exactly is, because I deceived my Spanish to speak friends once in one IN THE conversation. I told it, learned would have inserted that I Spanish over hypnosis and in the reason only copy all Spanish in IN THAT. The conversation is gone on for Spanish full like 15 minutes before I told it, that I the websites used. You pissten its pair of pants
    My conclusion is that there is still a place for human translators.
    1. Re:Random test ... by dunkelfalke · · Score: 2, Funny

      in my opinion it is not impressive at all (i speak english and german myself). try translate.ru

      english to german

      Ich weiß, dass es ziemlich genau ist, weil ich meine spanischen sprechenden Freunde sobald in einem IM Gespräch zum Narren gehalten habe. Ich sagte ihnen, dass ich Spanisch über Hypnose lernte und grundsätzlich gerade alles Spanisch in IM kopieren/aufkleben. Das Gespräch ging seit ähnlichen 15 Minuten volles Spanisch weiter, bevor ich ihnen sagte, dass ich die Website verwendete. Sie waren pissing ihre Hosen.

      the result back to english

      I know that it is quite precise because I have held my Spanish speaking friends as soon as in one in the conversation to the fool. I said to them that I learned Spanish about hypnosis and basically just all Spanish in IN copy / stick. The conversation went on since similar 15 minutes of full Spanish, before I said to them that I used the website. They were pissing her trousers.

      --
      Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.
    2. Re:Random test ... by Godwin+O'Hitler · · Score: 3, Interesting

      I AM a professional human translator, and believe me, if a machine translation did even a half decent job of producing intelligible, natural text, I would use it to get a jump start and save a lot of time.

      But as things stand, I'd spend more time knocking the bad translation into shape than if I translated the whole thing from scratch.

      Translators are often asked to copy edit other translators' work (customers tend to call this "proofreading", presumably to devalue it and get it done on the cheap, but it involves much more than hunting typos). That's fair enough if you want a quality check. But some smart-arse people try sending machine translations for copy editing. And you can bet they get sent straight back!

      --
      No, your children are not the special ones. Nor are your pets.
  39. Re:Do-support, in brief by aziraphale · · Score: 2

    > We can say, Earlier you educated me. but not Earlier you teached me. Why?

    We say 'earlier you taught me' instead. What is your point?

    In terms of language evolution, the word 'taught' has the same relationship to 'teach' as 'wrought' has to 'wreak', and similar relationships to 'thought'-'think', 'brought'-'bring' and (less so) 'bought'-'buy'. The preterite form of each of these verbs is actually formed by a very similar linguistic rule to the one that forms 'educated' from 'educate' - the basic rule in Germanic languages being that you stick a dental plosive 't' or 'd' sound on the end of the verb (ignore how the words are spelled, as that's really an irrelevance to the evolution of the words in the first place - we're talking about sounds here). Once this form has been created, however, it can create an awkward sound at the end of the word - 'ct', 'ngd', 'nct', etc. Language users don't like awkward sounds, so they change them, preserving the distinctiveness, but losing some of the closeness to the original word. Also bear in mind that 'ch' was not always the sound at the end of the word 'teach' - it was once a much harder sound.

    Add to this general rule the tendency in Germanic languages for certain verbs ('strong verbs') to change their vowel sound in the past tense (cf: 'run'-'ran', 'sing'-'sang', etc.), and you can see roughly where 'taught' came from. It's not really an 'exception', just a very old word that's had time to be moulded into a more comfortable shape through usage.

    When trying to reduce a living language to a syntax, you miss out on the richness imparted to languages by the conventions that they gather through continual usage. English has simple syntax rules - I can coin a new verb and use it in grammatical sentences without anybody having any doubt about what syntactic role it is playing - look at the rise of 'google' as a verb - nobody had to teach you the words 'googles', 'googled' and 'googling', but you would happily use them. But once words are accepted into the language and used, they move over time, sometimes not in the same direction as their near relatives (as 'teach' and 'taught'). To explain where these words come from you need to look at the syntax rules prevailing at the time the derivative word was coined, and the pressures and modifications the words have been subjected to since. This is exactly what we mean by a 'living language'.
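
    As a rough illustration of the point above (not anything from the article's algorithm), here is a minimal Python sketch: a newly coined verb like 'google' is handled by a few productive rules, while old verbs like 'teach' need a stored form. The function names, the spelling rules and the small IRREGULAR_PAST table are all assumptions made for this example, and the rules are deliberately simplified.

```python
# Hypothetical sketch: productive inflection rules for new verbs, plus a
# lookup table for old verbs whose past tense has drifted over time.
# The spelling rules are simplified (no consonant doubling, no y -> ied).

IRREGULAR_PAST = {
    "teach": "taught",
    "think": "thought",
    "bring": "brought",
    "buy": "bought",
    "run": "ran",
    "sing": "sang",
}


def past_tense(verb):
    """Return a stored irregular form if one exists, else add the dental suffix."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


def third_person(verb):
    """Add -es after a sibilant-like ending, otherwise -s."""
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return verb + "es"
    return verb + "s"


def present_participle(verb):
    """Drop a single final 'e' before adding -ing."""
    if verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + "ing"
    return verb + "ing"


def inflect(verb):
    """Collect the three inflected forms discussed in the comment."""
    return {
        "third_person": third_person(verb),
        "past": past_tense(verb),
        "participle": present_participle(verb),
    }


if __name__ == "__main__":
    print(inflect("google"))
    print(inflect("teach"))
```

    The regular rules need no per-word knowledge, which is why nobody has to be taught 'googled'; the lookup table stands in for the history that a purely syntactic description leaves out.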

  40. How about Dolphinese? by Tatarize · · Score: 2, Insightful

    Klingon has simple grammar.

    How about Dolphinese? Research shows that they seem to be able to scout and transfer information from one individual to his/her pod. If there's some grammar, it would be a pretty good nut to crack.

    --
    It is no longer uncommon to be uncommon.
  41. DNA Analysis by da5idnetlimit.com · · Score: 2, Interesting

    Anyone else thinking about using the tech to learn something about "the grammar of DNA"?

    If they can use it for analysing protein sequences, maybe they can tackle "the grammar of Life" and kickstart the whole bioengineering sector into a new life...

    OTOH, the fundamentalist Christians will probably denounce this as an evil thing...

    --
    It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
  42. two-way street by mbius · · Score: 3, Funny

    It works the other way too:

    "I'm leaving you."

    What?

    "I'm leaving you, Alice."

    I don't understand what you're trying to do.

    "I've met someone."

    What do you mean 'met'?

    "Look...just read the pamphlet."

    I don't have the pamphlet.

    "I have to go."

    Which way do you want to go?

    "Uh...west."

    You would need a machete to head further west.



    I can't tell you how many of my break-ups have ended with needing a machete.

    --
    you can have my violent video games when you pry them from my cold, dead hands.
    Prime UID Club
  43. Re:New algorithm understands slashdot comments! by TapeCutter · · Score: 2, Funny

    In the natural language processing business they call this "the same level of understanding as a two-year-old child".

    Can you teach it to take a breath?

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.