Slashdot Mirror


Translation Software That Learns by Reading

redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article "The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.""

67 of 308 comments

  1. High school Spanish by KaSkA101 · · Score: 3, Funny

    Why didn't I have this software during High School Spanish?

    1. Re:High school Spanish by xtrvd · · Score: 4, Informative

      Fortunately I had the next best thing in High School Spanish. The trick is simply going to the #spain channel on EFnet and talking nice to some people. You'd be amazed at how often my teacher would fail my fellow students because they attempted to use the primitive babelfish.altavista.com to do their work for them; she could easily spot the syntax errors and misspelled English words which were never translated.

      Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.

    2. Re:High school Spanish by Servants · · Score: 2, Interesting

      Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.

      Heh. Then there is nothing that will make you believe, etc., etc.

      Certainly you can't do good translation without understanding syntax (which influences meaning and underlies word order) and context (to disambiguate synonyms and phrases with multiple interpretations). Machines aren't especially good at either one yet; ergo, machine translation will continue to be pretty crappy for the foreseeable future.

      Funny thing is, though, even a crappy translation turns out to be tremendously useful in most practical contexts, and worlds better than none at all; a simple word-for-word translation is typically hard to read but still conveys the proper gist. That's why I don't get excited about automatic translation "advances" these days: there are really two purposes for machine translation. One is figuring out what a piece of speech or text is trying to say, and the current technology is usually good enough for that. The other is making a translation of sufficient quality to save a human translator some work, and I think that won't happen for quite a few years yet. Anything in between adds very little.

      (By the way, everything in natural language processing these days uses corpus learning techniques. Now if an improved technology had been developed manually by bilingual programmers who pulled the design out of their collective hats, then that would be a man-bites-dog story!)

    3. Re:High school Spanish by Temposs · · Score: 3, Insightful

      Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.

      "Better" is an ambiguous term. For what these researchers made the program for, it is better than humans for one reason: speed. Sure, they want the translations to be reliable, but more important is that a computer can do in a few days what would take a human a month, for this application at least. The NSA and the like want to have translations of huge swathes of text, and fast! The sooner they can understand things that are written, the faster they can react to threats. Human translators are very slow and expensive in comparison. For your Spanish HW, the best is a native speaker giving you feedback, because the amount of work is small and the translations will be very accurate.

      --
      Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card
  2. technical texts by Olaserov · · Score: 4, Funny

    I wonder if we could train it to translate a EULA ;)

    --
    * Olaserov is in the process of thinking up a signature.
    1. Re:technical texts by Anonymous Coward · · Score: 2, Funny
      Wouldn't work without having some being translated first. Here's one for the cause:

      1. If we screw up it's not our fault
      2. If you screw up, well you're screwed. ... and you owe us your firstborn.

  3. translate to American please by Anonymous Coward · · Score: 3, Funny

    Can someone translate that article from British English to American English please?

    Thanks.

    1. Re:translate to American please by Grey+Ninja · · Score: 4, Funny

      Here's a couple of suggestions for you:

      r3Ð(0n3 wr173$ "N3w $(13n71$7 1$ r3p0r71n9 7h47 7r4n$£4710n $07w4r3 7h47 Ð3v3£0p$ 4n nÐ3r$74nÐ1n9 0 £4n9493$ b¥ $(4nn1n9 7hr09h 7h0$4nÐ$ 0 pr3v10$£¥ 7r4n$£473Ð Ð0(m3n7$ h4$ b33n r3£34$3Ð b¥ .$. r3$34r(h3r$. 4((0rÐ1n9 70 7h3 4r71(£3 "7h3 7r4n$£473Ð Ð0(m3n7$ $3Ð 70 734(h 7h3 7r4n$£4710n 4£90r17hm$ (4n b3 3£3(7r0n1(, 0n p4p3r, 0r 3v3n 4Ð10 1£3$. 7h3 $¥$73m 1$ n07 0n£¥ 4$73r 7h4n 07h3r m37h0Ð$, b7 4£$0 b3773r $173Ð 70 74(|{£1n9 £3$$ (0mm0n £4n9493$ 4nÐ 7h3 n$4£ v0(4b£4r¥ 0nÐ 1n $p3(14£1$3Ð 0r 73(hn1(4£ 73x7$.""

      And translation #2:

      REDCONE WRIETS NU SCEINTIST IS R3PORTNG TAHT TRANSLATION R TAHT D3V3LOPS AN UNDERSTANDNG OF LANGUAEGS BY SCANNG THROUGH THOUSANDS OF PREVIOUSLY TRANSLAETD DOCUMENTS HAS B3N REL3AESD BY US!!!! OMG R3S3ARCHARS!!1!1!! LOL ACORDNG 2 DA ARTICL3 TEH TRANSLAETD DOCUMENTS US3D 2 T3ACH TEH TRANSLATION ALGORITHMS CAN B 3LECTRONIC ON PAEPR OR 3V3N AUDIO FIELS!!1111 TEH SYSTEM IS NOT ONLY FASTER THAN OTH3R M3THODS BUT ALSO BT3R SUIETD 2 TAKLNG LAS COMON LANGUAEGS AND TEH UNUSUAL VOCABULARY FOUND IN SPACIALIESD OR TECHNICAL TEXTS!1!! WTF

    2. Re:translate to American please by chris_sawtell · · Score: 2, Funny
      No trouble at all. First take the original text, computer translate it into German, and then back into English.

      Now reorder the phrases in every sentence so that the object phrase starts the sentence, change every sentence which contains the word because so that the word because and the words following it start the sentence. Make sure that every infinitive verb has the adverb between the word to and the verb. Change every occurrence of which to that. Find every word more than 3 syllables long and inject several short filler words more or less at random near the long word. Finally change every double consonant before the letter combinations ed and ing to single occurrences. Change every ise to ize, and arse to ass.

      That'll more or less do it.

      Now you know - quoting W. S. Churchill - why we are "Divided by our common language".

    3. Re:translate to American please by aussie_a · · Score: 2, Funny

      I can translate it to Australian:

      That redcone fella did say something about some rag reporting some computer thingymebob that lets me understand what all those japs are saying. The city rag reckons it's real fast.

  4. Yay! by gardyloo · · Score: 3, Funny

    Hope for slashdot. I've always wondered if we only have artificially intelligent editors...

  5. Harry Potter and the Bible by MikeFM · · Score: 4, Interesting

    I remember hearing about this a couple years ago. They were using translations of Harry Potter and the Bible to teach this software to translate. It seems to work well. I wonder what it'd make of different translations of technical documentation. That'd probably be even more interesting than what it'd make out of 'quidditch'.

    This could be great if it were opensourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish for. The fish is nice but not that efficient to add to other programs and its translations aren't usually that great.

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    1. Re:Harry Potter and the Bible by obeythefist · · Score: 5, Funny

      I never read that one. I thought the next book title was going to be "Harry Potter and the Half-Blood Prince".

      Or did JK Rowling suddenly become pious?

      --
      I am government man, come from the government. The government has sent me. -- G.I.R.
  6. Turing test by OneArmedMan · · Score: 3, Insightful

    I wonder if something similar to this could be used for AI, for, say, Turing tests?

  7. Wow! Does a much better job... by bigtallmofo · · Score: 5, Funny

    Teach Software translating on scanning up

    Not hard wares that sticks an comprehension of talks by scanning on thousands of fish translated papers has been vomited by US scientists.

    Many existing translation not hard wares uses palm rules for botching words and phrases. But the new software, snarked by Kevin Knight and Daniel Marcu at the Information Sciences[...]

    Read More...

    --
    I'm a big tall mofo.
  8. Neural Nets and Machine Learning by MyIS · · Score: 2, Interesting

    In one way or another this is similar to training neural nets to recognize images, or spam filters to mark junkmail. Great way to put number-crunching power of computers to direct work.

    --
    http://zero-to-enterprise.blogspot.com/
  9. That's great.... by Frodo+Crockett · · Score: 4, Funny

    ...bu7 (4n 17 unÐ3r$74nÐ £337?

    --
    "The newly born animals are then whisked off for a quick run through a giant baking oven." --heard on Food Network
  10. That sounds like a good approach by FunWithHeadlines · · Score: 3, Insightful
    I wish them luck (cuz they'll need it), but if anything is going to produce translation software that really works it will have to include learning elements of this nature. It's one thing to get dictionary translations. That's been around for decades, with its laughable results. Humans speak in metaphor and simile and slang and contractions and abbreviations of thought all the time. We're the cat's meow of language (try that, computer!).

    But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.

    1. Re:That sounds like a good approach by maxwell+demon · · Score: 2, Funny

      Bah, one language to other and back is too little. You have to do the complete thing, of course. As you can do here. I've done it to your text for you (of course including the east asian languages):

      If it is possible and cuz of the translation of the software of the
      wealth (until the necessity to the danger) this person whom it causes,
      this member of the quality of the well-educated way and, in me who I
      consult that it examines it, of its type of the search of the thing
      the truth that the lheo requests to necessary desire of the excess
      near this person, I include. That the translation of the dictionary,
      of that the extensions he is situation. The consideration is relative
      is possible he and this result, with the smile, much hour actively in
      the duration. The person and the comparison and the slang and the
      contraction and the Synopse and the metaphor of the idea, of that
      always say it. Our cat (until a machine of the language of the
      measured value! ) it is meow. It is this exactitude of him, but and
      affinchè that reads gives and to the computers and this, truth within
      the fines of the dictionary, that is to say, is used, flagstone of the
      halting of the language the materials with the artificial enemy... It
      was distinguished has the payment in advance. If the language, that
      one that disowned, uses tractions the cat and with the autumn of the
      rain of the dog, this it them chronometers, he will have a basic
      beginning of the data of the use of the percentage of fines of history
      of him. He was presented/displayed in the ascent, he is, but of
      Cheerio and or of the calculation to the interior of the good way he
      examined this to the interior or ' mornin'.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:That sounds like a good approach by Vintermann · · Score: 2, Funny

      Yeah, it's a really cool resource. If you press the button a couple of times, it seems most of them converge, presumably onto something that is an especially reasonable statement in the mind of the computer.

      "The moon's a harsh mistress" converges quickly to

      "With the love seriously the moon"

      Whereas the text on top of the search

      "Sometimes it's fast, sometimes it's slow. Sometimes it doesn't work at all."

      takes a long time to converge to

      "To the times during the hour, this comes during the period from digiunare, this elasticity in the valve of the accumulation of the pipe, of therefore if with destiller more distilling this, the instrument plus these lengths with the fear of this not he he, Synopse, company more."

      Whereas your phrase takes a couple of clicks to converge to
      "Titmouse Quant0 of the intention these costs, those the legend?"

      Geek points to people who can find a small sentence that grows exponentially, or a big text that converges to a word or two.
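
      For anyone who wants to hunt for those sentences offline, here is a minimal sketch of the "press the button until it stops changing" loop. Everything in it is an assumption made for illustration: ROUND_TRIP is a made-up lookup table standing in for one lossy trip through a translator and back, not the behaviour of any real translation service.

```python
# Hypothetical stand-in for one "translate there and back" round trip:
# each word maps to whatever the imaginary round trip returns for it.
ROUND_TRIP = {
    "harsh": "severe",
    "severe": "serious",
    "mistress": "lover",
    "lover": "love",
}

def round_trip(text):
    """Apply the toy round trip word by word; unknown words survive unchanged."""
    return " ".join(ROUND_TRIP.get(word, word) for word in text.split())

def converge(text, step, max_rounds=50):
    """Apply `step` repeatedly until the text repeats; return every version seen.

    If the last two entries are equal the text reached a fixed point,
    otherwise it fell into a longer cycle (or max_rounds ran out).
    """
    seen = [text]
    for _ in range(max_rounds):
        text = step(text)
        repeated = text in seen
        seen.append(text)
        if repeated:
            break
    return seen

history = converge("the moon is a harsh mistress", round_trip)
for version in history:
    print(version)
```

      Swapping round_trip for a call to a real translator would give the growth-or-shrink experiment described above, with max_rounds as the safety net for texts that never settle.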

      --
      xkcd is not in the sudoers file. This incident will be reported.
  11. Philosophical caveat by Raindance · · Score: 4, Insightful

    As a caveat, we should be wary of saying the system "understands" a language.

    I would say that humans able to translate between languages generally understand both languages, but claiming that a statistical, probabilistic model based on correlations understands a language might be a stretch.

    Further reading: Searle's Chinese Room argument- http://en.wikipedia.org/wiki/Chinese_room

    This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?

    Are these silly questions to ask?

    Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett but the entry is pretty sparse).

    RD

    1. Re:Philosophical caveat by MikeFM · · Score: 3, Interesting

      Does anybody understand the tax code? Why should software be any different?

      I think that software that can learn can be said to understand a problem just as much as a human can. The difference between understanding and just doing is having the ability to learn from new data and to change your actions as required.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    2. Re:Philosophical caveat by back_pages · · Score: 4, Insightful
      Great example of this:

      Mom baked for three hours.
      The pie baked for three hours.

      "Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.

      A man walked into a bar. Ouch!

      A man wanted to win a pun contest in the local newspaper, so he entered 10 times in order to increase the chances that one of his entries would win. Unfortunately, no pun in ten did.

      You can translate that 50 ways from Sunday but without understanding the language - understanding what makes those statements interesting - the machine will lose all their meaning.

    3. Re:Philosophical caveat by idlake · · Score: 2, Insightful

      You are right: this software does not understand language; it works out statistical correspondences, but it has no understanding of the physical correlates of words. That also means that it has intrinsic limitations.

      Note also that such statistical approaches are nothing new, it's just that computers are finally getting powerful enough that people can use them.

      None of that has anything to do with Searle. Searle wouldn't admit that the system understands language even if it knew things about the real world. Searle's argument is arbitrary and ad hoc. You are free to believe in it if you like, but sooner or later, you will have to defend your position against a computer that will insist that if you don't admit the possibility that it is self-aware and intelligent, well, it is just going to assert that you aren't either.

    4. Re:Philosophical caveat by lakeland · · Score: 2, Insightful

      NO! NO! NO! Not the Searle argument again. That guy is an absolute nutter and should be banned! Actually, on second thoughts, as long as I never have to hear his drivel again, I don't mind what happens to him.

      His argument essentially boils down to: "The computer doesn't understand because all it does is manipulate symbols. Even if it does exactly the same steps as a human, the human understood and the computer was just being a mimic. Giving the computer a body wouldn't make it any less of a mimic".

      The #1 flaw in his argument is that it would result in humans being classified as non-intelligent. He constantly spouts on about how machines aren't intelligent but then says "Except that the human understands what they are doing". I think that's a close contender to the Wookie defense for world's worst argument.

    5. Re:Philosophical caveat by Coulson · · Score: 2, Insightful

      Searle's Chinese Room argument is hogwash.

      In his scenario, Searle claims that neither the people moving the Chinese tokens, nor the book of instructions telling them what to do "understands" what is being said. That is obviously true, but it misses the point. That's like saying that the neurons in your head don't understand what you are saying, and so neither do you.

      The workers in the Chinese Room argument are just hardware. They're akin to neurons in the brain, or chips in a computer. They're blindly executing instructions (software) from a book and recording results on blank pages (working memory). No AI proponent is arguing that the chips in the computer "understand" anything. Chips just dumbly execute instructions. What's interesting is the combination of software and persistent memory, and the apparent consciousness that can arise therefrom.

      Searle's argument must either be considered void, or one is compelled to admit that humans don't "understand" anything either. As such, it's hogwash.

    6. Re:Philosophical caveat by evilmousse · · Score: 2, Insightful


      that's exactly why i like my anime fansubbed instead of sanitized.

    7. Re:Philosophical caveat by tgv · · Score: 2, Insightful

      The translation depends on the semantic class of the subject (is the subject a potential baker -> use translation nr. 1, is the subject something that's usually being baked -> use translation nr. 2). So, ignoring other issues, this particular problem is easy to solve, it only takes a lot of work.

      BTW, the fact that it isn't entirely grammatical has nothing to do with it (if we understand it, we can translate it, so any MT faces the same challenge).
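
      The rule above can be sketched in a few lines. This is only an illustration under loud assumptions: the SEMANTIC_CLASS lexicon is invented, the sense labels are placeholders rather than real target-language verbs, and taking the last word of the subject as its head noun is a crude shortcut.

```python
# Hypothetical semantic classes; a real system would need a far larger lexicon.
SEMANTIC_CLASS = {
    "mom": "baker",
    "chef": "baker",
    "pie": "baked_good",
    "bread": "baked_good",
}

# Placeholder sense labels standing in for two different target-language verbs.
SENSE_BY_CLASS = {
    "baker": "bake/does-the-baking",       # "translation nr. 1"
    "baked_good": "bake/is-being-baked",   # "translation nr. 2"
}

def pick_sense(subject):
    """Choose the sense of 'baked' from the semantic class of its subject."""
    head = subject.lower().split()[-1]     # crude head-noun guess: last word
    sem_class = SEMANTIC_CLASS.get(head)
    return SENSE_BY_CLASS.get(sem_class, "bake/unknown")

for subject in ("Mom", "The pie", "The moon"):
    print(subject, "->", pick_sense(subject))
```

      The "lot of work" is visible in the fallback: any subject missing from the lexicon ends up as bake/unknown, so coverage depends entirely on how much someone has typed in.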

    8. Re:Philosophical caveat by Bugmaster · · Score: 2, Informative
      Except for the fact that "The pie baked for three hours" isn't good grammar.
      Why? You could say, after all, "The pie baked normally for three hours in the oven, then it started to burn". It's acceptable grammar, but it's a confusing sentence on the semantic level.

      The sentence "the pie was baked for three hours" differs in meaning, because it implies that someone was there, actively baking the pie.

      --
      >|<*:=
  12. Google definitely would buy into this... by egyber · · Score: 5, Interesting

    Don't remember exactly where I read this, but Google apparently has long believed that there is enough data on the internet alone to be able to intelligently translate... What these guys claim to have done is, it would seem, the missing piece of the puzzle for Google. I wouldn't be surprised if Google gets in on this.

    1. Re:Google definitely would buy into this... by MikeFM · · Score: 2, Interesting

      I tried that in about 1997. It did work pretty well but the biggest problem was the limitation of having copies of the same document in different languages. There are quite a few but they were dwarfed by the amount of single-language documents. Also the fact is that most text on the Internet is written the way that I write - badly. This can lead to translations that are written the way real people write which can be good for conversational bots but which is probably bad for translation software.

      Some of the more interesting things about these bots of mine were that they weren't programmed to translate but they learned to do so anyway. If you spoke to them in English they might respond in French or German but the response would be correct. That was really a very surprising finding.

      I expect that these guys have built a much more robust dictionary and that their algorithms are worked out better than mine were. They probably have taken texts off the Internet to train their dictionary but I doubt they'd want to submit random findings off the Internet.

      I'd like to see what they could come up with for simplifying language. Take some source documents written in full geek jargon and take the same documents rewritten to be for the lay person. Train the program on that. Then us geeks could translate our docs into stuff normal people could read. THAT I'd buy.

      I wonder if it'd be good enough to learn to translate source code into English or even into other programming languages? It'd seem that the same abilities would apply to this task.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  13. Translating specialised texts ... by rkmath · · Score: 4, Insightful

    The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.

    The main reason (I think) is that: tech documents have specialised vocabulary and idioms, but these are much fewer than the idioms one has to master in order to understand the editorial page in a newspaper.

    With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages, than reading any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese for instance is an entirely different ballgame ...)

    1. Re:Translating specialised texts ... by Anonymous Coward · · Score: 4, Informative

      The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.


      Of course that is true, for a human translator. Your knowledge of the technical field itself is a resource you can use to aid in your translation of technical texts. For machines, it's usually necessary to use a translator specifically geared to the subject matter. For instance, you would definitely want to use a different machine translator for a newspaper article as opposed to a biomedical research journal.

      This new approach is supposed to mitigate these problems. If they can do a good job of it, they may be able to bring machine translation to areas where previously human translators have been required or greatly preferred.

    2. Re:Translating specialised texts ... by Beryllium+Sphere(tm) · · Score: 2, Insightful

      Absolutely true. One of Beryllium Sphere's partners is a computational linguist. For quite a while her bread and butter was building representations of knowledge about heavy equipment maintenance to support automatic translation of Caterpillar Tractor technical manuals.

      There's more to help you than just the specialized vocabulary. It's good that "crankshaft" is unambiguous but it also helps to know in advance that "bolt" will be a noun and not a verb.

      Also, to be blunt, nobody expects technical prose to sound as good as normal prose. In fact they're happy if it's just functional. A fluent French speaker thought the automatic translation output was a bit stilted, but the first time one French mechanic read it he jumped up and down for joy and said "If only we'd had this earlier, I wouldn't have almost lost my arm!".

  14. Huzzah! by Tzarius · · Score: 2, Funny

    Now my Bayesian mail filter can translate spam to english before it's read!

  15. DadaDodo by Tripax · · Score: 4, Informative

    This reminds me of Jamie Zawinski's hack DadaDodo, which used probability trees to create new texts from old texts by examining the probability that any given word follows the previous word/string of words. I always thought his program was cool, in that his description of it involved Markov Chains and William S. Burroughs.
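
    For the curious, the word-follows-word idea fits in a few lines. This is a minimal sketch of a word-level Markov chain, not DadaDodo's actual code; the function names and the one-sentence corpus are made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=12, seed=None):
    """Random-walk the chain; repeated followers make a word proportionally likelier."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))      # sorted so a given seed is repeatable
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:                  # dead end: state only occurs at the very end
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "time flies like an arrow and fruit flies like a banana"
chain = build_chain(corpus)
print(generate(chain, seed=0))
```

    Raising order makes the output stick closer to the source text, since each next word is then conditioned on a longer string of previous words.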

  16. Microsoft Research already does this by drdink · · Score: 4, Informative

    I did a presentation for an AI class a while ago and discovered that Microsoft already does this with their MSR-MT project. Apparently the Spanish entries in their Knowledge Base were translated by this as well.

    --
    Beware, Nugget is watching... See?
  17. Arabic to English by Caseyscrib · · Score: 4, Interesting
    I'd like to see an arabic-to-english translator. I was interested in reading news from the middle east, because I don't particularly trust our media to translate it properly. A good example of this is Bin Laden's transcript.

    After a quick web search, all I was able to find was this site, which has a pretty sketchy TOS agreement.

    1. Re:Arabic to English by Caseyscrib · · Score: 2, Informative

      I never said I trusted either source. But when you can read Arabic propaganda and contrast it with your own media's propaganda, it helps you to understand what the underlying causes for war are. It is also key to recognizing the true aggressor, because in every war both governments play the "good guys" role to their citizens. Direct translation helps you to understand the culture of your enemy. Things as simple as webpage advertisements, editorials, personals, etc, are lost in translation by CNN and the other alphabet news networks.

    2. Re:Arabic to English by Anonymous Coward · · Score: 2, Informative

      Go to http://www.systransoft.com, choose Arabic to English

  18. Dragon Naturally Speaking by headkase · · Score: 3, Interesting

    Using statistical methods to predict the next item in a sequence is still not true hard AI, though. This technique is used by the voice recognition software "Dragon Naturally Speaking", creating in effect pattern chains. What Dragon did on the character level this software appears to do on the word level. This is still not true AI, however, as the statistics will only map to probabilistic sequences rather than abstractly mapping to the concepts. What would really impress me is if they came up with a mapping algorithm that instead of using probability used a function like mini-max fitness testing on a neural-network substrate.
    It would be interesting to see the results of analysing large sections of languages however, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
    Or the brief explanation in the article did not make it clear enough how this differs from what was previously state-of-the-art, e.g. Dragon.

    --
    Shh.
  19. Re:Universal Translator by ari_j · · Score: 2, Informative

    I was thinking the same thing - I don't have time to investigate how it works, but if you created one that translated symbolically-represented phonemes (languages other than Germanic and Eastern probably know this concept as "spelling") you'd have a pretty good system going. From the article lead-in here on Slashdot, it sounds as if it will take the basic rules of a language and maybe some "seed" data, and from there learn by comparing text in language A and language B that have the same meaning.

  20. so how can they grade you in school? by cheekyboy · · Score: 3, Insightful

    One has to wonder: if the language of choice, English or whatever, is so structured and rule-ridden and not just made up on the fly, then how come it's so difficult to determine all the rules? Are there too many of them? Too many contexts? Or is the problem just trying to translate bad grammar, which fails the rules but which any human can decipher?

    Sometimes brute force, ie look up tables for 100000000 translated versions can be better, so much for logic eh :-)

    --
    Liberty freedom are no1, not dicks in suits.
  21. Time flies like an arrow... by Secret+Agent+99 · · Score: 5, Funny

    ...and fruit flies like a banana.

    When an automated translator can handle that one without bursting into flames, I'll start to believe.

    1. Re:Time flies like an arrow... by finiteSet · · Score: 2, Insightful

      "Time flies like an arrow" is a simile, and is idiomatic. There is a finite set of idioms, and they should be fine as "memorized" exceptions in a speech system (they are often memorized exceptions in humans). Most language is rule based, but I think many underestimate the number of idioms that humans encounter and have difficulty "parsing."

      "Time flies like an arrow, fruit flies like a banana" is a joke. Translating it into other languages would be neither funny nor especially meaningful, as the whole point is to play the idiom for a joke.

      Humans are imperfect speech systems - every day people hear things wrong, misinterpret sentences, etc. Humans just typically have lower error rates than machine systems, especially for language systems. Building a system that understands jokes, metaphors, etc. will take extensive knowledge to draw from, which is one of the big advantages humans have in disambiguating language. Without a large knowledge-base and efficient ways of getting feedback to update that knowledge-base, computers will still have difficulty disambiguating novel phrases and words. Even then it is unrealistic for them to be able to always "understand" idioms, which rarely retain a meaning that can be deduced, just as it is unrealistic for humans to always understand an idiom when they first encounter it. Language systems should do what humans do - memorize its meaning and move on. You're welcome to wait for systems that understand jokes, and you'll probably be waiting for a while. I don't think, however, that is a useful prerequisite for "believing" in language systems.

      --
      If we start buying CDs then the terrorists have already won.
  22. Scanning Audio Files by BobPaul · · Score: 2, Interesting

    Why didn't I have this software during High School Spanish?

    It says it can scan through audio files an input source. I wonder if this causes it to "learn" the auditory signatures (and thus only knows the translation when given audio input), or if it relies on text to speech from to convert it to text first?

    If it does the latter, then based on the quality of current speech-to-text software, this probably wouldn't do much good in a total immersion classroom situation...

    Sure would have helped with my German homework, though ;)

  23. Mission statements by Savage-Rabbit · · Score: 2, Insightful

    That thing reminds me of Dilbert's mission statement generator. The scary thing is that the material from Dilbert's babble engine actually sounds like a lot of the stuff you are likely to find on actual corporate websites.

    --
    Only to idiots, are orders laws.
    -- Henning von Tresckow
  24. Reading Everything by Anonymous Coward · · Score: 2, Funny

    I hope they don't read everything. Next thing you know translations could end up L1k3 th1s f0R 4l1 y0u K|\|0\/\/.

  25. How is that news? Research was done 10 years ago. by Anonymous Coward · · Score: 4, Interesting

    The basic approach was developed over 10 years ago by IBM: The Mathematics of Statistical Machine Translation. And even free software has been available for a while, see http://www.fjoch.com/GIZA++.html.
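
    For anyone wondering what "learning from previously translated documents" looks like in practice, here is a minimal sketch in the spirit of the simplest model from that IBM paper (IBM Model 1), trained with expectation-maximization. It is a simplification, not the paper's full method and not GIZA++: it omits the NULL word, and the three-sentence corpus and all function names are assumptions made up for illustration.

```python
# Toy parallel corpus (hypothetical): (source tokens, target tokens).
TOY_CORPUS = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)], roughly P(source word f | target word e), by EM."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform start over every word pair that co-occurs in some sentence pair.
    t = {(f, e): 1.0 / len(src_vocab)
         for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = {}  # expected count of f being aligned to e
        total = {}  # expected count of e being aligned to anything
        for fs, es in pairs:
            for f in fs:
                # E-step: spread one unit of f over the candidate words e.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] = count.get((f, e), 0.0) + frac
                    total[e] = total.get(e, 0.0) + frac
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def best_source_word(t, e):
    """Return the source word with the highest t(f | e) for target word e."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)
```

    Calling `best_source_word(train_ibm_model1(TOY_CORPUS), "the")` picks "das" on this toy data, even though nobody told the program which word goes with which; it falls out of the co-occurrence statistics alone.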

  26. It's only a matter of time before... by gkwok · · Score: 4, Funny

    Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14am Eastern Time....

  27. No samples? by Guspaz · · Score: 3, Interesting

    Sounds interesting, but I couldn't find a single sample translation on their site; i.e. a block of text in language A (say, French) and language B (say, English), translated from A to B by their software.

    Without even the simplest of examples or samples we have only their word on how well this works.

  28. DOOMED by FoXDie · · Score: 3, Interesting

    Recently robots have been made that can Run, Wield shotguns, and Recognize faces. Now they can read. [DOOMED I SAY]

  29. Easy by beldraen · · Score: 2, Funny

    English->Cat: Meow!

    --
    Bel, the mostly sane.. "Of course I can't see anything! I'm standing on the shoulders of idiots." -- Me
  30. Too bad about the times it needs to think by Timbotronic · · Score: 3, Insightful

    I like the approach they've taken, but machine translation can only ever go so far.

    A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job and she's a professional translator.

    Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.

    --

    One of these days I'm moving to Theory - everything works there

  31. Tests by headkase · · Score: 2, Interesting

    The biggest test of the translator is converting from one language to another and then back again multiple times. If the content doesn't get corrupted then it works as advertised.
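
    A minimal sketch of that round-trip test, assuming two hypothetical word tables standing in for a real translator (the tables and function names are invented for illustration). It translates forward and back a number of times and records every intermediate result, so any drift shows up in the history:

```python
# Hypothetical toy tables standing in for a real translation system.
EN_TO_ES = {"the": "el", "house": "casa", "home": "casa"}
ES_TO_EN = {"el": "the", "casa": "house"}


def translate(text, table):
    """Word-by-word substitution; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


def round_trip(text, forward, backward, rounds=3):
    """Return the original text plus the result of each round trip."""
    history = [text]
    for _ in range(rounds):
        text = translate(translate(text, forward), backward)
        history.append(text)
    return history


def is_stable(text, forward, backward, rounds=3):
    """True if every round trip reproduces the original text exactly."""
    return all(v == text for v in round_trip(text, forward, backward, rounds))
```

    Here "the house" survives any number of round trips, while "the home" comes back as "the house" after the first one, because both English words map to the same Spanish word in the toy table.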

    --
    Shh.
    1. Re:Tests by Bugmaster · · Score: 2, Insightful

      Ha ! I'd like to see a human translator (or a team of human translators) that could do that.

      --
      >|<*:=
  32. Oh fuck. by hot_Karls_bad_cavern · · Score: 2, Insightful

    Something in my head just popped.

    Damn, i love this place. Seriously, dammit. Here we have a post on a tech/IT site titled "Harry Potter and the Bible" modded +4 Interesting at the time of this posting ... that is actually interesting. And even i find it interesting, and the fact that you are most likely of age and know what "quidditch" is and how to spell it is quite frightening. i'm sad to say i knew it too (they took my Ko0lBadge away a long time ago).

    My head totally hurts. Clod.

  33. Can it run faster than you? by ibn_khaldun · · Score: 2, Insightful

    The critical issue is not whether this system will produce translations comparable to those done by a translator fluent in both languages -- it won't. However, it may do as well as or better than translations by someone barely competent in one of the languages (or who is essentially just doing dictionary-based translation). English speakers have lots of examples of nearly incomprehensible technical translations from Chinese and Korean, and the Chinese and Koreans would probably have comparable examples of bad translations from English except for the fact that the US doesn't make anything they want to buy that requires a manual (soybeans and car-chase movies don't require manuals). Okay, maybe software -- there is a story (probably apocryphal) that an early Spanish version of the manual for the DOS operating system (command line Windows without the viruses, for you young'uns) translated every instance of DOS as "two".

    It's like the old joke about the two backpackers who encounter a hungry bear in the woods. One stops and puts on his running shoes. The other says "Why do that? You can't outrun a bear." The response: "Right, but I can outrun you."

    --

    "All successful systems accumulate parasites" -- Hal Hixon

  34. efnet spanish by Garabito · · Score: 5, Funny

    k apr3ndist3 3sp4ni0l en IRC?
    q w3n0! 3so si está 1337!

  35. GoogleDot by cybercobra · · Score: 2, Funny

    No way, the articles would be much better with AI.
    Now if only we could combine Google News and Slashdot... I for one would welcome /.'s new automated editor overlords.

  36. Re:efnet spanish by sahonen · · Score: 2, Funny

    Is it sadder that you wrote that... Or that I can read it?

    --
    Make me a friend and I'll mod you up
  37. Computers must learn like humans by mailman-zero · · Score: 2, Insightful

    TFA shows steps in the right direction. So far most projects have tried to teach computers how to understand and produce natural language. The real solution lies in creating algorithms that allow computers to learn language. This is where studying how humans acquire language must be merged with computer science.

    I can imagine the first successful computational linguist describing having a computer in his home for upwards of 10 years, interacting with it and allowing it to interact with him and his family in order to learn the contexts in which certain words carry specific meaning. Once the learning process is completed, the collected persistent memory could then theoretically be copied to other machines and devices so that they, too, may understand the language for which such training has been completed.

    --
    Let's play video games with mailmanZERO
    1. Re:Computers must learn like humans by trufflemage · · Score: 2, Insightful

      "having a computer in his home for upwards of 10 years...."

      One thing computers are is fast. Why make it sit through ten years absorbing input at human speeds when the digital content of the web is available for scanning as fast as the machine can?

  38. Dealing with Disruptive Technology by Simonetta · · Score: 2, Insightful

    This message brings up some excellent points about dealing with disruptive technology. A teacher whose job it is to get students to master material in a certain subject realizes that there is a machine that provides the same function that previously could only be gained by hard study.
    What is more important, the knowledge gained through rigorous study or the ability to accomplish what the studying provides through a machine?
    Being technically oriented, I have to say the machine. But I am not being disrespectful of all the hard work that goes into learning a language. I'm saying that if people don't want to bother to learn a language, then they should use the machine if they need a translation. This is a difficult position to defend when colleges still require a few years of a foreign language to get a liberal arts degree and students couldn't care less.
    But I still defend the position. Use the translation software to do your homework. It's more important to master the translation software or machine than it is to master the actual language. Even if you study hard and get an 'A', in a few years you will forget it. And the machines are only going to get better and cheaper. It's your education, your life, your (or your parents') tuition.
    George Gilder once said that the languages that you need to know to be successful are English and C++.

    Still, for the most part, language translation software sucks, and depending on it can put you into some truly embarrassing positions. I think that language translation software (for text) comes in five rough levels:
    1 Word substitution.
    2 Phrase and sentence.
    3 Paragraphs and idioms.
    4 Magazines, full-speed conversations, light literature.
    5 Legal, diplomacy, allegory, and classical literature.

    Each level is at least an order of magnitude more difficult to translate than the previous one.
    I think that most shrink-wrap translation software today is between levels 2 and 3 (for example, www.systransoft.com). BabelFish and Google site translation are between levels 1 and 2. With non-European languages, BabelFish and Google are incomprehensible and useless.
    It would be interesting to see whether, in a few hundred years, language translators work to preserve linguistic diversity or create a global 'pidgin' language.

  39. Johnny Five Alive by Anne+Thwacks · · Score: 2, Funny

    Input...Need more Input

    --
    Sent from my ASR33 using ASCII
  40. Need to be embedded "in the world" by franoreilly · · Score: 2, Insightful

    If all they're talking about is syntactic analysis, it will never be enough. Semantic knowledge is essential for complete "understanding" of language, and that can only be attained by an agent that can interact with the world and humans and learn within that context.

    --
    -- --- Learn language vocabulary with mnemonics: http://www.memorista.com
  41. The first such system was built in 1993. by Dulimano · · Score: 3, Interesting

    This is news of '93, when Brown et al. at IBM built their famous statistical machine translation system. It does exactly what is described in the article. I myself work on such a system (for Hungarian-to-English translation).

    The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.