Slashdot Mirror


AI Goes Bilingual -- Without a Dictionary (sciencemag.org)

sciencehabit shares a report from Science Magazine: Automatic language translation has come a long way, thanks to neural networks -- computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts -- a surprising advance that could make documents in many languages more accessible.

The two new papers, both of which have been submitted to next year's International Conference on Learning Representations but have not been peer reviewed, focus on another method: unsupervised machine learning. To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That's possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voila! You have a bilingual dictionary.
The studies -- "Unsupervised Machine Translation Using Monolingual Corpora Only" and "Unsupervised Neural Machine Translation" -- were both submitted to the e-print archive arXiv.org.
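
The atlas-overlay idea in the summary can be sketched in a few lines of Python. This is a toy illustration, not the method of either paper: the two corpora, the function names (`cooccurrence_profiles`, `induce_dictionary`) and the matching rule are all assumptions made for the example, and the corpora are built so that their co-occurrence statistics line up exactly, which real text never does. Each word is described only by how often it shares a sentence with other words of its own language; words from the two languages are then paired by comparing those descriptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_profiles(sentences):
    """Map each word to the sorted list of its same-sentence co-occurrence counts."""
    counts = Counter()
    vocab = set()
    for sentence in sentences:
        words = set(sentence.split())
        vocab |= words
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    # Sorting discards word identities, so only the "shape" of the map remains.
    return {w: sorted(counts[(w, other)] for other in vocab) for w in vocab}


def induce_dictionary(corpus_a, corpus_b):
    """Pair every word of corpus_a with the corpus_b word whose profile is closest.

    Assumes both vocabularies have the same size (true for the toy data below).
    """
    profiles_a = cooccurrence_profiles(corpus_a)
    profiles_b = cooccurrence_profiles(corpus_b)

    def distance(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    return {
        word: min(profiles_b, key=lambda cand: distance(profile, profiles_b[cand]))
        for word, profile in profiles_a.items()
    }


# Hypothetical monolingual corpora, content words only.
ENGLISH = [
    "table chair", "chair table", "table chair food", "dog cat",
    "dog food", "food dog", "cat food", "table dog",
]
SPANISH = [
    "comida gato", "silla mesa", "perro comida", "mesa perro",
    "comida mesa silla", "gato perro", "mesa silla", "comida perro",
]

dictionary = induce_dictionary(ENGLISH, SPANISH)
for english_word in sorted(dictionary):
    print(english_word, "->", dictionary[english_word])
```

The two corpora are not sentence-by-sentence translations of each other (sentence order and word order differ), yet the pairing comes out right because the co-occurrence patterns have the same shape. On real text the patterns agree only roughly, which is presumably why the papers need far heavier machinery than a nearest-profile lookup.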

99 comments

  1. What's up with all these "Al" posts? by Anonymous Coward · · Score: 0

    Who is Al? Weird Al Yankovic?

    1. Re:What's up with all these "Al" posts? by RightwingNutjob · · Score: 1

      It's not a capital i or a lower-case L, it's a vertical line emoji.

    2. Re:What's up with all these "Al" posts? by Anonymous Coward · · Score: 1

      Al Gore Bilingual. -- Without a Dictionary?

    3. Re: What's up with all these "Al" posts? by Anonymous Coward · · Score: 0

      Glad I was not the only one...

    4. Re:What's up with all these "Al" posts? by asylumx · · Score: 1

      I immediately thought "Al Gore"

  2. AI goes bi by Anonymous Coward · · Score: 0

    fucking millennials

    1. Re:AI goes bi by Anonymous Coward · · Score: 1

      Totally legal now. Unlike when you were 30 and they were 8.

    2. Re:AI goes bi by Anonymous Coward · · Score: 0

      Totally legal now. Unlike when you were 30 and they were 8.

      news for you - it's still illegal if you are 30 and they are 8.

  3. Not Peer Reviewed by Anonymous Coward · · Score: 1

    Yet published on Slashdot because it centers around a buzzword.

    1. Re:Not Peer Reviewed by Anonymous Coward · · Score: 0

      It is not peer reviewed so what? You cannot read the papers by yourself and make your own opinion?

  4. No, it does not by gweihir · · Score: 3, Insightful

    In order to go "bilingual", it would have to be able to understand one language first. However, understanding natural language is so far beyond the demented automation ("weak AI") available today, it is not even funny anymore. May as well claim a squirrel is a "gourmet chef", because it can bury nuts, i.e. "process food". Whether actual intelligence is going to be available on machines, ever, is at this time completely unknown, because nobody knows what it is. It is pretty clear though that the only natural computing hardware known (the human brain) is not powerful enough to create the intelligence observable at the interface of the smartest instances, at least if any known computing paradigm is assumed to be how it works. So either a completely new computing paradigm is needed (and no, "neural" nets will not cut it and they are really old), or the problem is even more complicated.

    The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit. Just look at who people vote for.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:No, it does not by ShanghaiBill · · Score: 4, Insightful

      In order to go "bilingual" ...

      The headline says "bilingual". Neither paper uses that term.

      it would have to be able to understand one language first.

      It is not clear if this is true. Translation accuracy has greatly improved, and is continuing to improve, despite the NNs having no understanding of how the languages map to reality. They only learn how the languages map to each other.

      "neural" nets will not cut it and they are really old

      What does age have to do with anything? Biological neural nets have been around for 600 million years.

    2. Re:No, it does not by Anonymous Coward · · Score: 1

      The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit

      This definitely applies to comments on Slashdot, where "dressed up prettily" = scare quotes and overconfidence with a sprinkling of jargon.

    3. Re:No, it does not by Krishnoid · · Score: 1

      May as well claim a squirrel is a "gourmet chef", because it can bury nuts, i.e. "process food".

      Or similarly a rat, because it can control a human in a kitchen by pulling on its hair -- possibly with some assistance from the food processor in your example.

    4. Re:No, it does not by religionofpeas · · Score: 1

      Whether actual intelligence is going to be available on machines, ever, is at this time completely unknown, because nobody knows what it is.

      We got human level intelligence from old monkey brains by just fucking around for 100,000 generations.

    5. Re:No, it does not by Anonymous Coward · · Score: 0

      What does age have to do with anything? Biological neural nets have been around for 600 million years.

      Age is relevant in that it might be kinda cool if some day a biological neural net might just define 'what is intelligence.' So these old school neural nets continue to evolve alongside modern techy things like the twittersphere or interesting areas of research like AI, NN, ML, etc.

      Best of both worlds so to speak, and also since our brains are the ones creating this technology.

    6. Re:No, it does not by Anonymous Coward · · Score: 0

      No, not only did your fellow morons recognise one of their own, they elected Trump, the moron, commander in chief of all morons.

    7. Re:No, it does not by Anonymous Coward · · Score: 0

      Translation is essentially just a more complicated version of transcoding. Your transcoder doesn't need to understand the movie, or care what the movie is about, etc.,... it's just translating it into another format to be read by a different reader.

      The difficult bit is that there's no fixed pattern or function, and it's not clear what exactly is being optimized. So current state of the art translators use statistics to figure out the mapping from input to output... and with enough data, it almost always does an ok job...

    8. Re:No, it does not by Anonymous Coward · · Score: 0

      "neural" nets will not cut it and they are really old

      What does age have to do with anything? Biological neural nets have been around for 600 million years.

      I think GP was referring to NN as an AI modeling concept, not biological NN (come on, sort of obvious). So, in this case, age is relevant because the concept and theory involved was already studied to exhaustion. We need something new to attack the AI problem. NN isn't it.

    9. Re:No, it does not by gweihir · · Score: 1

      That is actually unknown. Physicalism is a belief, not science. Actual science finds that the questions of intelligence and consciousness are currently getting more mysterious, not less so, as more data and facts become known.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    10. Re:No, it does not by gweihir · · Score: 1

      And fail. Have a look into the research literature at some time. If what you claim were true, we would have had high-quality automated translation decades ago. Not cheap, but it would have been done, and it would have had tons of applications in military and intelligence use, where the money would have been available.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    11. Re:No, it does not by ShanghaiBill · · Score: 2

      age is relevant because the concept and theory involved was already studied to exhaustion.

      Not true at all. Backprop dates back to 1986. Autoencoding was introduced in 2006. GANs were first used in 2014. Perhaps even more importantly, fast parallel computing with cheap GPUs and mountains of training data were only recently available.

    12. Re:No, it does not by religionofpeas · · Score: 1

      That is actually unknown.

      For you, maybe.

    13. Re:No, it does not by Anonymous Coward · · Score: 0

      I see a bunch of assertions without proof or citation, followed by some gratuitous insults to distract the dim-witted (hey I best agree, otherwise he said I'm a moron). That might pass as a political debate, but I very much doubt it would pass peer review for publication.

    14. Re:No, it does not by Anonymous Coward · · Score: 0

      They're mysterious and complicated. However, they're indisputably linked to the physical brain.

    15. Re:No, it does not by Maxo-Texas · · Score: 1

      That explains a lot about the 2016 election in the U.S.

      --
      She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
    16. Re:No, it does not by Wulf2k · · Score: 1

      What actual science exists that has anything to say on anything outside of the physical world?

    17. Re:No, it does not by dinfinity · · Score: 1

      The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit.

      Oh, I think I've just spotted one..

    18. Re:No, it does not by epine · · Score: 1

      age is relevant because the concept and theory involved was already studied to exhaustion.

      This is even more hilarious than that: Hinton has basically said that his methods from 1986 would have proved out on a practical basis if only the machines and data had been beefier at the time. Some of the recent improvements are nice, but he doesn't view them as essential.

      Oracle: Flight has been beaten to death since da Vinci.

      Wilbur: You'd be amazed how much wind tunnels have improved since the invention of the steam engine.

      Oracle: Wind tunnels have been beaten to death since—uh—J. Random Bernoulli.

      Wilbur: I can't keep them straight, either, but I'm pretty sure none of them knew how to scale models using the Reynolds number. In fact, I'm still struggling myself with the Reynolds-averaged Navier–Stokes equations.

      Oracle: You're so lame. Try writing it out in Einstein notation.

      Wilbur: Einstein? Why do I suspect that hasn't been invented yet?

      Oracle: Oh, right. I guess this isn't so beaten to death after all.

      Wilbur: Well, are we going to sit around waiting for some annus mirabilis or are we going to do something?

      Oracle: Einstein notation doesn't show up until long after the annus mirabilis.

      Wilbur: The annus mirabilis?

      Oracle: Absolutely no-one saw it coming.

      Wilbur: I guess that rules me out, too. So, you still think flight is tapped out?

      Oracle: Oh fuck it, you've convinced me. Let's build something.

      Wilbur: As you know, I've always said that another effort just might fly.

      Oracle: Nope. I'll pitch in, but I'm not telling.

    19. Re:No, it does not by Anonymous Coward · · Score: 0

      The way I explain why true AI is a myth to the average person is that I tell them to just take a look at how biology created human intelligence in the first place with its millions/billions of years of trial and error. And the technology of biology supersedes that of anything drummed up in a chip fab. I.e., our organic selves, even though they may age and be fragile, are far more "programmable" than we think. Once people realise just how much work has gone into building the human brain, it is much easier for them to see the complexity of how our brains work, and from there you can quantify how unfathomable AI is.

      The problem is that the tech giants have their cheerleaders. They know nothing of what they talk about; they're just there to reassure everyone of how powerful their company is by what awesome research they are doing. Just 10 secs of Bill Gates trying to dumb down our "mouths" to that of an organic USB port and you can't help but laugh and then claim how inefficient it is. Really, I think the fact one word can mean 10 stories or 100 feelings just shows how shaded human intelligence really is and how absolutely and irrefutably wrong Gates is.

      Just how complex is the study of human relationships alone? Something that has entire fields of science dedicated to understanding it, but we're stuck not fully understanding. Yet here we are trying to replicate it and act as if we have already mastered it within the framework of computing.

    20. Re:No, it does not by gweihir · · Score: 1

      Indeed. I, unlike you, am an actual scientist.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  5. So if I check the entire source by Anonymous Coward · · Score: 0

    I won't find a single data dictionary? No hashmaps, nothing, zilch? Yeah right!

  6. Only in the very general sense of the term by Anonymous Coward · · Score: 0

    This honestly sounds more like old-school cryptography than any kind of language analysis. None of the intent of the language is there, just the (likely) meaning of its words. And even then when you factor in things like figures of speech, metaphors and cultural references you're way off in left field in terms of figuring out what is being meant versus what is being said.

  7. My hovercraft is full of eels. by PPH · · Score: 2

    n/t

    --
    Have gnu, will travel.
  8. My nipples explode with delight! by Anonymous Coward · · Score: 0

    null

  9. I don't think so, tim by philmarcracken · · Score: 1

    I've been learning Japanese for about 2 years, using SRS and reading. I can tell you these systems will be great for instructions on assembling a desk, or how to check your oil. Totally useless for storytelling. Anything containing references, jokes, wordplay, hell even pronouns where English just doesn't have as many will always be compromises.

    1. Re:I don't think so, tim by Actually,+I+do+RTFA · · Score: 4, Insightful

      That would be fine. The number of times I wanted a machine translated story in the past... I dunno, ever. 0. The number of times I wanted a technical paper, or instructions or tech specs is significant. Or even news. Storytelling, jokes and wordplay are the least interesting thing to translate, because there are people who actually already do that.

      --
      Your ad here. Ask me how!
    2. Re:I don't think so, tim by Anonymous Coward · · Score: 1

      Clearly you've never lived in a non-English speaking country.

    3. Re:I don't think so, tim by Zorpheus · · Score: 1

      That's what I was also thinking. Sure, there are areas where these word maps look the same. I would only expect this though if this area developed similarly, e.g. technical areas in the recent past, where we had world-wide communication. Also it should work for the base of the language as far as the languages have common roots.
      I would not expect it to work for idioms or anything where languages developed different concepts to describe things. It won't understand an Eskimo that talks about snow (they have 50 words for it).

    4. Re:I don't think so, tim by Anonymous Coward · · Score: 1

      That would be fine. The number of times I wanted a machine translated story in the past... I dunno, ever. 0.

      I guess it just means you are an uninteresting person with no taste or desire to know other cultures.

    5. Re:I don't think so, tim by Kjella · · Score: 1

      Well if you mean stories like novels not news stories, I agree. For any language the nuances and particularities will be lost in translation, even in human translations they sometimes have to explain some untranslatable words or concepts in a footnote. But I think they could do a lot better translating articles and blogs about subjects that address a broad audience and speak rather plainly in the native language. Often it still ends up being very awkward Yoda-isms and strange or incorrect choices of words, a machine translated NYT is not great. It's no more than okay-ish, often helped by people knowing a bit of English. Any truly foreign language like Russian, Chinese, Japanese etc. is still pretty bad and I'm assuming their experience with English is no better. And if you're translating from one minor language to the other it gets even crappier.

      --
      Live today, because you never know what tomorrow brings
  10. Google Translate? by Roger+W+Moore · · Score: 3, Interesting

    In order to go "bilingual", it would have to be able to understand one language first.

    Google translate can map between multiple languages without understanding any of them...which, admittedly, is why it does not do a great job but it is usually good enough to be reasonably understandable.

    1. Re:Google Translate? by jouassou · · Score: 4, Interesting

      It's good as long as all the languages are in the same language family, meaning that they share grammatical logic but have different vocabulary. But try translating English into a non-Indo-European language like Korean, with a fundamentally different way of expressing ideas, and it fails miserably. It's often not understandable at all.

      (For instance: English sentences require a subject in every sentence to be complete, meaning that you say "John is growing up" even though it's obvious who we're talking about. In Korean, you mention who you're talking about at the beginning, and then it's implicit from context until you start talking about someone else, so you drop the subject in following sentences. Machine learning systems so far don't understand this distinction, so translating from Korean to English they keep inventing people in the sentences, so that "is growing up" might become "Dave is growing up" or "Alice is growing up", even though no Dave or Alice has been mentioned in the previous sentences, while they were mentioned a few times in the training material.)

    2. Re:Google Translate? by next_ghost · · Score: 1

      Meanwhile, if you use Google to translate "He is warm" into Czech, you still get the blind idiot translation which actually means "He is gay".

    3. Re:Google Translate? by parkinglot777 · · Score: 1

      I completely agree with this. Asian languages (especially those of Southeast Asian countries) have different roots from Western languages. Culturally, the way people use the language is different from the West, even in the written style, which is more formal and uses complete sentences. It is even worse in the spoken style, because people often don't exactly follow the grammar yet are still understood by one another.

      Another point is that these languages usually encode politeness or rudeness toward the person you are talking to in the pronoun itself. English has no such pronouns. For example, the words "you" and "I" say nothing about your relationship with the person you are talking to, even though the tone of voice may change to express some of the relationship or emotion. Spanish and German have another pronoun replacing "you" for a person who is closer to the speaker ("tú" in Spanish and "du" in German). In Thai, for example, there are many pronouns expressing different relationships and even emotions toward the person or group you are talking to.

      Even worse, Thai people usually have a nickname that is the name of an animal. When software attempts to translate a sentence into English, it thinks the name is an animal, which changes the meaning completely. An example is a simple complete sentence in Thai that means "Nok (a person) goes to see Noy (another person)." "Nok" in Thai means bird (noun) and "Noy" means little (adjective). Google Translate will give you "little bird", which has nothing to do with the real meaning.

    4. Re:Google Translate? by Anonymous Coward · · Score: 0

      Straight buffs should never ask for blankets in the Czech Republic. People may think you want to get "warm".

    5. Re:Google Translate? by gweihir · · Score: 1

      It can map between words and sentences. It cannot map between languages. It has no grasp of semantics.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    6. Re:Google Translate? by next_ghost · · Score: 1

      Actually, here's what it means to "get a blanket" in Czech. Idioms are fun.

    7. Re:Google Translate? by SoftwareArtist · · Score: 1

      Have you tried it recently? Their old phrase based translations were terrible for Asian languages. Ask it to translate Japanese into English and you'd get garbage. Then they rolled out their new system based on neural networks, and it suddenly got a lot better. Not perfect, but now you can tell what it's saying. It's always easier translating between closely related languages, but the NNs are surprisingly good even for distant ones.

      --
      "I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
    8. Re:Google Translate? by Anonymous Coward · · Score: 0

      Google translate seems to map between English and other languages, not directly between those other languages. A few examples of mistranslations I get when translating German to Dutch (my native language):

      In a German article about Theresa May her last name is translated to "mei" several times. "Mei" is the fifth month, which in German is "Mai", not "May". "Mei" is a translation of the English word, not the Dutch.

      "Auch" as the first word in a newspaper article due to its typography is seen as "A uch". "A" is translated to "Een", which is a correct translation for the English article "a", but from German the translation makes no sense.

      "Rette sich" is translated to "bespaar jezelf". The correct translation would be "red jezelf". Both "red" and "bespaar" translate to English as "save", "red" in the meaning of rescuing and "bespaar" in the meaning of not wasting something. The wrong English homonym is used.

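      The "save" mix-up described above is what one would expect if translation is routed through English, and it can be reproduced with three tiny lookup tables. This is only a sketch of that hypothesis: the tables and function names are made up for the example and say nothing about how Google Translate is actually built.

```python
# Hypothetical word tables, hand-written for this example only.
GERMAN_TO_ENGLISH = {
    "retten": "save",   # save as in rescue
    "sparen": "save",   # save as in not waste
}
ENGLISH_TO_DUTCH = {
    "save": "besparen",  # a single entry has to pick one of the two senses
}
GERMAN_TO_DUTCH = {
    "retten": "redden",
    "sparen": "besparen",
}


def translate_via_pivot(word):
    """German -> English -> Dutch: the two German senses collapse into one English word."""
    return ENGLISH_TO_DUTCH[GERMAN_TO_ENGLISH[word]]


def translate_direct(word):
    """German -> Dutch without an intermediate language, so the senses stay apart."""
    return GERMAN_TO_DUTCH[word]


for german_word in ("retten", "sparen"):
    print(german_word, translate_via_pivot(german_word), translate_direct(german_word))
```

      Once both German verbs have become "save", nothing on the English side records which sense was meant, so the Dutch side can get at most one of them right.
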
    9. Re:Google Translate? by DRJlaw · · Score: 2

      (For instance: English sentences require a subject in every sentence to be complete...)

      Like hell.

      'eff you. /s

    10. Re:Google Translate? by Anonymous Coward · · Score: 0

      I tried it before I posted, and found those examples.

    11. Re:Google Translate? by Anonymous Coward · · Score: 0

      Sorry, I mixed up posts, I thought the question was about something else.

  11. Re:Jihad is an obligation of *ALL* Muslims by Anonymous Coward · · Score: 1

    Written down? That doesn't sound like what Muhammed intended...

  12. A cool idea, but that's how you get things. by stevenm86 · · Score: 2

    A neat idea, but this is how you get things like The Jedi Council turning into The Presbyterian Church.

    1. Re:A cool idea, but that's how you get things. by Anonymous Coward · · Score: 1

      I don't see the difference.

    2. Re:A cool idea, but that's how you get things. by Anne+Thwacks · · Score: 2

      I don't see the difference.

      Jedi have light sabres.

      --
      Sent from my ASR33 using ASCII
  13. Still Requires Data by Jezral · · Score: 4, Insightful

    These are very cool advances, but they don't solve the major problem of machine learning (ML): Having lots of data.

    While these approaches don't need bilingual corpora, they still need big monolingual corpora. Very few languages have those, and those that do usually also have bilingual corpora to one or more of the major world languages.

    This does lower the barrier to entry significantly for those doing ML machine translation. But, if one took the resources spent on gathering and curating corpora and instead invested in rule-based systems, you could get much further in less time.

    1. Re:Still Requires Data by serviscope_minor · · Score: 3, Informative

      Depends what you mean by "lots of data".

      This weakly supervised stuff is especially nice for NLP, since there are almost no large, general bilingual corpora. A few exist, but they're often the result of some legalistic process, so they cover something of a subset of language.

      There are a lot more languages with a lot of written text than there are language pairs with large amounts of correlated text.

      Also do you have any reason to think that rule-based systems would be better? A huge amount of work went into those in the past, and their capabilities seem tapped out. The other thing is what you mean by "much further". The point of this paper seems to me to push the bar on weakly supervised learning, rather than to get the best translation software ever.

      Very weakly supervised learning can do all sorts of cool things. See for example cyclegan the zebrifier (it turns pictures of horses into pictures of zebras).

      --
      SJW n. One who posts facts.
    2. Re:Still Requires Data by ShanghaiBill · · Score: 1

      While these approaches don't need bilingual corpora, they still need big monolingual corpora.

      Except that we have terabytes of unstructured and unlabeled monolingual text. You could train it on Wikipedia pages. In fact, there is an entire Library of Congress of data in ... the Library of Congress.

    3. Re:Still Requires Data by Anonymous Coward · · Score: 0

      These are very cool advances, but they don't solve the major problem of machine learning (ML): Having lots of data.

      While these approaches don't need bilingual corpora, they still need big monolingual corpora. Very few languages have those, and those that do usually also have bilingual corpora to one or more of the major world languages.

      This does lower the barrier to entry significantly for those doing ML machine translation. But, if one took the resources spent on gathering and curating corpora and instead invested in rule-based systems, you could get much further in less time.

      "Lots of data" is a bit of a misnomer. The amount of data needed depends on the problem being solved.

      For example - you wouldn't consider 4 books, at a 12th grade reading level, to be a lot of data; however - if you take those 4 books, and grab the same 4 in another language, you've just done a pretty good job at gathering the needed data to train your system to translate from one language to the other.

    4. Re:Still Requires Data by SoftwareArtist · · Score: 1

      But, if one took the resources spent on gathering and curating corpora and instead invested in rule-based systems, you could get much further in less time.

      Really? Why do you think that? Rule based is how all machine translation systems worked until just a few years ago. They worked, but not that great. And that's after decades of optimizing. Then the NMT systems came out and blew them out of the water.

      And building a monolingual corpus is pretty easy. Have a shelf of books written in that language? Great, scan them in. Maybe there's a newspaper with an archive of back issues. There you go, you're set. Way easier than a bilingual corpus, where someone has to translate everything by hand, and match up sentences between them.

      --
      "I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
  14. Can it decipher the Indus Valley script by Anonymous Coward · · Score: 2, Interesting

    Can it translate Linear A? Cretan hieroglyphic?

    1. Re:Can it decipher the Indus Valley script by Michael+Woodhams · · Score: 2

      That is what I was wondering. I'm betting the answer is "no". When you have very limited source material, and the correct translation of the source material is probably long lists of items like "3rd year, Nowhereville, 5 bushels wheat" I doubt this approach would get you anywhere.

      In every case which I am aware of, (hieroglyphs, Linear B, Mayan) decypherment of ancient scripts required that a close relative of the script language was known to the decypherers. (If anyone has counter examples, I'd love to know about them.) If the language of the script is completely extinct, we may never be able to decypher it.

      --
      Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
    2. Re:Can it decipher the Indus Valley script by Anonymous Coward · · Score: 1

      It wasn't until the discovery of the Rosetta Stone, that they were able to decipher Ancient Egyptian with confidence. They could guess what the symbols and glyphs meant but until there was some anchor point with all languages, they couldn't say for certain.

    3. Re:Can it decipher the Indus Valley script by next_ghost · · Score: 1

      In every case which I am aware of, (hieroglyphs, Linear B, Mayan) decypherment of ancient scripts required that a close relative of the script language was known to the decypherers. (If anyone has counter examples, I'd love to know about them.) If the language of the script is completely extinct, we may never be able to decypher it.

      Sumerian: Language isolate. Decyphered through Akkadian (Semitic language, related to modern Arabic and Hebrew) because both languages used the same cuneiform script which is (mostly) phonetic in nature.

      Etruscan: Believed to be part of the extinct Tyrsenian language family. Decyphered through Latin and Greek (both Indo-European languages) because Etruscan alphabet is the intermediate step between Greek and Latin alphabets.

      You don't need a related language, you only need some reference point for the phonology.

    4. Re:Can it decipher the Indus Valley script by Michael+Woodhams · · Score: 1

      Thank you

      --
      Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
  15. I call 'fake news' by mrthoughtful · · Score: 4, Insightful

    The assumption, that the world is the same, and languages are attached to it, lies at the bottom of the idea of this learning strategy. The example given - of 'table and chairs' demonstrates this. Most of these ideas belong to a 19th century eurocentric understanding of the world we live in. Modern neuroscience and other work points to the fact that the world we perceive is very much dominated by the language we use, and not the other way around.

    Concrete Example: For a large portion of the 19th-20th Century many Greeks measured distance in cigarettes - how many cigarettes I will smoke while travelling from one place to another. There is no cognate in English for this. Not only that, but the language usage indicates a specific timespan as well as cultural differences.

    "Idiom!" I hear you say. Consider cultures where there are many more tables than there are chairs - such as in Asia where most people sit on the floor or on cushions.

    "But there are some universals - we can still use those!" - generally, there are no universals, or so few that they are not worth talking about. Talk to an anthropologist about it. Not even the concept of 'mother' is a universal.

    --
    This comment was written with the intention to opt out of advertising.
    1. Re:I call 'fake news' by Anonymous Coward · · Score: 1

      Said someone who's probably never tried to create new knowledge. What you say is that it's not perfect. Indeed, the accuracy is much lower than the best attempt that has good data to learn from. But sure it's a new result, and something that can be useful.

    2. Re:I call 'fake news' by Anonymous Coward · · Score: 0

      "There's a 0.000001% chance that this generalisation is not valid, therefore it's useless".

    3. Re:I call 'fake news' by Anonymous Coward · · Score: 0

      Concrete Example: For a large portion of the 19th-20th Century many Greeks measured distance in cigarettes - how many cigarettes I will smoke while travelling from one place to another. There is no cognate in English for this. Not only that, but the language usage indicates a specific timespan as well as cultural differences.

      We (Greeks) still "measure distance in cigarettes"! I am a typical middle-aged Greek, and in the past few months I had at least a couple of my friends on separate occasions telling me "the place we are going is just a cigarette away from where we are" when meeting them - I don't use this distance/time unit (it's like the "light year(s)" thing...) often because I smoke a pipe!

      We have so many references to this "cigarette distance/time unit" in our culture that you don't need Artificial Intelligence to understand why anti-smoking campaigns don't work well in Greece - e.g. a song by (probably) our most respected singer, titled (very roughly translated) "i wish pain was... one cigarette distance"!

      Are you a Greek? If not... well done for a barbarian!

    4. Re:I call 'fake news' by Baron_Yam · · Score: 1

      Everything you describe sounds like a feature to me, not a bug. Such a system would not only translate language, but culture.

      For common speech, this is an incredible advancement. Sure, you'll run into trouble when you specifically want a chair and the local custom is to sit on cushions... but when you're asking which 'chair' to sit on it'll work just fine and you'll figure it out when you're about to sit.

      For a large portion of the 19th-20th Century many Greeks measured distance in cigarettes - how many cigarettes I will smoke while travelling from one place to another.

      There is no cognate in English for this.

      You've never described a car trip in gas tanks? It's rare, but it happens, especially when planning longer trips.

    5. Re:I call 'fake news' by Anonymous Coward · · Score: 0

      In the USA you might say someone lives "3 hours away", but in the UK people would possibly try to work out what country that was based on timezone.

    6. Re:I call 'fake news' by Anonymous Coward · · Score: 0

      [...] Sure, you'll run into trouble when you specifically want a chair and the local custom is to sit on cushions... but when you're asking which 'chair' to sit on it'll work just fine and you'll figure it out when you're about to sit. [...]

      As the Greek that described how we Greeks still use the "cigarette distance/time unit", i wonder what happens when you ask to "sit on a table"... we Greeks, when, for example, go to a restaurant/bar, almost always say "lets sit on this table" (NOT chair(s)) - a clip from a Greek comedy film titled "tables and chairs" :

      jealous husband - who was sitting in this table?

      retarded waiter - none in the table... people don't sit in the tables!

      jealous husband - o.k., o.k., in the chairs... who was sitting in the chairs?

    7. Re:I call 'fake news' by Anonymous Coward · · Score: 0

      Not even sure what you're claiming. Are you claiming that the machine would never learn how to make the translation from miles/seconds to cigarettes? Sure, the way things are talked about is different, but time and distance are independent from the languages we use to describe them. You seem like those people who after AlphaGo won against a pro player with a 5 stone handicap claimed that it would still never learn how to beat a pro in an even game. It all looks impossible to you because you can't imagine how it could be so, but that seems like a lack of imagination on your part and not a proof that it won't happen.

      What we have here is what's called an "argument from ignorance".

    8. Re:I call 'fake news' by Anonymous Coward · · Score: 0

      In the USA you might say someone lives "3 hours away"

      So, about a packet of cigarettes away...
      Greetings from Greece - with a song called "in my empty cigarette packet"

  16. Great all set for ARRIVAL by Anonymous Coward · · Score: 0

    http://www.imdb.com/title/tt2543164/

  17. "Preprint" not "e-print" please. by Anonymous Coward · · Score: 0

    Thanks for the attempt to be careful not to inappropriately imply the articles have been published. Still, the term "e-print" does suggest they were. This is why the word "preprint" is more appropriate.

    1. Re: "Preprint" not "e-print" please. by Anonymous Coward · · Score: 0

      Stop telling smart people they're using terms of art wrongly, because you disagree with them or don't understand them.

  18. Understanding by DrYak · · Score: 3, Informative

    "Understanding" has multiple levels.

    Even you, dear snowflake, don't have the level of understanding of a language that a renowned writer and poet could have of its intricacies.
    Or, you only have a vague grasp of some concepts in a field of work outside of yours, whereas an expert in the field has a much better understanding.
    Even the pets (cats, dogs) in your house can have some basic understanding of things around, even if they don't think in such abstract concepts as you.

    This software, due to the way it's built (basically word2vec and a deep neural net), has some very basic form of understanding of the language.
    It's a very simple artificial brain that is entirely optimised for one specific subdomain (language) and thus completely lacks other forms of thinking (it cannot discuss a scientific article written in said language).

    But the way this system works is that it is able to implicitly and autonomously build relationships between things.
    The kind of knowledge built into some ontology databases, except that here, the knowledge isn't manually constructed by the scientist filling the database, the knowledge is discovered on the go, not unlike how very young babies would discover the world around them.
    Okay, it's a very stupid and limited baby in this case, but still.
    It's good enough to catch and understand links between concepts.
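
    To make the idea of relationships that are "discovered" rather than hand-entered a bit more concrete, here is a minimal sketch. It is not the method from either paper and it is not word2vec itself: it only counts which words appear near each other in a tiny made-up corpus, then compares the resulting count vectors with cosine similarity. The corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: nothing here labels "cat" and "dog" as related.
corpus = [
    "the cat sat on the chair",
    "the dog sat on the chair",
    "the cat ate the fish",
    "the dog ate the meat",
]

vecs = cooccurrence_vectors(corpus)
print("cat vs dog:  ", cosine(vecs["cat"], vecs["dog"]))
print("cat vs chair:", cosine(vecs["cat"], vecs["chair"]))
```

    In this toy corpus "cat" and "dog" turn up in the same contexts, so they score as more similar to each other than either does to "chair", even though nobody told the program they were both animals. Real systems learn dense vectors from far more text, but the underlying signal is the same kind of co-occurrence.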

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Understanding by Anonymous Coward · · Score: 0

      AI is not knowledgeable - it cannot have awareness - awareness is consciousness. Awareness cannot be provided by attaching a camera, utilizing some computer vision and machine learning, and then turning the system on - this is not perception - this is the appearance of perception - it is faux awareness; which means it is faux knowledge.

    2. Re:Understanding by Anonymous Coward · · Score: 0

      > AI is not knowledgeable - it cannot have awareness - awareness is consciousness.

      You've made a ton of unfounded assumptions.
      First define consciousness and prove that you or anything or anyone is conscious. Then, prove that the machine is not. Then prove that one must be conscious to be knowledgeable.

      And then forget all of that because it does not matter if this is "faux knowledge" or "real knowledge" as you have defined them if one is indistinguishable from the other from the outside.

    3. Re:Understanding by gweihir · · Score: 1

      And fail. (Well, what do you expect from a cretin that calls people "snowflake" without any good reason...)

      Even a smarter pet (a dog, for example) has some understanding and model of the real world and can map language to that model and can make (limited) predictions because it feels like it. An artificial neural net has nothing like that. It just has statistical classification and that is not enough for a world-model of even the most simple type, regardless of how "deep" you make it.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:Understanding by gweihir · · Score: 2

      While that is not reliably known at this time, it very much looks like it. In particular, for basically everything that you can do with technology, things start to make more sense the more you know when you get remotely in the area where you can think about actually doing it. Details get more complex, but the general working of a thing is understood at that time. With consciousness and intelligence, it is currently the other way round. We have absolutely no clue how they are generated, whether they are generated and what their nature is. We can only describe their effect to a limited degree. And we can only observe them together, making things even more obscure.

      That means either they cannot be created with technology, or we are very very far from being able to do so.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    5. Re:Understanding by Anonymous Coward · · Score: 0

      We don't even know why humans are "aware". We can't even give a good, non-circular definition of consciousness. And yet you are willing to start categorising what is and is not aware/conscious? For all you know consciousness - whatever it is - is a universal of all matter in varying degrees.

    6. Re:Understanding by Maxo-Texas · · Score: 1

      Yes, but it's still being done by a computer... so it will never be "Real A.I. (tm)"

      In 10 years from now when we are composing A.I.'s out of multiple A.I.'s like this one, it still won't be "real A.I. (tm)" even if it can completely replace 38% of human workers leaving them unemployable because they are not smart enough or lack the willpower to outperform "Not Real A.I.'s (tm)" even with additional- completely free- training.

      Right now, today with "not real A.I.'s" we are looking at 38% of jobs going away in less than a generation.

      I was at my electronics store last week. The "person" who directs you to the next open checkout stand is now an A.I. which works 2 shifts a day, 7 days a week. But.. it's okay.. it isn't a "real a.i. (tm)."

      ---
      Bonus round...
      The part of the human brain that drives cars isn't "real intelligence". Indeed, many people are in a semi-trance or thinking about other things while it autonomously drives the car.

      The part of the brain that does math isn't "real intelligence".

      No part of intelligence is "real intelligence. " What people are really talking about is consciousness which may be an emergent behavior of multiple "non-intelligent" or less intelligent brain subsystems.

      Particularly fascinating are humans with damage to the amygdala. They show how human intelligence is composed of multiple parts.

      A conscious A.I. is a potential extinction level event and we are being extremely careless in working on A.I. It should be air gapped from the internet with analog power limitations. People working on it should be observed remotely by other people for odd communication and behavior. And we don't do that so the most likely outcome is that an amoral intelligence which is vastly more intelligent than us will easily transition to the wild. If it follows the "human" model- it will view us as resources the way we view other species as resources.

      Don't get me wrong- it won't happen tomorrow- and may not happen ever. But it might happen a decade or two from now. And we aren't even taking basic precautions.

      OTOH, it increasingly looks like we missed our chance to get off this rock- so maybe having an A.I. as the legacy of the human race wouldn't be that bad.

      --
      She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
    7. Re:Understanding by gweihir · · Score: 1

      Indeed. All we have is claims by people to have consciousness. We have absolutely no clue what it is, yet it seems every reasonably functional human being finds it has it, or at least claims so. At the same time, there is no mechanism for consciousness in known Physics. For example, pseudo-profound bullshit like "consciousness is an illusion" is circular because an illusion needs consciousness. At the very least we need a fundamental extension of physics to accommodate that, but, as in Physics matter, energy, particles, waves, etc. have no identity, it is also possible that Physics does not even apply and the theory itself is entirely unsuitable to describe consciousness, as that very much seems to be tied to a unique identity. The interesting thing would then be to find out where the interface is, as consciousness in humans can observe and influence Physical reality and a human brain is certainly a physical object. Of course, physics cannot describe living matter at this time either, the problem may already start there.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    8. Re:Understanding by SoftwareArtist · · Score: 1

      That means either they cannot be created with technology, or we are very very far from being able to do so.

      Or maybe it just means they're bogus concepts.

      People have argued for centuries about what "consciousness" and "intelligence" mean, and they still can't agree. So engineers roll their eyes, turn their backs on the argument, and get on with the job of creating useful things. And then people say, "It's not really intelligent! It's not really conscious!" Well, who's to say? If you can't define what the words mean, it's impossible to decide whether AIs meet the definitions. So give us some rigorous definitions of what you mean by the words "intelligent" and "conscious". Then (and only then) we can talk about how to build a machine that has those properties.

      "A word whose meaning isn't defined" is not the same thing as "a concept we don't understand".

      --
      "I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
  19. Water cooler by Anonymous Coward · · Score: 0

    This reminds me of water cooler conversations between me (working on graph theory) and a colleague working on clustering of words in language 15 years ago. We didn't implement anything as we were already busy, and idiom seemed to be a harder problem to crack as it doesn't work at such a low level of granularity.

    1. Re:Water cooler by Anonymous Coward · · Score: 0

      One of the issues, discovered by my colleague, is that there is no single language as such: legalese is similar to, but not necessarily the same language, in some senses, as teenage street talk.

  20. You insensitive clod! by Anonymous Coward · · Score: 0

    You've never described a car trip in gas tanks?

    I drive a Tesla!

  21. Stop worrying about AIs... by Anonymous Coward · · Score: 0

    The AI book that everyone should get is available for pre-order (April 23, 2018). "Artificial Intelligence For Dummies" by John Paul Mueller and Luca Massaron.

  22. Try it on the Voynich Manuscript by Anonymous Coward · · Score: 0

    I would like to see the results of feeding the Voynich Manuscript into an algorithm like this and "translating" it to English. The manuscript is limited in length so the chance anything decipherable results is low.

  23. Al who? by Jaegs · · Score: 1

    Who is Al, and why does it matter if he's bilingual?

    #serifisimportant

  24. simple thermodynamics by epine · · Score: 1

    Anyone who understands that there was a lot more to Bletchley Park than rotor combinatorics can't honestly say they find this result surprising.

    Especially when the languages chosen have a shocking degree of family resemblance.

    No word for "I" or "me" or "mine"

    It isn't because the Vietnamese are not passionate. Rather, there is no word for "I" or "you" in colloquial Vietnamese.

    People address each other according to their relative ages: "anh" for older brother, "chi" for older sister, "em" for younger sibling and so on. This is why Vietnamese quickly ask strangers how old they are so that they can use the appropriate pronoun and treat them with the correct amount of respect.

    So a typical declaration of love might be: "Older brother loves younger sister."

    From pronouns and proper nouns, quickly one identifies words associated with being a person, and immediately there's an enormous cluster of classifications and modifiers in any language especially dealing with human traits, not the least of which concerns hierarchy (mother, father, sister, brother) and age structure (baby, toddler, child, youth, adult, senior, geriatric).

    Pretty soon you're into affect and habit, such as shivering while shovelling the driveway of the white snow, then contentedly taking a long, hot bath.

    Simple thermodynamics.

  25. Universal Translator by psnyder · · Score: 1

    What's exciting (to me) is that this method is what's necessary for the universal translators in Star Trek / other Sci-Fi to actually work. In Star Trek: Enterprise, for example, their universal translator had to listen to a lot of alien speech as it would gradually make phrases more and more understandable. We still have a long way to go, but this methodology brings that dream closer.

  26. Dolphins by Anonymous Coward · · Score: 0

    Can someone do something actually useful and create a translator for Dolphin? We could learn so much from them, and that would be super cool!