Slashdot Mirror


AI Goes Bilingual -- Without a Dictionary (sciencemag.org)

sciencehabit shares a report from Science Magazine: Automatic language translation has come a long way, thanks to neural networks -- computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts -- a surprising advance that could make documents in many languages more accessible.

The two new papers, both of which have been submitted to next year's International Conference on Learning Representations but have not been peer reviewed, focus on another method: unsupervised machine learning. To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That's possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voila! You have a bilingual dictionary.

The studies -- "Unsupervised Machine Translation Using Monolingual Corpora Only" and "Unsupervised Neural Machine Translation" -- were both submitted to the e-print archive arXiv.org.
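
To make the "overlay one atlas on another" idea concrete, here is a minimal toy sketch, not the method from either paper. It assumes each word is a point on a flat 2-D map, that the second language's map is just the first one rotated, and that a brute-force search over whole-degree rotations is good enough. The coordinates, the `gold` word list, and every function name are invented for illustration; real systems work with learned vectors in hundreds of dimensions and use far more sophisticated optimisation.

```python
import math

# Toy 2-D "atlas": invented coordinates, not real word embeddings.
english = {
    "table": (1.0, 0.0),
    "chair": (0.9, 0.3),
    "dog": (-0.5, 1.0),
    "cat": (-0.7, 0.8),
    "river": (0.2, -1.0),
}

# Hypothetical answer key, used only to build the toy data and check the result.
gold = {"table": "table", "chair": "chaise", "dog": "chien",
        "cat": "chat", "river": "rivière"}


def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise around the origin."""
    x, y = point
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


# Assumption: the French atlas is the English one turned by 40 degrees,
# with different names on the cities.
french = {gold[word]: rotate(point, 40) for word, point in english.items()}


def overlay_cost(degrees):
    """Total distance from each rotated English word to its nearest French word."""
    return sum(
        min(math.dist(rotate(p, degrees), q) for q in french.values())
        for p in english.values()
    )


# No word pairs are supplied: try every whole-degree rotation, keep the best fit.
best = min(range(360), key=overlay_cost)


def nearest_french(word):
    """Move an English word onto the French atlas and read off the closest name."""
    moved = rotate(english[word], best)
    return min(french, key=lambda f: math.dist(moved, french[f]))


dictionary = {word: nearest_french(word) for word in english}
print(best, dictionary)
```

The search never sees `gold`; it only compares the shapes of the two maps, which is the sense in which no human teacher is involved. The toy works because the second map is an exact rotation of the first, which real languages are not.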

3 of 99 comments

  1. Google Translate? by Roger W Moore · Score: 3, Interesting

    In order to go "bilingual", it would have to be able to understand one language first.

    Google Translate can map between multiple languages without understanding any of them, which, admittedly, is why it does not do a great job, but it is usually good enough to be reasonably understandable.

    1. Re:Google Translate? by jouassou · Score: 4, Interesting

      It's good as long as all the languages are in the same language family, meaning that they share grammatical logic but have different vocabulary. But try translating English into a non-Indo-European language like Korean, which has a fundamentally different way of expressing ideas, and it fails miserably; the output is often not understandable at all.

      (For instance: an English sentence needs an explicit subject to be complete, so you say "John is growing up" even when it's obvious who we're talking about. In Korean, you mention who you're talking about at the beginning, and from then on the subject is implicit from context until you start talking about someone else, so you drop it in the following sentences. Machine learning systems so far don't handle this distinction, so when translating from Korean to English they keep inventing people: "is growing up" might become "Dave is growing up" or "Alice is growing up", even though no Dave or Alice appeared in the previous sentences; the names merely came up a few times in the training material.)

  2. Can it decipher the Indus Valley script by Anonymous Coward · Score: 2, Interesting

    Can it translate Linear A? Cretan hieroglyphic?