Slashdot Mirror


AI Goes Bilingual -- Without a Dictionary (sciencemag.org)

sciencehabit shares a report from Science Magazine: Automatic language translation has come a long way, thanks to neural networks -- computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts -- a surprising advance that could make documents in many languages more accessible.

The two new papers, both of which have been submitted to next year's International Conference on Learning Representations but have not been peer reviewed, focus on another method: unsupervised machine learning. To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That's possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voila! You have a bilingual dictionary.
The studies -- "Unsupervised Machine Translation Using Monolingual Corpora Only" and "Unsupervised Neural Machine Translation" -- were both submitted to the e-print archive arXiv.org.
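The "overlay one atlas on another" step can be sketched with a toy example. This is not either paper's method: it uses a classical orthogonal Procrustes fit, and it assumes the word pairing between the two languages is already known, which is exactly what the unsupervised systems have to do without. The embeddings are synthetic, and every name here (`procrustes_align`, `nearest_neighbours`, the toy sizes) is made up for illustration.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix W minimising ||X @ W - Y|| (Frobenius norm)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def nearest_neighbours(A, B):
    """For each row of A, return the index of the most cosine-similar row of B."""
    a = A / np.linalg.norm(A, axis=1, keepdims=True)
    b = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)

rng = np.random.default_rng(0)

# Hypothetical "language A" map: 50 words, each a point in 5 dimensions.
X = rng.normal(size=(50, 5))

# Hypothetical "language B" map: the same layout, rotated and slightly noisy.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
Y = X @ Q + 0.01 * rng.normal(size=(50, 5))

# Assumption: row i of X and row i of Y are translations of each other.
W = procrustes_align(X, Y)

# Overlay map A on map B and read off the induced "dictionary".
matches = nearest_neighbours(X @ W, Y)
accuracy = (matches == np.arange(len(X))).mean()
print(f"fraction of words matched to their translation: {accuracy:.2f}")
```

Real word embeddings from two languages are only roughly similar in shape, so the match is much noisier than in this synthetic case.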

2 of 99 comments

  1. Re:Still Requires Data by serviscope_minor · Score: 3, Informative

    Depends what you mean by "lots of data".

    This weakly supervised stuff is especially nice for NLP, since there are almost no large, general bilingual corpora. A few exist, but they're often the result of some legalistic process, so they cover only a subset of the language.

    There are a lot more languages with a lot of written text than there are language pairs with large amounts of correlated text.

    Also, do you have any reason to think that rule-based systems would be better? A huge amount of work went into those in the past, and their capabilities seem tapped out. The other thing is what you mean by "much further". The point of this paper seems to me to be to push the bar on weakly supervised learning, rather than to get the best translation software ever.

    Very weakly supervised learning can do all sorts of cool things. See for example CycleGAN, the zebrifier (it turns pictures of horses into pictures of zebras).

    --
    SJW n. One who posts facts.
  2. Understanding by DrYak · Score: 3, Informative

    "Understanding" has multiple levels.

    Even you, dear snowflake, don't have the level of understanding of a language that a renowned writer and poet could have of its intricacies.
    Or, you only have a vague grasp of some concepts in a field of work outside of yours, whereas an expert in that field has a much better understanding.
    Even the pets (cats, dogs) in your house can have some basic understanding of the things around them, even if they don't think in such abstract concepts as you do.

    This software, due to the way it's built (basically word2vec and a deep neural net), has some very basic form of understanding of the language.
    It's a very simple artificial brain that is entirely optimised for one specific subdomain (language) and thus completely lacks other forms of thinking (it cannot discuss a scientific article written in said language).

    But the way this system works is that it is able to implicitly and autonomously build relationships between things.
    It's the kind of knowledge built into some ontology databases, except that here the knowledge isn't manually constructed by a scientist filling the database. It is discovered on the go, not unlike how very young babies discover the world around them.
    Okay, it's a very stupid and limited baby in this case, but still.
    It's good enough to catch and understand links between concepts.
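
As a rough illustration of how links between words can be picked up from raw text alone, here is a minimal sketch. It uses plain co-occurrence counts rather than word2vec itself, and the corpus, window size, and function names are all invented for the example.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus; no labels or dictionary are provided.
corpus = [
    "the cat sat on the chair",
    "the dog sat on the chair",
    "the cat ate the fish",
    "the dog ate the meat",
    "we put the plate on the table",
    "we put the cup on the table",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_table = cosine(vecs["cat"], vecs["table"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~table: {sim_cat_table:.3f}")
```

In this toy corpus "cat" and "dog" appear in the same contexts, so they score as more similar to each other than "cat" and "table" do, even though nothing told the program that both are animals.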

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]