AI Goes Bilingual -- Without a Dictionary (sciencemag.org)
sciencehabit shares a report from Science Magazine: Automatic language translation has come a long way, thanks to neural networks -- computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts -- a surprising advance that could make documents in many languages more accessible.
The two new papers, both of which have been submitted to next year's International Conference on Learning Representations but have not been peer reviewed, focus on another method: unsupervised machine learning. To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That's possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voila! You have a bilingual dictionary. The studies -- "Unsupervised Machine Translation Using Monolingual Corpora Only" and "Unsupervised Neural Machine Translation" -- were both submitted to the e-print archive arXiv.org.
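A toy illustration of the "overlay one atlas on another" idea, not the method of either paper: assume each word already has a position on a 2-D map, and that the second language's map is the same map rotated and relabeled. All coordinates, the word lists, and the helper names `rotate`, `signature`, and `induce_dictionary` are made up for this sketch. Each word is matched by its sorted distances to every other word, which do not change when the map is rotated.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point around the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def signature(word, atlas):
    """Sorted distances from `word` to every other word (rotation-invariant)."""
    return sorted(math.dist(atlas[word], p) for w, p in atlas.items() if w != word)

def induce_dictionary(atlas_a, atlas_b):
    """Pair each word in atlas_a with the atlas_b word whose signature is closest."""
    sigs_b = {w: signature(w, atlas_b) for w in atlas_b}
    dictionary = {}
    for word in atlas_a:
        sig_a = signature(word, atlas_a)
        dictionary[word] = min(
            sigs_b,
            key=lambda w: sum((a - b) ** 2 for a, b in zip(sig_a, sigs_b[w])),
        )
    return dictionary

# Hypothetical English "atlas": related words sit near each other.
english = {
    "table": (1.0, 0.2),
    "chair": (1.3, 0.5),
    "dog": (-2.0, 1.0),
    "cat": (-2.4, 1.7),
    "sun": (0.5, -3.0),
}

# Hypothetical French atlas: same layout, rotated, with different names.
# The name mapping is used ONLY to build the toy data, never by the matcher.
french_names = {"table": "table", "chair": "chaise", "dog": "chien",
                "cat": "chat", "sun": "soleil"}
french = {french_names[w]: rotate(p, 0.7) for w, p in english.items()}

dictionary = induce_dictionary(english, french)
print(dictionary)
```

Real word maps have hundreds of dimensions, are noisy, and only roughly resemble each other across languages, so this exact-match trick would not survive contact with real data; it only shows why similar shapes make a dictionary recoverable without parallel text.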
Who is Al? Weird Al Yankovic?
fucking millennials
Yet published on Slashdot because it centers around a buzzword.
In order to go "bilingual", it would have to be able to understand one language first. However, understanding natural language is so far beyond the demented automation ("weak AI") available today, it is not even funny anymore. May as well claim a squirrel is a "gourmet chef", because it can bury nuts, i.e. "process food". Whether actual intelligence is ever going to be available on machines is at this time completely unknown, because nobody knows what it is. It is pretty clear though that the only natural computing hardware known (the human brain) is not powerful enough to create the intelligence observable at the interface of the smartest instances, at least if any known computing paradigm is assumed to be how it works. So either a completely new computing paradigm is needed (and no, "neural" nets will not cut it and they are really old), or the problem is even more complicated.
The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit. Just look at who people vote for.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I won't find a single data dictionary? No hashmaps, nothing, zilch? Yeah right!
This honestly sounds more like old-school cryptography than any kind of language analysis. None of the intent of the language is there, just the (likely) meaning of its words. And even then when you factor in things like figures of speech, metaphors and cultural references you're way off in left field in terms of figuring out what is being meant versus what is being said.
n/t
Have gnu, will travel.
I've been learning Japanese for about 2 years, using SRS and reading. I can tell you these systems will be great for instructions on assembling a desk, or how to check your oil. Totally useless for storytelling. Anything containing references, jokes, wordplay, hell, even pronouns, where English just doesn't have as many, will always be a compromise.
In order to go "bilingual", it would have to be able to understand one language first.
Google Translate can map between multiple languages without understanding any of them... which, admittedly, is why it does not do a great job, but it is usually good enough to be reasonably understandable.
Written down? That doesn't sound like what Muhammed intended...
A neat idea, but this is how you get things like The Jedi Council turning into The Presbyterian Church.
These are very cool advances, but they don't solve the major problem of machine learning (ML): Having lots of data.
While these approaches don't need bilingual corpora, they still need big monolingual corpora. Very few languages have those, and those that do usually also have bilingual corpora to one or more of the major world languages.
This does lower the barrier to entry significantly for those doing ML machine translation. But, if one took the resources spent on gathering and curating corpora and instead invested in rule-based systems, you could get much further in less time.
Can it translate Linear A? Cretan hieroglyphs?
The assumption, that the world is the same, and languages are attached to it, lies at the bottom of the idea of this learning strategy. The example given - of 'table and chairs' demonstrates this. Most of these ideas belong to a 19th century eurocentric understanding of the world we live in. Modern neuroscience and other work points to the fact that the world we perceive is very much dominated by the language we use, and not the other way around.
Concrete Example: For a large portion of the 19th-20th Century many Greeks measured distance in cigarettes - how many cigarettes I will smoke while travelling from one place to another. There is no cognate in English for this. Not only that, but the language usage indicates a specific timespan as well as cultural differences.
"Idiom!" I hear you say. Consider cultures where there are many more tables than there are chairs - such as in Asia where most people sit on the floor or on cushions.
"But there are some universals - we can still use those!" - generally, there are no universals, or so few that they are not worth talking about. Talk to an anthropologist about it. Not even the concept of 'mother' is a universal.
This comment was written with the intention to opt out of advertising.
http://www.imdb.com/title/tt2543164/
Thanks for the attempt to be careful not to inappropriately imply the articles have been published. Still, the term "e-print" does suggest they were. This is why the word "preprint" is more appropriate.
"Understanding" has multiple level.
Even you, dear snowflake, don't have the level of understanding of a language that a renowned writer and poet could have of its intricacies.
Or, you only have a vague grasp of some concepts in a field of work outside of yours, whereas somebody expert in the field has a much better understanding.
Even the pets (cats, dogs) in your house can have some basic understanding of things around, even if they don't think in such abstract concepts as you.
This software, due to the way it's built (basically word2vec and a deep neural net), has some very basic form of understanding of the language.
It's a very simple artificial brain that is entirely optimised for one specific subdomain (language) and thus completely lacks other forms of thinking (it cannot discuss a scientific article written in said language).
But the way this system works is that it is able to implicitly and autonomously build relationships between things.
The kind of knowledge built into some ontology databases, except that here, the knowledge isn't manually constructed by the scientist filling the database, the knowledge is discovered on the go, not unlike how very young babies would discover the world around them.
Okay, it's a very stupid and limited baby in this case, but still.
It's good enough to catch and understand links between concepts.
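To make "implicitly build relationships between things" concrete, here is a minimal count-based sketch. It is not word2vec and not the software from the papers: it just counts which words appear near each other in a tiny made-up corpus and compares the resulting count vectors with cosine similarity. The corpus and the helper names `cooccurrence_vectors` and `cosine` are invented for the example.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Made-up corpus: nobody tells the program that cats and dogs are both animals.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the dog chases the cat",
    "the engine burns fuel",
    "the motor burns fuel",
]

vectors = cooccurrence_vectors(corpus)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_engine = cosine(vectors["cat"], vectors["engine"])
print(f"cat~dog: {cat_dog:.2f}  cat~engine: {cat_engine:.2f}")
```

On this corpus "cat" ends up closer to "dog" than to "engine" purely from the company each word keeps, which is the kind of link the comment above is describing, at a vastly smaller scale.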
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
This reminds me of water cooler conversations between me (working on graph theory) and a colleague working on clustering of words in language 15 years ago. We didn't implement anything as we were already busy, and idiom seemed to be a harder problem to crack as it doesn't work at such a low level of granularity.
You've never described a car trip in gas tanks?
I drive a Tesla!
The AI book that everyone should get is available for pre-order (April 23, 2018). "Artificial Intelligence For Dummies" by John Paul Mueller and Luca Massaron.
I would like to see the results of feeding the Voynich Manuscript into an algorithm like this and "translating" it to English. The manuscript is limited in length so the chance anything decipherable results is low.
Who is Al, and why does it matter if he's bilingual?
#serifisimportant
Anyone who understands that there was a lot more to Bletchley Park than rotor combinatorics can't honestly say they find this result surprising.
Especially when the languages chosen have a shocking degree of family resemblance.
No word for "I" or "me" or "mine"
From pronouns and proper nouns, quickly one identifies words associated with being a person, and immediately there's an enormous cluster of classifications and modifiers in any language especially dealing with human traits, not the least of which concerns hierarchy (mother, father, sister, brother) and age structure (baby, toddler, child, youth, adult, senior, geriatric).
Pretty soon you're into affect and habit, such as shivering while shovelling the driveway of the white snow, then contentedly taking a long, hot bath.
Simple thermodynamics.
What's exciting (to me) is that this method is what's necessary for the universal translators in Star Trek / other Sci-Fi to actually work. In Star Trek: Enterprise, for example, their universal translator had to listen to a lot of alien speech as it would gradually make phrases more and more understandable. We still have a long way to go, but this methodology brings that dream closer.
Can someone do something actually useful and create a translator for Dolphin? We could learn so much from them, and that would be super cool!