Google's AI Translation Tool Creates Its Own Secret Language (techcrunch.com)
After a little over a month of learning more languages to translate beyond Spanish, Google's recently announced Neural Machine Translation system has used deep learning to develop its own internal language. TechCrunch reports: GNMT's creators were curious about something. If you teach the translation system to translate English to Korean and vice versa, and also English to Japanese and vice versa... could it translate Korean to Japanese, without resorting to English as a bridge between them? They made this helpful GIF to illustrate the idea of what they call "zero-shot translation" (it's the orange one). As it turns out -- yes! It produces "reasonable" translations between two languages that it has not explicitly linked in any way. Remember, no English allowed. But this raised a second question. If the computer is able to make connections between concepts and words that have not been formally linked... does that mean that the computer has formed a concept of shared meaning for those words, meaning at a deeper level than simply that one word or phrase is the equivalent of another? In other words, has the computer developed its own internal language to represent the concepts it uses to translate between other languages? Based on how various sentences are related to one another in the memory space of the neural network, Google's language and AI boffins think that it has. The paper describing the researchers' work (primarily on efficient multi-language translation but touching on the mysterious interlingua) can be read at arXiv.
That would be nice if translating sentences were the same as looking up words in a dictionary. It's not. So pointing out that some words have correspondences is meaningless.
Languages have a fuzzy haze of concepts and ways to parse them. In English I could say "I feel sick" or "I am sick", and they're not the same: the latter expresses certainty. But in Icelandic you'd generally say "Ég er lasin(n)" or "Ég er veik(ur)", i.e. "I am sick", for both of them, not "I feel sick". You *can* say "I feel as if I'm sick", but that carries a connotation of doubting yourself, more than "I feel sick" does in English. That sentence is "Mér líður eins og ég sé veik(ur)", which is literally "Me (dative, not nominative) feels same and I would-be (pres.) sick (form depends on gender)". There's an awful lot going on in there that a word-for-word translation just doesn't catch. Even if you catch phrases, like "eins og" -> "like" rather than literally "same and", you still don't have anything close to a one-to-one mapping.
And here we're talking two Germanic languages.
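To make the dictionary-lookup point concrete, here's a toy sketch in Python. The glossary and phrase table are hypothetical, built only from the literal glosses in the Icelandic example above. It is not a real dictionary and has nothing to do with how GNMT works; it just shows what naive lookup produces.

```python
# Toy glossary taken from the literal gloss in the comment above.
# Hypothetical and deliberately tiny; not a real Icelandic dictionary.
GLOSS = {
    "mér": "me",
    "líður": "feels",
    "eins": "same",
    "og": "and",
    "ég": "I",
    "sé": "would-be",
    "veik": "sick",
}

# Hypothetical two-word phrase table with the one phrase mentioned above.
PHRASES = {("eins", "og"): "like"}


def word_for_word(sentence: str) -> str:
    """Translate by looking up every word independently."""
    return " ".join(GLOSS.get(w.lower(), f"<{w}>") for w in sentence.split())


def with_phrases(sentence: str) -> str:
    """Same lookup, but try a two-word phrase match before single words."""
    words = [w.lower() for w in sentence.split()]
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            out.append(PHRASES[pair])
            i += 2
        else:
            out.append(GLOSS.get(words[i], f"<{words[i]}>"))
            i += 1
    return " ".join(out)


sentence = "Mér líður eins og ég sé veik"
literal = word_for_word(sentence)
phrased = with_phrases(sentence)
print(literal)   # me feels same and I would-be sick
print(phrased)   # me feels like I would-be sick
```

Neither output is the natural English "I feel as if I'm sick", and nothing in either table captures the case, mood, or connotation differences described above.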
A neural net that can handle translations in a way where the results aren't terrible must have a concept of the fuzziness, the interplay of how different concepts are presented in different languages. And indeed, that's what the graphic that they show seems to suggest, where you have these branching clusters with varying pathways that dart between them for different languages. Perhaps calling that internal representation a "secret language" is a stretch, but it's most definitely nothing like having "English as a bridge language".
Wingus, Dingus! Listen up!