Microsoft Announces Breakthrough In Chinese-To-English Machine Translation (techcrunch.com)
A team of Microsoft researchers announced on Wednesday that they've created the first machine translation system capable of translating news articles from Chinese to English with the same accuracy as a person. "The company says it's tested the system repeatedly on a sample of around 2,000 sentences from various online newspapers, comparing the result to a person's translation in the process -- and even hiring outside bilingual language consultants to further verify the machine's accuracy," reports TechCrunch. From the report: The sample set, called newstest2017, was released just last fall at the research conference WMT17. Deep neural networks, a method of training A.I. systems, allowed the researchers to create more fluent and natural-sounding translations that take into account broader context than the prior approach, called statistical machine translation. Microsoft's researchers also added their own training methods to the system to improve its accuracy -- things they equate to how people go over their own work time and again to make sure it's right.
The researchers said they used methods including dual learning, for fact-checking translations; deliberation networks, to repeat translations and refine them; joint training, a new technique to iteratively boost the English-to-Chinese and Chinese-to-English translation systems; and agreement regularization, which can generate translations by reading sentences both left-to-right and right-to-left. Zhou said the techniques used to achieve the milestone won't be limited to machine translation. The researchers caution that the system has not yet been tested on real-time news stories, and that other challenges still lie ahead before the technology could be commercialized in Microsoft's products. You can play around with the new translation system here.
by the frequent mention of a "water goat" in their correspondence.
I'm still perplexed by the frequent mention of "âoe" and "â" on Slashdot.
Actually, bicultural bilinguals. There are differences in culture that drive the different expressions and translations. Auto-translation is of course a very important tool in global human discourse. The problem is that the less informed, the less educated, those with far less understanding, will readily be able to communicate with each other across the language barrier. Think, say, American rednecks and Chinese rednecks screaming at each other about how their armies can destroy each other, and flooding other parts of the internet with their rubbish. Think of a few hundred million rather lame, rude, and crude trolls infecting all corners of the internet, generating racism and hate on a much broader scale. Society's losers spreading their bitterness across the internet, striving to drive conflict and strife; better be prepared.
The other thing, of course: the translation software has to be open source and freely accessible; otherwise corporations become the gatekeepers, data mining and manipulating the translations to serve their own psychopathic greed and ego. Every single corporation that has gained sufficient power to become the dominant player has, every single time, demonstrated its inherent psychopathic nature, or at least the psychopathic nature of its board and executive team, in corporate cultures where the biggest POS rises to the top (it's built into that determination to be the dominant player: to charge monopolistic profit margins, to lie with impunity, to cheat with impunity, to steal with impunity, to kill with indifference, to wholesale steal the rights of citizens, to invade their privacy, and to seek to control their actions to serve the sick egos of corporations). You most definitely do not want any private, for-profit corporation to be the gatekeeper of language translation, and most definitely not M$ or Google or Facebook or Twitter.
I read the MS blog and skimmed the actual paper. It gives a decent overview of the system design but has basically no details on the linguistics side of things. They just hired a bunch of people to do manual translation, both for training and for testing, but the only details of the results are a single table summarizing what categories of errors occurred.
A lot of relevant information was missing. To start with, saying "Chinese language" is like saying "European language" - there isn't one unified "Chinese", but rather a variety of languages, topolects and dialects, with some level of mutual intelligibility, but it varies considerably. Not all variants use the same writing system - most use Hanzi, but there's the whole Traditional vs. Simplified issue, and some obscure varieties use entirely different systems (e.g., Dungan is written in Cyrillic, despite being closer to Mandarin than many Hanzi-using topolects). And secondary writing systems abound - for teaching and for computer input, the Latin alphabet and the Bopomofo syllabary are used in the mainland and Taiwan, respectively.
From context, they seem to be aiming for Mandarin Chinese, the most common variety, and they only accept input in Simplified Hanzi, but they don't make that at all clear in the paper. Was the training corpus exclusively Mandarin, or did it include Cantonese or Hakka or Minnan? Was it entirely Modern Standard Mandarin (MSM), or were regional dialects like Sichuanese included? The logographic writing system elides a lot of differences, but I can't see how you could completely ignore the issue. At the very least, I would expect it to cause false negatives in the validation; these are issues for human translators as well. Did they dig deeper into the reported translation issues and find that some were a case of "oh, the news article was written in MSM but quoted someone using Dalian dialect," and then have to figure out whether the human or the machine was more accurate? I didn't read the paper thoroughly, but I didn't see any mention at all of any of this crap.
Anyways, they may or may not have made progress on the AI front. I am even less qualified to judge that than I am the linguistics side of it. But there are so many things *not* discussed in the paper that I can't help but feel they're overstating their results. Guess I'll have to wait for the language blogs to pick up on it.