Can Your Mouth Become Multilingual?
Roland Piquepaille writes "During a videoconference last week between Karlsruhe, Germany, and Carnegie Mellon University (CMU), Pittsburgh, USA, the talk of Alex Waibel, from CMU, was automatically translated into German and Spanish. Both the Pittsburgh Post-Gazette (PPG) and the Pittsburgh Tribune-Review (PTR) attended the conference, took pictures and were impressed by this new 'open domain' speech-to-speech translation. This new computer technology is based on artificial intelligence (AI) and statistical methods. During the demonstration, the speaker had electrodes attached to his face and his neck, but the researchers think that these electrodes could be implanted into your mouth and your throat a decade from now -- if you agree of course."
Computers have a hard time translating written text as it is. Anyone bilingual will tell you that online translators are, for the most part, no good for complete sentences. My Spanish teachers can all spot papers written with computer translation very easily, because the software trips over words with more than one meaning (such as the word "pants", which can be clothing or breathing heavily). Not to mention, grammar and things like that are not done well at all. For the fun of it, try going to an online translator, writing something in English, translating it to Spanish, then back to English. Some results are pretty crazy. I guess the point I'm trying to make is this: what makes these translators so special compared to the ones we have now? How can they work better? Sure, there is probably a bit more effort put into these, but I don't think that a good translator will be available for another 5 years, and the whole "take the speech you aren't saying" thing is hard to believe.
Another approach comes from some work I saw demoed at an MIT conference in Vienna. If you capture enough video of a person speaking, you can remix/rerender video of that person saying anything you want them to say. The software works at the phonetic level, so you can synthesize words that the person has never uttered before and even make them appear to speak languages that they don't know. They had some visually convincing video showing people saying things that the researchers claimed they never said. Yes, the demo version worked with clean test video, and a professional video/image analyst could probably spot a faked/remixed video. But if this technology becomes good enough, I can see it making video an untrustworthy source of data (like skillfully retouched photos).
Two wrongs don't make a right, but three lefts do.