Slashdot Mirror


Microsoft Shows Off Adaptive, Multilingual Text to Speech System

MrSeb writes about a really cool project from Microsoft's speech research group. From the article: "Microsoft Research has shown off software that translates your spoken words into another language while preserving the accent, timbre, and intonation of your actual voice. In a demo of the prototype software, Rick Rashid, Microsoft's chief research officer, said a long sentence in English, and then had it translated into Spanish, Italian, and Mandarin. You can definitely hear an edge of digitized 'Microsoft Sam,' but overall it's remarkable how the three translations still sound just like Rashid. The translation requires an hour of training, but after that there's no reason why it couldn't be run in real time on a smartphone, or near-real-time with a cloud backend. Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language." The Techfest 2012 keynote has a demo of the technology around minute 13:00.

8 of 171 comments

  1. The big boss was impressed by another demo by Anonymous Coward · · Score: 5, Funny

    "Programmeurs, programmeurs, programmeurs, programmeurs, programmeurs!"

  2. First translation fail by HBI · · Score: 5, Funny

    "My hovercraft is full of eels" would have been perfect.

    --
    HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.

  3. Re:Do they sound alike? by Pseudonym Authority · · Score: 5, Funny

    I completely agree. It is total garbage and if it isn't absolutely flawless in every possible regard, then it should not even have been attempted.

  4. Re:microsoft and their credibility by Ethanol-fueled · · Score: 5, Funny

    My employer is a Microsoft shop. Microsoft Windows Seven optimizes my productivity with its new context-sensitive search. Microsoft Office allows me to quickly compose documents and spreadsheets of arbitrary complexity.

    It is no surprise that Excel is being used for engineering given its power and flexibility. Hell, a shop I worked for used Excel as its database.

    Now let's get down to the nitty-gritty - Visual Studio is one of the most powerful IDEs on the face of the planet. You want power? You got it. You want speed? You got it. You want both? It empowers you, the ninety-pound weakling, with both, with minimal effort. I got a raise because I used Visual Studio. I got my dick sucked by my boss' hottest secretary because I wrote a patch in C# that prevented our ERP system from total meltdown.

    Why be some boring open-source ODBC slob when you can be fast. Quick. Nimble. Packing.

    Be potent. Be Microsoft.

  5. Re:Given the torment that foreign language class by ChatHuant · · Score: 5, Insightful

    That said, I don't regret learning Spanish, but learning it just so you can get a cheaper tourist trap is not worth it at all.

    Of course it's not worth it, if all the benefit you find in knowing another language is saving a couple of bucks at some touristy place. But knowing a different language is much more than that. You now have access to new worlds of literature, movies, poetry and music first hand, without a translator as intermediary (because, as the Italians say, "traduttore, traditore"!). You can talk to more people directly, understand their culture, expand your mind. You can read a whole set of new web sites, see different perspectives, or read news that isn't easily available otherwise. It opens lots of new possibilities for you - for example if you want to work for a global company, or if you ever feel like working in a different country for a few years. And even without any of those, the very effort of learning a different language improves your brain and slows mental aging.

    I'm relatively fluent in three languages now, and can more or less read another two. I read books in all of them, and I find it really enriches my mind. I just started learning a fourth (Japanese), and am really looking forward to reading Japanese books in their original form (even though learning enough of the kanji characters will be a pain).

  6. Re:Do they sound alike? by Phics · · Score: 5, Insightful

    It's not garbage, and if they had real innovations, it would be nice. Instead, they've taken a few characteristics of a speaker, like pitch, and used those to model the computer voice in another language.

    No, if you listened to the keynote, they took speech characteristics, and then broke the target voice pattern up into 5ms pieces and reconstructed the voice to match a reference translation from a different language. What they are doing is not only very interesting, but clearly has space for improvement and a variety of applications.

    It's about as interesting as if someone said, "what would you look like if you were a boy?" (or girl, if you are male), and then sampled your eye color, hair length, nose shape, etc, and then morphed those into a stock photo of a boy. Yeah, it would have some characteristics of you, but it also wouldn't be what you would look like if you were a boy.

    That's sort of the point. The sampled voice may not speak fluent Mandarin, but if you'd like it to, this technology will allow it to. A better analogy would be along the lines of taking a computerized sample of your body shape and texture, (skin, hair, face, etc), and then using 3D animation to reconstruct a model of you doing karate, even if you didn't actually know karate.

    Eventually, as the 'resolution' improves, the bits of this that you disapprove of, (the computerized feel you are getting from the voice), will most certainly improve as well. But it's the underlying ideas and tech which are interesting here.

    --
    There are two types of people in the world; those who believe there are two types of people, and those who don't.

  7. Re:Sounds cool....but.. by Gadget_Guy · · Score: 4, Informative

    They sell Microsoft Office for operating systems other than Windows.

    This concession to the antitrust authorities and Apple is something of an exception to the general rule and it was a brutal fight to make it come about.

    What rubbish! The first version of Microsoft Office EVER was for the Mac in August 1989. The Windows release came out in November 1990. With whom did they have this "brutal fight" to get this released for the Mac?

    Interestingly, according to Wikipedia, after the release of Word for the Mac in 1985 (2 years after Word for MS-DOS and Xenix), "Word for Mac's sales were higher than its MS-DOS counterpart for at least four years". It seems that Microsoft were rather pragmatic about selling software where it would make a buck!

  8. Re:I see where this is headed. by msclrhd · · Score: 5, Informative

    Provided that the speech recognition engine is good enough, it can distinguish between the /Q/ and /A/ sounds in lot (British English: /lQt/, General American English: /lAt/), cot, hot, etc, with /A/ also appearing in father /fA:D@/. This means that the speech recognition engine will record the phonemes actually spoken, rather than the phonemes it thinks are being spoken. With this, it can then build up a database mapping phonemes to the recorded audio.

    When a given language is selected (strictly speaking it is a language + accent, as Liverpudlian English sounds different to Australian English and Mexican Spanish sounds different to Argentinian Spanish) it will have a set of rules that describe how to convert the text into phonemes specific to that accent (for example, "ook" is usually pronounced /Vk/ in English, but in Scouse English it can be /Vx/). These rules provide a set of phonemes required by the language+accent to speak it properly.
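
    The accent-specific rules described above could be sketched roughly as follows. This is only an illustration: the table names, the voice identifiers ("en", "en-scouse") and both functions are hypothetical, and only the "ook" example with its /Vk/ and /Vx/ pronunciations comes from the text.

```python
# Hypothetical rule tables: spelling pattern -> phonemes, keyed by voice.
# Only the "ook" entries are taken from the comment; the rest is assumed.
BASE_RULES = {"en": {"ook": "Vk"}}
ACCENT_OVERRIDES = {"en-scouse": {"ook": "Vx"}}


def rules_for(voice):
    """Merge the base-language rules with any overrides for the accent."""
    language = voice.split("-")[0]
    rules = dict(BASE_RULES.get(language, {}))
    rules.update(ACCENT_OVERRIDES.get(voice, {}))
    return rules


def ending_phonemes(word, voice):
    """Return the phonemes for the longest matching word ending, or None."""
    rules = rules_for(voice)
    for pattern in sorted(rules, key=len, reverse=True):
        if word.endswith(pattern):
            return rules[pattern]
    return None


print(ending_phonemes("book", "en"))
print(ending_phonemes("book", "en-scouse"))
```

    A real text-to-speech front end would need far more rules than one word ending, but the lookup order (accent override first, base language second) is the part being illustrated.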

    The phonemes are transcriptions of IPA-based phonemes (http://en.wikipedia.org/wiki/International_Phonetic_Alphabet). If you plot the phonemes the voice provides on the phoneme charts, you can fill in missing phonemes with similar ones (e.g. using /A/ instead of /Q/ if the voice does not support /Q/, or an untrilled /r/ if the trilled version, as found in Spanish, is not supported).

    Provided that the voice can handle all the phonemes in a language+accent, you can then map between the two, allowing your English speaking voice to speak German, Chinese, Afrikaans or whatever language you have data for. The eSpeak text-to-speech program does a simple version of this to make the German, Polish, Swedish, Romanian, Dutch, Hungarian, French and Afrikaans MBROLA voices speak English.

    You can also use it to have a voice support different accents, provided you have the rules for producing the correct phonemes.
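
    The substitution of similar phonemes described in this comment might look something like the sketch below. It is not how eSpeak or the Microsoft prototype does it; the fallback table, the "r_trilled"/"r_untrilled" labels and the function name are all hypothetical, and only the /Q/ to /A/ and trilled to untrilled /r/ substitutions come from the text.

```python
# Hypothetical fallback table: phoneme -> similar phonemes to try, in order.
FALLBACKS = {
    "Q": ["A"],                    # use /A/ when the voice has no /Q/
    "r_trilled": ["r_untrilled"],  # placeholder labels, not real notation
}


def map_phonemes(required, available):
    """Map each required phoneme onto one the voice can actually produce."""
    mapped = []
    for phoneme in required:
        if phoneme in available:
            mapped.append(phoneme)
            continue
        for candidate in FALLBACKS.get(phoneme, []):
            if candidate in available:
                mapped.append(candidate)
                break
        else:
            # No exact match and no usable substitute for this phoneme.
            raise ValueError(f"voice cannot produce /{phoneme}/")
    return mapped


# A voice without /Q/ speaking British English "lot" (/lQt/).
print(map_phonemes(["l", "Q", "t"], {"l", "A", "t"}))
```

    Raising an error when nothing similar exists matches the "provided that the voice can handle all the phonemes" condition above; a more forgiving version could skip the phoneme instead.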