Slashdot Mirror


Microsoft Shows Off Adaptive, Multilingual Text to Speech System

MrSeb writes about a really cool project from Microsoft's speech research group. From the article: "Microsoft Research has shown off software that translates your spoken words into another language while preserving the accent, timbre, and intonation of your actual voice. In a demo of the prototype software, Rick Rashid, Microsoft's chief research officer, said a long sentence in English, and then had it translated into Spanish, Italian, and Mandarin. You can definitely hear an edge of digitized 'Microsoft Sam,' but overall it's remarkable how the three translations still sound just like Rashid. The translation requires an hour of training, but after that there's no reason why it couldn't be run in real time on a smartphone, or near-real-time with a cloud backend. Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language." The Techfest 2012 keynote has a demo of the technology around minute 13:00.

35 of 171 comments

  1. AZN by willie3204 · · Score: 2, Insightful

    Japanese please!!!!

    1. Re:AZN by ChipMonk · · Score: 2

      You're hoping to understand your un-subbed tentacle porn?

  2. The big boss was impressed by another demo by Anonymous Coward · · Score: 5, Funny

    "Programmeurs, programmeurs, programmeurs, programmeurs, programmeurs!"

    1. Re:The big boss was impressed by another demo by grcumb · · Score: 2

      SAM: "Ich bin ein Developer! Developer! Developer! Developer! Developer! Developer! Developer! Developer!STOP 80000X21 OOM_MONKEYDANCE_INFINITE_LOOP"

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
  3. Re:But I miss Microsoft Sam! by Anonymous Coward · · Score: 2, Funny

    Dear aunt, let's set so double the killer delete select all.

  4. First translation fail by HBI · · Score: 5, Funny

    "My hovercraft is full of eels" would have been perfect.

    --
    HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
    1. Re:First translation fail by mug funky · · Score: 3, Funny

      instead of bobcat, hovercraft contained eels. would not buy again.

    2. Re:First translation fail by martin-boundary · · Score: 2

      "My hovercraft is full of eels" would have been perfect.

      That's what the low-quality garbled voice sounded like. What the Microsoft system actually said was "Hey, Google is full of evil".

  5. Re:Been done. by Anonymous Coward · · Score: 2

    Yeah, text translation is exactly the same thing as speech translation. It must have been really hard for Google to get the 'accent, timbre, and intonation' of all that text just right.

  6. Heh by MobileTatsu-NJG · · Score: 2

    Remember a couple of weeks ago when we had that story about scifi nitpicks and someone griped about aliens in Star Trek always speaking English?

    --

    "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

  7. Re:Do they sound alike? by Pseudonym Authority · · Score: 5, Funny

    I completely agree. It is total garbage and if it isn't absolutely flawless in every possible regard, then it should not even have been attempted.

  8. Re:I see where this is headed. by poetmatt · · Score: 3, Funny

    I want to hear a TTS that can turn Punjabi into Valley Girl.

  9. Re:microsoft and their credibility by CohibaVancouver · · Score: 2

    microsoft is like nestle, never to be trusted again

    That reminds me. I need to pick up some chocolate milk powder.

  10. Re:Sounds cool....but.. by EdIII · · Score: 2

    Why the hell not? It's a product like any other.

    They sell Microsoft Office for operating systems other than Windows.

    I just hope they do the same with this and not tie it into their own PBX exclusively. If they do, it will see a hell of a lot less production use, that is for sure.

  11. Re:Given the torment that foreign language class by cptdondo · · Score: 3, Informative

    Hehe....

    I am bilingual in English and another language. When I go to that country, many of the tourist attractions have price lists in English, Spanish, Russian, Japanese, you name it. Then they have one in the local language. The prices on that one are half of what they are for the tourists. And they're written out in words, not numbers, so if you can't read them you're SOL.

    So yup, you don't need to speak the other guy's language, if you're willing to play by his rules.

  12. Just FAIL (pipe dream?) by theNAM666 · · Score: 3, Interesting

    1) The translations aren't semantically equivalent (as pointed out by commenters above). I can already say "Ich bin ein dummer Amerikaner" in my own voice, without machine help. If the meaning isn't there, who cares?

    2) The machine accent ain't that great, either.

    All of this makes me think this is still somewhat of a pipe dream. The AI guys have been selling the idea of machine translation for years and years-- at least since the 50s, when it was promised to eliminate the need for trained State Department linguists. It's never emerged because it's still a hard problem. Even Google's translate, which beats the MS stuff by some yards, produces results which range from awkward phrasing to just plain inaccurate and misleading.

    He's selling a great idea, but it's kind of like the Fountain of Youth. It ain't there, vaporware.

    1. Re:Just FAIL (pipe dream?) by NoKaOi · · Score: 3, Insightful

      He's selling a great idea, but it's kind of like the Fountain of Youth. It ain't there, vaporware.

      Is he actually trying to sell a mature product, or is he just showing something cool? I'm not sure where the innovation is, if it's in being able to train text-to-speech to sound like your voice, preserving intonations and such across the translation (even though it's obviously not great at it yet), or if it's just in putting a few existing technologies together, but you have speech recognition, and a translator, and text to speech that sounds like your voice, then this is what you can have. Include preserving the intonation and you have something cool. So what if it's just showing off a cool application of existing technologies?

      Translators aren't great but are getting better...speech recognition isn't great but is getting better. Preserving intonation across the translation and including in text-to-speech in a voice that sounds kinda like your own can probably get better too. Put the 3 together and you get something useful. I think that's all it's trying to show, and I think as these technologies get better we could end up with something pretty cool.

      If this were something out of any other company, would the same people be criticizing it?

  13. Re:Given the torment that foreign language class by phantomfive · · Score: 2

    I learned Spanish really well. It took me five years, but I've managed to trick native speakers into thinking I was a native speaker. But it was a lot of work.

    Then one day, I went to Spain, and it WAS really great. I could speak with anyone, and I was leading the group, translating for everyone. It lasted a week.

    Then I got home, and asked myself, "was that worth the time it took to learn Spanish?" And the answer is no, no it wasn't, not at all. Even if I travel to a Spanish-speaking country for a week of every year, it was not worth it. It was a lot of work, and I could hire a translator or pay the 'tourist' price for things, and still end up ahead.

    That said, I don't regret learning Spanish, but learning it just so you can get a cheaper tourist trap is not worth it at all.

    --
    "First they came for the slanderers and i said nothing."
  14. Re:microsoft and their credibility by Ethanol-fueled · · Score: 5, Funny

    My employer is a Microsoft shop. Microsoft Windows Seven optimizes my productivity with its new context-sensitive search. Microsoft Office allows me to quickly compose documents and spreadsheets of arbitrary complexity.

    It is no surprise that Excel is being used for engineering given its power and flexibility. Hell, a shop I worked for used Excel as its database.

    Now let's get down to the nitty-gritty - Visual Studio is one of the most powerful IDEs on the face of the planet. You want power? You got it. You want speed? You got it. You want both? It empowers you, the ninety-pound weakling, with both, with minimal effort. I got a raise because I used Visual Studio. I got my dick sucked by my boss' hottest secretary because I wrote a patch in C# that prevented our ERP system from total meltdown.

    Why be some boring open-source ODBC slob when you can be fast. Quick. Nimble. Packing.

    Be potent. Be Microsoft.

  15. The Future of International Business by guttentag · · Score: 2

    American Businessman (via translated phone call): "I think we can safely say our company would like to use your factory to produce our useless stuff people think they need."
    Chinese Businessman (via translated phone call): "An excellent idea! I suggest we sign the papers over dinner at Translate Server Error. They have the best HuMan chicken in town. And the owner prides himself on his bilingual staff."

    So, two problems.

    One, our text translation software isn't foolproof, but people expect it to be. What happens when the software confuses "galleta" (Spanish for "cookie") with "callate" (Spanish for "shut up")? They do sound similar if you say them out loud, but no one notices because you'd almost never use both in the same conversation. I foresee someone attempting a friendly gesture by offering to share her mother's recipe for "shut up."

    Two, live conversations depend upon both parties building on a shared experience. If each one has a different account of the experience, conversations break down very quickly. Ever tried to carry on a conversation with a schizophrenic? And that's just assuming the errors are innocent. What happens when corporations start using this? Your bank requires you to call a number to activate your new card and during the call they have the software "translate" some required disclosure for you, only the translation doesn't really convey what they are supposed to be disclosing. Don't think it won't happen... whoever implements this first on purpose will be running the company one day.

    Then again, this whole discussion is purely academic. Gene Roddenberry's estate will just claim prior art and prevent this from ever becoming a reality. Hopefully.

    1. Re:The Future of International Business by malakai · · Score: 3, Informative

      I foresee someone attempting a friendly gesture by offering to share her mother's recipe for "shut up."

      Context is context. Obviously, an English speaker hearing a Spanish speaker offer to share a recipe for "shut up" on a (up until this point) benign and friendly conference call is going to assume translation error. Better than that, translation software knows about these little mix-ups better than you do. On a Text To Speech, there's not much to do but suffer the mistranslation (or maybe they play an audible 'ping' when they warn about a context or idiosyncrasy error), but in a system that displays you something on a device, these things tend to be shaded a different color, and offer options as to what other possible meaning they may have meant, based on context.

      One, our text translation software isn't foolproof, but people expect it to be.

      No, they don't. No one even expects paid human translators to be perfect.

      Two, live conversations depend upon both parties building on a shared experience. If each one has a different account of the experience, conversations break down very quickly. Ever tried to carry on a conversation with a schizophrenic?

      Honestly, with a schizophrenic, chances are I have, at some point in my life, on IRC. But more to your point, I've played games where opposing sides are communicating from different languages via Google Translate. Think Russia vs US, and the only way to talk to them is via delayed Google Translate results. It's slow, it's tedious, and yet we somehow managed to have amazing rapport with people of like mind. The assholes were still assholes via Google Translate, and the people we wanted to work with we managed to communicate with. Again, you are ignoring the fact that incrementally better translation is still better than its predecessor. For now. Sure, one day we'll identify some uncanny valley with voice translation, and we'll all spend lots of time plotting how bad the translation software has to be for us to feel it's robotic.... but for now, any small step forward is better than the previous one.

      Then again, this whole discussion is purely academic. Gene Roddenberry's estate will just claim prior art [memory-alpha.org] and prevent this from ever becoming a reality. Hopefully.

      Yup, god forbid someone spends time and money on a problem that sci-fi writers got to magically make disappear in one sentence, and a prop. Maybe someday some brilliant young chap will figure out how to make warp drive not require 3x the mass of the universe for power, and Gene's children can make some more cash. Hopefully.

  16. Re:I see where this is headed. by Anonymous Coward · · Score: 2, Funny

    Like as if

  17. Re:Given the torment that foreign language class by ChatHuant · · Score: 5, Insightful

    That said, I don't regret learning Spanish, but learning it just so you can get a cheaper tourist trap is not worth it at all.

    Of course it's not worth it, if all the benefit you find in knowing another language is saving a couple of bucks at some touristy place. But knowing a different language is much more than that. You now have access to new worlds of literature, movies, poetry and music first hand, without a translator as intermediary (because, as the Italians say, "traduttore, traditore"!). You can talk to more people directly, understand their culture, expand your mind. You can read a whole set of new web sites, see different perspectives, or read news that isn't easily available otherwise. It opens lots of new possibilities for you - for example if you want to work for a global company, or if you ever feel like working in a different country for a few years. And even without any of those, the very effort of learning a different language improves your brain and slows mental aging.

    I'm relatively fluent in three languages now, and can more or less read another two. I read books in all of them, and I find it really enriches my mind. I just started learning a fourth (Japanese), and am really looking forward to reading Japanese books in their original form (even though learning enough of the kanji characters will be a pain).

  18. Re:microsoft and their credibility by philip.paradis · · Score: 2

    Stay thirsty, my friend.

    --
    Write failed: Broken pipe
  19. Re:Do they sound alike? by Phics · · Score: 5, Insightful

    It's not garbage, and if they had real innovations, it would be nice. Instead, they've taken a few characteristics of a speaker, like pitch, and used those to model the computer voice in another language.

    No, if you listened to the keynote, they took speech characteristics, and then broke the target voice pattern up into 5ms pieces and reconstructed the voice to match a reference translation from a different language. What they are doing is not only very interesting, but clearly has space for improvement and a variety of applications.
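The thread does not say how the keynote system implements this, so the following is only a minimal sketch of the framing step that "5ms pieces" suggests: cutting a sampled signal into fixed-length frames. The function name `split_into_frames` and the 16 kHz sample rate in the usage note are assumptions made for illustration, not details from the demo.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=5):
    """Split a sequence of audio samples into consecutive frames of frame_ms milliseconds.

    The last frame may be shorter if the signal length is not a multiple
    of the frame length. (Hypothetical helper, not Microsoft's code.)
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # samples per frame
    if frame_len <= 0:
        raise ValueError("sample rate too low for the requested frame size")
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
```

At an assumed 16 kHz sample rate, each 5 ms frame holds 80 samples, so a 200-sample signal yields two full frames and one 40-sample remainder. Whatever reconstruction the real system performs would operate on units like these.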

    It's about as interesting as if someone said, "what would you look like if you were a boy?" (or girl, if you are male), and then sampled your eye color, hair length, nose shape, etc, and then morphed those into a stock photo of a boy. Yeah, it would have some characteristics of you, but it also wouldn't be what you would look like if you were a boy.

    That's sort of the point. The sampled voice may not speak fluent Mandarin, but if you'd like it to, this technology will allow it to. A better analogy would be along the lines of taking a computerized sample of your body shape and texture, (skin, hair, face, etc), and then using 3D animation to reconstruct a model of you doing karate, even if you didn't actually know karate.

    Eventually, as the 'resolution' improves, the bits of this that you disapprove of, (the computerized feel you are getting from the voice), will most certainly improve as well. But it's the underlying ideas and tech which are interesting here.

    --
    There are two types of people in the world; those who believe there are two types of people, and those who don't.
  20. Re:Given the torment that foreign language class by phantomfive · · Score: 3, Informative

    I just started learning a fourth (Japanese), and am really looking forward to reading Japanese books in their original form (even though learning enough of the kanji characters will be a pain).

    Might want to check out this book, it is good. And since I'm giving completely unsolicited advice, the exposition of grammar in "Communicating with Japanese by the Total Method" is my favorite of all language textbooks I've seen.

    --
    "First they came for the slanderers and i said nothing."
  21. Re:Given the torment that foreign language class by wrook · · Score: 2

    Just a quick tip. Start on kanji as soon as possible. Knowing the kanji creates mnemonics for learning vocabulary. It also helps you decipher new vocabulary that you've never seen before. I wasted a lot of time before I realized that learning the kanji and vocabulary at the same time is *faster* than learning the vocabulary alone.

    One more quick tip while I'm here (somewhat controversial, probably). Completely ignore polite speech until you have a good grasp of the underlying plain form. This is opposite to virtually every textbook on the market, but if you are like me it will save you a lot of time. Polite grammar is a *very* easy to learn extension of plain grammar. But the opposite is not true. If you start thinking using polite grammar you will constantly be making mistakes in the *much* more common plain grammar. Advice to the contrary is to deal with talking with strangers (100% of the speech you are likely to use while travelling). But if you want to learn to speak Japanese rather than just use handy phrases, it is bad advice IMHO. The order presented in Tae Kim's guide is extremely helpful: http://www.guidetojapanese.org/learn/ This isn't all the grammar in the language (by a long shot), but if you learn this you can be relatively fluent in most situations.

    Finally, reading manga will show you good conversational patterns. Please keep in mind that some characters have speech affectations that nobody would use in real life. These are easy to spot, though. Reading other material is not nearly as useful for acquiring conversational language in my experience.

  22. Fools by a_hanso · · Score: 2

    Do you know who the scientist is? Because of this man's work, his grandson will never be able to get Data to pronounce contractions properly.

  23. Re:Given the torment that foreign language class by wrook · · Score: 2

    I teach English to Japanese high school students. The vast majority of them will never speak English ever again. Nor will they need to. Here's what I tell them.

    Not everyone needs to speak English. If you plan to stay where you are, probably you can avoid having to speak English. This does not imply that learning English is not useful for some people. I live and work in Japan and can do so partly because I speak/read Japanese. Life in Japan is hard if you don't speak and read Japanese. This is true in other places in the world. If you don't speak the language, you will never, ever fit in the way someone who is fluent does.

    You don't need to speak any particular language, but being able to learn a language gives you options that other people don't have. It is a skill that can open many doors for you.

    One of the advantages of learning a language is that it is easy. That is probably surprising to many people, but the fact of the matter is that it is not difficult to speak English, or Japanese or any other human language. All over the world there are amazingly stupid people who can speak their native language fluently. If you can speak one language, you can easily speak two, or three, or any number of other languages.

    Why don't people learn foreign languages if it is so easy? Because, while it is conceptually simple, a language is huge. Learning a language requires persistence, attention to detail, flexibility, the ability to make and admit mistakes and a huge amount of effort. There are techniques that will make the process faster and more pleasant, but in the end language acquisition is a process of personal growth.

    While there are some few benefits for knowing a foreign language, it is true that most people neither need nor will realise those benefits. The process of learning a language is another matter altogether, though. The skills required to succeed are the real treasure. Those who avoid learning these skills, which are admittedly a pain to acquire, only hurt themselves. I don't care if my students use English in their lives or not. I teach those other skills *through* English, not *for* English.

  24. Re:Sounds cool....but.. by Gadget_Guy · · Score: 4, Informative

    They sell Microsoft Office for operating systems other than Windows.

    This concession to the antitrust authorities and Apple is something of an exception to the general rule and it was a brutal fight to make it come about.

    What rubbish! The first version of Microsoft Office EVER was for the Mac in August 1989. The Windows release came out in November 1990. With whom did they have this "brutal fight" to get this released for the Mac?

    Interestingly, according to Wikipedia, after the release of Word for the Mac in 1985 (2 years after Word for MS-DOS and Xenix), "Word for Mac's sales were higher than its MS-DOS counterpart for at least four years". It seems that Microsoft were rather pragmatic about selling software where it would make a buck!

  25. Theatrical review, circa 1599 by Hognoxious · · Score: 2

    Verily, theis latest so-called play of Mr Shakespeare sucketh most bigge. Knoweth he notte that ye Romans (and may I be flayed with my own fibbling-cloth if Julius Caesar weare notte such) spake ye Latin?

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  26. Re:I see where this is headed. by msclrhd · · Score: 5, Informative

    Provided that the speech recognition engine is good enough, it can distinguish between the /Q/ and /A/ sounds in lot (British English: /lQt/, General American English: /lAt/), cot, hot, etc, with /A/ also appearing in father /fA:D@/. This will mean that the speech recognition engine will record the actual phonemes spoken, rather than the phonemes it thinks are being spoken. With this, it can then build up a database of phonemes to the recorded audio.

    When a given language is selected (strictly speaking it is a language + accent, as Liverpudlian English sounds different to Australian English and Mexican Spanish sounds different to Argentinian Spanish) it will have a set of rules that describe how to convert the text into phonemes specific to that accent (for example, "ook" is usually pronounced /Vk/ in English, but in Scouse English it can be /Vx/). These rules provide a set of phonemes required by the language+accent to speak it properly.

    The phonemes are transcriptions of IPA-based phonemes (http://en.wikipedia.org/wiki/International_Phonetic_Alphabet). If you plot the phonemes available by the voice on the phoneme charts, you can fill in more phonemes that are similar (e.g. using /A/ instead of /Q/ if the voice does not support /Q/, or an untrilled /r/ if the trilled version is not supported, where a trilled /r/ can be found in Spanish).

    Then, provided that the voice can handle all the phonemes in a language+accent, you can then map between the two, allowing your English speaking voice to speak German, Chinese, Afrikaans or whatever language you have data for. The eSpeak text-to-speech program does a simple version of this to make the German, Polish, Swedish, Romanian, Dutch, Hungarian, French and Afrikaans MBROLA voices speak English.

    You can also use it to have a voice support different accents, provided you have the rules for producing the correct phonemes.
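The fallback step described above (using /A/ when the voice lacks /Q/, or an untrilled /r/ for a trilled one) can be sketched roughly as follows. This is an illustrative toy, not eSpeak's implementation: the inventory, the fallback table, and the names `map_phoneme`, `map_word` and `r_trill` are all assumptions made up for the example.

```python
# Hypothetical data: phonemes the recorded voice can produce, written in the
# same ASCII transcription the comment uses. Not taken from eSpeak or MBROLA.
VOICE_INVENTORY = {"l", "A", "t", "k", "h", "f", "D", "@", "r"}

# Hypothetical nearest-neighbour substitutions, tried in order when the
# voice does not have the requested phoneme.
FALLBACKS = {
    "Q": ["A"],        # British "lot" vowel falls back to /A/
    "r_trill": ["r"],  # trilled r falls back to an untrilled r
}


def map_phoneme(phoneme, inventory, fallbacks):
    """Return the phoneme itself if the voice has it, else the first usable fallback."""
    if phoneme in inventory:
        return phoneme
    for candidate in fallbacks.get(phoneme, []):
        if candidate in inventory:
            return candidate
    raise KeyError(f"voice has no phoneme or fallback for {phoneme!r}")


def map_word(phonemes, inventory=VOICE_INVENTORY, fallbacks=FALLBACKS):
    """Map every phoneme required by the target accent onto the voice's inventory."""
    return [map_phoneme(p, inventory, fallbacks) for p in phonemes]
```

With these made-up tables, `map_word(["l", "Q", "t"])` gives `["l", "A", "t"]`, so a voice without /Q/ still says "lot" with its closest vowel. A real system would presumably pick fallbacks by distance on the phoneme charts rather than from a hand-written table.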

  27. Re:Sounds cool....but.. by symbolset · · Score: 3, Informative

    The selective memory of you 'softie fans is amazing. There's a reason for these things. In 1986 Windows looked like this. Sales of Mac Office kept Microsoft alive in this period. Microsoft Office was moved to reinforce Windows as soon as Windows was a credible environment. Windows wasn't even a credible platform until Windows for Workgroups (Windows 3.11) was released in November 1993, some 7 years later (or 1/3 of the time to present day). Mac Office was so lagging for a long while after WfW launch that it was effectively discontinued, and Office's superior support of the Windows platform was a huge part of Windows assuming dominance over the superior Mac OS which had come to rely on Office, which now offered degraded inferior performance and features on the Mac OS. There were some other shenanigans you can read about in the above links. It was a very successful strategy you can read more about here - enough horrifying content to keep you awake for years. But if that's not enough, you might try these. Microsoft through these lessons evolved a strategy where all their products have to reinforce each other, and that became their core strategy. And then...

    Apple got some traction in their TrueType font rendering patent suit against Microsoft and the Justice Department was closing in on an antitrust action legendary in its scope and reach. Bill Gates blinked, and they settled, and now there's Mac Office, but you can't say that it's fully supported. The Mac versions lag the Windows versions by some years and are not fully compatible with each other in ways that can't be explained by OS platform differences. The Office platform supports Windows now, as you can see by all the sockpuppets who come out every time somebody mentions some non-Windows operating system to say "you can't get Microsoft Office for that and you never will." And then the rest of us chime in "Application virtualization solves that problem."

    Eventually Microsoft discovered political advocacy and contributed in various ways to the installation of a government more supportive of their business activities. Then the enforcement of antitrust protections to limit them and protect us against their abuse of their monopoly became lax, the limits were quashed until those protections expired. But that's another long story for another day.

    --
    Help stamp out iliturcy.
  28. Would have watched the video... by tenco · · Score: 3, Insightful

    ... if only my software could translate a bytestream of type video/x-ms-asf into a video.

    In light of this experience, why should I believe that someone actually invented a unidirectional universal translator? Nice try.

  29. Learning a language is NOT easy by Viol8 · · Score: 2

    "One of the advantages of learning a language is that it is easy."

    For you maybe, not for me. I spent 6 months trying to learn German 5 days a week because I was visiting there on holiday. Got nowhere. Some people have a talent for learning languages, others don't.

    "All over the world there are amazingly stupid people who can speak their native language fluently"

    That's because children are coached in their own language 7 days a week, 12 hours a day, and yet it still takes 5 years until they can put together even a rudimentary sentence.