Microsoft Shows Off Adaptive, Multilingual Text to Speech System
MrSeb writes about a really cool project from Microsoft's speech research group. From the article: "Microsoft Research has shown off software that translates your spoken words into another language while preserving the accent, timbre, and intonation of your actual voice. In a demo of the prototype software, Rick Rashid, Microsoft's chief research officer, said a long sentence in English, and then had it translated into Spanish, Italian, and Mandarin. You can definitely hear an edge of digitized 'Microsoft Sam,' but overall it's remarkable how the three translations still sound just like Rashid. The translation requires an hour of training, but after that there's no reason why it couldn't be run in real time on a smartphone, or near-real-time with a cloud backend. Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language."
The Techfest 2012 keynote has a demo of the technology around minute 13:00.
Arby 'n' the Chief wouldn't be the same without him!
Japanese please!!!!
The answer to all your problems
Will they license this for PBX systems other than their own?
I would love a multilingual system like this. The audio is really good compared to the paid software that I have access to.
"Programmeurs, programmeurs, programmeurs, programmeurs, programmeurs!"
"Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language."
So, the logical result of this is that all the phone sex lines suddenly have girls that sound like they're from India?
I don't have them in front of me, but I remember there being patents on this very thing going back quite a few years--some back to the 80's! I also think there was a /. article on it somewhere along the line...
"My hovercraft is full of eels" would have been perfect.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
They didn't sound alike to me. The example in the link (this one, since there are so many) didn't have translations of the same sentence; each language had a different meaning (except maybe Italian, which I can't understand). Also, the translated versions sounded more like a computer than anything. You could say that it sounds more like the original than other computers do, but the dominant feature is the computerness of the speech.
But at least they got their research grant.
"First they came for the slanderers and i said nothing."
can we get this over with once and for all? microsoft is like nestle, never to be trusted again. no matter how much cool stuff you put out it won't matter; you have already abused public faith so badly that you can never be trusted again, ever.
"Imagine this tech in a two-way setup... "
Yeah, it's called Google Translate.
Remember a couple of weeks ago when we had that story about scifi nitpicks and someone griped about aliens in Star Trek always speaking English?
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
was for me at university anything that could make that go away is a good thing as far as I'm concerned. (Well, that's got to be at least 0 mod but I've got karma to spare so I don't care.)
Did you know 80 to 90% of the moderators on slashdot wouldn't recognize a troll even if one dragged them under a bridge.
And does it insert the appropriate slurping and hissing sounds? "This is your opponent, earthling. I have heard every word you have said. Jim: All right. What do you want? Gorn: I weary of the chase. Wait for me. I shall be merciful and quick."
...can they explain to me what "do the needful" means? That's English to English, and I don't fully understand the subtext of it.
Isn't this the same thing that Project Festival has been doing since about 2004?
http://www.cstr.ed.ac.uk/projects/festival/ (try the demo)
I deny that I have not avoided attaining the opposite of that which I do not want.
1) The translations aren't semantically equivalent (as pointed out by commenters above). I can already say "Ich bin ein dummer Amerikaner" in my own voice, without machine help. If the meaning isn't there, who cares?
2) The machine accent ain't that great, either.
All of this makes me think this is still somewhat of a pipe dream. The AI guys have been selling the idea of machine translation for years and years-- at least since the 50s, when it was promised to eliminate the need for trained State Department linguists. It's never emerged because it's still a hard problem. Even Google's translate, which beats the MS stuff by some yards, produces results which range from awkward phrasing to just plain inaccurate and misleading.
He's selling a great idea, but it's kind of like the Fountain of Youth. It ain't there, vaporware.
American Businessman (via translated phone call): "I think we can safely say our company would like to use your factory to produce our useless stuff people think they need."
Chinese Businessman (via translated phone call): "An excellent idea! I suggest we sign the papers over dinner at Translate Server Error. They have the best HuMan chicken in town. And the owner prides himself on his bilingual staff."
So, two problems.
One, our text translation software isn't foolproof, but people expect it to be. What happens when the software confuses "galleta" (Spanish for "cookie") with "callate" (Spanish for "shut up")? They do sound similar if you say them out loud, but no one notices because you'd almost never use both in the same conversation. I foresee someone attempting a friendly gesture by offering to share her mother's recipe for "shut up."
Two, live conversations depend upon both parties building on a shared experience. If each one has a different account of the experience, conversations break down very quickly. Ever tried to carry on a conversation with a schizophrenic? And that's just assuming the errors are innocent. What happens when corporations start using this? Your bank requires you to call a number to activate your new card and during the call they have the software "translate" some required disclosure for you, only the translation doesn't really convey what they are supposed to be disclosing. Don't think it won't happen... whoever implements this first on purpose will be running the company one day.
Then again, this whole discussion is purely academic. Gene Roddenberry's estate will just claim prior art and prevent this from ever becoming a reality. Hopefully.
3) You have to train it for an hour?
I was actually slightly interested until I got to this bit and realized, like any other Microsoft "innovation," it wasn't really at all. Anyone can make a custom voice sample in about an hour. Hooking up simple voice recognition and text-to-speech is incredibly dull.
Had they actually interpreted intonation for semantics, and simulated and learned your voice in real time, it would have been pretty neat.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
I'm going to the nut shop where it's fun.
Microsoft Research comes up with a prototype that barely works. Apple wraps it up and gives it a foreign name and sells it like crazy.
I'm confused - isn't this speech-to-speech translation, without any text involved?
Do you know who the scientist is? Because of this man's work, his grandson will never be able to get Data to pronounce contractions properly.
Somehow work out all the technicalities... and "Universal Translators" will come to be. Speak any language at will!
+1 xkcd/slashdot meme mashup
I will not buy this record; it is scratched.
I will not buy this TOBACCONIST, IT is scratched!
Would you laaahik... would you LIKE to come back to my place, bouncy bouncy?
My nipples explode with delight!
Aah just go watch it yourself! http://www.youtube.com/watch?v=G6D1YI-41ao
Frank Zappa's entry:
This is my left hand.
This is my right hand.
I have a big bunch of dick.
Aah, just go watch it yourself! http://www.youtube.com/watch?v=CkCYJ6FK0T4
Isn't teh internets great?
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Accent?
The summary says "preserving the accent, timbre, and intonation of your actual voice". Now I can get timbre and intonation, but accent? It made me wonder what Mandarin with a Scottish accent sounds like: does it apply Scottish speech tones, which would make it unintelligible, or is it clever enough to find a social equivalent, maybe the accent of a small semi-autonomous region of China?
Unfortunately checking TFA reveals this "accent" part to be the slashdot reporter's fantasy.
...as there already exists an International Phonetic Alphabet, an alphabet that includes annotations for lilts, guttural intonations and such. Why not just add the IPA pronunciation of each word to a given language dictionary, and have the computer read that? This would greatly reduce the 'training' work needed by the end user. It would also open new possibilities for text-to-speech translation, or even speech-to-speech translation.
To date I have found no text-to-speech reader on any platform that can understand (and speak) IPA symbols.
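For what it's worth, the dictionary idea above could be sketched roughly like this. It is only an illustration: the `IPA_DICT` entries, the `to_ipa` function, and the angle-bracket convention for unknown words are all made up for the example, and the transcriptions are approximate General American pronunciations. A real system would still need a synthesizer that accepts phonetic input.

```python
# A toy pronunciation dictionary: word -> IPA string.
# The entries are illustrative assumptions, not taken from any real lexicon.
IPA_DICT = {
    "hello": "həˈloʊ",
    "cat": "kæt",
    "dog": "dɔɡ",
}


def to_ipa(sentence: str) -> str:
    """Look up each word in the dictionary and join the IPA strings.

    Words missing from the dictionary are kept as-is inside angle brackets
    so a later stage (or a human) can see what still needs a pronunciation.
    """
    out = []
    for word in sentence.lower().split():
        key = word.strip(".,!?")
        out.append(IPA_DICT.get(key, f"<{key}>"))
    return " ".join(out)


if __name__ == "__main__":
    print(to_ipa("Hello, cat!"))
    print(to_ipa("Hello hovercraft"))
```

The lookup itself is the easy part; the hard part the thread keeps coming back to is making the resulting speech sound natural.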
No, no sig. Really.
ThePromenader
Dear aunt, let's set so double the killer delete select all
Who logs in to gdm? Not I, said the duck.
I *really* hope it's better than Bing Translate, which at best produces slightly confusing translations, and at worst totally incomprehensible crap.
Verily, theis latest so-called play of Mr Shakespeare sucketh most bigge. Knoweth he notte that ye Romans (and may I be flayed with my own fibbling-cloth if Julius Caesar weare notte such) spake ye Latin?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
...
... if only my software could translate a bytestream of type video/x-ms-asf into a video.
In light of this experience, why should I believe that someone actually invented a unidirectional universal translator? Nice try.
"It's Microsoft; let's find a way to convince everyone this is trash". Then add obligatory references to anti-trust, quality of MS OSes, and how MS are doomed to failure.
This seems like it's just a short distance away from being able to make a computer impersonate somebody in their own voice and say things they would never say themselves.
"One of the advantages of learning a language is that it is easy."
For you maybe, not for me. I spent 6 months trying to learn German 5 days a week because I was visiting there on holiday. Got nowhere. Some people have a talent for learning languages, others don't.
"All over the world there are amazingly stupid people who can speak their native language fluently"
That's because children are coached in their own language 7 days a week, 12 hours a day, and yet it still takes 5 years until they can put together even a rudimentary sentence.
Microsoft has shown more than it has shipped, and that is bad.
Now I have no reason to learn another language ever!
... when released, will it run on Linux? Or will it be open-sourced?
cpghost at Cordula's Web.
Just put a fish into your ear!
Works for all non-terrestrial languages
While the vocal effects are nice, the language translation capability is nothing new. I saw someone demonstrate that 4 or 5 years ago at JavaOne in San Francisco.
Same thing happened about 2 years ago when a Microsoft commercial highlighted a Windows Phone app that popped up information on top of the live camera input. That technology was patented 4 or 5 years ago by someone working with Boeing.
Microsoft frequently takes other people's technology, wraps a pretty interface around it, and demonstrates it to great fanfare and applause.
Remember, Microsoft originally said running a language within a virtual machine (à la Java and the JVM) was a terrible idea, then a few years later came out with C# and the CLR.
on the English and the Italian don't seem to match at all. The Italian starts "beginning next month, we will be beginning an Italian ^&*&*&^8, which will take into consideration books of contemporary Italian writers..."
That's not what Rashid is saying in English (at least not on my machine).
I had a peek at the video and it strikes me that the demo is not about translation at all. It merely shows a TTS system that can be tweaked to sound like any person, even if that person does not speak the language being synthesized.
Will Microsoft come up with its own Siri?
Interesting Technology Ideas!
We sometimes do Spanish-Mandarin translations. This is our process: we stopped doing direct Spanish-Mandarin translation with Google due to awful results, so now we first use Google Translate to go from Spanish to English. Then we correct the English translation manually. Using Google Translate again, we go from English to Mandarin. Finally, we have a Chinese person correct the translation manually.
I can't even imagine this new system working for more than a few simple and straightforward phrases.
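The Spanish-English-Mandarin process described above is basically a pivot pipeline with a human check after each machine hop. A rough sketch of that shape follows; `pivot_translate`, `fake_translate`, and `fake_review` are hypothetical names invented for this example, and the stand-in functions only tag the text so the flow is visible. No real translation service is called.

```python
from typing import Callable

# Hypothetical signatures for the two kinds of step in the process:
# a machine-translation hop and a manual correction pass.
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
Review = Callable[[str, str], str]          # (text, lang) -> corrected text


def pivot_translate(text: str, translate: Translate, review: Review,
                    source: str = "es", pivot: str = "en",
                    target: str = "zh") -> str:
    """Go source -> pivot -> target, with a manual correction after each hop."""
    pivot_draft = translate(text, source, pivot)
    pivot_fixed = review(pivot_draft, pivot)        # fix the English by hand
    target_draft = translate(pivot_fixed, pivot, target)
    return review(target_draft, target)             # native speaker fixes the result


# Toy stand-ins so the sketch runs; they only record which step touched the text.
def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_review(text: str, lang: str) -> str:
    return f"{text} (checked:{lang})"


if __name__ == "__main__":
    print(pivot_translate("hola", fake_translate, fake_review))
```

Errors compound across the two machine hops, which is why the manual pass on the English in the middle matters.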
Aren't they completely missing the point when they list an hour of training? If it takes an hour to understand me enough to translate it into another language it's going to take another hour for them to train to be able to respond (either it requires training for a person or it doesn't). That makes it pointless, good luck finding a local to spend an hour talking to a computer so they can answer your simple question.
Where I went to university they told us how important a language was. I might have believed it if it weren't for the fact that language professors only rarely taught any of the first few semesters' courses. Considering I got stuck taking 9 courses of language to complete the 4th semester requirement, you'd think I would have seen more than 1 class taught by a foreign language professor. (BTW yes, it was literally one. Instead we got very wet-behind-the-ears grad students. If it's really important then don't have the class taught by the least qualified people you can find.)
French turned me into a business major
The problem with all translation is context and implication within languages. Some languages have no gender-specific pronouns, which matters for implication. For example, the sentence 'Paul and Betty met on the bridge and he killed her.' is fine until it is translated into a language without gender-specific pronouns: 'Paul and Betty met on the bridge and they killed them.' (I had a similar sentence in a book I read, where two men met on the bridge and neither was ever mentioned in the book again, so I was left wondering who had killed whom.)
In some languages you know what is happening from the context. Say person A sends person B an SMS, and B phones back and just says 'I am coming' [meaning they are on their way NOW]. From a language with no difference between present and future tense, that could get translated into English as 'I am coming' or 'I will be coming'. From a language with no difference between past and present tense, it could get rendered as 'I've come' or 'I am coming'. The phone has no context with which to translate it, though the speakers do.
Then there are languages that lack certain verbs, like 'to be'. There is a famous (though I'm not sure it's true) story of Margaret Thatcher trying a supposedly 'perfect' Japanese translation device with 'To be, or not to be, that is the question.' The Japanese equivalent of 'to be' is 'desu', which is literally 'it exists' or 'it is'. The translation back into English from the Japanese came back as 'It is, it isn't, what is the question.'