Microsoft Shows Off Adaptive, Multilingual Text to Speech System
MrSeb writes about a really cool project from Microsoft's speech research group. From the article: "Microsoft Research has shown off software that translates your spoken words into another language while preserving the accent, timbre, and intonation of your actual voice. In a demo of the prototype software, Rick Rashid, Microsoft's chief research officer, said a long sentence in English, and then had it translated into Spanish, Italian, and Mandarin. You can definitely hear an edge of digitized 'Microsoft Sam,' but overall it's remarkable how the three translations still sound just like Rashid. The translation requires an hour of training, but after that there's no reason why it couldn't be run in real time on a smartphone, or near-real-time with a cloud backend. Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language."
The Techfest 2012 keynote has a demo of the technology around minute 13:00.
Arby 'n' the Chief wouldn't be the same without him!
Japanese please!!!!
The answer to all your problems
Will they license this for PBX systems other than their own?
I would love a multilingual system like this. The audio is really good compared to the paid software that I have access to.
"Programmeurs, programmeurs, programmeurs, programmeurs, programmeurs!"
"Imagine this tech in a two-way setup. You speak into your smartphone, and it comes out in their language. Then, the person you're talking to speaks into your smartphone and their voice comes out in your language."
So, the logical result of this is that all the phone sex lines suddenly have girls that sound like they're from India?
I don't have them in front of me, but I remember there being patents on this very thing going back quite a few years--some back to the 80's! I also think there was a /. article on it somewhere along the line...
"My hovercraft is full of eels" would have been perfect.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
They didn't sound alike to me. The example in the link (this one, since there are so many) didn't have translations of the same sentence; each language had a different meaning (except maybe Italian, I can't understand that). Also, the translated versions sounded more like a computer than anything. You could say that it sounds more like the original than other computers, but the dominant feature is the computerness of the speech.
But at least they got their research grant.
"First they came for the slanderers and i said nothing."
Yeah, text translation is exactly the same thing as speech translation. It must have been really hard for Google to get the 'accent, timbre, and intonation' of all that text just right.
Remember a couple of weeks ago when we had that story about scifi nitpicks and someone griped about aliens in Star Trek always speaking English?
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
was for me at university anything that could make that go away is a good thing as far as I'm concerned. (Well, that's got to be at least 0 mod but I've got karma to spare so I don't care.)
Did you know 80 to 90% of the moderators on slashdot wouldn't recognize a troll even if one dragged them under a bridge.
We? Is that the royal we? Otherwise, who are you speaking for?
And does it insert the appropriate slurping and hissing sounds? "This is your opponent, earthling. I have heard every word you have said. Jim: All right. What do you want? Gorn: I weary of the chase. Wait for me. I shall be merciful and quick."
microsoft is like nestle, never to be trusted again
That reminds me. I need to pick up some chocolate milk powder.
...can they explain to me what "do the needful" means? That's English to English, and I don't fully understand the subtext of it.
Isn't this the same thing that Project Festival has been doing since about 2004?
http://www.cstr.ed.ac.uk/projects/festival/ (try the demo)
I deny that I have not avoided attaining the opposite of that which I do not want.
1) The translations aren't semantically equivalent (as pointed out by commenters above). I can already say "Ich bin ein dummer Amerikaner" in my own voice, without machine help. If the meaning isn't there, who cares?
2) The machine accent ain't that great, either.
All of this makes me think this is still somewhat of a pipe dream. The AI guys have been selling the idea of machine translation for years and years-- at least since the 50s, when it was promised to eliminate the need for trained State Department linguists. It's never emerged because it's still a hard problem. Even Google's translate, which beats the MS stuff by some yards, produces results which range from awkward phrasing to just plain inaccurate and misleading.
He's selling a great idea, but it's kind of like the Fountain of Youth. It ain't there, vaporware.
My employer is a Microsoft shop. Microsoft Windows Seven optimizes my productivity with its new context-sensitive search. Microsoft Office allows me to quickly compose documents and spreadsheets of arbitrary complexity.
It is no surprise that Excel is being used for engineering given its power and flexibility. Hell, a shop I worked for used Excel as its database.
Now let's get down to the nitty-gritty - Visual Studio is one of the most powerful IDEs on the face of the planet. You want power? You got it. You want speed? You got it. You want both? It empowers you, the ninety-pound weakling, with both, with minimal effort. I got a raise because I used Visual Studio. I got my dick sucked by my boss' hottest secretary because I wrote a patch in C# that prevented our ERP system from total meltdown.
Why be some boring open-source ODBC slob when you can be fast. Quick. Nimble. Packing.
Be potent. Be Microsoft.
American Businessman (via translated phone call): "I think we can safely say our company would like to use your factory to produce our useless stuff people think they need."
Chinese Businessman (via translated phone call): "An excellent idea! I suggest we sign the papers over dinner at Translate Server Error. They have the best HuMan chicken in town. And the owner prides himself on his bilingual staff."
So, two problems.
One, our text translation software isn't foolproof, but people expect it to be. What happens when the software confuses "galleta" (Spanish for "cookie") with "cállate" (Spanish for "shut up")? They do sound similar if you say them out loud, but no one notices because you'd almost never use both in the same conversation. I foresee someone attempting a friendly gesture by offering to share her mother's recipe for "shut up."
Two, live conversations depend upon both parties building on a shared experience. If each one has a different account of the experience, conversations break down very quickly. Ever tried to carry on a conversation with a schizophrenic? And that's just assuming the errors are innocent. What happens when corporations start using this? Your bank requires you to call a number to activate your new card and during the call they have the software "translate" some required disclosure for you, only the translation doesn't really convey what they are supposed to be disclosing. Don't think it won't happen... whoever implements this first on purpose will be running the company one day.
Then again, this whole discussion is purely academic. Gene Roddenberry's estate will just claim prior art and prevent this from ever becoming a reality. Hopefully.
3) You have to train it for an hour?
I was actually slightly interested until I got to this bit and realized, like any other Microsoft "innovation," it wasn't really at all. Anyone can make a custom voice sample in about an hour. Hooking up simple voice recognition and text-to-speech is incredibly dull.
Had they actually interpreted intonation for semantics, and simulated and learned your voice in real time, it would have been pretty neat.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Oops you posted on the wrong account bonch.
Mod me down, my New Earth Global Warmingist friends!
I'm going to the nut shop where it's fun.
Stay thirsty, my friend.
Write failed: Broken pipe
Microsoft Research comes up with a prototype that barely works. Apple wraps it up and gives it a foreign name and sells it like crazy.
I'm confused - isn't this speech-to-speech translation, without any text involved?
Do you know who the scientist is? Because of this man's work, his grandson will never be able to get Data to pronounce contractions properly.
Somehow work out all the technicalities... and "Universal Translators" will come to be. Speak any language at will!
+1 xkcd/slashdot meme mashup
I will not buy this record; it is scratched.
I will not buy this TOBACCONIST, IT is scratched!
Would you laaahik... would you LIKE to come back to my place, bouncy bouncy?
My nipples explode with delight!
Aah just go watch it yourself! http://www.youtube.com/watch?v=G6D1YI-41ao
Frank Zappa's entry:
This is my left hand.
This is my right hand.
I have a big bunch of dick.
Aah, just go watch it yourself! http://www.youtube.com/watch?v=CkCYJ6FK0T4
Isn't teh internets great?
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Accent?
The summary says "preserving the accent, timbre, and intonation of your actual voice". Now I can get timbre and intonation, but accent? It made me wonder what Mandarin with a Scottish accent sounds like. Does it apply Scottish speech tones, which would make it unintelligible, or is it clever enough to find a social equivalent, maybe an accent of a small semi-autonomous region of China?
Unfortunately checking TFA reveals this "accent" part to be the slashdot reporter's fantasy.
...as there exists already an international phonetic alphabet, an alphabet that includes annotations for lilts, guttural intonations and such. Why not just add the IPA pronunciation of each word to a given language dictionary, and have the computer read that? This would greatly reduce the 'training' work needed by the end user. It would also open new possibilities for text-to-speech translation, or even speech-to-speech translation.
To date I have found no text-to-speech reader on any platform that can understand (and speak) IPA symbols.
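For what it's worth, the dictionary half of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon below is a made-up two-word sample, the names IPA_LEXICON and to_ipa are hypothetical, and nothing here speaks the result aloud (that is exactly the part the comment says is missing).

```python
# Illustrative entries only; a real system would load a full pronunciation
# dictionary for the target language. IPA_LEXICON and to_ipa are made-up names.
IPA_LEXICON = {
    "hello": "həˈloʊ",
    "cat": "kæt",
}


def to_ipa(sentence, lexicon=IPA_LEXICON):
    """Look up each word's IPA string; unknown words come back as <word>."""
    result = []
    for raw in sentence.lower().split():
        word = raw.strip(".,!?;:\"'")  # drop surrounding punctuation
        if not word:
            continue
        # Mark gaps visibly instead of guessing a pronunciation.
        result.append(lexicon.get(word, f"<{word}>"))
    return result
```

Calling to_ipa("Hello, cat!") looks up both words in the sample lexicon, while a word that is not listed comes back wrapped in angle brackets so the gap is obvious rather than silently mispronounced.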
No, no sig. Really.
ThePromenader
Dear aunt, let's set so double the killer delete select all
Who logs in to gdm? Not I, said the duck.
I got my dick sucked by my boss' hottest secretary because I wrote a patch in C# that prevented our ERP system from total meltdown.
Let me guess, that was two weeks ago and the ERP system was also from Microsoft? ;)
I *really* hope it's better than Bing Translate, which at best produces slightly confusing translations, and at worst totally incomprehensible crap.
The first paragraph sounded like it should be in the voice of those youtube videos like the one with the "webscale" bears.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Verily, theis latest so-called play of Mr Shakespeare sucketh most bigge. Knoweth he notte that ye Romans (and may I be flayed with my own fibbling-cloth if Julius Caesar weare notte such) spake ye Latin?
I know this is slightly off topic but why doesn't google translate have 4 boxes?
if you are translating language a to language b then any reply is likely to be in language b and need to be translated to language a.
A bit of history would also be useful so you could scroll back up the conversation too.
Still google translate is pretty good as long as you avoid ambiguous phrasing.
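The "four boxes plus history" idea above could look roughly like this in Python. It is a sketch, not anything Google Translate actually offers: TwoWayChat is a made-up class, and the translate callable is a hypothetical stand-in for whatever translation backend you have, assumed to take (text, source language, target language).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TwoWayChat:
    """Keeps both directions of a conversation and a scrollable history."""
    lang_a: str
    lang_b: str
    # Hypothetical backend: (text, source_lang, target_lang) -> translated text
    translate: Callable[[str, str, str], str]
    # Each entry is (speaker_lang, original_text, translated_text)
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def say(self, speaker_lang: str, text: str) -> str:
        """Translate one utterance into the other party's language and log it."""
        if speaker_lang not in (self.lang_a, self.lang_b):
            raise ValueError(f"unknown language: {speaker_lang}")
        other = self.lang_b if speaker_lang == self.lang_a else self.lang_a
        translated = self.translate(text, speaker_lang, other)
        self.history.append((speaker_lang, text, translated))
        return translated
```

Each call to say() picks the direction from who is speaking, so nobody has to click a swap button, and history keeps both the original and the translated text for scrolling back.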
I'm not sure Microsoft is achieving anything of use here. The intonation and tone of a sentence in one language may be completely different from those of the target language. Take even a simple example, British English and Australian English: British English tends to finish a sentence on a downward tone, while Australian English tends to rise, so reproducing the British English tone would not result in an Australian-sounding sentence.
What Microsoft may have achieved is a blocking move. If they can register patents on changing the intonation of speech in machine translation, then a third party who achieves good or excellent machine translation (good enough to sound like a native speaker) and who changes intonation to reflect the mood of the speaker will be forced into paying Microsoft for their patents.
Blarney Quality Restaurant, Plants
Google Translate does speech translation on Android and actually does it rather well, although the UI could be much improved.
...
... if only my software could translate a bytestream of type video/x-ms-asf into a video.
In light of this experience, why should I believe that someone actually invented a unidirectional universal translator? Nice try.
I know this is slightly off topic but why doesn't google translate have 4 boxes?
if you are translating language a to language b then any reply is likely to be in language b and need to be translated to language a.
You can switch the languages around with a single click on a button (I'd post the symbol if /. wasn't broken). Having four boxes would just make the layout confusing for new users, in my opinion.
Dilbert RSS feed
"It's Microsoft; let's find a way to convince everyone this is trash". Then add obligatory references to anti-trust, quality of MS OSes, and how MS are doomed to failure.
This seems like it's just a short distance away from being able to make a computer impersonate somebody in their own voice and say things they would never say themselves.
"One of the advantages of learning a language is that it is easy."
For you maybe, not for me. I spent 6 months trying to learn German 5 days a week because I was visiting there on holiday. Got nowhere. Some people have a talent for learning languages, others don't.
"All over the world there are amazingly stupid people who can speak their native language fluently"
That's because children are coached in their own language 7 days a week, 12 hours a day, and yet it still takes 5 years until they can put together even a rudimentary sentence.
Microsoft has shown more than it has shipped, and that is bad.
Now I have no reason to learn another language ever!
... when released, will it run on Linux? Or will it be open-sourced?
cpghost at Cordula's Web.
Just put a fish into your ear!
Works for all non-terrestrial languages
While the vocal effects are nice, the language translation capability is nothing new. I saw someone demonstrate that 4 or 5 years ago at JavaOne in San Francisco.
Same thing happened about 2 years ago when a Microsoft commercial highlighted a Windows Phone app that popped up information on top of the live camera input. That technology was patented 4 or 5 years ago by someone working with Boeing.
Microsoft frequently takes other people's technology, wraps a pretty interface around it, and demonstrates it to great fanfare and applause.
Remember, Microsoft originally said running a language within a Virtual Machine (à la Java and the JVM) was a terrible idea, then a few years later came out with C# and the CLR.
Two instances work as well (two tabs), but you're always going to have to wait longer than what is possible to achieve. There is probably an API for Google Translate which would give you the option to create a more advanced page.
Still there is no real substitute to learning the 2nd language if you need it regularly.
Cue brain on android is pretty good for vocabulary building, lots of flash cards and with the right language plugin spoken words too.
(asked one of the developers for the option to just speak the words, rather than having you read them and translating and it should be implemented in a new release before too long)
on the English and the Italian don't seem to match at all. The Italian starts "beginning next month, we will be beginning an italian ^&*&*&^8, which will take into consideration books of contemporary italian writers..."
That's not what Rashid is saying in english (at least not on my machine).
I had a peek at the video and it strikes me that the demo is not about translation at all. It merely shows a TTS system that can be tweaked to sound like any person, even if this person does not speak the language synthesized.
Why be some boring open-source ODBC slob when you can be fast.
That should be "open-source X/Open SQL CLI slob", given that ODBC is a Microsoft term for (more or less) the same thing.
</pedant>
Otherwise epic.
Will Microsoft come up with its own Siri?
Interesting Technology Ideas!
We sometimes do Spanish-Mandarin translations. This is our process: we stopped doing a direct Spanish-Mandarin translation with Google due to awful results, so now we first use Google Translate to go from Spanish to English. Then we correct the English translation manually. Using Google Translate again, we go from English to Mandarin. Finally, we have a Chinese person correct the translation manually.
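The process above is basically a pivot-language pipeline with a human in the loop after each machine step. Here is a rough Python sketch of it. Everything in it is an assumption for illustration: pivot_translate is a made-up name, and the translate and review callables are hypothetical stand-ins (not the Google Translate API) for the machine step and the manual correction step. The toy functions at the bottom only label the text so the order of the steps is visible.

```python
from typing import Callable

# Hypothetical signatures; neither is a real translation API.
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang)
Review = Callable[[str, str], str]          # (text, lang) -> corrected text


def pivot_translate(text: str, translate: Translate, review: Review,
                    source: str = "es", pivot: str = "en",
                    target: str = "zh") -> str:
    """source -> pivot -> target, with a manual correction after each step."""
    pivot_draft = translate(text, source, pivot)         # machine: es -> en
    pivot_text = review(pivot_draft, pivot)              # human fixes English
    target_draft = translate(pivot_text, pivot, target)  # machine: en -> zh
    return review(target_draft, target)                  # human fixes Mandarin


# Toy stand-ins that only tag the text; they do not translate anything.
def toy_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def toy_review(text: str, lang: str) -> str:
    return f"{text} (checked:{lang})"
```

Running pivot_translate("hola", toy_translate, toy_review) shows the four stages applied in order. The design point is that the second machine step only ever sees the corrected English, never the raw first-pass output.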
I can't even imagine this new system working for more than a few simple and straightforward phrases.
Aren't they completely missing the point when they list an hour of training? If it takes an hour to understand me enough to translate it into another language it's going to take another hour for them to train to be able to respond (either it requires training for a person or it doesn't). That makes it pointless, good luck finding a local to spend an hour talking to a computer so they can answer your simple question.
Where I went to university they told us how important a language was. I might have believed it if it weren't for the fact that language professors only rarely taught any of the first few semester courses. Considering I got stuck taking 9 courses of language to complete the 4th semester requirement, you'd think I would have seen more than 1 class taught by a foreign language professor. (BTW yes, it was literally one. Instead we got very wet-behind-the-ears grad students. If it's really important then don't have the class taught by the least qualified people you can find.)
french turned me into a business major
The problem with all translation is context and implication within languages. Some languages have no gender-specific pronouns, which matters for implication. For example, the sentence 'Paul and Betty met on the bridge and he killed her.' is fine until it is translated into a language without gender-specific pronouns: 'Paul and Betty met on the bridge and they killed them.' (I had a similar sentence in a book I read, where two men met on the bridge and neither was ever mentioned in the book again, so I was left wondering who had killed whom.) In some languages you know what is happening from the context. If person A sends person B an SMS, and B phones and just says 'I am coming' [meaning they are on their way NOW], then from a language with no difference between present and future tense it could get translated into English as 'I am coming' or 'I will be coming' (or, from a language with no difference between past and present tense, as 'I've come' or 'I am coming'). The phone has no context with which to translate it, though the speakers do. Then there are languages that lack certain verbs, like 'to be'. There was a famous (though I'm not sure if true) story of Margaret Thatcher using a supposedly 'perfect' Japanese translation device, into which she said, 'To be, or not to be, that is the question.' The Japanese equivalent of 'to be' is 'desu', which is literally 'it exists' or 'it is'. The translation back into English from the Japanese came back as 'It is, it isn't, what is the question.'