Text to Speech Software Copies Any Human Voice
mindpixel writes "A New York Times report (registration required) states that AT&T Labs will start selling speech software that it says is so good at reproducing the sounds, inflections and intonations of a human voice that it can recreate voices and even bring the voices of long-dead celebrities back to life. The software, which turns printed text into synthesized speech, makes it possible for a company to use recordings of a person's voice to utter things that the person never actually said."
AT&T's synthesis system actually contains Edinburgh University's Festival Speech Synthesis System (http://festvox.org/festival), although the synthesis technique in NextGen is not in Festival (as it's proprietary). However, there is work from Carnegie Mellon, by Kevin Lenzo and Alan Black (http://www.festvox.org), that provides all the tools (for free) that allow you to build your own voice in Festival. For simple domains the tools really work well, and easily capture the quality of the original speaker; for a whole general voice that can say anything it is a *lot* of work, but it is possible with the tools. This is what we are doing in our company Cepstral (http://www.cepstral.com).
Actually there is even an example of Hemos himself, doing a talking clock on http://www.festvox.org/ldom/ldom_time.html
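For anyone who wants to hear a stock Festival voice before building their own, here is a minimal sketch that pipes text into Festival from Python. It assumes a `festival` binary is installed and on your PATH; the helper names `festival_command` and `speak` are made up for this example and are not part of Festival or the festvox tools.

```python
import shutil
import subprocess


def festival_command():
    """Return the argv for Festival's text-to-speech mode (text is read from stdin)."""
    return ["festival", "--tts"]


def speak(text):
    """Speak `text` through Festival.

    Returns False if no `festival` binary is found on PATH (assumption: it is
    installed separately), otherwise True when Festival exits successfully.
    """
    if shutil.which("festival") is None:
        return False
    result = subprocess.run(festival_command(), input=text.encode("utf-8"))
    return result.returncode == 0
```

Calling `speak("The time is now, exactly, ten past three.")` should play the sentence in whatever voice Festival has as its default; it says nothing about the quality of a custom-built voice.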
--
Geoff Harrison (http://mandrake.net)
Geoff "Mandrake" Harrison
Some Random UI Hacker
...it still stumbles over the relatively simple "Gonna bust a cap in this bizatch's shizass."
------------------------
Co-founder of GerbilMechs
Prior to this, the best sounding speech synthesis I had heard was from the Festival system, which is still pretty good - especially considering it has an open source license, something the AT&T system doesn't.
Another good speech synthesizer, no doubt an early version of the AT&T one (possibly?), is by Lucent.
Still, I am amazed at the quality of the AT&T system - it sounds almost perfectly natural. To the naysayers that say "No, it isn't natural" - what all of you have to realize is that this simple demo doesn't allow you to tweak all the variables that would really allow the inflections or type of voice (like whispering, etc) to really come through - it is too bad they don't give an advanced interface with a FAQ or some other form of documentation to allow this, but I imagine that if they did, it would probably take quite a while to compose even a simple sentence (I remember the hell you had to go through with an old Radio Shack speech synth for the Color Computer, specifying individual phonemes just to get proper speech to come out - it could pronounce many words, but on others it just fell flat on its face).
Finally - something I want everyone to ponder. Take a look at this old article (it was about Square redubbing FFTM) - once it loads, search for "cr0sh" and "I dare say" - you will come across a series of comments about what I think may happen in the future - what is funny is that the comments in reply to my take on things sound like your typical naysayers. How many computers were we supposed to only need back in the 60's? How much memory would people "only" need again Mr. Gates?
What I predict will come about - probably sooner than we can all imagine. It may not be cheap enough to do it now, at a quality that people would watch, fast enough to be done quicker than what can be done with live actors - but it is all software and hardware - this stuff will get faster and cheaper. Anybody who has been in this business long enough knows that it will happen. There might still be a need for actors, and voice artists, and such - but they probably won't have the "god" status society seems to confer on them now (with the exception, perhaps, of stage acting - which will probably enjoy a huge comeback).
Worldcom - Generation Duh!
Reason is the Path to God - Anon
On the AT&T Speech Labs website, they have a little demo where you can enter your own text and have it play for you using their software (30 word limit). Way Cool!!
They also have recorded demos you can listen to, but I thought the interactive demo was pretty nifty.
--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx.
Things you think are in the Constitution, but are not.
Yes, we can give you any celebrity as your own personal plaything. All you have to do is send us the script (or enter it on our website) and we'll give you 5 minutes to remember. 5.99/minute. Long distance charges may apply.
Expect video testimony to become useless in court cases... I mean, with a bit of photo work anyone can fake the jerky security camera footage--
No, wait. We already have laws that cover this. I think they're called perjury...
Absolutely not. And for the same reason that second-printings, plastic surgery, and fake breasts all suck - they're not the real deal.
And as a die-hard Cubs fan since the age of 4, might I also add that the World Series drought for the last half century has taken on a sort of religious significance, not unlike the 40 years the Hebrews spent wandering in the desert. And Harry Caray was our Moses - resurrecting his voice without the man behind it is tantamount to sacrilege (not to mention unbelievably morbid!).
-------
We want some answers and all that we get
Some kind of shit about a terrorist threat
- Ministry
You hear that? There is to be no telemarketing use of this technology!
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
Just imagine how much less space some of the more involving computer games like Half-Life and Deus Ex would take up if all the dialog was synthesized with key samples from the voice actor (or, should I say, the "phoneme source"). That saved space could be used toward other things, like textures or ambient sounds. Of course, the biggest challenge would be to allocate some processing power for the synthesis. Still, it's probably in the works.
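A rough back-of-the-envelope sketch of that trade-off, in Python. Every number in the usage note below (dialog length, bitrate, script size, per-voice database size) is an assumption picked for illustration, not a measurement from any real game or from the AT&T system.

```python
def recorded_bytes(minutes, kbps):
    """Size of pre-recorded dialog: duration in minutes at a constant bitrate in kbit/s."""
    return minutes * 60 * kbps * 1000 // 8


def synthesized_bytes(script_chars, voice_db_bytes, voices):
    """Size of the synthesized alternative: the script as plain text
    (one byte per character assumed) plus one voice database per speaker."""
    return script_chars + voices * voice_db_bytes
```

With the assumed figures of 120 minutes of dialog at 64 kbit/s, `recorded_bytes(120, 64)` comes to 57,600,000 bytes. With a hypothetical 108,000-character script and ten voices at an assumed 5,000,000 bytes each, `synthesized_bytes(108_000, 5_000_000, 10)` comes to 50,108,000 bytes. So the saving depends almost entirely on how big each voice database is and how many speakers the game has; the script text itself is negligible.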
Some links to other online demos, so you can compare:
... but I did not find anything satisfactory... :(
http://www.elantts.com/indemo.htm
http://www.cstr.ed.ac.uk/projects/festival/userin
http://www.flexvoice.com/demo.html
http://www.acuvoice.com/downloads/ttsdemo.html
I searched for good TTS software to give voice to some of the 3d animations I did in max
expect the same audience as if Tom Hanks were doing the character
... I think Tom Hanks would not mind getting paid nice royalty fees for the use of his young persona when he is retired in his 80's.
And who says Tom Hanks ever has to fade away? It could be a brave new world where your future kids and mine grow up watching the same stars we have today and some from yesterday. I can imagine my grandchildren raving about that new Humphrey Bogart action film. Not so far fetched really.
And for those that wonder about the legal aspects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~
--- -- - -
Give me LIBERTY, or give me a check.
One neat application would be to dub foreign language films in the target language using the voice of the original actor even though they do not know the target language. They could start doing that today.
They could start by fixing all those old Chinese and Japanese action/monster flicks dubbed by the same guy talking in false baritone and falsetto.
Another point of interest is with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses,
Good for them... Better for us! Who wants dumpy Sandra Bullock, bug-eyed Steve Buscemi, or smarmy Ben Affleck when we can have perfect, artist produced, fan-boy (and fan-girl) material like Aki from FF?
Its main use is for telephony (surprise!) but I suppose it'll be turning up in new and exciting places.
What happens when you get a sample of some General's voice and then use a synthesiser to call up the poor kid on guard duty and get him to let a bunch of terrorists enter the base?
Obviously if this does happen, then all their bases...aww, forget it.
--
Do not taunt Happy Fun Ball(TM)
"Yeah, right!"
"Officer, it is clear to me that you are in fact the one who is inebriated."
"I found it that way. Honest."
"Now, nothing has really changed since the last contract, we just cleaned up a few details; Please sign and return ASAP."
"But Billy got one...why can't I? Please?"
"Would you like to move to the sofa?"
I don't buy it for a minute. To do what they claim would require real AI(tm).
-- MarkusQ
Well kids, say goodbye to phone taps, voice mail, and important business being conducted over the phone. If this technology really accomplishes what the above says, voice recordings wouldn't be able to hold up in court because... well... it would be difficult or impossible to prove that they were really recordings of the person's voice.
Of course, I don't think this kind of technology should be "outlawed" or "restricted"; that will only make it easier to be used maliciously, as with any technological advancement.
Another point of interest is with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses, since they can be reproduced on a computer with frightening quality and clarity. Perhaps this applies to voice reproduction as well.
This sounds like a very beneficial technology, especially for games, where a high-quality voice synth could replace volumes of digitally recorded and compressed audio files, but it opens the door for some really frightening possibilities of fraud, social engineering, and copyright side-stepping.