Text to Speech Software Copies Any Human Voice



Posted by CmdrTaco on Tuesday July 31, 2001 @04:33AM from the now-thats-something-clever dept.

mindpixel writes "A New York Times report (registration required) states that AT&T Labs will start selling speech software that it says is so good at reproducing the sounds, inflections and intonations of a human voice that it can recreate voices and even bring the voices of long-dead celebrities back to life. The software, which turns printed text into synthesized speech, makes it possible for a company to use recordings of a person's voice to utter things that the person never actually said."

17 of 299 comments


open source speech synthesis by Mandrake · 2001-07-31 06:54 · Score: 4

AT&T's synthesis system actually contains Edinburgh University's Festival Speech Synthesis System (http://festvox.org/festival), although the synthesis technique in NextGen is not in Festival (as it's proprietary). However, there is work from Carnegie Mellon, by Kevin Lenzo and Alan Black (http://www.festvox.org), that provides all the tools (for free) that allow you to build your own voice in Festival. For simple domains the tools work really well and easily capture the quality of the original speaker. A whole general voice that can say anything is a *lot* of work, but it is possible with the tools. This is what we are doing in our company, Cepstral (http://www.cepstral.com).

Actually, there is even an example of Hemos himself doing a talking clock at http://www.festvox.org/ldom/ldom_time.html
--
Geoff Harrison (http://mandrake.net)

--
Geoff "Mandrake" Harrison
Some Random UI Hacker
On the other hand... by Monthenor · 2001-07-31 00:46 · Score: 4

...it still stumbles over the relatively simple "Gonna bust a cap in this bizatch's shizass."
------------------------

--
Co-founder of GerbilMechs
One more step... by cr0sh · 2001-07-31 01:40 · Score: 4

Prior to this, the best sounding speech synthesis I had heard was from the Festival system, which is still pretty good - especially considering it has an open source license, something the AT&T system doesn't.

Another good speech synthesizer, no doubt an early version of the AT&T one (possibly?), is by Lucent.

Still, I am amazed at the quality of the AT&T system - it sounds almost perfectly natural. To the naysayers who say "No, it isn't natural" - what all of you have to realize is that this simple demo doesn't let you tweak all the variables that would really allow the inflections or type of voice (like whispering, etc.) to come through. It is too bad they don't give an advanced interface with a FAQ or some other form of documentation to allow this, but I imagine that if they did, it would probably take quite a while to compose even a simple sentence. (I remember the hell you had to go through with an old Radio Shack speech synth for the Color Computer, specifying individual phonemes just to get proper speech to come out - it could pronounce many words, but on others it just fell flat on its face.)
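To make the phoneme point concrete, here is a toy sketch of dictionary-based word-to-phoneme lookup. It is not how the AT&T system or the old Color Computer synth actually worked; the three-word lexicon, the ARPAbet-style symbols, and the names LEXICON and to_phonemes are all made up for illustration. Words missing from the lexicon come back as None, which is the case where you would have to spell out the phonemes by hand.

```python
# Hypothetical, tiny pronunciation lexicon using ARPAbet-style symbols.
# A real synthesizer would use a far larger dictionary plus letter-to-sound rules.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
    "speech": ["S", "P", "IY", "CH"],
}


def to_phonemes(text, overrides=None):
    """Return (word, phonemes) pairs; phonemes is None for unknown words.

    `overrides` lets the user supply hand-written phonemes for words
    the lexicon does not know.
    """
    overrides = overrides or {}
    result = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        phones = overrides.get(word) or LEXICON.get(word)
        result.append((word, phones))
    return result


if __name__ == "__main__":
    # "phoneme" is not in the toy lexicon, so it needs a manual entry.
    print(to_phonemes("Hello, phoneme world!"))
    print(to_phonemes("Hello, phoneme world!",
                      overrides={"phoneme": ["F", "OW", "N", "IY", "M"]}))
```

The manual override is the modern equivalent of typing phonemes in by hand: fine for one word, painful for a whole script.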

Finally - something I want everyone to ponder. Take a look at this old article (it was about Square redubbing FFTM) - once it loads, search for "cr0sh" and "I dare say" - you will come across a series of comments about what I think may happen in the future - what is funny is that the comments in reply to my take on things sound like your typical naysayers. How many computers were we supposed to only need back in the 60's? How much memory would people "only" need again Mr. Gates?

What I predict will come about - probably sooner than we can all imagine. It may not be cheap enough to do it now, at a quality that people would watch, fast enough to be done quicker than what can be done with live actors - but it is all software and hardware - this stuff will get faster and cheaper. Anybody who has been in this business long enough knows that it will happen. There might still be a need for actors, and voice artists, and such - but they probably won't have the "god" status society seems to confer on them now (with the exception, perhaps, of stage acting - which will probably enjoy a huge comeback).

Worldcom - Generation Duh!

--
Reason is the Path to God - Anon
Try it out! by Mr.+Sketch · 2001-07-31 00:43 · Score: 5

On the AT&T Speech Labs website, they have a little demo where you can enter your own text and have it play for you using their software (30 word limit). Way Cool!!

They also have recorded demos you can listen to, but I thought the interactive demo was pretty nifty.

--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx.

--
Things you think are in the Constitution, but are not.
Phone Sex With Anyone!! Call Now 1-800-ANJOLIE by Sydney+Weidman · 2001-07-31 00:54 · Score: 4

Yes, we can give you any celebrity as your own personal plaything. All you have to do is send us the script (or enter it on our website) and we'll give you 5 minutes to remember. 5.99/minute. Long distance charges may apply.
Re:Entropy-licious by Planesdragon · 2001-07-31 00:49 · Score: 4

Expect video testimony to become useless in court cases... I mean, with a bit of photo work anyone can fake the jerky security camera footage--

No, wait. We already have laws that cover this. I think they're called perjury...
Fakes by DreamingReal · 2001-07-31 01:12 · Score: 4

Dr. Rabiner said he was excited about the possibility of resurrecting renowned voices, like that of Harry Caray, the Chicago Cubs announcer who delivered rousing play-by-play broadcasts. "There are probably hours of recordings in archives," he said. Wouldn't it be great, he asked, if Harry Caray's voice could again be broadcasting in Wrigley Field?
Absolutely not. And for the same reason that second-printings, plastic surgery, and fake breasts all suck - they're not the real deal.
And as a die-hard Cubs fan since the age of 4, might I also add that the World Series drought for the last half century has taken on a sort of religious significance, not unlike the 40 years the Hebrews spent wandering in the desert. And Harry Caray was our Moses - resurrecting his voice without the man behind it is tantamount to sacrilege (not to mention unbelievably morbid!).

-------

--
We want some answers and all that we get
Some kind of shit about a terrorist threat
- Ministry
There's an evil use for this too: by AFCArchvile · 2001-07-31 00:49 · Score: 5

I quote from U.S. Code, Title 47, Section 227, otherwise known as the Telephone Consumer Protection Act:
"(b) (1) It shall be unlawful for any person within the United States
(B) to initiate any telephone call to any residential telephone line using an artificial or prerecorded voice to deliver a message without the prior express consent of the called party, unless the call is initiated for emergency purposes or is exempted by rule or order by the Commission under paragraph (2)(B); ..."

You hear that? There is to be no telemarketing use of this technology!

--
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
This could be useful in games. by AFCArchvile · 2001-07-31 00:55 · Score: 5

Just imagine how much less space some of the more involving computer games like Half-Life and Deus Ex would take up if all the dialog was synthesized with key samples from the voice actor (or, should I say, the "phoneme source"). That saved space could be used toward other things, like textures or ambient sounds. Of course, the biggest challenge would be to allocate some processing power for the synthesis. Still, it's probably in the works.
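A rough back-of-the-envelope sketch of that trade-off follows. Every figure in it is an assumption picked for illustration (sample rate, compression ratio, words per minute, and especially the 20 MB voice database size); none of it comes from AT&T or from any actual game.

```python
MB = 1024 * 1024


def recorded_audio_bytes(minutes, sample_rate_hz=22050, bytes_per_sample=2,
                         compression_ratio=4.0):
    """Approximate size of pre-recorded mono dialog after compression.

    All defaults are assumed values, not measurements.
    """
    raw = minutes * 60 * sample_rate_hz * bytes_per_sample
    return int(raw / compression_ratio)


def synthesized_dialog_bytes(minutes, voice_db_bytes, words_per_minute=150,
                             bytes_per_word=6):
    """Approximate size of one voice database plus the script as plain text.

    voice_db_bytes is the (assumed) size of the per-actor sample database.
    """
    script = minutes * words_per_minute * bytes_per_word
    return voice_db_bytes + script


if __name__ == "__main__":
    for minutes in (10, 60, 240):
        rec = recorded_audio_bytes(minutes)
        syn = synthesized_dialog_bytes(minutes, voice_db_bytes=20 * MB)
        print(f"{minutes:4d} min: recorded {rec / MB:6.1f} MB, "
              f"synthesized {syn / MB:6.1f} MB")
```

With these made-up numbers the synthesized route only wins once there is a lot of dialog per voice, since each voice actor needs their own database while the script itself costs almost nothing.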

--
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
Other Online Demos by DaneelGiskard · 2001-07-31 01:03 · Score: 5

Some links to other online demos, so you can compare:

http://www.elantts.com/indemo.htm
http://www.cstr.ed.ac.uk/projects/festival/userin.html
http://www.flexvoice.com/demo.html
http://www.acuvoice.com/downloads/ttsdemo.html

I searched for good TTS software to give voice to some of the 3d animations I did in max ... but I did not find anything satisfactory... :(
Re:Job cuts in Hollywood... by KarmaBlackballed · 2001-07-31 01:09 · Score: 4

expect the same audience as if Tom Hanks were doing the character

And who says Tom Hanks ever has to fade away? It could be a brave new world where your future kids and mine grow up watching the same stars we have today and some from yesterday. I can imagine my grandchildren raving about that new Humphrey Bogart action film. Not so far fetched really.

And for those that wonder about the legal aspects ... I think Tom Hanks would not mind getting paid nice royalty fees for the use of his young persona when he is retired in his 80's.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~

--

--- -- - -
Give me LIBERTY, or give me a check.
Movie dubbing today... by KarmaBlackballed · 2001-07-31 01:19 · Score: 5

One neat application would be to dub foreign language films in the target language using the voice of the original actor even though they do not know the target language. They could start doing that today.

They could start by fixing all those old Chinese and Japanese action/monster flicks dubbed by the same guy talking in false baritone and falsetto.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~

--

--- -- - -
Give me LIBERTY, or give me a check.
Re:Entropy-licious by Bonker · 2001-07-31 00:56 · Score: 4

Another interesting point is that with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses,

Good for them... Better for us! Who wants dumpy Sandra Bullock, bug-eyed Steve Buscemi, or smarmy Ben Affleck when we can have perfect, artist produced, fan-boy (and fan-girl) material like Aki from FF?

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
Re:Cool... and disturbing. by dachshund · 2001-07-31 00:54 · Score: 5

Actually, this isn't a very exciting thing for the blind. For most practical uses, the visually impaired tend to prefer speed over quality. It doesn't have to sound great as long as it can read several times faster than "normal" speed. The AT&T TTS isn't really designed for this purpose.
Its main use is for telephony (surprise!) but I suppose it'll be turning up in new and exciting places.
Re:Cool... and disturbing. by Anixamander · 2001-07-31 00:44 · Score: 5

What happens when you get a sample of some General's voice and then use a synthesiser to call up the poor kid on guard duty and get him to let a bunch of terrorists enter the base?

Obviously if this does happen, then all their bases...aww, forget it.
--

--
Do not taunt Happy Fun Ball(TM)
Doubtful. by MarkusQ · 2001-07-31 00:50 · Score: 5

Match the intonation of any human voice, without a sample of that voice saying the phrase in the desired intonation, just from the text?
"Yeah, right!"
"Officer, it is clear to me that you are in fact the one who is inebriated."
"I found it that way. Honest."
"Now, nothing has really changed since the last contract, we just cleaned up a few details; Please sign and return ASAP."
"But Billy got one...why can't I? Please?"
"Would you like to move to the sofa?"
I don't buy it for a minute. To do what they claim would require real AI(tm).
-- MarkusQ
Entropy-licious by Nihilanth · 2001-07-31 00:40 · Score: 5

Well kids, say goodbye to phone taps, voice mail, and important business being conducted over the phone. If this technology really accomplishes what the above says, voice recordings wouldn't be able to hold up in court because... well... it would be difficult or impossible to prove that they were really recordings of the person's voice.

Of course, I don't think this kind of technology should be "outlawed" or "restricted"; that will only make it easier to use maliciously, as with any technological advancement.

Another interesting point is that with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses, since they can be reproduced on a computer with frightening quality and clarity. Perhaps this applies to voice reproduction as well.

This sounds like a very beneficial technology, especially for games, where a high-quality voice synth could replace volumes of digitally recorded and compressed audio files... but it opens the door to some really frightening possibilities of fraud, social engineering, and copyright side-stepping.