Text to Speech Software Copies Any Human Voice

I've seen it! It's "almost" open-source by Anonymous Coward · 2001-07-31 01:17 · Score: 1

I worked as a summer intern in AT&T Shannon Labs, in Florham Park, NJ. I've been working on speech recognition and understanding (nice stuff, too), but the guy with whom I shared the office worked precisely on this product, and we talked a lot about his baby. It's based on "festival" from Edinburgh University. Basically, it's festival with a proprietary data set and some minor enhancements (which were buggy as hell in early 2000, but again, that's been a long time...). Speaks great, I must admit!

voice over talent.... by Anonymous Coward · 2001-07-31 02:22 · Score: 1

having earned my living as a voice over guy, this is *bad* news for the v.o. talent. I still don't think it's a bad thing. I think most of us are way overpaid for what we do anyway....$300 /hr.

Re:I'm going to hell for this, I know... by Anonymous Coward · 2001-07-31 03:53 · Score: 1

I'd of banged Natalie Wood. She was a hotty back in the day.

The Net2Phone "Virus" by Anonymous Coward · 2001-07-31 07:15 · Score: 1

I can just see it now... The Net2Phone virus... Calls up all your friends and tells them about the big party...

Re:Give me 0% or 100% by Alan · 2001-07-31 04:36 · Score: 1

:) Anyone who's ever read my page(s) or work knows I don't proofread worth a damn, never really have since high school when they *insisted* that we do multiple drafts and hand them in as well as the final paper. Bah humbug on that shit I say!

Give me 0% or 100% by Alan · 2001-07-31 01:14 · Score: 2

I think 99.5% accuracy would suck personally. If it were that close, so that only a couple of words in, say, 200 or 500 or 1000 were wrong, you'd still have to go and find them. And if the software is that good it would probably do things like to->too or their->they're. The errors that aren't easily caught by a spell checker program.

No, I'd rather have such horrible accuracy that you *know* you're going to have to go through and correct the document. *OR* I'd take 100% accuracy, so I'd be completely confident that there were no errors.

Just my $0.02
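
For a sense of scale, here is a minimal sketch of the arithmetic behind that complaint. It assumes each word is wrong independently with the same probability, which is an assumption of this sketch and not something the post claims; the function names are made up for illustration.

```python
def expected_errors(words: int, accuracy: float) -> float:
    """Expected number of wrong words at a given per-word accuracy."""
    return words * (1.0 - accuracy)


def chance_of_clean_document(words: int, accuracy: float) -> float:
    """Probability that every word is right, assuming independent errors."""
    return accuracy ** words


# Document lengths taken from the post: 200, 500 and 1000 words.
for n in (200, 500, 1000):
    print(n,
          round(expected_errors(n, 0.995), 2),
          round(chance_of_clean_document(n, 0.995), 3))
```

Under these assumptions a 1000-word document at 99.5% averages about 5 wrong words, so the odds of getting one with nothing to fix are slim, and proofreading stays on the to-do list either way.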

Re:Give me 0% or 100% by macinslak · 2001-07-31 03:48 · Score: 1

Seriously man, it's called proof reading. You'll have to do it no matter how you enter your text. I assume you'll also be promptly giving up that damn computer of yours with its feeble 99.99% uptime too.

Re:Give me 0% or 100% by honkycat · 2001-07-31 01:36 · Score: 2

A good human typist working quickly probably has in the neighborhood of 99.5% accuracy (my estimate, seems reasonable). You're never going to get 100% accuracy out of anything. An argument for lower accuracy makes no sense at all. Any time you type or dictate something, expect to proof read it-- there's no way around that, and you'll still miss some of the mistakes.
Gee, rather than using voice recognition software, I just sit on the keyboard and rock back and forth. Sure it's inaccurate, but I just expect to fix up the errors. I'm sure glad it doesn't get most of the words right or I might miss some of the mistakes.
:-)

COOL. Hrm. by oGMo · 2001-07-31 02:11 · Score: 2

I already get prerecorded voice messages. Talk about the ultimate annoyance: phone spam is bad enough, but you don't even have someone on the other end whose time you can waste, mind you can play with, and other ... er, someone on the other end to demand you're taken off their lists, etc.

Thus, it is very interesting to learn about this part of the TCPA... any idea who I can file a formal complaint with next time I get one of these calls?

--

Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

Re:COOL. Hrm. by AFCArchvile · 2001-07-31 02:52 · Score: 1

I believe a call to your District Attorney should be adequate; remember to catalog as much identifiable information as you can about the offender. Also, the best offense is a good defense; be careful about who you give your phone number to, and make sure that all e-commerce partners treat your personal information responsibly (i.e., they don't give it away or sell it to telemarketers).
--
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer

Re:Entropy-licious by David+Greene · 2001-07-31 22:37 · Score: 1

I'm not a particularly good keyboardist (and an even worse pianist :), but I can easily discern an old, untuned upright from my $200 yamaha keyboard.

Of course an electronic 'board is going to sound much better than an old beater acoustic that hasn't been kept up. It's sad what some of these instruments go through. But take a well-maintained acoustic and it will blow away any electronic "equivalent."

Some of the higher-end digital pianos are actually quite good for what they do. They won't replace the real thing, but it's a pretty good simulation. I got mine for the portability.

In a few years, maybe it can be used for "classic" (i.e., dead) voices, where people are only familiar with a memory of how it goes, but I don't think it will ever (i.e., within my lifetime) be indistinguishable from the real thing.

Agreed.

Re:Entropy-licious by David+Greene · 2001-07-31 02:07 · Score: 2

There are instrument synths that have been out for a while that actually acoustically model the instrument being synthesized, and instead of altering the frequency/amplitude of the generated noise, actually change the model's airflow, resonance, etc.

Having extensive experience with digital "pianos," I can testify that the technology to realistically produce an authentic piano sound is a long, long, long way off. The synth is close, but any trained pianist can easily tell the difference. I have a fairly modern (about 1.5 years old) professional digital piano and every time I use it I lament the limitations of the resonance reproduction. It's just not there.

Acoustic instruments (including voice) are very complex beasts. To reproduce the qualities of a piano's strings, pedal effects, soundboard and overall resonance is not easily done. That's not even taking into account temperature and humidity. In the end it would probably take more memory than is practical, if it is even possible.

Better to treat these things as they are: another class of instrument. I don't call my keyboard a piano. I call it a keyboard.

Re:So? by Have+Blue · 2001-07-31 01:00 · Score: 2

Or put them together, and get a REAL telephone voice changer... What celebrity do you want to pretend to be your secretary?

Re:Entropy-licious by shogun · 2001-07-31 13:52 · Score: 1

Don't know what google you used but the one everyone else searches the web with had the following results:

Searched the web for "innocent until proven guilty". Results 1 - 10 of about 28,300

Searched the web for "guilty until proven innocent" . Results 1 - 10 of about 11,000

Interesting...

Re:attaboys? by shogun · 2001-07-31 15:06 · Score: 1

They must be really small boys, somewhere between the size of attoboys and zeptoboys.

Re:This could be useful in games. by Masem · 2001-07-31 01:22 · Score: 2

One of the features of HL is that the voice that you hear over the PA system throughout the game is actually several different samples; it's possible to program the scripting engine to say any combination of words that have been generated as sound files; many mod authors used this to give a bit more unique feel to their maps. Yes, there was no intonation, but for a loudspeaker voice, this worked well.
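
The word-splicing idea is easy to sketch. This is only an illustration of the general technique, not the actual Half-Life scripting engine: `build_announcement` and the sample table are hypothetical, and the "clips" are placeholder byte strings standing in for recorded audio data.

```python
def build_announcement(words, samples, gap=b""):
    """Join pre-recorded clips, in order, into one buffer of audio bytes."""
    missing = [w for w in words if w.lower() not in samples]
    if missing:
        # The voice can only say words that were actually recorded.
        raise KeyError(f"no recorded sample for: {', '.join(missing)}")
    return gap.join(samples[w.lower()] for w in words)


# Hypothetical stand-ins for recorded clips; real ones would hold PCM audio.
samples = {
    "warning": b"<warning>",
    "sector": b"<sector>",
    "seven": b"<seven>",
}

# gap is whatever goes between words, e.g. a short stretch of silence.
print(build_announcement(["Warning", "sector", "seven"], samples, gap=b"_"))
```

Because every clip is played back exactly as recorded, there is no intonation across the sentence, which is the flat loudspeaker effect described above.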

--
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:

Important Uses by Hallow · 2001-07-31 01:44 · Score: 1

I think, far beyond use in games, replacing actors, and automated phone systems, this tech has important uses and implications especially for the disabled.

Imagine not being able to speak and relying on a computer generated voice like Stephen Hawking does. Now imagine if you could actually bring intonation and expression back to your words? Or perhaps even use your own voice again if you had sufficient recordings from prior to your loss of speech?

Imagine being blind and the computer having to read a website or ebook to you. A cold, unnatural voice, with no hint of meaning in the words. Wouldn't it be nice if it actually had some sense of humanity to it, some bit of inflection and reality to the speech?

This could be an important advance in enabling technology for many people.

Voice Portals by valmont · 2001-07-31 00:38 · Score: 1

I wonder if voice portals like Hey Anita! will leverage this cool new technology, 'cuz last time I checked most text to speech engines out there could use some serious help.

--
Extraordinary Vacations. Exceptional Prices

Re:Sweet by valmont · 2001-07-31 00:43 · Score: 1

he might as well just tell you to "Join Verizon Wireless" ;]

--
Extraordinary Vacations. Exceptional Prices

This is great! by GPS+Pilot · 2001-07-31 03:13 · Score: 1

Now I can create a recording of Will Rogers saying, "I never met a man I didn't like... until I met JonKatz!"

--
That that is is that that that that is not is not.

Re:LOL, don't get me wrong by Mandrake · 2001-07-31 01:39 · Score: 2

but there are different voices for festival, some of which sound quite fine.

http://mandrake.net/demo_voice.wav
produced with festival
http://mandrake.net/demo_2.wav
produced with festival
--
Geoff Harrison (http://mandrake.net)

--
Geoff "Mandrake" Harrison
Some Random UI Hacker

Re:Other Online Demos by Mandrake · 2001-07-31 02:00 · Score: 2

soon festival will have higher quality voices available freely.

see http://mandrake.net/demo_voice.wav
and http://mandrake.net/demo_2.wav
for samples.

--
Geoff Harrison (http://mandrake.net)

--
Geoff "Mandrake" Harrison
Some Random UI Hacker

festival vs AT&T NextGen synthesis by Mandrake · 2001-07-31 05:34 · Score: 2

a few people in here so far have been comparing Festival to AT&T NextGen saying that NextGen sounds better, etc. I find that funny, especially considering that NextGen is built on top of festival, and festival (the open source speech synthesizer) can be made to sound just as good or better.
--
Geoff Harrison (http://mandrake.net)

--
Geoff "Mandrake" Harrison
Some Random UI Hacker

another thing by Mandrake · 2001-07-31 07:01 · Score: 3

Also, building a voice from speakers who you do not control the coverage on (particularly the mention of reviving dead actors, etc) would be problematic at best. You could not get the proper coverage (nor the quality) to really do anything useful.
--
Geoff Harrison (http://mandrake.net)

--
Geoff "Mandrake" Harrison
Some Random UI Hacker

open source speech synthesis by Mandrake · 2001-07-31 06:54 · Score: 4

AT&T's synthesis system actually contains Edinburgh University's Festival Speech Synthesis System (http://festvox.org/festival), although the synthesis technique in NextGen is not in Festival (as it's proprietary). However, there is work from Carnegie Mellon, by Kevin Lenzo and Alan Black (http://www.festvox.org), that provides all the tools (for free) that allow you to build your own voice in Festival. For simple domains the tools really work well, and easily capture the quality of the original speaker; for a whole general voice that can say anything it is a *lot* of work, but is possible from the tools. This is what we are doing in our company Cepstral (http://www.cepstral.com).

Actually there is even an example of Hemos himself, doing a talking clock on http://www.festvox.org/ldom/ldom_time.html
--
Geoff Harrison (http://mandrake.net)

--
Geoff "Mandrake" Harrison
Some Random UI Hacker

Re:Entropy-licious by Glytch · 2001-07-31 01:57 · Score: 2

Very true with regard to movie and TV acting, but there's always live performances. Perhaps this could spur a greater public interest in theatre acting. After all, all these trained actors and actresses would still want jobs.

Re:The downside... by Glytch · 2001-07-31 02:17 · Score: 2

You mean they aren't already?

LOL, don't get me wrong by Archfeld · 2001-07-31 01:25 · Score: 2

festival is kinda cool but it sounds like Charlie Brown's teacher compared to the AT&T voice.

--
errr....umm...*whooosh* *whoosh* Is this thing on ?

thanks fer the info by Archfeld · 2001-08-02 02:26 · Score: 2

both of these voices are hands above the default :)

--
errr....umm...*whooosh* *whoosh* Is this thing on ?

Re:Code words and access lists by t · 2001-07-31 03:33 · Score: 1

Incidentally, where did you swipe this from?

Re:I'm going to hell for this, I know... by Bob+McCown · 2001-07-31 00:53 · Score: 1

...or Stephen King "Ah, here's a crosswalk, pedestrians have the right of" [WHACK] [THUD]

the end of media credibility by jub · 2001-07-31 00:37 · Score: 1

Is this going to be Photoshop for audio? It's been a few years since we could trust the authenticity of any photograph, and now it sounds like this is the final leap over traditional sound editing - audio can't be completely trusted any more either.

Re:the end of media credibility by SlippyToad · 2001-07-31 03:35 · Score: 1

The "end" of media credibility has already occurred. The media just haven't figured that out yet.

--
One day I feel I'm ahead of the wheel / the next it's rolling over me / I can get back on / I can get back on

However, for people who have no voice... by dduck · 2001-07-31 02:23 · Score: 1

...this is a very useful thing, and a big breakthrough.

Quite a few persons suffer from disabilities that have robbed them of their voice, and possibly other modes of expression. For these people the digitized or synthesized voice pretty much becomes their primary way of presenting themselves to the world (think Stephen Hawking). Until now the choice was basically between a digitized voice (high quality output, but limited vocabulary and limited choice in voice types) and synthesized voice (low quality output, but unlimited vocabulary). A synthesized voice with realistic characteristics would be the best of both worlds.

Re:Entropy-licious by hRothGar · 2001-07-31 02:23 · Score: 1

out with one technology (audio/visual proof), in with another (forensics/dna).

A Violation of the Geneva Convention. by FreeUser · 2001-07-31 03:01 · Score: 2

Supposedly the DoD has had this capability for years, including in foreign languages. The idea being that the US can intercept enemy radio communications and replace them with confusing or erroneous instructions, *in real-time*, in the original radio operator's voice.

If this were ever used it would be a violation of the Geneva Convention (the idea that you could use it to give fake orders to the enemy, or impersonate leaders telling their people to surrender, etc). Not that the United States cares at all about the Geneva Convention, what with our history of detaining and even executing foreign nationals without ever letting them speak to their consulates, in direct violation of said Convention.

Nevertheless, the U.S. military (and this is indeed ironic) has been more inclined to respect the Geneva Convention (at least officially) than the civilian government. Developing this sort of technology flies in the face of that, however, which makes me suspect it is being driven more by one of the spook agencies (CIA, NSA, FBI) than the DoD ... but in today's ethic-free climate, who can tell?
--
The Future of Human Evolution: Autonomy

Re:This could be useful in games. by Kismet · 2001-07-31 02:34 · Score: 1

Forget the space savings. If programmers can use this technology to automate voice acting in their games, then the barrier to entry for game creation will be reduced significantly.

Most effort in today's best-selling games goes into creating images and recording the voices and sounds. This costs a lot of money and resources. Few hobbyists can afford to make a game this complex.

With cheap tools that can do a reasonable job replacing the artists who provide the sound and graphics, high-school kids can start dreaming about competing with the big boys again.

Re:Doubtful. by Sloppy · 2001-07-31 02:09 · Score: 2

While you can certainly automate intonation so that the sentences come out with a natural-sounding intonation, you still have the problem of choosing which intonation, based on the meaning and context.

Look at MarkusQ's examples again. "Yeah, right!" is the best one since it's so simple. There is more than one correct way to intone it. How do you know whether it's meant ironically or as an exclamation of revelation? I think you need Intelligence to correctly make that decision.

---

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:Cool... and disturbing. by Sloppy · 2001-07-31 02:15 · Score: 2

What happens when you get a sample of some General's voice

That's why you also need to know the secret key "OPE" to get past the CRM-114 discriminator.

---

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

1984 by chuckw · 2001-07-31 01:14 · Score: 1

It's interesting to me to note that 1984 will come true, not through the government but through the efforts of corporations. It amazes me that the Christian right supports the Republican agenda which flies in the face of religion. The mark of the beast won't be coming from any governments. It'll be coming from huge multi-national companies that have merged and merged and merged until they're so powerful they can enforce anything they want.
--
*Condense fact from the vapor of nuance*
25: ten.knilrevlis@wkcuhc

Re:Cool... and disturbing. by TWR · 2001-07-31 02:23 · Score: 2

"limited problem domain" is the key. Given a limited problem domain, computers are better than anyone at anything. It's all in how you define the problem.

Most AI research is useless for this very reason. Data sets that are known to do well are re-used as "proof" that a particular algorithm works well. Heck, the act of inputting the data into the computer for processing usually involves human interaction which skews the data.

When it comes to performing tasks that a 5 year old can do, computers still suck.

-jon

--

Remember Amalek.

Re:Try it out! - It's not that great by ivan256 · 2001-07-31 01:15 · Score: 3

It's interesting that their precooked demos sound great, but the speech generated in the interactive demo still sounds like a classic text-to-speech program with a few enhancements. This doesn't seem like a significant improvement over, say, what ships with MacOS by default. I'm not impressed.

Re:On the other hand... by Entropy_ah · 2001-07-31 03:27 · Score: 1

Unfortunately I stumble over that one as well

--
my other penis is a vagina

Re:Job cuts in Hollywood... by JabberWokky · 2001-07-31 02:35 · Score: 2

Who will need an overpaid difficult celebrity when you can resurrect dead ones or invent new ones for movies, sitcoms, etc?

It'll never happen. Would you have gone to see Final Fantasy if it hadn't starred Ben Affleck and a RealDoll, plus the cast from Aliens?

--
Evan

--
"$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien

Re:This could be useful in games. by ShieldWolf · 2001-07-31 02:33 · Score: 2

They already did something similar in the game Dune II: The Building of a Dynasty. Words and phrases are atomized and then combined.

e.g.

[Ordos] [Unit] [Destroyed].
[Harkonnen] [Unit] [Destroyed].

Each of the three voice actors used had the same thing applied. Different phrases have different intonations so that it doesn't sound (too) robotic (unlike automated phone attendants).

I thought the effect was pretty good, much better than the AT&T samples I tried IMHO.

-Shieldwolf

--
just = (My)Opinion.toCents();

Re:This could be useful in games. by hugg · 2001-07-31 10:21 · Score: 2

This is extremely difficult in practice, as it equates to acting. The speech may sound pretty good, but it can't insert all the aural cues that a human actor can. You may as well expect a virtual actor to act out a virtual movie using nothing but the text input of the script.

Instead of TTS, what is needed is a program that takes the speech input of one actor and modifies it to sound like another. This is already sort of done with vocoders used for music production.

Sweet by scooby-doo · 2001-07-31 00:40 · Score: 2

From now on I will have James Earl Jones read me all my e-mail. Occasionally he can also utter "Come to the Dark side" to keep me amused :)

Re: Sweet by Havokmon · 2001-07-31 00:53 · Score: 1

"... Do you have sex with your bathers?"
"I do."

--
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)

Voice over IP compression; useful for the deaf by ChrisDolan · 2001-07-31 01:29 · Score: 3

With a good speech recognition package, this would be a good way to get extremely high compression for voice. Record your voice, convert to text, compress text, spit over the net, change back to *your* voice on the other end. It would require initially transmitting your voice profile. However, it would not work well with current technology because the lag during speech recognition would be quite noticeable. Also, you would have to detect inflection in the speech recognition phase and encode that in the text.
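
A back-of-the-envelope sketch of how much that could save, before any text compression at all. The speaking rate (150 words per minute) and average word length (6 characters including the space) are assumptions picked for illustration; the 64 kbit/s figure is ordinary telephone-quality PCM (8 kHz sampling, 8 bits per sample).

```python
def text_bitrate(words_per_minute: float, chars_per_word: float,
                 bits_per_char: float = 8.0) -> float:
    """Bits per second needed to send speech as plain, uncompressed text."""
    return words_per_minute * chars_per_word * bits_per_char / 60.0


PCM_TELEPHONE_BPS = 64_000  # 8 kHz sampling x 8 bits per sample

# Assumed values: 150 words per minute, 6 characters per word.
speech_as_text = text_bitrate(150, 6)
print(speech_as_text, PCM_TELEPHONE_BPS / speech_as_text)
```

With these numbers the text stream needs on the order of a hundred bits per second, several hundred times less than the PCM stream. The inflection markup would eat into that margin, and the voice profile is a one-time cost up front.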

This could also be very useful for deaf telephone users. Currently, a deaf person relies on a human relay to talk to a non-TDD equipped person. With good speech-to-text and text-to-speech technology the human middle-man could be removed, saving a ton of money.

Re:Human rights? by GregWebb · 2001-07-31 02:30 · Score: 2

I'm _not_ saying the device should be illegal. I'm not even saying that using it and publishing the results should necessarily be banned, no matter what use it's put to. I'm honestly not sure.

All I'm saying is that I can see an argument for making it illegal to post a statement which poses as being from someone when (a) it isn't and (b) it causes them harm. Whether I agree with that opinion I'm really not sure.

--

Greg

(Inside a nuclear plant)
Aaaarrrggh! Run! The canary has mutated!

Human rights? by GregWebb · 2001-07-31 00:47 · Score: 3

I'm honestly not sure what to think here, but do I have a right to my voice?

Let's say someone wanted to make me say something in direct contradiction to my normal views, then publish that. Now, I don't consider myself famous enough for this to be a problem ;-) but the possibilities are obvious. The technical liberal in me says that this is fine. The, erm, other part of me says that this could cause some serious problems and harm for people, so shouldn't be allowed. Which do people think here?

The flipside for law enforcement is perhaps even more scary. What if I published a recording, generated in this way, of (for example) Gary Condit (sp?) confessing to having killed Chandra Levy (again, sp?)? For a parallel (and I never thought I'd cite Lois & Clark... Promise I'm not a fan, my sister used to watch it over meals so we all had to, I have a weird memory, honest really...) the episode where a photographer produces a pre-wedding image of them in bed which could have been taken properly but was actually faked due to a lost film.

This has been coming for years, I know, but it's still a nasty big can of worms.

--

Greg

(Inside a nuclear plant)
Aaaarrrggh! Run! The canary has mutated!

Re:Human rights? by First+Person · 2001-07-31 01:01 · Score: 2

I'm honestly not sure what to think here, but do I have a right to my voice?

Yes, you do. That is until you sign a contract as an aspiring actor/actress with no leverage which requires that you sign over future rights to the studio.

There should be some interesting legal cases over the next 3-5 years.

--
Given one hour to live, the student replied: "I'd spend it with professor FP who can make an hour seem like a lifetime."

Re:Human rights? by tswinzig · 2001-07-31 01:25 · Score: 2

Let's say someone wanted to make me say something in direct contradiction to my normal views, then publish that.

OK. They could either do what they've been doing for hundreds of years, use a person that is good at impersonating someone's voice, or they could possibly use this new software.

The, erm, other part of me says that this could cause some serious problems and harm for people, so shouldn't be allowed. Which do people think here?

Are you kidding me? Because something COULD cause harm, it should be illegal? That's called prior restraint. It ain't gonna happen.

Personally, I just want to hear this thing do Sean Connery. Oh, I can hear it now.

Connery: "I'll take The Rapishts for $500, Alex!"

Trebek: "That's THERAPISTS!"

Oh, the possibilities are endless. By all means, let's OUTLAW IT!

--

"And like that ... he's gone."

Re:Human rights? by tswinzig · 2001-07-31 04:28 · Score: 2

All I'm saying is that I can see an argument for making it illegal to post a statement which poses as being from someone when (a) it isn't and (b) it causes them harm. Whether I agree with that opinion I'm really not sure.

But that's just it -- there are already laws covering this. If someone says something that harms you, you can sue for slander and possibly defamation. If someone impersonates you without harm, then there is nothing you can do, which is how it should be.

So no, I can't see ANY argument for making this technology illegal as being anything other than totally absurd. The next step would obviously be to make Rich Little (and many other comedians) illegal...

--

"And like that ... he's gone."

Universal Translators? by nEoN+nOoDlE · 2001-07-31 05:00 · Score: 1

Someone posted a reply about how this technology can be used to dub foreign movies with the actor's original voice but in a different language. This reminds me of the Universal Translators from Star Trek. Can it be possible in a few years when processors are much more powerful to instantaneously convert your speech to text, then run it through a translator and then convert it back to your original voice? That would be pretty damn cool. A couple of years back, I was thinking about how ridiculous the translators were but now it seems pretty close. IBM also has a commercial using something like this where some foreigners call this girl's father about some shipping problems and she speaks to them in English and it automatically gets translated... my friends, the future is here.

--
Don't trust a bull's horn, a doberman's tooth, a runaway horse or me.

Dmitry vs. ATT by Scotter · 2001-07-31 02:14 · Score: 2

When a Russian company writes software that can be misused to copy a book, a programmer gets arrested and sits in jail.

When AT&T writes software that can be misused to copy somebody's IDENTITY, they are hailed as great innovators.

Something is wrong with this picture.

On Yahoo w/o registration here by __aadkms7016 · 2001-07-31 01:01 · Score: 3

Read it on Yahoo without registration here.

Re:One more step... by mrfrostee · 2001-07-31 11:03 · Score: 1

The free cross-platform Squeak Smalltalk environment (http://www.squeak.org) has a built-in Klatt TTS synthesizer.

Try executing Speaker manWithHead say: 'Put whatever you want said here' anywhere in Squeak and a Southpark-like animated head will appear and talk at you.

Hey, I did... by Ethelred+Unraed · 2001-07-31 05:00 · Score: 2

I tried a bit of Shakespeare:

O, for a muse of fire that would ascend the brightest heaven of invention! A kingdom for a stage, princes to act, and monarchs to behold the swelling scene. -- Henry V, I:1

and

Can such things be, and o'ercome us like a summer's cloud, without our special wonder? -- Macbeth, III:4

Now, mind you, it sounded like a TV weatherman reading it, rather than anything like a Shakespearean actor (no, not even Kevin Costner ;-) ) -- but if you think that this is intended to be a generic male voice...hey, maybe they could take Ian McKellen's or Patrick Stewart's or Emma Thompson's or (God forbid) Keanu Reeves's voices. Who knows?

Well, I'm impressed...

Now, if you want to have some fun, try some Bushisms with it. ;-)

cya

Ethelred

--
Everyone wants to be Ethelred. Even I want to be Ethelred.

Re:Hey, I did... by ptbrown · 2001-07-31 12:38 · Score: 2

I had even more fun. I started with a little T.S. Eliot and it sounded pretty good. But I noticed it handled some words better than others, so I decided to try...
Twas brillig, and the slithy toves did gyre and gimble in the wabe. All mimsy were the borogoves, and the mome raths outgrabe.
Naturally, it didn't fare much better than any other TTS synthesizer I've heard. That is, a jittery, obviously artificial monotone. Apparently, it can only produce inflection for words that are already in its vocabulary.

--
Any sufficiently advanced civilization is indistinguishable from Gods.

Re:Entropy-licious by dAzED1 · 2001-07-31 03:50 · Score: 1

Who wants dumpy Sandra Bullock...

Well, if nobody else does, I guess I could find room for her around here... B-) I was thinking something along the same lines...I mean hell, I have an extra bedroom even... I don't care what lame-brain thinks, Sandra Bullock is still hot, and will be for some time. She doesn't have that fake look, she looks like a perfectly real beautiful woman. Maybe Mr Slippery and I could share her...arrange some sort of time-share or something. I wouldn't even mind setting her up with her own place somewhere, like a little mistress or something ;)

AT&T Research WWW page by billnapier · 2001-07-31 00:44 · Score: 1

You can check out more information on their TTS project here and create your own samples here.

Real problem credibility by zook · 2001-07-31 01:01 · Score: 2

From the NYT article:

"...a person must first go to a studio where engineers record 10 to 40 hours of readings. Texts range from business news reports to nonsense babble."

I wouldn't be too concerned about someone faking my voice (yet---wait for next year), but this still raises the issue that what we hear and see may no longer be reality at all. This reminds me of the technology that the media is using to insert ads into sports events, and which CBS used to cover up an NBC billboard during the "millennium" New Year's celebration.

It's not too long before we'll be able to completely fake the voice and image of whomever we please. Then it's just the credibility of the source that will matter. Content alone will carry little weight.

Philipsvision, anyone? by rufus+t+firefly · 2001-07-31 02:20 · Score: 1

For all of those who were/are fans of Jon Lovitz's "The Critic" television show, we know that they're almost there. All they need to do is use that nice new skin-rendering technology, and we won't need actors anymore...

---

--
"He may look like an idiot, and talk like an idiot, but don't let that fool you. He really is an idiot." - Duck Soup

Re:Job cuts in Hollywood... by rufus+t+firefly · 2001-07-31 02:30 · Score: 1

I seem to remember something like that in "The Running Man". Definitely not the best acting, but some pretty neat ideas.

---

--
"He may look like an idiot, and talk like an idiot, but don't let that fool you. He really is an idiot." - Duck Soup

Re:So much for wiretaps. by alecto · 2001-07-31 20:03 · Score: 1

Meaning that photographic evidence has been useless since Photoshop 1.0?

Re:So much for wiretaps. by alecto · 2001-08-01 19:52 · Score: 1

Is it still easy to spot even after being reprinted onto photographic paper? I can fathom that Photoshop leaves a "smoking gun" in a digital image, but I wonder how well the eye could perceive manipulation once the picture's on paper. Of course, photographic retouching's been with us since there's been photography--it's just never been so easy.

I hadn't heard about the cut-and-paste in the Scientology case--thanks!

Progress? by Monthenor · 2001-07-31 00:53 · Score: 2

This thing has only marginal improvement over the old System 8 MacSpeak I remember playing with in high school. It pauses too long at commas and has trouble with contractions and plurals...as evidenced by the industry-wide standard of "The Oscar Meyer Wiener" song. True, this ATT thing doesn't need funky spelling to say it properly ("Meier weener" being the MacSpeak solution), but the demo doesn't attempt to sing or say it in rhythm. MacSpeak actually hit some of the notes and beats!
------------------------

--
Co-founder of GerbilMechs

Re:Progress? by egomaniac · 2001-07-31 01:21 · Score: 2

This is what we refer to as "nostalgia". I used Macs for years, and there is no way in hell you can tell me that MacSpeak sounds as good as this thing. It's not even close.

--- egomaniac

--
ZFS: because love is never having to say fsck

On the other hand... by Monthenor · 2001-07-31 00:46 · Score: 4

...it still stumbles over the relatively simple "Gonna bust a cap in this bizatch's shizass."
------------------------

--
Co-founder of GerbilMechs

Re:One more step... by cr0sh · 2001-07-31 03:16 · Score: 3

I looked up Klatt, like the AC mentioned - here are some links for the rest of us...

GPL'd Klatt Synth Source

RSynth Speech Synthesizer - Klatt based synth - go to /soundapps to download gzipped code

KPE80 - A Klatt Synthesiser and Parameter Editor

Worldcom - Generation Duh!

--
Reason is the Path to God - Anon

One more step... by cr0sh · 2001-07-31 01:40 · Score: 4

Prior to this, the best sounding speech synthesis I had heard was from the Festival system, which is still pretty good - especially considering it has an open source license, something the AT&T system doesn't.

Another good speech synthesizer, no doubt an early version of the AT&T one (possibly?), is by Lucent.

Still, I am amazed at the quality of the AT&T system - it sounds almost perfectly natural. To the naysayers that say "No, it isn't natural" - what all of you have to realize is that this simple demo doesn't allow you to tweak all the variables that would really allow the inflections or type of voice (like whispering, etc) to really come through - it is too bad they don't give an advanced interface with a FAQ or some other form of documentation to allow this, but I imagine that if they did, it would probably take quite a while to compose even a simple sentence (I remember the hell you had to go through with an old Radio Shack speech synth for the Color Computer, specifying individual phonemes just to get proper speech to come out - it could pronounce many words, but on others it just fell flat on its face).

Finally - something I want everyone to ponder. Take a look at this old article (it was about Square redubbing FFTM) - once it loads, search for "cr0sh" and "I dare say" - you will come across a series of comments about what I think may happen in the future - what is funny is that the comments in reply to my take on things sound like your typical naysayers. How many computers were we supposed to only need back in the 60's? How much memory would people "only" need again Mr. Gates?

What I predict will come about - probably sooner than we can all imagine. It may not be cheap enough to do it now, at a quality that people would watch, fast enough to be done quicker than what can be done with live actors - but it is all software and hardware - this stuff will get faster and cheaper. Anybody who has been in this business long enough knows that it will happen. There might still be a need for actors, and voice artists, and such - but they probably won't have the "god" status society seems to confer on them now (with the exception, perhaps, of stage acting - which will probably enjoy a huge comeback).

Worldcom - Generation Duh!

--
Reason is the Path to God - Anon

Code words and access lists by wiredog · 2001-07-31 00:52 · Score: 3

I used to be in the army.

A general can't just call up the guard post and order the person on duty to let unknown people in. I once was on duty in a radio room and we had a Very Important Senior Officer come by to see what we were doing. He wasn't on the access list, so we wouldn't let him in, even though we recognized him. He had to go get the Colonel, who was on the list, to get in. We got attaboys from him, the Colonel, and our NCOs for that. If we'd let him in, we'd have been in deep doo doo.

--

Best Slashdot Co

don't let ole bill get ahold of this technology.. by spam368 · 2001-07-31 00:41 · Score: 1

heh, we don't need to hear his voice whenever something goes wrong in windows...

Re:Cool... and disturbing. by ncc74656 · 2001-07-31 03:18 · Score: 3

Its main use is for telephony (surprise!) but I suppose it'll be turning up in new and exciting places.

On the radio this morning, CBS ran a short blurb about this system, including hypothetical news and sports reports. It sounded pretty good, too...if you've done anything with TTS before, the speech quality of this system was considerably ahead of what's been done before. (Light years ahead of Speak & Spell, but that's almost a given at this point. Compared to more modern systems such as Festival, it still comes out ahead quite a bit.)

The announcer posited that, one day, his job could be in danger from this kind of technology. With some broadcasters' penchants for cutting costs any way possible (somebody either here or on K5 posted a link about Clear Channel and its shenanigans a while back, but I can't find it), DJs could end up going the way of the dodo as well.

--
20 January 2017: the End of an Error.

Re:Entropy-licious by Mr.+Slippery · 2001-07-31 01:37 · Score: 1

Who wants dumpy Sandra Bullock...

Well, if nobody else does, I guess I could find room for her around here... B-)

Tom Swiss | the infamous tms | http://www.infamous.net/

--
Tom Swiss | the infamous tms | my blog
You cannot wash away blood with blood

Voices, but what about emotion? by hattig · 2001-07-31 00:47 · Score: 2

The software might very well be able to copy a voice, but how does it copy emotion? Can it whisper? Can it shout? Can it sound happy or sad? Can it sing?

Do we have a more enhanced vocal technology, or a real voice? Considering where the Amiga was in 1985 with synthesised voices, I would have hoped that a lot could have happened in the 16 years since...

Re:Voices, but what about emotion? by First+Person · 2001-07-31 00:57 · Score: 2

Last week at the O'Reilly Open Source conference, I heard two examples of TTS singing from Carnegie Mellon University and the University of Colorado. Unfortunately, I don't have any links I can refer you to. Let's just say, they were very rough, but quite humorous.

--
Given one hour to live, the student replied: "I'd spend it with professor FP who can make an hour seem like a lifetime."

Re:Voices, but what about emotion? by InigoMontoya(tm) · 2001-07-31 01:26 · Score: 1

Daisy, daisy, give me your answer do.....

--
This signature is self-referential.

about darn time by Drath · 2001-07-31 05:34 · Score: 1

Now I can finally get Mr. T to pity me by name!

Design decisions and demos by First+Person · 2001-07-31 00:53 · Score: 2

For what it's worth, SpeechWorks International licensed an earlier version of the AT&T synthesizer. You can find demos here. The version in the NYT seems to have been developed with different constraints. Many TTS engines are designed to achieve real-time playback or to use limited amounts of CPU. For instance, synthesized speech during game play should only use 5% or maybe 10% of the processor, whereas a system for Hollywood may demand considerable CPU power to produce small utterances (say 100 CPU seconds per second of speech). This is completely acceptable for many purposes where perceived quality is the primary criterion.
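
To put rough numbers on that trade-off, here is a minimal Python sketch. The function names are made up for illustration, and the 5%, 10%, and 100x figures are just the ballpark values from the paragraph above, not measurements of any real engine.

```python
def cpu_seconds(speech_seconds, cpu_per_speech_second):
    """Total CPU time needed to synthesize a clip of the given length."""
    return speech_seconds * cpu_per_speech_second


def fits_realtime_budget(cpu_per_speech_second, budget_fraction):
    """True if synthesis keeps up with playback while using at most
    budget_fraction of one processor (e.g. 0.10 for a game)."""
    return cpu_per_speech_second <= budget_fraction


# A game engine that needs 0.05 CPU seconds per second of speech
# fits inside a 10% budget.
game_ok = fits_realtime_budget(0.05, 0.10)

# An offline "Hollywood" engine at 100 CPU seconds per second of speech
# does not, and one minute of dialogue costs 6000 CPU seconds.
film_ok = fits_realtime_budget(100, 0.10)
film_cost = cpu_seconds(60, 100)
```

The point is only that the offline case trades a couple of hours of CPU per minute of audio for quality, which is fine when nobody is waiting on the result.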

There is also an open source TTS engine called Festival, developed at the University of Edinburgh and at Carnegie Mellon University. You can find out more here. Or, just download the source.

--
Given one hour to live, the student replied: "I'd spend it with professor FP who can make an hour seem like a lifetime."

Re:Job cuts in Hollywood... by tapiwa · 2001-07-31 01:36 · Score: 1

Can you spell Tomb Raider, and what is this other new animated movie?? Phantom something or another I think.. (never go to movies.... can't sit still that long!!)

I saw the reviews on TV and in the press, and I have already seen the lead character in print adverts for underwear!!

The only thing human in that movie was the voices.. with this development, even they are not necessary.

A whole new ballgame.

--

Live today. Tomorrow will cost a lot more!

Re:Entropy-licious by tapiwa · 2001-07-31 01:41 · Score: 1

This is one of those dumb technology predictions.

Man on the moon, cloned sheep, and 100GB hard drives later, I am still amazed by these "you will never be able to ......" predictions.

Have faith!!

--

Live today. Tomorrow will cost a lot more!

So much for wiretaps. by jcr · 2001-07-31 15:25 · Score: 2

FBI Agent Jack B. Thug: "I made this recording of the defendant conspiring to distribute narcotics."

Defense Attorney: "Agent Thug, isn't it true that the agency has the ability to synthesize the voice of the defendant saying anything at all?"

Agent Thug: "No!"

Defense Attorney:

Jury finds reasonable doubt.

-jcr

--
The only title of honor that a tyrant can grant is "Enemy of the State."

Re:So much for wiretaps. by jcr · 2001-08-01 11:01 · Score: 2

No, Photoshop manipulation is pretty easy to spot (like when the Scientologists were cutting-and-pasting to lie about the turnout at one of their rallies.)

-jcr

--
The only title of honor that a tyrant can grant is "Enemy of the State."

Been there, done that... by Myself · 2001-07-31 01:32 · Score: 2

Ensign Crusher did that aboard the NCC-1701-D years ago. He had a synthesizer that'd reproduce Picard's voice, and he'd send himself all kinds of orders.

News, earl gray, lukewarm.

Re:Cool... and disturbing. by meatspray · 2001-07-31 01:15 · Score: 1

I seem to remember when working on post the army preferred the good ol' bank vault guarded by guys with big guns scenario. They leave little to chance on technology. They also wiped hard drives in an incinerator.

Nothing New by Chasuk · 2001-07-31 14:28 · Score: 1

I listened to demos of British Telecom's Laureate Text-to-Speech System many years ago (1995? slightly earlier?) at the BT Laboratories in Ipswich, England. It was brilliant, mimicking Bill Clinton perfectly.

It seemed like a real security threat. I always surmised that the reason it never made a more public appearance was because of this risk. Imagine world leaders in disfiguring "accidents," swathed in bandages, but assuring us in completely normal voices that everything was all right, with the real politician assassinated and a lip-syncing actor behind the bandages...

--
Neopets - the best free game on the Int

Slashdotted early? by aallan · 2001-07-31 00:52 · Score: 1

The site limits not only words, but the number of accesses. From the site FAQ:

One unfortunate result is that people sometimes hit the limits too early. We can only distinguish hubs, and not individual machines. If 40 accesses have already come from the same server at your ISP, you will unfortunately be blocked. If this happens, we apologize. Please try again (earlier) tomorrow.

This one will be slashdotted fairly quickly, at least for those people using large carriers.

Al.
--

--
The Daily ACK - Eclectic posts by yet another hacker

Play it again... Sam... by dethlejd · 2001-07-31 02:08 · Score: 1

Can we sic the DMCA on them for reverse engineering the human voice for preemptive copyright violations of famous dead people's intellectual properties?

Actors out the door? by GooberToo · 2001-07-31 10:06 · Score: 1

Is it just me, or does anyone else realize that with CG being so impressive and TTS of this quality available, physical actors could become a thing of the past in perhaps ten or fifteen years? Previously, I've always said that even if we didn't need actors to act, we'd still need voice actors. I guess that isn't true anymore either. Just TTS the whole script. In fact, you may be able to use this technology to better lip sync the TTS to the animation since timing could be completely computer controlled.

How is this a troll? by StarKruzr · 2001-08-01 02:20 · Score: 1

By the way, I'm joining the Air Force as an officer (graduated 4-year college already). Should I expect the same shit in OTS as in boot camp?

--

+++ATH0

Re:Doubtful. by Puk · 2001-07-31 04:52 · Score: 2

That's a good point, and in some sense, you're right -- I'm sure, even given textual context, that this program can't always figure out the right intonation for "yeah, right". It takes more -- far more -- than what we have now to figure out how to say "yeah, right" correctly in a given context. Out of context (as in a single sentence), it should be able to do pretty well.

However, I still claim that it takes far less than full AI to determine that, from purely textual context. That is, it takes far less than full AI to get pretty good at looking at the same set of text given to a human, and determine the correct intonation for the text. If a human can't figure it out (as I couldn't from your isolated textual "yeah, right"), I don't expect a machine to, ever.

No, I'm not claiming we're there now. I'm only claiming that we won't need a "thinking machine" to get there (to the ability of the average human) -- just one with significantly enhanced ability to analyze language and context. But exactly when we will have achieved "full AI", and how our work on AI will progress in general, is hardly determined, so I guess we'll just have to wait and see. :)

-Puk

Re:Doubtful. by Puk · 2001-07-31 01:33 · Score: 3

That's patently false. Speech synthesis systems are getting better and better at generating (or, technically, their creators are getting better at creating systems which generate) speech with intonation very similar to what a human would use, based on sentence structure analysis and concatenation of recorded subword units with various intonations (there aren't as many as you might think).

Of course, it would need a corpus of recorded and (possibly automatically) tagged speech from the person they wish to imitate, but that's not that impossible. Ever notice how the generated speech on some speech recognizing phone system (such as American Airlines) is getting better and better, with more and more human-like pronunciation and intonation? And these are the production systems -- not the research systems. I'm not saying they're perfect (and, of course, they're dealing with multiple intonations of fully recorded words, not subwords), but the problem is a far cry from "true AI", and the work on it is getting better all the time.

Check out http://www.sls.lcs.mit.edu/sls/publications/1998/mengthesis-jonyi.pdf for some more detailed info on such research. (Other papers and theses at http://www.sls.lcs.mit.edu/sls/publications/index.html may be relevant as well.)

-Puk

p.s. If this gets modded up, I could cap my karma on this. :P

Re:I'll believe that ... by Eil · 2001-07-31 01:24 · Score: 2

I tried to make it say, "Go and boil your bottoms, sons of a silly person.". Pronounced everything right, even sounded halfway realistic, but it sounded much more like a radio newscaster announcing the current stock quotes or something. :P

Re:Even Worse by Ioldanach · 2001-08-01 03:24 · Score: 1

"Your honor, to counter that recording, I'd like to submit a recording of the arresting officer speaking later... (tape recorder playing)Okay, now run it through the computer and make up a statement. Make it something good.(tape recorder stops)... Would you agree that statement was the arresting officer? Good, I move both recordings be stricken, because they were both faked."

The downside... by dmoen · 2001-07-31 01:58 · Score: 1

The downside is that the Chinese and Japanese action/monster flick producers that use this technology for film dubbing will also use Babelfish to translate the scripts.

--
I have written a truly remarkable program which this sig is too small to contain.

Re:The downside... by jayhawk88 · 2001-07-31 02:45 · Score: 1

No, now they use a monkey that pulls random words out of a hat.

Dana Carvey as GW by Havokmon · 2001-07-31 00:49 · Score: 1

Reminds me of a Dana Carvey skit where he visited the White House, when GW 1st was prez.

GW: "Hey, can you talk like me, and call the Security Guard in here."
DC: "No, I don't really want to.."
GW: "Come on..."
DC: "OK OK.."
DC: Picking up phone, in GW voice "ahhh could you bring some popcorn in here.."
Security Guard walks in.
GW: wheezing laugh, "he he, forget it forget it. It's ok. he he"
Guard leaves
GW: "Do it again..."

--
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)

Hack: Re:There's an evil use for this too: by Havokmon · 2001-07-31 00:59 · Score: 1

"(b) (1) It shall be unlawful for any person within the United States (B) to initiate any telephone call to any residential telephone line using an artificial or prerecorded voice to deliver a message..."

Woo hoo! Initiate the call from Overseas using DialPad!

OR

Woo hoo! Use MY voice to say Hello, then change to Sean Connery: "By any chance, would you know who your long distance provider is?"

--
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)

Re:Cool... and disturbing. by JoeShmoe · 2001-07-31 00:55 · Score: 2

Not to mention, I seem to remember reading that the Army looked into modifying people's vocal cords to get around voice-based security systems which is why the armed services don't have any kind of Star Trek "authorization Picard alpha zero" voice-authentication for their secure areas. Fingerprint or retinal scan or galvanic skin response or something.

But anyway, beside the point, commands are no good no matter whose voice they are in because they have to give the appropriate code words or the order is immediately ignored and the channel is closed.

- JoeShmoe

--
-- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing

say it with me... by jcs · 2001-07-31 02:03 · Score: 1

``My voice is my passport.''

Re:Doubtful. by tchapin · 2001-07-31 02:28 · Score: 1

Actually, the speech on the American Airlines system is not TTS. It's prerecorded sentences and phrases. Most of the commercially deployed telephony speech rec systems use prerecorded speech. For parts like the readout of the flight information, they use concatenated prompting, so that you don't have to record:

The flight is departing Boston, MA
The flight is departing New York, NY
etc...

and can instead record:
The flight is departing

and

Boston, MA
New York, NY
etc...

and then just glue them together when you play out the information.
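
A rough sketch of that gluing step in Python, for anyone curious. The clip file names and the `build_prompt` and `recordings_needed` helpers are invented for illustration and are not any vendor's actual API; a real platform would hand the resulting clip list to its audio player.

```python
# Hypothetical prompt library: each phrase maps to one prerecorded clip.
CARRIER_PROMPTS = {"departing": "the_flight_is_departing.wav"}
CITY_PROMPTS = {
    "Boston, MA": "boston_ma.wav",
    "New York, NY": "new_york_ny.wav",
}


def build_prompt(city):
    """Return the ordered list of clips to play for one announcement."""
    if city not in CITY_PROMPTS:
        raise KeyError(f"no recording for {city!r}")
    return [CARRIER_PROMPTS["departing"], CITY_PROMPTS[city]]


def recordings_needed(num_carrier_phrases, num_cities):
    """Studio recordings needed: whole sentences vs. concatenated pieces."""
    whole_sentences = num_carrier_phrases * num_cities
    concatenated = num_carrier_phrases + num_cities
    return whole_sentences, concatenated
```

With, say, 5 carrier phrases and 200 cities, `recordings_needed(5, 200)` gives 1000 full sentences versus 205 separate clips, which is the whole reason for doing it this way.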

United Airlines (800 824 6200), Continental (800 784 4444) and Airtran (800 247 8726) also have speech rec flight information systems. You can also call TellMe (888 55 8355) and HeyAnita (800 442 6482), which are Yahoo!-like voice portals.

There are a whole bunch of other systems out there; it's quite an interesting field.

Todd

(ObDisclaimer: I work in the speech rec telephony industry.)

--
-- !todd erases a red dot! I steal music on the internet.

Re:Other Online Demos by tchapin · 2001-07-31 02:39 · Score: 1

Also...

http://www.speechworks.com/products/tts/interactivedemo.cfm

http://www.nuance.com/products/vocalizer.html

http://www.lhsl.com/realspeak/demo.cfm

For a while, L&H's RealSpeak was considered the best in telephony, but the AT&T system matches or beats it. When determining the best TTS to use, it's really hard to tell the capabilities via these demos. When used in a telephony application, there is the ability to semantically mark up the text to make it sound a lot better.

Todd

--
-- !todd erases a red dot! I steal music on the internet.

Re:This could be useful in games. by tchapin · 2001-07-31 02:46 · Score: 1

I'm not sure that the AT&T TTS is PC-based. I think that it's used on a "TTS server" for things like telephony platforms.

Todd

--
-- !todd erases a red dot! I steal music on the internet.

Re:long time coming by tchapin · 2001-07-31 02:59 · Score: 2

Actually, in the past two years or so, TTS has again become more important. The cost of a voice talent in relation to the cost of a developed system is really cheap. But, it would be really hard and get more expensive to use prerecorded words and phrases to read things back like email, for example.

I believe that Yahoo! and AOL have phone systems (touch-tone and speech, respectively) that read back email to people.

For the most part, using a real voice talent is the best bet; there are some fantastic people working out there.

Todd

--
-- !todd erases a red dot! I steal music on the internet.

So much for voice print security systems. by TomatoMan · 2001-07-31 01:14 · Score: 3

Disable those voice passwords on your machines, kids. Your pr0n is now exposed.

TomatoMan

--
-- http://frobnosticate.com

Re:Cool... and disturbing. by Pedrito · 2001-07-31 00:54 · Score: 2

Are we watching a little too much T.V.? Do you think:

A: Voice activation is what gets you into a military installation

B: If voice activation were useful to get you into an installation that a recording of someone's voice, in the traditional manner, wouldn't be sufficient?

C: If voice activation were useful to get you into an installation that recordings or impersonations would get past algorithms that search for exactly this thing?

Remember one thing: Voice is pretty much useless for security. Fingerprints are much more useful. Why? Ever get a bad cold? What happens to your voice? I went to a wedding recently where I drank and smoked too much. I came back and pretty much lost my voice. My friends didn't recognize me over the phone. Do you think a computer can do better than a human being at voice recognition? If so, you're living in the Star Trek universe. Doesn't happen.

Re:Cool... and disturbing. by Pedrito · 2001-07-31 03:08 · Score: 2

Pedrito, Pedrito, te pica el culito?

Claro que si ;-)

Re:Cool... and disturbing. by Pedrito · 2001-07-31 03:11 · Score: 2

Boy, that's sad and disturbing. I don't think I'll sleep better at night. While easily intimidated, I never hesitate to talk to the "higher-ups." I'd like to think there are more like me, but I haven't been through boot camp, so I can't say. Maybe that would have changed my behavior.

Re:Entropy-licious by Tiroth · 2001-07-31 02:36 · Score: 2

This doesn't really have a lot of bearing on that; you still have court-appointed witnesses to such testimony who can vouch for its authenticity, just like anyone can alter a printed will but having it witnessed and notarized creates an official copy.

Now video evidence, that is something else entirely.

Re:Movie dubbing today... by Tiroth · 2001-07-31 02:56 · Score: 3

I think that is a very interesting idea, but there are a lot of subtleties to consider. Languages don't share a common sound set...if you were dubbing English into German, there just isn't a sound for the glottal stop. How would you infer how the "actor model" should sound? I'm guessing this is a very nontrivial problem.

One solution would be to get demo reels of the actors saying various sounds in the target language. The downside is that they will come across speaking the foreign language with a terrible accent...a Japanese actor might be fairly unintelligible speaking English since they are missing so many sounds (la=da=ra, no th-, etc)

It's definitely a neat idea though.

Re:Cool... and disturbing. by TheCarp · 2001-07-31 01:00 · Score: 1

Well as was said, the military has solved this one already. Frankly, the ability to impersonate someone on the phone has never been out of the grasp of those who have needed it.

Now, fabricating evidence that will stand up to professional scrutiny (say, during a trial, or public announcement) is a little harder. This is no better than current technology for fooling someone "here and now"... the question is, as it advances, can it be good enough that a real good look can't tell the difference?

Of course, then we can get into that "linguistic fingerprinting" that was on /. a while back. Remember the guy who helped show who the Unabomber was?

I wonder how much text is needed, on average, to prove (or disprove) who the original author of the text was?

-Steve

--
"I opened my eyes, and everything went dark again"

The AT&T "Rich" Voice by jaydho · 2001-07-31 09:24 · Score: 3

If you haven't already, listen to the AT&T Customized Voice Product Demo (U.S. English, Male: "Rich"), truly amazing.

With online news feeds coming in to the local radio station and the quality of the "Rich" custom voice, I have a feeling a lot of announcers may be going bye bye. In these samples he's way better than our local guy. Plus, since Shoutcast and such already have all the song info, think of the cool DJ announcing you could have.

My roommate and I used the older online AT&T TTS to do our answering machine message for the dorm... It did pretty well with "This is mack daddy JD and phat daddy John's room"; that's the only message we've ever had that people would call back just to hear. With the old AT&T system you could adjust the pitch and various other settings to get it to sound good. I can't imagine what their new system will do!

If you don't think too good, don't think too much.

KingoftheBongo.com

Re:My name is my passport... by Qubit · 2001-07-31 03:48 · Score: 1

exactly what I was thinking...

of course, this time we can just get random words from the guy -- we don't need to make him say "passport" (it's not really *that* sexy... :)

--

coding is life /* the rest is */

Re:Entropy-licious by JohnBowman · 2001-07-31 01:19 · Score: 1

Well, I think Ming Na is perfect fan-boy material all on her own. It was kind of a tease hearing her voice without seeing her. *sigh*

--

JohnnyB - johnbowman.net

Perfect for replacing newsreaders by pixiepuck · 2001-07-31 13:28 · Score: 1

It's no surprise that the demo simulates a newsreader. Newsreaders on radio and TV have their own special mode of inflection. I'm sure you know what I mean, it's hard to describe, but it's very consistent between journalists.

So my point is, it's not perfect yet, but it will work perfectly for some applications.

Anyone remember the automated DJ from the Simpsons? It was called DJ-3000 or something.
"How about those bums in Congress"

--
-- Your ad here $20 --

Not THAT accurate.. :) by matek · 2001-07-31 00:41 · Score: 1

Sorry, but after trying the demo at : http://www.naturalvoices.att.com/demos/index.html I'm not that impressed - the software is a little better than the default Windoze Narrator (appeared first in Win2k), but it's still FAR from being efficient enough to cheat a human ear.

Single words are pronounced cool enough but the "melody" of the sentence is still totally wrong...

Re:Not THAT accurate.. :) by The+Devil's+Avocado · 2001-07-31 00:58 · Score: 2

Yup, this is a problem linguists have been working on for years. The patterns of language are so complex that, as far as artificial speech production goes, we've still got a very long way to go. There have been numerous papers looking at how a phoneme changes given its context, and looking at how morphemes and words change is that much more complex. Did you know that when people speak, no word is reproduced exactly as it was a previous time? This is one of the many problems facing such technology - it sounds strange if a word is pronounced exactly the same every time, since that's not how it is in human speech. Another problem is if the phoneme follows a sound which it didn't originally follow, or is placed in a sentence with different stress patterns than the one in which it was originally spoken. This is a very tricky thing which is being attempted.

--
"He may look like an idiot and talk like an idiot but don't let that fool you. He really is an idiot." -Groucho Marx

Re:Try it out! by kreyg · 2001-07-31 03:30 · Score: 2

Wow! It actually does a pretty good rendition of:

I teleported home one night,
With Ron and Sid and Meg.
Ron stole Meggie's heart away,
And I got Sidney's leg.

That is exceedingly cool.

--
sig fault

So? by 11thangel · 2001-07-31 00:40 · Score: 3

So you have a computer program that takes binary (or ascii converted to binary) and makes it into a sound. Get me something that turns a sound into text with more than 90% accuracy and under 5 minutes of training routines, and I'll buy it.

--

I am !amused.

Re:So? by perky · 2001-07-31 06:22 · Score: 1

OK, so actually it's lame compared to Dragon, ViaVoice and NaturallySpeaking etc. But that's because they haven't been working on SR very long, not because it's MS.

--
"The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994

Re:So? by evanbd · 2001-07-31 00:53 · Score: 2

You really want better than that. I used a product with 90% accuracy a while ago. It also had long training periods (>3hr). But I went through them, and it worked. About 90%. Which wasn't good enough. Your post has 51 spoken words (don't forget, "90%" is two words, "(" is two words), so it would have gotten 5 wrong in just that post. I think you really mean you want about 99.5% accuracy -- one word in 200, or about 2 longish paragraphs. And I still don't think that's available, though I haven't used anything recently.
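
For anyone who wants to check the arithmetic, a minimal Python sketch follows. The function names are made up here, and it assumes every word is misrecognized independently with the same probability, which real recognizers don't strictly obey.

```python
def expected_errors(word_count, accuracy):
    """Expected number of misrecognized words at a given per-word accuracy."""
    return word_count * (1.0 - accuracy)


def words_per_error(accuracy):
    """Average number of words dictated between errors."""
    return 1.0 / (1.0 - accuracy)


# The 51-word post at 90% accuracy: roughly 5 words wrong.
post_errors = expected_errors(51, 0.90)

# 99.5% accuracy works out to roughly one error every 200 words.
gap = words_per_error(0.995)
```

Going from 90% to 99.5% sounds like a small step, but it cuts the error rate by a factor of 20.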

Re:So? by Sycraft-fu · 2001-07-31 04:51 · Score: 2

While this is likely to get me burned since this is Slashdot, I'll say it anyways: Try Office XP. Its speech-to-text engine is excellent and the training doesn't take very long at all. I'd say it's better than 90%. It also learns from its mistakes and gets better as time goes on.

Re:So? by famazza · 2001-07-31 00:44 · Score: 1

Maybe then we can build perpetual-motion software!!!

Gee, don't worry, I'm too [un]?funny[to|every]day

--

-=-=-=-=
I know life isn't fair, but why can't it ever be un-fair in MY favor!?

Sneakers by Niles_Stonne · 2001-07-31 00:51 · Score: 1

Hi, my name is Werner Brandes, My voice is my passport, verify me.

--
Sticks and Stones may break my bones, but copyright will always protect me.

Re:Doubtful. by Saidin · 2001-07-31 02:35 · Score: 1

Look at MarkusQ's examples again. "Yeah, right!" is the best one since it's so simple. There is more than one correct way to intone it. How do you know whether to it's meant ironically or as an exclamation of revelation? I think you need Intelligence to correctly make that decision.

The simple answer to that is, without more context, you can't. If all I have is the one phrase "Yeah, right!", I can't figure out how it is supposed to be spoken either.

SpeechWorks by mshomphe · 2001-07-31 00:56 · Score: 1

SpeechWorks is a commercial company bringing AT&T Labs's TTS stuff to the marketplace. They're currently offering Speechify, which is a slightly older version of the AT&T Labs system. A demo is here. Can anyone tell the difference?

--
She sat at the window watching the evening invade the avenue.

Re:Years behind DoD, though... by whovian · 2001-07-31 07:31 · Score: 1

IIRC there used to be a foreign language TTS on the AT&T web site. But all they have there now are pre-recorded examples.

Me: Computer -- please access google to search for websites with the following keywords: bell, labs, text, to, speech, synthesis. User authorization Picard alpha tango charlie zero.
Computer: Authorization code valid. Recognizing Picard. Accessing google. Search complete.
Me: All right!

http://www.bell-labs.com/project/tts/

--
To-do List: Receive telemarketing call during a tornado warning. Check.

I'm not impressed... by quakeslut · 2001-07-31 00:49 · Score: 1

I remember using Smoothtalker for the Mac back in the day and when I tried out the AT&T natural voices today I was let down. About a dozen years after I first played with Smoothtalker this is the best we can do?

Re:Entropy-licious by Trepalium · 2001-08-03 02:13 · Score: 2

Searched the web for "innocent until prooven guilty". Results 1 - 7 of about 11. Search took 0.26 seconds.
Searched the web for "guilty until prooven innocent". Results 1 - 7 of about 9. Search took 0.33 seconds.

Spelling errors are FUN!

--
I used up all my sick days, so I'm calling in dead.

This explains it.. by katarn · 2001-07-31 01:17 · Score: 1

I bet they've had this technology for years now... It would explain a lot of things. Isn't it obvious? Dan Quayle is actually a failed attempt at an AI.

Re:Try it out! by Mr.+Sketch · 2001-07-31 00:55 · Score: 1

No fair! I posted the same thing at the same time (With even the same subject, scary), but I'm #28 and you're #26 so I'll probably get moderated as redundant. Oh well, I suppose I can't stay at the karma cap forever.

--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx.

--
Things you think are in the Constitution, but are not.

Try it out! by Mr.+Sketch · 2001-07-31 00:43 · Score: 5

On AT&T Speech Labs website, they have a little demo where you can enter your own text and have it play for you using their software (30 word limit). Way Cool!!

They also have recorded demos you can listen to, but I thought the interactive demo was pretty nifty.

--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx.

--
Things you think are in the Constitution, but are not.

Re:Try it out! by KidSock · 2001-07-31 05:19 · Score: 2

On AT&T Speech Labs website, they have a little demo ...
I heard that they keep a log of the stuff people enter into that demo and that it's almost always the worst, most grotesque, violent, sickening verbiage people can think of. I bet it won't be as bad as what they're going to see today from /.ers via your link though.

Re:Try it out! by DaneelGiskard · 2001-07-31 01:05 · Score: 1

Hehe :) Sorry pal, bad luck :)

Re:Try it out! by nicodaemos · 2001-07-31 01:10 · Score: 2

Here's another place to try it out.

What next? by garoush · 2001-07-31 01:10 · Score: 1

First we let them use calculators during tests, now we let them use computers; so what is next, let them listen to Shakespeare in person instead of reading?
---------------
Sig
abbr.

--

Karma stuck at 50? Add 2-5 inches.. err.. 2-5x Karmas Count to your pen1es.. err.. Karma all naturally and private

I'm going to hell for this, I know... by Raymond+Luxury+Yacht · 2001-07-31 00:51 · Score: 2

... but how about Natalie Wood's voice saying "I'll have a few drinks at the party... but I won't go overboard"

/me ducks as karma goes whizzing away.../

--

Ceci n'est pas une sig.

Re:This could be useful in games. by DrEldarion · 2001-07-31 02:45 · Score: 1

Voice acting has really limited the size of games; any game with a long playtime has had to stay away from it. It's just too much trouble to bring in voice actors for even the simplest parts, a Final Fantasy sized game would just be too expensive and time consuming to produce.

FFX has full voice acting. Of course, it's on a DVD-ROM, so they had plenty of space, but can you imagine all that speech?

-- Dr. Eldarion --

Re:This could be useful in games. by DrEldarion · 2001-07-31 02:49 · Score: 2

Here it is:

Nearly every important story scene, as well as many other scenes, will feature voice acting. Characters' facial expressions will also change as they speak, an FF first. Gamers with hearing disabilities can rest easy, though; not only will subtitles be included, the voices can be turned off entirely.

From: http://www.thegia.com/psx2/ff10/ff10.html

So maybe it's not FULL voice acting, but I get the impression that there's a LOT of it.

-- Dr. Eldarion --

Re:Job cuts in Hollywood... by epukinsk · 2001-07-31 08:03 · Score: 2

Imagine software following actors around through their career, watching their movies and public appearances and learning their style and their history and developing a database to draw on to simulate them.

The actor hits their third blockbuster at 28 and the computer says "I think I can take it from here."

-Erik

I need this by MrResistor · 2001-07-31 02:09 · Score: 1

It's all about remixing the Dead Kennedys' "Kinky Sex Makes the World Go Round", perhaps with some more current world leaders...

--
Under capitalism man exploits man. Under communism it's the other way around.

Re:Grrrreat by mr_gerbik · 2001-07-31 01:08 · Score: 3

"i guess this can only mean more fraudulent accounts of his-story."

His-story.. I hate that term. Who are you? Michael Jackson?

waaaaazzzzzzzuuuuuppppppppppp!!!!! by tommut · 2001-08-02 01:54 · Score: 1

heh... The ATT demo handled this annoying catchphrase with a bit of the Max Headroom syndrome: "wa-wa-wa-waz-you-you-you-you-pp". This should entertain me for at least another 15 minutes! ;)

What I said. by Capt.+DrunkenBum · 2001-07-31 01:10 · Score: 1

I get in enough trouble for what I have said without worrying that someone is using some text to speech software to make me say nice things about M$.

--

Not everyone deserves a 320i

I think we're ready for this - or are we? by RobinH · 2001-07-31 00:42 · Score: 1

The obvious danger with this technology is the ability to make someone appear to have said something they didn't, like in a perjury case. However, if I'm not mistaken, audio recordings are looked upon with skepticism in court, are they not?

Here's another cool use though... say you can't sign Sean Connery for your movie because he wants too much money. Well, just use some _Final_Fantasy_ technology to make a model of him, and use this voice technology to do his lines. Presto! You don't need actors at all (not even to do their own voices).

--
"I have never let my schooling interfere with my education." - Mark Twain

Next You're Going To Tell Me by joel_archer · 2001-07-31 02:52 · Score: 1

"The software, which turns printed text into synthesized speech, makes it possible for a company to use recordings of a person's voice to utter things that the person never actually said."

Next you're going to tell me there is software that can put an image of someone, where they have never been!

Stephen Hawking would be exempt by yerricde · 2001-08-05 11:25 · Score: 1

steven halking now cant make phone calls to order a pizza now can he, or 911.

Except the FCC will probably look the other way in the case of Prof. Hawking or of other individuals using assistive speech devices. 47 USC 227(b)(2) gives the agency every right to do so:

The Commission ... (B) may, by rule or order, exempt from the requirements of paragraph (1)(B) of this subsection, subject to such conditions as the Commission may prescribe - (i) calls that are not made for a commercial purpose; and (ii) such classes or categories of calls made for commercial purposes as the Commission determines (I) will not adversely affect the privacy rights that this section is intended to protect; and (II) do not include the transmission of any unsolicited advertisement

This pretty much covers most common types of calls that an individual using assistive speech devices would place.

--
Will I retire or break 10K?

Crime! by pallex · 2001-07-31 01:04 · Score: 1

So, now you can fake a photo, create some audio evidence, maybe a touch of DNA from their rubbish and voila!! Guilty as charged!

Years behind DoD, though... by aiken_d · 2001-07-31 01:07 · Score: 2

...supposedly the DoD has had this capability for years, including in foreign languages. The idea being that the US can intercept enemy radio communications and replace them with confusing or erroneous instructions, *in real-time, in the original radio operator's voice*.

It's not like it's some big national secret; I first found this in a story on CNN, though I can't find it in a search right now.

Cheers
-b

--
If I wanted a sig I would have filled in that stupid box.

Actors are safe by Tarkwyn · 2001-07-31 06:36 · Score: 1

I don't think actors are going to lose their jobs on this one. The time it takes to get a synthesised voice intoning correctly (using phoneme manipulation) is prohibitive, when I can say to an actor "Let's read that again, but this time be more scared".

Actors, for all their foibles, spend many years in formal or industry training, learning how to mimic a person / emotional state. A good actor is (if you'll forgive the horrible cliche) like clay in a director's hands. An average TTS could never be as flexible, as quickly.

As an aside, I've played with the AT&T TTS engine (which has been sitting as a useable demo since at least April) and with the festival TTS system and festival seems to be more on track to genuine speech reproduction, although I still have yet to hear a convincing TTS rendition of any form of performing art. We shall see.

--

--
Tarkwyn.

Phil Hartman and the Simpsons by MBCook · 2001-07-31 02:13 · Score: 1

This is great! Pending the approval of his family and all, Fox could put all those great Simpsons characters back on that were voiced by Phil Hartman including (but not limited to) Actor Troy McClure. You might remember him from episodes such as "A Fish Called Selma" or the one where he hosts the Simpsons Spinoffs like "The Love-Matic Grampa" and "Simpson Family Smile-Time Variety Hour".

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Re:Cool... and disturbing. by aozilla · 2001-07-31 07:01 · Score: 2

Generals do not call up guards.

And bosses don't send attachments saying "I love you", but that never stopped people from believing it anyway.

--
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?

Great fun! by Mustang+Matt · 2001-07-31 02:52 · Score: 1

Paste the below text into the interactive demo.
The fun never ends!

Your momma so fat, when she wears a red dress the neighborhood kids down the street see her and yell, Cool Aid!!

--
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin

long time coming by kisrael · 2001-07-31 02:22 · Score: 2

It's been a long time coming and it's still not that great. It still has that little bendy creaky quality at the end of syllables.

The main problem is that sampling became so cheap that some of the incentive for pushing it beyond a 1983 Commodore 64 running the all-software S.A.M. was lost. Maybe now that paying for voice talent is the limiting factor, this will improve.

Jar Jar the first all-computer major character in a full length flick my ass, his annoying voice was voiced just like the flintstones.
--

--
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death

Calling In Sick by Kondoor · 2001-07-31 01:32 · Score: 1

I will buy this the day it comes out if it can reproduce me sounding sick instead of hung over. I can see it now: typing in my "I'm too sick to come in today" message, then setting up a cron job to dial my boss's extension at work and leave my message for me while I sleep like a baby. I gotta get this.

AT&T + Matrox = low-bandwidth video conferencing by Ogerman · 2001-07-31 04:21 · Score: 1

So now all we have to do is create an easy way to put someone's voice characteristics into perhaps an XML format and convert their face into a 3D model. For those of you who missed out, Matrox's new line of video cards has a special 3D function to aid in video conferencing by sending only movement data over the wire and then rendering a somewhat lifelike 3D head on the other ends to represent each party.

But in reality, I don't think the technology is anywhere near what would be required for a perfectly believable video conf. session. I wasn't all that impressed by the TTS demos. They sounded like Festival run through a couple filters and perhaps with a little better inflection. To do realistic synthesized voice, you need to be able to input a large amount of expression data. (sorta like MIDI synths) Telephony would require a speech recognition program able to not only perfectly recognize all spoken words, but also catch every nuance of expression.

As a side note to Festival users, try using a high quality voice module and set the pitch range higher than default.

Finally! by Boulder+Geek · 2001-07-31 04:51 · Score: 1

I can make my Joan Greenwood answering machine!

--
A well-crafted lie appears unquestionable - Dama Mahaleo

Re:Job cuts in Hollywood... by clickety6 · 2001-07-31 16:47 · Score: 1

And on the scary side - Claude Van Damme movies for ever - and you wouldn't even need a better speech synthesizer than the ones we have now!

--
----------------------------------- My Other Sig Is Hilarious -----------------------------------

Re:Grrrreat by drfrog · 2001-07-31 01:42 · Score: 1

no im not michael jackson.
are you mccauly caulkin? emanuel lewis?
are you some rummy bum runner?
looking for some same sex action?
im flattered for sure, but your barking/backing up into the wrong tree hole

his story as opposed to the real story.
hate the term if you must, i didnt say i liked it
;)

i hate the actual fact that people are going to great lengths to fictionalize change and water down what really happens , and this 'could' be a tool to instigate more falsies

imagine winston churchills speak slightly altered

or martin luther king....

take that great police action called the vietnam war for instance

yay!! america one!

--
back in the day we didnt have no old school

Grrrreat by drfrog · 2001-07-31 00:40 · Score: 2

i guess this can only mean more fraudulent accounts of his-story.

more astronomical accounts of what 'might have' been said.

maybe now the G8 can fake the sounds of that protester shouting 'yeah shoot me i wanna die'

and maybe they can fake Dmitri Sklyarov shouting 'jail me im bad'
of course hearing all my fave dead celebs
selling coca cola will be so good for humanity too

--
back in the day we didnt have no old school

Re:There's an evil use for this too: by Docrates · 2001-07-31 01:37 · Score: 2

Well, every day more and more telemarketing call centers are being installed outside the US, where this law doesn't apply.

--

There are two kinds of people in the world: Those with good memory.

Re:Entropy-licious by Tetsujin28 · 2001-07-31 07:43 · Score: 2

That's something i have no experience with. can you (or someone else) briefly explain the perjury laws, as they would apply in this case?

The short non-technical answer:
Everything offered as evidence, unless both sides otherwise agree (if the court lets them), has to have a live person testifying about it, to vouch for its accuracy and authenticity. Physical evidence found at the crime scene? The cop who bagged it testifies as to where and when he found it, the condition it was in, etc. Surveillance video or audio tape? Someone has to testify as to how and when it was made, how accurate the process is, etc. Those witnesses, of course, are subject to cross-examination, and are subject to the laws against perjury.
--------------------
WWW.TETSUJIN.ORG

--
- - - -
The real Tetsujin 28 is a giant robot.

So by next Xmass by ReidMaynard · 2001-07-31 02:25 · Score: 1

Radio Shack will have a "fool your friends" telephone voice-box.

I'll get one, if they have Tony Soprano's voice.

"Hey, you little fuck, my car better be ready by 3 o'clock, capisch?"

--
-- www.globaltics.net

Political discussion for a new world

Intellectual Property Protection of "voice-style" by jameshowison · 2001-07-31 10:14 · Score: 1

Quick ... think of the children.

Eventually one ought to be able to programme celebrity voices to have them say what you want them to say. So that one could get James Hetfield (singer for Metallica) to praise Napster.

Oh - it's already happened ... MC Hawking's Crib

We must protect the voices of our stars against the reverse engineering of their talent and the propagation of atrocities like "The (almost) real slim shady".

It's the thin edge of the wedge which will destroy all musical creativity.

James

Re:Entropy-licious by tshak · 2001-07-31 08:01 · Score: 2

Another interesting point of interest is with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses...

Actually, the concept of "replicating a voice" is a bit short-sighted. For example, years ago we were able to replicate the sound of a piano with computers/synthesizers. That doesn't mean that the computer becomes a great piano player - it's just a narrow replication of the sound. The same (to some degree) applies to replicating a voice. Sure, I can make a voice resemble an actor's voice, but no computer can generate a persona as annoying as Chris Tucker :).

--

There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips

DECSS!!! by UID30 · 2001-07-31 02:28 · Score: 1

QUICK! somebody run the DECSS source thru this thing 30 words at a time ... tack all the individual files together into one huge wave and broadcast it over a pirate radio station!!!

...errr ... i mean ... DON'T ... yeah ... DON'T do that...
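
For anyone who just wants to feed a long (and harmless) text through a demo with a 30 word limit, the "30 words at a time" step could be sketched roughly like this. It only splits the text; the function name and the default limit are assumptions for illustration, and nothing here talks to AT&T's demo or stitches audio files together.

```python
def chunk_words(text, limit=30):
    """Split text into pieces of at most `limit` whitespace-separated words."""
    words = text.split()
    # Step through the word list in strides of `limit` and rejoin each slice.
    return [" ".join(words[i:i + limit]) for i in range(0, len(words), limit)]


if __name__ == "__main__":
    sample = " ".join("word%d" % n for n in range(65))
    for number, piece in enumerate(chunk_words(sample), start=1):
        print(number, len(piece.split()), "words")
```

Each piece would then have to be submitted separately and the resulting audio joined by some other tool.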

--
"Glory is fleeting, but obscurity is forever." - Napoleon Bonaparte

Sneakers by MayorQ · 2001-07-31 00:53 · Score: 1

"My voice is my passport..."

Uh oh...

no school! by Teflon+Coating · 2001-07-31 01:23 · Score: 1

Finally I can fake sick to stay home from school and I can fake my parents' voice. Kids across the world rejoice, you can now stay home from school and don't have to fake your parents' voice!

Pronunciation by yolto · 2001-07-31 03:08 · Score: 1

Granted, this does sound more realistic than standard Text-to-Speech programs, but it pronounced several words incorrectly during my trial (including "internet"). And the breaks in speech are off. It pauses at the wrong points.

I really don't see this (at least the online demo) as being some major revolutionary advance. It's just another evolution of speech synthesis. It still has a long way to go.

I'm still waiting for the computers they have in "Star Trek". They understand even poorly phrased requests and seem to know how to route communications simply when people begin speaking.
-----------------
Kevin Mitchell

hmmm by the_other_one · 2001-07-31 01:05 · Score: 1

I wonder if I can copyright the sound of my voice to prevent someone from using it.

At least if I am framed using this technology I can sue for royalties.

--
134340: I am not a number. I am a free planet!

Re:This is good and bad by HitScan · 2001-07-31 03:05 · Score: 1

OR, Sam Kinison.

"No, dumbass, you ride the damn superstring into the black hole! AAAAAAAAAAAAAAAAHHHHHHHH! AAAAAAGGGGGHHHHH!!!!!!!"

:D

--
HitScan

Dead military greats by cnkeller · 2001-07-31 01:08 · Score: 1

At the risk of sounding like a troll (although perhaps a funny one), anyone else interested in recordings of Hannibal or Julius Caesar saying "All your base are belong to us..." or the captain of the Trojan army saying "Somebody set us up the bomb!" as soldiers pour out of the horse?

--

there are no stupid questions, but there are a lot of inquisitive idiots

In related news... by cOdEgUru · 2001-07-31 00:44 · Score: 1

Microsoft posted a previously unreleased audio clip of Linus exclaiming "I love Windows. It's so stable".

--
Rapid Nirvana

"I see a bad moon rising.I see trouble on the way" by Walter+Wart · 2001-07-31 00:46 · Score: 1

CCR wasn't talking about this, but it's apropos. Here are just a few of the problems I see:

Legally a voice recording can be evidence (wiretaps and whatnot). Now it will be easy to falsify evidence. After a few abuses this will lead to the end of voice evidence
With Shrek and Final Fantasy as the first steps in video and this technology in audio say bye-bye to extras
Eventually, the same thing will apply to most actors
Candidates won't even have to give their own stump speeches.
Who has rights to the sound of your voice?

--
The man who never alters his opinion is like the stagnant water and breeds Reptiles of the Mind -- William Blake

Phone Sex With Anyone!! Call Now 1-800-ANJOLIE by Sydney+Weidman · 2001-07-31 00:54 · Score: 4

Yes, we can give you any celebrity as your own personal plaything. All you have to do is send us the script (or enter it on our website) and we'll give you 5 minutes to remember. 5.99/minute. Long distance charges may apply.

Speech Synthesis Markup Language by pinkNoise · 2001-07-31 02:41 · Score: 1

Of course a sentence can be intonated in many different ways, to give it different types of meanings. This information is not present in the text itself, so it has to be provided separately.

One system for that is the Speech Synthesis Markup Language being developed by the W3C. This will allow you to use XML style markup for emphasis, voice type, etc.

Here's an example (not sure if it is 100% syntactically correct):

<speak> <voice gender="female"> <voice category="elder"> Free Software is about <emphasis level="strong"> freedom </emphasis>, not price! </voice> </voice> </speak>

I don't know if it is used by AT&T's system in the article.
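
One way to at least rule out syntax slips in markup like the example above is to build it with an XML library instead of typing it by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The element and attribute names are copied from the example in this post (with the two nested voice elements folded into one), and since the spec is still being developed they may not match the final version. The function name build_ssml is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_ssml(before, emphasized, after, voice_attrs):
    """Return a <speak> document with one emphasized phrase as a string."""
    speak = ET.Element("speak")
    # Attribute names come from the post above and are assumptions,
    # not a statement of what the finished W3C spec requires.
    voice = ET.SubElement(speak, "voice", voice_attrs)
    voice.text = before
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    emphasis.tail = after  # text that follows the closing </emphasis> tag
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Free Software is about ", "freedom", ", not price!",
                     {"gender": "female", "category": "elder"}))
```

The library takes care of closing tags and escaping, so whatever comes out is well-formed XML. Whether a given synthesizer accepts those attributes is a separate question.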

--
pinkNoise

On the flip-side by scott1853 · 2001-07-31 01:00 · Score: 2

If some grad student did this every major company with a text->speech or speech recognition product would be jumping all over him for the potential copyright && || patent violations.

And this pretty much kills the security by voice recognition methods, doesn't it? Maybe they can invent little balls with LCD displays in them to trick retina scanners.

Ya know, a good quality text to speech program was all we really needed. Something that didn't sound like R2D2 on a cell phone. The potential for abuse is way too great with this.

Re:Entropy-licious by linzeal · 2001-07-31 09:37 · Score: 1

You've never fed megahal random troll posts from nntp land have you ?

--
An Education is the Font of All Liberty

ocr2speech ? by mirko · 2001-07-31 00:50 · Score: 1

> The software, which turns printed text into synthesized speech[...]
And how good is its OCR subsystem ?
--

--
Trolling using another account since 2005.

Re:Entropy-licious by BluedemonX · 2001-07-31 02:45 · Score: 2

You're on your own, there. That bulgy eyed creature ain't fan material in my book.

--

--- Jump!! Fire!! Bullet time!! - Lego version of the Matrix

Re:Entropy-licious by BluedemonX · 2001-08-01 03:10 · Score: 2

She doesn't have any of those, though. Just bulgy eyes and funny teeth.

--

--- Jump!! Fire!! Bullet time!! - Lego version of the Matrix

Re:This is good and bad by JWhitlock · 2001-07-31 00:54 · Score: 2

Well, it's good that finally (after decades of research) we get realistic sounding Text to Speech. On the other hand I can't imagine Stephen Hawking speaking in non-metallic voice. Am I weird?

That's an interesting idea. If Stephen Hawking has recorded samples of his voice from when he could talk, they could change his synthesizer to use his own voice. Interesting idea, may have actual applications for the disabled.

Now, Stephen Hawking talking like John Wayne - that would be weird...

Re:Could it be? by Mr.+Troll · 2001-07-31 04:20 · Score: 1

Why thank you :)

--
Kiss my shiny metal ass

your missing something by caseydk · 2001-07-31 00:55 · Score: 1

In the article it says that to imitate a voice, the system needs to have 10-40 hours of studio recordings to base the creation off of... That makes this fine for regular people who aren't recording hours of stuff... it makes it bad for actors and celebs who have numerous public statements, movies, interviews, auditions, etc recorded around somewhere... though James Earl Jones reading me my email would be cool...

Re:your missing something by Edgewize · 2001-07-31 01:35 · Score: 1

You have obviously forgotten the possibility that whoever wants to emulate you could be tapping your phone lines, planting microphones in your car/house/garden, etc. It may not be studio quality but damn if it won't get enough samples to work with...

Sounds Familiar? by Hacker+Cracker · 2001-07-31 00:54 · Score: 2

Is it just me, or does the speech synthesis part sound a lot like a refinement of a venerable old TI speech synthesizer (TMS5220 I believe?) that they used in the old Star Wars arcade machine? They were able to get a fairly reasonable approximation of the Star Wars cast members' voices out of that...

-- Shamus

This space for rent, EZ terms!

Re:Doubtful. by wheel · 2001-07-31 02:43 · Score: 1

Speech synthesis systems are getting better and better at (or, technically, their creators are getting better at creating systems which) generate speech with very similar intonation to what a human would, based on sentence structure analysis...

A really fine example of this is from the festival project (available for linux and windows, among others) at http://cstr.ed.ac.uk, especially when used with voices from the MBROLA project (http://tcts.fpms.ac.be/synthesis/mbrola.html).

Also, Speech Synthesis Markup Language (http://www.w3.org/TR/speech-synthesis) allows you to customize the intonation for synthesizers which follow this standard.

Re:Cool... and disturbing. by JebOfTheForest · 2001-07-31 01:28 · Score: 1

Actually, computer speech-recognition systems have been tested in labs that have far greater noise tolerance over a limited problem domain than humans. Check out the slashdot post about it.

New Authentication Method for My Workstation by smnolde · 2001-07-31 01:45 · Score: 1

It begins:

Greetings Dr. Falken. It's been a long time.

Would you like to play a game?

And if I put in the correct sequence I can have the FBI rummage through my garbage cans for trashed printouts off a dot matrix printer.

This is way cool. Maybe I can use PAM with this!

I'll believe that ... by Rosco+P.+Coltrane · 2001-07-31 00:48 · Score: 2

... when I hear a TTS say "we are the knights who say ... NI !" with the proper intonation :-)

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash

John Wayne by TOTKChief · 2001-07-31 02:19 · Score: 2

Does this mean we have to see John Wayne in more crappy beer commercials? Please, don't drag the Duke through the mud much longer...

--
-- Geof F. Morris

Re:Entropy-licious by Planesdragon · 2001-07-31 10:51 · Score: 2

A crude example would be, say, a chat log. If someone were to just hand in an ASCII or HTML transcript of things that were said online, I don't see how that would be admissible evidence, since it only takes a word processor and a little bit of time to forge/alter. Even with IP logging, THEN you have to prove that no one was spoofing the IP address.

In the American courts, FACTS are determined by the Jury. Whether or not you were speeding. Whether or not OJ really did kill his wife. All determined by the Jury. For the most part, they simply sum it up as "yes he's guilty" or "no he's not", but to reach that the jury gets to listen to all of the evidence that the judge allows in, which is usually just about anything that isn't an outright lie or illegally obtained.

Perjury is the crime of giving false testimony. To be convicted, you just need to give testimony in a trial, lie, and then have a DA take the time to convince a jury that you did so.

If you want to know more about the perjury laws, you might want to talk to a law school. If you're a US citizen, you can probably call up the local bar association (or the police) and, if they have time, they can probably point you towards someone who can explain the laws to you.

If you're not a US citizen, you might want to just dig around on the 'net, since US perjury laws will probably never affect you. Do a search on google for "US criminal laws" and you'll probably get a few decent hits.

Re:Entropy-licious by Planesdragon · 2001-07-31 00:49 · Score: 4

Expect video testimony to become useless in court cases... I mean, with a bit of photo work anyone can fake the jerky security camera footage--

No, wait. We already have laws that cover this. I think they're called perjury...

Re:This could be useful in games. by DarkEdgeX · 2001-07-31 02:12 · Score: 1

Right, and that's a good idea, except it's not really needed. MP3 (and now Ogg Vorbis, and the new MP3 Pro compression formats) can easily compress voice down to a manageable format, requiring very little processor power to decode and play (vs. TTS, which is pretty processor intensive, more so I'm sure with AT&T's technology). There's also the situation of mixing-- eg: you've got a sound effect going off, or a musical score playing that needs to be mixed with the speech. Between the TTS and the mixing, you're using a lot of CPU power that wouldn't be needed if you played the dialogue from MP3. Now, the right situation for TTS in a game vs. MP3 would be in DYNAMIC text, where you can't have voice actors play a scene out because it changes too often (or has insertable parameters, eg: "Hello %s, you're looking very handsome this %s" (replace the first %s with the character's name, and the second with the time of day (morning, afternoon, evening))).
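
To make the insertable-parameters case concrete, here is a rough sketch of filling in that template before handing the line to a synthesizer. The speak() function is a hypothetical stand-in that only prints (no real TTS engine is called), and the hour cut-offs for morning/afternoon/evening are arbitrary assumptions.

```python
TEMPLATE = "Hello %s, you're looking very handsome this %s"


def time_of_day(hour):
    """Map a 24-hour clock value to one of the three words from the post."""
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def render_line(character_name, hour):
    """Fill the two %s slots: character name first, time of day second."""
    return TEMPLATE % (character_name, time_of_day(hour))


def speak(text):
    # Hypothetical placeholder: a real game would pass `text` to its TTS engine.
    print("[TTS] " + text)


if __name__ == "__main__":
    speak(render_line("Gordon", 9))
```

Every distinct name and time of day yields a different sentence, which is exactly the kind of line that can't be recorded ahead of time and shipped as MP3.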

There are definitely some applications, but I'd prefer they kept to compressed formats except for situations like the one given above. =)

--
All I know about Bush is I had a good job when Clinton was president.

Re:This could be useful in games. by DarkEdgeX · 2001-07-31 03:09 · Score: 1

Maybe, but L&H's TTS engine functions fine on a normal PC-- no special server hardware required. (Although they do make versions that work with Dialogic hardware, but that's for a very specific market segment.) I don't know whether AT&T's will (or does) function the same, so you may be right. =)

--
All I know about Bush is I had a good job when Clinton was president.

A company?? by Moridineas · 2001-07-31 01:14 · Score: 1

I'm curious as to why the poster said that it would allow a "company" to make it sound like someone said something they didn't. Why a company in specific? Anyone with the software could do it...that's just being hysterical and showing bias.

Scott

Re:Weather Radio by presearch · 2001-07-31 10:19 · Score: 1

The weather service uses a DECtalk from the early '80s. How...quaint.

Fakes by DreamingReal · 2001-07-31 01:12 · Score: 4

Dr. Rabiner said he was excited about the possibility of resurrecting renowned voices, like that of Harry Caray, the Chicago Cubs announcer who delivered rousing play-by-play broadcasts. "There are probably hours of recordings in archives," he said. Wouldn't it be great, he asked, if Harry Caray's voice could again be broadcasting in Wrigley Field?

Absolutely not. And for the same reason that second-printings, plastic surgery, and fake breasts all suck - they're not the real deal.

And as a die-hard Cubs fan since the age of 4, might I also add that the World Series drought for the last half century has taken on a sort of religious significance, not unlike the 40 years the Hebrews spent wandering in the desert. And Harry Caray was our Moses - resurrecting his voice without the man behind it is tantamount to sacrilege (not to mention unbelievably morbid!).

-------

--
We want some answers and all that we get
Some kind of shit about a terrorist threat
- Ministry

interesting legal ramifications by 0bjectiv3 · 2001-07-31 02:25 · Score: 1

What happens when a celebrity's "voice" is used without permission? Would this application constitute reverse engineering and thus be legal? Would it be the aural equivalent of a trademark violation and thus be illegal? Will near-imperceptible modifications to the emulated voice pattern be sufficient to avoid litigation?

[waiting for the first stupid lawsuit]

--

"Saddam Hussein cavorts with terrorists."

Maybe that's how they... by briggsb · 2001-07-31 00:45 · Score: 1

got Bill Gates to say that "Linux is the best OS ever."

Re:Cool... and disturbing. by sfe_software · 2001-07-31 03:35 · Score: 1

What happens when you get a sample of some General's voice and then use a synthesiser to call up the poor kid on guard duty and get him to let a bunch of terrorists enter the base?

If Sideshow Bob could do this without a computer at an air show, and get away with a (dud) nuclear weapon, just imagine what one can do with a computer.

But seriously, I'm sure it doesn't work like that in the real world. I'm sure it would take more than a simple phone call...

Unless one could hack into the Red Phone (if that really exists)...

- Jman

--
NGWave - Fast Sound Editor for Windows

There's an evil use for this too: by AFCArchvile · 2001-07-31 00:49 · Score: 5

I quote from U.S. Code, Title 47, Section 227, otherwise known as the Telephone Consumer Protection Act:

"(b) (1) It shall be unlawful for any person within the United States
(B) to initiate any telephone call to any residential telephone line using an artificial or prerecorded voice to deliver a message without the prior express consent of the called party, unless the call is initiated for emergency purposes or is exempted by rule or order by the Commission under paragraph (2)(B); ..."

You hear that? There is to be no telemarketing use of this technology!

--
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer

Re:There's an evil use for this too: by HugeMidget · 2001-07-31 02:04 · Score: 1

That's also the reason you don't get phone calls from a computer with a recorded voice asking you to call another number anymore.

This could be useful in games. by AFCArchvile · 2001-07-31 00:55 · Score: 5

Just imagine how much less space some of the more involving computer games like Half-Life and Deus Ex would take up if all the dialog was synthesized with key samples from the voice actor (or, should I say, the "phoneme source"). That saved space could be used toward other things, like textures or ambient sounds. Of course, the biggest challenge would be to allocate some processing power for the synthesis. Still, it's probably in the works.

--
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer

Re:This could be useful in games. by lobsterGun · 2001-07-31 01:04 · Score: 1

just imagine how much less bandwidth voice over IP would consume with this.
Re:This could be useful in games. by bartle · 2001-07-31 01:16 · Score: 2

Voice acting has really limited the size of games; any game with a long playtime has had to stay away from it. It's just too much trouble to bring in voice actors for even the simplest parts, a Final Fantasy sized game would just be too expensive and time consuming to produce. A good voice synthesizer could replace all this, just add as much text as you want in the game and let the software do the rest. Another cool point is that without the cost of hiring voice actors, it would be one bit easier for small development houses to compete with large ones. And of course, voice acting in games is usually so badly done that this can't help but be a step up.
Voice synthesis is an option that game developers have been looking at for years. Let's hope its time has finally come.
Re:This could be useful in games. by bartle · 2001-07-31 03:28 · Score: 2

FFX has full voice acting.
My understanding was that only some of the cutscenes would have voices. There hasn't been much info on it, I'm more than a little worried that the American voices won't be up to snuff. Can't wait to get my hands on it at any rate.

Not quite realistic but... by Tek+Neek · 2001-07-31 01:16 · Score: 1

For you windows users, try this out. Some of the random quotes have me rolling.

Demonstrations of the new TTS software by DaneelGiskard · 2001-07-31 00:47 · Score: 2

Here are demonstrations of the new software. This URL is given in the article, but not highlighted / linked, so here is a more convenient way to go to it.

Try it out! by DaneelGiskard · 2001-07-31 00:42 · Score: 3

You can try out the "research version of Next-Generation Text-To-Speech (TTS) from AT&T Labs." here.

I'm sure it's not the same thing as the one mentioned in the article, but I'm pretty sure the one in the article is at least based on this one.

Try it out!

Other Online Demos by DaneelGiskard · 2001-07-31 01:03 · Score: 5

Some links to other online demos, so you can compare:

http://www.elantts.com/indemo.htm
http://www.cstr.ed.ac.uk/projects/festival/userin.html
http://www.flexvoice.com/demo.html
http://www.acuvoice.com/downloads/ttsdemo.html

I searched for good TTS software to give voice to some of the 3d animations I did in max ... but I did not find anything satisfactory... :(

Job cuts in Hollywood... by KarmaBlackballed · 2001-07-31 00:38 · Score: 2

...are just a few decades away. Who will need an overpaid difficult celebrity when you can resurrect dead ones or invent new ones for movies, sitcoms, etc? Not so far fetched really. Just watch and see.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~

--

--- -- - -
Give me LIBERTY, or give me a check.

Re:Job cuts in Hollywood... by KarmaBlackballed · 2001-07-31 20:57 · Score: 2

People already pay to watch feature length movies without real actors --- Anime, ToyStory, etc. And these are literally cartoonish. The market will grow as the realism increases.

At some point I can imagine there being a small niche market for film "snobs" that insist that real actors make better films. Everyone else will watch computer generated flicks without a second thought.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~

--

--- -- - -
Give me LIBERTY, or give me a check.
Re:Job cuts in Hollywood... by KarmaBlackballed · 2001-07-31 01:09 · Score: 4

expect the same audience as if Tom Hanks were doing the character

And who says Tom Hanks ever has to fade away? It could be a brave new world where your future kids and mine grow up watching the same stars we have today and some from yesterday. I can imagine my grandchildren raving about that new Humphrey Bogart action film. Not so far fetched really.

And for those that wonder about the legal aspects ... I think Tom Hanks would not mind getting paid nice royalty fees for the use of his young persona when he is retired in his 80's.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~

--

--- -- - -
Give me LIBERTY, or give me a check.
Re:Job cuts in Hollywood... by zhensel · 2001-07-31 00:46 · Score: 2

Except the reason people are drawn to many movies is because of the big names attached to them. I doubt you can advertise "Husky Male 4" as the voice behind Timmy the Benevolent Bulldog and expect the same audience as if Tom Hanks were doing the character. Then again, if they made up a movie reviewer (Dave Manning), they could always make up an actor.

On a related note, was anyone else utterly offended by those alcatel commercials? I didn't even know what alcatel was (now I know they are a french wireless telco co. that tried to buy verizon and was trying to get American recognition), but I immediately chose not to buy their products. Somehow, I doubt their product has the same effect as MLK's speech. The worst part is that the CG wasn't even good! I said, "now why did they go and take the MLK Quake model and stick him in a poorly rendered Washington mall" before I noticed the point they were trying to make. At least in the Lou Gehrig commercial they had the sense not to pan the camera around and show what a polygonal catastrophe he was. At least the commercial with Fred Astaire (I think) and the vacuum looked good.
Re:Job cuts in Hollywood... by tanpiover2 · 2001-07-31 01:34 · Score: 2

I saw that ad at adcritic.com, and got to do a little survey on it. I told them that I thought it was bullshit, but I guess they didn't listen, if they actually used it.
Soapbox mode on
That speech in 1963 was a seminal moment in American History, and has become an iconic symbol for the entire Civil Rights movement in this country. Personally I felt that was offensive to have something so powerful cheapened by using it to sell wireless or whatever the hell it was.
Next we'll have the guy in front of the tank at Tienanmen Square waiting for a Pepsi, or Kim Phuc going naked for PETA.
It just ain't right.

--

But masters, remember that I am an ass: though it be not written down, yet forget not that I am an ass.
Re:Job cuts in Hollywood... by smitcham · 2001-07-31 03:07 · Score: 1

IIRC something similar to this happened when Hoover was using Fred Astaire footage and mapped in a vacuum cleaner. I believe the lawsuit that ensued decided that the actors have no rights to their on-screen personas.

Therefore, once they perfect this Tom would get no money unless another ruling came down changing things.
Re:Job cuts in Hollywood... by employee+No.466351 · 2001-07-31 01:49 · Score: 2

Speaking of dead celebrities... I came across an article the other day here about how some studios/companies are working on rendering photo-realistic versions of actors onto film. Envision combining that with the technology described in this article, and it won't be long until we start seeing silly combinations on-screen like Charlie Chaplin fighting Bruce Lee. Better yet, picture this: You throw a print book into a computer which then proceeds to scan the book page by page, and after a few hours, it has rendered a complete full-length film out of it.

Movie dubbing today... by KarmaBlackballed · 2001-07-31 01:19 · Score: 5

One neat application would be to dub foreign language films in the target language using the voice of the original actor even though they do not know the target language. They could start doing that today.

They could start by fixing all those old Chinese and Japanese action/monster flicks dubbed by the same guy talking in false baritone and falsetto.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~ the real world is much simpler ~~

--

--- -- - -
Give me LIBERTY, or give me a check.

Practical use by Arethan · 2001-07-31 00:50 · Score: 1

Aside from the obvious fraudulent uses, this technology could also be put to good use in the customer support field. Particularly ISP support, where most of the calls are about changing email passwords, what happened to their favorite homepage, and what the dialup # was again. Extended issues will obviously need to be handled by a human, but the biggest part of making automated human-computer voice interaction work smoothly is fooling the human into believing that it is talking to another human.

For instance, if the voice recognition software messes up once in a while, it could easily ask the human to repeat themselves. If the human knew it were a computer, they would be instantly frustrated with the machine. However, people are more forgiving when dealing with other people, so they would repeat themselves more clearly when asked.

So all we need now is a good speech to text application that requires no voice training, and low level support can kiss their jobs goodbye.

If you think about it though, it's not necessarily a bad thing. Support is the biggest drain on any company's budget, and the removal of the low-level job would mean that there are more people available to fill the more complicated jobs.

Re:Doubtful. by devnullkac · 2001-07-31 01:27 · Score: 1

It's possible that an AI(tm) would not be needed for many applications. Starting from text isn't necessarily a requirement of this software. In fact, it would probably perform wonderfully when combined with the half of a speech-to-text recognizer which parses speech into phonemes. You'd just need to augment it with intonation detection.

Then you'd have a really scary application: a box you place between you and the phone to emulate any programmed voice. Smells of Mission Impossible.

Of course, there are less scary applications, like perhaps helping assure anonymity in phone calls by using a generic voice from a payphone.

--
What do you mean they cut the power? How can they cut the power, man? They're animals!

I can see it now by Auckerman · 2001-07-31 00:56 · Score: 3

George Bush: All your Scuds are belong to us!

Saddam Hussein: Somebody set us up the bomb!

God help us all!

--

Burn Hollywood Burn

I don't know about it.... by 3-State+Bit · 2001-07-31 01:16 · Score: 3

I could understand it if they said "We can take a sample of speech, for instance, an actor reading a script in a dead celebrity's role, and then digitize it into an inflection and reproduce the same inflection in a different voice."
But this isn't what they're saying:
"The software [...] turns printed text into synthesized speech"
Which begs the question "How does the software know what inflection to associate with the printed text?"
I know that the same words can sound radically different. Take the phrase "one, two, or three" in each of the following contexts (note that none begins or ends a sentence):

"I can't imagine why ANYONE would want four subnets in their own house. I mean one, two, or three I can imagine, but four??"
"Please press one, two, or three at the tone."
"Okay, so it was in the early morning before 4. But can you be at all more specific? Do you have any idea whether it was at around one, two, or three AM?"
"Settings of four or five are considered dangerous, while settings of one, two, or three are considered to be within acceptable parameters."

I think that if you record yourself saying the above phrases, then crop out just the highlighted phrase, you'll find a different inflection in each one. Without understanding what a sentence says, or, more precisely, what the person means who is saying the sentence, the fact that you can produce any inflection won't help you determine which one is right.

I found Liz and Ike playing scrabble while very drunk, and putting on all sorts of nonsensical words. I even saw "Zisis's", using a piece of rice for an apostrophe! (Zisis is a Greek convenience store near us).
I told Liz and Ike that I thought they were crazy. "Heheh, yeah we're crazy", Ike says, "but each of us only put one word down that broke the rules in a major way."
"Which words were those?"
" 'Zisis's' and 'Windology' "
Since Liz was the crazier of the two, I ventured a guess, "Liz's is Zisis's, isn't it?"
"Nope. Liz's is 'windology'. 'Zisis's' is mine." Ike replied proudly.

Anyway, the point of this exercise is to show that a human reader reading this can make the phrase "Liz's is Zisis's, isn't it" sound natural, but I bet any speech-synthesizing software that just follows rules will make it sound incomprehensible. That's because speech is more than reading things by set rules -- it is reading things to reflect your internal parsing of the sentence.

Not to mention the fact that actors can read the same line in a thousand different ways to show a thousand different "interpretations" (states of the character who speaks it, or parsings of the sentence). How will this software produce them, if it only has the same text to parse?

Either someone manually will give it an inflection, or it needs (or would need before truly being able to make good its claim) a human oral reading to "mimic", where it can use the synthesized voice to sound the same inflection in a different voice. Now that would, as the old mis-translated Coke slogan goes, "bring your dead relatives back alive."

Mere dancing with power brooms? Ha, now celebrities will be telling you about how easy to use AOL is. So easy to use, no wonder it's number 1 -- even among the dead!

Gee, I can hardly wait.

(It was intended to sound like "coca cola" when its Chinese characters were pronounced).

--

Re:I don't know about it.... by Dephex+Twin · 2001-07-31 06:24 · Score: 1

Actually, this isn't so impossible.

While it is true that there is a variation in inflection depending on the context and placement of each word, overcoming this issue does not necessarily require human intervention or AI.

With a large enough corpus of data (from analysing millions of "sample" sentences), and the use of N-grams and Hidden Markov Models, etc., a TTS application could essentially use probability to determine the appropriate inflection for each sentence. And with smart enough algorithms and a good corpus to go on, this can prove remarkably accurate. Could it be 100% accurate? Of course not, but people make inflection mistakes when reading out loud too. I think it would be accurate enough to be "useful" though.

This is how it works with Speech-to-Text, but I don't know how much of this is currently done with Text-to-Speech.
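To make the probability idea concrete, here is a minimal sketch of the simplest version: a bigram count model that picks a pitch contour for a word from the word before it. Everything in it is an assumption for illustration. The tiny corpus, the "rise"/"fall"/"flat" labels, and the `train` and `predict` helpers are made up, and it is a plain n-gram counter, not a Hidden Markov Model and not anything AT&T has described.

```python
from collections import Counter, defaultdict

# Toy, made-up corpus: each sentence is a list of (word, contour) pairs.
# The contour labels are hypothetical stand-ins for real prosody annotation.
CORPUS = [
    [("press", "flat"), ("one", "flat"), ("two", "flat"), ("or", "flat"), ("three", "fall")],
    [("was", "flat"), ("it", "flat"), ("one", "rise"), ("two", "rise"), ("or", "flat"), ("three", "rise")],
    [("around", "flat"), ("one", "rise"), ("two", "rise"), ("or", "flat"), ("three", "rise")],
]

LABELS = ("rise", "fall", "flat")


def train(corpus):
    """Count how often each contour occurs for a (previous word, word) bigram."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, contour in sentence:
            counts[(prev, word)][contour] += 1
            prev = word
    return counts


def predict(counts, prev, word, labels=LABELS):
    """Return (best contour, probabilities) using add-one smoothing."""
    seen = counts[(prev, word)]
    total = sum(seen.values()) + len(labels)
    probs = {label: (seen[label] + 1) / total for label in labels}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    model = train(CORPUS)
    for prev, word in [("press", "one"), ("it", "one"), ("or", "three")]:
        best, probs = predict(model, prev, word)
        print(prev, word, best, probs)
```

The same "one" gets a different contour after "press" than after "it", which is the point of the argument above: context alone, with enough counted examples, already narrows down the inflection. A real system would need far more context than one preceding word.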

--

If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan
Re:I don't know about it.... by Nihilanth · 2001-07-31 02:58 · Score: 1

Get a large enough grammar database (hell, put it online and force people to pay to access it), throw in some heuristics, and a self-updating "lessons learned" database, and blammo, you've got perfect text-to-speech. The only real obstacle (which ATT claims to have beaten) is acoustically simulating realistic human speech.
Re:I don't know about it.... by Nihilanth · 2001-07-31 12:54 · Score: 1

why couldn't it, on a large enough scale? I mean, the different ways a sentence can be interpreted are determined by the emotional response it is supposed to elicit. Can't that be extrapolated from context? I mean, sure, that might be a bit unrealistic in terms of current limitations in databasing and processing power (and heuristics), but if we always limited our speculation to the confines of what is possible -today-, we would still be booting our OS' off of floppy disks and basking in the glow of our new 128 kilobyte RAM upgrade.

I think perhaps i mistakenly implied that such a solution was available using today's methods, and you're right, that -does- seem a bit far-fetched. We can't be that far off, though.

The other thing to consider is if the text synthesis doesn't need to be "on the fly", the minutiae can be recorded in some sorta markup-language after the script has been established. Although this wouldn't be useful in an application where the voice synthesis had to be done on the fly, I would imagine a 200k .DAT file that accomplished the same task as 40 minutes of recorded digital audio would be pretty sexy (not even considering the LACK of a need for any sort of original recording in the first place)
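For a rough sense of scale, the size comparison can be worked out in a few lines. The audio formats here are assumptions (uncompressed PCM at CD quality and at telephone quality), since the post only says "recorded digital audio", and `pcm_bytes` is just a helper name made up for this sketch.

```python
def pcm_bytes(minutes, sample_rate_hz, bytes_per_sample, channels):
    """Size in bytes of uncompressed PCM audio of the given length."""
    return minutes * 60 * sample_rate_hz * bytes_per_sample * channels


# The 200k .DAT file from the post, taken here as 200 * 1024 bytes.
MARKUP_BYTES = 200 * 1024

# Assumed formats for the 40 minutes of recorded audio:
cd_quality = pcm_bytes(40, 44_100, 2, 2)     # 44.1 kHz, 16-bit, stereo
phone_quality = pcm_bytes(40, 8_000, 1, 1)   # 8 kHz, 8-bit, mono

if __name__ == "__main__":
    for name, size in [("CD quality", cd_quality), ("phone quality", phone_quality)]:
        print(f"{name}: {size:,} bytes, about {size / MARKUP_BYTES:.0f}x the markup file")
```

Compressed audio would shrink the gap a good deal, so treat these ratios as an upper bound on the saving, not a measurement.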

Where can I get the star trek computer mod by (H)elix1 · 2001-07-31 01:22 · Score: 2

for my car's MP3 controller?

--
+++ UGUCAUCGUAUUUCU

Even Worse by Kasreyn · 2001-07-31 08:37 · Score: 2

"Your honor, the state would like to submit the following item, exhibit a. It is a recording of the defendant's session with the police officers wherein he confessed to the crime he is now pleading guilty to... As you can all hear, this is clearly the voice of the man sitting in this courtroom today, ladies and gentlemen..." Or, put less narratively, what would such "voice-duplicating" technology do in the area of criminal (judicial) evidence? Will there be new tests required, to find anomalies in recording to make sure they weren't faked? Or will there just be a new way of breaking the system, used like any other card, just like buying an "expert" is used today? Would this technology offer greater fidelity and accuracy of reproduction over, say, voice impersonators (people), or programs that might patch together fragments of speech? The only real danger is that the courts not realize this in time, and take sufficient precautions to prevent audio evidence from becoming completely subvertible. -Kasreyn

--
Kasreyn: Cheerfully playing the part of Devil's Advocate to hairtrigger /. flamers since 1999.

not really old news.... by xtermz · 2001-07-31 00:48 · Score: 1

but i remember a couple years back (95-6'ish? ) on discovery channel they had this program on a sun box that would take samples of a persons voice and do the same thing, i guess it was just a matter of time before it became wide spread... but one has to wonder if at&t bought this original technology, because i believe it was some college or something doing the original research..

"Pussy: You spend 9 months trying to get out of it, and the rest of your life trying to get back in..."

--

I lost my concept of community when my community lost all concept of me.

John F. Headroom by corvi42 · 2001-07-31 01:58 · Score: 3

Ask not what your country ... can do ... for you but what
what
what
what
what
what
you can do for for your country.

--

There are a thousand forms of subversion, but few can equal the convenience and immediacy of a cream pie -Noel Godin

Ahhhhhh.... by Dave+Emami · 2001-07-31 11:24 · Score: 1

The downside is that the Chinese and Japanese action/monster flick producers that use this technology for film dubbing will also use Babelfish to translate the scripts.

So that's how we got "All Your Base Are Belong To Us."

--

"The Greens lynched a hacker in Chicago. Last month, but I think the body's still hanging from the old Water Tower."

Depends on how it's presented and used... by MadCow42 · 2001-07-31 02:39 · Score: 2

Let's say someone wanted to make me say something in direct contradiction to my normal views, then publish that.

It all depends on how the doctored sample is presented and used... if they represent that it's a real recording of you, or hint that it might be you, or allow others to think that it's you, that would be illegal (libel, slander, etc.).

However, simply making a recording that sounds like you but freely admitting that it isn't, could be considered fair use under parody.

MadCow.

--
I used to have a sig, but I set it free and it never came back.

Re:Entropy-licious by Bonker · 2001-07-31 00:56 · Score: 4

Another interesting point of interest is with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses,

Good for them... Better for us! Who wants dumpy Sandra Bullock, bug-eyed Steve Buscemi, or smarmy Ben Affleck when we can have perfect, artist produced, fan-boy (and fan-girl) material like Aki from FF?

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!

Be afraid. Be very afraid by sacremon · 2001-07-31 00:51 · Score: 1

Just imagine what the networks could do with this:

Recaps of sporting events read in the voice of Howard Cosell, with the script written by William Safire.

Fred Astaire singing about vacuum cleaners, not merely dancing with them.

Aahhhhh!

--
If you can't beat them, embrace and extend them.

Re:Cool... and disturbing. by Tin+Weasil · 2001-07-31 02:58 · Score: 1

No... not too much TV. Just 9 years in the Navy. You can talk about all of the procedures in place to prevent something like this from happening, but I have seen, first hand, many occasions where a call from the Commanding Officer was all that was needed to stir ship's company into a tizzy.

I can't tell you how many times as an 18-year-old sailor straight out of boot camp I was required to stand various guard watches... without any pass down on procedures and just the general orders of sentry to get me through the night (and no way to communicate with someone higher up.) If an Admiral had called me up telling me to let such-and-such truck onto the pier, I would have done it because I would have been intimidated by rank and I wouldn't have known any better.

Cool... and disturbing. by Tin+Weasil · 2001-07-31 00:37 · Score: 3

While this is a really great leap in TTS technologies... which is sure to make computers for the blind even more accessible than ever... the idea of being able to reproduce any voice is very scary.

What happens when you get a sample of some General's voice and then use a synthesiser to call up the poor kid on guard duty and get him to let a bunch of terrorists enter the base?

Re:Cool... and disturbing. by dachshund · 2001-07-31 00:54 · Score: 5

Actually, this isn't a very exciting thing for the blind. For most practical uses, the visually impaired tend to prefer speed over quality. It doesn't have to sound great as long as it can read several times faster than "normal" speed. The AT&T TTS isn't really designed for this purpose.
Its main use is for telephony (surprise!) but I suppose it'll be turning up in new and exciting places.
Re:Cool... and disturbing. by IdiotFactory · 2001-07-31 11:20 · Score: 1

does it really matter if the voices sound like a celebrity's or not? a voice is a voice. this is going to cause more harm than good, weirdos getting ahold of this and accusing uncle walton of god knows what. blah. no sir i dont like it. my puter talks to me, (mac) and i prefer the cold robot mispronunciation from the speakers! id take that over an imitation of sean connery any day... maybe i should reconsider my view.... hmmm.... yesh.
Re:Cool... and disturbing. by tb3 · 2001-07-31 00:45 · Score: 1

Old news, Mission:Impossible already did that.

--
www.lucernesys.com Horizon: Calendar-based personal finance
Re:Cool... and disturbing. by Secret+Coward · 2001-07-31 16:30 · Score: 1

but I suppose it'll be turning up in new and exciting places.
Web advertising! If you're quick, maybe you can get a patent! :(
BTW, has anyone 'heard' a web advertisement? With all the complaints about people ignoring web advertisements, I'm really surprised advertisers don't play a sound, or cite their name with sound.
Re:Cool... and disturbing. by Anixamander · 2001-07-31 00:44 · Score: 5

What happens when you get a sample of some General's voice and then use a synthesiser to call up the poor kid on guard duty and get him to let a bunch of terrorists enter the base?

Obviously if this does happen, then all their bases...aww, forget it.
--

--
Do not taunt Happy Fun Ball(TM)
Re:Cool... and disturbing. by Fortmain · 2001-07-31 00:51 · Score: 1

"What happens when you get a sample of some General's voice and then use a synthesiser to call up the poor kid on guard duty and get him to let a bunch of terrorists enter the base?"

Nothing. If that 'poor kid' has received ANY of the training our military police get, he will know better than to let anyone just call him up and tell him what to do. People can currently imitate someone else's voice, so there are already rules in place to avoid this problem.

--

We gotta make democracy safe for the world! -- Pogo
Re:Cool... and disturbing. by Benetech · 2001-07-31 04:42 · Score: 1

Actually, it is quite exciting for most blind people and many people with speech disabilities. Technically skilled blind people won't be excited for the reasons posted. As a developer of software for the blind, one of the biggest barriers to use of technology by seniors (the majority of blind people) was the machinelike quality of the current voices. This voice improvement brings the technology much closer to human sounding. It may work pretty well for Stephen Hawking, but a teenager who has no voice would really prefer to have a voice that sounds closer to a human. The focus in the New York Times piece was on the downside of a human sounding synthetic voice: I would rather see the positives.

Audiorealism = Way More Data Than Text by Vegan+Pagan · 2001-07-31 01:14 · Score: 1

Computers probably can perfectly simulate someone's voice, but wouldn't they require data for pitch, patterns, pauses, mental considerations, freudian slips, sneezes, coughs, burps, sighs, and other details? I think to perfectly simulate a voice, it would require all the meticulous modeling of the FF movie, just now for audio physics.

Interesting Prospects for MMORPG's by Giddeon · 2001-07-31 05:14 · Score: 1

Massively Multiplayer games could certainly benefit from this tech - Imagine defining your own dwarven accent during character creation. The downside to such a feature is that with any global chat, you wouldn't be able to hear the environment in all the chatter. ~Giddeon

Coming soon.... by spellcheckur · 2001-07-31 00:41 · Score: 1

Great new soundbytes like: "Hello this is Bill Gates, and I pronounce Linux, 'Oh God, help me, they're making me look like a fool!'"

Most excellent by baptiste · 2001-07-31 00:40 · Score: 3

This is great news. For too long TTS has been held back by questionable voice quality. Microsoft's engine was a huge step forward, but still wasn't quite there. As the technology advances and requires less CPU power (or more CPU power is fit into a smaller space) I can imagine this will rapidly show up in places where voice prompts would be nice but are too critical to deploy a bad-sounding technology.

--
Top Most Bizarre/Disturbing Error Messages

Disabled by jonearth · 2001-07-31 14:25 · Score: 1

I get an idea. This device can help those who lost the power to speak. What they need is a little device with a small keyboard tied to their palm or wrist so they can type to speak!

at long last! by hyrdra · 2001-07-31 13:57 · Score: 2

The direct link to the AT&T TTS research site (the group who developed this amazing TTS system) is here:

http://www.research.att.com/~mjm/cgi-bin/ttsdemo

We use this same system at work for our phone navigation system. Paired up with a natural language processor, it's quite easy to talk to as if a semi-sentient person. With a large phrase database, it's much more than an automated system.

I am glad AT&T is finally releasing a commercial product to the masses. I can't wait to get the developer version!

--

"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95

Re:Entropy-licious by BarefootClown · 2001-08-01 00:42 · Score: 2

"Your honor, this videotape clearly shows what my client has been arguing all along."

"Well, I'll be damned! The tiger really was break-dancing!"

--

"Make it ten--I am only a poor corrupt official."
--Captain Louis Renault (Claude Rains), Casablanca

This is good and bad by MSBob · 2001-07-31 00:44 · Score: 2

Well, it's good that we finally (after decades of research) get realistic-sounding Text to Speech. On the other hand I can't imagine Stephen Hawking speaking in a non-metallic voice. Am I weird?

--
Your pizza just the way you ought to have it.

Re:This is good and bad by Smedrick · 2001-07-31 00:54 · Score: 1

Hehe...I can't help but picture Stephen Hawking with the voice of Bobcat Goldthwait. So wrong...but so funny.

--

--
"I strongly urge both the faint of heart and the faint of butt to leave the room at this time."
- Strong Bad

Court evidence? by MSBob · 2001-07-31 00:47 · Score: 2

Will this render all voice recordings useless as potential court evidence? I know that courts already try to not use voice recordings as evidence but this could make any voice recordings totally inadmissible as evidence. Don't you think?

--
Your pizza just the way you ought to have it.

brings a whole new meaning to crash by fatgraham · 2001-07-31 00:39 · Score: 2

genghis khan screaming at me as windows crashes on debug again

anyways, most dead celebrities' voices can be sampled off bill and ted's excellent adventure :]

Big thing missing by TH4L35 · 2001-07-31 00:44 · Score: 2

There is still a big component missing from even this form of text-to-speech synthesis, and that is the fact that the program still doesn't know how to inflect and stress the reading to make it sound natural. Sure the sound of individual words might bear an uncanny resemblance to a famous human voice, but any decent sounding sentences still have to be told which words to stress by a human interpreter...

--
When Thales was asked what was difficult, he said, "To know one's self." And what was easy, "To advise another."

Re:Try it out! - It's not that great by babymac · 2001-07-31 02:27 · Score: 1

I couldn't agree more. The inflection of normal sentences is way off and it still sounds very robotic, no matter what AT&T claims. This is at best an incremental improvement over current speech synthesis software. The "pro" voices that come with Mac OS sound very much like the samples on the web site. Until things like inflection based on context and emotion can be synthesized, this technology is only good for "Please press 1 now."

--
"War makes me sad." - Me

eeep... by JohaunaRei · 2001-07-31 01:13 · Score: 1

Oh, does this one have the potential for abuse. -_-

Oh God, please do shut up. by KingAzzy · 2001-07-31 04:38 · Score: 1

You paranoid stoners. Put the bongs down and go get a job. Your conspiracy theory ravings sound like something my grandmother would be going on about.. "that wicked technology will kill us all!!!"

Have any of you even bothered to check out the demos of this technology before posting your tripe?

--
$ chown -R us:us yourbase

Re:Oh God, please do shut up. by Nihilanth · 2001-07-31 04:55 · Score: 1

I did not, of course, check the technology demo.

I am, however, gainfully employed.

Furthermore, the point of my "tripe" was the possible ramifications of technology down the line, not so much ATT's new stock-booster.

You've misunderstood me if you got the impression I thought the technology was inherently negative. Every technological advancement that is cleverly utilized to do something destructive simply spurs on new technological advancement, which I feel is inherently positive.

Although i realize your "stoners" comment was meant to be an insult, one would have to reason that attributing forward-thinking speculation to cannabis use was an endorsement rather than an attack.

Solution: Practice voice safety by ez76 · 2001-07-31 02:41 · Score: 1

I don't know about you guys, but I have been appending a signed, audio MD5 hash of my words to the end of every sentence I've uttered for the past 5 years.

There are some downsides, but I rest assured that everyone knows it is the real me when I speak.
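
Joking aside, the "append a signed hash to what you say" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shared key `SECRET_KEY`, the bracketed tag format, and the function names are all made up here, and HMAC-SHA256 is swapped in for the MD5 mentioned above, since a bare hash without a key proves nothing about who produced it.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it would have to be agreed
# with the listener in advance over some trusted channel.
SECRET_KEY = b"not-a-real-key"


def sign_sentence(sentence, key=SECRET_KEY):
    """Return the sentence followed by a hex HMAC tag over its text."""
    tag = hmac.new(key, sentence.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{sentence} [{tag}]"


def verify_sentence(signed, key=SECRET_KEY):
    """Check that the trailing tag matches the sentence it follows."""
    sentence, _, tail = signed.rpartition(" [")
    if not sentence or not tail.endswith("]"):
        return False
    expected = hmac.new(key, sentence.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(tail[:-1], expected)
```

A listener holding the same key would call `verify_sentence` on what they heard; any change to the words, or a tag made with a different key, makes the check fail.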

Museums are going to be so fun now! by CrazyJim0 · 2001-07-31 02:15 · Score: 1

Real talking voices from characters... Now, if I can follow up with what I want to do... Which is to have a museum robot use speech and visual recognition to entertain the guests... Basically it will identify each person, and use learning AI (stuff people get headaches over) to develop choices of actions to do... I think a little ball droid on hydraulics with a little video screen having a conversation with museum guests would rule. Actually if you could get all the copyright holders to agree (one of happy capitalism's problems), you could probably build one now.

--
God spoke to me

Umm, where are your senses? by Telek · 2001-07-31 03:44 · Score: 1

For starters, if you read the NYT article (and if you happen to have used this software in beta) it is not NEARLY as good as they're making it sound.

And besides, do you think that you will SERIOUSLY be able to, after some tweaking, make a sensibly long trail of words and have a professional compare that to the real person saying it and NOT KNOW which one was the original? I **seriously** doubt that we're anywhere close to that yet.

--

If God gave us curiosity

Weather Radio by BigGar' · 2001-07-31 01:06 · Score: 1

They should send a complimentary copy to The National Weather Service to replace that Stephen Hawking-sounding thing that tells us to take cover now.

--

Shop smart, Shop S-Mart.

DMCA at it again by robvasquez · 2001-07-31 00:56 · Score: 1

I copyrighted my voice if you reproduce it you go to jail

Reg-free link by 3ryon · 2001-07-31 01:40 · Score: 1

Most NYT tech articles are available without registration on Yahoo! News. This one's at: http://dailynews.yahoo.com/h/nyt/20010731/tc/software_is_called_capable_of_copying_any_human_voice_1.html

--
Kind thoughts do not change the world

Maybe its in my head... by powerlinekid · 2001-07-31 02:03 · Score: 1

Ok... Final Fantasy... incredible movie, and sometimes even looked real (hell, it was close enough for me at the present time). Now, while this AT&T software is nice and all, it is nowhere near the relative level of the graphics in Final Fantasy. In fact, I've heard this before... it sounds eerily similar to that program on the mac that my friend so loves annoying me with.

--

can't sleep slashdot will eat me

Possable abuses? by the_2nd_coming · 2001-07-31 02:30 · Score: 1

What if this software were used to synthesize a conversation between 2 people and was then used to incriminate a person? Just sample the person's voice and make a recording and submit it as evidence.

Voice recordings will either become invalid forms of evidence as their origins are suspect, or many good people will start to go to jail for things they did not do because of things they did not say.

--

I am the Alpha and the Omega-3

I'm wrong by bartle · 2001-07-31 03:54 · Score: 2

Just scanned through some of the review sites. Most of them preferred to discuss the plot in detail rather than actually review the game, but FFX will indeed have recorded voices for every line in the game. I really, really hope the US division doesn't screw this up.

Anyway, my original point is still valid for the X games that come out each year which don't have a multi-million dollar budget. Final Fantasies are truly the big budget juggernauts of the game industry.

Re:Doubtful. by MarkusQ · 2001-07-31 02:40 · Score: 2

The problem is that to produce proper intonation from any text requires that you understand the speaker's intent. In some cases you can not do this without understanding things such as:

1) the speaker's relation to the listeners (including, perhaps, people hidden from the putative "listener" being addressed, e.g. microphone);

2) the speaker's moral position on issues that may not be explicitly raised anywhere in the text;

3) the speaker's goals, short term, long term, publicly admitted and deadly secret;

4) the speaker's education and state of sobriety, present focus of attention and recent emotional state;

5) the fact that something may have occurred to the speaker as they were speaking and thus affected their voice;

...and so on and so forth.

Even people have a hard time doing this (thus the difficulty finding good actors out of all the wannabes), and people are specifically designed for this sort of task.

I stand by my claim: to generate speech with proper intonation from any text in general requires full AI.

--MarkusQ

P.S. Welcome to karma caphood.

Doubtful. by MarkusQ · 2001-07-31 00:50 · Score: 5

Match the intonation of any human voice, without a sample of that voice saying the phrase in the desired intonation, just from the text?

"Yeah, right!"

"Officer, it is clear to me that you are in fact the one who is inebriated."

"I found it that way. Honest."

"Now, nothing has really changed since the last contract, we just cleaned up a few details; Please sign and return ASAP."

"But Billy got one...why can't I? Please?"

"Would you like to move to the sofa?"

I don't buy it for a minute. To do what they claim would require real AI(tm).

-- MarkusQ

Re:So? - Request Granted. by feed_me_cereal · 2001-07-31 01:32 · Score: 1

if it's more than 90% accurate with my normal speech, put me up for 1,000,000 shares! :)

--
"Question with boldness even the existence of a god." - Thomas Jefferson

HAL 9000 voice here we come! by rancher+dan · 2001-07-31 00:51 · Score: 1

I've wanted a computer that can speak with HAL's voice like, well since 1968. I hope hope hope whoever holds the rights to 2001 allows a HAL voice to be sold.

Remember Grace Kelly? by Dungbutter · 2001-07-31 03:59 · Score: 1

Great.
Dirt Devil can have vacuum cleaner commercials
with dead celebrities dancing and singing their praise.

Re:He would not. by r3volve · 2001-07-31 02:34 · Score: 1

Yeah... I really liked the work he did on Radiohead's OK Computer... just wouldn't be the same without that voice.

Obligatory reality check by p_trinli · 2001-07-31 01:18 · Score: 1

From the article:

For now, technical limitations may temper any worries that a person's voice could be lifted without permission. To build the software that recreates unique voices -- which AT&T Labs is calling its "custom voice" product -- a person must first go to a studio where engineers record 10 to 40 hours of readings. Texts range from business news reports to nonsense babble.

The recordings are then chopped into fragments of sounds and sorted into databases. When the software processes a text, it retrieves the sounds and re-assembles them instantly to form entirely new sentences. In the case of long-dead celebrities, archival recordings could be used in the same way.
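
As a rough illustration of the chop-and-reassemble idea in the quoted passage, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the `UNIT_DB` table, the fake "samples" (short lists of numbers standing in for audio), and the greedy longest-match lookup. The article does not describe AT&T's method at this level of detail, so this should not be read as their algorithm.

```python
# Toy unit-concatenation sketch. All names and data are hypothetical.
# Real systems store recorded audio fragments; here each "fragment"
# is just a short list of numbers standing in for samples.

UNIT_DB = {
    "hel": [1, 2, 3],
    "lo": [4, 5],
    "wor": [6, 7, 8],
    "ld": [9],
    "h": [10], "e": [11], "l": [12], "o": [13],
    "w": [14], "r": [15], "d": [16],
}

MAX_UNIT_LEN = max(len(key) for key in UNIT_DB)


def split_into_units(word):
    """Greedily cover the word with the longest units found in the database."""
    units = []
    pos = 0
    while pos < len(word):
        for size in range(min(MAX_UNIT_LEN, len(word) - pos), 0, -1):
            candidate = word[pos:pos + size]
            if candidate in UNIT_DB:
                units.append(candidate)
                pos += size
                break
        else:
            # No fragment of any length starts here.
            raise KeyError(f"no recorded unit covers {word[pos]!r}")
    return units


def synthesize(text):
    """Concatenate the stored fragments for every word in the text."""
    samples = []
    for word in text.lower().split():
        for unit in split_into_units(word):
            samples.extend(UNIT_DB[unit])
    return samples
```

Longer units are tried first because a fragment recorded as a whole tends to sound smoother than the same stretch glued together from single sounds, which is presumably why so many hours of studio recording are needed.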

--
Aaron J. Shaver
http://aaronshaver.com/

Re:too bad for steven halking by raechy · 2001-07-31 19:17 · Score: 1

Well, steven halking now can't make phone calls to order a pizza now can he, or 911.

"Hello, this is steven halking, my house is on fire....beeep"

what now?

jobs anyone? by sewagemaster · 2001-07-31 04:15 · Score: 1

hmmm looks like the current phonesex operators will be out of jobs! all because everything will be automated... all the fantasy created in text by freaky greasy haired guy geeks...

finally we.. umm THEY are getting some!!

i dont have greasy hair though :)

--
my blog

It cant be perfect by mary_will_grow · 2001-07-31 02:29 · Score: 1

I'm sure with a little signal processing it's not difficult to compare actual-person-talk to their AT&T Text-to-Speech counterpart.......
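
For what "a little signal processing" might look like, here is a minimal sketch in plain Python: take the magnitude spectrum of two equal-length signals and measure how far apart they are. The function names and the synthetic sine-wave test signals are assumptions for illustration. A real comparison of speech would work on short windowed frames of recorded audio and use far better features than a raw spectrum.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) discrete Fourier transform; magnitudes of the first n/2 bins."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total) / n)
    return spectrum


def spectral_distance(a, b):
    """Euclidean distance between the magnitude spectra of two signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    spec_a = magnitude_spectrum(a)
    spec_b = magnitude_spectrum(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))


def tone(freq_bin, n=64, amplitude=1.0):
    """Synthetic test signal: a sine with a whole number of cycles in n samples."""
    return [amplitude * math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]
```

Two identical signals give a distance of zero, and two tones at different frequencies give a clearly larger one. Whether a measure like this could separate a good synthetic voice from the real speaker is exactly the open question.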

--
Why stick up for big business?

Re:Entropy-licious by Smedrick · 2001-07-31 01:04 · Score: 1

I don't know. I find it hard to believe that a synthesized voice will ever be good enough to replace the real thing...especially in movies like FF. You need the raw emotion of the actors to give the drawings/renderings some life. Not only are human voices like fingerprints (or snowflakes) in that no two are exactly alike, but you very rarely say the same word in the same way twice. (Hell, my "about" changes every other sentence, going from the Canadian "aboot" to the New Yorkian "abaht") There are many factors in human speech...intonations, context, whathaveyou...that make every sentence unique. I suppose if you made a big enough database and changed the pitch (and whatnot) of the voice you could perfectly recreate the human voice. But it seems to me that you'd need an infinite amount of space to hold every word with every possible intonation and accent.

--
"I strongly urge both the faint of heart and the faint of butt to leave the room at this time."
- Strong Bad

Re:Entropy-licious by Smedrick · 2001-07-31 02:12 · Score: 1

Hey, there's always the argument that our voices reflect our "soul". And the audiophiles who still insist that they will always be able to discern a synthesized instrument from a real one. A synthesized voice can arguably become too perfect.

Imperfections are what makes us human.

--
"I strongly urge both the faint of heart and the faint of butt to leave the room at this time."
- Strong Bad

test phrases by FrankHaynes · 2001-07-31 20:57 · Score: 1

I'll never forget years ago at a trade show how a text-to-speech vendor was taken down several pegs when a smart-ass came up and typed

"If we convict him, he will be a convict"

into his nifty box and it did not inflect the two variants of 'convict' properly.

I'm happy to report that all but one of the above demos does so correctly, as does the Death Star demo that started this thread.

I'm sure there are other test phrases including words such as 'read' or 'lead' that reveal strengths or weaknesses of the parsing algorithms.
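
To show why the 'convict' sentence is a good test, here is a toy Python sketch that picks a pronunciation from the word just before the heteronym. The cue-word lists, the stress notation, and the one-word-lookback rule are all assumptions made up for this example; real parsers use much richer context, and this is not any vendor's algorithm.

```python
# Toy heteronym picker. The word lists and the one-word-lookback rule
# are assumptions for illustration only.

# Pronunciation variants keyed by (word, part of speech).
PRONUNCIATIONS = {
    ("convict", "verb"): "con-VICT",
    ("convict", "noun"): "CON-vict",
    ("lead", "verb"): "LEED",
    ("lead", "noun"): "LED",
}

# Words that usually precede a noun or a verb, respectively.
NOUN_CUES = {"a", "an", "the", "this", "that", "some"}
VERB_CUES = {"we", "i", "you", "they", "to", "will", "can", "must"}


def tag_heteronym(previous_word):
    """Guess noun vs. verb from the single preceding word."""
    if previous_word in NOUN_CUES:
        return "noun"
    if previous_word in VERB_CUES:
        return "verb"
    return "noun"  # arbitrary fallback when there is no cue


def annotate(sentence):
    """Replace known heteronyms with a stress-marked pronunciation."""
    words = sentence.lower().replace(",", "").split()
    result = []
    for index, word in enumerate(words):
        previous = words[index - 1] if index > 0 else ""
        key = (word, tag_heteronym(previous))
        result.append(PRONUNCIATIONS.get(key, word))
    return result
```

The trade show sentence works because "we" and "a" happen to be clean cues. A word like 'read' depends on tense rather than part of speech, so a rule this simple cannot handle it at all.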

--
slashdot: A failed experiment.

Re:Weather Radio - N.O.A.A. is already on it by FrankHaynes · 2001-07-31 21:02 · Score: 1

http://www.nws.noaa.gov/nwr/voicesamples.htm

Go there today to vote for your favorite voice candidate. It used to be fun to listen to the weather broadcasts, but I can't stand the poor and uneven implementation of that system they use now. I hope the new system lives up to its promise.

--
slashdot: A failed experiment.

Real time spoken translator by CommieLib · 2001-07-31 01:52 · Score: 1

Pop this technology into a handheld, paired with speech recognition technology, and go visit the south of France without speaking a word of French...

--
If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.

I see unreal people... by frank_adrian314159 · 2001-07-31 01:49 · Score: 1

Wow! All they need is realistic moaning and there goes the 9XX(X) industry.

--
That is all.

Re:Entropy-licious by Nihilanth · 2001-07-31 01:21 · Score: 1

That's something i have no experience with. can you (or someone else) briefly explain the perjury laws, as they would apply in this case?

I mean, just reasoning from the point of departure that the accused is "innocent until proven guilty", if you make that proof impossible, how would someone be convicted?

A crude example would be, say, a chat log. If someone were to just hand in an ASCII or HTML transcript of things that were said online, I don't see how that would be admissible evidence, since it only takes a word processor and a little bit of time to forge/alter. Even with IP logging, THEN you have to prove that no one was spoofing the IP address.

Given the above, could someone explain these perjury laws?

Re:Entropy-licious by Nihilanth · 2001-07-31 01:28 · Score: 1

Well, I'm not sure if ATT's technology is "there" yet in terms of what you're describing, but it certainly can't be a long way off. There are instrument synths that have been out for a while that actually acoustically model the instrument being synthesized, and instead of altering the frequency/amplitude of the generated noise, actually change the model's airflow, resonance, etc. Processing power is up to the challenge, certainly, and it's not too much of a stretch to imagine a synth that would model a person's vocal cords, they're not all that complicated, and then account for lung capacity, flow, etc. From there, choosing which words to say and how to say them is simple, and could be accomplished with some simple markup language (like HTML).
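
For a feel of what "modelling the instrument" means, here is a minimal Python sketch of the Karplus-Strong plucked-string algorithm, one of the simplest physical-modelling techniques. It is swapped in here as an example only: it models a string, not the wind instruments or vocal cords discussed above, and the function name, parameter names, and default values are assumptions for illustration.

```python
import random
from collections import deque


def pluck(frequency_hz, duration_s, sample_rate=8000, damping=0.996, seed=0):
    """Karplus-Strong plucked string: a noise burst circulating in a delay line."""
    period = int(sample_rate / frequency_hz)  # delay length sets the pitch
    if period < 2:
        raise ValueError("frequency too high for this sample rate")
    rng = random.Random(seed)
    # The "pluck" is a burst of random values filling the delay line.
    line = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    output = []
    for _ in range(int(duration_s * sample_rate)):
        first = line.popleft()
        # Averaging neighbouring samples acts as a low-pass filter, so the
        # high harmonics die out first, as they do on a real string.
        line.append(damping * 0.5 * (first + line[0]))
        output.append(first)
    return output
```

The point of the approach is that changing the sound means changing the model (delay length, damping) rather than editing a stored recording, which is the property that would matter for a modelled voice.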

"Accelerator cards" are a trend that would make this even more possible. A 3d graphics accelerator is a must-have for today's games, and for a lot of people, a beefy soundcard is a must-have (with hardware acceleration of "cool sound stuff"), and there has been some mention "out there" of the possibility of an "artificial intelligence" PCI card, that would take the strain off the CPU in doing common "AI" tasks such as pathfinding (I don't know enough about AI to give better examples than good ol' A* pathfinding). So, given that, if the technology were to take off, we could ostensibly see "Voice synthesis" PCI cards, or more likely, improved voice synth incorporated into the chipsets of the next generation of $100+ soundcards.

(on an unrelated note, I just got the Hercules Gametheatre XP soundcard, and it rocks throughout the projects)

Re:Entropy-licious by Nihilanth · 2001-07-31 01:33 · Score: 1

...and if you juxtapose the ultra-realistic movie-making that squaresoft has been doing with nearly-perfect voice synthesis (that could always be retouched and tweaked in post, it wouldn't have to be perfect on the fly), you would be able to completely eliminate the profession of acting. After all, if you can perfectly -synthesize- a voice, you could tweak the variables and create a completely unique voice, and 3d render a unique character, create a truly "virtual" person.

Re:Entropy-licious by Nihilanth · 2001-07-31 01:58 · Score: 1

That could be a great thing, maybe it would stem the tide of hundreds of cookie-cutter broadway musicals every year (to compete with the instant gratification of television and movies), and recenter commercial theatre on "quality"

Re:Entropy-licious by Nihilanth · 2001-07-31 02:15 · Score: 1

Even if there were no threat to the legal system, how easy would it be to synth someone important's voice for social engineering? a police officer? a corporate CEO? Sounds like a good time to invest in caller ID.

I could think of some major mischief someone could cause with a radio transmitter set to a police frequency and this technology.

Re:Entropy-licious by Nihilanth · 2001-07-31 02:21 · Score: 1

I agree with you about the keyboard, even the most expensive electric pianos that I've had the opportunity to fiddle with were not much more than glorified general midi or frequency shifted samples. The synths I was referring to were something I saw on some science television show a couple years back, and existed strictly in a laboratory (and with the aid of a GFlop or two, I would imagine), and were able to do some truly incredible reproductions of a flute and a clarinet, I believe. I haven't heard anything about this recently, but the concept stuck in my mind.

Working a "day job" as an acoustical engineering intern, I get to see some pretty impressive modeling and analysis done on some -old- computers. I don't think the actual technology is that far off, just an affordable implementation. After all, today's computer peripherals can keep track of millions of triangles, account for echoes and reverberation in hardware accelerated sound, and tons of other goodies. Maybe it's out of reach of the chipsets currently out there, but I think it's just a matter of someone sitting down and putting all the pieces together.

Re:Entropy-licious by Nihilanth · 2001-07-31 02:32 · Score: 1

I forget where I saw this website (it was probably linked to from Memepool.com), but there's a website out there where you can custom order a strand of DNA, they line up all the proteins for you and ship you the completed DNA segment. All you have to send them is the data.

Every useful technology was spoofable long before it became useful

Re:Another interesting point. by Nihilanth · 2001-07-31 02:54 · Score: 1

This has already been addressed below, there are "perjury laws" that deal with these contingencies.

The real danger, then, is from these technologies being utilized in a non-controlled environment, like forging a "phone call from the boss", or an emergency call over a radio frequency (using the dispatcher's voice). The social engineering applications are limitless, thus precipitating a need for more reliable identity verification for voice communication.

Wouldn't it be neat if ALL of the phone systems everywhere worked on voice-over-IP?

Re:Entropy-licious by Nihilanth · 2001-07-31 03:29 · Score: 1

Although "this thing" may not be THAT good, the technology is progressing, and is limited only by our imaginations. Maybe it's not that good -yet-, but it will be -someday-, and this is just a step in that direction. As computing power doubles and redoubles, we come closer to being able to inexpensively model a unique set of vocal cords and lungs, which would allow someone to accurately reproduce a voiceprint. I'm not sure how it's calculated either, but from what I've seen, it just looks like a spectral analysis of the sound. You could conceivably alter one voice to sound like another in post-processing with something like a more powerful version of Cool Wav Edit by editing the sound from the spectral graph level. Of course, this would take a large amount of expertise.

When on-the-fly voice synth reaches that level, though, the ability to do this will be closer to the hands of consumers, instead of just people with access to supercomputers.

Entropy-licious by Nihilanth · 2001-07-31 00:40 · Score: 5

Well kids, say goodbye to phone taps, voice mail, and important business being conducted over the phone. If this technology really accomplishes what the above says, voice recordings wouldn't be able to hold up in court because..well..it would be difficult/impossible to prove that they were really recordings of the person's voice.

Of course, I don't think this kind of technology should be "outlawed" or "restricted", that will only make it easier to be used maliciously, as with any technological advancement.

Another point of interest is that with the new Final Fantasy: The Spirits Within movie, actors are beginning to consider copyrighting their likenesses, since they can be reproduced on a computer with frightening quality and clarity. Perhaps this applies to voice reproduction as well.

This sounds like a very beneficial technology, especially for games, where a high-quality voice synth could replace volumes of digitally recorded and compressed audio files..but it opens the door for some really frightening possibilities of fraud, social engineering, and copyright side-stepping.

Scary stuff by Phantom_User · 2001-07-31 01:11 · Score: 1

Ok, granted, this is probably the coolest thing I've heard of recently..the possible advances in game technology particularly. But it frightens me too. This software will encompass within its bounds massive power to create the illusion of the spoken word. That could be used for all kinds of neat things..James Earl Jones E-mail, computers with emotional voices... As the old saying goes "With great power comes great responsibility" and quite frankly, I don't feel people are up to the challenge of wielding such power. Humankind has evolved faster techwise than they have socially, making them very dangerous. It's like giving 5-year olds ballistic missiles equipped with thermonuclear warheads. My only hope is that we destroy ourselves before leaving our own solar system, sparing the rest of the civilized galaxy from going down with us.....

--
This message brought to you by the forces of the Necronomicon and The Lovecraft Company. Have a

LET THE PRANK CALLS BEGIN!! by Skallywag · 2001-07-31 02:41 · Score: 1

I'll be the first to have Cpt Kirk call some friends offering sexual favours!

No 40 demo limit by dkresge · 2001-07-31 03:58 · Score: 1

Their "customer care" line is also powered by TTS -- good example implementation of the product, but at $5,000 it won't be running in my house any time soon... (and sorry, there doesn't appear to be an option 7)

1-877-741-4321 (from the buy page)

Slashdot Mirror

Text to Speech Software Copies Any Human Voice

299 comments