State of Speech Synthesis and Text-To-Speech?

Posted by Cliff on Thursday November 14, 2002 @12:33PM from the my-computer-still-doesn't-talk-to-me dept.

Gnulix asks: "Are there any products, preferably open source, available that produce realistic speech from arbitrary (English) text? Projects such as Festival don't sound all that much better than SAM (Software Automatic Mouth) did on a Commodore 64 back in 1979, nor do SoftVoice's or IBM's new products sound very good. I mean, we all know that Stephen Hawking is a fun-loving guy, but I bet you that he didn't choose his unrealistic, robotic voice just for the heck of it. With all the amazing advances we have seen in real-time graphics, shouldn't speech synthesis have come much, much further than what is, seemingly, available today?" Ask Slashdot last handled the Voice-To-Text issue in January of this year.

2 of 52 comments

Hawking... by 3-State+Bit · 2002-11-14 12:46 · Score: 5, Interesting

Actually, I heard that they offered Hawking a revamped speech synthesizer, since although his was state-of-the-art in the seventies, today we have much better ones. He declined, saying he and his friends had gotten used to the voice, and it was "his". In fact, whenever one hears that particular flavor of voice synthesis, it's difficult not to think of Hawking.

He does relate, however, in A Brief History of Time, that at first people had trouble understanding "his voice", so that when he would speak or answer questions at lectures, he would have an interpreter who was more familiar with his voice repeat what he just said.

Interesting stuff...

The larger issue is NLP by RobotWisdom · 2002-11-14 12:59 · Score: 5, Interesting

Modulating intonation is part of the larger challenge of natural-language processing (NLP, a subdiscipline of AI). We simply don't have the sort of general theory of language production that could systematically predict how the intonations should fall, any more than we have a theory of translation that can do substantially better than Babelfish.

Nor, to harp on my pet peeve, do we have a theory of semantics that can put XML to any important use on the average webpage. These all need a model of the human psyche, because all human language is flavored with metaphors from the psychological realm (motives, plans, etc.). Psychological science isn't delivering the sorts of models that NLP and related fields need, and probably won't for many decades yet. [My AI FAQ]