This is kind of how the formant/parametric TTS engines work: they simulate the vocal tract to generate sounds on the fly. That approach has some advantages, but for whatever reason (I don't know enough to say), the concatenative TTS engines sound much more "natural" right now. Many of them are commercially available: Rhetorical Systems, Nuance, SpeechWorks (which was developed from AT&T, IIRC). If you are interested in an open-source TTS engine, check out Festival from the University of Edinburgh, or Flite (Festival-lite) from Carnegie Mellon. Flite is a Festival relative that is optimized for PDAs and multi-channel servers.
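For anyone curious what "simulating the vocal tract" looks like at its simplest, here is a toy source-filter sketch, not how any of the engines named above actually work: an impulse train stands in for the vocal cords, and a cascade of two-pole resonators (Klatt-style coefficients) stands in for the vocal tract. The formant frequencies are approximate textbook values for an "ah"-like vowel, and the bandwidths, pitch, and sample rate are assumed, illustrative numbers.

```python
import math

def resonator(signal, freq_hz, bw_hz, sample_rate):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    Klatt-style coefficients; a = 1 - b - c gives unity gain at 0 Hz.
    """
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1  # shift the two-sample output history
    return out

def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate=16000):
    """Impulse train (crude glottal source) through cascaded formant resonators."""
    n_samples = int(round(duration_s * sample_rate))
    period = int(sample_rate / f0_hz)
    signal = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to a peak of 1.0

# Approximate, illustrative (frequency, bandwidth) pairs in Hz for an "ah"-like vowel.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]

samples = synthesize_vowel(AH_FORMANTS)
print(len(samples))  # 4800 samples = 0.3 s at 16 kHz
```

Written to a WAV file this buzzes like a vowel but sounds nothing like a person, which is a decent hint at why the concatenative engines win on naturalness for now.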
Someday I think there might be a resurgence in the parametric TTS engines, but I guess the techniques need to advance first. I think of them as being more "pure".
Great post and discussion on one of my favorite subjects. I didn't find the demos here to be an amazing leap over AT&T Natural Voices or Rhetorical Systems, which I think use similar technology. I was a little surprised to find that this is "new" in Scientific American (maybe just 'cause it's not new to me), but I would take absolutely nothing away from the folks at IBM and elsewhere who are working on it.
I worked quite a bit on some telecom projects using RealSpeak from L&H (now defunct), which also uses a triphone concatenation technique, IIRC. It doesn't sound as good as this stuff, but it was a useful, shipping product.
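To make "triphone concatenation" a bit more concrete, here is a toy sketch of the idea, not RealSpeak's actual method (which I don't know in detail): each phoneme is looked up in a unit database by its (left, center, right) context, with a hypothetical fallback to a context-free unit, and the recorded snippets are glued with a short linear crossfade. Real engines use much smarter unit selection and pitch-synchronous joins; the database here is made-up numbers standing in for audio.

```python
def to_triphones(phones):
    """Expand a phoneme sequence into (left, center, right) context triples,
    padding both ends of the utterance with silence ("sil")."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]

def crossfade_concat(units, overlap):
    """Join waveform snippets, blending `overlap` samples at each seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit ramps up
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out

def synthesize(phones, unit_db, overlap=4):
    """Pick one recorded unit per phoneme, preferring an exact triphone match
    and falling back to a context-independent ("*", phone, "*") unit."""
    units = []
    for tri in to_triphones(phones):
        unit = unit_db.get(tri) or unit_db.get(("*", tri[1], "*"))
        if unit is None:
            raise KeyError(f"no unit recorded for {tri}")
        units.append(unit)
    return crossfade_concat(units, overlap)

# Toy "database": the numbers stand in for recorded audio samples.
UNIT_DB = {
    ("sil", "HH", "AY"): [0.1] * 10,
    ("*", "AY", "*"): [0.5] * 10,
}
wave = synthesize(["HH", "AY"], UNIT_DB)  # ARPAbet for "hi"
print(len(wave))  # 16: two 10-sample units sharing a 4-sample overlap
```

The fallback is where quality drops: the more often the engine has to use a unit recorded in the wrong context, the more audible the seams get.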
Yes, I think "phoneme" gets a bit fuzzy when you put it under the microscope, but so do other handy abstractions like "word" and "adjective".
ASR and TTS techniques in use now are pretty sophisticated and relatively successful, considering they only try to simulate the bottom of the stack of our (human) language machine. To simplify: since the TTS engine doesn't know what the words or sentence "mean", how can it know how to get the right intonation, emphasis, etc.? Ditto for ASR; the state of the art is to build a grammar or language model of some kind by hand for each step in a dialog. Effectively, the app developer must tell the recognizer exactly what words to listen for, in what order, and with what probability/preference.
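Here is a rough sketch of what "telling the recognizer what to listen for" amounts to for one dialog step. The banking prompt, the word lists, and the probabilities are all hypothetical, and real deployments write this in a grammar format the recognizer understands rather than a Python dict; the point is just that anything outside the hand-built paths scores zero.

```python
# Hypothetical grammar for one dialog step ("transfer <amount> to <account>").
# Each state lists the words the recognizer should listen for next, with the
# state they lead to and a hand-assigned probability.
GRAMMAR = {
    "start":   {"transfer": ("amount", 0.7), "move": ("amount", 0.3)},
    "amount":  {"ten": ("unit", 0.5), "fifty": ("unit", 0.5)},
    "unit":    {"dollars": ("to", 1.0)},
    "to":      {"to": ("account", 1.0)},
    "account": {"checking": ("end", 0.6), "savings": ("end", 0.4)},
}

def grammar_score(words, grammar=GRAMMAR, start="start", final="end"):
    """Probability the grammar assigns to a word sequence; 0.0 if out of grammar."""
    state, prob = start, 1.0
    for word in words:
        arcs = grammar.get(state, {})
        if word not in arcs:
            return 0.0  # the recognizer was never told to expect this word here
        state, p = arcs[word]
        prob *= p
    return prob if state == final else 0.0  # incomplete phrases don't count

for hyp in ["transfer fifty dollars to savings", "send fifty dollars to savings"]:
    print(hyp, "->", grammar_score(hyp.split()))
```

In a real recognizer, scores like these would be combined with acoustic scores to pick among competing hypotheses, so the caller who says "send" instead of "transfer" simply isn't heard correctly.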
So the clever stuff in current TTS engines isn't just how to glue the phonemes together, but how to generate the right intonation/prosody and emphasis, and how to choose pronunciations (the past tense of the verb "read" is pronounced differently from the present tense). These things can vary from one speaker or region to the next, just like accents, so it's hard to find the "rules". This is something that is maddening about computational linguistics... it seems like for every "rule" there is a phonebook full of fine print.
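A quick sketch of the "read" problem shows how fast the fine print shows up. The pronunciations are CMUdict-style ARPAbet strings; the cue-word lists and the look-only-at-the-previous-word rule are hypothetical, deliberately naive choices, not what any shipping engine does.

```python
# CMUdict-style ARPAbet pronunciations of the homograph "read".
READ_PRESENT = "R IY1 D"  # "I will read it"
READ_PAST = "R EH1 D"     # "I have read it"

# Hypothetical hand-written cues, looking only at the word before "read".
PAST_CUES = {"have", "has", "had", "was", "were", "been"}
PRESENT_CUES = {"will", "to", "can", "should", "must", "please"}

def pronounce_read(tokens, i):
    """Choose a pronunciation for tokens[i] == "read" from its left neighbor."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    if prev in PAST_CUES:
        return READ_PAST
    if prev in PRESENT_CUES:
        return READ_PRESENT
    # Fallback guess. This is where the fine print bites: "I read it yesterday"
    # needs the past form, but the only cue comes after the verb.
    return READ_PRESENT

for sentence in ["I have read it", "I will read it", "I read it yesterday"]:
    tokens = sentence.split()
    print(sentence, "->", pronounce_read(tokens, tokens.index("read")))
```

The third sentence comes out wrong, and patching it means looking further right, then handling "I read it every day", and so on, which is exactly the phonebook of exceptions.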
You do know the mining biz, but it's the terrestrial mining biz. Think outside the box. How about mining for O2? That would be pointless on Earth but very handy on the moon.
I don't know if I'd agree that it's OK to completely wreck the moon. But as far as we know, there are no spotted owls up there to drive extinct. What we wouldn't want to do is disturb her orbit or her structural integrity enough to threaten Earth. This scenario was briefly treated in the recent Time Machine movie (the Guy Pearce one).
I worked on some moon mining concepts in my university days. One of the reasons to do it is to support colonies or base operations on the moon or Mars without having to truck staples like oxygen up from Earth, which is incredibly expensive. From both an eco standpoint and a cost standpoint, I think you can argue it's FAR nicer to get O2 from lunar soil than to build and burn the very large rocket it would take to deliver even a small amount from home. That is, until FedEx gets into the space business and figures out how to push the cost down ;-)
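To put a rough number on "the very large rocket": the Tsiolkovsky rocket equation, delta-v = Isp * g0 * ln(m0 / mf), says the ratio of liftoff mass to final mass grows exponentially with the velocity change you need. The inputs below are assumed round figures for illustration, not mission data: about 15 km/s from Earth's surface to the lunar surface, an Isp of 450 s (roughly a hydrogen/oxygen engine in vacuum), and an idealized single stage.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2

def mass_ratio(delta_v, isp_s):
    """Tsiolkovsky rocket equation solved for m0/mf:
    delta_v = isp * g0 * ln(m0 / mf)  =>  m0 / mf = exp(delta_v / (isp * g0))."""
    return math.exp(delta_v / (isp_s * G0))

# Assumed, illustrative inputs (not mission data): ~15 km/s total delta-v,
# Isp of 450 s, one idealized stage with no losses.
ratio = mass_ratio(15_000, 450)
print(f"liftoff mass per kg arriving on the moon: {ratio:.1f} kg")
```

That comes out on the order of 30 kg at liftoff for every kg that arrives, and the arriving mass still includes tanks and engines, so the oxygen itself is an even smaller slice.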