Slashdot Mirror


Phoneme Approach For Text-to-Speech in SCIAM

jscribner writes "Scientific American is running a feature on IBM Research's Text-to-Speech technology. It discusses the current state of affairs in this field, and describes IBM's phoneme based 'Supervoices' approach. The IBM site provides a demonstration, allowing users to enter text to be rendered to speech, as well as providing several examples in other languages."

10 of 189 comments

  1. Phonemes not phenomes by Tucan · · Score: 4, Informative

    Phonemes are the building blocks of language, not phenomes.

  2. I was expecting better... by LeoDV · · Score: 5, Informative

    If memory serves, it was AT&T (?) that used to have a similar webpage with near-perfect text-to-speech, which is hardly the case with this project.

    What's so special about it?

    1. Re:I was expecting better... by Rubyflame · · Score: 5, Informative

      Used to? Still does! It's called "AT&T Natural Voices," and there's an online demo.

      --

      All it takes is nukes and nerves.
  3. PHONEME, y'all, not *phenome by texchanchan · · Score: 3, Informative

    Phoneme, a unit of sound in a word. From Dictionary.com: "The smallest phonetic unit in a language that is capable of conveying a distinction in meaning, as the m of mat and the b of bat in English. [... from Greek phōnēma, phōnēmat-, utterance, sound produced, from phōnein, to produce a sound, from phōnē, sound, voice...]"

    Related to "telephone," "phonics," etc.

  4. AT&T have been doing this for a while! by Anonymous Coward · · Score: 5, Informative

    If you visit here:
    http://www.naturalvoices.att.com/demos/

    You'll find AT&T's version a whole lot better. The main problem with voice synthesis is smoothing of phoneme edges: if it is done too aggressively, the synthesized speech can sound too "lumpy".

    The other thing is, speech synthesis via phonemes is very basic practice indeed! I remember having a Currah Speech module for my ZX Spectrum (1982 home computer), and the first thing you were taught about was phonemes. I'm not entirely sure what's new about this IBM product. It has basically not evolved much since the mid-'90s.
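
    The edge smoothing mentioned above can be sketched as a simple linear crossfade between two recorded units. This is only an illustration, not how AT&T's or IBM's systems are known to work; the function name `crossfade_join` and the plain-list sample format are assumptions made for the example.

```python
def crossfade_join(left, right, overlap):
    """Join two lists of audio samples, linearly blending the last
    `overlap` samples of `left` with the first `overlap` samples of `right`.

    Hypothetical helper for illustration only: real synthesizers use more
    elaborate smoothing than a linear blend.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    if overlap == 0:
        # No smoothing at all: a hard cut at the unit boundary.
        return list(left) + list(right)

    head = list(left[:len(left) - overlap])   # untouched part of the left unit
    tail = list(right[overlap:])              # untouched part of the right unit

    blended = []
    for i in range(overlap):
        # Weight of the right unit rises steadily across the overlap region.
        w = (i + 1) / (overlap + 1)
        l_sample = left[len(left) - overlap + i]
        blended.append((1.0 - w) * l_sample + w * right[i])

    return head + blended + tail


if __name__ == "__main__":
    # A unit ending at amplitude 1.0 joined to one starting at 0.0.
    print(crossfade_join([1.0] * 4, [0.0] * 4, 3))
    # prints [1.0, 0.75, 0.5, 0.25, 0.0]
```

    The `overlap` parameter is the knob that controls how aggressive the smoothing is: 0 gives a hard cut at the boundary, while larger values blend more of each unit into its neighbour.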

  5. Open Source Speech Synthesis by wzrd2002 · · Score: 5, Informative

    There is already a freely available open source speech synthesis application for both Linux and Windows, called Festival, created by the University of Edinburgh.

    1. Re:Open Source Speech Synthesis by WWWWolf · · Score: 3, Informative

      Festival is great, especially with the OGI patches. I was completely blown away by Festival's quality compared to other open source TTS engines, and the OGI stuff makes stock Festival sound pathetic. Really great stuff. Regrettably it's still not as good as IBM's or AT&T's, but they have got a TTS that I can listen to for hours without making my ears bleed.

      Regrettably OGI patches are for personal/research use only, so Debian won't ship them...

  6. comparison to Apple's technology? by inblosam · · Score: 4, Informative

    I run Mac OS X and in a lot of applications you have the option for the computer to read an entire document. For example, in TextEdit (a simple text editor by Apple) you can go to Edit, Speech, Start Speaking...in the menu and it will read everything for you. There are 10-15 different default voices to choose from, and built into the OS you can control pretty much everything by speech and get information by voice.

    How does this compare? I think it is at least at the same level, if not further along! Good work Apple for being in the game, if not ahead of the game on this one.

  7. And don't forget Bell Labs by rpiquepa · · Score: 4, Informative

    IBM is not alone in working on text-to-speech technology or in having demos where you can type a phrase and listen to it. The Bell Labs Text-to-Speech system (TTS) has its own page featuring fun demos. "You can play with our basic interface for some of our Text-to-Speech systems: American English, German, Mandarin Chinese, Spanish, French, Italian and Canadian French." This page is pretty old (it makes references to Netscape 3!!), but the demos still run fine.

  8. State of the art in TTS by Sam+Lowry · · Score: 4, Informative
    There are basically two TTS technologies on the market:
    • diphone-based synthesis, where the database contains one diphone (end of the first sound + start of the next sound) for each possible sound combination. This approach is used in Festival. Diphone-based synthesis will hardly sound better than it does in Festival, because diphones have to be modified artificially to fit every variation of pitch, duration, and any other parameter needed to produce a given phrase.
    • corpus-based synthesis, which takes a different approach: a large database of several hours of speech is recorded and manually labelled to mark the start and end of each sound. Such a database is used to extract the best and longest sequence of diphones during production. This approach gives natural-sounding results for short sentences where intonation is not so important. Given that the cost of developing a database for corpus synthesis may be orders of magnitude higher than for diphone synthesis, there are very few companies that make them. Two companies offer a demo on the internet: AT&T and Scansoft (formerly L&H) and