Slashdot Mirror


Text-to-Speech on a Low-Power Chip

bluephone writes: "The EE Times has a story on a new chip from Winbond that can take ASCII or Unicode text and convert it to either spoken English or Mandarin (the Chinese language, not the orange). The low-power chip scans the text, translates it into spoken phonemes, and sends the result to a filter for smooth analog sound, or it can output the digital signal directly. Imagine a cell phone with this: you could have your email read to you rather than seeing a line at a time on a dinky screen, or hear street directions from a website, or even Slashdot's headlines. :)"

10 of 263 comments

  1. wouldn't it be easier.... by CodePoet82 · · Score: 5, Informative

    As the writeup said, this could be used in a cell phone to read what you were looking at, but wouldn't it be simpler, and backwards compatible, to just do text-to-speech synthesis on a remote computer? Every cell phone out there can already transmit sound from a remote location, and it wouldn't require any new or expensive chips.

  2. Re:Nothing new by morcheeba · · Score: 5, Informative

    No, those chips (it was a pair) were power-hungry 5-volt parts made by General Instrument. One was a microcontroller (8051?) with the text-to-phoneme algorithm, and the other was the phoneme-to-audio processor (GI SP0256). Actually, the SP0256 could accept external ROMs for specialized words, so it could have spoken any language you wanted.

    Check out Quadravox for boards that emulate the SP0256, using ISD's analog flash memory and a microcontroller.

    (My misadventure with the old GI chip: -12 V instead of +5 V, just for a split second. After that, it developed a stutter!)

  3. Re:Dr. Sbaitso by generic-man · · Score: 3, Informative

    All you had to do was ask. Sound Blaster Acting Intelligent Text-to-Speech Operator.

  4. Re:Mac OS has that by cduffy · · Score: 3, Informative

    Actually, on a more serious note, is there anyone working on an open source speech synthesis project?

    Yup; it's called Festival.

  5. Re:Mac OS has that by Chakat · · Score: 2, Informative

    Yep, it's called Festival, and the results are pretty decent. It became free as in speech a couple of minor versions back, too.

    --

    If god had intended you to be naked, you would have been born that way.

  6. Re:I bet it will choke... by thelexx · · Score: 2, Informative

    Which is why the other language is Chinese. I remember hearing years ago that Chinese is very well suited for voice recognition due to the fact that it is a tonal language with a total set of only a few hundred distinct sounds. Not sure if this is true just for Mandarin or also Cantonese and the others.
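    The small inventory is what a syllable-based design would lean on: store one recorded unit per syllable-and-tone pair and split the input into those units. The sketch below assumes the input is already numbered pinyin (tone digits 1-4, with 5 for the neutral tone); converting Chinese characters to pinyin would be a separate dictionary step not shown here, and the function names are hypothetical.

```python
import re

# Assumed input convention: numbered pinyin, tone digit 1-4, 5 = neutral tone.
SYLLABLE = re.compile(r"([a-zü]+?)([1-5])")


def parse_pinyin(text):
    """Split numbered pinyin such as 'ni3 hao3' into (syllable, tone) pairs."""
    units = []
    for word in text.lower().split():
        pos = 0
        while pos < len(word):
            m = SYLLABLE.match(word, pos)
            if m is None:
                raise ValueError(f"cannot parse pinyin at {word[pos:]!r}")
            units.append((m.group(1), int(m.group(2))))
            pos = m.end()
    return units


def units_needed(text):
    """Distinct recorded units (one per syllable+tone) this text would use."""
    return {f"{syl}{tone}" for syl, tone in parse_pinyin(text)}
```

    For example, "Zhong1guo2" splits into ("zhong", 1) and ("guo", 2); repeated syllables reuse the same stored unit.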

    LEXX

    --
    "Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
  7. Winbond's Whitepaper by jdclucidly · · Score: 2, Informative

    I haven't seen anyone post a link to Winbond's own web page on the WTS701 Text-to-Speech Processor, so here it is, straight from the horse's mouth:

    Winbond

  8. Re:Why dump more tech than necessary into the phon by Kraft · · Score: 3, Informative

    Wow! Your Oracle admin is blind? *I'm baffled*

    I have never worked with blind people, but after reading an article last year about how websites are getting more and more difficult for braille browsers (Flash, image links without alt tags, etc.), I decided to make a Lynx-friendly version of my site, and so should YOU!
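    One of the problems mentioned, images without alt text, is easy to check for mechanically. A minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and an empty alt="" is treated as deliberate (a decorative image), not as missing.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; <img .../> also lands
        # here, because the default handle_startendtag calls handle_starttag.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "<no src>"))


def find_images_without_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```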

    Anyway, how does he do it? Is it worth it to the company you work for, or does it cause problems for everyone else? Is he good? Tell! Hopefully this could encourage others to take on "disabled" people in their company....

    --

    -Kraft
    Live and let live
  9. Re:What Happened to TI by CodeShark · · Score: 2, Informative
    Yes, the programmer had to translate the text into phonemes, but you know what? That wasn't the hard part, because with a decent translation dictionary, you can get about 95% of the words right with a simple one pass "phoneme compiler" and a good set of rules. The hard part is that English is a highly inflected language without a constant set of rules for doing the inflections.
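    The "dictionary plus rules, one pass" structure described above can be sketched as follows. This is not the Echo II's rule set or the poster's program: the exception words, the ARPAbet-style symbols, and the letter rules are toy assumptions, and the sketch deliberately ignores stress and intonation, which is the part the poster calls hard.

```python
import re

# Hypothetical exception dictionary: irregular words the rules would get wrong.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "said": ["S", "EH", "D"],
}

# Toy multi-letter rules (ARPAbet-style symbols), tried longest first.
MULTI = {
    "tion": ["SH", "AH", "N"],
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ck": ["K"],
    "ee": ["IY"], "oo": ["UW"],
}
PATTERNS = sorted(MULTI, key=len, reverse=True)

# Toy single-letter fallbacks; characters with no rule are skipped.
SINGLE = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}


def word_to_phonemes(word):
    word = word.lower()
    if word in EXCEPTIONS:                  # dictionary lookup first
        return list(EXCEPTIONS[word])
    phonemes, i = [], 0
    while i < len(word):                    # single left-to-right pass
        for pattern in PATTERNS:
            if word.startswith(pattern, i):
                phonemes.extend(MULTI[pattern])
                i += len(pattern)
                break
        else:
            phonemes.extend(SINGLE.get(word[i], []))
            i += 1
    return phonemes


def text_to_phonemes(text):
    return [word_to_phonemes(w) for w in re.findall(r"[A-Za-z]+", text)]
```

    For instance, "station" comes out as S T AE SH AH N because the longer "tion" rule wins over the single "t" at that position.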

    Still, back in 1981 I was part of the team that made the first Apple II (at least in the state I lived in at the time) that could read from the screen, through an "Echo II Speech Synthesizer," which IIRC came from Radio Shack.

    We took some of our stuff to the linguistics department at the university across town and, of all things, had the darn machine speaking understandable Japanese (from Romaji, or romanized letters) within a few days, because the Japanese language is consistent not only in phonetic translation but also in inflection. It still sounded like a machine, but that was a limitation of the sound chip's internal phoneme library in the Echo II. The same program with one of today's chips would have sounded close to normal.
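    That consistency means romanized Japanese can be cut into morae (kana-sized sound units) with a few mechanical rules, each mora then mapping to a fixed sound. A minimal sketch, assuming plain Hepburn-style input in ASCII letters and spaces only (no macrons or apostrophes); it is an illustration, not the program described above.

```python
VOWELS = set("aeiou")


def romaji_to_morae(text):
    """Split Hepburn-style romaji (ASCII letters and spaces only) into morae."""
    morae = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            ch = word[i]
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if ch in VOWELS:
                morae.append(ch)                 # bare vowel: a, i, u, e, o
                i += 1
            elif ch == "n" and nxt not in VOWELS and nxt != "y":
                morae.append("N")                # moraic n
                i += 1
            elif ch == nxt:
                morae.append("Q")                # doubled consonant (small tsu)
                i += 1
            else:
                j = i
                while j < len(word) and word[j] not in VOWELS:
                    j += 1                       # consonant cluster: k, sh, ky, ts...
                if j == len(word):
                    raise ValueError(f"no vowel after {word[i:]!r}")
                morae.append(word[i:j + 1])      # cluster plus its vowel
                i = j + 1
    return morae
```

    For example, "konnichiwa" yields ko, N, ni, chi, wa: five units, matching the five kana it is written with.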

    Goes to show you how much more difficult spoken English is than most of us native speakers tend to realize, because I have yet to see a low cost implementation of a text to speech translator that was all that much better than what we were doing back in '81. (not that I have seen everything out there by the way -- I do have a life outside the PC world....occasionally :-)

    --
    ...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
  10. The State of the Art... by Sam+Lowry · · Score: 2, Informative

    Specialized chips for TTS applications have been around for a while... The problem with their acceptance is that they have poor voice quality. Actually, there are two quite different technologies available to produce speech nowadays:

    1. Diphone synthesis and its variations. The idea is to have one sample of each sound combination (diphone) in a speechbase and produce the actual speech by manipulating those sounds. This is what gives the computer-synthesized, somewhat metallic speech that most people have already heard somewhere, and it is what is actually used in low-powered devices, handhelds, and speaking dictionaries.

    2. Corpus-based synthesis. The idea is to store a few hours of speech from a highly trained speaker in the speechbase and select the fragments of this speech that best suit what is being generated. The second approach gives astonishing results, with the speech sometimes indistinguishable from a human's. However, the size of the speechbase is an issue. You cannot fit a 300 MB speechbase onto a handheld device yet, and hardware optimizations don't help much when it comes to fetching data from the speechbase and performing text-to-phoneme conversion.
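    The diphone approach in item 1 can be sketched in two steps: turn a phoneme string into diphone names (padding with silence at both ends), then splice one stored sample per diphone. The inventory format, the "_" silence symbol, and the linear crossfade at each seam are assumptions for illustration; real diphone systems also modify pitch and duration, which is not shown.

```python
def to_diphones(phonemes):
    """Pad with silence ('_') and pair each phoneme with its successor."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(units, overlap):
    """Join sample lists, linearly crossfading `overlap` samples at each seam."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)               # incoming unit's weight rises toward 1
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out


def synthesize(phonemes, inventory, overlap=2):
    """Look up one recorded unit per diphone and splice them together."""
    units = [inventory[d] for d in to_diphones(phonemes)]
    return crossfade_concat(units, overlap)
```

    A word like "hello" (h e l o) needs five units: _-h, h-e, e-l, l-o, o-_. Because every transition is stored only once, the inventory stays small enough for a handheld, at the cost of the audible seams described above.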

    Several companies have corpus-based synthesis demos online. Check out SpeechWorks' and Lernout & Hauspie's sites.
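    The "select fragments that suit best" step of corpus-based synthesis is commonly framed as a Viterbi search: each target sound has several candidate fragments, a target cost scores how well a fragment fits, and a join cost scores how well two fragments splice together. A minimal sketch with caller-supplied cost functions; the costs in the usage note are toy assumptions, not any vendor's method.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi search: pick one candidate per target, minimising the summed
    target costs plus the join costs between consecutive choices."""
    if not targets:
        return []
    # best[i][j] = (cheapest cost of a path ending at candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):   # follow backpointers
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]
```

    With fragments described as (phone, pitch, position in the recording), a pitch-difference target cost, and a join cost of zero for fragments that were adjacent in the original recording, the search prefers a slightly off-pitch but contiguous pair over two exact-pitch fragments that must be spliced. The candidate lists are also where a large speechbase hurts: the work per step grows with the product of adjacent candidate counts.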