Text-to-Speech on a Low-Power Chip

Posted by michael on Wednesday November 7, 2001 @08:49AM from the youve-got-voicemail dept.

bluephone writes: "The EE Times has a story on a new chip from Winbond that can take ASCII or Unicode text and convert it to either spoken English or Mandarin (the Chinese language, not the orange). The low-power chip scans the text, translates it into spoken phonemes, and either outputs them to a filter for smooth analog sound or outputs the digital signal directly. Imagine a cell phone with this: you could have your email read to you rather than seeing it a line at a time on a dinky screen, or hear street directions from a website, or even Slashdot's headlines. :)"

wouldn't it be easier.... by CodePoet82 · 2001-11-07 08:54 · Score: 5, Informative

As the writeup said, this could be used in a cellphone to read out what you were looking at, but wouldn't it be simpler, and backwards compatible, to just do the text-to-speech synthesis on a remote computer? Every cell phone out there can already play sound transmitted from a remote location, and it wouldn't require any new/expensive chips.
Re:Nothing new by morcheeba · 2001-11-07 09:05 · Score: 5, Informative

No, those chips (it was a pair) were power-hungry 5 volt parts made by General Instrument. One was a microcontroller (8051?) with the text-to-phoneme algorithm, and the other was the phoneme-to-audio processor (GI SP0256). Actually, the SP0256 could accept external ROMs for specialized words, so it could have spoken in any language you wanted.

Check out quadravox for boards that emulate the SP0256, using ISD's analog flash memory and a microcontroller.

(My misadventure with the old GI chip: -12 instead of +5, just for a split second. After that, it developed a stutter!)

--
HIV Crosses Species Barrier... into Muppets
Re:Dr. Sbaitso by generic-man · 2001-11-07 09:11 · Score: 3, Informative

All you had to do was ask. Sound Blaster Acting Intelligent Text-to-Speech Operator.

--
For more information, click here.
Re:Mac OS has that by cduffy · 2001-11-07 09:23 · Score: 3, Informative

Actually, on a more serious note, is there anyone working on an open source speech synthesis project?

Yup; it's called Festival.
Re:Mac OS has that by Chakat · 2001-11-07 09:23 · Score: 2, Informative

Yep, it's called Festival, and the results are pretty decent. Became free as in speech a couple minor versions back, too.

--
If god had intended you to be naked, you would have been born that way.
Re:I bet it will choke... by thelexx · 2001-11-07 09:32 · Score: 2, Informative

Which is why the other language is Chinese. I remember hearing years ago that Chinese is very well suited for voice recognition due to the fact that it is a tonal language with a total set of only a few hundred distinct sounds. Not sure if this is true just for Mandarin or also Cantonese and the others.

LEXX

--
"Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
Winbond's Whitepaper by jdclucidly · 2001-11-07 09:33 · Score: 2, Informative

I haven't seen anyone post a link to Winbond's own web page on the WTS701 Text-to-Speech Processor, so here it is, straight from the mouth:

Winbond
Re:Why dump more tech than necessary into the phon by Kraft · 2001-11-07 09:57 · Score: 3, Informative

Wow! Your Oracle admin is blind? *I'm baffled*

I have never worked with blind people, but after reading an article last year about how websites are getting more and more difficult for braille browsers (Flash, image links without alt tags, etc.), I decided to make a lynx-friendly version of my site - and so should YOU!

Anyways, how does he do it?? Is it worth it to the company you work for, or does it cause everyone else problems? Is he good? Tell! Hopefully this could encourage others to take on "disabled" people in their company....

--

-Kraft
Live and let live
Re:What Happened to TI by CodeShark · 2001-11-07 10:59 · Score: 2, Informative

Yes, the programmer had to translate the text into phonemes, but you know what? That wasn't the hard part, because with a decent translation dictionary, you can get about 95% of the words right with a simple one-pass "phoneme compiler" and a good set of rules. The hard part is that English is a highly inflected language without a constant set of rules for doing the inflections.
Still, back in 1981 I was part of the team that made the first Apple II (at least in the state I lived in at the time) that could read aloud from the screen, through an "Echo II Speech Synthesizer" which IIRC came from Radio Shack.
We took some of our stuff to the linguistics department at the University across town, and of all things, had the darn machine speaking understandable Japanese (from Romaji, or romanized letters) within a few days because the Japanese language is consistent not only in phonetic translation but also in inflection. It still sounded like a machine, but that was a limitation of the sound chip's internal phoneme library in the Echo II. The same program with one of today's chips would have sounded very near normal.
Goes to show you how much more difficult spoken English is than most of us native speakers tend to realize, because I have yet to see a low-cost implementation of a text-to-speech translator that was all that much better than what we were doing back in '81. (Not that I have seen everything out there, by the way -- I do have a life outside the PC world....occasionally :-)
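For anyone wondering what a one-pass "phoneme compiler" with a dictionary and a rule set might look like, here is a minimal Python sketch. It is not the 1981 Apple II code: the lexicon entries, the rule table, the ARPAbet-style phoneme symbols and the function names are all made up for illustration. Words found in the dictionary are looked up directly, and everything else falls back to longest-first letter-to-sound rules.

```python
# Hypothetical mini-lexicon: word -> phonemes (ARPAbet-style symbols,
# chosen only for illustration).
LEXICON = {
    "the": ["DH", "AH"],
    "chip": ["CH", "IH", "P"],
    "reads": ["R", "IY", "D", "Z"],
}

# Hypothetical letter-to-sound rules. Digraphs come first so that they
# win over the single-letter rules that follow.
RULES = [
    ("sh", ["SH"]), ("ch", ["CH"]), ("th", ["TH"]), ("ee", ["IY"]),
    ("a", ["AE"]), ("b", ["B"]), ("c", ["K"]), ("d", ["D"]),
    ("e", ["EH"]), ("f", ["F"]), ("g", ["G"]), ("h", ["HH"]),
    ("i", ["IH"]), ("j", ["JH"]), ("k", ["K"]), ("l", ["L"]),
    ("m", ["M"]), ("n", ["N"]), ("o", ["AA"]), ("p", ["P"]),
    ("q", ["K"]), ("r", ["R"]), ("s", ["S"]), ("t", ["T"]),
    ("u", ["AH"]), ("v", ["V"]), ("w", ["W"]), ("x", ["K", "S"]),
    ("y", ["Y"]), ("z", ["Z"]),
]


def letters_to_phonemes(word):
    """Fallback: scan the word once, applying the first matching rule."""
    phonemes = []
    i = 0
    while i < len(word):
        for pattern, output in RULES:
            if word.startswith(pattern, i):
                phonemes.extend(output)
                i += len(pattern)
                break
        else:
            i += 1  # no rule for this character (digit, hyphen...): skip it
    return phonemes


def text_to_phonemes(text):
    """Dictionary lookup first, rules only for words the lexicon lacks."""
    result = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if not word:
            continue
        result.extend(LEXICON.get(word) or letters_to_phonemes(word))
    return result


print(text_to_phonemes("The chip reads."))
# ['DH', 'AH', 'CH', 'IH', 'P', 'R', 'IY', 'D', 'Z']
```

The naive rules are exactly where English bites you: "reads" only comes out right because it is in the lexicon, while the fallback rules would spell it out letter by letter.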

--
...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
The State of the Art... by Sam+Lowry · 2001-11-07 23:44 · Score: 2, Informative

Specialized chips for TTS applications have been around for a while... The problem with their acceptance is that they have poor voice quality. Actually, there are two quite different technologies available to produce speech nowadays:
1. Diphone synthesis and its variations. The idea is to have one sample of each sound combination (diphone) in a speechbase and produce the actual speech by manipulating those sounds. This is what gives the computer-synthesized, somewhat metallic speech that most people have already heard somewhere, and this is what is actually used in low-powered devices, handhelds and speaking dictionaries.

2. Corpus-based synthesis. The idea is to store a few hours of speech from a highly trained speaker in the speechbase and select the fragments of this speech that best suit the generation.
The second approach gives astonishing results, with the quality of the speech sometimes being indistinguishable from a human's. However, the size of the speechbase is an issue. You cannot fit a 300 MB speechbase onto a handheld device yet, and hardware optimizations don't help much when it comes to fetching data from the speechbase and performing text-to-phoneme conversion.
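To make the two approaches a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the function names, the "_" silence marker, and the 16 kHz / 16-bit mono recording format are my own choices, not anything taken from a particular product. The first function shows how a phoneme string breaks down into the diphones a diphone synthesizer would look up, and the second estimates how big a few hours of uncompressed speech gets.

```python
def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping pairs (diphones).

    "_" is a hypothetical marker for the silence before and after the word.
    """
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def raw_audio_megabytes(hours, sample_rate_hz=16_000, bytes_per_sample=2):
    """Size of uncompressed mono audio, in megabytes (10**6 bytes).

    The default format (16 kHz, 16-bit) is an assumption for illustration.
    """
    return hours * 3600 * sample_rate_hz * bytes_per_sample / 1_000_000


print(to_diphones(["K", "AE", "T"]))
# ['_-K', 'K-AE', 'AE-T', 'T-_']

print(raw_audio_megabytes(3))
```

With an inventory of about 40 phonemes there are at most 40 * 40 = 1600 possible diphones, each a fraction of a second long, which is why that approach fits on small devices. Under the assumed recording format, three hours of raw speech already lands in the same ballpark as the 300 MB figure.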

Several companies have corpus-based synthesis demos online. Check out SpeechWorks' and Lernout & Hauspie's sites.