Text-to-Speech on a Low-Power Chip
bluephone writes: "The EE Times has a story on a new chip from Winbond that can take ASCII or UNICODE text and convert it to either spoken English or Mandarin (the Chinese language, not the orange). The low-power chip scans the text, translates it into spoken phenomes, and outputs them to a filter for smooth analog sound, or it can output the digital signal directly. Imagine a cell phone with this: you could have your email read to you rather than seeing a line at a time on a dinky screen, or hear street directions from a website, or even Slashdot's headlines. :)"
I have a little bot written in Perl and VXML that reads my email. It is far easier than making the phone do the processing, and it's free. See studio.tellme.com.
Man, if they put SimpleText (Apple's text/scripting/voice filter program) on one of those things, no one will get their work done!!!
I am the Alpha and the Omega-3
Cell phones, PDAs, perhaps new tools for people with vision disabilities, where it could pick up plain text via IR near busy intersections or information kiosks. Text is small, so broadband wouldn't be required, since it's all converted in real time on a chip. Since it is supposed to be low-powered, it would be great for devices that don't need to be recharged often, like the pagers mentioned in the article.
I wonder how lifelike the voice is, though. I don't think any text-to-speech tools are going to become very mainstream until they sound better.
As the writeup said, this could be used in a cellphone to read what you were looking at, but wouldn't it be simpler, and backwards compatible, to just do text-to-speech synthesis on a remote computer? Every cell phone out there can already transmit sound from a remote location, and it wouldn't require any new/expensive chips.
would be something we can all be impressed with.
That's all I need, Stephen Hawking's voice coming at me from my cell phone:
Anonymous cowards love the rich meaty taste of spam.
Could I set it to a deep, erotic female-sounding voice and have it read dirty stories to me?
no
Can you imagine millions of geeks' cell phones with this chip? "First Post" echoing throughout the world...
would be reading /. headlines. I mean, text-to-speech is great, but can it spell-check at the same time?
The cellphone may have all the power of an original Palm Pilot these days, but we don't need to make it into an Onyx server.
My Amiga was talking to me 15 years ago.
Actually, my Timex Sinclair 1000 was talking to me 20 years ago, but I think that was the acid...
Waltz, nymph, for quick jigs vex Bud.
The lead story read: "Unionized environmental health workers object to new chip that can read un-ionized lead levels."
Reading English is a lot tougher than most English-speaking people think.
-- MarkusQ
Any technology that can translate text to spoken words is a good thing, so that blind people can have less of a hard time with technology, which is mostly sight-driven. But of course, with my really bad spelling it could drive people nuts. (Yeah, yeah, Lern to spell and that will fix the problem.) But I'd always want the option to disable it, no matter how little processing power it uses. Speaking is generally slower than reading. Plus, there are times when your concentration doesn't need a computer speaking to you.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Does it sound like Kavita Maharaj?
Because I swear, sexy though it is, her voice is synthesized.
--Blair
Actually, on a more serious note, is there anyone working on an open source speech synthesis project?
But is it smart enough to pronounce the boldfaced word above as "phonemes"?
Never take moderation advice from sigs, including this one.
Didn't I see this first in "Wargames?"
Incidentally, a guy I work with has a father who designs for Chrysler. He said that the big D-C was "really interested" in applications of text to speech. Think about it: ebooks that read themselves to you while you drive, driving directions and traffic info read to you rather than displayed on a screen (most nav screens require you to take your eyes entirely off the road and down the dash as much as 18 inches...eep!). You've got a much more useful interface, and with a low-cost (though they'll charge you a grand, I'm sure), easy-to-interface chip, they'll have no excuse not to bring this much safer system for data interaction to my dash today, and not six years from now.
Hey freaks: now you're ju
No, those chips (it was a pair) were power-hungry 5-volt parts made by General Instrument. One was a microcontroller (8051?) with the text-to-phoneme algorithm, and the other was the phoneme-to-audio processor (GI SP0256). Actually, the SP0256 could accept external ROMs for specialized words, so it could have spoken in any language you wanted.
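For anyone curious what that two-chip split looks like, here's a rough Python sketch of the two stages. The letter-to-sound rules, the ARPAbet-style symbols and the numeric codes are all made up for illustration; they are not the microcontroller's actual algorithm or the SP0256's allophone set, and real rule sets have hundreds of context-sensitive rules.

```python
from __future__ import annotations

# Stage 1 rules: hypothetical letter-to-sound mappings (ARPAbet-style symbols).
# Two-letter rules are tried before one-letter rules at each position.
RULES = {
    "ch": ["CH"], "sh": ["SH"], "th": ["TH"], "ee": ["IY"],
    "a": ["AE"], "e": ["EH"], "i": ["IH"], "o": ["AA"], "u": ["AH"],
    "c": ["K"], "q": ["K"], "x": ["K", "S"],
}

# Stage 2 inventory: hypothetical code table the "synth chip" understands.
# Consonant letters without a rule fall back to a phoneme of the same name.
CONSONANTS = list("BDFGHJKLMNPRSTVWYZ")
INVENTORY = ["PAUSE", "AA", "AE", "AH", "CH", "EH", "IH", "IY", "SH", "TH"] + CONSONANTS
CODE_OF = {phoneme: code for code, phoneme in enumerate(INVENTORY)}


def text_to_phonemes(word: str) -> list[str]:
    """Stage 1: greedy longest-match letter-to-sound conversion of one word."""
    word = word.lower()
    phonemes: list[str] = []
    i = 0
    while i < len(word):
        for length in (2, 1):
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += length
                break
        else:
            # No rule matched: use the consonant letter itself as the phoneme.
            phonemes.append(word[i].upper())
            i += 1
    return phonemes


def text_to_codes(text: str) -> list[int]:
    """Stage 1 plus hand-off: turn ASCII text into the code stream for stage 2."""
    codes: list[int] = []
    for word in text.split():
        letters = "".join(c for c in word if c.isascii() and c.isalpha())
        if not letters:
            continue
        if codes:
            codes.append(CODE_OF["PAUSE"])  # short silence between words
        codes.extend(CODE_OF[p] for p in text_to_phonemes(letters))
    return codes
```

The second chip would then turn each code into sound; that part is pure signal generation and is left out here.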
Check out Quadravox for boards that emulate the SP0256 using ISD's analog flash memory and a microcontroller.
(My misadventure with the old GI chip: -12 V instead of +5 V, just for a split second. After that, it developed a stutter!)
HIV Crosses Species Barrier... into Muppets
Phone: "Are you looking for hot [chicks|sex|pussy|love]?"
Wife: "um... what was that, honey?"
Phone: "Get your University diploma!"
Wife: "What, I'm not good enough the way I am?"
Phone: "Get out of debt now!"
Wife: "Okay, you know what? That's your birthday present on the credit card, bucko. That's it. I'm leaving..."
I am the very model of a modern major general!
sb = Sound Blaster
ai = "artificial intelligence"
tso = ???
text-to-speech output?
Yeah, but you know the power consumption would be a hell of a lot higher on the chip that everybody would really want anyways: the Text-to-Barry-White-Speech chip.
"You've got mail, baby."
All you had to do was ask. Sound Blaster Acting Intelligent Text-to-Speech Operator.
For more information, click here.
"We are looking at devices that don't necessarily have a really powerful processor on board," said Hezi Saar, product marketing manager at Winbond. "Usually most of the accessories for handheld devices don't have the power to run text-to-speech algorithms and they don't have the huge memory capacity to support this feature."
OK, so just imagine that in the near future anything and everything will have one of these small, low cost chips. Now, imagine the possibilities! Everyone I'm sure has their own ideas on how cool this could be, so go ahead and reply with yours!
~ now you know
Well, I jumped ship to this little company I work for now called Talk2 Technology (free plug, I guess). We've taken a different tack in voice-enabling applications. I think there are different target markets -- the Talk2 stuff uses servers on the back-end, which go out and fetch your email to read it to you. Putting this on-chip in the cell phone itself is a great step in the right direction.
:)
Fundamentally it's a different approach than today's "voice portal" technology. Voice portals retrieve data for you and read it over a standard cell or PSTN network. There are many benefits to this approach, principal among them being the extra processing power available for additional functionality such as voice processing (speech to text, or compressing speech for a reply email voice attachment). By putting the power into the phone instead of at an expensive central office, this chip could either be a great advancement for text-to-speech technology or a "killer app" that puts my company out of business.
Regardless, I'm excited to see this happening. I've long envisioned a PDA with the only interface being spoken, rather than requiring any video component. This would bring the power consumption and delicacy of these devices down within reason for extended usage. The downside is that speech is necessarily a rather slow interface to a machine; it will be interesting to see how we adapt speech for greater speed with speech-based devices, and how English as a whole will fare.
Now that I've used voice-enabled email, it would be really hard to go back to the "old" way. I still do an enormous amount of correspondence every day by typing, but when I'm on the road I don't need to bother with a laptop since I can have my email read to me over the phone *and reply* with a voice message via email. Until you've used it, it's tough to realize how convenient it is.
I want one of these for my Agenda VR3! Or something...
Matthew P. Barnson
I learn what I think when I read what I write
> Actually, on a more serious note, is there anyone working on an open source speech synthesis project?
Yup; it's called Festival.
Yep, it's called Festival, and the results are pretty decent. Became free as in speech a couple minor versions back, too.
If god had intended you to be naked, you would have been born that way.
I remember my first computer, a TI-99/4A, had a box I plugged into the side that generated speech. It didn't sound all that good, but you could recognize it well enough. If I remember correctly, it cost about $100.
That was... 21 years ago. It's sad that this aspect of human-computer interaction has been overlooked for so long. It's nice to finally see some development.
http://www.masturbateforpeace.com/
Uh... who seriously would have their private e-mail read out loud on a bus?
"As your accountant I need to inform you that..",
"Here is your divorce settlement proposal..",
"This is your doctor. Test results came in. You have..",
etc..
In comparison, some X-rated junk mail might actually make some poor fellow's day...
If you feel that you have to state 'not the orange' when using the word Mandarin in a language context, perhaps you should also state 'not the people of England' when using the word English in the same context.
Another female voice calling my cell phone and telling me I'm off-topic...
I haven't seen anyone post a link to Winbond's own web page on the WTS701 Text-to-Speech Processor so, here it is straight from the mouth:
Winbond
The most important thing about the Internet is "bandwidth". I'm not talking bits on the wire, I'm talking how fast information flows into my brain. Speech is vastly slower than text as a medium for transferring information into my brain. I'm so accustomed to Internet speeds for information that I can no longer watch TV news: the bandwidth is too slow. I'm glad I don't go to school anymore. I could barely stand lectures when I was a kid; I would never be able to sit through them as an adult.
Five years ago everyone in Japan walked around with their phone to their ears. These days, everyone in Japan walks around looking at their phone (instant messaging, etc.). I'm not sure if people "get" the bandwidth problem. Sound must be multiplexed into half-duplex, serialized communication. By this I mean you can either input or output at any one time, but not both. Also, incoming messages must arrive one after another, not in parallel. With audio, I can only talk to one person at a time; with messaging, I can carry on multiple text-based conversations simultaneously. I mean, text-to-voice has long been available on PCs, but nobody uses it for ICQ/AIM/YahooIM/MSIM.
As far as I can tell, audio is dead. Maybe somebody will invent some sort of hyperfast language (didn't Heinlein describe something like that in a book?), but I think the next wave is going to be something new that replaces reading text, not something that goes backwards to audio.
Great achievement, my Commodore C64 could do that so many years ago that I don't even remember when it was. SAM, the speech synthesizer which could even "sing".
;-)
Has anything new happened lately?
I've got a co-worker, our Oracle admin, who's blind. As things stand, with most cell phones he can't do anything except dial out and answer calls. He can't use the built-in address book to place calls for example, because all of the info is in text on a tiny screen. With text-to-speech software on the phone, he'd be able to use the address book just like sighted folks, read text messages he received earlier even when he's in an area with no coverage just like sighted folks, and so on. This is a good idea.
...and everything gets slower. I read between 2 and 20 times faster than I can comprehend spoken language, depending on the junk filtering that's possible.
No way in hell do I want to read email on a cell phone (it's a PHONE. You _talk_ to people on it. If it were a generic mail reader, it would have at least a 17-inch monitor and a keyboard that lets you type faster than 0.2 cps. I know this is a difficult concept to grasp for certain cell phone companies, but a phone, as opposed to a computer, does not have these things, and thus it _sucks_ for email and browsing, and will continue to do so until it has those things, at which point you will not want to carry it around because it ain't gonna fit in your pocket anymore). Nor do I want to listen to my email. I don't have the time or the patience for it.
At least until the phone can give me an (intelligent) summary when I say 'Get to the point'.
MacinTalk is older than that, and quite frankly, it rocks. Man or Astroman (one of the greatest bands ever, especially live) use it as their lead singer. Fred really can sing.
In other news, "Man or Astroman wants all the party people.. to say.... yeeeaaaaahhhhhhhhhhh"
And by the way, the voice on "Fitter, Happier" (Radiohead) was actually Thom during an especially intense episode of inebriation >:P
--
#nohup cat
Texas Instruments used to have some of the best speech synthesis chips out there... I remember the TI-99 computer had a speech module, and one skiing game got both male and female realistic-sounding voices out of it. If they could do it in the early 1980s, why can't they do it now?
ttyl
Farrell
CAN-CON 2019 - Ottawa's only book oriented Science Fiction Convention! October 18-20, Sheraton Hotel, Ottawa, Canada h
Now all we need are really good speech-to-text converters...
Only 'flamers' flame!
Wow! Your Oracle admin is blind? *im baffled*
I have never worked with blind people, but after reading an article last year about how websites are getting more and more difficult for braille browsers (flash, imagelinks without alt tags etc.), I decided to make a lynx-friendly version of my site - and so should YOU!
Anyway, how does he do it?? Is it worth it to the company you work for, or does it cause everyone else problems? Is he good? Tell! Hopefully this could encourage others to take on "disabled" people in their company...
-Kraft
Live and let live
Unlike everybody who posted "big deal, my Commodore 64 used to hold long, sexy conversations with my Speak & Spell about the meaning of Wargames," I actually read the article. Near the end it says "The multilevel storage memory system allows the chip to store up to 256 different voltage levels, or the equivalent of 8 bits, into one EEPROM cell, which is up to 8x the capacity of conventional memories..."
Being a software geek with my last EE/CE classes several years safely in my sordid past, I'm out of touch. Is this a big deal?
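The arithmetic behind the article's claim is just a base-2 log: a cell that can hold L distinguishable levels stores log2(L) bits, so 256 levels gives 8 bits. A small Python sketch, assuming the "8x" is measured against ordinary one-bit-per-cell memory, and using a made-up 2.56 V programming window purely to show how narrow each level's slot gets:

```python
def bits_per_cell(levels: int) -> int:
    """Bits stored by one cell with `levels` distinguishable states (power of two)."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two >= 2")
    return levels.bit_length() - 1  # exact log2 for powers of two


def capacity_vs_binary(levels: int) -> int:
    """Capacity relative to a conventional one-bit (two-level) cell."""
    return bits_per_cell(levels) // bits_per_cell(2)


def level_spacing_mv(window_volts: float, levels: int) -> float:
    """Width of each level's slot, in millivolts, if the window is split evenly."""
    return window_volts * 1000 / levels
```

With the assumed 2.56 V window, 256 levels leave about 10 mV per level versus 1.28 V for a two-level cell, which is why the write and read circuitry for multilevel storage has to be far more precise.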
A man without a God is like a fish without a bicycle.
You have to remember that economics is what drives these things. If there are yuppies or geeks out there who want to have "every feature but the kitchen sink" in their cellphone, PDA, or whatever, there will be a company out there that will be happy to take their money to implement these technologies.
> I don't know what model Amiga you had, but if you define decent as "sounding like a robot that
> forgot what intonation was but could alter its voice half an octave to simulate slight masculine
> or slight feminine undertones" then I'll agree with you.
Well, that was if you fed text to the translation device, which did its best to generate the required phonetic output--also in ASCII--that was fed to the speak device. This translation could be pretty rough, and could be much improved upon if you generated your own raw phonetic output. You could smooth out, lengthen, shorten, or intonate individual phonemes that way, making the output sound much better. Basically, the translation device needed a good rewrite.
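Hand-tuning like that amounts to editing per-phoneme prosody. A minimal Python sketch, assuming a simple (symbol, duration, pitch) representation made up for illustration; it is not the Amiga narrator device's actual phonetic string format:

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Phone:
    symbol: str        # ARPAbet-style symbol, e.g. "AY"
    duration_ms: int   # how long to hold the sound
    pitch_hz: float    # fundamental frequency while it sounds


def scale_duration(phones: list[Phone], index: int, factor: float) -> list[Phone]:
    """Lengthen (factor > 1) or shorten (factor < 1) a single phone."""
    out = list(phones)
    p = out[index]
    out[index] = replace(p, duration_ms=round(p.duration_ms * factor))
    return out


def apply_pitch_contour(phones: list[Phone], start_hz: float, end_hz: float) -> list[Phone]:
    """Glide pitch linearly across the utterance, e.g. a falling statement contour."""
    n = len(phones)
    if n == 0:
        return []
    if n == 1:
        return [replace(phones[0], pitch_hz=start_hz)]
    step = (end_hz - start_hz) / (n - 1)
    return [replace(p, pitch_hz=start_hz + i * step) for i, p in enumerate(phones)]
```

Both functions return new lists and leave the input untouched, so you can try several tunings of the same utterance side by side.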
My Macintosh SE (8MHz 68000, circa 1988) can convert text to speech no problem. It's not necessarily smooth and natural, but it can't be *that* much of a jump...
SIGFEH
It's easier on bandwidth to just send a few hundred bytes of text than streaming audio.
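Rough numbers back this up. A back-of-the-envelope Python sketch, assuming 1 byte per ASCII character, about 6 characters per word (counting the space), a speaking rate of 150 words per minute, and uncompressed 64 kbit/s telephone-quality PCM for the audio. All of those are assumptions; a compressed speech codec would shrink the gap considerably.

```python
def text_size_bytes(chars: int) -> int:
    """Plain ASCII text: one byte per character."""
    return chars


def speech_size_bytes(chars: int, chars_per_word: float = 6.0,
                      words_per_minute: float = 150.0,
                      bitrate_bps: int = 64_000) -> float:
    """Bytes of audio needed to speak `chars` characters of text aloud."""
    words = chars / chars_per_word
    seconds = words * 60 / words_per_minute
    return seconds * bitrate_bps / 8


chars = 300
ratio = speech_size_bytes(chars) / text_size_bytes(chars)
print(f"{chars} bytes of text ~ {speech_size_bytes(chars):,.0f} bytes of audio ({ratio:.0f}x)")
```

Under those assumptions, 300 bytes of text is about 20 seconds of speech, or roughly 160,000 bytes of audio: over 500 times larger.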
Does anyone remember the name of this program? I think it was something like "Simon Says".
I'm a leaf on the wind. Watch how I soar.
He's got a variety of tools at his disposal. Just the other day, he gave a demo of some of them to a bunch of us.
He's got an 8-dot braille terminal that gives him enough characters to do C and Perl programming. He's got a hardware speech synthesizer he cranks up to something like 200+ words per minute. I tried, and could only understand a few phrases when it was cranked up to 95 words per minute.
And when a web site he needs or wants to access is inaccessible, he complains to them, and sometimes things get fixed. He can navigate web sites that use alt tags remarkably well. A good rule of thumb is that if a site makes sense with images turned off (or in lynx), then it'll work for him.
The General Instrument SP0256 chipset?
The '256 took coded phonemes and output audio, while the other chip in the set (don't remember the name) took ASCII serial data and converted it to phoneme codes the '256 could understand.
This set has been around for prolly close to 20 years now. (I remember finding a variant of it in the Intellivision voice module ["Bee Sevunteen Bahlllllmer"!] that I believe was circa 1984.)
The '256 has been discontinued for a long time now, and I'm kinda excited to see something similar to it show up; it was a cool gadget.
C-X C-S
I hope he practices safe cell phone use and doesn't call out while he's driving.../humour
"History doesn't repeat itself, but it does rhyme." Mark Twain
My Clark-Nova was talking to me in the '60s, but that was its job.
--Blair
Are there any websites where you can get a review by a blind person? or anything similar?
We can talk about web standards until we are blue in the face, but when we stop certain people from being able to use the web, that's more than a failure of standards.
I like to convert text to mp3s for long journeys so I can listen to Dickens on my Rio. Of course, that takes a lot of disk space. I'd much prefer a little handheld device that simply converts the .txt file which is much smaller, to speech.
I'd pay for it, and I bet a bunch of other people would too.
OoO
Please do not publish outside of
Such a device will be very handy for people that have visual impairments. Instead of the current bulky and expensive kits, this will be an improvement, especially for VI users out-and-about.
What can you do? Make your web pages accessible for a start.
Specialized chips for TTS applications have been around for a while... The problem with their acceptance is that they have poor voice quality. Actually, there are two quite different technologies available to produce speech nowadays:
1. Diphone synthesis and its variations. The idea is to have one sample of each sound combination (diphone) in a speechbase and produce the actual speech by manipulating those sounds. This is what gives the computer-synthesized, somewhat metallic speech that most people have already heard somewhere, and this is what is actually used in low-powered devices, handhelds and speaking dictionaries.
2. Corpus-based synthesis. The idea is to store a few hours of the speech of a highly trained speaker in the speechbase and select the fragments of this speech that best suit the utterance being generated.
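To make item 1 concrete: a diphone runs from the middle of one sound to the middle of the next, so a phoneme string is covered by the chain of adjacent pairs, padded with silence at each end. A minimal Python sketch, with ARPAbet-style symbols and a tiny hypothetical speechbase:

```python
from __future__ import annotations

SILENCE = "_"


def to_diphones(phonemes: list[str]) -> list[str]:
    """Cover a phoneme string with adjacent pairs, padding with silence at both ends."""
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def missing_diphones(phonemes: list[str], speechbase: set[str]) -> list[str]:
    """Diphones the utterance needs that have no recorded sample."""
    return [d for d in to_diphones(phonemes) if d not in speechbase]


# Hypothetical speechbase holding recordings for just a few diphones.
SPEECHBASE = {"_-HH", "HH-AH", "AH-L", "L-OW"}
hello = ["HH", "AH", "L", "OW"]
print(to_diphones(hello))                    # ['_-HH', 'HH-AH', 'AH-L', 'L-OW', 'OW-_']
print(missing_diphones(hello, SPEECHBASE))   # ['OW-_']
```

A real synthesizer would then stretch and re-pitch each stored sample to fit the sentence, which is the "manipulating those sounds" step the post describes.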
The second approach gives astonishing results, with the quality of the speech sometimes indistinguishable from a human speaker. However, the size of the speechbase is an issue. You cannot fit a 300 MB speechbase onto a handheld device yet, and hardware optimizations don't help much when it comes to fetching data from the speechbase and performing text-to-phoneme conversion.
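The "select fragments that best suit" step in item 2 is commonly framed as a search: each candidate fragment has a target cost (how well it matches what we want to say), each pair of neighbours has a join cost (how audible the seam is), and dynamic programming finds the cheapest chain. A Python sketch of that idea; the cost functions and toy corpus below are hypothetical:

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

Unit = TypeVar("Unit")


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> list[Unit]:
    """Viterbi-style search for the cheapest sequence, one unit per position."""
    if not candidates:
        return []
    # best holds (total cost, path) for every candidate at the current position.
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda t: t[0],
            )
            new_best.append((cost + target_cost(i, u), path + [u]))
        best = new_best
    return min(best, key=lambda t: t[0])[1]


# Toy corpus units: (sound, position in the original recording).
# Units that were recorded back to back join seamlessly (cost 0).
def seam(a: tuple[str, int], b: tuple[str, int]) -> float:
    return 0.0 if b[1] == a[1] + 1 else 1.0


candidates = [
    [("h", 10), ("h", 50)],
    [("e", 11), ("e", 70)],
    [("l", 12), ("l", 71)],
]
print(select_units(candidates, lambda i, u: 0.0, seam))  # [('h', 10), ('e', 11), ('l', 12)]
```

The search favours long runs of fragments that were originally recorded together, which is a big part of why corpus-based output can sound so natural, and also why it needs the huge speechbase.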
Several companies have corpus-based synthesis demos online. Check out SpeechWorks' and Lernout & Hauspie's sites.
Well, not necessarily. The cell and PSTN networks are designed around carrying audio, and that is still what they do best. Today, it's a toss-up whether it's better to approach text-to-speech from back-end servers (where you have more flexibility) or by embedding pieces into phones, which gives you a whole new set of problems and potentially great solutions.
The problem is, the idea of using this tech in phones is fighting against hundreds and hundreds of millions of deployed telephones without any tech newer than perhaps a microchip for caller ID. Over the long-term, text-to-speech embedded in the device is the more efficient and user-controllable format. Over the short haul, though, we're going to see many years still of central-office-controlled voice apps on your phone.
Niche applications, like on a Pocket PC: now there, something like this would absolutely rock. Get a toehold, and eventually low-power text-to-speech and speech-to-text devices will be all the rage.
Now if only someone would perfect a speech-to-text engine that didn't require hours of training to recognize my accent...
Matthew P. Barnson
G. Nolst Trenité
Dearest creature in creation,
Study English pronunciation.
Spot on! Not only would it have to disambiguate homonyms by semantic context, it would even need to use poetic context. Great poem!
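Even a crude version of that needs a look at the surrounding words. A toy Python sketch for one homograph, "lead", using a hand-picked list of cue words (hypothetical; real systems lean on part-of-speech tagging or statistical models), tested on the headline joke from earlier in the thread:

```python
from __future__ import annotations

# Hypothetical cue words suggesting the metal rather than the verb.
METAL_CUES = {"levels", "pipe", "pipes", "paint", "poisoning", "pencil", "un-ionized"}
PRONUNCIATIONS = {"metal": "L EH D", "verb": "L IY D"}  # ARPAbet-style


def normalize(word: str) -> str:
    """Lowercase and strip surrounding punctuation."""
    return word.lower().strip(".,;:!?\"'")


def pronounce_lead(words: list[str], i: int, window: int = 2) -> str:
    """Pick a pronunciation for words[i] == "lead" from nearby words."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    context = {normalize(words[j]) for j in range(lo, hi) if j != i}
    sense = "metal" if context & METAL_CUES else "verb"
    return PRONUNCIATIONS[sense]


headline = "new chip that can read un-ionized lead levels".split()
print(pronounce_lead(headline, headline.index("lead")))      # L EH D
print(pronounce_lead("they will lead the team".split(), 2))  # L IY D
```

It falls apart quickly (poetry, puns, or "lead" as a dog's leash), which is rather the point of the poem.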
-- MarkusQ
thanks Kynn