Text-to-Speech on a Low-Power Chip
bluephone writes: "The EE Times has a story on a new chip from Winbond that can take ASCII or UNICODE text and convert it to either spoken English or Mandarin (the Chinese language, not the orange). The low-power chip scans the text and translates it into spoken phenomes and outputs it to a filter for smooth analog sound, or can directly output the digital signal. Imagine a cell phone with this, you can have your email read to you, rather than seeing a line at a time on a dinky screen, street directions from a website, or even Slashdot's headlines. :)"
I have a little bot written in Perl and VXML that reads my email. It is far easier than making the phone do the processing, and it's free. See studio.tellme.com
Cell phones, PDAs, perhaps new tools for people with vision disabilities, where it could pick up plain text via IR near busy intersections or information kiosks. Text is small, so broadband wouldn't be required, since it's all converted in real time on a chip. Since it is supposed to be low-powered, it would be great for devices that don't need to be recharged often, like the pagers mentioned in the article.
I wonder how lifelike the voice is though. I don't think any text-to-speech tools are going to become very mainstream until they sound better.
As the writeup said, this could be used in a cellphone to read what you were looking at, but wouldn't it be simpler, and backwards compatible, to just do text-to-speech synthesis on a remote computer? Every cell phone out there can already just transmit the sound from a remote location, and it wouldn't require any new/expensive chips.
The cellphone may have all the power of an original Palm Pilot these days, but we don't need to make it into an Onyx server.
My Amiga was talking to me 15 years ago.
Actually, my Timex Sinclair 1000 was talking to me 20 years ago, but I think that was the acid...
Waltz, nymph, for quick jigs vex Bud.
The lead story read: "Unionized environmental health workers object to new chip that can read un-ionized lead levels."
Reading English is a lot tougher than most English-speaking people think.
-- MarkusQ
But is it smart enough to pronounce the boldfaced word above as "phonemes"?
Never take moderation advice from sigs, including this one.
No, those chips (it was a pair) were power-hungry 5 volt parts made by General Instrument. One was a microcontroller (8051?) with the text-to-phoneme algorithm, and the other was the phoneme-to-audio processor (GI SP0256). Actually, the SP0256 could accept external ROMs for specialized words, so it could have spoken in any language you wanted.
Check out quadravox for boards that emulate the SP0256, using ISD's analog flash memory and a microcontroller.
(My misadventure with the old GI chip: -12 instead of +5, just for a split second. After that, it developed a stutter!)
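For anyone curious what that two-chip split looks like in software, here is a minimal sketch in Python. It is not the GI algorithm or the SP0256 allophone set: the `LEXICON` table, the `extra_lexicon` parameter (standing in for an external ROM of specialized words), and the tone-per-phoneme "voice" are all made up for illustration.

```python
import math

# Hypothetical built-in word table; a real text-to-phoneme stage would use
# letter-to-sound rules plus an exception list, not a two-word dictionary.
LEXICON = {
    "hello": ["HH", "EH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text, extra_lexicon=None):
    """Stage 1: turn plain text into a flat list of phoneme symbols."""
    lexicon = dict(LEXICON)
    if extra_lexicon:
        # Plays the role of an external ROM holding specialized words.
        lexicon.update(extra_lexicon)
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'")
        if not word:
            continue
        # Unknown words are spelled out letter by letter (crude placeholder).
        phonemes.extend(lexicon.get(word, list(word.upper())))
        phonemes.append("PAUSE")
    return phonemes


def phonemes_to_samples(phonemes, rate=8000, duration_ms=80):
    """Stage 2: turn phoneme symbols into audio samples in the range -1..1."""
    n = rate * duration_ms // 1000
    samples = []
    for p in phonemes:
        if p == "PAUSE":
            samples.extend([0.0] * n)
            continue
        # Placeholder voice: one sine tone per phoneme, pitch derived from
        # the symbol's characters. Real hardware plays stored speech sounds.
        freq = 150.0 + 25.0 * (sum(map(ord, p)) % 16)
        samples.extend(math.sin(2 * math.pi * freq * i / rate) for i in range(n))
    return samples


if __name__ == "__main__":
    ph = text_to_phonemes("Hello, world!")
    print(ph)
    print(len(phonemes_to_samples(ph)), "samples")
```

The point of keeping the stages separate is the same as with the chip pair: you can swap the word table (or language) without touching the audio side.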
HIV Crosses Species Barrier... into Muppets
All you had to do was ask. Sound Blaster Acting Intelligent Text-to-Speech Operator.
For more information, click here.
1) My microwave at home displays "ENJOY YOUR MEAL" when it's finished cooking something. I'd sure love it if, instead of cheesy LEDs, I heard a sexy voice saying "come and get it, baby."
2) Text messengers for blind people. You know those little IM devices all the kiddies have? Well, just put braille on the keys and have one of these chips installed... there you go.
3) Watches. The next time somebody says "what time is it?" you just press a button and the voice chip in your watch, simulating someone who sounds extremely pissed off, shouts the time.
Well, that's it for now...
~ now you know
Actually, on a more serious note, is there anyone working on an open source speech synthesis project?
Yup; it's called Festival.
If you feel that you have to state 'not the orange' when using the word Mandarin in a language context, perhaps you should also state 'not the people of England' when using the word English in the same context.
The most important thing about the Internet is "bandwidth". I'm not talking bits on the wire, I'm talking how fast information flows into my brain. Speech is vastly slower than text as a medium for transferring information into my brain. I'm so accustomed to Internet speeds for information, I can no longer watch TV news -- the bandwidth is too slow. I'm glad I don't go to school anymore -- I could barely stand lectures when I was a kid, I would never be able to sit through them as an adult.
Five years ago everyone in Japan walked around with their phone to their ears. These days, everyone in Japan walks around looking at their phone (instant messaging, etc.). I'm not sure if people "get" the bandwidth problem. Sound must be multiplexed into half-duplex, serialized communication. By this I mean you can only input or output at any one time, not both. Also, incoming messages must arrive separately, not in parallel. With audio, I can only talk to one person at a time; with messaging, I can carry on multiple text-based conversations simultaneously. I mean, text-to-voice has long been available on PCs, but nobody uses it for ICQ/AIM/YahooIM/MSIM.
As far as I can tell, audio is dead. Maybe somebody will invent some sort of hyperfast language (didn't Heinlein describe something like that in a book?), but I think the next wave is going to be something new that replaces reading text, not something that goes backwards to audio.
Great achievement, my Commodore C64 could do that so many years ago that I don't even remember when it was. SAM, the speech synthesizer which could even "sing".
;-)
Has anything new happened lately?
I've got a co-worker, our Oracle admin, who's blind. As things stand, with most cell phones he can't do anything except dial out and answer calls. He can't use the built-in address book to place calls for example, because all of the info is in text on a tiny screen. With text-to-speech software on the phone, he'd be able to use the address book just like sighted folks, read text messages he received earlier even when he's in an area with no coverage just like sighted folks, and so on. This is a good idea.
...and everything gets slower. I read between 2 and 20 times faster than I can comprehend spoken language, depending on the junk filtering that's possible.
No way in hell do I want to read email on a cell phone (it's a PHONE. You _talk_ to people in it. If it was a generic mail reader it would have at least a 17 inch monitor and a keyboard that lets you type faster than .2 cps. I know this is a difficult concept to grasp for certain cell phone companies, but a phone, as opposed to a computer, does not have these things, and thus it _sucks_ for email and browsing, and will continue to do so until it has those things, at which point in time you will not want to carry it around because it ain't gonna fit in your pocket anymore.). Nor do I want to listen to my email. I don't have the time or the patience for it.
At least until the phone can give me an (intelligent) summary when I say 'Get to the point'.
MacinTalk is older than that, and quite frankly, it rocks. Man or Astroman (one of the greatest bands ever -- especially live) use it as their lead singer. Fred really can sing.
In other news, "Man or Astroman wants all the party people.. to say.... yeeeaaaaahhhhhhhhhhh"
And by the way, the voice on "Fitter, Happier" (Radiohead) was actually Thom during an especially intense episode of inebriation >:P
--
#nohup cat
Wow! Your Oracle admin is blind? *I'm baffled*
I have never worked with blind people, but after reading an article last year about how websites are getting more and more difficult for braille browsers (Flash, image links without alt tags, etc.), I decided to make a lynx-friendly version of my site - and so should YOU!
Anyways, how does he do it?? Is it worth it to the company you work for, or does it cause everyone else problems? Is he good? Tell! Hopefully this could encourage others to take on "disabled" people in their company....
-Kraft
Live and let live
He's got a variety of tools at his disposal. Just the other day, he gave a demo of some of them to a bunch of us.
He's got an 8-dot braille terminal that gives him enough characters to do C and Perl programming. He's got a hardware speech synthesizer he cranks up to something like 200+ words per minute. I tried, and could only understand a few phrases when it was cranked up to 95 words per minute.
And when a web site he needs or wants to access is inaccessible, he complains to them, and sometimes things get fixed. He can navigate web sites that use alt tags remarkably well. A good rule of thumb is that if a site makes sense with images turned off (or in lynx), then it'll work for him.
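If you want a quick way to apply that rule of thumb to your own pages, a rough sketch in Python follows. It only looks for `<img>` tags whose `alt` attribute is missing or blank, using the standard library's `html.parser`. The names `MissingAltFinder` and `images_missing_alt` are made up for this example, and it is no substitute for actually trying the page in lynx or with a screen reader. (An empty `alt=""` can be a deliberate choice for purely decorative images; this sketch flags it anyway so you can check.)

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Treat a missing, valueless, or whitespace-only alt as "no alt text".
        if not (attr_map.get("alt") or "").strip():
            self.missing.append(attr_map.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images in `html` that lack alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    page = (
        '<a href="/"><img src="logo.gif"></a>'
        '<img src="map.png" alt="Site map">'
    )
    for src in images_missing_alt(page):
        print("no alt text:", src)
```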