Text-to-Speech on a Low-Power Chip
bluephone writes: "The EE Times has a story on a new chip from Winbond that can take ASCII or UNICODE text and convert it to either spoken English or Mandarin (the Chinese language, not the orange). The low-power chip scans the text and translates it into spoken phenomes and outputs it to a filter for smooth analog sound, or can directly output the digital signal. Imagine a cell phone with this, you can have your email read to you, rather than seeing a line at a time on a dinky screen, street directions from a website, or even Slashdot's headlines. :)"
..probably you won't understand it anyway.
I have a little bot written in Perl and VXML that reads my email. It is far easier than making the phone do the processing, and it's free. See studio.tellme.com
Anybody remember Dr. Sbaitso? This program was great for being written way back when (1994?).
I believe Radiohead used it as the voice for their track "Fitter Happier" on OK Computer.
MacOS has had that for a while. It works ok. In fact, by default in OS 10.1 it speaks modal error dialogs. It surprised me the first time this happened.
man, if they put SimpleText (Apple's text/scripting/voice filter program) on one of those things, no one will get their work done!!!
I am the Alpha and the Omega-3
Cell phones, PDAs, perhaps new tools for people with vision disabilities, where it could pick up plain text via IR near busy intersections or information kiosks. Text is small, so broadband wouldn't be required, since it's all converted in real time on a chip. Since it is supposed to be low-powered, it would be great for devices that don't need to be recharged often, like the pagers mentioned in the article.
I wonder how lifelike the voice is though. I don't think any text-to-speech tools are going to become very mainstream until they sound better.
As the writeup said, this could be used in a cellphone to read what you were looking at, but wouldn't it be simpler, and backwards compatible, to just do text-to-speech synthesis on a remote computer? Every cell phone out there can already just transmit the sound from a remote location, and it wouldn't require any new/expensive chips.
would be something we can all be impressed with.
That's all I need, Stephen Hawking's voice coming at me from my cell phone:
Anonymous cowards love the rich meaty taste of spam.
could i set it to a deep erotic female sounding voice and have it read dirty stories to me?
no
Can you imagine millions of geeks' cell phones with this chip? "First Post" echoing throughout the world...
There's really nothing new about this product, except for its ability to speak Mandarin. And given the state of the Chinese economy, it's not very likely that many citizens over there will be in the market for talking electronic devices anytime soon. Most of them are still trying to get phone service and running water.
-CT
would be reading /. headlines. I mean, text-to-speech is great, but can it spell-check at the same time?
Think of the applications for blow up dolls and pr0n!
I used to bulls-eye womp-rats in my pants
I've tried Festival on Linux, and its output is always really fuzzy and hard to understand. Do any of you know of any alternative programs that are more discernible in their delivery of voice? I would love to have my Linux box talk to me like one of those sexy iMac operators...
A musician without the RIAA, is like a fish without a bicycle.
The cellphone may have all the power of an original Palm Pilot these days, but we don't need to make it into an Onyx Server.
/me gets out his Speak n Spell.
That's Mr. Eradicator to you.
trance-port
Oh imagine them! To have one of these babies that understood more languages, and could translate them to one of the others. Need a real translator? Nooope, watch my lil PDA do it for me!
Seriously, that could have tremendous business implications for those who are doing business in other countries.
Their usage of EEPROM is nothing but ingenious. Why hasn't anyone done this before? Or have they? It makes a lot more sense than a flash card, and it's cheaper too.
This device could easily be connected to a personal computer. Somewhere, I have schematics for a version which plugged into a Commodore 64, and code for wedging the "SAY" command into BASIC - look in the old Ahoy Magazine. I saw a working version at Lawrence Tech.
The lead story read: "Unionized environmental health workers object to new chip that can read un-ionized lead levels."
Reading English is a lot tougher than most English-speaking people think.
-- MarkusQ
either spoken English or Mandarin (the Chinese language, not the orange).
It's impossible for "Mandarin" to be one or the other. As a modifier, the word is used to describe BOTH a cultural subpopulation in China and a type of orange native to the areas in which that subpopulation resides. So really, it's BOTH the Chinese language AND the orange.
Any technology that can translate text to words is a good thing, so that blind people can have less of a hard time with technology, which is mostly sight driven. But of course with my really bad spelling it could drive people nuts. (Yea Yea Lern to spell and that will fix the problem) But I always want the option to disable it no matter how little processing power it uses. Speaking is generally slower than reading. Plus there are some times when your concentration doesn't need a computer speaking to you.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Does it sound like Kavita Maharaj?
Because I swear, sexy though it is, her voice is synthesized.
--Blair
But is it smart enough to pronounce the boldfaced word above as "phonemes"?
Never take moderation advice from sigs, including this one.
Didn't I see this first in "Wargames?"
Incidentally, a guy I work with has a father who designs for Chrysler. He said that the big D-C was "really interested" in applications of text to speech. Think about it: ebooks that read themselves to you while you drive, driving directions and traffic info read to you rather than displayed on a screen (most nav screens require you to take your eyes entirely off the road and down the dash as much as 18 inches...eep!). You've got a much more useful interface, and with a low-cost (though they'll charge you a grand, I'm sure), easy-to-interface chip, they'll have no excuse not to bring this much safer system for data interaction to my dash today, and not six years from now.
Hey freaks: now you're ju
I can just see it now. You're sitting on a bus on your way to work, reading your e-mail when all of a sudden you hear: "ENLARGE YOUR PENIS NOW!" "XXX GIRLS INSIDE, CUM JOIN US" "COME INSIDE AND LICK MY..." well you get the idea. I hope you have headphones!
What?
Phone: "Are you looking for hot [chicks|sex|pussy|love]?"
Wife: "um... what was that, honey?"
Phone: "Get your University diploma!"
Wife: "What, I'm not good enough the way I am?"
Phone: "Get out of debt now!"
Wife: "Okay, you know what? That's your birthday present on the Credit card, bucko. That's it. I'm leaving..."
I am the very model of a modern major general!
Right now there are companies exclusively in the business of doing this. There is very little mathematical challenge here. Psychologists and linguists have researched phonemes and related material pretty well.
:)
But there are some implementation issues here. For example, take GNU. How do you say it? What about Jekka Pukka Sarasate? If you were to take the literal English pronunciation you might never even be able to understand what it's trying to say. Figuring out how to solve that is an interesting CS problem.
But this is a cool invention. Low power wireless research is just taking off. Before we were trying to figure out how to just transmit wireless well. Now we can have fun with it. I truly look forward to a wireless life
Me..
Yeah, but you know the power consumption would be a hell of a lot higher on the chip that everybody would really want anyways: the Text-to-Barry-White-Speech chip.
"You've got mail, baby."
Somehow I doubt it'd be able to read Slashdot headlines. Can you imagine a chip trying to reproduce Malda's spelling?
Actually the "fitter happier" voice is the Bruce MacInTalk Pro voice that is part of MacOS. You can paste the lyrics into SimpleText on a Mac and get your very own live performance.
The lyrics are here
Will it translate into broken chinenglish? "Me want pork fried rice. Chop chop."
forth ?love if honk then
With all the horsepower available in any modern handheld device -- surely much more than an 8 MHz 68000 with 512K of memory (of which only a fraction was used, I'm sure) -- I don't understand why a dedicated chip would be needed to pull this off.
Letter To Iran
We all know how annoying it can be when other people have their cell phones ring in public places... the last thing we need is people listening to a monotonous computer voice in public. Not to mention the fact that it's usually much more convenient to read text, which allows for skimming and variable speeds.
Sure, the tech is cheap and relatively disposable, but is moving every feature but the kitchen sink into the phone really the way to do it? Why put a radio transmitter and receiver in the headset-- if you do that, then you'll need a battery to power it. Stay with the corded phones, I say! And, if you have more than one person in a city talking on these newfangled radiophones, you'll need a computer to set the radio frequency! My Gremlin's 8 track player/AM radio doesn't need a computer to change channels -- it's got those big preset buttons to move the dial for me. The cell phone may have all the power of a TRS-80 these days, but we don't need to make it into an IBM PCjr.
p.s. and don't even get me started on digital phones... converting analog to digital to analog baseband to RF, and then back again!
Sorry, I was off by 2 years.
Letter To Iran
Why is this a big deal? I was doing text to speech on my Commodore 64 when I was a kid with a program called SAM (which was written in 1979). The C64 had what...1MHz?
"We are looking at devices that don't necessarily have a really powerful processor on board," said Hezi Saar, product marketing manager at Winbond. "Usually most of the accessories for handheld devices don't have the power to run text-to-speech algorithms and they don't have the huge memory capacity to support this feature."
OK, so just imagine that in the near future anything and everything will have one of these small, low cost chips. Now, imagine the possibilities! Everyone I'm sure has their own ideas on how cool this could be, so go ahead and reply with yours!
~ now you know
They used speech parts to make up words in the same way. (Hope this sounds better though)
I remember one of my coworkers had gotten a service where you could email his account, which would forward to a voice mail text -> speech system.
...
It was hilarious sending him obscene and/or ridiculous emails and listening to the recorded voice play them back.
Well, I jumped ship to this little company I work for now called Talk2 Technology (free plug, I guess). We've taken a different tack in voice-enabling applications. I think there are different target markets -- the Talk2 stuff uses servers on the back-end, which go out and fetch your email to read it to you. Putting this on-chip in the cell phone itself is a great step in the right direction.
:)
Fundamentally it's a different approach than today's "voice portal" technology. Voice Portals retrieve data for you, and read it over standard cell or PSTN network. There are many benefits to this approach, principal among them being improved processing power for additional functionality such as voice-processing (speech to text, or compressing speech for reply email voice attachment). By putting the power into the phone, instead of at an expensive central office, this chip could either be a great advancement for text-to-speech technology, or a "killer app" that puts my company out of business
Regardless, I'm excited to see this happening. I've long envisioned a PDA with the only interface being spoken, rather than requiring any video component. This would bring the power consumption and delicacy of these devices down within reason for extended usage. The downside is that speech is necessarily a rather slow interface to a machine; it will be interesting to see how we adapt speech for greater speed with speech-based devices, and how English as a whole will fare.
Now that I've used voice-enabled email, it would be really hard to go back to the "old" way. I still do an enormous amount of correspondence every day by typing, but when I'm on the road I don't need to bother with a laptop since I can have my email read to me over the phone *and reply* with a voice message via email. Until you've used it, it's tough to realize how convenient it is.
I want one of these for my Agenda VR3! Or something...
Matthew P. Barnson
I learn what I think when I read what I write
M$'s Front Page is one of the worst offenders. It's full of useless font adjustments and other needless code. Worse, it labels images cryptically and encourages all of the worst practices.
As Bill Gates once said, software is what is lacking in a world full of technology. He aims to keep it that way for those who trust him.
DMCA, Hollings, Palladium. What might have sounded like paranoia is now common sense.
studio.tellme.com
I remember my first computer - a TI-99/4A - had a box I plugged into the side that generated speech. It didn't sound all that good, but you could recognize it well enough. If I remember correctly, it cost about $100.
That was... 21 years ago. It's sad that this aspect of human-computer interaction has been overlooked for so long. It's nice to finally see some development.
http://www.masturbateforpeace.com/
The real potential of this thing is its low power and (presumably) price. Sure, a PDA with this might have a certain "cool" factor, but the innovation would be in other, traditionally "dumb" products:
Surely, other people have suggestions to add...
"Prepare for the worst - hope for the best."
Silly - that's a mistake for "pheromones"! I'm not sure what a "spoken pheromone" might be, but it sure sounds sexy.
soa yoau kann uase iit inn yourr own prooagramez! That's what I heard in summer 1994, when I got a small program on the Commodore 64 called speech. It was pretty good, except it sounded like a drunk robot. But it DOES show that you don't need much computing power for text-to-speech if a programmer can do it on a lowly Commodore 64 using BASIC, for crying out loud!
The Unanonymous Coward
http://www.mchawking.com/
"Aw Yeah! This be some funky-ass shit I be laying on your ass!"
The party's over
If you feel that you have to state 'not the orange' when using the word Mandarin in a language context, perhaps you should also state 'not the people of England' when using the word English in the same context.
Another female voice calling my cell phone and telling me I'm offtopic...
I still have an Apple II program that converts text to speech (and outputs to the PC speaker). It fits on a 5.25" floppy and its pronunciation was better than some of the more modern TTS engines. Sure, it choked on words like pneumatic, but it was cool to have a TTS engine run in real time on the 2 MHz processor.
I haven't seen anyone post a link to Winbond's own web page on the WTS701 Text-to-Speech Processor so, here it is straight from the mouth:
Winbond
And if I can remember correctly, it was less than 100 lines of code!(!)
The Unanonymous Coward
5. Your wife is on line 1
4. Your ex-wife is on line 1
3. Your proctologist is on line 1
2. You've got mail... pattern baldness
1. You've got spam
it doesn't take tremendous chip power to convert text to speech, it takes tremendous chip power to convert speech to text. well, not tremendous power, but you know what i mean.
to give you an example, using SAM (Software Automated Mouth) on my Commodore 64 (1984) produced quality speech similar to speech synthesis products of today.
so, what was the big deal again??
Imagine a cell phone with this
Well, that's kinda hard, but I can imagine a Beowulf clu... *slap*
The most important thing about the Internet is "bandwidth". I'm not talking bits on the wire, I'm talking how fast information flows into my brain. Speech is vastly slower than text as a medium for transferring information into my brain. I'm so accustomed to Internet speeds for information, I can no longer watch TV news -- the bandwidth is too low. I'm glad I don't go to school anymore -- I could barely stand lectures when I was a kid, I would never be able to sit through them as an adult.
Five years ago everyone in Japan walked around with their phone to their ears. These days, everyone in Japan walks around looking at their phone (instant messaging, etc.). I'm not sure if people "get" the bandwidth problem. Sound must be multiplexed into half-duplex, serialized communication. By this I mean you can only input or output at any one time, not both. Also, incoming messages must arrive one after another, not in parallel. With audio, I can only talk to one person at a time; with messaging, I can carry on multiple text-based conversations simultaneously. I mean, text-to-voice has long been available on PCs, but nobody uses it for ICQ/AIM/YahooIM/MSIM.
As far as I can tell, audio is dead. Maybe somebody will invent some sort of hyperfast language (didn't Heinlein describe something like that in a book?), but I think the next wave is going to be something new that replaces reading text, not something that goes backwards to audio.
Great achievement, my Commodore C64 could do that so many years ago that I don't even remember when it was. SAM, the speech synthesizer which could even "sing".
;-)
Has anything new happened lately?
I've got a co-worker, our Oracle admin, who's blind. As things stand, with most cell phones he can't do anything except dial out and answer calls. He can't use the built-in address book to place calls for example, because all of the info is in text on a tiny screen. With text-to-speech software on the phone, he'd be able to use the address book just like sighted folks, read text messages he received earlier even when he's in an area with no coverage just like sighted folks, and so on. This is a good idea.
Insanity is the last line of defence for the master diplomat. But you have to lay the groundwork early.
...and everything gets slower. I read between 2-20 times faster than I can comprehend spoken language, depending on the junkfiltering that's possible.
No way in hell do I want to read email on a cell phone (it's a PHONE. You _talk_ to people in it. If it was a generic mail reader it would have at least a 17 inch monitor and a keyboard that lets you type faster than .2 cps. I know this is a difficult concept to grasp for certain cell phone companies, but a phone, as opposed to a computer, does not have these things, and thus it _sucks_ for email and browsing, and will continue to do so until it has those things, at which point in time you will not want to carry it around because it ain't gonna fit in your pocket anymore). Nor do I want to listen to my email. I don't have the time or the patience for it.
At least until the phone can give me an (intelligent) summary when I say 'Get to the point'.
The chip could enable items such as a teddy bear that lulls a child to sleep by reading a bedtime story with the pre-programmed voice of Winnie the Pooh.
In the short term, however, Winbond is setting its sights on more sophisticated markets. Topping the list are power-sensitive mobile devices, such as PDAs, cell phone accessories and pagers. Also on the radar are automotive applications such as telematics systems and car stereos.
Great. Now every damn thing everyone owns will be talking.
1Alpha7
Live to be Moderated
Your phone rings and says "Greetings Professor Falken. Shall we play a game?"
Sorry mods couldn't resist.
It's not a low-power chip, though, it's software, so this is mighty cool. I'm sure we're going to see a billion little "talking devices."
I can't wait until I press the "Find" button on my TV and my remote control yells "between the couch cushions again, asshole!"
CAn'T CompreHend SARcaSm?
The EE Times has a story on a new chip from Winbond that can take ASCII or UNICODE text and convert it to either spoken English or Mandarin (the Chinese language, not the orange).
Too bad... steganography would have never tasted so sweet...
(Posted anonymously because, let's face it, that joke sucked.)
Texas Instruments used to have some of the best speech synthesis chips out there... I remember the TI-99 computer had a speech module, and one skiing game got both male and female realistic-sounding voices out of the speech module. If they could do it in the early 1980s, why can't they do it now?
ttyl
Farrell
CAN-CON 2019 - Ottawa's only book oriented Science Fiction Convention! October 18-20, Sheraton Hotel, Ottawa, Canada h
I'm just imagining the horrid phonetic contortions the poor chip would produce trying to interpret the acronym-laden jargon that us techie folks throw around so loosely. I mean, we have enough trouble agreeing on how you should pronounce something simple like MMPOG. And then there are people's habits of using code-speak and syntax fragments: the poor punctuation checker will shoot itself.
The mind reels in horror...
"If I wanted your input on my pet project, I'd stick my hand up your ass and use you like a sock-puppet." - Muse
While reading the headline for this article I looked up and saw the ThinkGeek ad saying "How are you gentlemen....". Of all the possibilities that a chip like this could be used for, it will most likely in the end be used for jokes like sending messages saying "Would you like to play a game." Of course, then you could send messages through the ikinator and it would say "Would you like to play a god damn game!"
Actually, the Macintosh at release had a processor with considerably less power and RAM than even the lowest end Palm Device does now.
Now all we need are really good speech-to-text converters....
Only 'flamers' flame!
Wow! Your Oracle admin is blind? *im baffled*
I have never worked with blind people, but after reading an article last year about how websites are getting more and more difficult for braille browsers (flash, imagelinks without alt tags etc.), I decided to make a lynx-friendly version of my site - and so should YOU!
Anyways, how does he do it?? Is it worth it to the company you work for, or does it cause everyone else problems? Is he good? Tell! Hopefully this could encourage others to take on "disabled" people in their company....
-Kraft
Live and let live
Ok i've seen 2 schools of thought over the years when it concerns what some see as overkill in technology.
1. Tech should be stripped down and basic I.E. just enough to get the job done.
2. Tech should be overpowered to show what can be done with tech.
Considering the price of technology has fallen considerably over the last 5 years, WHY NOT add everything but the kitchen sink? 5 years ago I never dreamed I could afford a gig of RAM. Lo and behold, today's low-priced PC-133 can be had for less than 20 cents a megabyte. Do I have a gig of RAM? Yes I do! It doesn't matter if you call it an upgrade, addition, or bragging rights; it was affordable, so I went ahead with it.
Any aspect of our lives becomes better when you throw more hardware at it. Freezers that keep inventory, centralized house controls, personal area networks. When the benefit > price and the price is a drop in the bucket, it then becomes a necessary feature for the manufacturer to add in order to stay competitive, which in the end is a win for the consumer. If you want to stay in the dark ages with your low-power Palm Pilot go ahead, I'm going to buy my Onyx handheld for the same price.
I can't remember if I saw that on a bugs bunny cartoon or school. I just remember a mouse dressed up like a professor, or was it elmer fudd the shoe maker?
The ASCII-to-speech chip and the Perl program are good candidates for inclusion in a Perl-based AI Mind.
Since Perl is already a major Web and Internet language, imagine an Internet iMind roaming the 'Net and talking to people in English or Mandarin Chinese with the new speech chip being fed the ASCII output of the AI Speech Module.
Unlike everybody who posted "big deal, my Commodore 64 used to hold long, sexy conversations with my Speak & Spell about the meaning of Wargames," I actually read the article. Near the end it says "The multilevel storage memory system allows the chip to store up to 256 different voltage levels, or the equivalent of 8 bits, into one EEPROM cell, which is up to 8x the capacity of conventional memories..."
Being a software geek with my last classes in EE/CE several years safely in my sordid past, I'm out of touch. Is this a big deal?
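For what it's worth, the arithmetic in the quoted passage is easy to check. The snippet below is only a sketch of the math, not of how Winbond's part works inside; the function names are made up for illustration, and treating "conventional" memory as 1 bit per cell (2 levels) is an assumption.

```python
import math

def bits_per_cell(levels: int) -> float:
    """Bits of information one cell holds if it can sit at `levels` distinct voltages."""
    return math.log2(levels)

def cells_needed(num_bytes: int, levels: int) -> int:
    """Cells required to store `num_bytes` bytes at `levels` voltage levels per cell."""
    return math.ceil(num_bytes * 8 / bits_per_cell(levels))

# 256 levels per cell versus an assumed 2-level (1 bit) conventional cell
multilevel = cells_needed(1024, 256)
conventional = cells_needed(1024, 2)
print(bits_per_cell(256), multilevel, conventional, conventional // multilevel)
```

So 256 levels does work out to 8 bits per cell, and the "up to 8x" figure follows if the comparison is against single-bit cells.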
A man without a God is like a fish without a bicycle.
You have to remember that economics is what drives these things. If there are yuppies or geeks out there who want to have "every feature but the kitchen sink" in their cellphone, PDA, or whatever, there will be a company out there that will be happy to take their money to implement these technologies.
One step closer to the Babel Fish ...
...that runs on any system that can do this reliably, i.e. so that it is readily understandable and pleasant to listen to, whose logic has been reduced to this chip. I'm not talking individual words, here. The production of speech from text has been a hot research topic for over 50 years, and every step forward is like Zeno's paradox. English is an extremely complex language, with a very large number of orthographic rules. From the description, it sounds as though the individual phones are calculated, and then smoothed, which isn't exactly how speech works (it's an approximation, but not always a good one).
;-)
On the other hand, this could give rise to an entirely new kind of email virus.
My Macintosh SE (8MHz 68000, circa 1988) can convert text to speech no problem. It's not necessarily smooth and natural, but it can't be *that* much of a jump...
SIGFEH
Here's some reasons why you want it on the device, and not on the server:
Privacy: I, for one, don't want to send my personal content through a portal provider. I don't want Microsoft getting all my mail, I don't want TellMe getting it either. And I don't want to have everything that I'm supposed to want available for me at the channel, with my usage stats, habits, and particulars sold to direct marketers or worse.
Security: The more places you ship the data around to, and the more intermediaries involved, the more possibilities there are for sniffing, bad security, leaks, and misuse. Passing things through a provider means trusting them to maintain security properly, and I for one don't trust many people enough to allow that.
Bandwidth: Alice in Wonderland can be shipped in full audio at several gigabytes, or shipped as a 100+kb text file and synthesized on the device. Cell connections are terrible, despite what the telcos are pushing in their media campaigns -- coverage even in the Bay Area is spotty; you lose signal as you pass in and out of cells, and there are network overloads and outages. Keeping it down to small text streams and synthesizing on the device means getting one step away from the unreliable, low-bandwidth networks available today, and 3G is a long, long way off.
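A rough back-of-the-envelope version of the bandwidth point, as a sketch only. The word count (about 27,000), the reading speed (150 words per minute) and the 100 KB text size are assumptions picked for illustration; the two audio rates are standard uncompressed formats (8 kHz 8-bit telephone PCM and 44.1 kHz 16-bit stereo CD audio).

```python
def audio_bytes(words: int, words_per_minute: int, bytes_per_second: int) -> int:
    """Size of uncompressed audio for a text read aloud at a steady pace."""
    seconds = words / words_per_minute * 60
    return int(seconds * bytes_per_second)

WORDS = 27_000                  # assumed length of the book
WPM = 150                       # assumed speaking rate
TEXT_BYTES = 100 * 1024         # the "100+kb text file" from the comment

PHONE_RATE = 8_000 * 1          # 8 kHz, 8-bit mono = 64 kbit/s
CD_RATE = 44_100 * 2 * 2        # 44.1 kHz, 16-bit, stereo

phone = audio_bytes(WORDS, WPM, PHONE_RATE)
cd = audio_bytes(WORDS, WPM, CD_RATE)

print(f"phone-quality audio: {phone / 1e6:.1f} MB, about {phone // TEXT_BYTES}x the text")
print(f"CD-quality audio:    {cd / 1e9:.2f} GB, about {cd // TEXT_BYTES}x the text")
```

Under those assumptions, "several gigabytes" only holds for uncompressed CD-quality audio, but even phone-quality audio is hundreds of times larger than the text, which is the point being made.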
Kevin
interesting
It's easier on bandwidth to just send a few hundred bytes of text than streaming audio.
context sensitivity
Have you looked at your proposed solution? As I see it there are two ways that you could have it: 1.) Put the text to speech in the phone. This allows everything local and remote to be converted into sound. The server needs to do no processing, and much bandwidth is saved. Also, it gives the end user more control over the program (this is important). The disadvantage is that the phone would need a small, cheap chip. More power may not even be used because the wireless transmission doesn't need to be powered. This works when the main server cannot be contacted. 2.) Implement the tech at the server. This uses more bandwidth and CPU time (unless they have these dedicated chips) and is under control of the service provider. I think it's clear.
I recently tried some of the shareware/freeware TTS engines I found on the web and the quality was pretty lousy. It still sounded like Operation Stealth on my Amiga.
So if my desktop PIII can't do it, how are they gonna put this into a low-power chip?
Well, if the algorithm in this chip in any way resembles the Bell Labs text-to-speech system, then it handles this text pretty well. I'll check at home how XP's Narrator handles it though...
Microsoft bought up what was SAM and put it in Windows 2000.
A CPU that TELLS you when its time to upgrade :D
----- Whats wrong with this picture? http://www.revoh.org:1234/whatswrong
Hey, I know this question is somewhat off topic, but I've been using /. for a while and am stumped anyway. How in the heck do you get those Chinese characters into your sig?
Joe wants to tell Bob something. Joe turns on his fancyass cell phone, types a 12 word message on his 8 by 4 character screen using a keyboard the size of his thumb, to send to his buddy who then uses his phone to read the text to him?
Why in the world would anyone want to do that? JUST TALK INTO THE DAMN THING ALREADY!!!
Man, we so deserve to get wiped out by a comet...
Unicode. For example, "&#" + 28961 + ";" = 無
Does anyone remember the name of this program? I think it was something like "Simon Says".
I'm a leaf on the wind. Watch how I soar.
Support in TTS software for languages other than English seems to be quite poor. I tried every program I am aware of for French TTS (Mbrola, Euler), and none of them could read a text correctly for me. This TTS feature is most useful for visually impaired people; some blind friends of mine even had to use English TTS with French text, and I swear it sounds horrible.
This situation seems like a case where speech-to-text translation would be more useful. For example, instead of having to listen to an address book being read to him, he could simply say "Call Bob Smith" and have the phone dial for him. In fact, some wireless carriers already offer this.
Sig (appended to the end of comments you post, 120 chars)
He's got a variety of tools at his disposal. Just the other day, he gave a demo of some of them to a bunch of us.
He's got an 8-dot braille terminal that gives him enough characters to do C and Perl programming. He's got a hardware speech synthesizer he cranks up to something like 200+ words per minute. I tried, and could only understand a few phrases when it was cranked up to 95 words per minute.
And when a web site he needs or wants to access is inaccessible, he complains to them, and sometimes things get fixed. He can navigate web sites that use alt tags remarkably well. A good rule of thumb is that if a site makes sense with images turned off (or in lynx), then it'll work for him.
The General Instrument SP0256 chipset?
The '256 took coded phonemes and output audio, while the other chip in the set (don't remember the name) took ASCII serial data and converted it to phoneme codes the '256 could understand.
This set has been around for prolly close to 20 years now. (I remember finding a variant of it in the Intellivision voice module ["Bee Sevunteen Bahlllllmer"!] that I believe was circa 1984.)
The '256 has been discontinued for a long time now, and I'm kinda excited to see something similar to it show up; it was a cool gadget.
C-X C-S
Welcome to 20 years ago.
it's still gonna like sound azz.
robotic synth voice: eeeeeeuoo hhhaaaaveeee waaaaauuuunnn neeeeeeuwwwwww meeeeeessssaaage. meeeeeessssaaage seeeennnnt tweeeelveee foooorreeteeee tuuuuuueeee. meeeeeessssaaage reeeeeeaaads: heeeeeeeeeloooooou.
1 hour later/etc...
[translation - you have one new message, message sent: 12:42 message reads: "hello"]
and then you respond with voice recognition:
"hi"
> high
"no! HI"
> go, high
"HI"
> bye
"HIEEEE EEE"
> i
[type] "hi"
This comment does not represent the views or opinions of the user.
I hope he practices safe cell phone use and doesn't call out while he's driving.../humour
"History doesn't repeat itself, but it does rhyme." Mark Twain
If the Slashdot crowd cannot identify a language spoken by 25% of the world, it shows how ignorant USians are.
I've had a PC card (based on late '70's technology) for ages that does RS-232 to speech. (© 1986 B.G. Micro)
Okay, it doesn't do great speech, but it made a dandy talking clock when hooked up to cron. "BONG! BONG! BONG! BONG! BONG! BONG! The time is six o'clock!"
I'd use it for something, but ISA card slots are rare these days. I'd power up one of my horde of 486/66's to fit it, but that's too silly even for me.
Is it just me, or is this a solution still looking for a problem to solve?
One line blog. I hear that they're called Twitters now.
viewing the source html reveals all:
& # 22825; & # 32; & # 19979; & # 32; & # 28961; & # 32; & # 19978;
(remove spaces to get characters to show)
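For anyone who would rather not edit the HTML by hand, here is a minimal Python sketch of the same trick. It only uses the standard-library `html.unescape`; the variable names are just for illustration.

```python
import html

# The sig as posted above, with the padding spaces that keep the
# references from rendering.
spaced = "& # 22825; & # 32; & # 19979; & # 32; & # 28961; & # 32; & # 19978;"

# Remove the padding spaces to recover well-formed numeric references.
refs = spaced.replace(" ", "")

# html.unescape turns each &#NNNNN; into the character with that code point
# (&#32; is an ordinary space).
decoded = html.unescape(refs)

# Show the code points in the usual U+XXXX notation as well.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]

print(decoded)
print(code_points)
```

Whether the four characters display properly still depends on your terminal and font.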
This improves upon an old chip by Votrax, called the SC-01A.
:)
That chip required you to pump it phonemes, rather than text. Phonemes are the actual sounds that are strung together to form syllables, then words - there were 64 of them, including pauses of varying length. So you had to have an external microprocessor with a text-to-phoneme algorithm (not too big - just a few K in 6502 or Z80 assembly), then you would feed the phonemes to the chip. This chip actually sounded pretty good (for its day) if it was used correctly. You usually had to have some good filtering on the output, and had to be careful about generating phoneme strings.
The chip in the TI-99/4A worked the same way, except that the speech synthesizer cartridge included the text-to-phoneme algorithm (and a new BASIC command, "SAY"). It worked ok, but not great. Unfortunately, the implementation of the "say" command in TI BASIC didn't allow you to specify phonemes, so you had to do things like "tayck thuh kawttin owt uv yoar mowth" to get it to pronounce things correctly. It still sounded worse than the computer in War Games. I'm not sure if this chip was the SP0256-AL2 or not; I seem to remember that TI was down on using chips from other companies at the time.
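To give a feel for what that external text-to-phoneme step looks like, here is a rough Python sketch using greedy longest-match letter-to-sound rules. The rule table and the phoneme labels are invented for illustration; they are not the SC-01A's real 64-phoneme set, and real rule sets were a good deal larger and context-sensitive.

```python
# Illustrative letter-to-sound rules. The phoneme labels are made up for
# this sketch and are NOT the SC-01A's actual phoneme codes.
RULES = {
    "sh": ["SH"],
    "ch": ["CH"],
    "th": ["TH"],
    "ee": ["EE"],
    "oo": ["OO"],
    "a": ["AE"],
    "e": ["EH"],
    "i": ["IH"],
    "o": ["AW"],
    "u": ["UH"],
    " ": ["PAUSE"],
}
MAX_RULE_LEN = max(len(key) for key in RULES)


def text_to_phonemes(text):
    """Convert text to a list of phoneme labels, longest rule first."""
    text = text.lower()
    phonemes = []
    i = 0
    while i < len(text):
        # Try the longest possible rule at this position, then shorter ones.
        for length in range(MAX_RULE_LEN, 0, -1):
            chunk = text[i:i + length]
            if len(chunk) == length and chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += length
                break
        else:
            # No rule matched: pass letters through as a stand-in consonant
            # phoneme and silently drop anything else (punctuation, digits).
            if text[i].isalpha():
                phonemes.append(text[i].upper())
            i += 1
    return phonemes


print(text_to_phonemes("too bad"))
```

The resulting list is what you would then map to chip phoneme codes and clock out to the synthesizer one at a time.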
So, the big deal is that they now have an easy way to dump text in one side of a chip, and get something you can drop into a speaker on the other side. Plus it does it in two languages.
That's a far cry from the crap we had in the 80's. But, you should note that it's not a speech synthesizer, it's just a fast playback device, which chooses the order of playback based on text input.
As for all the people mentioning the various software speech synthesizers, don't forget SAM - "Software Automated Mouth" for the Commodore 64. I think that was actually based on something that had been around for the Atari 400/800's since the very early 80's. And none of those computers was on a single chip (though they're about as powerful as a modern watch).
A friend and I had considered making a small (EEPROM-sized) board to emulate an SC-01A. We didn't plan to put the text to speech on it, just an EEPROM full of recorded phonemes and a little microcontroller to interface to the outside world. Never got around to it - oh well.
- The Sigless Wonder
Are there any websites where you can get a review by a blind person? or anything similar?
We can talk about web standards until we are blue in the face, but when we stop certain people from being able to use the web, that's more than a failure of standards.
are a moron
Speaking of getting assistance from the network, check this out : http://www.logilab.org/narval/ if you've ever been thinking about getting some actually useful AI stuff to get work done.
http://www.kurzweiledu.com/kurzweil3000.html
We had one of these in my public library back in '92. It could scan, OCR, and read a page to you within about 10 seconds. The prosody was actually fairly good too -- you could tell it was artificial, but it didn't sound like crazy pickle face man. <Adam Sandler>Hey, I'm crazy pickle face man! I got a pickle on my face. Now give me some candy!</Adam Sandler>
Of course, it cost in the tens of thousands back then but hey, we expect that kind of technology advance don't we?
Perhaps more interesting than the voice technology is the actual chip, which is able to store 8 bits in a single EEPROM cell instead of the usual 1 bit. This is achieved by a multi-voltage technique. 256 voltage levels of course give 8-bit capacity. Voila, the actual size of the chip is potentially reduced to an eighth of that of conventional technology.
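The arithmetic behind that can be sketched in a few lines of Python. This is only a toy model: the 2.56 V full-scale voltage is a made-up number chosen so each level is 10 mV wide, and the function names are hypothetical. Nothing here comes from Winbond's datasheet.

```python
import math

LEVELS = 256            # distinct voltage levels per cell, as described above
V_MAX = 2.56            # hypothetical full-scale voltage (assumption)
STEP = V_MAX / LEVELS   # width of one level's voltage band


def bits_per_cell(levels):
    """Number of bits one cell can hold with the given number of levels."""
    return math.log2(levels)


def write_cell(byte):
    """Return the target voltage for a byte: the centre of its level band."""
    if not 0 <= byte < LEVELS:
        raise ValueError("byte must be in 0..255")
    return (byte + 0.5) * STEP


def read_cell(voltage):
    """Quantize a sensed voltage back to the nearest stored level."""
    level = int(voltage / STEP)
    return min(max(level, 0), LEVELS - 1)


print(bits_per_cell(LEVELS))            # bits stored per cell
print(read_cell(write_cell(200)))       # clean round trip
print(read_cell(write_cell(200) + 0.6 * STEP))  # too much drift: wrong value
```

The last line is the catch with packing this many levels into a cell: drift of more than half a step reads back as a neighbouring value, which matters less for audio samples than it would for program code.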
Mobile phones are already usable by blind people, and while it's certainly nice to have this added functionality in for them, I would like to see something for the deaf and hard of hearing first. As it stands, most mobile phones are completely useless to the deaf. Attaching a teletype machine is only possible on very few models, and lugging a teletype machine defeats the purpose of having a small phone. Considering that many modern digital phones have texting capability through SMS or other technologies, and some mobile networks support 711/relay service, it doesn't seem like too much trouble to add teletype support directly into a mobile phone. A chip that could demodulate the tty codes would be almost all it takes.
i wish i had some mod points so i could mod this off topic.
I've long envisioned a PDA with the only interface being spoken, rather than requiring any video component. This would bring the power consumption and delicacy of these devices down within reason for extended usage.
This chip uses 53 milliamps. Explain to me how this voice interface will be less power-hungry than an LCD and a few buttons.
I am not sure how much power a Palm display draws, but my phone (when not on a call) uses 9 milliamps without a back light, 29 milliamps with a back light. That's TOTAL, including powering the radio components and processor. Meanwhile, your 53 milliamps is just to power the speech chip. Not to mention the fact that audio speakers are very power-hungry.
Furthermore, an LCD display shows a lot more usable information in a shorter amount of time than any serialized voice interface. And for now, we will ignore the huge amounts of power required for speech recognition.
So again, explain how this speech interface will bring your PDA power consumption down?
// Alan Porter
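To put those current figures in perspective, here is a back-of-the-envelope runtime calculation in Python. The 600 mAh battery capacity is an assumption picked for illustration, and the model naively assumes each load draws its full current continuously, which a speech chip that only talks now and then would not.

```python
BATTERY_MAH = 600.0  # hypothetical battery capacity, not from the thread

# Current draws quoted in the comment above, in milliamps.
DRAW_MA = {
    "phone idle, no backlight": 9.0,
    "phone idle, backlight on": 29.0,
    "speech chip alone": 53.0,
}


def runtime_hours(capacity_mah, current_ma):
    """Hours of runtime at a constant current draw."""
    return capacity_mah / current_ma


for label, current in DRAW_MA.items():
    hours = runtime_hours(BATTERY_MAH, current)
    print(f"{label}: {hours:.1f} h")
```

Whatever capacity you plug in, the ratio stays the same: the speech chip alone draws nearly six times what the whole idle phone does.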
mchawking.com
// Alan Porter
It's already here.
60623
I like to convert text to mp3s for long journeys so I can listen to Dickens on my Rio. Of course, that takes a lot of disk space. I'd much prefer a little handheld device that simply converts the .txt file which is much smaller, to speech.
I'd pay for it, and I bet a bunch of other people would too.
OoO
Please do not publish outside of
Isn't this just a step away from an English to Mandarin translator, in itself a step away from being a universal translator for the most common Eurasian languages? You know this has military implications, with a US Tour of Duty over Taiwan. And Carnivore usage is pretty much guaranteed if this isn't already spawned from that project.
Still, doesn't this type of tech get in the way of learning new languages? Learning languages is known to foster some good brain wiring in early ages, so this tech throws that out the window if people are no longer going to need to learn the language. Just take out your C-Pen, scan the text, and it will speak the text into your bluetooth connected earpiece.
Makes me wonder if other tech that we introduce to our kids will affect them, making them too dependent on the tech. Those PalmPCs are great to help you remember things, but I know I start to rely on it and become more forgetful. Or maybe this will be a boon because we have so many things to remember that it no longer fits in our skulls. Time will tell.
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
Babelfish seems to say the sig means something like "World Above". Which just goes to show that this is nothing new, just that it has become hardware capable so that it becomes portable and fast enough to translate in real time. Definitely has military use. Of course, that's also probably badly translated, showing the limits of the tech.
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
Forgive me if I am making a mistake, but isn't this old tech? I've seen Text-2-Speech chips for sale at Radio Shack for like five bucks. Not with the Chinese, however. But, still...
Question
http://www.ironfroggy.com/
If I could get my hands on one of these, I'd hook it up to a PIC and a Compact Flash card in IDE mode, then store Project Gutenberg text files on the card with an index so that I could select a title by having the device read them to me. Sounds like a summer project...
watch out... someone's going to make a unicode goatse.cx
As written by another poster, the Amiga speech system (translator.library IIRC) did that job very well about 15 years ago. Unfortunately, because of licensing problems, it was never ported to languages other than English, or to other platforms after Commodore died.
There were speech systems for other microcomputers in the late 70s or early 80s, but the quality of the Amiga system was way superior. The chip used at that time in many toys (SP0256 IIRC) didn't implement inflection or emphasis, just a simple allophone->sound conversion table; the Amiga voice did, and it was amazing.
However, it's a nice product, and being low power it can be added to a great number of small and portable devices too.
Well, just imagine (what mess would produce) a Beowulf cluster of these things!:)
Yeah, it'd be cool to have your phones read your email to you. But, if the chip just supports English (I'm ignoring the Mandarin part, I know, but bear with me), it's a long way from making it into real cellphones.
A cellphone isn't built to serve just English people. Phone manufacturers cram multiple languages into a single phone, and the chip would have to support all those languages at once. The manufacturers do this so they can make *one* phone and sell it to all the countries in Europe. This is, of course, less of a problem in North America, but let's face it, Europe is a vastly bigger cellphone market. The last thing a phone manufacturer is going to do is throw a chip in that costs them more money but only serves English-speaking people.
Such a device will be very handy for people that have visual impairments. Instead of the current bulky and expensive kits, this will be an improvement, especially for VI users out-and-about.
What can you do? Make your web pages accessible for a start.
There is a package called IRCHA (a perl plugin) to add text to speech capabilities to xchat, bitchx and MIRC/W32 under Linux and Windows. It uses a program called MBROLA (freeware, but restricted to non-military and non-commercial use).
On the other hand, does anyone know how to use festival??? I've installed the potato packages and:
[sromero@compiler:~]$ festival
Festival Speech Synthesis System 1.4.1:release November 1999
Copyright (C) University of Edinburgh, 1996-1999. All rights reserved.
For details type `(festival_warranty)'
festival> (SayText "hola")
#
festival> (SayText "hi")
#
ARGHHH!! I can't get it running!
CU!
Phew! For a second I thought it had said "ASCII or EBCDIC text"....
Esli epei etot cumprenan, shris soa Sfaha.
Specialized chips for TTS applications have
been around for a while... The problem with
their acceptance is that they have poor voice
quality. Actually, there are two quite different
technologies available to produce speech nowadays:
1. Diphone synthesis and its variations. The idea
is to have one sample of each sound combination
(diphone) in a speechbase and produce the actual
speech by manipulating those sounds. This is what
gives the computer-synthesized, somewhat metallic speech
that most people have already heard somewhere, and
this is what is actually used in low-powered devices,
handhelds and speaking dictionaries.
2. Corpus-based synthesis. The idea is to store
a few hours of the speech of a highly trained
speaker in the speechbase and select the fragments
of this speech that suit the generation best.
The second approach gives astonishing results, with
the quality of the speech sometimes being
indistinguishable from a human. However, the size
of the speechbase is an issue. You cannot fit a
300Mb speechbase onto a handheld device yet,
and hardware optimizations don't help much when
it comes to fetching data from the speechbase
and performing text-to-phoneme conversion.
Several companies have corpus-based synthesis
demos on-line. Check out SpeechWorks' and
Lernout & Hauspie's sites.
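To make the diphone idea in point 1 concrete, here is a small Python sketch of how a phoneme sequence is cut into the diphone units that get looked up in the speechbase. The phoneme labels, the "_" silence marker and the function names are all made up for illustration; a real system would then fetch and smooth a recorded sample for each unit.

```python
SILENCE = "_"  # hypothetical marker for the silence before and after a word


def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping pairs (diphones).

    Each diphone spans the transition from one sound into the next,
    which is the part that is hard to synthesize from isolated phonemes.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_inventory_size(n_phonemes):
    """Upper bound on distinct diphones: every ordered pair of units,
    counting silence as one extra unit."""
    return (n_phonemes + 1) ** 2


print(to_diphones(["h", "e", "l", "o"]))
print(max_inventory_size(40))
```

A language with around 40 phonemes needs at most a couple of thousand short samples, which is why this approach fits in small devices while a corpus of several hours of speech does not.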
I'll be happy as long as it comes with a copy of Shit Talker built into the phone. I envision a bold new era of prank calls...
Well, not necessarily. The cell and PSTN networks are designed around carrying audio and that is still what they do best. Today, it's a toss-up as to whether it's better to approach text-to-speech from back-end servers (where you can have more flexibility), or by embedding pieces into phones, which gives you a whole new set of problems and potentially great solutions.
The problem is, the idea of using this tech in phones is fighting against hundreds and hundreds of millions of deployed telephones without any tech newer than perhaps a microchip for caller ID. Over the long-term, text-to-speech embedded in the device is the more efficient and user-controllable format. Over the short haul, though, we're going to see many years still of central-office-controlled voice apps on your phone.
Niche applications, like on a Pocket PC, now there, something like this would absolutely rock. Get a toehold, and eventually low-power text-to-speech and speech-to-text devices will be all the rage.
Now if only someone would perfect a speech-to-text engine that didn't require hours of training to recognize my accent...
Matthew P. Barnson
I learn what I think when I read what I write
Alternatively, output on a Refreshable Braille display can also be used.
At the moment, this is GSM900 only (so it is mostly a European thing for now), but with the arrival of the Nokia 9290 in the States by mid-2002, it will also be available in the US.
You can find more info on the device here.
G. Nolst Trenité
Dearest creature in creation,
Study English pronunciation.
Spot on! Not only would it have to disambiguate homonyms by semantic context, it would even need to use poetic context. Great poem!
-- MarkusQ
thanks Kynn