Cell Phones for the Deaf
nitzan writes "Quoting from the article: 'the software translates the voice on the other side of the line into a three dimensional animated face on the computer, whose lips move in real time synch with the voice allowing the receiver to lip read.' Unfortunately this only works with laptops, but a pda version is in the works." The company website has a demonstration.
This is a fantastic idea which will enable communication for the vast numbers of hearing impaired. However, if the website is any indication, the technology needs improvement. I'm pretty good at reading lips, and I was working pretty hard to figure out what was being said with the sound off.
Rather than have a computer interpret a person's speech, the software basically gives a representation of what the speaker's mouth is doing. This will allow the deaf person watching the device to do their own interpretation of what they see, which I'd imagine is much more reliable than speech-to-text could hope to be.
Someone can say "Pot" and yet with the same lip movement, can also say "My". Men with bushy mustaches are a lip-reading disaster.
For me, I've adapted in my own way: I rely heavily on my hearing aids. That combination of both lip-reading and hearing the audio stream from your mouth enables me to achieve at least a 70% success rate (under ideal conditions; if it's a party atmosphere, fudgeddaboutit). I've had hearing aids since I was 1 1/2, and only with extensive speech therapy can I speak well. I'm one of the few deaf-from-birth people that can do it this well. So, from that perspective, I can speak on a phone (as long as I can understand that mangled audio coming out the receiver, which is 0%).
Why don't they just focus on speech recognition? A great speech recognition phone would enable deaf people that speak to use phones for near real-time conversations. In addition, such technology can also be (easily?) adapted to foreign language translators for tourists.
However, until such technology is available at the consumer level, I'm stuck with two-way text messaging devices like the T-Mobile SideKick.
-Cyc
Partly, because speech to text isn't very good.
Speech to text isn't very good because it's very hard to turn phonetics into words. Our ability to understand people is very reliant on context. Knowing what's been said helps you understand what's being said.
Some will say that speech to text is getting fairly good in English, which is somewhat true. Obviously, though, there are bigger markets in other languages.
So how does this thing work, if it doesn't do speech to text? It does speech to phonetics, and phonetics to lips.
For example, it's relatively easy to understand when someone has said "h -ee- r", but knowing if that's supposed to be "here" or "hear" is quite difficult.
This is why the same software works across languages. "Th" is "Th" in any language, and your single algorithm doesn't have to care.
-Zipwow
Seems like it's not over-engineering. This is fewer steps than speech-to-text as far as I can see.
You have to record the speech and convert those sounds into phonemes. Now all you do is use the picture(s) that go with that phoneme, which is going to be more or less consistent.
With speech-to-text you have to use probability and word banks to figure out what the heck words those phonemes are supposed to go with, which is the hardest part by far, because spelling and grammar are so inconsistent. That requires a lot more time and computing power, and you are prone to a bunch more mistakes of course.
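The "picture per phoneme" step described above can be sketched as a plain table lookup. This is only a rough illustration: the phoneme symbols, the mouth-shape names, and the table itself are made up for the example and are not taken from the product being discussed.

```python
# Hypothetical phoneme-to-viseme table. The symbols and mouth-shape names
# are illustrative assumptions, not the vendor's actual data.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "th": "tongue_between_teeth",
    "ee": "spread", "ah": "open", "oo": "rounded",
    "h": "neutral", "r": "rounded", "t": "neutral", "k": "neutral",
}

def phonemes_to_visemes(phonemes):
    """Look up one mouth shape per phoneme; unknown symbols fall back to neutral."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# "h -ee- r" becomes three mouth shapes, with no decision about "here" vs "hear".
print(phonemes_to_visemes(["h", "ee", "r"]))  # ['neutral', 'spread', 'rounded']
```

Note that no word bank or probability model is involved here, which is the point being made: the lookup never has to decide which word was meant.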
Ok I'm deaf so I've actually used this
The new phones are TTY compatible. They do not have a TTY in them, but if you hook a TTY to them it actually works, whereas with the other digital phones that aren't TTY compatible (right now the majority) you get a lot of garbage characters.
Analog cell phones are unaffected and work with a TTY just fine without modification.
Pete
Posting late, but wtf.
By way of introduction: I developed the core coarticulation and other algorithms for lip synching when I worked at a now-defunct company called...wait for it...LIPSinc. We thought the resulting lip synching was pretty damn convincing, so on my own I tested out our stuff with a hearing-impaired friend, with mixed results. Anyway, I don't know a little about this stuff, I know a *lot* about it.
What these guys have done is map phonemes onto exaggerated visemes (the pictures of the mouth). Not a bad idea at all! Bunch of problems, though. First, there's a data reduction of about 3x in going from sound to video: there are 40-50 distinguishable phonemes and 9-16 distinguishable visemes, depending on how you count each. This is because the visible part of the face makes up only the end of the vocal tract. A lot of distinctions between sounds occur without the involvement of the lips, like the difference between F and V, while other sounds, like K, can be pronounced with the face in virtually any position. This is part of what makes lip reading so hard with a real person, and why lip readers need a lot of context to pull it off. They also seem to be slowing down the timing, as if they recognized the phonemes and then synthesized each at the same length. This gives the viewer longer to recognize each one, but wrecks the visual prosody (rhythm) of the speech, which is a good cue for where the parts of speech are. Then there's the rest of the face. The eyebrows and head positions help you figure out key words, ends of clauses, tell if something is a question, etc.
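To make the information loss concrete, here is a small sketch that groups words by their viseme sequence. The viseme classes (A-D), the letter-like phoneme symbols, and the word list are all assumptions picked for illustration; real viseme inventories differ depending on who is counting.

```python
from collections import defaultdict

# Illustrative many-to-one mapping: several phonemes share one visible mouth shape.
VISEME_OF = {
    "p": "A", "b": "A", "m": "A",   # lips pressed together
    "f": "B", "v": "B",             # lower lip against upper teeth
    "t": "C", "d": "C", "n": "C",   # tongue behind the teeth, mostly hidden
    "a": "D",                       # open vowel
}

# Toy pronunciations, one symbol per sound (hypothetical, not a real lexicon).
WORDS = {
    "pat": ["p", "a", "t"], "bat": ["b", "a", "t"], "mat": ["m", "a", "t"],
    "man": ["m", "a", "n"], "bad": ["b", "a", "d"],
    "fat": ["f", "a", "t"], "vat": ["v", "a", "t"],
}

def homophene_groups(words):
    """Group words whose phonemes collapse to the same viseme sequence."""
    groups = defaultdict(list)
    for word, phonemes in words.items():
        key = tuple(VISEME_OF[p] for p in phonemes)
        groups[key].append(word)
    return dict(groups)

for visemes, look_alikes in homophene_groups(WORDS).items():
    print(visemes, look_alikes)
# ('A', 'D', 'C') ['pat', 'bat', 'mat', 'man', 'bad']
# ('B', 'D', 'C') ['fat', 'vat']
```

Seven distinct words collapse to two mouth-shape sequences in this toy table, which is the kind of ambiguity a lip reader has to resolve from context.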
Those who say that speech-to-text is superior to lip reading have a point. Good speech-to-text contains *more* accurate information than an uninterpreted stream of phonemes (itself 3x richer than a stream of visemes, as I said above), because the machine can do a Viterbi search to find the most likely sequence of words from a continuous stream of phonemes. Words also open up higher NLP functions, so you can do constraint relaxation to test whether "wreck a nice beach" or "recognize speech" fits better in the context.
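For anyone curious what a Viterbi search over a phoneme stream looks like, here is a toy sketch. It is not any real recognizer: the lexicon, the phoneme symbols, and the bigram probabilities are invented for the example, and the phonemes are assumed to be recognized without errors.

```python
import math

# Hypothetical pronunciation lexicon; "hear" and "here" sound identical.
LEXICON = {
    "i": ("ay",),
    "hear": ("h", "ee", "r"),
    "here": ("h", "ee", "r"),
    "you": ("y", "oo"),
    "come": ("k", "uh", "m"),
}

# Made-up bigram probabilities P(word | previous word); "<s>" marks the start.
BIGRAM = {
    ("<s>", "i"): 0.5, ("<s>", "come"): 0.5,
    ("i", "hear"): 0.9, ("i", "here"): 0.1,
    ("come", "here"): 0.9, ("come", "hear"): 0.1,
    ("hear", "you"): 1.0, ("here", "you"): 0.2,
}

def decode(phonemes, floor=1e-6):
    """Return the most likely word sequence that exactly covers the phoneme stream."""
    n = len(phonemes)
    # best[i][w] = (log prob, backpointer) of the best path that covers
    # phonemes[:i] and ends with word w.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(n):
        for prev, (score, _) in best[i].items():
            for word, pron in LEXICON.items():
                j = i + len(pron)
                if tuple(phonemes[i:j]) != pron:
                    continue
                cand = score + math.log(BIGRAM.get((prev, word), floor))
                if word not in best[j] or cand > best[j][word][0]:
                    best[j][word] = (cand, (i, prev))
    if not best[n]:
        return []  # no sequence of lexicon words covers the whole stream
    pos, word = n, max(best[n], key=lambda w: best[n][w][0])
    words = []
    while word != "<s>":
        words.append(word)
        pos, word = best[pos][word][1]
    return words[::-1]

print(decode(["ay", "h", "ee", "r", "y", "oo"]))    # ['i', 'hear', 'you']
print(decode(["k", "uh", "m", "h", "ee", "r"]))     # ['come', 'here']
```

The same "h ee r" phonemes come out as "hear" in one sentence and "here" in the other, purely because of the neighboring words. That context is exactly what a raw phoneme or viseme stream leaves for the human to work out.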
Still, I'd like to see an experiment where the raw phonemes are fed, as text, to the recipient. I think with practice, your brain would start to decode the string (it manages with the sound, right?), despite the lack of word boundaries and the errors in phoneme detection (accuracy is not all that high without text; I think seventy-something percent). Seems like an easier pattern recognition problem than lip reading. Who wants to go get funding?
I'd also like to add that For Hearing People Only, ISBN 0-934016-1-0 is a great source of information about the complex and interesting world of Deaf people, and the language of ASL.