Cell Phones for the Deaf
nitzan writes "Quoting from the article: 'the software translates the voice on the other side of the line into a three dimensional animated face on the computer, whose lips move in real time synch with the voice allowing the receiver to lip read.' Unfortunately this only works with laptops, but a pda version is in the works." The company website has a demonstration.
I worked with deaf people for a while and they were (and I am sure still are) disappointed that cell phones are not compatible with TTY devices. How difficult is this to do?
I sure as hell couldn't tell you what they were saying, even when I knew what words were coming out of their mouths. And this is not to mention cell phone static, distractions, contractions, mumbling, and lots of "ummm" and "uhhhh" that occur during normal speech. I really don't see how this is a viable communication method.
Maybe it's because I'm not experienced with lip reading. Maybe people who are deaf are better at it than I am, but I can usually tell what football coaches are saying on the sidelines of games (of course, that's limited to "Bull****" and "You've gotta be ****ing kidding me!", but still...)
I thought it seemed a little weird at first, but then I checked out the other demos. When I knew what the words were ("Thank you" in English, German, French, Spanish, and Japanese), I could easily tell what was being said.
I notice a lot of people saying they should improve speech-to-text instead, but that is a far harder problem than this technology. Speech sounds come out in a continuous flow. Getting a computer to recognize the breaks between words, spell them properly and reliably, etc. is hard enough on a desktop system, much less a PDA, especially in languages like English, where most vowels in unstressed syllables are rendered vocally as "uh".
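To make the word-break problem concrete, here is a toy Python sketch. It is my own illustration, not anything from the article: the function name `segmentations` and the tiny vocabulary are made up, and it works on letters rather than real audio. It just lists every way an unbroken stream could be cut into known words.

```python
def segmentations(stream, vocabulary):
    """Return every way to split `stream` into words found in `vocabulary`."""
    if not stream:
        return [[]]  # one way to split nothing: no words at all
    results = []
    for end in range(1, len(stream) + 1):
        word = stream[:end]
        if word in vocabulary:
            # Keep this word and try every split of what is left.
            for rest in segmentations(stream[end:], vocabulary):
                results.append([word] + rest)
    return results


# Hypothetical five-word vocabulary, chosen to force an ambiguity.
vocabulary = {"no", "now", "where", "here", "nowhere"}

for split in segmentations("nowhere", vocabulary):
    print(" ".join(split))
```

Even with five words in the vocabulary, one short stream has three valid readings. A real recognizer has to pick among them using context; a system that only draws a mouth shape per sound never has to make that choice.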
This system simply has to hear a sound, and immediately display an associated... well, not "grapheme", since this isn't writing... maybe "pixeme". It is the graphical equivalent of attempting to spell perfectly phonetically.
Also, if you didn't notice it, "invisible" sounds that occur on the back of the tongue are indicated by circles on the cheeks (like hard 'g' and 'k'), and nasal sounds are indicated by a darkening of the nose.
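Roughly, I picture the sound-to-picture step as a plain lookup, something like the Python sketch below. To be clear, this is guesswork: the sound labels, the mouth-shape names, and the `Frame` class are all invented for illustration, and I have no idea how the company's software is actually built. The only parts taken from the demo are the cheek circles for hard 'g' and 'k' and the darkened nose for nasal sounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One picture of the animated face (hypothetical structure)."""
    mouth: str
    cheek_circles: bool = False  # cue for sounds made at the back of the tongue
    dark_nose: bool = False      # cue for nasal sounds


# Invented labels; a real system would use a proper phoneme inventory.
MOUTH_SHAPES = {
    "k": "open",
    "g": "open",
    "m": "closed",
    "n": "slightly_open",
    "ah": "wide_open",
    "oo": "rounded",
}
BACK_OF_TONGUE = {"k", "g"}
NASAL = {"m", "n"}


def frame_for(sound):
    """Map one sound to one face picture, adding the 'invisible sound' cues."""
    return Frame(
        mouth=MOUTH_SHAPES.get(sound, "neutral"),
        cheek_circles=sound in BACK_OF_TONGUE,
        dark_nose=sound in NASAL,
    )


for sound in ["g", "oo", "n"]:
    print(sound, frame_for(sound))
```

Each sound is handled on its own, with no need to know which word it belongs to, which is why this can run as fast as the sounds arrive.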
All in all, I think this is an interesting idea. It will be even cooler when they can render different faces so the "avatar" resembles the person to whom you're speaking.
bytesmythe
Hypocrisy is the resin that holds the plywood of society together.
-- Scott Meyer
Actually, speech to text is much more reliable.
Speech to text:
1. person speaks
2. software interprets the phonetics and converts them into words
3. deaf person reads the words
versus
1. person speaks
2. software interprets phonetics into picture based lip movements
3. deaf person interprets picture based lip movements
In point of fact, this is unbelievably dumb and is right up there with converting Russian to German for an English speaker to read.
... if you have this software running on a phone then if you are hearing impaired you could get real time conversation with the other party without having to go through a human being.
I've spoken with a hearing impaired person on a phone before through a TTY system and it is painfully slow. First you have to say your sentence and then they send it. Then the other end needs to read it, type in a response, and then send it, at which point it is read back to you. Imagine having a conversation over an Instant Messenger except your secretary was reading the screen and typing for you. (IM for the blind for example)
I agree that we need better voice to text and text to voice translation. That technology would give us better access for everyone. You could have "hearing" for the hearing impaired (speech to text), "reading" for the vision impaired (text to speech), and you could even have "writing" for those with fine muscle control impairment or who are lacking the necessary limbs for various reasons.
But this is an interesting approach to solve one of the three problems.
42 - So long and thanks for all the fish.
David Stork has a chapter on computer lip reading in the book "Hal's Legacy" on A.I. methods. The combination is much more reliable than either audio or visual alone.
On a more serious note, not everyone reads lips in English. If you develop this right, it works the same for any language.
Think about it like this.
If you have to say the sounds for the word "ow", what does that look like? There is a way for a computer to display this that would be pretty clear, and figuring this out would more or less require grabbing the "ow" picture group.
Now, what if you have to write something with a "ow" sound in it, but this "ow" sound might be in the middle, beginning or end of any word? The sounds all around it have an effect on it. It might be spelled "au", "ow", "ao", "ough", god-knows-what-else. There are dozens and dozens of situations where this sound might arise. Including ways you might not even consciously think about. Figuring all this out is really hard for a computer to do, because it has no AI. It doesn't know what is being talked about.
Probably mapping mouth movements isn't dead-easy, but I'd wager it is much easier than speech-to-text.
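Here is a small Python sketch of that difference. Everything in it is made up for the example (the sound labels, the picture names, the spelling lists), and it ignores all the real phonetics. It only shows that the picture side is one lookup per sound, while the spelling side multiplies.

```python
from itertools import product

# One picture per sound (hypothetical names).
PICTURE_FOR_SOUND = {"k": "k_picture", "ow": "ow_picture"}

# Several possible spellings per sound; the "ow" list is the one from above.
SPELLINGS_FOR_SOUND = {
    "k": ["c", "k"],
    "ow": ["au", "ow", "ao", "ough"],
}


def pictures(sounds):
    """Sound -> picture: exactly one answer, no context needed."""
    return [PICTURE_FOR_SOUND[s] for s in sounds]


def candidate_spellings(sounds):
    """Sound -> text: every combination of spellings is a candidate."""
    options = [SPELLINGS_FOR_SOUND[s] for s in sounds]
    return ["".join(combo) for combo in product(*options)]


word = ["k", "ow"]  # the sounds in "cow"
print(pictures(word))             # a single sequence of pictures
print(candidate_spellings(word))  # eight candidates, only one is a real word
```

Two sounds already give eight candidate spellings, and the computer needs a dictionary plus context to pick "cow". The picture version never has to choose.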
If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan
Working in a call center, I get the occasional deaf call.
It takes tremendous amounts of time, because not only does the translator have to interpret what the customer is saying, so that I can hear it, he then has to translate what I say back to the customer. It takes ages, and I'd imagine that with a cell phone, having a computer immediately translate, if slightly less accurately, would be preferable to having a human slowly (compared to the computer) enter it. Speed Vs Ease of Comprehension. Pretty common comparison. To each their own.
0110100100100000011000010110110100100000011000100
My wife is hearing impaired, so we've got a lot of HI and deaf friends. Nearly all of them use Motorola T900 two-way pagers. Even the older, less tech-savvy deaf people are comfortable with them, since it's not that different from a TTY (albeit non-interactive). They're small, handy, inexpensive, and run for quite a while on an AA battery. They're email-able so they have no problems communicating with hearing people (especially if you've got an email-capable cell phone).
The only problem is the non-interactive nature, and the fact that the email messages have to be rather small. If someone would come up with a version that could do real unit-to-unit TTY (essentially put a phone/TTY in it) in addition to email, they would sweep the market.
All this flashy lip-reading-speech-recognition crap is trying to kill a cockroach with a hand grenade.
Very well put!
I've had deaf friends, one of whom attended Gallaudet University. (Famous liberal arts college for the deaf.) In addition, I lost most of my hearing for some years as a child -- fortunately, I got it back after surgery. I've thought about deafness, and dealt with it.
Lip-reading works best for people who were hearing at one point and lost some or all of their hearing. I went deaf after I learned to talk, and went deaf slowly, which means I relied heavily upon it. People who have always been deaf often find lip-reading very difficult, or even impossible. When you have no concept of hearing or sound, trying to figure out what meaning is associated with specific lip movements is tough.
This is true of learning to read, as well. A person who was already speaking, or could read, before going deaf has no real problem with reading. If you can't hear and never have heard, though, the concept of an alphabet and "sounding it out" makes no sense. A congenitally deaf person who wants to learn to read must learn each word as a whole, much as a Chinese or Japanese person who learns to read his/her language must learn each character separately.
Since a congenitally deaf person faces a humongous task regardless of whether he/she is learning to read lips, or read and write, just which one do you think he/she would rather have to learn? In most cases, learning to read and write is going to be a lot more useful.
From where I sit, speech to text would work better for most deaf people, congenitally deaf or not.
Catherine
Those are some good points.
For my interpretation of how the lipreading worked, I was looking at the sample on the website. It appeared that certain sounds had certain pictures. Anyway, it also had something that touched on what you were talking about. It had a little bit of extra information you can't normally see, like red dots on the cheeks for g and k type sounds, and a red nose for nasal sounds, etc. Extras like these might make up for some of the missing facial cues.
The other thing I was thinking about was that the lipreading could be part of the understanding process. A good number of people (most?) who are legally deaf are not truly 100% deaf. If a person is able to get a bit of auditory information, plus this lipsynching information, it might be enough to make things a lot easier for these people, even if the lip-reading by itself were too simplistic.
If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan