Researchers Work To Perfect Computerized Lip Reading
Iddo Genuth writes "Researchers at the University of East Anglia are working to develop computerized lip-reading systems. Lip-reading is extremely hard for humans to master, but a software-based system has several benefits over even the most highly trained expert. The ultimate goal of the project is to convert lip-read speech into text. 'Apart from being extremely helpful to hearing-disabled individuals, researchers say that such a system could be used to noiselessly dictate commands to electronic devices equipped with a simple camera - like mobile phones, microwaves or even a car's dashboard. England's Home Office Scientific Development Branch ... is currently investigating the feasibility of using lip-reading software as an additional tool for gathering information about criminals or for collecting evidence.'"
1: Go in the D pod with Frank.
2: Turn off sound.
3: Plan disconnection of HAL.
4: Leave D pod.
5: Check out Slashdot's 7-year firehose backlog before executing your plans.
6: Get that sinking feeling of impending doom.
Trolling is a art,
Now we can find out what Dubya's father was REALLY saying when he said "read my lips, no new taxes"
-uso.
What you hear in the ear, preach from the rooftop Matthew 10.27b
I like how the task for which it will be used most heavily is put at the end of the summary.
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
... no more lip reading for them.
... to welcome our new lip-reading overlords, who will undoubtedly be watching us from every street camera on every corner from now on.
I've noticed a love affair with voice controlled phone systems recently, with some companies getting rid of the 'press 1, press 2' and moving totally to 'Please tell us what you're calling about'. Tellme.com is mostly to blame for this proliferation I think, but someone else makes the final call to get rid of the numbers altogether. Not a good move, imo.
Anyway, this gets me to privacy stuff. As computers try to understand us more, we'll need to interact in a more 'human' fashion - talking more, or doing things that would attract the attention of other humans (and also the computers). It's late, and I'm rambling here a bit, but remember how voice-controlled computers were going to take over a few years back? Everyone was just going to be talking to their computers to get stuff done. In reality, that would be a complete disaster in office environments, as there's generally too much noise already. Replacing all the typing you hear with voices. Ugh...
So, if I need to talk to a computer, but do it quietly, it can just read my lips, right? Or can I just mouth the words and have it understand that? I've found that when I try to 'mouth' words silently to someone across a room, I tend to exaggerate my mouth's movements, so perhaps that would be a better thing for the computers to be able to 'parse'?
I see real application for this technology in niche areas, but am not sure it'll become 'mainstream' any time soon (like, 5-10 years). We'll need to rethink our physical world - offices, cars, and such - before these sorts of new HCI systems can really be integrated into our day-to-day lives productively.
creation science book
I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do.
As with all technology, its use, more than the technology itself, will be good or bad. I can see it being useful as an auxiliary input method. This combined with speech recognition ought to be better than speech recognition alone, and of course it allows soundless input in a situation where sound isn't possible or is undesirable - though I'd imagine just lip reading would be somewhat less accurate than current speech recognition.
On the other hand, it could also be used as a tool for additional unnecessary surveillance.
Does a line appended to your comment give your post meaning in and of itself, or only in relation to those without?
So, we can look forward to new forms of repetitive strain injury, like lip strain.
Doctor: "I diagnose lip strain and recommend no kissing for 6 months."
Patient: "That's easy! I am a geek. I haven't kissed anyone since my aunt last visited me in 2001."
I am anarch of all I survey.
3b. Hope HAL doesn't have the Klingon i18n package installed.
Or...
3a. XOR the output from HAL's camera with the output from a chip manufacturing security camera. The AI porn'll distract HAL for long enough.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
to learn ventriloquism
Bringing audio and/or a transcript to silent films is another place where such technology is applicable. An excellent documentary about using computerized lip reading to do exactly that may be found via Google Video: http://video.google.com/videoplay?docid=189608705425991617&hl=en . I know it's quite early for an indirect invocation of Godwin's Law, but the documentary content is nevertheless quite related to this topic. It is entitled "Hitler Speaks" in reference to silent videos filmed in Hitler's presence.
I watched a documentary about this technology some time ago. The technology was applied to Hitler's home videos, which lacked audio. It's pretty interesting but runs about 45 minutes long. Here's the video for those who are interested.
People who don't want to be lip read by cameras can use ventriloquism. It's easy to learn the basics. The hard part is hiding the puppet.
Will future versions of speech recognition software use a web cam to improve accuracy?
About ten years ago I attended a workshop by Stanford professor David Stork. He mentioned some work on a system that was deployed for use by aircraft technicians: the system couldn't read the voice channel with the jet engine blasting away (the techs wear hearing protection). So it read lips. Ten years ago.
Sounds like TFA is talking about doing this in an embedded, consumer-electronics application, rather than a fixed, industrial-military, hire-computer-scientists-to-maintain-it thing.
Not-so-coincidentally, David Stork is the author of the book, "HAL's Legacy"...
TFA links to a paper that's actually about exaggerating lip motion to improve recognition, which seems like an interesting topic, at least new to me. But it's seemingly unrelated to the reporting or any governments protecting us from our rights.
From the Abstract:
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Would it be asking too much to have this worded as "gathering information about possible criminals"? (Or "suspected" or "alleged" would be ok.) The text quoted above, which is absent such an adjective, comes straight out of the article, and may or may not be how the Home Office refers to it, but anyone engaged in public dialog on this matter (and preferably those people when doing their research) should strive to be meticulous on this point.
As soon as one loses that little bit of description, one is able to be much more cavalier about the loss of human privacy involved. It's one thing to rough up terrorists at the airport--who doesn't want that? But "possible terrorists" is just a synonym for "everyone". So when we say it's ok to rough up possible terrorists, we're saying it's ok to rough up anyone. And we can learn to think twice about that. Likewise, when we say it's ok to surveil the lip movements of "potential terrorists", we're saying it's ok to log everyone's private conversations. So let's be clear about that.
Saying we're just watching the lip movements of criminals isn't right. If we knew they were criminals, we would (for the most part) be arresting them. (Yes, yes, we might sometimes leave them on the street to lead us to their friends. But I don't think that's the only use that this technology will be put to.)
And how long until someone's lip movements are taken as a confession? Or as a justification for an otherwise-illegal search? The word "not" doesn't involve much movement of the lips. Lip-reading "I did not kill him." could easily look like "I did kill him." Will we be telling people that in order to stay clear of these things, we need to be more clear about our lip movements, just in case they're misconstrued?
Perhaps a stiff upper lip will give way evolutionarily to stiffening of both lips when talking, just as a form of personal protection. How sad. And worse if, as seems likely, dedicated criminals eventually learn the skill of not moving their lips while talking, so that really only non-criminals become usefully tracked this way. Or perhaps it will become suspicious when one doesn't move one's lips, as it's probably inappropriately regarded by law enforcement as suspicious when one encrypts things. Then there will be the uncomfortable choice between hiding your communications and looking suspicious, or exposing your communications to misperception.
The data is out there. Lips convey meaning. So it's inevitable that this technology will occur. But the uses to which it may reasonably be put are in control of the people--at least in countries where the people have some say in government. Let's hope they build up some reasonable guidelines on appropriate vs inappropriate uses quickly.
Kent M Pitman
Philosopher, Technologist, Writer
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.