The Challenges and Threats of Automated Lip Reading
An anonymous reader writes: Speech recognition has gotten pretty good over the past several years. It's reliable enough to be ubiquitous in our mobile devices. But now we have an interesting, related dilemma: should we develop algorithms that can lip read? It's a more challenging problem, to be sure. Sounds can be translated directly into words, but deriving meaning out of the movement of a person's face is much more complex. "During speech, the mouth forms between 10 and 14 different shapes, known as visemes. By contrast, speech contains around 50 individual sounds known as phonemes. So a single viseme can represent several different phonemes. And therein lies the problem. A sequence of visemes cannot usually be associated with a unique word or sequence of words. Instead, a sequence of visemes can have several different solutions." Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
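The many-to-one problem described in the summary can be illustrated with a small sketch. The phoneme-to-viseme table and the six-word lexicon below are hypothetical and heavily simplified (real systems use 10 to 14 viseme classes and full pronunciation dictionaries); the point is only that words with different phonemes can collapse to the same viseme sequence.

```python
from collections import defaultdict

# Hypothetical, heavily simplified phoneme-to-viseme table (illustration only).
PHONEME_TO_VISEME = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "T": "alveolar", "D": "alveolar", "N": "alveolar",
    "AE": "open",
}

# Toy lexicon: word -> phoneme sequence (assumed pronunciations).
LEXICON = {
    "pat": ["P", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "mat": ["M", "AE", "T"],
    "fan": ["F", "AE", "N"],
    "van": ["V", "AE", "N"],
    "fat": ["F", "AE", "T"],
}


def viseme_sequence(phonemes):
    """Map a phoneme sequence to the viseme sequence a camera would see."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_viseme_sequence(lexicon):
    """Group words that are indistinguishable at the viseme level."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[viseme_sequence(phonemes)].append(word)
    return dict(groups)


groups = group_by_viseme_sequence(LEXICON)
for seq, words in groups.items():
    print("-".join(seq), "->", sorted(words))
```

In this toy setup six distinct words collapse into two viseme sequences, so a lip reader (human or automated) has to fall back on context to choose among the candidates.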
Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.
I'm glad I learned ventriloquism as a kid.
We're all going to have to start wearing Burkas if we want any privacy at all.
Too bad it never stopped anyone before.
Beyond the computational aspect, we also need to decide, as a society,
related dilemma: should we develop algorithms that can lip read? Of course we should; we should develop any tech. The real question is, will it be used for moral or immoral purposes?
Like moral issues have ever stopped anyone. :(
I do not fail; I succeed at finding out what does not work.
I'd be very surprised if the false positive rate were as low as 1%. Lip reading is NOT an exact science. It depends on context, clear line-of-sight, and how well the speaker enunciates. You'd be amazed how many phonemes sound different to our ears but look identical on the lips.
But hey, I'll let these guys explain it much better. Bad Lip Reading
Hilarious stuff, but the point is relevant: Without *any editing at all* of the actors' lips, they are able to perfectly match ridiculous words to those mouth movements. Why would automated software pick the "real" words over the BLR version?
Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
How much do they extend beyond that of so-called "simple" voice recognition? I suppose there are few situations where one could read lips but couldn't already listen in with current audio-amplifying equipment. As a society, we've already decided that it should exist: "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness."
Can this be used as a weapon? Yes, so can a hammer. Ban hitting people with hammers, not the hammer.
Like a city whose walls are broken down is a man who lacks self-control.
Grow a big moustache.
Governments and corporations are fictional persons. They have no "moral consciousness" of any kind, outside of rhetorical and ideological fantasy.
So, this will not be a question of moral or immoral use. It will be amoral, in the hands of those who have advanced themselves through manipulation of the aforementioned ideological rhetoric.
You continue to believe that there is hope for this modern, post-industrial society. But there is none. We as people have increased the sophistication of our tools and our reach - just as relentlessly as we have avoided the refinement of our own beings.
In the end you don't get Star Trek. You don't even get Starship Troopers. You get A Scanner Darkly. And hope there is VALIS.
"Flyin' in just a sweet place,
Never been known to fail..."
Lip reading is a lot easier than the original poster thinks. There is a lot more data available, especially within context.
There's lots of cameras deployed without microphones. Also pretty sure sound doesn't make it to geosynchronous orbit strata of the atmosphere...
You're implying we could read lips from GEO. Good luck with that. Even if the Hubble Space Telescope (which is in low Earth orbit, not geosynchronous) were pointed at the Earth, the best resolution you could manage would be about 30 cm.
http://www.spacetelescope.org/...
https://what-if.xkcd.com/32/
In theory it might be possible to read lips at GEO, but you'd need a HUGE telescope, or smaller binocular-configured telescopes with a wide-enough baseline, to get the job done.
And nitpick: there's really no "strata of the atmosphere" at GEO. Contributions there from the Earth's atmosphere are minuscule. It's pretty much plasma and magnetosphere from a few hundred km altitude on upwards.
If it weren't for deadlines, nothing would be late.
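The "HUGE telescope" claim above can be sanity-checked with the Rayleigh diffraction limit, which gives the smallest aperture D that can resolve a detail of size x at distance R: D = 1.22 * wavelength * R / x. This is a rough sketch under stated assumptions: green light at 550 nm, a guessed 1 cm of detail needed to read lips, perfect optics, and no atmospheric blurring at all.

```python
# Assumptions (not from the thread): wavelength and required lip detail.
WAVELENGTH_M = 550e-9        # green visible light
GEO_ALTITUDE_M = 35_786e3    # geostationary altitude above the equator
LIP_DETAIL_M = 0.01          # guessed detail needed to read lips (about 1 cm)


def rayleigh_aperture(distance_m, detail_m, wavelength_m=WAVELENGTH_M):
    """Smallest diffraction-limited aperture (in meters) that resolves
    detail_m at distance_m, using the Rayleigh criterion."""
    return 1.22 * wavelength_m * distance_m / detail_m


aperture_m = rayleigh_aperture(GEO_ALTITUDE_M, LIP_DETAIL_M)
print(f"Required aperture from GEO: about {aperture_m / 1000:.1f} km")
```

Under these assumptions the mirror (or interferometer baseline) would need to be on the order of 2.4 km across, which is why a wide-baseline arrangement of smaller telescopes is the only remotely plausible route.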
The most obvious approach is to combine the two methods - much like humans do, especially in noisy environments.
Right. Especially since, when you're looking at your smartphone, it's looking back at you.
This would be valuable for vehicle driver speech input, which has to reject a lot of noise.
You can't, for example, tell the difference between "nine" and "ten" by lip reading, and often either could be equally likely in the context.
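Combining the two methods, as suggested earlier in the thread, is one way around the "nine" versus "ten" ambiguity. The sketch below uses weighted log-linear (late) fusion of per-word probabilities from an audio model and a visual model. The function name `fuse`, the weight, and every probability are made up for illustration; they do not come from any real recognizer.

```python
import math


def fuse(audio_probs, visual_probs, audio_weight=0.5):
    """Weighted log-linear fusion of two per-word probability dicts.

    All probabilities must be strictly positive. Returns a normalized
    distribution over the words present in both inputs.
    """
    words = audio_probs.keys() & visual_probs.keys()
    scores = {
        w: audio_weight * math.log(audio_probs[w])
        + (1.0 - audio_weight) * math.log(visual_probs[w])
        for w in words
    }
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    unnormalized = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(unnormalized.values())
    return {w: v / total for w, v in unnormalized.items()}


# Made-up scenario: noisy audio confuses "nine" and "five",
# while the lips alone cannot separate "nine" from "ten".
audio = {"nine": 0.45, "five": 0.45, "ten": 0.10}
visual = {"nine": 0.45, "ten": 0.45, "five": 0.10}

fused = fuse(audio, visual)
best = max(fused, key=fused.get)
print(best, {w: round(p, 2) for w, p in sorted(fused.items())})
```

Neither channel is decisive on its own here, but each rules out the other's confusion, so the fused distribution favors "nine". In a noisy car the audio weight could be lowered; how to set it well is outside the scope of this sketch.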