The Challenges and Threats of Automated Lip Reading

Posted by Soulskill on Saturday September 13, 2014 @03:45AM from the surgical-masks-become-high-fashion-in-2018 dept.

An anonymous reader writes: Speech recognition has gotten pretty good over the past several years. It's reliable enough to be ubiquitous in our mobile devices. But now we have an interesting, related dilemma: should we develop algorithms that can lip read? It's a more challenging problem, to be sure. Sounds can be translated directly into words, but deriving meaning out of the movement of a person's face is much more complex. "During speech, the mouth forms between 10 and 14 different shapes, known as visemes. By contrast, speech contains around 50 individual sounds known as phonemes. So a single viseme can represent several different phonemes. And therein lies the problem. A sequence of visemes cannot usually be associated with a unique word or sequence of words. Instead, a sequence of visemes can have several different solutions." Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
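To make the many-to-one problem concrete, here is a minimal Python sketch. The phoneme-to-viseme table and the five-word lexicon are hypothetical simplifications made up for illustration (real viseme inventories vary between studies); the only grouping relied on with any confidence is that P, B and M all close the lips the same way. Words whose viseme sequences come out identical are "homophenes": the camera cannot tell them apart.

```python
from collections import defaultdict

# Hypothetical, simplified phoneme-to-viseme table using ARPAbet-style symbols.
# Published viseme inventories differ; this one only groups a few classic
# look-alikes (for example, P, B and M all close the lips the same way).
PHONEME_TO_VISEME = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "T": "alveolar", "D": "alveolar", "N": "alveolar",
    "S": "alveolar", "Z": "alveolar",
    "AE": "open_vowel",
}

# Toy pronunciation lexicon (illustrative entries only).
LEXICON = {
    "pat": ["P", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "mat": ["M", "AE", "T"],
    "ban": ["B", "AE", "N"],
    "fat": ["F", "AE", "T"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a camera would observe."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def homophenes(lexicon):
    """Group words whose viseme sequences are identical (look-alike words)."""
    groups = defaultdict(list)
    for word, phones in lexicon.items():
        groups[to_visemes(phones)].append(word)
    # Keep only viseme sequences that more than one word maps onto.
    return {seq: words for seq, words in groups.items() if len(words) > 1}


if __name__ == "__main__":
    for seq, words in homophenes(LEXICON).items():
        # prints: pat / bat / mat / ban -> bilabial open_vowel alveolar
        print(" / ".join(words), "->", " ".join(seq))
```

Even in this tiny lexicon, four of five words collapse onto one viseme sequence, while "fat" stays distinct only because F and V touch the teeth to the lip.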

18 of 120 comments

HAL 9000 by tchuladdiass · 2014-09-13 03:50 · Score: 5, Funny

Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.
Thanks Jerry Mahoney! by SternisheFan · 2014-09-13 03:50 · Score: 2

I'm glad I learned ventriloquism as a kid.
Jesus H Christ! by mark_reh · 2014-09-13 03:52 · Score: 4, Insightful

We're all going to have to start wearing burkas if we want any privacy at all.
1. Re:Jesus H Christ! by Dr_Barnowl · 2014-09-13 04:59 · Score: 2
  
  More like CV Dazzle
  A burkha will get you "profiled". Weird hair and makeup is a fashion statement.
Too bad by ArcadeMan · 2014-09-13 03:53 · Score: 5, Insightful

Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist.
Too bad it never stopped anyone before.

--
Get free satoshi (Bitcoin) and Dogecoins
How Naive by Tanuki64 · 2014-09-13 03:54 · Score: 5, Insightful

Beyond the computational aspect, we also need to decide, as a society,
1. Re:How Naive by nbauman · 2014-09-13 04:18 · Score: 2
  
  If we don't get it, the terrorists will get it first.
Re:Why should it NOT exist? by SternisheFan · 2014-09-13 04:01 · Score: 4, Insightful

related dilemma: should we develop algorithms that can lip read?
Of course we should; we should develop any tech. The real question is: will it be used for moral or immoral purposes?
Pfft by msobkow · 2014-09-13 04:09 · Score: 3, Insightful

Like moral issues have ever stopped anyone. :(

--
I do not fail; I succeed at finding out what does not work.
Re:NSA probably already has this technology by Deep Esophagus · 2014-09-13 04:10 · Score: 4, Interesting

I'd be very surprised if the false positive rate were as low as 1%. Lip reading is NOT an exact science. It depends on context, clear line-of-sight, and how well the speaker enunciates. You'd be amazed how many phonemes sound different to our ears but look identical on the lips.
But hey, I'll let these guys explain it much better: Bad Lip Reading.
Hilarious stuff, but the point is relevant: Without *any editing at all* of the actors' lips, they are able to perfectly match ridiculous words to those mouth movements. Why would automated software pick the "real" words over the BLR version?
It's already been decided... by Aldenissin · 2014-09-13 04:17 · Score: 2

Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
How much do they extend beyond that of so-called "simple" voice recognition? I suppose there are rare cases where one could listen in when current audio-amplifying equipment couldn't. As a society, we've already decided that it should exist: "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness."
Can this be used as a weapon? Yes, so can a hammer. Ban hitting people with hammers, not the hammer.

--
Like a city whose walls are broken down is a man who lacks self-control.
Re:Why should it NOT exist? by nbauman · 2014-09-13 04:17 · Score: 2

Grow a big moustache.
Re: Why should it NOT exist? by Jeremiah Cornelius · 2014-09-13 04:22 · Score: 3, Insightful

Governments and corporations are fictional persons. They have no "moral consciousness" of any kind, outside of rhetorical and ideological fantasy.
So, this will not be a question of moral or immoral use. It will be amoral, in the hands of those who have advanced themselves through manipulation of the aforementioned ideological rhetoric.
You continue to believe that there is hope for this modern, post-industrial society. But there is none. We as people have increased the sophistication of our tools and our reach, just as relentlessly as we have avoided the refinement of our own beings.
In the end you don't get Star Trek. You don't even get Starship Troopers. You get A Scanner Darkly. And hope there is VALIS.

--
"Flyin' in just a sweet place,
Never been known to fail..."
Easier than you think. by pubwvj · 2014-09-13 04:27 · Score: 2

Lip reading is a lot easier than the original poster thinks. There is a lot more data available, especially within context.
1. Re:Easier than you think. by Oligonicella · 2014-09-13 07:16 · Score: 2
  
  Try it from across the room and you don't know what the conversation is about. Do it at a bar looking for people using pickup lines and you'll get false positives. As for context, try to figure out how to inject that into the reading algorithms.
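One textbook way to "inject context" is Bayes' rule: when several words look identical on the lips, the visual likelihood is the same for all of them, so a context prior P(word | topic) decides the ranking. The sketch below assumes four look-alike candidates and invented topic priors; none of the numbers come from real data. Note the "unknown" topic: with no idea what the conversation is about, the candidates end up tied, which is exactly the across-the-room problem described above.

```python
# Four words that (under a simplified viseme table) look identical on the lips,
# so the visual evidence alone gives each the same likelihood.
CANDIDATES = ["bat", "pat", "mat", "ban"]
VISUAL_LIKELIHOOD = {w: 1.0 for w in CANDIDATES}

# Hypothetical topic-conditioned priors P(word | topic); the numbers are made up.
CONTEXT_PRIORS = {
    "baseball": {"bat": 0.6, "pat": 0.1, "mat": 0.1, "ban": 0.2},
    "yoga": {"bat": 0.05, "pat": 0.1, "mat": 0.8, "ban": 0.05},
    "unknown": {w: 0.25 for w in CANDIDATES},  # no idea what the talk is about
}


def rank(candidates, likelihood, prior):
    """Return (word, posterior) pairs sorted best-first.

    Applies Bayes' rule: P(word | lips, topic) is proportional to
    P(lips | word) * P(word | topic), then normalizes over the candidates.
    """
    scores = {w: likelihood[w] * prior.get(w, 0.0) for w in candidates}
    total = sum(scores.values())
    if total == 0:
        return []
    posterior = [(w, s / total) for w, s in scores.items()]
    return sorted(posterior, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for topic, prior in CONTEXT_PRIORS.items():
        best = rank(CANDIDATES, VISUAL_LIKELIHOOD, prior)
        print(topic, [(w, round(p, 2)) for w, p in best])
```

The hard part the comment points at is not the arithmetic but estimating a trustworthy topic prior from a distant, silent conversation.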
Re: This technology *will* exist... by ClickOnThis · 2014-09-13 04:40 · Score: 2

There's lots of cameras deployed without microphones. Also pretty sure sound doesn't make it to geosynchronous orbit strata of the atmosphere...
You're implying we could read lips from GEO. Good luck with that. Even if the Hubble Space Telescope (which is in low Earth orbit, not geosynchronous) were pointed at Earth, the best resolution you could manage would be about 30 cm.
http://www.spacetelescope.org/...
https://what-if.xkcd.com/32/
In theory it might be possible to read lips at GEO, but you'd need a HUGE telescope, or smaller binocular-configured telescopes with a wide-enough baseline, to get the job done.
And nitpick: there's really no "strata of the atmosphere" at GEO. Contributions there from the Earth's atmosphere are minuscule. It's pretty much plasma and magnetosphere from a few hundred km altitude on upwards.
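The "HUGE telescope" point can be checked with the Rayleigh diffraction limit, theta ≈ 1.22 λ / D, for a circular aperture of diameter D. This Python sketch assumes green light (550 nm), a Hubble altitude of roughly 540 km, and that about 1 cm of detail is needed to distinguish lip shapes; all three are assumptions, and the result is an idealized best case that ignores the atmosphere, motion blur and optical imperfections. It gives about 15 cm for Hubble looking straight down (the same order of magnitude as the ~30 cm quoted above) and an aperture of roughly 2.4 km for 1 cm detail from geostationary altitude.

```python
# Rayleigh criterion: smallest resolvable angle theta ~= 1.22 * wavelength / aperture.
WAVELENGTH_M = 550e-9      # assumption: mid-visible (green) light
HUBBLE_APERTURE_M = 2.4    # Hubble's primary mirror diameter
HUBBLE_ALTITUDE_M = 540e3  # assumption: roughly Hubble's orbital altitude
GEO_ALTITUDE_M = 35.786e6  # geostationary altitude above the equator
LIP_DETAIL_M = 0.01        # assumption: ~1 cm detail needed to tell lip shapes apart


def resolvable_size(aperture_m, distance_m, wavelength_m=WAVELENGTH_M):
    """Smallest feature (m) a diffraction-limited telescope can separate at distance_m."""
    theta = 1.22 * wavelength_m / aperture_m
    return theta * distance_m  # small-angle approximation


def aperture_needed(feature_m, distance_m, wavelength_m=WAVELENGTH_M):
    """Aperture (m) needed to resolve feature_m at distance_m, ignoring the atmosphere."""
    theta = feature_m / distance_m
    return 1.22 * wavelength_m / theta


if __name__ == "__main__":
    hubble = resolvable_size(HUBBLE_APERTURE_M, HUBBLE_ALTITUDE_M)
    geo = aperture_needed(LIP_DETAIL_M, GEO_ALTITUDE_M)
    print(f"Hubble looking straight down: ~{hubble * 100:.0f} cm")  # ~15 cm
    print(f"Aperture for 1 cm detail from GEO: ~{geo / 1000:.1f} km")  # ~2.4 km
```

A kilometers-wide mirror is out of the question, which is why the comment mentions separated telescopes: a wide baseline can stand in for a single enormous aperture, at least for resolution.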

--
If it weren't for deadlines, nothing would be late.
Re:Combined by Animats · 2014-09-13 07:12 · Score: 3, Insightful

The most obvious approach is to combine the 2 methods - much like humans do, especially in noisy environments.

Right. Especially since, when you're looking at your smartphone, it's looking back at you.
This would be valuable for vehicle driver speech input, which has to reject a lot of noise.
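A common way to combine the two methods is late fusion: score each candidate word separately from audio and from video, then add the log-probabilities with a weight that shifts toward the lips as the audio gets noisier. The sketch below is a minimal version of that idea, not any particular product's method; the SNR cut-offs, the linear weighting rule and the probability values are all hypothetical. The example pair exploits the fact that road noise can blur the nasals M and N for a microphone, while a camera can still see whether the lips close.

```python
import math

# Hypothetical mapping from signal-to-noise ratio (dB) to the weight given to
# the audio stream; the cut-off values are illustrative, not tuned.
SNR_FLOOR_DB = -5.0    # at or below this, trust only the lips
SNR_CEILING_DB = 20.0  # at or above this, trust only the audio


def audio_weight(snr_db):
    """Linearly interpolate an audio weight in [0, 1] from the SNR."""
    if snr_db <= SNR_FLOOR_DB:
        return 0.0
    if snr_db >= SNR_CEILING_DB:
        return 1.0
    return (snr_db - SNR_FLOOR_DB) / (SNR_CEILING_DB - SNR_FLOOR_DB)


def fuse(audio_probs, visual_probs, snr_db):
    """Pick the word with the best weighted sum of audio and visual log-probabilities."""
    w = audio_weight(snr_db)
    scores = {
        word: w * math.log(audio_probs[word]) + (1.0 - w) * math.log(visual_probs[word])
        for word in audio_probs
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up numbers: the microphone slightly prefers "nap", but the camera
    # sees the lips close, which favors "map".
    audio = {"map": 0.4, "nap": 0.6}
    visual = {"map": 0.9, "nap": 0.1}
    print(fuse(audio, visual, snr_db=0.0))   # noisy cabin, lips dominate: map
    print(fuse(audio, visual, snr_db=25.0))  # same scores, audio alone decides: nap
```

All probabilities must be nonzero here, since the log of zero is undefined; a real system would smooth them first.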
Re:NSA probably already has this technology by jonbryce · 2014-09-13 07:33 · Score: 2

You can't, for example, tell the difference between "nine" and "ten" by lip reading, and often either could be equally likely in the context.