Slashdot Mirror


The Challenges and Threats of Automated Lip Reading

An anonymous reader writes: Speech recognition has gotten pretty good over the past several years. It's reliable enough to be ubiquitous in our mobile devices. But now we have an interesting, related dilemma: should we develop algorithms that can lip read? It's a more challenging problem, to be sure. Sounds can be translated directly into words, but deriving meaning out of the movement of a person's face is much more complex. "During speech, the mouth forms between 10 and 14 different shapes, known as visemes. By contrast, speech contains around 50 individual sounds known as phonemes. So a single viseme can represent several different phonemes. And therein lies the problem. A sequence of visemes cannot usually be associated with a unique word or sequence of words. Instead, a sequence of visemes can have several different solutions." Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond those of simple voice recognition.
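To make the many-to-one problem concrete, here is a minimal sketch. The phoneme-to-viseme table and the tiny pronunciation lexicon are hypothetical and heavily simplified (ARPAbet-style symbols, a handful of made-up viseme class names); real viseme inventories differ. The only point it illustrates is the one in the quote: because /p/, /b/ and /m/ (and likewise /f/ and /v/) look alike on the lips, one observed viseme sequence maps back to several candidate words, often called homophenes.

```python
from collections import defaultdict

# Hypothetical, simplified phoneme -> viseme table. Several phonemes share
# one mouth shape, which is exactly what makes the mapping many-to-one.
PHONEME_TO_VISEME = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",   # lips pressed together
    "F": "labiodental", "V": "labiodental",              # lower lip to teeth
    "T": "alveolar", "D": "alveolar", "N": "alveolar",
    "S": "alveolar", "Z": "alveolar",
    "K": "velar", "G": "velar",
    "AE": "open", "AA": "open",
    "IH": "spread", "IY": "spread",
}

# Tiny hand-written pronunciation lexicon (illustrative only).
LEXICON = {
    "pat": ["P", "AE", "T"], "bat": ["B", "AE", "T"], "mat": ["M", "AE", "T"],
    "pad": ["P", "AE", "D"], "man": ["M", "AE", "N"],
    "fan": ["F", "AE", "N"], "van": ["V", "AE", "N"],
    "fit": ["F", "IH", "T"], "bit": ["B", "IH", "T"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a camera would see."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def homophene_groups(lexicon):
    """Group words whose pronunciations produce identical viseme sequences."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[to_visemes(phonemes)].append(word)
    return dict(groups)


def candidates(viseme_seq, lexicon):
    """Return every lexicon word consistent with an observed viseme sequence."""
    target = tuple(viseme_seq)
    return sorted(w for w, p in lexicon.items() if to_visemes(p) == target)


if __name__ == "__main__":
    observed = to_visemes(["M", "AE", "T"])  # the speaker actually said "mat"
    print(observed)                          # ('bilabial', 'open', 'alveolar')
    print(candidates(observed, LEXICON))     # ['bat', 'man', 'mat', 'pad', 'pat']
```

Even with only nine words, a single observed sequence yields five equally valid readings; with a full vocabulary and multi-word utterances the number of "solutions" grows quickly, which is why any practical system has to lean on context to choose among them.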

1 of 120 comments

  1. Re:NSA probably already has this technology by Deep Esophagus · Score: 4, Interesting

    I'd be very surprised if the false positive rate were as low as 1%. Lip reading is NOT an exact science. It depends on context, clear line-of-sight, and how well the speaker enunciates. You'd be amazed how many phonemes sound different to our ears but look identical on the lips.

    But hey, I'll let these guys explain it much better: Bad Lip Reading.

    Hilarious stuff, but the point is relevant: Without *any editing at all* of the actors' lips, they are able to perfectly match ridiculous words to those mouth movements. Why would automated software pick the "real" words over the BLR version?