The Challenges and Threats of Automated Lip Reading
An anonymous reader writes: Speech recognition has gotten pretty good over the past several years. It's reliable enough to be ubiquitous in our mobile devices. But now we have an interesting, related dilemma: should we develop algorithms that can lip read? It's a more challenging problem, to be sure. Sounds can be translated directly into words, but deriving meaning out of the movement of a person's face is much more complex. "During speech, the mouth forms between 10 and 14 different shapes, known as visemes. By contrast, speech contains around 50 individual sounds known as phonemes. So a single viseme can represent several different phonemes. And therein lies the problem. A sequence of visemes cannot usually be associated with a unique word or sequence of words. Instead, a sequence of visemes can have several different solutions." Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
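The many-to-one mapping in the quote can be made concrete with a toy example. With 10 to 14 visemes covering about 50 phonemes, each viseme stands for roughly 50/14 ≈ 3.6 to 50/10 = 5 phonemes on average. The sketch below assumes a hypothetical, much-simplified viseme table (only 6 classes, with made-up class names) and informal ARPAbet-style transcriptions; it groups words that produce identical viseme sequences and counts how many phoneme strings could lie behind one sequence.

```python
from collections import Counter, defaultdict
from math import prod

# Hypothetical, simplified phoneme-to-viseme table. Published inventories
# differ, and the summary's figure is 10 to 14 classes, not 6.
VISEME_OF = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "T": "alveolar", "D": "alveolar", "N": "alveolar",
    "K": "velar", "G": "velar",
    "AE": "open", "AA": "open",
    "IH": "spread", "IY": "spread",
}

# How many phonemes share each viseme class.
CLASS_SIZE = Counter(VISEME_OF.values())

# Toy lexicon: word -> phoneme sequence (informal ARPAbet-style symbols).
LEXICON = {
    "pat": ["P", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "mat": ["M", "AE", "T"],
    "pad": ["P", "AE", "D"],
    "man": ["M", "AE", "N"],
    "fat": ["F", "AE", "T"],
    "kit": ["K", "IH", "T"],
    "git": ["G", "IH", "T"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a camera would see."""
    return tuple(VISEME_OF[p] for p in phonemes)


def homophene_groups(lexicon):
    """Group words that look identical on the lips (same viseme sequence)."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[to_visemes(phonemes)].append(word)
    return dict(groups)


def candidate_count(visemes):
    """Number of phoneme strings that would produce this viseme sequence."""
    return prod(CLASS_SIZE[v] for v in visemes)


if __name__ == "__main__":
    for visemes, words in homophene_groups(LEXICON).items():
        print("-".join(visemes), "->", sorted(words),
              "| phoneme strings:", candidate_count(visemes))
```

Here "pat", "bat", "mat", "pad" and "man" all collapse to one bilabial-open-alveolar sequence, which even this small table lets 3 × 2 × 3 = 18 phoneme strings produce. The count multiplies with every additional viseme, which is why a sequence of visemes rarely has a unique reading.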
Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.
I'm glad I learned ventriloquism as a kid.
NSA probably already has this technology
We're all going to have to start wearing burkas if we want any privacy at all.
Too bad it never stopped anyone before.
Beyond the computational aspect, we also need to decide, as a society,
... either created in the light or not, so we had better create it in the open and decide which norms we would like to impose on it.
Anyhow, what's the difference between lip-reading technology and speech recognition that makes the first more dangerous than the second?
Turning the question around, why should it NOT exist or be looked into? At the very least it's an academic curiosity. If privacy is a concern, there's a very easy way to break the algorithm - talk whilst covering your mouth, which people have been doing whilst whispering to others for a long time. Ventriloquists would probably defeat it easily as well.
Capture: Lunatic
Like moral issues have ever stopped anyone. :(
I do not fail; I succeed at finding out what does not work.
We're trying to catch the terrorists, not dress like them.
“Common sense is not so common.” — Voltaire
If that technology is feasible, it will be developed by someone or other. And it probably already has been developed by or for various spook agencies.
we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
Privacy implications? Hahahha. If it can exist, it will. Governments and the private sector are no doubt already working on it. In our children's lifetimes there will be no such thing as private communication.
The most obvious approach is to combine the 2 methods - much like humans do, especially in noisy environments. It might improve the accuracy of current speech recognition, which is, to be honest, still sub-standard.
Speech recognition as it is now is way too limited. Sure, Siri and the like may work. And some computerized phone systems use it to nag us instead of reliable button presses. But it is still far from transcribing an accurate memo, let alone automated subtitling or other fancy applications.
So yes, please, develop it, and use it to improve overall speech recognition.
A glitch a day keeps the bugs away.
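Combining the two methods "much like humans do, especially in noisy environments" is often done as late fusion: each recognizer scores the same candidate words on its own, and the scores are mixed with a weight that trusts the camera more as acoustic noise rises. The sketch below assumes both recognizers already output log-probabilities over a shared candidate list; the logistic mapping from signal-to-noise ratio (SNR) to weight, and its `midpoint_db` and `slope` parameters, are hypothetical choices for illustration, not any particular product's method.

```python
import math


def audio_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical logistic weight: close to 1 in clean audio, close to 0 in heavy noise."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse(audio_logp, visual_logp, snr_db):
    """Late fusion: weighted sum of each stream's log-probability for every candidate word."""
    w = audio_weight(snr_db)
    return {word: w * audio_logp[word] + (1.0 - w) * visual_logp[word]
            for word in audio_logp}


def best_word(audio_logp, visual_logp, snr_db):
    """Pick the candidate with the highest fused score."""
    scores = fuse(audio_logp, visual_logp, snr_db)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # /m/ and /n/ are relatively easy to confuse by ear in noise,
    # but one closes the lips and the other does not.
    audio = {"map": math.log(0.4), "nap": math.log(0.6)}
    visual = {"map": math.log(0.9), "nap": math.log(0.1)}
    print(best_word(audio, visual, snr_db=0.0))
```

At 0 dB the visual stream gets about 95% of the weight, so "map" wins even though the audio alone leaned toward "nap". The two streams help each other because their confusions differ: sounds that blur together acoustically can look distinct on the lips, and vice versa.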
If it's been thunk, someone will...
It will happen, it's just a matter of getting the tech correct.
"If any question why we died, Tell them because our fathers lied."
Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
How much do they extend beyond that of so-called "simple" voice recognition? I suppose there are only rare cases where one could listen in with this but couldn't with current audio amplification equipment. As a society, we've already decided that it should exist: "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness."
Can this be used as a weapon? Yes, so can a hammer. Ban hitting people with hammers, not the hammer.
Like a city whose walls are broken down is a man who lacks self-control.
I seem to recall that this was done previously, but the conditions had to be good (e.g., sitting facing the camera with good lighting).
Lip reading is a lot easier than the original poster thinks. There is a lot more data available, especially within context.
Did the inventor of the camera say 'I wonder how this could be abused' or 'How awesome is this'?
Really, if you want to eavesdrop on someone, a parabolic dish or laser microphone works pretty darn well.
https://www.youtube.com/user/B...
It's certainly a worthy area of computational linguistic research. But the reason for that is that it's a very hard problem. Automated language processing, with very smart people and very motivated spy agencies working very hard at it, has taken 60 years to get to a point not quite at the level of high school language speakers.
The privacy concerns are irrelevant. The deaf will demand this, and as long as there are weak-willed politicians and judges more interested in making political statements than dispensing justice, the whims of a special interest group will always trump the rights of the majority.
We are the same species that invented the atomic bomb. If we can think of a technology, someone is already probably working on it.
The article makes it sound like an either-or thing between speech recognition and lip reading. There's no reason you can't do both to supplement current speech recognition and bring recognition rates up.
Most deaf people aren't completely deaf. They can hear, but not well enough to understand speech; if they have some hearing, they supplement it with lip reading to the point that people don't realize they are deaf. (It's counterintuitive, but if you want to get a deaf person's attention and they aren't looking at you, just knock on their desk or door; low-frequency hearing is usually the last to go.)
So it'd be harder to implement, but lip reading might improve speech recognition to the point that it's useful beyond a gimmick.
There's an amusing YouTube channel about bad lip reading:
https://www.youtube.com/user/BadLipReading
Or perhaps one of the others - the CIA would no doubt appreciate it.
You can bet your $THINGOFVALUE here that the CIA and similar organizations are already researching this if they don't have it already.
Like handwriting recognition this will be full of examples of "bad output" in the early days and there will always be cases where lack of context and/or deliberate obfuscation by the speaker makes this unreliable.
Let's just assume that this will be as reliable 5 or 10 years from now as automated face recognition is today and within 20 years both will be very reliable. What do we do about it as a society? Do we pass laws and adopt social norms such that only "authorized" people can use this technology? Do we pass laws requiring that people be put on notice if their lips are being read by a computer without a court order or something similar? Do we become a society where people just expect that anything they say in public will be picked up and understood by a computer, likely in real-time?
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
There was a European voice recognition product from a non-English-speaking country. The company began supporting other languages, and then suddenly a U.S. product that did the same thing appeared. The European company was convinced it was espionage. They ended up with a product they couldn't sell, so they began giving it away, and it slowly died off. IBM began its own voice recognition product, which was not very good and was quickly replaced by Microsoft's voice recognition software. The product that took the lead was called Dragon NaturallySpeaking; it secretly collected voice data and text corrections and sent large parts of users' documents to the software company for "improving the product's correction rate" without the users' knowledge. The product was sold to customers using the Microsoft Windows operating system and the Apple system. These products are very intrusive spyware and should not be used in the medical profession, in government, or by companies that deal with confidential information. Do you Apple users and Microsoft users really need any more spyware products? P.S. Look at the "lip reading" document: it is spoken, not typed. It uses U.S. speech, as in "gotten" rather than the typed word "got". You have got or you have not got. There is no "gotten" in written English; it is "has got" or "has not got". Speech recognition software is spyware, and it makes English speakers' spoken words look childish in text because they are clearly spoken and not written. I am not English, and I believe I write better than most of these U.S. people using voice recognition spyware software.
'D' and 'T', 'G' and 'K', and even 'P' and 'B' are frequently all but impossible to discern by lip-reading alone, and can only ever really be discerned when one of the alternatives simply does not make any sense. But this is not always the case.
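The point that look-alike consonants are resolved only when one reading makes no sense can be sketched as two filters: discard spellings that are not words, then let the surrounding context vote. The vocabulary and bigram counts below are hypothetical stand-ins for a real lexicon and language model, and the sketch works on letters rather than phonemes purely to keep it short.

```python
from itertools import product

# Consonant pairs the comment says are hard to tell apart by lip reading alone.
LOOKALIKES = {"d": "dt", "t": "dt", "g": "gk", "k": "gk", "p": "pb", "b": "pb"}

# Hypothetical vocabulary and bigram counts standing in for a language model.
VOCAB = {"bat", "pat", "pad", "bad", "tab", "dab"}
BIGRAM_COUNTS = {
    ("baseball", "bat"): 40,
    ("baseball", "pat"): 1,
    ("not", "bad"): 30,
    ("a", "pat"): 12,
}


def spellings(observed):
    """Every spelling that looks the same on the lips as the observed word."""
    options = [LOOKALIKES.get(ch, ch) for ch in observed]
    return {"".join(chars) for chars in product(*options)}


def disambiguate(observed, previous_word):
    """Drop non-words, then prefer the word seen most often after previous_word.

    Returns None if no look-alike spelling is in the vocabulary. When the
    context gives no evidence (all counts zero), the alphabetically first
    candidate is returned, i.e. the choice is effectively arbitrary.
    """
    candidates = sorted(w for w in spellings(observed) if w in VOCAB)
    if not candidates:
        return None
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0))


if __name__ == "__main__":
    print(disambiguate("bat", "baseball"))
    print(disambiguate("pat", "not"))
```

After "baseball" the sketch picks "bat" and after "not" it picks "bad". With an unfamiliar previous word every count is zero and the pick is arbitrary, which is the "not always the case" situation the comment describes.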
File under 'M' for 'Manic ranting'
It's math. You want it banned now?
Ah, right... encryption technology is good, decrypting something all of us could do if we wanted to, bad. Logic rules here!
Sorry to break it to you, but society not only doesn't "need" to make this decision, it has no right to make this decision. You don't get to decide what other people invent, and for the most part not even what it is used for.
That holds only for an extravagantly optimistic definition of 'pretty good'. Speech recognition systems still have lots of problems with individual accents, background noise, colds and unlimited contexts. They are incrementally better than they were ten years ago, but their usefulness is still pretty limited. We are still far from achieving the level that can be seen in shows like Star Trek and derivatives. Things like Siri and co. are nice toys, but it is still faster, in most cases, to use the keyboard.
Beyond the computational aspect, we also need to decide, as a society, if this is a technology that should exist. The privacy implications extend beyond that of simple voice recognition.
Morals only apply if you can't afford to have them waived. Since this technology will be used by government and large businesses, there is no question it will be used when it works effectively, and that any concerns one may have could only exist because they are a terrorist and have something to hide.
If lip reading software reaches the courts, suddenly all video recording becomes wiretapping. The courts might resolve that by allowing audio recording wherever they allow video recording. Or by forbidding video recording wherever they forbid audio recording. Or maybe they will finally do something about that ancient "wiretapping" deal they've been twisting into the modern world.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
http://www.telegraph.co.uk/news/uknews/1534830/New-technology-catches-Hitler-off-guard.html
New computer software that can read lips at almost any angle has helped make sense of one of the Second World War's lingering mysteries —Hitler's home movies.
The technology that has allowed the dialogue to be reconstructed is called ALR — automated lip reading — and has been developed by Frank Hubner, a speech recognition expert. The computer recognises shapes that lips make, turns them into sounds and matches these to a dictionary.
I've seen the documentary BTW
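The Telegraph article summarizes the system as recognizing lip shapes, turning them into sounds and matching these to a dictionary, but it does not say how the matching works. A minimal sketch of that last stage, assuming recognition errors mean the decoded sound sequence may not exactly equal any dictionary entry, is to rank entries by edit (Levenshtein) distance. This is a generic, swapped-in technique, not Frank Hubner's method, and the dictionary and sound labels are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + int(x != y)))  # substitute x with y
        prev = curr
    return prev[-1]


# Hypothetical dictionary: word -> sequence of sound labels (ARPAbet-style).
DICTIONARY = {
    "hello": ("HH", "AH", "L", "OW"),
    "yellow": ("Y", "EH", "L", "OW"),
    "fellow": ("F", "EH", "L", "OW"),
    "below": ("B", "IH", "L", "OW"),
}


def closest_entries(observed, dictionary):
    """Return every word tied for the smallest distance, plus that distance."""
    distances = {word: edit_distance(observed, sounds)
                 for word, sounds in dictionary.items()}
    best = min(distances.values())
    return sorted(w for w, d in distances.items() if d == best), best


if __name__ == "__main__":
    # A decoded sequence with the final sound missing.
    print(closest_entries(("HH", "AH", "L"), DICTIONARY))
```

Returning all tied entries rather than a single winner is deliberate: a noisy sequence such as `("HH", "EH", "L", "OW")` sits one edit away from "hello", "yellow" and "fellow" alike, so the matcher alone cannot choose and some form of context has to.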
Old George Carlin joke:
Here’s a good example of practical humor, but you have to be in the right place. When a local television reporter is doing one of those on-the-street reports at the scene of a news story, usually you’ll see some onlookers in the background of the shot, waving and trying to be seen on television. Go over and stand with them but don’t wave. Just stand perfectly still and, without attracting attention, move your lips, forming the words, “I hope all you stupid fuckin’ lip-readers are watching. Why don’t you just blow me, you goofy deaf bastards.” The TV station will enjoy taking the many phone calls.
I spent 10 years clinically deaf before my first cochlear implant. I functioned in the hearing world entirely through lipreading. I never learned sign language.
The ethical question here is entirely mooted by the fact that competent lipreaders (i.e., me) have existed for a long time. People, right now, effectively guard themselves against lipreaders if they feel the need.
Have you never noticed that when the catcher or manager approaches the pitcher's mound, that the pitcher almost always puts his glove over his mouth when he talks? Or on the sidelines of a football game, surely you've seen the coach put his play clipboard over his mouth when talking into his headset. Sometimes the headset itself is large enough to cover his lips. Annoying! My parlor trick stops working!
Being able to read lips can be a lot of fun (ask my wife how I knew to ask her out when we first met) but protecting yourself from it is trivial and happens all the time already.
It's a load of garbage anyway. There's nothing this technology does to invade privacy that we can't already do.
If you're in the open, then use a parabolic mic to pick up the conversation you're clearly already taping.
If you're behind some glass, then use a laser microphone to pick up the conversation; while it sounds James Bond-ish, it actually already exists.
As a society we're already too little too late on the privacy side.
...is a little monitor that hangs over your lips, showing a silent movie of your lips saying (in a loop) "I suspect I'm under surveillance" while underneath, you can be saying anything you like. :)
I've fallen off your lawn, and I can't get up.
Automated lips?
All that is required is a camera connected to a computer, with the correct software. Whether this technology "should" exist is irrelevant. Someone will eventually develop it. You cannot prevent it - the required hardware can be legally purchased at any number of stores around the world. The software required can be written in pretty much any language, including those whose compilers or interpreters are available at no cost.
Assume that the technology will exist, if it does not already exist, and set out rules for the use thereof.
What a bunch of arrogant tripe. If it can be invented, it will be, whether you like it or not. The question you should be asking yourself---excuse me, "we as a society"---is, once the technology is invented, how can we ensure it isn't abused?
I can see how this would be great for deaf people, using something like Google Glass to get subtitles of conversations around them. How about making something for people who can't speak but can form the words with their mouths? It might need something like a mic, but with video or lasers for reading the facial movements, that outputs the words to a speaker.
Sure, it will get used for bad, but that is going to happen regardless. So how about we do some good with it and help out disabled people with some nice technology to make their lives more like ours?
Be seeing you...
...that the Clandestine people (you're so surreptitious!) developed this tech like yesterday.
WE ARE DOOMED! ALL DOOMED!
"If you want a vision of the future, imagine a boot stamping on a human face - forever.", George Orwell
Could augment by adding other sensors such as microwave, laser or terahertz imaging, to detect signals being generated by tongue and vocal cords, or even to directly image the organs themselves.
Also, it seems possible that since the whole head vibrates, reflections or motions of the eyes, nose, lips and forehead might provide vibratory cues.
The most obvious approach is to combine the 2 methods - much like humans do, especially in noisy environments.
Obvious, indeed. There's already a textbook for the subject, Multimodal Signal Processing...available for free online, no less.
This is exactly the sort of system you'd want on a flight deck, to supplement the accuracy of speech-recognition in the presence of noise, especially intermittent noise such as turbulence. It can also help with speaker identification.
As for the hopelessly naive idea that "society" should be able to choose whether this sort of thing should exist...the textbook came out in 2009.
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
If recent history teaches anything about technology, it's that if something is technically possible - and it seems highly improbable that automated lip-reading isn't - someone WILL do it. Further, that if it's not actually illegal to do so, someone will make it commercially available in the civil domain. And that if it's made illegal in the civil domain, that's very unlikely to stop the security community, in all its sundry forms, from weaponising it (sorry, my Orwellian paranoia is clearly in overdrive; that should, of course, have been "deploying and using it for the overall good of society"). And even if it's illegal in some jurisdictions, it won't be illegal worldwide anyway.