Reading Lips In Software
SEWilco writes "The Register points out that Intel has released code for reading lips from a video image, Audio Visual Speech Recognition (AVSR). They do point out that better results would probably be achieved by combining video and audio recognition processing. I don't know if they have any patents, but we all know of some prior "art" from 2001, er... 1968. HAL's accomplishment was also mentioned by CNN during 2001 in an article about this group's work."
Men and women, boys and girls. All with really thick, dirty, obscuring mustaches.
What is this world coming to?
Go calculate something
Oh wait, that was a different lip reading session...
If brevity is the soul of wit, then how does one explain Twitter?
Maybe now, with a cluster at our fingertips and this audio-visual lip analyser thing, we may (finally) be able to understand what all those heavily accented foreign professors are actually mumbling about...
And, well, it beats manual note-taking if the computer can read the board, his mouth, and his voice.
A couple of months ago, a very fine article was posted to /. about work at MIT on speech-to-video synthesis using pre-recorded syllables. This means that in the near future we'll be able to have avatars that can communicate with other people by videophone and/or with other computers, should we wish to do so. I'm reposting the old link because it was /.'ed and stayed down for about 2 months (the professor took down the link) before the vids went back up. So check out the amazing work that's on the flip side of this article.
http://cerboli.mit.edu:8000/research/mary101/results/results.html
-Christopher Wu
http://www.christopherwu.net/
Body language should be even easier than lip reading. I want to know if I'm wasting my time or whether I should invite her back to my place.
Wow, that must have taken a lot of hard work to do. First you'd have to recognize the location of the lips in the images (they might not stand out that much, especially in a crowd scene), then find the region in which the lips are moving, then finally use the positions of the lips to extrapolate the current shape of the inside of the person's mouth and make a haphazard guess at the sound being produced. And you'd need to be able to recognize the lips from any angle whatsoever. Sounds near impossible to me... and besides, by the time the person is beyond the range of a security camera's audio pickup (I'm assuming that's what this would be used for), the image resolution would also be too poor to read anything (unless the target is in a crowd, in which case the lips would frequently be obscured by people moving around in front of the target).
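For the "find the region in which the lips are moving" step, here is a minimal sketch using plain frame differencing. This is not Intel's AVSR method; the function name `moving_region`, the threshold of 30, and the toy frames are all assumptions made up for illustration. Frames are just 2D lists of grayscale values.

```python
# Hypothetical sketch: frames are 2D lists of grayscale values (0-255).
# The name moving_region and the threshold are illustrative assumptions,
# not anything taken from the AVSR code.

def moving_region(prev_frame, next_frame, threshold=30):
    """Return (top, left, bottom, right) of the changed pixels, or None."""
    rows, cols = [], []
    for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for x, (a, b) in enumerate(zip(prev_row, next_row)):
            # A pixel counts as "moving" if its brightness changed enough.
            if abs(a - b) > threshold:
                rows.append(y)
                cols.append(x)
    if not rows:
        return None  # nothing moved between the two frames
    return (min(rows), min(cols), max(rows), max(cols))


# Two tiny 5x6 frames in which only a 2x3 patch (the "mouth") changes.
frame_a = [[10] * 6 for _ in range(5)]
frame_b = [row[:] for row in frame_a]
for y in (2, 3):
    for x in (1, 2, 3):
        frame_b[y][x] = 200

print(moving_region(frame_a, frame_b))  # (2, 1, 3, 3)
```

Anything else that moves in the shot (a head turn, someone walking past) ends up in the same bounding box, which is pretty much the crowd problem described above.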
... but I think it is interesting that Arthur C. Clarke thought HAL reading lips was the only implausible scene in the film. You know, as opposed to the whole aliens thing. :P Just goes to show you the perils of trying to predict the future...
There is no excellent beauty that hath not some strangeness in the proportion. -- Francis Bacon
ersonally-pay, i-ay(?) erfer-pay o-tay use pig latin.
geeze, that really wasn't worth the effort...
No, he never did. If he had, he would almost certainly by now be far and away the richest man on the planet. Now, imagine if you will what Arthur Clarke might have done with a fortune that would make Gates green with envy... He'd have been on Mars twenty years ago.
Real Daleks don't climb stairs - they level the building.
patents are supposed to be on inventions, not ideas. (very) generally speaking, you have to demonstrate you know how to do something for it to count as prior art. actually building something counts, as does a patent application (since the patent application has to explain how the invention works at a reasonable level of detail, for an admittedly arguable legal definition of reasonable).
ianal, but the last i heard, a mention in a science fiction book or movie wouldn't typically be considered prior art. a person skilled in the art can't tell from 2001 how to make a computer read lips.
I have taken many years of ASL classes and am pretty involved with Deaf culture; one of the biggest myths about it is people's ability to read lips.
;)
The idea most people have of lipreaders, like in the movie See No Evil, Hear No Evil (the Richard Pryor/Gene Wilder comedy) or the Seinfeld lipreader episode, just really isn't possible. Many sounds such as "t" and "d" look exactly the same, and many such as "k" and "g" are not visible at all. The best lipreaders can really only get 2/3 of what is being said if they are entirely Deaf (many Deaf people are not, and if your hearing loss is not total, lipreading can be far more effective), and that is with the person speaking slowly and facing them, plus human intuition (context). Throw in facial contortions (like yelling: "they can't hear me, so if I yell it will help"), low light, a bad angle, fast talking, etc., and the accuracy drops dramatically.
Computers lack the ability to figure out from context which word is being said when the lips don't provide adequate information. They have also historically been terrible at things like complex image recognition. What is registration-script busting based on? Image recognition with noise in the image (i.e. type the word that appears in the next form box). And no one has even come close to a functional computer ASL interpreter, even though ASL is far easier to distinguish visually than speech.
I don't see the 40% word error rate it currently has improving much at all, and I'm guessing the video feed that figure comes from isn't anything like full-speed, non-exaggerated human speech.
Your fears of the video cameras on the streets logging your conversations are pretty unfounded.
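For anyone wondering what a 40% word error rate means in practice, here is a minimal sketch of the usual definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The function name `word_error_rate` and the sample sentences are assumptions for illustration, not output from the system discussed here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Two wrong words out of five gives a 40% error rate.
print(word_error_rate("open the pod bay doors", "open a pod day doors"))  # 0.4
```

At that rate, two words in every five come out wrong, so a logged conversation would be closer to a word puzzle than a transcript.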