Reading Lips In Software
SEWilco writes "The Register points out that Intel has released code for reading lips from a video image, Audio Visual Speech Recognition (AVSR). They do point out that better results would probably be achieved by combining video and audio recognition processing. I don't know if they have any patents, we all know some prior "art" from 2001, er.. 1968. HAL's accomplishment was also mentioned by CNN during 2001 in an article about this group's work."
I don't know if they have any patents, we all know some prior "art" from 2001
Just in case anyone gets the wrong idea here, copyrighted works cannot be used to contravene a patent.
Software and business model patents have evidently affected comprehension of what a patent entails.
"A computer, examining a set of video images, to perform lip reading" is not patentable. HAL would be prior art for this; but it doesn't matter because there isn't any inventive step here anyway.
"A computer, processing a set of video images by locating what appears to be a set of lips, selecting recognizable points, using the movement of those points to track the deformation against a 3D model, comparing against a table of syllables to compute the probability of each particular syllable, and using knowledge about a language to determine which syllables are most likely to follow each other" could be patented. HAL would not be prior art for this, because there is no indication of how HAL performed the lip reading.
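To make the last two steps of that description concrete (per-syllable probabilities plus knowledge of which syllables tend to follow each other), here is a minimal sketch using Viterbi decoding. This is only an illustration, not Intel's method: the function name, the syllables, and every probability below are made up for the example.

```python
import math

FLOOR = 1e-9  # stand-in probability for anything the tables do not list


def decode_syllables(frame_probs, bigram, start):
    """Return the most likely syllable sequence (Viterbi decoding).

    frame_probs: list of dicts, one per video segment, mapping each
                 candidate syllable to its probability given the lip shape.
    bigram:      dict mapping (previous, next) to the probability that
                 `next` follows `previous` in the language.
    start:       dict mapping a syllable to its probability of coming first.
    """
    if not frame_probs:
        return []
    # best[s] = (log probability, path) of the best sequence ending in s
    best = {
        s: (math.log(max(start.get(s, FLOOR), FLOOR))
            + math.log(max(p, FLOOR)), [s])
        for s, p in frame_probs[0].items()
    }
    for probs in frame_probs[1:]:
        step = {}
        for s, p in probs.items():
            # Choose the best predecessor for syllable s.
            score, path = max(
                (prev_score + math.log(max(bigram.get((prev, s), FLOOR), FLOOR)),
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[s] = (score + math.log(max(p, FLOOR)), path + [s])
        best = step
    return max(best.values())[1]


# "p" and "b" look alike on the lips, so the visual scores are tied;
# the (hypothetical) language table breaks the tie.
frames = [{"pa": 0.5, "ba": 0.5}, {"per": 0.5, "ber": 0.5}]
follows = {("pa", "per"): 0.3, ("ba", "ber"): 0.05}
first = {"pa": 0.5, "ba": 0.5}
print(decode_syllables(frames, follows, first))
```

The visual evidence alone cannot separate the candidates here; the choice comes entirely from the table of which syllables follow which.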
patents are supposed to be on inventions, not ideas. (very) generally speaking, you have to demonstrate you know how to do something for it to count as prior art. actually building something counts, as does a patent application (since the patent application has to explain how the invention works at a reasonable level of detail, for an admittedly arguable legal definition of reasonable).
ianal, but the last i heard, a mention in a science fiction book or movie wouldn't typically be considered prior art. a person skilled in the art can't tell from 2001 how to make a computer read lips.
I am deaf, and you're pretty much right, at least some of the time. Without context, I find lipreading very hard to impossible; with context I can get maybe 80% of the words and can fill in the blanks.
I know others can lipread better than I can, but even in lipreading class they said that you won't be able to catch everything and will have to fill in the blanks.
Just to note, not all Deaf people can lipread, and not all people can be lipread. Bushy mustaches and not moving your mouth when you talk are two big obstacles. (A personal peeve is when someone expects me to lipread them.)
Pete
It's been done at Carnegie Mellon as well.
I have taken many years of ASL classes and am pretty involved with Deaf culture; one of the biggest myths about it is people's ability to read lips.
;)
The idea most people have of lipreaders, like in the movie See No Evil, Hear No Evil (the Richard Pryor and Gene Wilder comedy) or the Seinfeld lipreader episode, just really isn't possible. Many sounds such as "t" and "d" look exactly the same, and many such as "k" and "g" are not visible at all. The best lipreaders really can only get 2/3 of what is being said (if they are entirely Deaf, which many Deaf people are not; if your hearing loss is not total, lipreading can be far more effective), and that is with the person speaking slowly, facing them, and with human intuition (context) filling in the gaps. Throw in facial contortions (like yelling... "they can't hear me so if I yell it will help"), low light, a bad angle, fast talking, etc., and the accuracy drops dramatically.
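A rough way to see why look-alike sounds hurt so much: collapse each sound to its visible mouth shape and check which words become identical. The table and the phoneme spellings below are hypothetical and heavily simplified, just a sketch of the idea.

```python
from collections import defaultdict

# Hypothetical, simplified table: sounds that look alike share a label.
VISEME = {
    "p": "lips-closed", "b": "lips-closed", "m": "lips-closed",
    "t": "tongue-tip", "d": "tongue-tip",
    "k": "hidden", "g": "hidden",  # made at the back of the mouth
}

# Rough phoneme spellings, made up for the example.
LEXICON = {
    "tie": ["t", "ai"], "die": ["d", "ai"],
    "pat": ["p", "a", "t"], "bat": ["b", "a", "t"],
    "mat": ["m", "a", "t"], "bad": ["b", "a", "d"],
    "coat": ["k", "o", "t"], "goat": ["g", "o", "t"],
    "see": ["s", "i"],
}


def viseme_key(phonemes):
    """Replace each sound by what a lipreader can actually see."""
    return tuple(VISEME.get(p, p) for p in phonemes)


def confusable_groups(lexicon):
    """Group words that look identical once sounds are collapsed."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[viseme_key(phonemes)].append(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]


for group in confusable_groups(LEXICON):
    print(group)
```

Even in this tiny word list, "pat", "bat", "mat" and "bad" all end up looking the same, which is why context has to do so much of the work.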
Computers lack the ability to figure out what word is being said based on context when the lips don't provide adequate information. They are also historically terribly poor at things like complex image recognition. Registration script busting is based on what? Image recognition with noise in the image (i.e. type the word that appears in the next form box). And no one has even come close to a functional computer ASL interpreter, even though ASL is far easier to distinguish visually than speech.
I don't see the 40% word error rate it currently has improving much at all, and I'm guessing the video feed that number comes from isn't anything like full-speed, non-exaggerated human speech.
Your fears of the video cameras on the streets logging your conversations are pretty unfounded.
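For anyone wondering what a 40% word error rate means in practice: it is usually computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch follows; the function name and the example sentences are mine, not from the Intel release.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # reference word dropped
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution, or free if equal
            ))
        prev = cur
    return prev[-1] / len(ref)


# Two wrong words out of five gives a 40% word error rate.
print(word_error_rate("open the pod bay doors", "open the bad bay door"))
```

At that rate, two words in every five are wrong, which is roughly what the example sentence shows.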