Reading Lips In Software

Posted by timothy on Monday April 28, 2003 @10:36AM from the hey-cutie dept.

SEWilco writes "The Register points out that Intel has released code for reading lips from a video image, Audio Visual Speech Recognition (AVSR). They do point out that better results would probably be achieved by combining video and audio recognition processing. I don't know if they have any patents, we all know some prior "art" from 2001, er.. 1968. HAL's accomplishment was also mentioned by CNN during 2001 in an article about this group's work."
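The point about combining video and audio can be made concrete with a small sketch of late fusion, one common way to combine two recognizers. Each candidate word gets a weighted sum of its audio and visual log-likelihoods, λ·log P(audio | w) + (1 − λ)·log P(video | w). This is not Intel's AVSR code. The function names, the weight λ (`audio_weight`), and every probability below are hypothetical, chosen so that noisy audio confuses "mail" with "nail" while the lips (closed for /m/, open for /n/) tell them apart.

```python
import math


def fuse_scores(audio_logp, video_logp, audio_weight):
    """Late fusion: weighted sum of per-word log-likelihoods from two streams.

    audio_weight is a hypothetical tuning knob in [0, 1]; a noisy room would
    call for a lower value so the visual stream counts for more.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    return {
        word: audio_weight * audio_logp[word] + (1.0 - audio_weight) * video_logp[word]
        for word in audio_logp
    }


def best_word(scores):
    """Return the candidate with the highest fused score."""
    return max(scores, key=scores.get)


# Made-up scores: the noisy audio slightly prefers "nail", but the lips
# visibly close for the /m/ in "mail" and stay open for the /n/ in "nail".
AUDIO = {"mail": math.log(0.45), "nail": math.log(0.55)}
VIDEO = {"mail": math.log(0.80), "nail": math.log(0.20)}

if __name__ == "__main__":
    print(best_word(fuse_scores(AUDIO, VIDEO, 1.0)))  # audio only: prints nail
    print(best_word(fuse_scores(AUDIO, VIDEO, 0.5)))  # equal weighting: prints mail
```

Setting `audio_weight` to 1.0 reproduces an audio-only recognizer; in a real system the weight would presumably be tuned to the acoustic noise level.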

39 of 149 comments

The only hope for privacy: by burgburgburg · 2003-04-28 10:40 · Score: 5, Funny

Thick mustaches.
Men and women, boys and girls. All with really thick, dirty, obscuring mustaches.
What is this world coming to?
1. Re:The only hope for privacy: by deadsaijinx* · 2003-04-28 11:05 · Score: 2, Funny
  
  No, this is the era of ventriloquism: the ventriloquists will rise up against those who mocked them, and their plans for unholy vengeance will go unnoticed as our only safety net collapses.
  
  --
  YOU SUCK BALLS!
Good or Evil? by Blaine+Hilton · 2003-04-28 10:41 · Score: 5, Insightful

That's all we need: now everybody and his brother can easily create software applications to log everything. Security cameras record a lot of movements, but imagine hooking that up to lip readers and then being able to grep through all of that text output? Total Information Awareness, here we come...
Go calculate something
No new taxes! by Shadow+Wrought · 2003-04-28 10:41 · Score: 5, Funny

Oh wait, that was a different lip reading session...

--
If brevity is the soul of wit, then how does one explain Twitter?
What about changing what people say? by KPU · 2003-04-28 10:46 · Score: 2, Interesting

Anybody else reminded of the Read My Lips videos that fit clips to songs?
1. Re:What about changing what people say? by PeteEMT · 2003-04-28 12:40 · Score: 3, Informative
  
  I am deaf, and you're pretty much right, at least some of the time. Without context, I find lipreading very hard to impossible; with context I can get maybe 80% of the words and fill in the blanks.
  I know others can lipread better than I can, but even in lipreading class they said that you won't be able to catch everything and will have to fill in the blanks.
  
  Just to note: not all Deaf people can lipread, and not all people can be lipread. Bushy mustaches and not moving your mouth when you talk are two big obstacles (a personal peeve when someone expects me to lipread them).
  
  --
  Pete
Woot, this is a godsend for us college students. by yeoua · 2003-04-28 10:46 · Score: 5, Funny

Maybe now, with a cluster at our fingertips and this audio-visual lip analyser thing, we may (finally) be able to understand what all those heavily accented foreign professors are actually mumbling about...

And, well, it beats manual note-taking if the computer can read the board, his mouth, and his voice.
So computers can now talk to themselves (Re /.) by skermit · 2003-04-28 10:50 · Score: 5, Interesting

A couple of months ago, a very fine article was posted to /. about work at MIT on speech-to-video synthesis using pre-recorded syllables. This means that in the near future we'll be able to have avatars that can communicate with other people by videophone and/or with other computers, should we wish to do so. I'm reposting the old link because it got /.'ed for about 2 months (the professor took down the link) before the vids were put back up. So check out the amazing work that's on the flip side of this article.

http://cerboli.mit.edu:8000/research/mary101/results/results.html

--
-Christopher Wu
http://www.christopherwu.net/
1. Re:So computers can now talk to themselves (Re /.) by haroldhunt · 2003-04-28 11:45 · Score: 2, Interesting
  
  Great! So computers can talk to themselves but they still haven't got anything to say.
Body language by Smallpond · 2003-04-28 10:52 · Score: 4, Funny

Body language should be even easier than lip reading. I want to know if I'm wasting my time or whether I should invite her back to my place.
1. Re:Body language by Suchetha · 2003-04-28 11:15 · Score: 5, Funny
  
  simple.. you're posting on /. .. face it.. you're wasting your time
  
  Suchetha
  
  --
  
  learn from yesterday, plan for tomorrow, party tonight
  or one out of three ain't bad
Some coding expertise... by flamingspinach · 2003-04-28 10:53 · Score: 4, Insightful

Wow, that must have taken a lot of hard work. First you'd have to find the lips in the images (they might not stand out that much, especially in a crowd scene), then find the region in which the lips are moving, then use the positions of the lips to infer the current shape of the inside of the person's mouth, and finally make a rough guess at the sound being produced. And you'd need to be able to recognize the lips from any angle whatsoever. Sounds near impossible to me... Besides, once the person is beyond the range of a security camera's audio pickup (I'm assuming that's what this would be used for), the resolution would also be too poor to read their lips (unless the target is in a crowd, in which case people moving around in front of the target would frequently obscure the lips).
1. Re:Some coding expertise... by flamingspinach · 2003-04-28 10:55 · Score: 2, Interesting
  
  Hey, and what about Chinese? Reading inflection would be near impossible, even if you looked at the person's voicebox (assuming it's visible).
2. Re:Some coding expertise... by Nihilanth · 2003-04-28 11:05 · Score: 3, Insightful
  
  Yeah, a lot of Asian languages rely on internal vowel sounds that make lip-reading nearly impossible. Maybe if they used lasers to measure the sound pressure waves, or the vibrations of the voicebox, in conjunction with the lipreading.
3. Re:Some coding expertise... by flamingspinach · 2003-04-28 11:10 · Score: 2, Interesting
  
  That second one could work, but can lasers measure pressure fluctuations? I would think that air wouldn't reflect a laser, and if one measures the pressure by the speed of light through the medium (high pressure will slow it down slightly), you'd need a reflector of some sort...
Planet Express Delivery Ship by luzrek · 2003-04-28 10:53 · Score: 2, Funny

Unlike HAL the Planet Express Delivery Ship cannot read lips.
Fry, Leela, and Bender are hiding out in the shower discussing how to turn off the Planet Express Delivery Ship. The little red light is on, and the screen is scrolling back and forth between their lips as Leela gives orders and Bender objects. Then the ship says, "Oh, if only I could read lips!"

--
Gallium arsenide is the material of the future, and always will be.
Orwellian p0ssibilities by asadodetira · 2003-04-28 10:55 · Score: 2, Insightful

Cameras randomly zooming in on the lips of people in the crowd; if somebody says something from some "list" of words, they keep tracking that person and run face recognition on them too.
Not that 2001 ended up being very accurate... by DeadScreenSky · 2003-04-28 10:55 · Score: 5, Interesting

... but I think it is interesting that Arthur C. Clarke thought HAL reading lips was the only implausible scene in the film. You know, as opposed to the whole aliens thing. :P Just goes to show you the perils of trying to predict the future...

--
There is no excellent beauty that hath not some strangeness in the proportion. -- Francis Bacon
Sigh... by ScoLgo · 2003-04-28 10:55 · Score: 3, Interesting

Sigh... the signal-to-noise ratio alone is enough to lend you reasonable anonymity. There's just way too much information that would need to be grepped through in order to listen in on your dinner conversation. No one (or their Big Brother) is going to bother unless they have a really good reason to be investigating you in the first place.

I'm thinking that the 'good' will outweigh the 'evil' here...

--
"Michael, I did nothing. I did absolutely nothing - and it was everything that I thought it could be."
1. Re:Sigh... by shaitand · 2003-04-28 11:37 · Score: 3, Interesting
  
  How about having it record everything it picks up and time-code it, so that you can grep for the words "revolution", "bomb", "nuts itch" and then cross-reference the hits to the time sequence in the video? This is then passed on to the FBI as routine policy for "the war on terror".
2. Re:Sigh... by ScoLgo · 2003-04-28 13:08 · Score: 2, Interesting
  
  Well, it's possible that my tinfoil hat is on crooked today...
  
  From the Reg article... "Intel's announcement implies that the system works better when coupled with facial recognition to identify 'known' speakers."
  
  Doesn't this imply that, at least for the foreseeable future, this technology won't be easily used as some general Orwellian tool? It sounds as though it needs to 'learn' each speaker - much like voice recognition software has to be trained to your voice before it can be used accurately.
  
  From the Intel link... "The speaker independent audio-visual continuous speech recognition system relies on a robust set of visual features obtained from the accurate detection and tracking of the mouth region."
  
  As mentioned by someone else in another thread, this system relies on a relatively uninterrupted view of the speaker's face. There are billions of people on this planet, all moving around willy-nilly and not worrying about holding still long enough for this technology to track their mouth movements. It's therefore just not feasible to apply this to public video 'eavesdropping'.
  
  It's more likely to be used in educational situations and for people with special needs (automatic translation of seminar presentations for the deaf, perhaps?).
  
  As I already said, I can see this being used by government spooks to track certain individuals that are already under investigation - hopefully after getting a warrant.
  
  Of course, I could be wrong...
  
Too late for me... by raehl · 2003-04-28 11:06 · Score: 3, Funny

I may have done better in my AI class if I'd been able to read lisp. All those damned parentheses made life very difficult.

--
paintball
Oh yeah? Lip Read this! by Metallic+Matty · 2003-04-28 11:06 · Score: 2, Funny
Can it read this? by ektor · 2003-04-28 11:08 · Score: 2, Funny

No... more... taxes.
How do you think the court system would handle... by djoham · 2003-04-28 11:16 · Score: 2, Interesting

...someone recording to video a person *speaking* the source code of DeCSS and then using this tool in combination with gcc to generate libDVDCSS?

Would this tool then be declared a "circumvention device" under the DMCA, or would the courts finally realize that code can be considered protected speech? The code was, after all, spoken in its original form in this case.

This same question could also be applied to audio-to-text converters. Maybe there's hope the DMCA will be declared unconstitutional after all.

Interesting food for thought...

David
ig-pay atin-lay by bryanthompson · 2003-04-28 11:17 · Score: 4, Funny

ersonally-pay, i-ay(?) erfer-pay o-tay use pig latin.

geeze, that really wasn't worth the effort...
Re:Copyrighted Prior Art by Anonymous Coward · 2003-04-28 11:17 · Score: 3, Interesting

Just in case anyone gets the wrong idea here, copyrighted works cannot be used to contravene a patent.

erm, yes they can. In fact, the firm I work for specializes in that very thing.
Prior Art by cperciva · 2003-04-28 11:24 · Score: 3, Informative

Software and business-model patents have evidently affected people's understanding of what a patent entails.

"A computer, examining a set of video images, to perform lip reading" is not patentable. HAL would be prior art for this; but it doesn't matter because there isn't any inventive step here anyway.

"A computer, processing a set of video images by locating what appears to be a set of lips, selecting recognizable points, using the movement of those points to track the deformation against a 3D model, comparing against a table of syllables to compute the probability of each particular syllable, and using knowledge about a language to determine which syllables are most likely to follow each other" could be patented. HAL would not be prior art for this, because there is no indication of how HAL performed the lip reading.

--
Tarsnap: Online backups for the truly paranoid
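The detailed method in the comment above (per-syllable probabilities plus knowledge of which syllables tend to follow each other) is essentially what a hidden Markov model decoder does. A minimal Viterbi sketch, assuming a toy table of hypothetical syllables and made-up probabilities: the lips alone cannot separate "ba" from "pa" (voicing is invisible), so the transition table, standing in for "knowledge about a language", has to break the tie.

```python
import math


def viterbi(observations, start, transition):
    """Return the most likely syllable sequence, or None if none is possible.

    observations: one dict per mouth-shape segment, syllable -> P(evidence | syllable)
    start:        dict syllable -> P(syllable begins the utterance)
    transition:   dict (previous, next) -> P(next | previous); missing pairs count as 0
    """
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {}
    for s, p_start in start.items():
        p = p_start * observations[0].get(s, 0.0)
        if p > 0.0:
            best[s] = (math.log(p), [s])

    for obs in observations[1:]:
        new_best = {}
        for s, p_emit in obs.items():
            if p_emit <= 0.0:
                continue
            # Extend every surviving path that is allowed to transition into s.
            candidates = [
                (log_p + math.log(transition[(prev, s)]) + math.log(p_emit), path + [s])
                for prev, (log_p, path) in best.items()
                if transition.get((prev, s), 0.0) > 0.0
            ]
            if candidates:
                new_best[s] = max(candidates)
        best = new_best

    if not best:
        return None
    return max(best.values())[1]


# Hypothetical example: each segment is visually ambiguous between a voiced
# and a voiceless bilabial, but "ba" is far more likely to be followed by "by".
START = {"ba": 0.5, "pa": 0.5}
TRANSITION = {("ba", "by"): 0.7, ("ba", "py"): 0.05, ("pa", "by"): 0.1, ("pa", "py"): 0.05}
OBSERVATIONS = [{"ba": 0.5, "pa": 0.5}, {"by": 0.5, "py": 0.5}]

if __name__ == "__main__":
    print(viterbi(OBSERVATIONS, START, TRANSITION))  # prints ['ba', 'by']
```

Working in log space avoids numeric underflow on long utterances; a real recognizer would learn these tables from data rather than having them written by hand.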
Re:Anybody played with other languages? by deadsaijinx* · 2003-04-28 11:24 · Score: 2, Funny

yes, since there are so many people concerned with reading the lips of birds, especially since they don't have lips, or talk. yeah, consider this my karma burn for the evening.

Re:finally! by MisterFancypants · 2003-04-28 11:25 · Score: 3, Funny

a reason to really hunker down and learn an obscure Chinese dialect.
Good plan... Oh wait, who would you talk to? Bad plan.
Re:Prior Art? by meringuoid · 2003-04-28 11:27 · Score: 4, Interesting

Did Clarke ever file a patent for the geosynchronous satellites?
No, he never did. If he had, he would almost certainly by now be far and away the richest man on the planet. Now, imagine if you will what Arthur Clarke might have done with a fortune that would make Gates green with envy... He'd have been on Mars twenty years ago.

--
Real Daleks don't climb stairs - they level the building.
Fox News by Jru+Hym · 2003-04-28 11:35 · Score: 2, Funny

It probably wouldn't work for Greta "Lips" Van Susteren

--
This lobster was alive when it hit the frothy, boiling water.
1. Re:Fox News by Jru+Hym · 2003-04-28 14:07 · Score: 2, Funny
  
  Next test subject: The Vagina Monologues
  
SF movies typically don't count as prior art... by GoBears · 2003-04-28 11:36 · Score: 4, Informative

I don't know if they have any patents, we all know some prior "art" from 2001, er.. 1968.
patents are supposed to be on inventions, not ideas. (very) generally speaking, you have to demonstrate you know how to do something for it to count as prior art. actually building something counts, as does a patent application (since the patent application has to explain how the invention works at a reasonable level of detail, for an admittedly arguable legal definition of reasonable).
ianal, but the last i heard, a mention in a science fiction book or movie wouldn't typically be considered prior art. a person skilled in the art can't tell from 2001 how to make a computer read lips.
Actually, this could be a major breakthrough by RhettLivingston · 2003-04-28 11:42 · Score: 3, Interesting

in speech recognition if it does no more than allow input from a camera to aid in separating out which sounds came from which speakers. Simply fixing the background noise problem would be a huge advance.
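Using the camera to decide which speaker a sound came from can be sketched as audio-visual synchrony: correlate the loudness of the audio, frame by frame, with how wide each visible mouth is open, and credit the sound to the best-matching face. This is an illustrative technique, not something the article says Intel does; the per-frame signals and face labels below are invented.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0
    return cov / (spread_x * spread_y)


def likely_speaker(audio_energy, mouth_opening):
    """Pick the face whose mouth movement best tracks the audio energy.

    audio_energy:  per-frame loudness of the soundtrack
    mouth_opening: dict face label -> per-frame mouth opening for that face
    """
    return max(mouth_opening, key=lambda face: pearson(audio_energy, mouth_opening[face]))


# Invented per-frame measurements for two faces in the same shot.
AUDIO_ENERGY = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1]
MOUTHS = {
    "left": [0.5, 0.4, 0.1, 0.6, 0.2, 0.5],   # moves out of sync with the sound
    "right": [0.0, 0.7, 0.8, 0.1, 0.6, 0.0],  # opens whenever the audio gets loud
}

if __name__ == "__main__":
    print(likely_speaker(AUDIO_ENERGY, MOUTHS))  # prints right
```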
They certainly aren't the first by Omegalomaniac · 2003-04-28 12:45 · Score: 2, Informative

It's been done at Carnegie Mellon as well.
Oh yeah THAT'll work by graveyhead · 2003-04-28 13:28 · Score: 3, Funny

Lindsey Naegle: Do I detect a note of sarcasm?
Frink: (With sarcasm detector) Are you kidding? This baby is off the charts mm-hai.
CBG: A sarcasm detector, that's a real useful invention.
(Sarcasm detector explodes)

--
std::disclaimer<std::legalese> sig=new std::disclaimer; sig->dump(); delete sig;
Sports by Dynastar454 · 2003-04-28 15:02 · Score: 2, Funny

I know what I want this for- I want to read the lips of all the coaches and players during basketball/baseball/whatever broadcasts. Maybe ESPN could offer this as a feature, censoring as needed. :-)

--

Laugh at stupidity: mod idiots +1 Funny.
Lipreading is a myth, as is this code working. by nloop · 2003-04-28 19:22 · Score: 4, Informative

I have taken many years of ASL classes and am pretty involved with Deaf culture; one of the biggest myths about deafness is people's ability to read lips.

The idea most people have of lipreaders, as in the movie See No Evil, Hear No Evil (the Richard Pryor/Gene Wilder comedy) or the Seinfeld lipreader episode, just isn't realistic. Many sounds, such as "t" and "d", look exactly the same, and many, such as "k" and "g", are not visible at all. The best lipreaders can only get about 2/3 of what is being said (if they are entirely Deaf; many Deaf people are not, and if your hearing loss is not total, lipreading can be far more effective), and that is with the person speaking slowly and facing them, plus human intuition (context). Throw in facial contortions (like yelling: "they can't hear me, so if I yell it will help"), low light, a bad angle, fast talking, etc., and the accuracy drops dramatically.
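The point that "t" and "d" look identical, and that "k" and "g" are invisible, can be illustrated by collapsing phonemes into coarse mouth-shape classes (visemes) and checking which words end up looking the same. The viseme table and phoneme spellings here are a simplified, hypothetical grouping for illustration, not a standard inventory; vowels are left distinct, which is generous to the lipreader.

```python
from collections import defaultdict

# Hypothetical, simplified viseme classes. None marks a sound assumed to be
# invisible on the lips, following the comment's "k" and "g" example.
VISEME = {
    "p": "LIPS_CLOSED", "b": "LIPS_CLOSED", "m": "LIPS_CLOSED",
    "f": "LIP_TO_TEETH", "v": "LIP_TO_TEETH",
    "t": "TONGUE_TIP", "d": "TONGUE_TIP", "n": "TONGUE_TIP",
    "k": None, "g": None,
}


def to_visemes(phonemes):
    """Map phonemes to what a lipreader could see; unlisted sounds (vowels) stay distinct."""
    shapes = (VISEME.get(p, p) for p in phonemes)
    return tuple(s for s in shapes if s is not None)


def homophene_groups(lexicon):
    """Group words that produce the same visible sequence (homophenes)."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[to_visemes(phonemes)].append(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]


LEXICON = {
    "tip": ("t", "ih", "p"),
    "dip": ("d", "ih", "p"),
    "nib": ("n", "ih", "b"),
    "kit": ("k", "ih", "t"),
    "git": ("g", "ih", "t"),
    "it": ("ih", "t"),
    "fan": ("f", "ae", "n"),
    "van": ("v", "ae", "n"),
    "man": ("m", "ae", "n"),
}

if __name__ == "__main__":
    for group in homophene_groups(LEXICON):
        print(" / ".join(group))
```

Even with vowels kept distinct, eight of the nine words in this toy lexicon are ambiguous to the eye; only context can pick among "dip", "nib", and "tip".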

Computers lack the ability to figure out which word is being said from context when the lips don't provide adequate information. They have also historically been terrible at things like complex image recognition. What is registration-script busting based on? Image recognition with noise in the image (i.e., "type the word that appears in the next form box"). And no one has come close to a functional computer ASL interpreter, even though ASL is far easier to distinguish visually than speech.

I don't see the 40% word error rate it currently has improving much at all, and I'm guessing the video feed it was measured on isn't anything like full-speed, non-exaggerated human speech.
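For reference, a word error rate like the 40% quoted above is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. A minimal sketch; the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Two substitutions (cat->cap, on->in) and one deletion (the): 3 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cap sat in mat"))  # prints 0.5
```

Because insertions count as errors, WER can exceed 100% when a recognizer emits extra words.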

Your fears of the video cameras on the streets logging your conversations are pretty unfounded ;)