Slashdot Mirror


Reading Lips In Software

SEWilco writes "The Register points out that Intel has released code for reading lips from a video image, Audio Visual Speech Recognition (AVSR). They do point out that better results would probably be achieved by combining video and audio recognition processing. I don't know if they have any patents, but we all know some prior "art" from 2001, er... 1968. HAL's accomplishment was also mentioned by CNN during 2001 in an article about this group's work."

13 of 149 comments

  1. What about changing what people say? by KPU · · Score: 2, Interesting

    Anybody else reminded of the Read My Lips videos that fit clips to songs?

  2. So computers can now talk to themselves (Re /.) by skermit · · Score: 5, Interesting

    A couple months ago, a very fine article was posted to /. about work at MIT regarding speech-->video synthesis using pre-recorded syllables. This means in the near future we'll be able to have avatars which can communicate with other people by videophone and/or with other computers should we wish to do so. I'm reposting the old link because it was /.'ed for about 2 months (the professor took the link down) before the vids went back up. So check out the amazing work that's on the flip-side of this article.

    http://cerboli.mit.edu:8000/research/mary101/results/results.html

    --
    -Christopher Wu
    http://www.christopherwu.net/
    1. Re:So computers can now talk to themselves (Re /.) by haroldhunt · · Score: 2, Interesting

      Great! So computers can talk to themselves but they still haven't got anything to say.

  3. Not that 2001 ended up being very accurate... by DeadScreenSky · · Score: 5, Interesting

    ... but I think it is interesting that Arthur C. Clarke thought HAL reading lips was the only implausible scene in the film. You know, as opposed to the whole aliens thing. :P Just goes to show you the perils of trying to predict the future...

    --
    There is no excellent beauty that hath not some strangeness in the proportion. -- Francis Bacon
  4. Sigh... by ScoLgo · · Score: 3, Interesting

    Sigh... the signal-to-noise ratio alone is enough to lend you reasonable anonymity. There's just way too much information that would need to be grepped through in order to listen in on your dinner conversation. No one (or their Big Brother) is going to bother unless they have a really good reason to be investigating you in the first place.

    I'm thinking that the 'good' will outweigh the 'evil' here...

    --
    "Michael, I did nothing. I did absolutely nothing - and it was everything that I thought it could be."
    1. Re:Sigh... by shaitand · · Score: 3, Interesting

      How about having it record everything it picks up and time-coding it, so that you can grep for the words "revolution", "bomb", "nuts itch" and then cross-reference them to the time sequence in the video? This is then passed on to the FBI as routine policy for "the war on terror".
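
The grep-and-cross-reference idea in the comment above can be sketched roughly as follows. This is only an illustration, assuming the recognizer already produces transcript segments tagged with a start time in seconds; the `Segment` type, `find_keywords`, and `format_timecode` are hypothetical names and are not part of Intel's AVSR code.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical transcript format)."""
    start_s: float  # offset from the start of the video, in seconds
    text: str       # recognized words for this utterance


def find_keywords(segments, keywords):
    """Return (start_s, keyword, text) for every whole-word keyword hit."""
    # Compile one case-insensitive, whole-word pattern per keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = []
    for seg in segments:
        for kw, pattern in patterns.items():
            if pattern.search(seg.text):
                hits.append((seg.start_s, kw, seg.text))
    return hits


def format_timecode(seconds):
    """Render a second offset as HH:MM:SS for looking up the video frame."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

Each hit carries the segment's start time, so `format_timecode` gives the position to seek to in the recording. Matching on whole words keeps "bomb" from firing on "bombastic", though a real system would still have to cope with recognition errors.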

    2. Re:Sigh... by ScoLgo · · Score: 2, Interesting

      Well, it's possible that my tinfoil hat is on crooked today...

      From the Reg article... "Intel's announcement implies that the system works better when coupled with facial recognition to identify 'known' speakers."

      Doesn't this imply that, at least for the foreseeable future, this technology won't be easily used as some general Orwellian tool? It sounds as though it needs to 'learn' each speaker - much like voice recognition software has to be trained to your voice before it can be used accurately.

      From the Intel link... "The speaker independent audio-visual continuous speech recognition system relies on a robust set of visual features obtained from the accurate detection and tracking of the mouth region."

      As mentioned by someone else in another thread, this system relies on a relatively uninterrupted view of the speaker's face. There are billions of people on this planet, all moving around willy-nilly and not worrying about holding still long enough for this technology to track their mouth movements. It's therefore just not feasible to apply this to public video 'eavesdropping'.

      It's more likely to be used in educational situations and for people with special needs (automatic translation of seminar presentations for the deaf, perhaps?).

      As I already said, I can see this being used by government spooks to track certain individuals that are already under investigation - hopefully after getting a warrant.

      Of course, I could be wrong...

      --
      "Michael, I did nothing. I did absolutely nothing - and it was everything that I thought it could be."
  5. Re:Some coding expertise... by flamingspinach · · Score: 2, Interesting

    Hey, and what about Chinese? Reading inflection would be near impossible, even if you looked at the person's voicebox (assuming it's visible).

  6. Re:Some coding expertise... by flamingspinach · · Score: 2, Interesting

    That second one could work, but can lasers measure pressure fluctuations? I would think that air wouldn't reflect a laser, and if one measures the pressure by the speed of light through the medium (high pressure will slow it down slightly), you'd need a reflector of some sort...

  7. How do you think the court system would handle... by djoham · · Score: 2, Interesting

    ...someone recording to video a person *speaking* the source code of DeCSS and then using this tool in combination with gcc to generate libDVDCSS?

    Would this tool then be declared a "circumvention device" under the DMCA, or would the courts finally realize that code can be considered protected speech? The code was, after all, spoken in its original form in this case.

    This same question could also be applied to audio-to-text converters as well. Maybe there's hope the DMCA will be declared unconstitutional after all.

    Interesting food for thought...

    David

  8. Re:Copyrighted Prior Art by Anonymous Coward · · Score: 3, Interesting

    Just in case anyone gets the wrong idea here, copyrighted works cannot be used to contravene a patent.

    erm, yes they can. In fact, the firm I work for specializes in that very thing.

  9. Re:Prior Art? by meringuoid · · Score: 4, Interesting

    Did Clarke ever file a patent for the geosynchronous satellites?

    No, he never did. If he had, he would almost certainly by now be far and away the richest man on the planet. Now, imagine if you will what Arthur Clarke might have done with a fortune that would make Gates green with envy... He'd have been on Mars twenty years ago.

    --
    Real Daleks don't climb stairs - they level the building.
  10. Actually, this could be a major breakthrough by RhettLivingston · · Score: 3, Interesting

    in speech recognition if it does no more than allow input from a camera to aid in separating out which sounds came from which speakers. Simply fixing the background noise problem would be a huge advance.