Slashdot Mirror


Extracting Audio From Visual Information

rtoz writes: Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a potato-chip bag (video) photographed from 15 feet away through soundproof glass.

14 of 142 comments

  1. Not surprising by Z00L00K · · Score: 4, Insightful

    Measuring the vibrations of windows or other items was already done by spy agencies 40 to 50 years ago, so I wonder whether this is simply something that has been rediscovered.

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    1. Re:Not surprising by Z00L00K · · Score: 5, Informative

      To follow up, look at the Electromax Laser Listening Systems.

    2. Re:Not surprising by Hamsterdan · · Score: 5, Insightful

      The countermeasure for laser listening was to install the windows inside a pipe *frame* and play music in the pipes. Using an object inside the building to extract audio defeats that countermeasure. This is 2014; do not expect any privacy, especially from government agencies...

      --
      I've got better things to do tonight than die.
    3. Re:Not surprising by JazzHarper · · Score: 4, Informative

      There is a very significant difference: this involves detecting vibrations in images of objects in a video recording rather than the objects themselves. However, not just any video will do; it requires a very high frame rate.

    4. Re:Not surprising by timeOday · · Score: 4, Insightful

      Well, even a normal microphone is "just" measuring the linear displacement of a membrane over time, so clearly the important distinction is how you measure it. A laser range-finder is different from a microphone, and a video camera is different from a laser range-finder.

    5. Re:Not surprising by fuzzyfuzzyfungus · · Score: 4, Funny

      Clearly, if your work is that important, having a window office becomes a sign of extremely low status and institutional nonimportance rather than of professional advancement...

      (At least until they discover the guy spying on the basement dwellers with sophisticated seismometers.)

    6. Re:Not surprising by doublebackslash · · Score: 4, Informative

      FTFA

      In other experiments, however, they used an ordinary digital camera. Because of a quirk in the design of most cameras’ sensors, the researchers were able to infer information about high-frequency vibrations even from video recorded at a standard 60 frames per second. While this audio reconstruction wasn’t as faithful as it was with the high-speed camera, it may still be good enough to identify the gender of a speaker in a room; the number of speakers; and even, given accurate enough information about the acoustic properties of speakers’ voices, their identities.

      They don't go into detail on the algorithm, but reading between the lines, it seems that they are exploiting the spatial nature of video and the fact that not every pixel (and certainly not every line) is captured at exactly the same moment to ferret out higher-frequency information. I have other guesses, but they are wild speculation. Either way, VERY cool.

      --
      md5sum /boot/vmlinuz
      d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
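The rolling-shutter idea raised in this thread can be illustrated with a minimal Python sketch. It assumes an idealized sensor that reads rows out one after another at a constant line delay and then idles until the next frame starts; the frame count, row count, and 10 ms readout time are hypothetical values, and the functions `row_timestamps` and `row_sample_rate` are illustrative names, not the researchers' algorithm. It only shows why row-by-row capture yields far more distinct sample times than the frame rate alone suggests.

```python
def row_timestamps(num_frames, rows, fps, readout_time):
    """Return the capture time (in seconds) of every row of every frame.

    Idealized rolling shutter (an assumption): rows are read out one after
    another, evenly spread over `readout_time`, then the sensor idles until
    the next frame starts.
    """
    frame_period = 1.0 / fps
    if readout_time > frame_period:
        raise ValueError("readout must fit inside one frame period")
    line_delay = readout_time / rows  # time between consecutive rows
    times = []
    for k in range(num_frames):
        frame_start = k * frame_period
        for r in range(rows):
            times.append(frame_start + r * line_delay)
    return times


def row_sample_rate(rows, readout_time):
    """Rows sampled per second while the shutter sweeps down the frame."""
    return rows / readout_time


# Hypothetical 60 fps camera with 1080 rows read out over 10 ms.
times = row_timestamps(num_frames=2, rows=1080, fps=60, readout_time=0.010)
print(len(times))  # 2160 distinct row times from just 2 frames
print(row_sample_rate(1080, 0.010))  # roughly 108,000 rows per second during readout
```

Under this model the row samples are unevenly spaced: each frame's samples are packed into the readout window, followed by a gap with no samples at all.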
  2. Now my tin-foil hat... by Anonymous Coward · · Score: 5, Funny

    ...Needs a tin-foil hat!

    1. Re:Now my tin-foil hat... by JackieBrown · · Score: 4, Insightful

      The hat is a trick!

      The reason they want you to wear foil is so that the sound can bounce off it.

    2. Re:Now my tin-foil hat... by fuzzyfuzzyfungus · · Score: 4, Insightful

      Worse than that. If there's a metal foil involved, vibration measurement should be doable with RF as well as light. Only with next-generation reduced-radar-cross-section geometries and RF-absorbent materials can a truly secure tinfoil hat be constructed.

      Unfortunately, walking around with what appears to be a small F-117 attached to your head offers limited visual camouflage potential and may prove counterproductive in your attempts to avoid Their surveillance.

  3. Requires a very high speed camera by tepples · · Score: 4, Interesting

    The YouTube video captions state that this technique requires a camera capable of a few thousand frames per second. Thus this is pretty much using a camera to follow the vibrations, little different from a laser mic. What would impress me more is if they were able to pick up different frequencies from different parts of the bag with different resonant frequencies and reconstruct from standard 30 fps video using the bag as a transducer.

    1. Re:Requires a very high speed camera by interiot · · Score: 4, Insightful

      30 fps is far below the Nyquist rate (6800 Hz, twice the 3400 Hz upper limit of the POTS voice band), so no, that wouldn't be possible without some fundamental changes in our understanding of information theory and physics.

    2. Re:Requires a very high speed camera by blincoln · · Score: 5, Informative

      For some reason, the person who posted the article or the Slashdot editors linked to a bad knock-off video that removed 3/4 of the details, rather than to the researchers' actual video. The real video makes it clear that they can also get results from standard 60 FPS DSLR video by taking advantage of the rolling shutter effect. There's a fidelity loss, but it's a lot better than I would have expected.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
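The sampling arithmetic behind the Nyquist argument in this thread can be checked in a few lines. This is a minimal Python sketch assuming ideal point sampling of a single pure tone with exactly one sample per frame (no rolling shutter); the 440 Hz tone and the 2200 fps rate are illustrative numbers, not figures from the research. It shows why plain frame-by-frame sampling at 30 fps cannot represent speech: every tone folds down into the 0 to 15 Hz band.

```python
def nyquist_rate(max_freq_hz):
    """Minimum sampling rate needed to represent content up to max_freq_hz."""
    return 2 * max_freq_hz


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling at sample_rate_hz.

    The tone folds into the band [0, sample_rate_hz / 2].
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_rate(3400))  # 6800

# A hypothetical 440 Hz tone viewed at different frame rates.
for fps in (30, 60, 2200):
    print(fps, alias_frequency(440, fps))
# 30 10
# 60 20
# 2200 440
```

Under these assumptions only the high-speed rate reproduces the 440 Hz tone at its true frequency. The 60 FPS DSLR results blincoln describes rely on the rolling shutter supplying extra samples within each frame, which this one-sample-per-frame model deliberately leaves out.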
  4. Re:Been there done that by gsslay · · Score: 4, Funny

    Are they not doing this already in CSI? I'm sure I saw them enhance an office security video of a post-it note, reflected off a monitor screen, magnified a couple of times, and there they had it; complete dialog in stereo, with accompanying analysis of voice stress so they knew who was lying. Isn't science wonderful?