Slashdot Mirror


Extracting Audio From Visual Information

rtoz writes Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a potato-chip bag (video) photographed from 15 feet away through soundproof glass.

142 comments

  1. Not surprising by Z00L00K · · Score: 4, Insightful

    Measuring the vibrations of windows or other items was used already 40 to 50 years ago by spy agencies, so I wonder if this isn't something that has been re-discovered?

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    1. Re:Not surprising by Z00L00K · · Score: 5, Informative

      To follow up, look at the Electromax Laser Listening Systems.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    2. Re:Not surprising by Anonymous Coward · · Score: 1

      Bet that works really well on video.

    3. Re:Not surprising by Hamsterdan · · Score: 5, Insightful

      The countermeasure for laser listening was to install the windows inside a pipe *frame* and play music in the pipes. Using an object inside the building to extract audio defeats that countermeasure. This is 2014, do not expect any privacy, especially from government agencies...

      --
      I've got better things to do tonight than die.
    4. Re:Not surprising by JazzHarper · · Score: 4, Informative

      There is a very significant difference: this involves detecting vibrations in images of objects in a video recording rather than the objects themselves. However, not just any video will do; it requires a very high frame rate.

    5. Re:Not surprising by Anonymous Coward · · Score: 2, Funny

      When high frame rate cameras are outlawed, only outlaws will have high frame rate cameras..............

    6. Re:Not surprising by timeOday · · Score: 4, Insightful

      Well, even a normal microphone is "just" measuring the linear displacement of a membrane over time, so clearly the important distinction is how you measure it. A laser range-finder is different from a microphone, and a video camera is different from a laser range-finder.

    7. Re:Not surprising by fuzzyfuzzyfungus · · Score: 1

      The general notion that all sorts of things will vibrate in the presence of ambient noise is definitely not new. Even perfectly ordinary mics depend on it, though they bring their own specialized vibrating surface in order to make the problem considerably easier.

      However, aside from the use of available objects rather than specially designed surfaces, there's very little similarity between using an interferometer to measure vibrations and using a machine vision algorithm to do so.

    8. Re:Not surprising by Zeromous · · Score: 1

      Agreed, this is more of a bench mark scale capability I think. Also uses video capture, not lasers which is more passive technology.

      --
      ---Up Up Down Down Left Right Left Right B A START
    9. Re:Not surprising by fuzzyfuzzyfungus · · Score: 4, Funny

      Clearly, if your work is that important having a window office becomes a sign of extremely low status and institutional nonimportance, rather than professional advancement...

      (At least until they discover the guy spying on the basement dwellers with sophisticated seismometers)

    10. Re:Not surprising by Zeromous · · Score: 1

      Clarify, video is the more passive tech obv.

      --
      ---Up Up Down Down Left Right Left Right B A START
    11. Re:Not surprising by Anonymous Coward · · Score: 0

      I think calling this "machine vision" is a little generous. Detecting these vibrations in the high speed video requires only trivial image processing.

      It certainly is interesting that someone is demonstrating this. But, really, this is little more than an undergrad project for a lab that has a bit too much money on its hands.

    12. Re:Not surprising by Anonymous Coward · · Score: 0

      > government

      Piffle. Wait til the advertisers do this routinely.
      Hell, they will just put recording chips on every package along with the theft-detection things and pick up their profiling data from your appliances.

      You won't even need to leave home to shop. They'll just deliver crap to you as long as your bank account lasts, guaranteed to be exactly what you desire.

    13. Re:Not surprising by Anonymous Coward · · Score: 0

      I'd Rick Roll their ass, it's too bad they cannot turn sound into visual because they'd be getting the ol meat spin.

    14. Re:Not surprising by Anonymous Coward · · Score: 1

      I heard about this system sometime in the 1980's -- I'm sure it was "secret" back then, but dudes seem to leak "cool stuff" -- so there you go.

      The counter to this technique is to put two speakers set to different radio stations and aim them at the window -- or go into a shower stall and whisper in your hand with the water turned on (no, really).

      I hope I don't get on a "Snowden" watch list, but as a ten-year-old, I think I could figure out a counter to every possible spook tool. It's not my fault they aren't more creative than a 5th grader with ADHD.

      FYI: Tin Foil Hats are ridiculed, because THEY DON'T WANT you to wear them. It really improves the reception for my space messages ...

    15. Re:Not surprising by Z00L00K · · Score: 2

      The method is the same, it's just a different tool involved on the way.

      It's enough to measure the image of an object, you don't need to record it first and you actually don't need a laser either, even though it may help.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    16. Re:Not surprising by Anonymous Coward · · Score: 0

      Wouldn't the frame rate need to be something like 5000 frames per second to capture a normal human voice?

    17. Re:Not surprising by Jeremy+Erwin · · Score: 2

      From Top Secret America: the rise of the surveillance state

      As important to a man's self image as the power of his car's engine or his motorcycle's rumble, SCIF size had become a symbol of status. "In DC, everyone talks SCIF, SCIF, SCIF," said Bruce Paquin, owner of a construction company that builds SCIFs for the government and private corporations. "They've got the penis envy thing going. You can't be a big boy unless you're a three letter agency and you have a big SCIF."

      (A SCIF is a room that has been certified to be impenetrable to various types of surveillance techniques.)

    18. Re:Not surprising by doublebackslash · · Score: 4, Informative

      FTFA

      In other experiments, however, they used an ordinary digital camera. Because of a quirk in the design of most cameras’ sensors, the researchers were able to infer information about high-frequency vibrations even from video recorded at a standard 60 frames per second. While this audio reconstruction wasn’t as faithful as it was with the high-speed camera, it may still be good enough to identify the gender of a speaker in a room; the number of speakers; and even, given accurate enough information about the acoustic properties of speakers’ voices, their identities.

      They don't go into detail on the algorithm but reading between the lines it seems that they are using the spatial nature of video and the fact that not every pixel is captured at exactly the same moment (let alone each line) to ferret out higher frequency information. I have other guesses, but they are wild speculation. Either way VERY cool.

      --
      md5sum /boot/vmlinuz
      d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
    19. Re:Not surprising by skovnymfe · · Score: 1

      Is it certified to be impenetrable by human stupidity?

    20. Re:Not surprising by Jeremy+Erwin · · Score: 2

      That's on a need to know basis, and you don't have a need to know.

    21. Re:Not surprising by abedavis · · Score: 1

      The real insight here isn't that these vibrations exist - but that they can be extracted from simple video. Also, laser microphones (the devices you are referring to) depend on a laser beam that must reflect coherently off the object into a precisely positioned sensor. This pretty much limits the use of laser mics to windows. Here is the video posted by the authors of the paper: https://www.youtube.com/watch?...

    22. Re:Not surprising by Anonymous Coward · · Score: 0

      To be fair, you need to be pretty sure you are being actively monitored to go to the extreme lengths it will take to counter all types of surveillance. You only need to forget one detail and they have you. You also need to keep up with (and stay ahead of) new methods.

    23. Re:Not surprising by Z00L00K · · Score: 1

      I would still say that it's the same basic principle just another method.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    24. Re:Not surprising by nwf · · Score: 1

      And what happens if someone holds up a document to the window? Perhaps that might be a problem, too. If you want good security, don't have windows.

      --
      I don't know, but it works for me.
    25. Re:Not surprising by wierd_w · · Score: 1

      Seems to me that a very high frequency light source in the room (WAAY faster than the 60-120 Hz fluorescent type) would totally bork this system, especially if there were multiple light sources in the room that were not in good phase agreement.

      The camera takes images at 5000 frames per second, or thereabouts. So, we need to degrade the camera's ability to take good images by screwing with the light levels at a higher speed, which will then introduce noise into the images captured. A normal human in the room would not notice that there was anything special about the light. The photopigments in the retina have a much longer active duration than the light pulse period, so the light would appear to be "ON", not flickering.

      At the very least, that would make reconstruction of audio from video surveillance with a high speed camera radically more computationally expensive.

      Just for reference, I am imagining lighting from a recessed vaulted light system. The phase of the light pulses generated by this 360 degree illumination source has incident light happening at 90 degree angles, and at a frequency that randomly modulates between 5000 and 10000 Hz. To the POV of the high speed camera, it will look like "rave party meets disco fever" lighting is going on in there. To the human occupants, the lights are just on. This can be further enhanced by using color mixed LED lighting bars that are designed to produce discrete Red, Green and Blue signals that flash so quickly that the human eye sees only white light. The high speed camera, however, will see psychedelic colors in randomly mixed patterns.

      Seeing as the SNR is already kinda low even in the lab conditions, increasing the noise in the channel will render this approach untenable.
      "LED high speed 'rave lights'" would do exactly that.

    26. Re:Not surprising by Pinky's+Brain · · Score: 1

      If they can place the laser at an angle where they can see more than the ceiling then any rigid object needs countermeasures, not just the outside window.

    27. Re:Not surprising by Anonymous Coward · · Score: 0

      They don't go into detail on the algorithm but reading between the lines it seems that they are using the spatial nature of video and the fact that not every pixel is captured at exactly the same moment (let alone each line) to ferret out higher frequency information. I have other guesses, but they are wild speculation. Either way VERY cool.

      The YouTube video mentions it (and illustrates it quite well): They make use of the rolling shutter used by most consumer cameras which basically results in every scanline being recorded at a slightly different point in time.

    28. Re:Not surprising by Pinky's+Brain · · Score: 1

      I'm not sure if this is true.

      Definitely not for the coherent bit, trying to setup a second equally long beam path to be able to do interferometry would be an absolute pain and a half (needs to be pretty much exactly as long, coherence length of lasers is never going to be large enough to be able to interfere it without path length matching). Much easier to just intensity modulate the laser and measure the doppler shift of the modulating signal.

      Not sure about the angle either, the window won't have much diffuse reflection ... but it will have some.

    29. Re:Not surprising by Anonymous Coward · · Score: 0

      When high frame rate cameras are outlawed, only outlaws will have high frame rate cameras..............

      Interesting. A high end point and shoot I bought two years ago had a high framerate video mode going up to 240 FPS. Canon S100 ~$400. It is pretty nice for outdoorsy things because sunlight is good. I have a football throw video somewhere. The problems come with
      1) resolution - everything maxes out at the meager HD resolutions that our lackluster market overlords imposed a decade ago. Haven't RTFA but I'm not sure how you can pick up vibrations meaningfully if you're more than 3 feet away. With SURVEILLANCE equipment, you're supposed to be beyond punching range ;)
      2) as photographers know, light decreases dramatically when you reduce exposure time. High speed video runs into this problem
      3) a secondary issue is the flashing of light fixtures indoors. Aim this at a halogen lamp and you'll see a clear on-off cycle. Indirect sunlight (windows) and normal light bulbs are better. You catch too many dark frames combined with lighter frames. Sure, that requires additional software to fix.

      So for amateur use, it's horrible. Pros and governments (and mythbusters and a few tv shows with 5000FPS camera demonstrations) don't have a problem with equipment.

    30. Re:Not surprising by darenw · · Score: 1

      This needs to be put on a T-shirt.

    31. Re:Not surprising by Anonymous Coward · · Score: 0

      If you could make it flicker at 60Hz square wave with a very short "on" pulse width, that would limit the frequencies the system can "hear" to 30Hz and make speech unintelligible. Imagine basically an intense, 1/1000s flash sixty times a second in an otherwise dark room. You might see it as a constant light level, but the amount of information a high-speed camera can record would be reduced by a large factor as you're essentially reducing its frame rate. You don't want high frequency light, because with sufficient processing power the listener will be able to get more information out of it. The idea is not to generate the information in the first place.

    32. Re:Not surprising by sumdumass · · Score: 1

      I was thinking the same.. Or more to the point, about putting hidden messages in video so a plant for instance would play Pantera and a bag of chips could play some canned messages about the government spying or some shit. Perhaps even make a contest to see who can decode the most and have all the different messages combine to be code for something else entirely.

      Either way, I'm wondering what can be picked up on some of those Youtube vids where they dubbed a narration and blanked out the camera's sound. For some things, people might want to be careful that they don't end up giving away their identities or information they do not want known like politicians who forget to turn their microphones off before having side bar conversations.

      OTOH, perhaps this could find out what the politicians are really saying by analyzing their immediate surroundings in recordings of before and after their speeches when there is no audio.

    33. Re:Not surprising by Anonymous Coward · · Score: 0

      Are we sure that countermeasure wouldn't work in this case?

      Unless you're viewing the object at exactly normal angle to the window, vibrations in the glass would interfere with your attempt to measure vibrations on interior objects. You'd have to pick some other interior object to 'normalise' the image against - and that could be vibrating too...

    34. Re:Not surprising by skovnymfe · · Score: 1

      I see. Well I suppose I'll just go perform rootine maintenance on these servers over here...

  2. Now that's a bit disturbing.... by PortHaven · · Score: 1

    Puts the potato chip bag back into his lunch bag.

    (How much did DHS pay for this research?)

    1. Re:Now that's a bit disturbing.... by Anonymous Coward · · Score: 0

      The war on drugs just found a new way to populate prisons, because chips and the munchies... I think this might be the turning point of the war.

  3. Been there done that by Anonymous Coward · · Score: 3, Funny

    Sorry but that is so 2004.

    - NSA

    1. Re:Been there done that by gsslay · · Score: 4, Funny

      Are they not doing this already in CSI? I'm sure I saw them enhance an office security video of a post-it note, reflected off a monitor screen, magnified a couple of times, and there they had it; complete dialog in stereo, with accompanying analysis of voice stress so they knew who was lying. Isn't science wonderful?

    2. Re:Been there done that by Anonymous Coward · · Score: 0

      So what? I did that on my VIC-20 in BASIC with enough RAM left over for a PONG clone.

    3. Re:Been there done that by schlachter · · Score: 1

      Er, that's so 1984

      --
      My God can beat up your God. Just kidding...don't take offense. I know there's no God.
    4. Re:Been there done that by Anonymous Coward · · Score: 0

      You forgot the first part where they uncropped the video to find the monitor with the desired reflection.

  4. Possible NASA method by Anonymous Coward · · Score: 2, Funny

    Could this be used by NASA to look for intelligent life on other worlds by measuring objects in the same fashion?

    1. Re:Possible NASA method by Anonymous Coward · · Score: 1

      Depends, does it work on objects of sub-pixel size?

    2. Re:Possible NASA method by Anonymous Coward · · Score: 0

      Hopefully nobody used this method on Earth, all they will hear is the noise of a million farts every day.

      "Ah yes, it appears these creatures communicate using loud gaseous noises, we have managed to decipher a few of them, some seem playful, others seem vengeful and hostile in nature.
      We found other noises as well, but they seem unintelligible."

    3. Re:Possible NASA method by Calinous · · Score: 1

      This is best used at very high frame rates (50,000 frames per second I think) - and the "pictures" of alien planets are made with exposures of hours.

    4. Re:Possible NASA method by jellomizer · · Score: 1

      In theory... However you will need to get a really good video image of a planet.
      Right now most of the planets outside of our solar system are extrapolated mathematically, not actually seen directly with a camera.

      Then you will need to get really, really good resolution (to the point where you can probably see the life on the planet anyway). Then you will need to send it back to Earth for translation. So by the time we say hello to them, it may just be a lost language, or at least sound quite out of place.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    5. Re:Possible NASA method by fuzzyfuzzyfungus · · Score: 1

      Depends, does it work on objects of sub-pixel size?

      Just to make life more fun, a planet with life would likely have an atmosphere and those make a bit of a mess of light that passes through them(which might be handy if we are looking for atmospheres; but substantially less so if we are trying to look through them).

    6. Re:Possible NASA method by gtall · · Score: 1

      I think it depends on whether the intelligent life has developed potato chips and bag in which to hold them while being eaten...under the assumption that potato chips are not toxic to the intelligent life as they sometimes are to terrestrial humans.

  5. Scary by Anonymous Coward · · Score: 3, Interesting

    This is cool, yet scary stuff.

    I wonder how loud the original audio has to be in order to be recovered in this manner? It sounded to me like the spoken words were being shouted, and we have no way of knowing how loud the music was played. I didn't see any mention of that in the linked article.

    The linked article has additional technical(ish) information that's not in the video.

    1. Re:Scary by Anonymous Coward · · Score: 0

      It sounded to me like the spoken words were being shouted, and we have no way of knowing how loud the music was played.

      We don't know how loud the spoken words were played either. You can play a shouted recording with a low volume too.

    2. Re:Scary by Anonymous Coward · · Score: 0

      As a proof-of-concept article, it was great. However, the 'noise' issue is an 'in-progress' developmental issue. Generally, it's not uncommon to find drastic orders of improvement once an item gets serious consideration, hence the level of the in-room audio will become much less of an issue.

  6. Now my tin-foil hat... by Anonymous Coward · · Score: 5, Funny

    ...Needs a tin-foil hat!

    1. Re:Now my tin-foil hat... by JackieBrown · · Score: 4, Insightful

      The hat is a trick!

      The reason they want you to wear foil is so that the sound can bounce off it.

    2. Re:Now my tin-foil hat... by fuzzyfuzzyfungus · · Score: 4, Insightful

      Worse than that. If there's a metal foil involved, vibration measurement should be doable with RF as well as light. Only with next-generation reduced radar cross section geometries and RF absorbent materials can a truly secure tinfoil hat be constructed.

      Unfortunately, walking around with what appears to be a small F-117 attached to your head offers limited visual camouflage potential and may prove counterproductive in your attempts to avoid Their surveillance.

    3. Re:Now my tin-foil hat... by fustakrakich · · Score: 1

      What's that, a hockey joke?

      To properly secure your tin foil hat, you need to cover it with cork. Quarter inch should do it.

      --
      “He’s not deformed, he’s just drunk!”
  7. Requires a very high speed camera by tepples · · Score: 4, Interesting

    The YouTube video captions state that this technique requires a camera capable of a few thousand frames per second. Thus this is pretty much using a camera to follow the vibrations, little different from a laser mic. What would impress me more is if they were able to pick up different frequencies from different parts of the bag with different resonant frequencies and reconstruct from standard 30 fps video using the bag as a transducer.

    1. Re:Requires a very high speed camera by interiot · · Score: 4, Insightful

      30 Hz is far below the Nyquist rate (6800 Hz, going by POTS specs), so no, that wouldn't be possible without some fundamental changes in our understanding of information theory and physics.
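
      The arithmetic behind this objection can be sketched in a few lines of Python. This is only an illustration: the 3400 Hz voice-band edge is an assumption chosen to match the 6800 Hz POTS figure quoted above, and the function names are made up for the example.

```python
def nyquist_rate(max_signal_hz: float) -> float:
    """Minimum sampling rate needed to capture content up to max_signal_hz."""
    return 2.0 * max_signal_hz


def max_unaliased_hz(sample_rate_hz: float) -> float:
    """Highest frequency that a given sampling rate captures without aliasing."""
    return sample_rate_hz / 2.0


# Assumed upper edge of the telephone voice band (not stated in the thread).
pots_upper_hz = 3400.0

required_fps = nyquist_rate(pots_upper_hz)   # frames per second needed
limit_at_30fps = max_unaliased_hz(30.0)      # what plain 30 fps sampling can follow

print(f"Frame rate needed for {pots_upper_hz:.0f} Hz speech: {required_fps:.0f} fps")
print(f"Highest unaliased frequency at 30 fps: {limit_at_30fps:.0f} Hz")
```

      Under those assumptions, whole-frame sampling at 30 fps falls short of the voice band by a factor of a few hundred, which is why the replies turn to aliasing and the rolling shutter.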

    2. Re:Requires a very high speed camera by sunderland56 · · Score: 2, Insightful

      reconstruct from standard 30 fps video

      Dear sir: what you are asking is impossible.

      Sincerely yours,

      Harry Nyquist

    3. Re:Requires a very high speed camera by Uecker · · Score: 2

      No, it could work. He wants to capture different information from different parts of the bag. This is a multi-channel problem, so you can go below Nyquist. Also, you might have a model for speech that you can use to reduce the amount of required information. Finally, you do not need perfect recovery.

    4. Re:Requires a very high speed camera by silfen · · Score: 2

      It's not "impossible", and he even told you how to do it. Incidentally, your ear works the way he suggested.

    5. Re:Requires a very high speed camera by jones_supa · · Score: 2

      30 fps would allow a maximum frequency of 15 Hz.

    6. Re:Requires a very high speed camera by Anonymous Coward · · Score: 1

      The problem is that you would be able to pick up a maximum frequency of 15Hz.
      It doesn't matter how many channels you have if all of them are completely out of your frequency range.

    7. Re:Requires a very high speed camera by tepples · · Score: 3, Insightful

      In theory, if you can find different targets in the frame with resonant frequencies spaced no more than 15 Hz apart, you can read a different 15 Hz off each target.

    8. Re:Requires a very high speed camera by kristianbrigman · · Score: 2

      Well, it might be theoretically possible - but you'd need to get the bits from somewhere. Think of an ocean wave, and you want to measure the height of the water at a given point in time. But waves on water move in fairly predictable ways, so a single picture will tell you both the height of the water at the time the picture was taken, as well as a good approximation of what it was for a short time before and after the picture.

      Another possibility is if there are multiple video streams from the same event, they are probably all 30 fps, but probably didn't catch the exact same samples - overlay them and you may be able to reconstruct a higher-frequency signal.

      This doesn't make it an easier problem, or even possible - now, instead of having to capture at a frequency above the Nyquist rate, you have to capture video at a resolution that can tell the micro-topology of a potato chip bag from 15 feet away. After all, you have to extract the information from somewhere. But there are ways to get beyond Nyquist sometimes.

      Another example which feels related, but I'm not sure how yet: Roland has a patent on electronic drums. They have a single sensor in the middle of the drum, yet within a quarter-wave of a hit anywhere on the drum, they can tell both that it was hit and how far away from the center it was hit, based on the shape of the wave.

    9. Re:Requires a very high speed camera by Anonymous Coward · · Score: 3, Insightful

      Oh dear. You even linked to Wikipedia (although not to the Wikipedia page "Nyquist Rate"). Does it not occur to you that OP understands those things better than you do?

      To start with you need to understand what the Nyquist rate means. Sampling is like wrapping a signal around a cylinder. Just because parts are overlaid ("aliasing") doesn't mean you can't untangle the original signal. For instance, if a single audio source contains only pure harmonics, so the frequencies are known to be N, 2N, 3N, 4N, and so on, and if you have the range of possible N down to a smallish range (e.g. you know it's a voice) and you know that higher harmonics are always smaller than lower harmonics, then you can, from a massively sub-Nyquist sampling like this, extract both N *and* all the coefficients of all the harmonics. It's just like determining the dimensions of a triangle after it's wrapped around a cylinder. No, the triangle doesn't have to fit within one revolution of the cylinder, that's just the trivial case that obviously works.

      What OP is proposing is that because different parts of the physical system have different resonances, when you look at that part of the image you are seeing a strongly filtered version of the original signal - basically a single frequency. You can measure the size of this signal using an aliased sampling - there's no problem with that whatsoever, it just works, an aliased sampling has the same energy as a non-aliased sampling, the samples are just in a different order. Then if you know different image areas have different responses, you can build up an image of the signal by patchwork. It would be a bloody hard job for a crisp packet in arbitrary configuration, but if you get to design the object you're looking at you can make this as sensitive as you like, and even use really crappy cameras to do it.

      Nyquist rate isn't the be-all and end-all people think it is, it's just a limit for *perfect* reconstruction of *arbitrary* signals. The naive approach is to restrict yourself to sub-Nyquist signals and use the easy algorithms everybody knows. The fun stuff (read: the stuff you might get paid for) involves at least flirting with the Nyquist range, or even fully embracing that aliasing is happening and figuring out the consequences from first principles. Once you do this, you can do amazing things that seem impossible to Signal Processing 101 students ... the only problem then is you get SP101 students telling you you're an idiot for thinking that's possible. Oh, well.

      BTW, sampling rate on telephony is 8000Hz as standard. Pro-tip: if you want to sound like a signal processing expert, know common sample rates.

    10. Re:Requires a very high speed camera by jones_supa · · Score: 1

      Yes, that is true. :)

    11. Re:Requires a very high speed camera by SydShamino · · Score: 2, Informative

      No, you can pick up something higher than Nyquist, as long as you understand your sources of information and noise. It will alias down into the measurable range, and you can extract useful information from the alias. We have a system that operates up to 1 MHz using a 1.8 MHz ADC. When we know the signal is at 1 MHz, we extract the information at 800 kHz and use that.

      What the GGP was talking about, though, was finding resonances on the bag where unique 30-Hz-wide bands of higher frequencies were being naturally modulated to baseband. If you had 100 points on the bag that each modulated a different frequency (30 Hz, 45 Hz, 90 Hz, ... 1500 Hz), you could extract the data from each sub-band separately and reconstruct the original signal. See http://en.wikipedia.org/wiki/F... and assume the source isn't one 1500 Hz conversation but instead one hundred 15 Hz conversations. And also assume that is one amazing bag of chips.
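
      The 1 MHz / 1.8 MHz example can be checked with a small folding calculation. This is a minimal sketch for a single ideal tone; the function name is hypothetical and nothing here describes the poster's actual system.

```python
def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at sample_rate_hz.

    The tone is wrapped into one sampling period and then folded into the
    first Nyquist zone, 0 .. sample_rate_hz / 2.
    """
    wrapped = signal_hz % sample_rate_hz
    return min(wrapped, sample_rate_hz - wrapped)


# A 1 MHz tone sampled by a 1.8 MHz ADC shows up at 800 kHz.
alias_hz = alias_frequency(1_000_000.0, 1_800_000.0)
print(f"1 MHz sampled at 1.8 MHz appears at {alias_hz / 1000:.0f} kHz")

# A 1.1 kHz tone sampled at 1 kHz lands at 100 Hz in the first zone.
small_alias_hz = alias_frequency(1100.0, 1000.0)
print(f"1.1 kHz sampled at 1 kHz appears at {small_alias_hz:.0f} Hz")
```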

      --
      It doesn't hurt to be nice.
    12. Re:Requires a very high speed camera by blincoln · · Score: 5, Informative

      For some reason, the person who posted the article or the Slashdot editors linked to a bad knock-off video that removed 3/4 of the details instead of the actual researchers' video. The real video makes it clear that they can also get results from a standard DSLR 60 FPS video by taking advantage of the rolling shutter effect. There's a fidelity loss, but it's a lot better than I would have expected.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
    13. Re:Requires a very high speed camera by SydShamino · · Score: 1

      If you use a 1 kHz ADC to measure a 1.1 kHz signal, what do you measure at 900 Hz?

      --
      It doesn't hurt to be nice.
    14. Re:Requires a very high speed camera by Anonymous Coward · · Score: 0

      The reason you normally can't use a 30 Hz sampling to go higher than 15 Hz, is because higher frequency signals will alias, and look like lower frequency signals. You then can't tell if a signal is the apparent frequency, or is a higher frequency that is being aliased to look lower (and if you don't have appropriate filters, you won't even be able to measure below 15 Hz if there is too much high frequency stuff around too...). If you know before hand what narrow frequency range to look at, then you can tell that a signal is being aliased and still reconstruct it. If you have several channels that only can emit very narrow ranges each (about 15 Hz bandwidth), then you will know how each would be aliased and reconstruct each from 30 Hz measurements, and then combine them together. This does require that such signals are around long enough to be picked up by several samples at 30 Hz though.
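
      The "know the narrow band beforehand" step can be sketched as follows. It assumes each target only responds inside one 15 Hz wide Nyquist zone, with zone k spanning k*15 Hz to (k+1)*15 Hz at 30 fps, and that the zone number is known in advance. The function names are invented for the example, and it handles a single ideal tone only.

```python
def fold(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent (aliased) frequency of a tone after sampling."""
    wrapped = signal_hz % sample_rate_hz
    return min(wrapped, sample_rate_hz - wrapped)


def unfold(apparent_hz: float, sample_rate_hz: float, zone: int) -> float:
    """Recover the true frequency from its alias, given the known Nyquist zone.

    Even zones keep the spectrum the right way round; odd zones mirror it.
    """
    half = sample_rate_hz / 2.0
    if zone % 2 == 0:
        return zone * half + apparent_hz
    return (zone + 1) * half - apparent_hz


fps = 30.0
true_hz = 80.0                      # tone emitted by one hypothetical target
zone = int(true_hz // (fps / 2.0))  # zone 5 covers 75 Hz to 90 Hz

apparent_hz = fold(true_hz, fps)             # what the 30 fps samples show
recovered_hz = unfold(apparent_hz, fps, zone)

print(f"Apparent {apparent_hz:.0f} Hz in zone {zone} -> true {recovered_hz:.0f} Hz")
```

      Without the zone number, the same 10 Hz alias could equally have come from 10, 20, 40, 50, 70 or 80 Hz, which is the ambiguity described above.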

    15. Re:Requires a very high speed camera by Anonymous Coward · · Score: 0

      They do it with a 60fps camera by considering the row timing differences produced by the rolling shutter artifact. In this way there is no violation of the Nyquist rate.
      I highly doubt that this will work equally well on a camera with a lower frame rate, though.

    16. Re:Requires a very high speed camera by Anonymous Coward · · Score: 0

      The frame rate of video is only a few tens of Hz, but the video signal itself has a much higher bandwidth. It may not be completely impossible to do using one of these "shaky" cameras that don't freeze a complete frame at once and using some very smart stuff. I would be impressed too by that.

    17. Re:Requires a very high speed camera by Anonymous Coward · · Score: 0

      In the DSLR camera mentioned, lines of pixels are recorded sequentially in time, so if the picture has 1080 lines (a standard HD image) the sample rate is actually 1080*30 (or 1080*60, depending on the format; even SD would work), which is more than sufficient a sample rate for audio as long as the sound waves are big enough to move enough of the bag (or other target) together at the same time and the target takes up sufficient space in the image.
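
      The arithmetic above can be sketched as follows. This assumes the rows are read out one after another at an even pace; `readout_fraction` is a hypothetical parameter for the share of the frame period the sensor actually spends reading rows, which real cameras do not usually advertise.

```python
def row_sample_times(fps, rows, readout_fraction=1.0):
    """Capture time (seconds) of each row in one frame of a rolling-shutter sensor.

    readout_fraction is the assumed share of the frame period spent reading
    rows out; 1.0 means the rows are spread evenly over the whole frame.
    """
    frame_period = 1.0 / fps
    row_period = frame_period * readout_fraction / rows
    return [r * row_period for r in range(rows)]


def effective_row_rate(fps, rows, readout_fraction=1.0):
    """Rows sampled per second while the sensor is reading out."""
    return fps * rows / readout_fraction


rate_30 = effective_row_rate(30, 1080)   # 1080 lines at 30 fps
rate_60 = effective_row_rate(60, 1080)   # 1080 lines at 60 fps
times = row_sample_times(60, 1080)
```

      If the readout takes less than the full frame period (readout_fraction below 1.0), the rows are sampled faster but there is a blind gap between frames, so the row rate is an upper bound on what you get, not a clean audio sample rate.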

    18. Re:Requires a very high speed camera by Anonymous Coward · · Score: 0

      This works only if there were resonances scattered throughout the audible range. Given the extremely low weight of the material in a potato chip bag, I doubt there would be any. Also, given the strong coupling of the bag to the air, the Q of any resonances would probably be very low.

      Now, you might have more luck with objects other than a potato chip bag. In fact, it would be cool to try to construct an object that would do this - provide resonances at reasonably high Q across the audio band on a visible surface.

      Sounds like a thesis project.

    19. Re:Requires a very high speed camera by doublebackslash · · Score: 2

      That assumes that you only are getting one sample per frame. FTFA

      In other experiments, however, they used an ordinary digital camera. Because of a quirk in the design of most cameras’ sensors, the researchers were able to infer information about high-frequency vibrations even from video recorded at a standard 60 frames per second. While this audio reconstruction wasn’t as faithful as it was with the high-speed camera, it may still be good enough to identify the gender of a speaker in a room; the number of speakers; and even, given accurate enough information about the acoustic properties of speakers’ voices, their identities.

      Remember that video has two spatial dimensions with 3 channels (which themselves are in different spatial locations within each pixel) each and that each line isn't captured at the same instant. There is a lot more information there than a single sample at a given rate. Nyquist doesn't apply to the frame rate here. Nyquist is still relevant to the problem, of course! They didn't break Nyquist, they just found a way to get more information than intuition implies.

      --
      md5sum /boot/vmlinuz
      d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
    20. Re:Requires a very high speed camera by aitikin · · Score: 1

      If you use a 1 kHz ADC to measure a 1.1 kHz signal, what do you measure at 900 Hz?

      450Hz. Just like, in your example of 1.1kHz measured by a 1kHz ADC, your measurement would be .650kHz or 650Hz.

      --
      "Don't meddle in the affairs of a patent dragon, for thou art tasty and good with ketchup." ~ohcrapitssteve
    21. Re:Requires a very high speed camera by Anonymous Coward · · Score: 1

      What OP is proposing is that because different parts of the physical system have different resonances, when you look at that part of the image you are seeing a strongly filtered version of the original signal - basically a single frequency. You can measure the size of this signal using an aliased sampling - there's no problem with that whatsoever, it just works, an aliased sampling has the same energy as a non-aliased sampling, the samples are just in a different order.

      Figuring out the spatial distribution of resonant frequencies in the surface of an arbitrarily crumpled bag of potato chips so that you can invert the aliasing of a mixed-frequency human voice against 30 Hz video is so much more difficult than just buying a kilohertz framerate camera that I don't even know how to begin.

    22. Re:Requires a very high speed camera by lgw · · Score: 1

      They were able to get decent voice from a 60 Hz camera. I'm guessing the camera didn't record all pixels simultaneously, and so the differences between when each pixel sampled were enough to work with.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    23. Re:Requires a very high speed camera by Anonymous Coward · · Score: 0

      And that's the problem. If you're sampling at 30 Hz, you can't tell whether the audio signal you see at 14 Hz is actually from a 14 Hz, 16 Hz, 44 Hz, 46 Hz, 74 Hz, 76 Hz... Now, you might be able to decompose that using a set of mechanical amplifiers, each tuned to a different multiple of 15 Hz spanning typical human speech range of 100-8000 Hz. Call it about 500 tuned mechanical resonators. Seems like it would be much harder to get your spy-subject to display your resonator grid in his window, than to get a higher frame rate camera.

      Never mind the fact that some phonemes are shorter than a camera frame.

    24. Re:Requires a very high speed camera by K.+S.+Kyosuke · · Score: 1

      Unless thirty frames per second isn't actually thirty scalar samples per second, right?

      --
      Ezekiel 23:20
    25. Re: Requires a very high speed camera by Anonymous Coward · · Score: 0

      Almost all video cameras integrate the illumination of each pixel over a significant fraction of a frame period. Therefore, each pixel is significantly low-pass filtered, as well as being sampled only once per frame.
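
      To put rough numbers on that low-pass effect: averaging a sine wave over an exposure window of length T scales its amplitude by |sin(pi*f*T) / (pi*f*T)|. A small sketch, assuming a simple box-shaped exposure (the function name is just for illustration):

```python
import math


def exposure_gain(f_signal, exposure_s):
    """Amplitude gain of a sinusoid at f_signal Hz averaged over exposure_s seconds."""
    x = math.pi * f_signal * exposure_s
    if x == 0.0:
        return 1.0                      # DC passes through unchanged
    return abs(math.sin(x) / x)


# Exposure of a full 60 fps frame period versus a short 1/2000 s exposure,
# both looking at a 300 Hz vibration:
long_exposure = exposure_gain(300, 1 / 60)
short_exposure = exposure_gain(300, 1 / 2000)
```

      With the long exposure a 300 Hz vibration is almost entirely averaged away, while the short exposure keeps most of it, which is one reason exposure time matters as much as frame rate here.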

  8. Needs video filmed at 2000-6000 FPS by Anonymous Coward · · Score: 0

    Which significantly tones down the 'omg they will hear what we say in the background of youtube videos' effect.

    1. Re:Needs video filmed at 2000-6000 FPS by bobbied · · Score: 1

      Which significantly tones down the 'omg they will hear what we say in the background of youtube videos' effect.

      Not to mention that you need fairly high resolution images to be able to detect the movement too. If you cannot do high resolution at reasonable frame rates you are not going to have anything to hear...

      --
      "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
  9. Resolution and sensor noise by BitZtream · · Score: 2, Informative

    The sensor and optics must have been ridiculously high quality and resolution for this to work. Sensor noise alone would almost certainly rule this out for any COTS consumer package. They certainly aren't doing it with CNN footage or old CCTV surveillance tapes.

    In which case, it's of no practical value since a laser mic would be far cheaper and more discreet.

    Cool from an academic perspective that they can use DSP now, but it's just more fun with a laser mic, same principles and theories, new less workable application.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    1. Re:Resolution and sensor noise by Calinous · · Score: 1

      This seems to work through soundproof glass... On the other hand, how big would a camera able to record at this resolution and frame rate be, and how close must it be?

    2. Re:Resolution and sensor noise by bobbied · · Score: 1

      This seems to work through soundproof glass...

      Glass that is pretty CLEAN.. You need really fast frame rates (6 kHz will get you phone quality audio) and pretty high optical resolution. I'm just guessing, but you are going to need 3-4 pixels for any kind of reasonable S/N ratio that's listenable, so if the object you are looking at only moves a few nanometers with the sound, that means you need a minimum of two pixels per nanometer. To do that kind of resolution at say 10 feet, is going to require some pretty good optics. The resolution of the video itself doesn't matter all that much, but you are going to need some serious optics and some really fine focus.

      --
      "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    3. Re:Resolution and sensor noise by gtall · · Score: 1

      Or Wolf Blitzer broadcasts. It is a fact that Wolf can actually suck information out of the ambient room and it is never seen nor heard from again. Think of him as an acoustical black hole.

    4. Re:Resolution and sensor noise by Anonymous Coward · · Score: 0

      you are going to need 3-4 pixels for any kind of reasonable S/N ratio that's listenable, so if the object you are looking at only moves a few nanometers with the sound, that means you need a minimum of two pixels per nanometer

      No, no, no. Using image processing algorithms you can distinguish movement to way higher resolution than 1 pixel. That's kind of the point, and why this is news.

      Let me tell you a story. I'm no expert but I once wrote (hacked?) a webcam barcode scanner that can decode a barcode without even enough pixels to distinguish every pixel of the barcode. Yeah, it surprised me too; Nyquist must have been turning in his grave! But if you think about it, there's plenty enough raw bits to get a product code out, they're just in inconvenient forms like you have greyscale where you want more black/white pixels, or you have multiple frames where you want better colour discrimination, or you have a set of similar scanlines and you want better discrimination of a single scanline. My algorithm (intended to deal with poor focus) accidentally exploited some of that extra data and BANG! I beat the (naive) Nyquist limit. Somewhat in awe, I researched that field, and what's being done with this is AMAZING. Like, take a "still life" video at 720p and reconstruct a 4Kx2K image from it. Trade temporal resolution for spatial resolution, or trade Y resolution for X resolution, or trade greyscale depth for spatial resolution. The more you know about your input image, the better, which is why the barcode problem is an accessible way into this. But the cutting edge is exactly the sort of thing being discussed here - things like measuring *minuscule* displacements over time, that common sense says the camera can't even see. The output isn't a 1/0 square wave, it's more like a Bayesian probability wave, and you can get *incredible* SNR, just way beyond what intuition says, because you have a ton of data from a video feed, and you can use it ALL.

      This kind of thing is only the beginning, believe me.
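
      For anyone who doubts the sub-pixel part, here is a toy sketch. It is a generic gradient-based least-squares shift estimate on one noise-free image row, not the method from the paper, and the Gaussian "blob" is an invented test signal. The point is only that grey levels across many pixels pin down a displacement far smaller than one pixel.

```python
import math


def blob(center, width=64, sigma=4.0):
    """One image row: a smooth Gaussian bump sampled at integer pixel positions."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(width)]


def estimate_shift(before, after):
    """Least-squares estimate of a small shift (in pixels) between two rows.

    Uses the first-order model after(x) ~ before(x) - d * before'(x), so every
    pixel with a non-zero gradient contributes a little evidence about d.
    """
    num = 0.0
    den = 0.0
    for x in range(1, len(before) - 1):
        grad = (before[x + 1] - before[x - 1]) / 2.0   # central difference
        num += grad * (after[x] - before[x])
        den += grad * grad
    return -num / den


# The bump moves by one twentieth of a pixel between two "frames".
shift = estimate_shift(blob(32.0), blob(32.05))
```

      The estimate comes out close to 0.05 pixels even though no single pixel changes by much. Real footage adds sensor noise, so you would average over many more pixels than this, but the principle is the same.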

    5. Re:Resolution and sensor noise by lgw · · Score: 1

      Being able to do this from a recording is the magic. There are many places where video is recorded but not audio. Being able to recover the audio much later would be special. Also worrying, in places where you can record video but not audio without a warrant.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    6. Re:Resolution and sensor noise by BitZtream · · Score: 1

      The problem is that the video is recorded at too low a resolution and too low a frame rate to be useful. Any place just recording video and not audio is doing it with cheap ass bargain basement gear, hence where there is no audio gear already recording the sound they are missing.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    7. Re:Resolution and sensor noise by schlachter · · Score: 1

      except that a laser mic is ACTIVE monitoring and could be detected, whereas video is PASSIVE monitoring and is undetectable. It would also allow for monitoring in hostile environments, i.e. some guys around a campfire, where there may not be a nice object to laser against.

      --
      My God can beat up your God. Just kidding...don't take offense. I know there's no God.
    8. Re:Resolution and sensor noise by lgw · · Score: 1

      Cheap cameras only get better. They were able to reconstruct voice from a good 60 Hz camera. If a policeman wanted to game the system, he could use a good camera to avoid the need for a warrant. Much like the NSA's "we capture everything, but only look at the data with (later) probable cause" excuse, someone might try an equally disturbing "we capture video, and only reconstruct the audio with (later) probable cause" in an attempt to game the system similarly. Uggh.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    9. Re:Resolution and sensor noise by Anonymous Coward · · Score: 0

      The video demonstrates that using a cellular phone's rolling shutter you can sample each line independently for an effective rate at (num horizontal lines) * (frame rate), which assuming 1080 lines x 30 fps, puts you well into the audible range. They demonstrate this by filming a set of in-ear headphones, playing back the estimated audio (a song), and having it correctly identified by a song-identification service.

  10. Yeah, only if one speaks in extremely low tones... by American+Patent+Guy · · Score: 1

    After all, video normally has an update rate of 24 - 30 fps. The sampling rate will be half that at about 15 Hz. If you have to have a video camera that can take pictures at audible sampling rates (very expensive), why not just bounce an IR laser off that potato chip bag?

  11. NFL call stealing by Anonymous Coward · · Score: 0

    So does this mean that someone with that equipment could measure the vibrations on the play sheets coaches commonly hold in front of their face while calling in plays? In the past teams have hired lip readers to try to determine the play, this might be more accurate - or at least lead to people waving their play sheet in front of their face while they call in the play.

    1. Re:NFL call stealing by king+neckbeard · · Score: 1

      I'm not sure if that would be a practical usage of this. With the hardware needed for this, it would probably be cheaper to get a sensitive enough shotgun mic.

      --
      This is my signature. There are many like it, but this one is mine.
    2. Re:NFL call stealing by ColdWetDog · · Score: 1

      Radar! What is the general saying?

      --
      Faster! Faster! Faster would be better!
  12. Re: Yeah, only if one speaks in extremely low tone by Anonymous Coward · · Score: 2, Funny

    Because if your target is eating SunChips you'd risk hearing loss.

  13. Re: Yeah, only if one speaks in extremely low tone by Anonymous Coward · · Score: 0

    There doesn't need to be a bag. Only a very high frame rate video of a bag.

  14. Not surprising by Anonymous Coward · · Score: 0

    That was done live, not from a video.

  15. I'm usually a cynical bastard but... by Assmasher · · Score: 1

    ...that's pretty effin' amazing. From video, 15 feet away. Not using a laser, FROM VIDEO! Lol.

    --
    Loading...
  16. It's curtains for privacy by hippo · · Score: 1

    The minimalist architects are in league with the spooks!

  17. Re:Yeah, only if one speaks in extremely low tones by silas_moeckel · · Score: 2

    Because you're emitting something when you send that IR laser to do it. This is completely passive.

    --
    No sir I dont like it.
  18. Breached! by Anonymous Coward · · Score: 1

    The Cone of Silence has been breached Max!

    1. Re: Breached! by bill_mcgonigle · · Score: 1

      only because the Adobe software was subject to a buffer overflow encoded in the Ruffles halftone.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  19. Stop talking by Anonymous Coward · · Score: 1

    that glasshole is staring at your crisps.

  20. Requires a very high speed camera by TwentyCharsIsNotEnou · · Score: 1

    From your own link:

    However, countermeasures exist in the form of specialized light sensors that can detect the light from the beam. Rippled glass can be used as a defense, as it provides a poor surface for a laser microphone.

    You say "little different from a laser mic". Yes, innovation is incremental, this is an increment.

  21. The inverse would be truly phenomenal by Anonymous Coward · · Score: 0

    extracting video from audio.

  22. video frame rate = 2x highest audio freq by Anonymous Coward · · Score: 0, Informative

    You need a good 500 fps to recover audio from video. This has not been standard, but is possible with some cameras.

    1. Re:video frame rate = 2x highest audio freq by Anonymous Coward · · Score: 0

      At the end of the video they demonstrated exploiting the shutter effect to extract sound using a non-highspeed camera.

  23. Re:Yeah, only if one speaks in extremely low tones by Anonymous Coward · · Score: 1

    RTFA. They are using a high speed camera.

    Why not bounce an IR laser off that potato chip bag? Because you may not be able to. Windows have IR-reflective coating, the bag is not very reflective, the surface may be at the wrong angle. This is useful.

    High speed cameras are not that expensive. You can get 1000fps in a consumer camera for around $1000 these days.

  24. Re:Yeah, only if one speaks in extremely low tones by American+Patent+Guy · · Score: 1

    Yeah ... except that your spy van in your target's parking lot housing the high speed camera and its zooming lens will be obvious. With an IR laser you could do it from blocks away and nobody in the room would be the wiser.

  25. The first thing by kryliss · · Score: 1

    This is the first thing I thought of when reading the title.

    SFW

    https://www.youtube.com/watch?v=6i3NWKbBaaU

    --
    --- If the bible proves the existence of God, then Superman comics prove the existence of Superman.
  26. Class and Security by khr · · Score: 1

    Not only is it classier, but now serving your potato chips in a nice bowl is more secure.

  27. Re:Yeah, only if one speaks in extremely low tones by silas_moeckel · · Score: 1

    And with good optics the camera could be just/nearly as far away. Laser is old, known tech, so places will have countermeasures and detection set up. Miniaturization can also come into play, where you can shove the camera into a cell phone etc.

    --
    No sir I dont like it.
  28. Um... by Ronin+Developer · · Score: 1

    How did they isolate the speech from the crunch of potato chips???? And, if this is possible, there is no hope for anyone with the munchies!

  29. tl;dr: by CaptainStumpy · · Score: 2

    Yelling MARY HAD A LITTLE LAMB, ITS FLEECE WAS WHITE AS SNOW at a houseplant, bag of chips, and glass of water is now research.

    --
    It will be better to purchase from an owner who is a good farmer and a good builder.
    1. Re:tl;dr: by QilessQi · · Score: 1

      Laugh if you want, but if you yell MARY HAD A LITTLE LAMB at your rhododendrons, the NSA will know about it.

  30. "photographed" by Anonymous Coward · · Score: 0

    Is "photographed" now a synonym for "recorded" (video)? Because for a second I thought that they were claiming to have extracted a slice of audio from the ripples in a photograph... and THAT would be very impressive.

    Using a high-dollar, high-resolution, high-speed video camera to capture frames of video and then processing vibrations into sound? Meh, it's basically been done before, only in real-time and using equipment that costs a whole lot less.

  31. laws on recording by Anonymous Coward · · Score: 1

    The reason I find this interesting is that there are a number of locations (in the U.S.) where it is legal to record video but not audio. I happen to be against some of the laws forbidding audio recordings, because I think that it usually is used to protect against the recording of corruption. (try to open a bar in new york city, with live music, see what kinds of visits you get, and notice the correlation with fire, police, and food inspection after varying responses to those asking for money.) Would be interested in how the video gets treated in law, if one were to subsequently extract the audio for it.

  32. Observers by Cmplctd_Smplcty · · Score: 1

    Reminds me of technology used in the show Fringe by the Observers. Surely, no one will use this tech for nefarious purposes. Don't worry about it! Now, where's that Carbon Monoxide generator?

  33. Original video link by ChrisSlicks · · Score: 1

    Original video with a less annoying voice.
    https://www.youtube.com/watch?v=FKXOucXB4a8

  34. How this get modded up? by Anonymous Coward · · Score: 1

    If you are going to sign your name as someone, maybe you should understand what they said first. The Nyquist limit only applies when you are trying to reconstruct an arbitrary waveform you know nothing about. If you can constrain the waveform with previous knowledge, or make some assumptions, you can go well beyond the Nyquist limit. If you know something resonates at a particular frequency (or even a particular small range), you can measure the intensity of that frequency using measurement rates way below where you would if you had no prior estimate of the frequency. Of course measuring frequencies higher than the Nyquist limit will alias, but you can still reconstruct things if you have an appropriate frequency range and know it is aliasing.
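
    A quick sketch of that last point, assuming ideal point samples and a pure tone whose frequency you already know. The helper names are made up for illustration. A 1000 Hz tone sampled at only 30 Hz aliases down to 10 Hz, but its amplitude can still be read off the samples.

```python
import math


def sample_tone(freq_hz, amplitude, fs, n):
    """n ideal point samples of a sine wave taken at fs samples per second."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]


def rms_amplitude(samples):
    """Amplitude estimate from RMS: for a sine, amplitude = sqrt(2) * RMS."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return math.sqrt(2.0 * mean_sq)


# Ten seconds of a 1000 Hz vibration with amplitude 0.3, seen at 30 samples per second.
measured = rms_amplitude(sample_tone(1000, 0.3, 30, 300))
```

    The measured value lands on the true amplitude even though the sampling is far below Nyquist for 1000 Hz. The caveat is that it breaks down if the tone sits on a multiple of half the sample rate (990 Hz here would sample as all zeros), and it says nothing about which frequency you were looking at; that part has to come from prior knowledge.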

  35. It's nothing more than laser mics by LostMyBeaver · · Score: 1

    All they did was remove the laser to make the eavesdropper unobservable. The tech is useless unless the camera has an insanely fine pitch resolution or the speech is so loud it causes large vibrations.

    Otherwise, they're just sampling motion at 2000-6000fps. It seems ridiculously processor intensive for something which could be better achieved using a high performance light meter.

  36. Re:Yeah, only if one speaks in extremely low tones by American+Patent+Guy · · Score: 1

    You can get 1000fps in a consumer camera for around $1000 these days.

    Not one with a telephoto lens that will zoom in from yards away to resolve a potato chip bag. Your consumer camera spy equipment is a pipe dream...

    You'd be better off hiring a lip reader. Of course, all of this could be averted by closing the blinds in front of the window where the subject wishes his conversation to be private... I'm filing this one under "Dumb ideas".

  37. Eagle Eye by advantis · · Score: 1

    This is what I instantly thought of: Eagle Eye. The scene with the soundproof room.

    --
    Question for religious people: where do unrepentant masochists go when they die?
  38. Next step - extract recorded sounds by Anonymous Coward · · Score: 1

    Amazing accomplishment! I remember using a laser interferometer decades ago to hear audio at a distance, although the quality was low. For years I've wondered whether a material that sets, such as a ceramic or brick or anything that turns hard over time, might be capturing nearby sounds in its micro structure. An analysis of micro density variations on an old brick or piece of pottery might be able to recreate the sounds of voices in some ancient culture - we could actually hear what Sumerian or ancient Egyptian really sounded like, maybe even hear the voices of Socrates and Ramses!

  39. Re:Yeah, only if one speaks in extremely low tones by mpe · · Score: 1

    Yeah ... except that your spy van in your target's parking lot housing the high speed camera and its zooming lens will be obvious.

    Why should a high speed camera look any different from a regular one?

  40. Re: Not surprising-Interesting constitutional rami by Anonymous Coward · · Score: 1

    With this discovery, I wonder what the ramifications of taking public video without sound are...with this technique it's possible to take any video from a spy camera and reproduce the audio...what are the ramifications on wire tapping laws?

  41. Re:This can not work with most video by Technician · · Score: 1

    This works with very high framerate video. Normal video has way too low of a frame rate to capture vibrations in the (phone company defined) band of 300 Hz to 3000 Hz. At normal video rates of about 24-50 FPS, the sample rate is below the lower fundamental frequency of voice.

    --
    The truth shall set you free!
  42. Re:Yeah, only if one speaks in extremely low tones by BitZtream · · Score: 1

    Miniaturization can also come into place where you can shove the camera into a cell phone etc.

    The laws of physics say otherwise. You can't get the resolution/quality out of a cell phone sized camera to do it at any distance, there just isn't enough in the lenses to do it.

    Then couple in the need for 6k frames per second being about the bottom line requirement for getting voice frequencies using this technique, it's pretty useless in any situation that an alternative wouldn't work better.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  43. Am I missing something? by Fredde87 · · Score: 1

    A lot of the posts above are saying you need a high frame rate camera but the video uses a 60fps camera at the end. Am I misunderstanding something or are people above not watching the whole video?

  44. Possible refinement of method? by Anonymous Coward · · Score: 0

    I'm curious how much of the image they are using to recover the sound and if it would be possible to use multiple objects in the image to recover audio at different points. If you have some knowledge of the three dimensional spatial relationship between the various objects you could probably use phased array techniques to effectively restrict your signal to sound coming from particular points significantly improving noise rejection.

  45. Link to youtube? by CurryCamel · · Score: 1

    We totally /.'d Youtube now - at the time of writing this, the video has been watched 2k times. After several hours of being on the /. front page.

  46. Re:This can not work with most video by chefmonkey · · Score: 1

    Actually, I was terribly unimpressed, too, until they did something clever that overcomes this limitation. Watch the last few dozen seconds of the video, as they pull sound out of the video recorded by a normal, consumer DSLR: no high speed video needed!

  47. Re:Yeah, only if one speaks in extremely low tones by K.+S.+Kyosuke · · Score: 1

    I just thought of using microwaves to get a reflection from something metallic and vibrating. Wouldn't that work even through the blinds? (Or even walls?)

    --
    Ezekiel 23:20
  48. Re:Yeah, only if one speaks in extremely low tones by silas_moeckel · · Score: 1

    From the looks of it a 1d array might work rather well and get the frame rates required. I didn't say generic cell phone just fitting in the form factor or close enough to not be suspicious. This could fit in existing security camera form factors (the 18 ish inch long all weather enclosures commonly used) that are so common as to be forgotten.

    --
    No sir I dont like it.
  49. Re: Not surprising-Interesting constitutional rami by sumdumass · · Score: 1

    I don't know but I bet a Google Glass wearer might be the first to find out.

    Imagine that, now not only do you have to worry about your pic and video of you at the bar hanging with the guys being posted all over Facebook by someone you don't even know, but now your conversations about who you would like to bag might make it back to your wife or significant other.

    Expect the cops and government to use this. I imagine a drone flying over at 15,000 feet with a telescopic lens and listening in on a terrorist or even your conversations in your own back yards. And the courts already said that cops video taping you is legit if they can legally see you- no warrant needed.

  50. Throw them off with an off-camera fan by ayesnymous · · Score: 1

    They won't know the make/model/speed of the fan.

  51. We tend to scoff at the beliefs of the ancients by handy_vandal · · Score: 1

    "We tend to scoff at the beliefs of the ancients. But we can't scoff at them personally, to their faces, and this is what annoys me."

    - Jack Handey

    --
    -kgj