Extracting Audio From Visual Information
rtoz writes: Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a potato-chip bag (video) photographed from 15 feet away through soundproof glass.
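For readers wondering how motion in a video can become sound at all, here is a minimal sketch of the general idea. It is not the researchers' algorithm. It assumes each frame has already been reduced to a single row of pixel intensities and that the object shifts rigidly by a small, sub-pixel amount. It then estimates that shift per frame with a gradient-based (Lucas-Kanade-style) least-squares fit. The per-frame shifts, taken in order at the camera's frame rate, are the audio samples. The function names are hypothetical.

```python
import math

def estimate_shifts(frames):
    """Estimate each frame's horizontal shift (pixels) relative to frames[0].

    Assumes small, rigid, sub-pixel motion, so brightness constancy
    I(x, t) ~= I0(x - d(t)) linearises to I(x, t) - I0(x) ~= -d(t) * I0'(x).
    """
    ref = frames[0]
    interior = range(1, len(ref) - 1)
    # Central-difference spatial gradient of the reference row (edges skipped).
    grad = [(ref[i + 1] - ref[i - 1]) / 2.0 for i in interior]
    denom = sum(g * g for g in grad)
    shifts = []
    for frame in frames:
        # Temporal difference against the reference on the same interior pixels.
        diff = [frame[i] - ref[i] for i in interior]
        # Least-squares solution of diff ~= -d * grad for the scalar d.
        shifts.append(-sum(g * t for g, t in zip(grad, diff)) / denom)
    return shifts

def to_audio(shifts):
    """Remove the DC offset and scale to [-1, 1] so the shift trace can be played back."""
    mean = sum(shifts) / len(shifts)
    centred = [s - mean for s in shifts]
    peak = max(abs(s) for s in centred) or 1.0
    return [s / peak for s in centred]
```

The researchers' actual processing is considerably more sophisticated. This sketch only shows why sub-pixel motion, sampled at a high enough frame rate, can carry an audio waveform.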
Spy agencies were already measuring the vibrations of windows and other objects 40 to 50 years ago, so I wonder whether this has simply been rediscovered.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Puts the potato chip bag back into his lunch bag.
(How much did DHS pay for this research?)
Sorry but that is so 2004.
- NSA
Could this be used by NASA to look for intelligent life on other worlds by measuring objects in the same fashion?
This is cool, yet scary stuff.
I wonder how loud the original audio has to be in order to be recovered in this manner? It sounded to me like the spoken words were being shouted, and we have no way of knowing how loud the music was played. I didn't see any mention of that in the linked article.
The linked article has additional technical(ish) information that's not in the video.
...Needs a tin-foil hat!
The YouTube video captions state that this technique requires a camera capable of a few thousand frames per second. Thus this is pretty much using a camera to follow the vibrations, little different from a laser mic. What would impress me more is if they were able to pick up different frequencies from different parts of the bag with different resonant frequencies and reconstruct from standard 30 fps video using the bag as a transducer.
Which significantly tones down the 'omg they will hear what we say in the background of youtube videos' effect.
The sensor and optics must have been ridiculously high quality and resolution for this to work. Sensor noise alone would almost certainly rule this out for any COTS consumer package. They certainly aren't doing it with CNN footage or old CCTV surveillance tapes.
In which case, it's of no practical value, since a laser mic would be far cheaper and more discreet.
Cool from an academic perspective that they can do it with DSP now, but a laser mic is just more fun: same principles and theories, new and less workable application.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
After all, video normally runs at 24 to 30 fps, and the highest frequency it can capture (the Nyquist limit) is half the frame rate, about 12 to 15 Hz. If you need a video camera that can take pictures at audio sampling rates (very expensive), why not just bounce an IR laser off that potato chip bag?
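The frame rate is itself the sampling rate; what gets halved is the highest frequency that can be captured. A trivial sketch of that arithmetic for the frame rates quoted in this thread (the helper names are made up for illustration):

```python
def nyquist_hz(frame_rate_fps):
    """Highest frequency (Hz) a camera sampling at frame_rate_fps can capture without aliasing."""
    return frame_rate_fps / 2.0

def min_frame_rate_fps(max_audio_hz):
    """Lowest frame rate that can represent audio content up to max_audio_hz."""
    return 2.0 * max_audio_hz

# Frame rates mentioned in this thread.
for fps in (24, 30, 60, 500, 2000, 6000):
    print(f"{fps:>5} fps -> content up to {nyquist_hz(fps):g} Hz")
```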
So does this mean that someone with that equipment could measure the vibrations on the play sheets coaches commonly hold in front of their face while calling in plays? In the past teams have hired lip readers to try to determine the play, this might be more accurate - or at least lead to people waving their play sheet in front of their face while they call in the play.
Because if your target is eating SunChips you'd risk hearing loss.
There doesn't need to be a bag. Only a very high frame rate video of a bag.
That was done live, not from a video.
...that's pretty effin' amazing. From video, 15 feet away. Not using a laser, FROM VIDEO! Lol.
The minimalist architects are in league with the spooks!
Because you're emitting something when you send out that IR laser. This is completely passive.
No sir, I don't like it.
The Cone of Silence has been breached Max!
that glasshole is staring at your crisps.
You say "little different from a laser mic". Yes, innovation is incremental, this is an increment.
extracting video from audio.
You need a good 500 fps to recover audio from video. This has not been standard, but is possible with some cameras.
RTFA. They are using a high speed camera.
Why not bounce an IR laser off that potato chip bag? Because you may not be able to. Windows have IR-reflective coating, the bag is not very reflective, the surface may be at the wrong angle. This is useful.
High speed cameras are not that expensive. You can get 1000fps in a consumer camera for around $1000 these days.
Yeah ... except that your spy van in your target's parking lot housing the high speed camera and its zooming lens will be obvious. With an IR laser you could do it from blocks away and nobody in the room would be the wiser.
This is the first thing I thought of when reading the title.
SFW
https://www.youtube.com/watch?v=6i3NWKbBaaU
--- If the bible proves the existence of God, then Superman comics prove the existence of Superman.
Not only is it classier, but now serving your potato chips in a nice bowl is more secure.
And with good optics the camera could be just as far away, or nearly so. Lasers are old, known tech, so places will have countermeasures and detection set up. Miniaturization can also come into play, where you can shove the camera into a cell phone, etc.
How did they isolate the speech from the crunch of potato chips? And if this is possible, there is no hope for anyone with the munchies!
Yelling MARY HAD A LITTLE LAMB, ITS FLEECE WAS WHITE AS SNOW at a houseplant, bag of chips, and glass of water is now research.
It will be better to purchase from an owner who is a good farmer and a good builder.
Is "photographed" now a synonym for "recorded" (video)? Because for a second I thought that they were claiming to have extracted a slice of audio from the ripples in a photograph... and THAT would be very impressive.
Using a high-dollar, high-resolution, high-speed video camera to capture frames of video and then processing vibrations into sound? Meh, it's basically been done before, only in real-time and using equipment that costs a whole lot less.
The reason I find this interesting is that there are a number of places in the U.S. where it is legal to record video but not audio. I happen to be against some of the laws forbidding audio recording, because I think they are usually used to protect against the recording of corruption. (Try to open a bar with live music in New York City, see what kinds of visits you get, and notice how fire, police, and food inspections correlate with how you respond to those asking for money.) I would be interested in how the video gets treated in law if one were to subsequently extract the audio from it.
Reminds me of technology used in the show Fringe by the Observers. Surely, no one will use this tech for nefarious purposes. Don't worry about it! Now, where's that Carbon Monoxide generator?
Original video with a less annoying voice.
https://www.youtube.com/watch?v=FKXOucXB4a8
If you are going to sign your name as someone, maybe you should understand what they said first. The Nyquist limit only applies when you are trying to reconstruct an arbitrary waveform you know nothing about. If you can constrain the waveform with prior knowledge, or make some assumptions, you can go well beyond the Nyquist limit. If you know something resonates at a particular frequency (or even within a particular small range), you can measure the intensity of that frequency at sample rates far below what you would need with no prior estimate of the frequency. Of course, frequencies higher than the Nyquist limit will alias, but you can still reconstruct things if you have an appropriate frequency range and know it is aliasing.
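The aliasing argument can be made concrete with a pure tone. A sketch, assuming an ideal sinusoid and uniform sampling (the helper names are hypothetical): `alias_frequency` folds a tone into the 0 to fs/2 range, and `unfold` lists the true frequencies inside a known band that could have produced an observed alias. A narrow enough band leaves a single candidate.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone f_hz after sampling at fs_hz, folded into [0, fs/2]."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

def unfold(alias_hz, fs_hz, band_lo, band_hi):
    """True frequencies in [band_lo, band_hi] that would alias to alias_hz.

    With prior knowledge that the source lies in a narrow band (e.g. a known
    resonance), a single candidate means the undersampled tone is recoverable.
    """
    candidates = set()
    k_max = int(band_hi // fs_hz) + 1
    for k in range(k_max + 1):
        # Every image of the alias sits at k*fs minus or plus the alias frequency.
        for f in (k * fs_hz - alias_hz, k * fs_hz + alias_hz):
            if band_lo <= f <= band_hi:
                candidates.add(f)
    return sorted(candidates)
```

For example, a 440 Hz tone filmed at 60 fps shows up at 20 Hz; knowing the source lies between 410 and 450 Hz pins it back to 440 Hz, whereas a 0 to 100 Hz prior leaves several candidates.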
All they did was remove the laser to make the eavesdropper unobservable. The tech is useless unless the camera has an insanely fine spatial resolution or the speech is so loud it causes large vibrations.
Otherwise, they're just sampling motion at 2000-6000fps. It seems ridiculously processor intensive for something which could be better achieved using a high performance light meter.
You can get 1000fps in a consumer camera for around $1000 these days.
Not one with a telephoto lens that will zoom in from yards away to resolve a potato chip bag. Your consumer camera spy equipment is a pipe dream...
You'd be better off hiring a lip reader. Of course, all of this could be averted by closing the blinds in front of the window where the subject wishes his conversation to be private... I'm filing this one under "Dumb ideas".
This is what I instantly thought of: Eagle Eye. The scene with the soundproof room.
Question for religious people: where do unrepentant masochists go when they die?
Amazing accomplishment! I remember using a laser interferometer decades ago to hear audio at a distance, although the quality was low. For years I've wondered whether a material that is setting, such as ceramic or brick or anything else that hardens over time, might be capturing nearby sounds in its microstructure. An analysis of micro density variations in an old brick or piece of pottery might be able to recreate the sounds of voices in some ancient culture: we could actually hear what Sumerian or ancient Egyptian really sounded like, maybe even hear the voices of Socrates and Ramses!
Yeah ... except that your spy van in your target's parking lot housing the high speed camera and its zooming lens will be obvious.
Why should a high speed camera look any different from a regular one?
With this discovery, I wonder what the ramifications of taking public video without sound are. With this technique it's possible to take any video from a spy camera and reproduce the audio. What are the ramifications for wiretapping laws?
This works with very high frame rate video. Normal video has far too low a frame rate to capture vibrations in the (phone-company-defined) band of 300 Hz to 3000 Hz. At normal video rates of about 24 to 50 fps, the sample rate is below even the lowest fundamental frequency of the human voice.
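This ties in with the bandpass-sampling point made earlier in the thread. Below is a sketch of the standard bandpass-sampling condition, applied to the 300 to 3000 Hz band cited here. Because that band is wide relative to its upper edge, the only valid choice is ordinary sampling at 6000 samples/s or more. Prior knowledge of the band therefore does not lower the frame rate needed for speech. The condition ignores guard bands and real-world filter roll-off, and the function name is hypothetical.

```python
import math

def bandpass_sample_rates(f_lo, f_hi):
    """Ranges of uniform sample rates (Hz) that capture the band [f_lo, f_hi] without overlap.

    Uses the classic bandpass-sampling condition
        2 * f_hi / n <= fs <= 2 * f_lo / (n - 1)
    for integer n from 1 to floor(f_hi / (f_hi - f_lo)). n = 1 is ordinary
    Nyquist sampling, which has no upper bound.
    """
    bandwidth = f_hi - f_lo
    ranges = []
    for n in range(1, math.floor(f_hi / bandwidth) + 1):
        lo = 2.0 * f_hi / n
        hi = math.inf if n == 1 else 2.0 * f_lo / (n - 1)
        if lo <= hi:
            ranges.append((lo, hi))
    return ranges
```

By contrast, a narrow 400 to 450 Hz resonance admits sample rates as low as 100 samples/s under the same condition, which is the kind of case the earlier comment had in mind.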
The truth shall set you free!
Miniaturization can also come into play, where you can shove the camera into a cell phone, etc.
The laws of physics say otherwise. You can't get the resolution and quality out of a cell-phone-sized camera to do this at any distance; the lenses just aren't big enough.
Then add the need for about 6k frames per second as the bare minimum for capturing voice frequencies with this technique, and it's pretty useless in any situation where an alternative wouldn't work better.
A lot of the posts above are saying you need a high frame rate camera but the video uses a 60fps camera at the end. Am I misunderstanding something or are people above not watching the whole video?
I'm curious how much of the image they are using to recover the sound and if it would be possible to use multiple objects in the image to recover audio at different points. If you have some knowledge of the three dimensional spatial relationship between the various objects you could probably use phased array techniques to effectively restrict your signal to sound coming from particular points significantly improving noise rejection.
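The phased-array idea corresponds to classic delay-and-sum beamforming. Here is a minimal sketch under several assumptions. Each object's recovered signal is already sampled at a common rate fs. Object and target positions are known in metres. Sound travels at roughly 343 m/s (air at about 20 °C). Delays are rounded to whole samples for simplicity. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C

def steering_delays(sensor_positions, target, fs_hz, c=SPEED_OF_SOUND):
    """Arrival delay (whole samples) of sound from target at each sensor, relative to the nearest one."""
    distances = [math.dist(p, target) for p in sensor_positions]
    nearest = min(distances)
    return [round((d - nearest) / c * fs_hz) for d in distances]

def delay_and_sum(signals, delays):
    """Align every channel on the target's wavefront, then average.

    Sound from the target adds coherently; sound and noise arriving with
    other delays is partially averaged out.
    """
    n = min(len(s) - d for s, d in zip(signals, delays))
    return [sum(s[i + d] for s, d in zip(signals, delays)) / len(signals)
            for i in range(n)]
```

Rounding to whole samples is crude; with high enough frame rates it is adequate for a sketch, but a real system would interpolate fractional delays.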
We totally /.'d Youtube now - at the time of writing this, the video has been watched 2k times. After several hours of being on the /. front page.
Actually, I was terribly unimpressed, too, until they did something clever that overcomes this limitation. Watch the last few dozen seconds of the video, as they pull sound out of the video recorded by a normal, consumer DSLR: no high speed video needed!
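The comment doesn't name the trick. The visual microphone paper is reported to rely on the rolling shutter of ordinary CMOS cameras. Rows are exposed one after another, so each row is effectively a separate, slightly later sample. Below is a rough sketch of the resulting sample timing. It assumes a constant per-row readout time and a readout that occupies a fixed fraction of each frame period; both are hypothetical parameters, and real sensors vary. The rest of each frame period is a gap with no samples.

```python
def rolling_shutter_timestamps(fps, rows, readout_fraction, n_frames):
    """Capture time (s) of every row in n_frames frames from a rolling-shutter sensor.

    readout_fraction: share of each frame period spent reading rows out
    (an assumed sensor property); the remainder is a gap with no samples.
    """
    frame_period = 1.0 / fps
    line_time = frame_period * readout_fraction / rows
    return [f * frame_period + r * line_time
            for f in range(n_frames) for r in range(rows)]

def within_frame_rate(fps, rows, readout_fraction):
    """Effective sampling rate (samples/s) while rows are being read out."""
    return rows * fps / readout_fraction
```

At 60 fps with 1080 rows read out over half of each frame period, the rows are sampled in bursts at 129,600 samples/s. Only rows that actually cover the vibrating object carry useful signal, and the inter-frame gaps still have to be dealt with.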
I just thought of using microwaves to get a reflection from something metallic and vibrating. Wouldn't that work even through the blinds? (Or even walls?)
Ezekiel 23:20
From the looks of it, a 1D array might work rather well and get the frame rates required. I didn't say a generic cell phone, just something that fits in that form factor or close enough to it not to be suspicious. This could fit in existing security camera form factors (the roughly 18-inch-long all-weather enclosures commonly used) that are so common as to be forgotten.
I don't know but I bet a Google Glass wearer might be the first to find out.
Imagine that: now not only do you have to worry about pics and video of you at the bar hanging out with the guys being posted all over Facebook by someone you don't even know, but your conversations about who you would like to bag might make it back to your wife or significant other.
Expect the cops and government to use this. I imagine a drone flying over at 15,000 feet with a telescopic lens, listening in on a terrorist or even on your conversations in your own backyard. And the courts have already said that cops videotaping you is legit if they can legally see you, no warrant needed.
They won't know the make/model/speed of the fan.
"We tend to scoff at the beliefs of the ancients. But we can't scoff at them personally, to their faces, and this is what annoys me."
- Jack Handey
-kgj