Remote Exploit of Vista Speech Control
An anonymous reader writes "George Ou writes in his blog that he found a remote exploit for the new and shiny Vista Speech Control. Specifically, websites playing soundfiles can trigger arbitrary commands. Ou reports that Microsoft confirmed the bug and suggested as workarounds that either 'A user can turn off their computer speakers and/or microphone'; or, 'If a user does run an audio file that attempts to execute commands on their system, they should close the Windows Media Player, turn off speech recognition, and restart their computer.' Well, who didn't see that coming?"
Taking a computer that obeys audio instructions, and playing it some audio instructions, is more of a 'duh' than an 'exploit'. But this problem is a very Good Thing. It can only mean:
-- EITHER people stop yakking on about voice computing, which has been the Way Of The Future since about 1935 or something
-- OR pressure is exerted on web designers to NOT make sites that start making noise the moment the page appears!
Either of these, but especially the latter, would be a big win. So here's to you, Mr. Exploit Finding Man!
Whence? Hence. Whither? Thither.
If your computer starts spitting out voice commands, just create another sound that will interrupt it.
Admittedly, all I can think of is the Dilbert cartoon with Wally getting ticked at Dilbert for having voice-driven software.
Do Or Do Not, There Is No Spoon, There Is Only Zuul. Everything in the above post is probably opinion.
I wouldn't call it a bug. I'd call it a very bad idea to use a microphone without a switch for voice recognition. Your television could theoretically do things on your computer. Does that sound like a possibility you want to entertain? Get a mic with a switch, or get rooted.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
All voice recognition software, no matter what platform, would suffer from this supposed "exploit". So why this article on Vista specifically? What is the real agenda here? Also, if the voice recognition software is trained for a specific user's voice, the chances of an exploit are reduced.
I remember someone once announcing a voice-controlled video player, and wondering what would happen when it played a video in which someone shouted "Stop!"
Microsoft's comments on the BBC site are poor. What microphone feedback? If it's not howling now, it's not going to suddenly howl when someone tries this exploit. Clear dictation? The attacker will make the dictation as clear as possible. And the consolation that the user will likely be in the room to hear it happening: what consolation is that?
A solution would be to use echo cancellation, as used in phone systems, to keep the speaker's output from being treated as microphone input.
- Richard
"Vista should be testing its incoming audio to detect whether it matches any outgoing audio that Vista is playing."
I guess you never saw a room with more than one computer in it.
I imagine it's not quite so straightforward. You'd need to take into account room acoustics, hardware effects, generic ambient noises, or even other interfering sounds in the same room that could all interfere with a comparison of outgoing sound to incoming sound. It's very rare that you'd ever have a time where your outgoing sound file exactly matches one that is sensed coming from the speakers.
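For anyone who wants to poke at the "does the incoming audio match the outgoing audio" idea, here is a minimal sketch in plain Python. It is not how Vista or any real product works. The function name, the toy signals, and the noise level are all assumptions made up for illustration. It slides the played signal along the recording and reports the best normalized correlation.

```python
import math
import random

def best_normalized_correlation(played, recorded, max_lag):
    """Return (lag, score): the shift of `recorded` (0..max_lag samples)
    that best matches `played`, and the normalized correlation there."""
    n = len(played)
    played_energy = sum(p * p for p in played)
    best_lag, best_score = 0, 0.0
    for lag in range(max_lag + 1):
        segment = recorded[lag:lag + n]
        if len(segment) < n:
            break  # ran off the end of the recording
        dot = sum(p * r for p, r in zip(played, segment))
        norm = math.sqrt(played_energy * sum(r * r for r in segment))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

rng = random.Random(0)
played = [rng.uniform(-1, 1) for _ in range(2000)]

# Assumed toy microphone: a delayed, quieter copy plus unrelated room noise.
delay = 37
recorded = [0.0] * delay + [0.5 * s for s in played] + [0.0] * 50
recorded = [r + rng.gauss(0, 0.1) for r in recorded]

# A recording that has nothing to do with what was played.
unrelated = [rng.uniform(-1, 1) for _ in range(len(recorded))]

lag, score = best_normalized_correlation(played, recorded, max_lag=50)
_, unrelated_score = best_normalized_correlation(played, unrelated, max_lag=50)
print(lag, round(score, 2), round(unrelated_score, 2))
```

In this toy case the match stands out clearly, but only because the "room" is a single clean delay. With reverberation, speaker distortion, and someone else's computer playing in the same room, the score would presumably be far less clear-cut, which is the objection raised above.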
Why can't the computer ignore all that sound? It knows it is outputting it so why not filter it?
The sound that is output by the computer sounds similar to us when re-received through the mic and played back, but to the computer it's a totally alien waveform. A lot of distortion happens between when the computer sends a digital signal to the sound card and when it receives an analog signal from your microphone - so basically, the computer may know what it's playing, but it has very little idea how it'll sound when it reaches the mic.
There are advanced filters and algorithms that can try to match and isolate particular patterns and "sounds" within a waveform, but they're not nearly as powerful as CSI would have us believe, and they also require far too much computing power to be run in realtime.
Of course, the obvious low-tech solution to this issue is to wear headphones, as people in recording studios have for decades.
The easiest answer to this question is, try it.
Most simple schemes people come up with to address this are perfectly doable with a free sound program. Play some music, record the area while you're playing the music, then try your great idea. For example, you might start by inverting the source file and feeding it into the recording with a delay and a modified amplitude. If you're really curious about this problem, this is a better way to learn about the difficulties than reading people on the internet, since, in my experience, you're quite likely to be skeptical about the explanations anyhow. The best (and in some sense, only true) explanations involve a lot of math.
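If you'd rather try the "invert it, delay it, scale it" idea in code before reaching for a sound editor, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the helper names, the toy `room` model (a sum of delayed, scaled copies of the source), and the particular delays and gains. A real room and real hardware are messier than this.

```python
import random

def estimate_delay(source, recording, max_lag):
    """Lag (in samples) at which `source` lines up best with `recording`."""
    def corr(lag):
        return sum(s * recording[lag + i] for i, s in enumerate(source))
    return max(range(max_lag + 1), key=corr)

def subtract_scaled_copy(source, recording, max_lag):
    """Subtract the single best delayed, scaled copy of `source`."""
    lag = estimate_delay(source, recording, max_lag)
    aligned = recording[lag:lag + len(source)]
    # Least-squares gain for one copy: <source, aligned> / <source, source>.
    gain = sum(s * a for s, a in zip(source, aligned)) / sum(s * s for s in source)
    residual = list(recording)
    for i, s in enumerate(source):
        residual[lag + i] -= gain * s  # same as adding the inverted copy
    return residual

def energy(samples):
    return sum(v * v for v in samples)

def room(source, taps, length):
    """Toy room: the mic hears a sum of delayed, scaled copies of the source."""
    out = [0.0] * length
    for delay, gain in taps:
        for i, s in enumerate(source):
            out[delay + i] += gain * s
    return out

rng = random.Random(1)
source = [rng.uniform(-1, 1) for _ in range(1000)]
length = len(source) + 100

ideal = room(source, [(20, 0.6)], length)                          # one clean path
echoey = room(source, [(20, 0.6), (45, 0.4), (80, 0.3)], length)   # plus reflections

ideal_left = energy(subtract_scaled_copy(source, ideal, 90)) / energy(ideal)
echoey_left = energy(subtract_scaled_copy(source, echoey, 90)) / energy(echoey)
print(ideal_left, echoey_left)
```

With one clean path the subtraction removes essentially everything. Add just two reflections and a large fraction of the energy is still there afterwards, and that is before any noise, distortion, or clock drift enters the picture.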
I can offer you this meta-rule, though: If it were so easy, it would already have been done. Many things that I see people posting on Slashdot about "Why don't they just do this thing?" are covered by this rule.
"Format C, Colin"
Probably a good idea, though. And while we're at it, since Microsoft recommends rebooting (again, sigh), perhaps it is wise to do so with an installation CD of [linux distro of choice] in the drive. Seriously, who wants Vista? More trouble than it's worth.
An exploit is, by definition, a successful manipulation of a bug/omission/hole/whatever in a computer system to make it perform something that it was not designed to do. Usually this term is only applied when said action is harmful or potentially harmful.
What is being described here is the possibility of controlling the voice recognition system in Vista remotely to make it perform potentially harmful tasks. Furthermore, this functionality is not something that said system was designed to do; it was only designed to accept commands via microphone.
Therefore, what is being described here is an exploit.
Q.E.D.
I hear there's rumors on the Slashdots
It's not necessary to restart the PC to turn off speech recognition - just say "stop listening" or click on the always-visible recognition toolbar to turn the microphone off. It's also not on by default, and only those interested in it will find it anyway. Not really an "exploit" that's actually exploitable.
The security advice is "A user can turn off their computer speakers..." before playing an audio file. We can also solve the problem of porn getting into our school network by unplugging the monitors. I didn't realize this security stuff was so easy.
Some mornings it's hardly worth chewing through the restraints to get out of bed.
I call bull. What about that "echo cancellation" feature you find on all the popular web cam software? What about all the collaboration software out there that has echo cancellation? The basic premise is that if you use the computer speakers instead of headphones, the mic will pick up the sounds that the computer is playing from the other side, and you'll get an echo. Saying that it requires far too much computing power is incorrect. While it probably won't make the echo totally disappear, it will reduce the incoming signal from the mic to a level such that the voice processing feature on the computer won't be able to make out any of the commands. "Totally alien waveform", right. Tell that to Sony and their noise cancellation headphones. If they can fit the technology in a headphone, then a modern computer capable of running Vista certainly has enough horsepower.
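To put something concrete behind the echo cancellation argument, here is a minimal sketch of a normalized LMS (NLMS) adaptive filter, one common textbook approach to acoustic echo cancellation. It is not the algorithm any particular web cam or conferencing product uses. The function name, the tap count, the step size, and the toy echo path are all assumptions chosen for illustration, and there is no near-end speech or noise in the test signal.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Return the mic signal with an adaptive estimate of the echo removed."""
    weights = [0.0] * taps
    history = [0.0] * taps              # recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate       # what remains after removing the estimate
        power = sum(h * h for h in history) + eps
        step = mu * error / power       # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned

rng = random.Random(2)
far_end = [rng.uniform(-1, 1) for _ in range(4000)]   # what the speakers play

# Assumed toy echo path: three reflections, all within the 32-tap window.
path = {3: 0.6, 10: 0.3, 25: -0.2}
mic = [sum(g * far_end[n - d] for d, g in path.items() if n >= d)
       for n in range(len(far_end))]

cleaned = nlms_echo_cancel(far_end, mic)

# Compare echo power before and after, once the filter has had time to adapt.
tail = slice(3000, 4000)
echo_power = sum(v * v for v in mic[tail])
residual_power = sum(v * v for v in cleaned[tail])
reduction = residual_power / echo_power
print(reduction)
```

The per-sample cost here is a few dozen multiply-adds, which supports the point that raw computing power is not the obstacle. What this toy leaves out is the hard part in practice: someone talking at the same time, nonlinear speaker distortion, and echo paths longer than the filter.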