Remote Exploit of Vista Speech Control

Posted by kdawson on Thursday February 1, 2007 @03:54AM from the format-C-yes dept.

An anonymous reader writes "George Ou writes in his blog that he found a remote exploit for the new and shiny Vista Speech Control. Specifically, websites playing sound files can trigger arbitrary commands. Ou reports that Microsoft confirmed the bug and suggested as workarounds that either 'A user can turn off their computer speakers and/or microphone'; or, 'If a user does run an audio file that attempts to execute commands on their system, they should close the Windows Media Player, turn off speech recognition, and restart their computer.' Well, who didn't see that coming?"

That's hardly an exploit by kahei · 2007-02-01 04:00 · Score: 4, Insightful

Taking a computer that obeys audio instructions, and playing it some audio instructions, is more of a 'duh' than an 'exploit'. But this problem is a very Good Thing. It can only mean:

-- EITHER people stop yakking on about voice computing, which has been the Way Of The Future since about 1935 or something
-- OR pressure is exerted on web designers to NOT make sites that start making noise the moment the page appears!

Either of these, but especially the latter, would be a big win. So here's to you, Mr. Exploit Finding Man!

--
Whence? Hence. Whither? Thither.
1. Re:That's hardly an exploit by Anonymous Coward · 2007-02-01 04:13 · Score: 2, Insightful
  
  Even so, with Vista's new software audio stack, this is inexcusable. It should have been trivial to compare the input and output signals and filter out most of this automatically.
2. Re:That's hardly an exploit by gstoddart · 2007-02-01 04:15 · Score: 4, Insightful
  
  -- EITHER people stop yakking on about voice computing, which has been the Way Of The Future since about 1935 or something
  -- OR pressure is exerted on web designers to NOT make sites that start making noise the moment the page appears!
  Or, we make browsers so they don't run every damned audio file, flash frigging plugin, executable, movie, or whatever that the idiot who made the site thinks I should hear/see/play with/click/download/execute or whatever.
  
  There has never been any sound from a webpage that didn't make me want to immediately beat the person who wrote it with his own leg. I don't want to listen to your stupid MIDI file of whatever the fsck you think is cool on your web page.
  
  There was never any good reason to embed sounds in web pages unless you have to click a button to specifically play it.
  
  Cheers
  
  --
  Lost at C:>. Found at C.
3. Re:That's hardly an exploit by morgan_greywolf · 2007-02-01 04:39 · Score: 2, Insightful
  
  or default to If playing audio then audio instructions listener = off
  Yes: for all of you fanbois out there saying "Oh, that's not an exploit!" pay attention to what the parent is saying! You gotta admit, it was a huge oversight on Microsoft's part to not include any mechanism for turning off the accepting of audio instructions while playing audio, or at least to have a user-configurable option for protection against this exploit, defaulted to "On".
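
The "listener off while playing audio" rule can be sketched in a few lines. This is a minimal illustration only, not how Vista's speech stack works: the `SpeechGate` class, its method names, and the `mute_while_playing` option are all hypothetical, standing in for whatever hooks a real audio stack and recognizer would expose.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechGate:
    """Hypothetical gate sitting between a recognizer and the command executor."""

    mute_while_playing: bool = True   # the protection option, defaulted to "On"
    active_streams: int = 0           # number of audio streams currently playing
    executed: list = field(default_factory=list)

    def playback_started(self) -> None:
        self.active_streams += 1

    def playback_stopped(self) -> None:
        self.active_streams = max(0, self.active_streams - 1)

    def on_recognized(self, command: str) -> bool:
        """Run the command unless playback is active; return True if it ran."""
        if self.mute_while_playing and self.active_streams > 0:
            return False  # ignore anything heard while the machine is making sound
        self.executed.append(command)
        return True


gate = SpeechGate()
gate.playback_started()
blocked = gate.on_recognized("delete documents")   # heard during playback
gate.playback_stopped()
allowed = gate.on_recognized("open notepad")       # heard in silence
print(blocked, allowed, gate.executed)
```

The trade-off is the obvious one: with the option on, the user cannot give voice commands while music or video is playing.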
  
  This is yet another case of Microsoft putting ease-of-use ahead of security and reliability. We've all heard this song before. Same story, different Windows version.
  
  --
  My blog
amusing, but not much else by Thansal · 2007-02-01 04:01 · Score: 2, Insightful

If your computer starts spitting out voice commands, just create another sound that will interrupt it.

Admittedly, all I can think of is the Dilbert cartoon with Wally getting ticked at Dilbert having voice driven software.

--
Do Or Do Not, There Is No Spoon, There Is Only Zuul. Everything in the above post is probably opinion.
Bug? by drinkypoo · 2007-02-01 04:01 · Score: 3, Insightful

I wouldn't call it a bug. I'd call it a very bad idea to use a microphone without a switch for voice recognition. Your television could theoretically do things on your computer. Does that sound like a possibility you want to entertain? Get a mic with a switch, or get rooted.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
The Real Agenda of this Article? by ksalter · 2007-02-01 04:02 · Score: 4, Insightful

All voice recognition software, no matter what platform, would suffer from this supposed "exploit". So why this article on Vista specifically? What is the real agenda here? Also, if the voice recognition software is trained for a specific user's voice, the chances of an exploit are reduced.
1. Re:The Real Agenda of this Article? by shark72 · 2007-02-01 04:18 · Score: 2, Insightful
  
  "All voice recognition software, no matter what platform, would suffer from this supposed "exploit". So why this article on Vista specifically? What is the real agenda here? Also, if the voice recognition software is trained for a specific user's voice, the chances of an exploit are reduced."
  
  Yup, this is an old one. There's an apocryphal tale of a user group meeting from long ago of a vendor demonstrating voice-control software and a smart aleck in the back of the room yelling "DEL *.*!" (or whatever the MS-DOS command was).
  
  As you implied, the agenda is, of course, to have a laugh at Microsoft's expense. If they hadn't included voice control software, the opportunity would have been to point out that Microsoft spent $BIGNUM person-years working on Vista and didn't even include that feature. OSX's easy access to a shell prompt with root access is about as relevant an exploit as the voice control exploit, and the odds of a cat wandering into my house and walking on the keys in such a way as to generate the wrong "rm" command are about the same as this Vista "exploit" happening to me. But, it's always fun to have a laugh at Microsoft's expense, isn't it?
  
  --
  Sitting in my day care, the art is decopainted.
2. Re:The Real Agenda of this Article? by planetmn · 2007-02-01 06:10 · Score: 2, Insightful
  
  Except that it will never match. You are basically doing a D/A conversion to output the sound via the speakers, and then A/D when using the mic for input. Both of these stages will cause some distortion (lots of distortion with crappy speakers and microphones). Furthermore, the acoustical environment is going to affect different frequencies to different extents.
  
  For instance, the mic may not pick up any of the low frequencies due to location of a subwoofer, quality of speakers, sound absorbers (carpet, etc.). So in order to match the output to the input, you need to allow for these factors and by the time that you give yourself enough of a margin, you've in effect taken out all functionality.
  
  Sure, it's fun to bash MS here on slashdot. Just don't let reality get in the way.
  
  -dave
  
  --
  /., where "Apple and Google provide Iran with nukes" will be refuted with "But Microsoft is a convicted monopolist"
3. Re:The Real Agenda of this Article? by adolf · 2007-02-01 07:37 · Score: 2, Insightful
  
  All true.
  
  However, this should be a solvable problem with current DSP technology.
  
  If my cellular telephone can perform realtime echo cancellation, and subtract its own speakerphone audio from the microphone audio, and do it for several hours at a time on a battery the size of a matchbook, then I can only fucking hope that a modern dual-core machine would be able to tackle the task handily.
  
  Even after the variables are all multiplied by some factor because the speakers might move relative to the microphone, there seems to be plenty of horsepower available to throw at the problem. The fundamentals have all been solved by folks like Bell Labs, US Robotics, and Polycom a long fucking time ago, with less DSP power than my $20 optical mouse, using the widely variable POTS network as a testbed, where even the -remote- handset affects the quality of your own voice on the line.
  
  Just because there's layers of distortion, band limiting, spurious external noises, with dynamics and delay possibly being anywhere on the map and an echo signature that changes as people move around the room, does not mean that it's not all measurable, quantifiable, and possible to reduce it to acceptable levels.
  
  Remember, you don't have to get rid of all the feedback, and it doesn't have to be perfect. We're talking about limiting a computer's ability to hear itself, which is a far easier task than anything involving a human being. You only have to get rid of enough that the computer does not respond to its own voice. And also, remember that the resultant quality of the recorded microphone audio need not be production-grade, but only good enough for the computer to understand human-generated voice commands.
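
One textbook way to do the kind of echo cancellation described above is a normalized least-mean-squares (NLMS) adaptive filter, which learns how the played signal shows up at the microphone and subtracts its estimate. The sketch below is a toy under stated assumptions, not what any phone or Vista actually ships: the "room" is just two delayed, attenuated copies of the output plus a little ambient noise, and the function names, tap count, and step size are illustrative choices.

```python
import math
import random


def nlms_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns the residual signal (mic minus the estimated echo).
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the reference history, scaled by its energy.
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual


def power(signal):
    return sum(s * s for s in signal) / len(signal)


random.seed(0)
n = 4000
played = [random.uniform(-1.0, 1.0) for _ in range(n)]   # what the PC outputs

# Toy room (an assumption): direct path at 3 samples, weaker reflection at 5.
echo = [0.6 * (played[i - 3] if i >= 3 else 0.0)
        + 0.2 * (played[i - 5] if i >= 5 else 0.0) for i in range(n)]
mic = [e + random.gauss(0.0, 0.01) for e in echo]         # plus ambient noise

residual = nlms_cancel(played, mic)
tail = n // 2                                              # skip the adaptation phase
reduction_db = 10 * math.log10(power(mic[tail:]) / power(residual[tail:]))
print(f"echo reduced by about {reduction_db:.0f} dB after adaptation")
```

The filter never needs to know the room in advance, which is the point being made: the cancellation only has to be good enough that the recognizer no longer responds to the machine's own output.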
  
  --
  Kid-proof tablet..
Voice controlled video player. Echo cancellation? by Anonymous Coward · 2007-02-01 04:03 · Score: 1, Insightful

I remember someone once announcing a voice controlled video player, and wondered what would happen when it played a video in which someone shouted "Stop!"

Microsoft's comments on the BBC site are poor. What microphone feedback? If it's not howling now it's not going to suddenly howl when someone tries this exploit. Clear dictation - but the attacker will make the dictation as clear as possible, and the consolation that the user will likely be in the room to hear it happening - what consolation is that?

A solution would be to use echo cancellation as used in phone systems to prevent output from the speaker being used on the microphone.

- Richard
Re:In One Ear and Out the Other by itsme1234 · 2007-02-01 04:16 · Score: 2, Insightful

"Vista should be testing its incoming audio to detect whether it matches any outgoing audio that Vista is playing."

I guess you never saw a room with more than one computer in it.
Maybe a good start, but not that easy by mopslik · 2007-02-01 04:21 · Score: 2, Insightful

"Vista should be testing its incoming audio to detect whether it matches any outgoing audio that Vista is playing."

I imagine it's not quite so straightforward. You'd need to take into account room acoustics, hardware effects, generic ambient noises, or even other interfering sounds in the same room that could all interfere with a comparison of outgoing sound to incoming sound. It's very rare that you'd ever have a time where your outgoing sound file exactly matches one that is sensed coming from the speakers.
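
An exact match is indeed unrealistic, but a comparison does not have to be exact. A minimal sketch, assuming the microphone hears a delayed, quieter, noisy copy of the output: slide the played signal against the heard signal and take the strongest normalized correlation. The `best_match` helper, the delay, the attenuation, and the noise level are all made up for illustration, and the toy ignores frequency-dependent room effects.

```python
import math
import random


def best_match(played, heard, max_lag):
    """Return (lag, score) of the strongest normalized correlation.

    A score near 1 means `heard` contains a delayed copy of `played`;
    a score near 0 means the two signals look unrelated.
    """
    best_lag, best_score = 0, 0.0
    n = min(len(played), len(heard))
    for lag in range(max_lag + 1):
        a = played[:n - lag]          # played[i] is paired with heard[i + lag]
        b = heard[lag:n]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


random.seed(1)
n = 2000
played = [random.uniform(-1.0, 1.0) for _ in range(n)]
other = [random.uniform(-1.0, 1.0) for _ in range(n)]     # unrelated sound

delay = 7
heard = [0.5 * (played[i - delay] if i >= delay else 0.0) + random.gauss(0.0, 0.2)
         for i in range(n)]

lag, score = best_match(played, heard, max_lag=20)
_, unrelated = best_match(played, other, max_lag=20)
print(f"own audio: lag={lag}, score={score:.2f}; unrelated: score={unrelated:.2f}")
```

Where to set the threshold between "my own audio" and "something else" is exactly the hard part raised here, and a second computer in the same room defeats the check entirely.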
Re:A Whole Decade of Nothing by xappax · 2007-02-01 04:34 · Score: 4, Insightful

"Why can't the computer ignore all that sound? It knows it is outputting it so why not filter it?"

The sound that is output by the computer sounds similar to us when re-received through the mic and played back, but to the computer it's a totally alien waveform. A lot of distortion happens between when the computer sends a digital signal to the sound card and when it receives an analog signal from your microphone - so basically, the computer may know what it's playing, but it has very little idea how it'll sound when it reaches the mic.

There are advanced filters and algorithms that can try to match and isolate particular patterns and "sounds" within a waveform, but they're not nearly as powerful as CSI would have us believe, and they also require far too much computing power to be run in realtime.

Of course, the obvious low-tech solution to this issue is to wear headphones, as people in recording studios have for decades.
Re:A Whole Decade of Nothing by Jerf · 2007-02-01 04:35 · Score: 4, Insightful

The easiest answer to this question is, try it.

Most simple schemes people come up with to address this are perfectly doable with a free sound program. Play some music, record the area while you're playing the music, then try your great idea. Like, you might think you can start out with inverting the source file and feeding it into the recording with a delay and modified amplitude. If you're really curious about this problem, this is a better way to learn about the difficulties than reading people on the internet, as, in my experience, you're quite likely to be skeptical about the explanations anyhow. The best (and in some sense, only true) explanations involve a lot of math.
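
The invert, delay, and scale experiment can also be tried on synthetic data before reaching for a sound program. The sketch below assumes a toy room: in the ideal case the recording is a single delayed, attenuated copy of the source; in the second case one extra reflection is added. All names, delays, and gains are invented for the demonstration.

```python
import random


def delayed(signal, delay):
    """Return `signal` shifted later by `delay` samples, padded with silence."""
    return [signal[i - delay] if i >= delay else 0.0 for i in range(len(signal))]


def subtract_delayed(recording, source, delay, gain):
    """Mix an inverted, delayed, scaled copy of `source` into `recording`."""
    return [r - gain * s for r, s in zip(recording, delayed(source, delay))]


def rms(signal):
    return (sum(x * x for x in signal) / len(signal)) ** 0.5


random.seed(2)
n = 2000
source = [random.uniform(-1.0, 1.0) for _ in range(n)]

direct = [0.7 * x for x in delayed(source, 4)]        # speaker-to-mic path
reflection = [0.3 * x for x in delayed(source, 9)]    # one bounce off a wall

ideal_room = direct
real_room = [a + b for a, b in zip(direct, reflection)]

ideal_left = rms(subtract_delayed(ideal_room, source, delay=4, gain=0.7))
real_left = rms(subtract_delayed(real_room, source, delay=4, gain=0.7))
print(f"ideal residual: {ideal_left:.3f}, with one reflection: {real_left:.3f}")
```

A single fixed delay and gain cancels the ideal case completely and leaves the whole reflection behind in the other, and a real room has many reflections that shift as people move around.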

I can offer you this meta-rule, though: If it were so easy, it would already have been done. Many things that I see people posting on Slashdot about "Why don't they just do this thing?" are covered by this rule.
Re:Most Important Part of the Announcement by BrokenHalo · 2007-02-01 04:43 · Score: 1, Insightful

"Format C, Colin"

Probably a good idea, though. And while we're at it, since Microsoft recommends rebooting (again, sigh), perhaps it is wise to do so with an installation CD of [linux distro of choice] in the drive. Seriously, who wants Vista? More trouble than it's worth.
I'm feeling anal today, so ... by spellraiser · 2007-02-01 04:44 · Score: 4, Insightful

An exploit is, by definition, a successful manipulation of a bug/omission/hole/whatever in a computer system to make it perform something that it was not designed to do. Usually this term is only applied when said action is harmful or potentially harmful.
What is being described here is the possibility of controlling the voice recognition system in Vista remotely to make it perform potentially harmful tasks. Furthermore, this functionality is not something that said system was designed to do; it was only designed to accept commands via microphone.
Therefore, what is being described here is an exploit.
Q.E.D.

--
I hear there's rumors on the Slashdots
Re:Restart? Really? by inquisitor · 2007-02-01 05:02 · Score: 2, Insightful

It's not necessary to restart the PC to turn off speech recognition - just say "stop listening" or click on the always visible recognition toolbar to turn the microphone off. It's also not on by default, and only those interested in it will find it anyway. Not really an "exploit" that's actually exploitable.
Brilliant! by Kozar_The_Malignant · 2007-02-01 05:04 · Score: 2, Insightful

The security advice is "A user can turn off their computer speakers..." before playing an audio file. We can also solve the problem of porn getting into our school network by unplugging the monitors. I didn't realize this security stuff was so easy.

--
Some mornings it's hardly worth chewing through the restraints to get out of bed.
Re:A Whole Decade of Nothing by fwr · 2007-02-01 05:22 · Score: 2, Insightful

I call bull. What about that "echo cancellation" feature you find on all the popular web cam software? What about all the collaboration software out there that has echo cancellation? The basic premise is that if you don't use headphones and instead use the computer speakers, then the mic will pick up the sounds that the computer is transmitting from the other side, and you'll get an echo. Saying that it requires far too much computing power is incorrect. While it probably won't make it totally disappear, it will reduce the incoming signal from the mic to a level such that the voice processing feature on the computer won't be able to make out any of the commands. A "totally alien waveform"? Right. Tell that to Sony and their noise cancellation headphones. If they can fit the technology in a headphone, then a modern computer capable of running Vista certainly has enough horsepower.