Hackers Can Take Control of Siri and Alexa By Whispering To Them in Frequencies Humans Can't Hear (fastcodesign.com)
Chinese researchers have discovered a vulnerability in voice assistants from Apple, Google, Amazon, Microsoft, Samsung, and Huawei. It affects every iPhone and MacBook running Siri, any Galaxy phone, any PC running Windows 10, and even Amazon's Alexa assistant. From a report: Using a technique called the DolphinAttack, a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. This relatively simple translation process lets them take control of gadgets with just a few words uttered in frequencies none of us can hear. The researchers didn't just activate basic commands like "Hey Siri" or "Okay Google," though. They could also tell an iPhone to "call 1234567890" or tell an iPad to FaceTime the number. They could force a MacBook or a Nexus 7 to open a malicious website. They could order an Amazon Echo to "open the backdoor." Even an Audi Q3 could have its navigation system redirected to a new location. "Inaudible voice commands question the common design assumption that adversaries may at most try to manipulate a [voice assistant] vocally and can be detected by an alert user," the research team writes in a paper just accepted to the ACM Conference on Computer and Communications Security.
... a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants.
I commend the Chinese researchers on this discovery, and let's also agree that there's likely to be a [quick] fix, as it doesn't seem that complicated.
Exactly. If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house, screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Solution (hardware): RC low-pass filter.
Solution (software): fft low-pass filter.
bug fixed.
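For anyone curious what the RC low-pass suggestion does to a signal, here is a minimal sketch of a first-order RC filter in its usual discrete-time approximation. The sample rate, the 8 kHz cutoff, and the helper names (rc_low_pass, tone, rms) are assumptions picked for illustration, not anything from the paper or from a real device.

```python
import math


def rc_low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order RC low-pass filter (discrete-time approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # RC time constant for this cutoff
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)                  # smoothing factor in (0, 1)
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine wave, n samples long."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Illustrative values only (assumptions, not measurements).
SAMPLE_RATE = 192_000   # high enough to represent a 30 kHz tone
CUTOFF = 8_000          # a little above the band that matters for speech
N = 19_200              # 0.1 s of audio

voice = rc_low_pass(tone(1_000, SAMPLE_RATE, N), SAMPLE_RATE, CUTOFF)
ultra = rc_low_pass(tone(30_000, SAMPLE_RATE, N), SAMPLE_RATE, CUTOFF)

# Skip the first half so the filter's start-up transient is excluded.
voice_rms = rms(voice[N // 2:])
ultra_rms = rms(ultra[N // 2:])
print(f"1 kHz RMS after filter:  {voice_rms:.3f}")
print(f"30 kHz RMS after filter: {ultra_rms:.3f}")
```

A single RC stage rolls off gently (roughly 6 dB per octave above the cutoff), so the 30 kHz tone comes out attenuated rather than removed. It also only acts on whatever frequencies are actually present at the point where the filter sits.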
I can see two clear exploits:
1) Set up a personal 900 number
2) ???
3) Get on a PA system and broadcast the ultra-sonic message to call your 900 number
4) Profit!!!
The other exploit: at step 3), just broadcast a normal audible message telling people to call your 900 number.
That was the turning point of my life--I went from negative zero to positive zero.
Not really. You just need remote access to something nearby with a speaker. In fact you don't even need remote access; you just need the target to play a specially prepared audio file on that speaker.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
You're not thinking very creatively, since I was able to think of a variety of attacks that could use this without having physical access to the interior of your home.
For instance, they could have just dropped a small device into your pocket that every few minutes emits an inaudible command to open the garage door. You yourself would be the vector through which the attacker could attack your always-on devices in your home. In fact, it could even be something you're aware of, like a thumb drive you were given that secretly has a tiny speaker built in or that is set up to autorun a sound file with the commands when plugged into a computer.
Alternatively, a person who is known to you but who you don't realize is malicious could use this to gain physical access. Maybe you're okay taking a FaceTime call from them, but then they transmit the inaudible signal over the call, which your iDevice faithfully reproduces, resulting in Alexa, Siri, or whatever else opening your garage door. Or maybe someone standing outside at your smart doorbell uses it when you ask what they want via the app, resulting in your phone or tablet reproducing the sounds within earshot of a device that will respond to them.
A third possibility is that they could use your always-on phone to engage in an attack against your home even while you're not at home. For instance, an attacker passing you in the street could activate the commands on a device in your hand or pocket via "OK Google" or "Hey Siri" to open your garage door for a crony of theirs. For that matter, anyone who can get within listening distance of your phone can use this attack on it, all without ever having access to the devices within your home.
Cap'n Crunch called, he wants his attack vector back.
Exactly. If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house, screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Yeah, I'm struggling to see the use case. Maybe a cloak-and-dagger situation where you have limited legitimate access under close scrutiny and want to plant a bug but can't do it physically, like, say, you're a fake inspector at a drug lord's house. All you have to do is make some pretext to walk past the device with the ultrasonic command playing and it'll go to some malware site and root itself. Pretty far-fetched though...
Live today, because you never know what tomorrow brings
YAY! My useless superpower to hear up to around 30-35 kHz will come in handy for things other than knowing if someone left a CRT television on! I can now detect "dolphin attacks" apparently.
and numerous AC/DC adapters, and faulty capacitors. And the fun of returning loud and obnoxious devices that a vendor can't hear.
Inheritance is the sincerest form of nepotism.
Exactly.
If by exactly you mean it is something completely different.
If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house,
Like if they embedded the audio in a YouTube video that you were watching? That's basically equivalent to already having broken into your house and having run of the place, right?
And what if they are exploiting it on the phone in your pocket... you do go out of the house, right? Maybe you don't want the guy behind you at Starbucks to prank you by getting your phone to set an alarm at 2am, or order you all 180 episodes of the Golden Girls.
screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Or it could be the means to breaking in. Slip a tiny ultrasonic speaker under a door jamb or window sill... and tell it to unlock and open the door; perhaps it even works by holding the speaker against the window glass. Not that your front door lock is a big obstacle to a would-be thief... but do you really want your house to roll out the welcome mat to every jackass with the means to play an AAC file within hearing of your home?
Um, they just need to be in range of ultrasonic frequencies, which means this is exploitable anywhere on the same block as the building you're in. I hope if you live in an apartment complex all your neighbors are really really nice and trustworthy people who are close personal friends of yours.
[ I hope you all like creamed corn. ]
It must have been something you assimilated. . . .
Or it means they're outside your house while you're not home, with a loud enough ultrasonic sound for your Echo to hear through the wall.
Now your door is unlocked (because you were stupid enough to hook your door locks up to the internet and have them voice controlled).
A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?"
And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted.
Apparently, "Are you serious" sounds like "Hey, Siri."
That input to a voice recognition system would be run through a bandpass filter only a little wider than human vocal range.
The point of the attack is that they're using the nonlinearity of the mechanical microphone to "mix" the ultrasonic carrier and sidebands to produce "demodulated" audio on the microphone output. Though there is no "baseband" audio in the air, that demodulated audio IS baseband. So no amount of filtering will separate it from a real voice signal.
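To make the "mixing" point concrete, here is a rough sketch under a simplified assumption: the microphone is modeled as out = x + K2 * x^2, a small quadratic nonlinearity. That model, the coefficient K2, the carrier and tone frequencies, and the helper names are all assumptions for illustration, not measurements of any real microphone. The signal in the air is a 1 kHz tone amplitude-modulated onto a 30 kHz carrier, so it contains energy only at 29, 30 and 31 kHz.

```python
import cmath
import math

# Illustrative values only (assumptions, not measurements).
SAMPLE_RATE = 192_000
N = 19_200            # 0.1 s, so every frequency below falls on an exact DFT bin
CARRIER_HZ = 30_000   # ultrasonic carrier
VOICE_HZ = 1_000      # stand-in for the voice command
DEPTH = 0.5           # AM modulation depth
K2 = 0.1              # assumed quadratic coefficient of the microphone


def amplitude_at(samples, freq_hz):
    """Amplitude of one frequency component (single-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE)
              for i, s in enumerate(samples))
    return 2.0 * abs(acc) / len(samples)


def am_ultrasound(i):
    """Sample i of the AM signal: (1 + m*cos(voice)) * cos(carrier)."""
    t = i / SAMPLE_RATE
    envelope = 1.0 + DEPTH * math.cos(2.0 * math.pi * VOICE_HZ * t)
    return envelope * math.cos(2.0 * math.pi * CARRIER_HZ * t)


in_air = [am_ultrasound(i) for i in range(N)]

# Hypothetical microphone: linear term plus a small squared term.
mic_out = [x + K2 * x * x for x in in_air]

air_voice = amplitude_at(in_air, VOICE_HZ)    # nothing audible in the air
mic_voice = amplitude_at(mic_out, VOICE_HZ)   # baseband tone after the mic
print(f"1 kHz amplitude in the air:    {air_voice:.6f}")
print(f"1 kHz amplitude after the mic: {mic_voice:.6f}")
```

Squaring the AM signal produces a term proportional to the envelope itself, so under this model a 1 kHz component of amplitude K2 * DEPTH appears at the microphone output even though none exists in the air. A low-pass filter placed after that point would pass it like any other 1 kHz sound.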
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
WTB Cap'n Crunch whistle PST
Who says Siri is that discriminating, even when dealing with a 'trained' voice?
The day after my wife got Siri all trained 'only' to recognize her, I could spoof her by simply talking out of the back of my throat and bumping my voice up a few octaves. I sound ridiculous, and nothing like my wife... but despite several re-trainings, I can still get her phone to do things she doesn't want.
Help Brendan pay off his student loans
What about wardriving with big loudspeakers?
I'm actually surprised it worked. I'd have expected one of the first things the device would do is filter out frequencies above and below human speech in order to remove as much background noise as possible. Anything ultrasonic should be discarded as it can only ever be noise, since no human can talk that high*.
* Except after getting kicked in the balls.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Someone posted a tale of woe on Twitter the other day. They bought a "smart" lock, controlled via an app on their phone. The phone uses nearby wifi APs to determine location without powering up the GPS. The guy has a portable wifi AP for use when travelling...
Every time he sets up his mobile AP, anywhere in the world, his house unlocks all its doors.
It seems this would have been filtered out before the main processing; that so many programmers would have missed doing so seems incredibly unlikely. That "whispering" in ultrasonic frequencies would have any effect at all seems even more unlikely. If they had claimed to be blasting high-volume ultrasonic sound and relying on effects like beat tones that the microphones would detect, it would at least seem possible.
I am no sound engineer, but I don't think filtering high frequencies above speech would necessarily help their speech comprehension. Upper harmonics might well give hints to the module about the intended words. Second-language learners had more trouble understanding their non-native tongue over the old telephone networks, partly because of the filter on upper harmonics. POTS operators used the lowest bitrate they could get away with.
I assumed someone discovered a pattern to upper harmonics and is exploiting the hinting I described. If this is just shifting a "voice" to an inaudible pitch, that'd be kind of funny and clearly broken. TFA doesn't seem super-clear. I'd be curious to hear more from those who have relevant experience.
Clearly, if it's a security flaw, the companies in question will have to patch it and do something else.
You misunderstand. This is Siri training you to talk in silly voices for its own amusement.