Hackers Can Take Control of Siri and Alexa By Whispering To Them in Frequencies Humans Can't Hear (fastcodesign.com)
Chinese researchers have discovered a vulnerability in voice assistants from Apple, Google, Amazon, Microsoft, Samsung, and Huawei. It affects every iPhone and Macbook running Siri, any Galaxy phone, any PC running Windows 10, and even Amazon's Alexa assistant. From a report: Using a technique called the DolphinAttack, a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. This relatively simple translation process lets them take control of gadgets with just a few words uttered in frequencies none of us can hear. The researchers didn't just activate basic commands like "Hey Siri" or "Okay Google," though. They could also tell an iPhone to "call 1234567890" or tell an iPad to FaceTime the number. They could force a Macbook or a Nexus 7 to open a malicious website. They could order an Amazon Echo to "open the backdoor." Even an Audi Q3 could have its navigation system redirected to a new location. "Inaudible voice commands question the common design assumption that adversaries may at most try to manipulate a [voice assistant] vocally and can be detected by an alert user," the research team writes in a paper just accepted to the ACM Conference on Computer and Communications Security.
"our always-on voice assistants" -- the only thing that's always on is my refrigerator. Siri likes it when I press her button anyway. It would be interesting to do some electronic shoulder surfing at the airport though ... heh Band pass filter coming ASAP!
... a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants.
I commend the Chinese researchers on this discovery, and let's also agree that there's likely to be a quick fix, as it doesn't seem that complicated.
Exactly. If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house, screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Can I inject an attack into OTA frequencies received by radios and TVs?
When Siri first came out, anyone could trigger "Hey Siri" if it was enabled. But starting with a later version of iOS (I don't remember exactly which one), you would train Siri to recognize your voice - and it seemed to work. I now can trigger my phone but not my wife's, for example. So I'm curious how this particular exploit could work on a reasonably current version of Siri.
Now the Apple Watch is another matter... and I don't recall if macOS Sierra does the voice pairing. But I'm somewhat skeptical about this working on an up-to-date iPhone.
#DeleteChrome
Solution (hardware): RC low-pass filter.
Solution (software): fft low-pass filter.
bug fixed.
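For the curious, here is a minimal sketch of the low-pass idea above, in plain Python. The resistor and capacitor values, the 8 kHz cutoff and the function names are assumptions chosen for illustration, not anything from the article. Note that a single RC stage rolls off gently, and whether any filter helps depends on where in the microphone chain the ultrasonic signal is turned back into audible-band audio.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB cutoff of a first-order RC low-pass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def rc_gain(freq_hz, cutoff_hz):
    """Magnitude response of an ideal first-order RC low-pass."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)

def low_pass(samples, sample_rate_hz, cutoff_hz):
    """Discrete one-pole approximation of the same RC filter."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)   # move the output a fraction of the way to the input
        out.append(y)
    return out

# Hypothetical component values: 1 kOhm and 20 nF give a cutoff near 8 kHz.
cutoff = rc_cutoff_hz(1_000, 20e-9)
print("cutoff (Hz):", round(cutoff))
print("gain at 1 kHz:", round(rc_gain(1_000, cutoff), 3))
print("gain at 30 kHz:", round(rc_gain(30_000, cutoff), 3))
```

A steeper (higher-order) filter would cut ultrasound much harder than this one-pole version; the sketch only shows the principle.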
"Alexa, kill all humans."
Table-ized A.I.
I can see two clear exploits:
1) Set up a personal 900 number
2) ???
3) Get on a PA system and broadcast the ultra-sonic message to call your 900 number
4) Profit!!!
The other exploit is step 3) just broadcast a normal audible message to call your 900 number
That was the turning point of my life--I went from negative zero to positive zero.
Fascinating information.
YAY! My useless superpower to hear up to around 30-35 kHz will come in handy for things other than knowing if someone left a CRT television on! I can now detect "dolphin attacks" apparently.
Not really. You just need remote access to something nearby with a speaker. In fact you don't even need remote access; you just need the target to play a specially prepared audio file on that speaker.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
You're not thinking very creatively, since I was able to think of a variety of attacks that could use this without having physical access to the interior of your home.
For instance, they could have just dropped a small device into your pocket that every few minutes emits an inaudible command to open the garage door. You yourself would be the vector through which the attacker could attack your always-on devices in your home. In fact, it could even be something you're aware of, like a thumb drive you were given that secretly has a tiny speaker built in or that is set up to autorun a sound file with the commands when plugged into a computer.
Alternatively, a person who is known to you but who you don't realize is malicious could use this to gain physical access. Maybe you're okay taking a FaceTime call from them, but then they transmit the inaudible signal over the call, which your iDevice faithfully reproduces, resulting in Alexa, Siri, or whatever else opening your garage door. Or maybe someone standing outside at your smart doorbell uses it when you ask what they want via the app, resulting in your phone or tablet reproducing the sounds within earshot of a device that will respond to them.
A third possibility is that they could use your always-on phone to engage in an attack against your home even while you're not at home. For instance, an attacker passing you in the street could activate the commands on a device in your hand or pocket via "OK Google" or "Hey Siri" to open your garage door for a crony of theirs. For that matter, anyone who can get within listening distance of your phone can use this attack on it, all without ever having access to the devices within your home.
Cap'n Crunch called, he wants his attack vector back.
Exactly. If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house, screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Yeah, I'm struggling to see the use case. Maybe a cloak-and-dagger situation where you have limited legitimate access under close scrutiny and want to plant a bug but can't do it physically, like say you're a fake inspector at a drug lord's house. All you have to do is make some pretext to walk past the device with the ultrasonic command playing and it'll go to some malware site and root itself. Pretty far fetched though...
Live today, because you never know what tomorrow brings
Exactly.
If by exactly you mean it is something completely different.
If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house,
Like if they embedded the audio in a youtube video that you were watching? That's basically equivalent to already having broken into your house and having run of the place right?
And what if they are exploiting it on the phone in your pocket... you do go out of the house right? Maybe you don't want the guy behind you at Starbucks to prank you by getting your phone to set an alarm at 2am, or order you all 180 episodes of the Golden Girls.
screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Or it could be the means to breaking in. Slip a tiny ultrasonic speaker under a door jamb or window sill... and tell it to unlock and open the door, perhaps it even works by holding the speaker against the window glass. Not that your front door lock is a big obstacle to a would-be thief... but do you really want your house to roll out the welcome matt to every jackass with the means to play an aac file within hearing of your home?
Would make a good movie script?
Spies and embassy workers wandering around whispering to another nation's mil/gov contractors?
Imagine an area in any nation filled with mil/gov contractors.
A thought experiment with trusted devices to be turned on outside secure working hours and a network of whispers waiting over a wide area.
Domestic spying is now "Benign Information Gathering"
Maybe the hackers can make these voice assistants actually work well (i.e. Siri), and do something actually useful?
Um, they just need to be in range of ultrasonic frequencies, which means this is exploitable anywhere on the same block as the building you're in. I hope if you live in an apartment complex all your neighbors are really really nice and trustworthy people who are close personal friends of yours.
" translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants."
But, on the Internet, no one knows you're a dog.
"National Security is the chief cause of national insecurity." - Celine's First Law
[ I hope you all like creamed corn. ]
It must have been something you assimilated. . . .
I'd have thought that input to a voice recognition system would be run through a bandpass filter only a little wider than the human vocal range. It just seems like such a simple way to help sanitize the input.
Silence is a state of mime.
They could order an Amazon Echo to "open the backdoor."
If you're not home and someone says "open the back door" loud enough for Alexa to hear it, you've fucked yourself anyway.
Pro tip: Don't control your security system/door locks with a voice system anyone can use. You may as well have the doorbell unlock the door.
Or it means they're outside your house while you're not home, with a loud enough ultrasonic sound for your Echo to hear through the wall.
Now your door is unlocked (because you were stupid enough to hook your door locks up to the internet and have them voice controlled).
A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?"
And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted.
Apparently, "Are you serious" sounds like "Hey, Siri."
WTB Cap'n Crunch whistle PST
Convoluted technical means to get your internet devices to "open the back door" are not the go-to tactic for any burglar. Nor will they be.
The go-to tactic is to kick your door really hard or break a window, then retreat. This is a basic test for a real security system - with window switches, motion sensors, a battery, a failsafe, and a separate cellular connection. Getting Alexa or whatever to "open the back door" would only act as another test for this, and actually be _harder_.
If the cops don't show up within half an hour, they enter again some other time, and grab whatever looks good. (Ideally after finding the electrical service panel on the exterior of your house and flipping it, shutting off your geeky aftermarket recording setup.)
So. Get a real security system, and/or a large dog, and/or a housemate who's always home. Do whatever you like with Alexa.
That is more farfetched than the average CSI plot. Maybe you should be writing for TV instead of trolling for slavery.
Your imagination needs work. What if someone uses speakers? Either connected to a PC or while using the TV as a screen? You could easily have these frequencies playing off a website or in presumed bumper space at the beginning of streaming video, clips on Twitter or Streamable, etc. They could be set to very high decibels compared to the rest of the clip, and since you can't hear in that range you would still have no idea it even played, even if your volume level was set to a normal amount.
What about wardriving with big loudspeakers?
You couldn't use a 900 number because that would lead back to you. But one of these nations with phone billing scams could use it to make computers call their phone network.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Ignore all voice commands over say 500hz.
I'm actually surprised it worked. I'd have expected one of the first things the device would do is filter out frequencies above and below human speech in order to remove as much background noise as possible. Anything ultrasonic should be discarded as it can only ever be noise, since no human can talk that high*.
* Except after getting kicked in the balls.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
An attacker could place a speaker against a window pane and tell the device inside to unlock the doors. They could call the answer phone and talk to it that way.
Malicious ads already produce high frequency sounds that spyware on phones can track, so presumably they could just emit speech at those frequencies instead.
Someone posted a tale of woe on Twitter the other day. They bought a "smart" lock, controlled via an app on their phone. The phone uses nearby wifi APs to determine location without powering up the GPS. The guy has a portable wifi AP for use when travelling...
Every time he sets up his mobile AP, anywhere in the world, his house unlocks all its doors.
Possible, but less likely. The performance characteristics of most modern speakers on most home-quality devices would probably not give you much of a ceiling to transmit. IIRC, most headphones, for example, have a frequency response in the 100Hz-30kHz range.
It strikes me that one way to get around this is to be a bit more careful about signal processing in training. Low-pass filtering might help, or some dynamic range compression might help, though resolving higher harmonics in normal voices might be an issue.
It's not you: I'm just this horrifically socially awkward with everybody.
Hmm... poke a speaker through your letterbox?
Turn the amp up to 11 just outside your back door? (the neighbours won't hear it - it's ultrasonic)
Play sounds through an air vent, or open window too small to climb through?
Drill a hole in the wall or door?
Hey! What's wrong with the Golden Girls?
It seems this would have been filtered before the main processing; that so many programmers would have missed doing it seems incredibly unlikely. That "whispering" in ultrasonic frequencies would have any effect at all seems even more unlikely. If they claimed that blasting high-volume ultrasonic sounds and using effects like beat tones let the microphones detect it, that would seem possible at least.
That's a lot of effort to open the garage door... when my neighbor's house was on fire I didn't see exactly what the policeman did with the flat bar of steel he slipped in their garage door seam when he was helping the firemen verify the home was cleared, but he got it opened in about 10 seconds. Better to use a vulnerability like this in a public setting since there's no guarantee a specific person will have these enabled anyhow. Even then it would probably only be useful in accessing an undiscovered security hole deeper in the system.
Can I inject an attack into OTA frequencies received by radios and TVs?
Analog and digital media, respectively, either can't carry such high frequencies
(the spectrum FM radio can carry is narrower than the range of the human ear. So it's the opposite: you can hear noises to which the radio is deaf)
or compress them away
(DAB+ radio and the various DVB TV standards use AACplus codecs. These only encode mid-range frequencies and regenerate the high frequencies by replicating the spectrum. That makes total sense for compressing music (store the base frequency and the first couple of harmonics of an instrument, and the rest of the spectrum can be guessed), but it's completely useless for encoding ultrasound-only speech. Also, on these digital media the sampling rate is usually 48kHz, which (Nyquist, blabla) means you can code sounds up to 24kHz. Given that DolphinAttack relies on > 20kHz ultra-high-pitched speech, that doesn't leave a lot of frequency range. So even if the audio were uncompressed LPCM, the quality might still be limiting).
BUT, on the other hand...
The TV in some circumstance, and the radio even more, tend to be "always on" devices that you leave working to give a background sound/music.
Even if you walk away, they'll be still playing.
In other words: why bother trying to hide your command stream in the inaudible ultra-sounds, if you emit your "a lot less disguised" (but much better within the transmission ability) commands when there's nobody around to notice them ?
(Example scenario : a would-be burglar spies to see the moment you go to the bathroom to take a bath. When they see you entering the bathroom, they quickly jam the FM radio frequencies (this may cause a short audible glitch - but you might not even notice the passing glitch over the noise of the water) with a signal asking your home assistant to open the door, quickly enter, steal your purse and whatever else, and leave.
By the time you leave the bath, they're gone and there's not even a sign of the burglary: the door is intact)
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Alternatively, a person who is known to you but who you don't realize is malicious could use this to gain physical access. Maybe you're okay taking a FaceTime call from them, but then they transmit the inaudible signal over the call, which your iDevice faithfully reproduces, resulting in Alexa, Siri, or whatever else opening your garage door. Or maybe someone standing outside at your smart doorbell uses it when you ask what they want via the app, resulting in your phone or tablet reproducing the sounds within earshot of a device that will respond to them.
None of these can work at all, for the exact same reason the TV/radio attack mentioned above is severely limited:
Facetime isn't designed for dogs and bats (and dolphins).
Most of the modern internet applications for chat tend to use OPUS (e.g.: Skype, WhatsApp, Facebook, probably a few others).
This codec is optimized for carrying only audible sound/music/speech. As such, the first step of OPUS is to kill all frequencies above 20kHz (no use spending bits to encode stuff for which the ear lacks any receptor. That would be like insisting on encoding UV light in video instead of only R/G/B).
Given that Dolphin Attack relies on > 20kHz ultra-high-pitched speech, it won't work because it will be compressed away.
(Apple, on the other hand, tends to be allergic to IETF standards and probably uses AACplus on their devices. That one similarly sucks at carrying ultrasound: the high part of the spectrum is actually reproduced by replicating the lower part of the spectrum, and most applications are limited to 24kHz anyway due to the 48kHz sampling rate and Nyquist).
A third possibility is that they could use your always-on phone to engage in an attack against your home even while you're not at home. For instance, an attacker passing you in the street could activate the commands on a device in your hand or pocket via "OK Google" or "Hey Siri" to open your garage door for a crony of theirs. For that matter, anyone who can get within listening distance of your phone can use this attack on it, all without ever having access to the devices within your home.
In that case: Yes, an attacker could be carrying special equipment that works perfectly in Dolphin Attack's range (e.g.: a device working at 96kHz combined with a special ultra-sound speaker that has a good response on frequencies > 24kHz).
I see another problem here : from the few demos I've seen, voice assistants tend to repeat or otherwise confirm commands.
The victim won't hear the ultrasonic high-pitched commands, but they'll clearly hear the assistant's answer to these commands.
It would be very strange to suddenly hear the phone in your pocket answering "Okay, I'm opening the garage door".
A much more realistic scenario would be to emit commands while the victim isn't around.
i.e.: jam the radio FM signal to give commands to the home assistant while the victim is taking a bath.
They might hear a short glitch coming from the radio in the living room (the jamming of the signal itself, and the commands need to fit within the much more restricted audio bandwidth of FM radio, with no way to emit them at >20kHz), but they are much more likely to ignore it and think : "Yeah, one of those solar flares expected for this weekend again".
Convoluted technical means to get your internet devices to "open the back door" are not the go-to tactic for any burglar. Nor will they be.
The go-to tactic is to kick your door really hard or break a window, then retreat.
The problem is that "breaking the window" is a very noisy method that only works when the victim is away from home.
Managing to have the door opened for you - by e.g.: jamming the FM radio constantly blaring music as background - could work even when the victim is in another room of the house (e.g.: taking a bath).
Like if they embedded the audio in a youtube video that you were watching?
Bad news for this use case :
no matter what the meme says, the internet wasn't built for dogs (nor for bats or dolphins)
As such most codecs used online are optimised for human hearing range.
- OPUS will filter out anything above 20kHz.
- AACplus only replicates spectrum from mid to high range.
etc.
And most audio sources only use 48kHz sampling rate (i.e.: up to 24kHz sounds anyway).
No way to hide secret message above the audible range : that range won't be carried.
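The sampling-rate arithmetic above can be spelled out in a few lines. This is only a sketch: the 20 kHz audible limit is the round figure used in this thread, and the function names are made up for illustration.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent (half the sample rate)."""
    return sample_rate_hz / 2.0

def ultrasonic_headroom_hz(sample_rate_hz, audible_limit_hz=20_000.0):
    """Width of the band between the assumed audible limit and the Nyquist limit."""
    return max(0.0, nyquist_limit_hz(sample_rate_hz) - audible_limit_hz)

for rate in (44_100, 48_000, 96_000, 192_000):
    print(rate, "Hz sampling ->", nyquist_limit_hz(rate), "Hz max,",
          ultrasonic_headroom_hz(rate), "Hz of room above 20 kHz")
```

At 48 kHz that leaves only a few kHz above the audible range even before a codec discards it, which is the point being made above; the 96 kHz and 192 kHz modes are where there is real room.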
And what if they are exploiting it on the phone in your pocket... you do go out of the house right? Maybe you don't want the guy behind you at Starbucks to prank you by getting your phone to set an alarm at 2am, or order you all 180 episodes of the Golden Girls.
and when your pocket suddenly says "Okay, I'm buying 180 episodes of Golden Girls" confirmation, you're going to notice that something fishy is happening.
Sending audio commands while the victim isn't in the same room as the targeted device seems the better option.
I noticed that when I am running my ultrasonic cleaner, Siri becomes almost completely unable to recognize my words. It knows I am speaking and detects word breaks but the accuracy drops to the point of uselessness even 5-6 feet from the source of the sound.
I haven't checked but it should be running in the 35-40 KHz range.
Have we really already forgotten how Burger King was using a commercial to trigger Google Home devices?
"I'm not sure I like the fugnutish tone you used in your post!" -RogL (608926)-
do you really want your house to roll out the welcome matt to every jackass with the means to play an aac file within hearing of your home?
I would hope that Matt would realize they aren't supposed to be in the house. Then again, his main job is welcoming people that Siri/Alexa tell him to welcome, so how smart can he really be?
But at 20KHz, you don't need much to block the audio. Something in your pocket (either the "small device" or your phone) probably wouldn't even be able to work.
There's also the issue that the device won't respond back at the sub-audible frequency.
Most of you here aren't considering that if you transmit "Hey Google", "Hey Siri" or "Hey Alexa", the assistants go "bump bamp" (they beep to signal that they are indeed listening and you can talk). That beep is audible to the owner, who will probably also get feedback such as "OK, opening garage door". So the owner will almost surely notice it (which eliminates YouTube, voice chat and the like). But if the owner is not home (which is what the garage door scenario is about), you could still have a tiny speaker lying around!
What if a shop exploits it by commanding the digital assistants of passersby to open a special website or tweet at a special account, entering whatever information the assistant knows but the attacker does not (yet)?
Even if little such information exists, the attacker's ability to hijack the browser and show coupons/specials/etc. would be a worrying development, and that's the most benign thing I can think of...
In Soviet Washington the swamp drains you.
I am no sound engineer, but I don't think filtering high frequencies above speech would necessarily help their speech comprehension. Upper harmonics might well give hints to the module about the intended words. Second-language learners had more trouble understanding their non-native tongue over the old telephone networks, partly because of the filter on upper harmonics. POTS operators used the lowest bitrate they could get away with.
I assumed someone discovered a pattern to upper harmonics and is exploiting the hinting I described. If this is just shifting a "voice" to an inaudible pitch, that'd be kind of funny and clearly broken. TFA doesn't seem super-clear. I'd be curious to hear more from those who have relevant experience.
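For what it's worth, the paper linked from TFA describes amplitude-modulating the voice command onto an ultrasonic carrier and relying on nonlinearity in the microphone circuitry to bring it back down into the audible band. Below is a toy sketch of that idea in plain Python. Every constant is an assumption picked for illustration, a single 1 kHz tone stands in for speech, and a pure square-law term is a crude stand-in for real hardware.

```python
import cmath
import math

SAMPLE_RATE_HZ = 192_000   # assumed: high enough to represent the carrier
CARRIER_HZ = 30_000        # assumed ultrasonic carrier
VOICE_HZ = 1_000           # a single tone standing in for speech
DEPTH = 0.5                # assumed modulation depth
N = 1_920                  # 10 ms: a whole number of cycles for every tone

def am_signal():
    """Carrier whose amplitude follows the (toy) voice signal."""
    out = []
    for n in range(N):
        t = n / SAMPLE_RATE_HZ
        envelope = 1.0 + DEPTH * math.cos(2 * math.pi * VOICE_HZ * t)
        out.append(envelope * math.cos(2 * math.pi * CARRIER_HZ * t))
    return out

def tone_amplitude(samples, freq_hz):
    """Amplitude of one frequency component (single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
              for n, x in enumerate(samples))
    return 2.0 * abs(acc) / len(samples)

transmitted = am_signal()
# Toy nonlinearity: squaring each sample, standing in for the microphone.
after_mic = [x * x for x in transmitted]

print("1 kHz in the air:      ", tone_amplitude(transmitted, VOICE_HZ))  # essentially zero
print("1 kHz after square-law:", tone_amplitude(after_mic, VOICE_HZ))    # close to DEPTH
```

The transmitted signal has energy only around the carrier, so nothing audible is in the air, yet the squared signal contains the 1 kHz tone again. In this toy model the audible component only appears after the nonlinearity, so a filter placed before that point sees nothing but ultrasound and one placed after it sees what looks like ordinary speech.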
Clearly, if it's a security flaw, the companies in question will have to patch it and do something else.
You're not getting it. The _whole_point_ is to only enter the house when there is no one home. Contrary to what Hollywood feeds you, burglars have zero interest in dealing with hostages or committing murder. They want easily shiftable goods, not an armed confrontation and a bloody mess followed by huge police scrutiny.
I would think that both
1) Typical computer speakers wouldn't reproduce those frequencies well at all
and
2) Codecs wouldn't encode them in the first place
People claiming it can not be done should not interrupt people already doing it.
(BTW: Spotify (or was it another networked music player ?) is also doing it to help identify the various devices that are within reach. And as it's the app itself generating the code, it's not constrained by the audio compression limits (Vorbis in their case).
They can easily emit beep codes at 24kHz, which you would definitely NOT hear, but which could be emitted and picked up by the current mic and speaker technology available on the platforms where the various instances of the music player are running and would like to "see each other" in the physical world in addition to the network).
These ads are basic beep codes, very simple signals that you can cram in at the upper limit of what your medium can carry.
A few short beeps somewhere between 6kHz and 10kHz (or whatever can get through your compression) could be audible (in absolute silence, with a good pair of earphones, provided that you haven't completely borked your upper hearing range by firing guns without adequate protection), but would easily be drowned out by the rest of the noise emitted by the ads.
Today's article is about speech, which must convey a lot more information and requires much more time and frequency bandwidth.
DolphinAttack works by shifting the speech many octaves up, until it's above 20kHz (see the paper, linked in the article, linked in this summary).
That CANNOT work in any way with current TV and radio technology.
There simply doesn't exist any TV or radio technology today (either analog or digital) that can carry frequencies above 20kHz.
(For the simple and obvious reason that TV and radio were invented by humans and for humans, not for dogs, bats or dolphins.
Nobody was paying attention to keeping those inaudible frequencies in. And if by dropping them you save space and/or radio frequency bandwidth, so be it)
So the technology in today's article CANNOT IN ANY WAY be carried over TV and radio due to very hard limits (**).
In theory, there should be ways to embed speech in TV and radio.
You're going to use these sentences of human-like speech, but shifted several octaves up until they can be squeezed somewhere between ~4kHz and 10kHz, and thus stay within what could realistically be carried by your media.
That is definitely audible. Not necessarily *intelligible* but if you're sitting in front of the device and listening carefully, you'll notice some weird noise going on for a few seconds.
And of course, then suddenly hearing your phone in your pocket confirming "Okay, confirming 100'000$ purchase for 'ScamBot' article on alibaba" or "Okay, opening garage's door" - that's going to be a dead giveaway.
---
Visual analogy :
- some article : it should be possible to embed hidden infra-red images (projected by a light-bulb illuminated slide ? a special device ?) that can be interpreted as QR-Codes by selfie apps and trigger unexpected behaviour.
(NOTE: Near infra-red IS visible to smartphone cameras).
- /. crowd reaction : great! now someone could hack my smartphone just by hiding an invisible QR-Code in my TV picture whenever I try to make a reaction selfie in front of the TV.
(=I know that most smartphones don't actually scan QR-Codes from the main photo app, unlike Siri and co with voice, but for the sake of the argument we'll ignore that point).
- bandwidth limitation : No dum-dum ! TVs aren't designed for snakes or insects, they can only emit R, G and B light. There's no way to push IR over TV.
- counter point : Yeah, but in the real world, there are 3D PC monitors/projectors using visual cues for the left-right lenses of 3D glasses !
(= in real world, done by shifting colors on a line).
(Also, small blinking pattern have been used by some form of banking 2FA, and by some early TV games-shows to beam data t
As such most codecs used online are optimised for human hearing range.
Bat call recorders predominantly use .wav, which does support it. So as long as you can get .wav you're set; and you can put wav audio (as an LPCM format) into both AVI and MP4 containers... so I think it should be pretty doable.
And most audio sources only use 48kHz sampling rate (i.e.: up to 24kHz sounds anyway).
Lots of options for ultrasonic recording -- again... the whole bat call niche has you covered
I was answering to a different use case.
You were speaking about carrying the ultra sonics over youtube.
I'm pointing out that youtube has hard limitations preventing you from carrying ultra sonics.
Of course a "custom device to blurt out ultrasonics" would work
(even a local app running on your high-end smartphone could probably switch the hardware into 96kHz audio out mode ?
Hey, we finally found a real-world use case for these 96kHz/192kHz audio out modes that the audiophiles have insisted on having !~)
For that case, the confirmation would be the giveaway.
and when your pocket suddenly says "Okay, I'm buying 180 episodes of Golden Girls" confirmation, you're going to notice that something fishy is happening.
Maybe. Depends how loud it is, and how loud it is where you are. I've not heard my phone ring in my pocket lots of times.
Please keep in mind that the whole premise is about giving *audio commands* to your smartphone - i.e.: it needs to still be able to hear them.
If your smartphone is so deep in your pocket/handbag/whatever that you can't easily hear it, chances are that the device itself would have difficulty hearing the sound coming from the prankster's pocket through all these layers of cloth and other material that happen to have sound-insulating properties.
In short : if you can't hear it, chances are the device might not be able to hear you (or the prankster in return).
I was thinking that for the prank to work, you would need to target a phone that isn't deep inside a handbag, where it couldn't hear you (but where the chances of hearing the confirmation are low).
You would probably target a phone that is in a shirt pocket, where it can clearly hear your commands (but then the victim would clearly hear the confirmation).
But yeah, that's all speculation as I haven't attempted such prank and have no interests in even trying.
what are you going to do?
Myself ?
I wouldn't be using such an insecure thing as an "always on" voice command in the first place.
(on top of the fact that this command needs a round trip to some cloud on the internet just to be interpreted).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
You're not getting it. The _whole_point_ is to only enter the house when there is no one home. Contrary to what Hollywood feeds you,
I'm not basing my scenario on what crap is currently running on the TV.
I'm basing it on what has occasionally happened around here (but very likely, the burglars around here aren't the same as the ones you have on your side of the Atlantic pond).
Something that is often seen :
An old couple goes home after buying groceries. Grandma casually leaves her handbag by the entrance door (with her purse inside, containing money and credit cards; this being an old couple, there's more cash than credit cards).
After they finish packing everything into the fridge, exhausted, they decide to take a nap.
When they wake up, they notice that the handbag is missing (but nothing else of value. The big TV screen is still untouched in the middle of the living room). After inspecting the door, they realize that it was forced with a crowbar.
Apparently the burglars noticed that the couple had left the living room (probably by watching the lights being turned on and off).
Once they thought the path was clear, they made a run for it as fast as possible, breaking the door and grabbing the first thing they could before the couple could notice them.
This scenario has played out a couple of times here where I live (a relatively calm and peaceful city) - no kidnapping, no menacing, no mugging. Just going as fast as possible for the simplest and surest thing of value to steal (i.e.: grab the bag containing the money while the people are elsewhere in the apartment / house).
(It isn't as typical as the other kind of burglary - namely, track the contents of mailboxes, notice which ones have been left accumulating for some time (and thus whose owner is on vacation) and then take all the necessary time to break in - sometimes actually sawing the door around the "secure" lock - and take *everything* out, this time including the big heavy valuables or well-stashed jewelry. But it still happens)
(On the other hand you hear now and then in the news, cases of burglary that have escalated to sequestration/menace/violence, etc. - but the fact that it is "news-worthy" probably means it happens VERY seldom).
The problem of this "break-and-run" technique is that breaking the door is still noisy, and there's a chance that the burglars could get discovered before they manage to grab anything of value.
Silent alternatives to get the door open (hacking shitty e-Locks, managing to give bogus commands to cloud-connected locks, etc.) would increase the chances of this kind of hit'n'run job succeeding.
Again, these are things that *have* happened around here.
Point was, the existing broadcast tech can carry the frequencies at the edge or even slightly beyond human hearing.
For digital TV and Radio, only barely so ("at the edge")(*). Not even beyond, and there's not much space up there. (Though it might be enough to do what is a glorified form of Morse code(*) - as done in the advertiser example you cite).
Still: they cannot carry the technique mentioned in the article (DolphinAttack relies on speech being pitched up all the way > 24kHz - way beyond what is carried by TV and Radio. My "infra-red over TV" metaphor still applies). /.ers panicking "DolphinAttack could be used over TV !" - No it can't, for technical reasons, the same reasons you can't embed an invisible infra-red QR code in a YouTube video either.
So to all
And if you pitch up regular speech to the upper limits of what TV and Radio CAN actually carry, the lower bound of it will still be well within hearing range.
(the "frequencies at the edge of human hearing" that could under some circumstances be carried by TV and Radio don't have enough room to code speech, only Morse code)
You won't be able to clearly understand such high-pitched speech, but you'll definitely be able to hear an audible glitch during it.
Inaudible speech in the ultrasonic range is impossible over TV, Radio or YouTube.
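A back-of-the-envelope sketch of that argument, using a deliberately simple model where the speech band is just shifted upward until its top edge hits the channel's ceiling. All the numbers are my own assumptions for illustration (a 300-3400 Hz telephone-style speech band, a 20 kHz hearing limit, hypothetical ceilings of 16 kHz and 20 kHz), not measurements of any real broadcast chain.

```python
HEARING_LIMIT_HZ = 20_000.0  # assumed nominal upper limit of human hearing

# Assumed band needed for intelligible speech (telephone-style figures).
SPEECH_LOW_HZ = 300.0
SPEECH_HIGH_HZ = 3_400.0


def shifted_speech_band(ceiling_hz: float) -> tuple[float, float]:
    """Shift the speech band up so its top edge sits exactly at ceiling_hz."""
    bandwidth = SPEECH_HIGH_HZ - SPEECH_LOW_HZ
    return ceiling_hz - bandwidth, ceiling_hz


def audible_part_hz(band: tuple[float, float]) -> float:
    """Width of the band that still lies below the assumed hearing limit."""
    low, high = band
    return max(0.0, min(high, HEARING_LIMIT_HZ) - low)


if __name__ == "__main__":
    for ceiling in (16_000.0, 20_000.0):  # hypothetical channel ceilings
        band = shifted_speech_band(ceiling)
        print(f"ceiling {ceiling:.0f} Hz: speech sits at {band[0]:.0f}-{band[1]:.0f} Hz, "
              f"{audible_part_hz(band):.0f} Hz of it still audible")
```

Under these assumptions, any ceiling at or below the hearing limit leaves the whole shifted band audible; only a band sitting entirely above 20 kHz comes out at zero.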
Can it be exploited? Of course, it can...
Yes, it can be exploited. Just not completely silently.
You'll definitely hear some audible glitch during a "high pitched audible" attack.
Now per se, that doesn't prevent all possible exploitation situations.
What if you left it charging, while you take a shower? "Honey, your phone was saying something, not sure what..."
If the victim isn't near the phone and the TV when the attack occurs, why bother with the ultrasound in the first place?
If the victim can't hear "Confirming 100'000$ purchase", the victim wouldn't hear "Ok Siri, now buy this 100'000$ article on alibaba" either.
In other words: the existence of DolphinAttack (no matter whether it is actually broadcastable over TV or not) doesn't change a thing about the already existing exploitability of always-on voice controlled devices.
Possibilities are numerous, not all of them evil, but all of them worrying...
Possibilities that already existed well before ultrasound communication. (You could do exactly the same kind of tracking by paying attention to your phone's Wifi and Bluetooth MAC addresses. Or with an installed app that has over-reaching permissions - e.g.: location services)
If you are worried, you shouldn't start worrying only on the day you hear about DolphinAttack's ultrasonic gimmicks.
You should have started worrying long ago, with things like always-on listening assistants, applications that keep running in the background for no obvious reason, applications asking for access to location/camera/microphone for no obvious reason, etc.
Shut down your voice-activated assistant NOW. Never mind whether ultrasound works or not: the always-on listening part has already been a problem for quite some time.
---
(*): And even less possible nowadays.
The morse-code trick would have been possible with older generations of digital TV and Radio (they relied on MPEG Layer II audio - aka MP2) and Youtube (MPEG Layer III audio - aka MP3). You can represent a > 15 kHz signal in those.
The morse-code trick done by advertisers is technically impossible with the newer generation of digital TV and Radio (AACplus) and most modern Internet applications (usually modern codecs like Opus).
As I've said in my post above:
AACplus doesn't directly code high frequencies, it instead generates them from mid and low frequencies present in the same signal (again, that's a very efficient space saver due to how music works). That means that you can't code a > 10kHz audio signal alone, if there is