Slashdot Mirror


Siri, Alexa, and Google Assistant Can Be Controlled By Inaudible Commands (venturebeat.com)

Apple's Siri, Amazon's Alexa, and Google's Assistant were meant to be controlled by live human voices, but all three AI assistants are susceptible to hidden commands undetectable to the human ear, researchers in China and the United States have discovered. From a report: The New York Times reports today that the assistants can be controlled using inaudible commands hidden in radio music, YouTube videos, or even white noise played over speakers, a potentially huge security risk for users. According to the report, the assistants can be made to dial phone numbers, launch websites, make purchases, and access smart home accessories -- such as door locks -- at the same time as human listeners are perceiving anything from completely different spoken text to recordings of music.

In some cases, assistants can be instructed to take pictures or send text messages, receiving commands from up to 25 feet away through a building's open windows. Researchers at Berkeley said that they can modestly alter audio files "to cancel out the sound that the speech recognition system was supposed to hear and replace it with a sound that would be transcribed differently by machines while being nearly undetectable to the human ear."

13 of 100 comments

  1. Alexa add big hairy balls to my shopping list by UnknownSoldier · · Score: 3, Insightful

    I wonder how long before we get inaudible malware / get trolled -- Alexa add big hairy balls to my shopping list!

  2. of course it does by vux984 · · Score: 4, Insightful

    And really, most of this stuff is just as bad even if it is audible. It just means someone has to figure out when you aren't home before they hold a speaker up to your mail slot / under the door / up to a window.

    And how are they going to secure it? Voiceprints -- we already have software that can defeat voiceprinting with a small sample. Passwords? That you have to say aloud every time you use the device? That's pretty much pointless.

    This type of technology is fundamentally broken and, from what I can see so far, it cannot be fixed.

    1. Re:of course it does by skids · · Score: 3, Interesting

      Some talented screenwriter could probably make a good movie screenplay out of a battle-royale between Siri and Alexa and Okaygoogle all trying to sabotage each other, meanwhile ruining the life of their owner. (And then get the companies to buy the rights so it'll never get shot)

    2. Re:of course it does by vux984 · · Score: 4, Insightful

      "so what? they play audio through my mailbox slot and tell it to play a podcast?"

      That's about the most innocuous thing you can do.
      In the prank category -- you could tell it to play "Never Gonna Give You Up" at full volume at 3am. Every day.

      Moving up from there... tell it to call everyone on your contact list and hang up, or to text them all weird messages.

      Tell it to send a booty call to your crazy ex. Tell it to text a break up message to your girlfriend.

      Tell it to unlock your door - I mean, Amazon sells a door lock now specifically so you can do this with Amazon Prime. If it catches on this could be pretty big and not some nerdy niche Zigbee thing.

      Tell it to turn off the heat in the dead of winter while you are on vacation.

      Tell it to start your car in the garage. (yeah... this is already a thing you can do... fucking brilliant)

      Tell it to record your conversations and send them to me.
      Tell it to send me your photos.
      Tell it to post all your photos to Facebook or Twitter.
      Tell it to forward me your email, or post it all to Facebook and Twitter.

      Tell it to install new skills / features / apps to do stuff you didn't intend.

      Tell it to buy you something from Amazon. I hear you can get 1,000 ethernet cables. (Maybe I'm even the seller of such marked up cables.)

      Tell it to call 911. (Siri at least already does it)

    3. Re:of course it does by Hognoxious · · Score: 2

      Some talented screenwriter could probably make a good movie screenplay out of a battle-royale between Siri and Alexa and Okaygoogle

      And even if one doesn't, there's always George Lucas.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    4. Re:of course it does by green1 · · Score: 2

      Why would the user need to set it? It seems that there's a known frequency range for all human speech and anything outside of that should be rejected. No user-side configuration required.
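
      A minimal sketch of that kind of frequency-range rejection, in Python, for illustration only. It assumes the classic 300-3400 Hz telephony voice band as "the range of human speech", and the function names and the 0.8 threshold are hypothetical, not anything the assistants are known to use. It measures how much of a clip's energy falls inside the band with a naive DFT and rejects clips that are mostly outside it.

```python
import cmath
import math

# Assumption: the classic telephony voice band stands in for "human speech".
SPEECH_BAND_HZ = (300.0, 3400.0)


def band_energy_fraction(samples, sample_rate, band=SPEECH_BAND_HZ):
    """Return the fraction of (non-DC) signal energy inside `band`.

    Uses a naive O(n^2) DFT, so it is only suitable for short clips.
    """
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        power = abs(coeff) ** 2
        total += power
        if band[0] <= freq <= band[1]:
            in_band += power
    return in_band / total if total > 0 else 0.0


def looks_like_speech_band(samples, sample_rate, threshold=0.8):
    """Hypothetical gate: accept only clips dominated by speech-band energy."""
    return band_energy_fraction(samples, sample_rate) >= threshold


def tone(freq_hz, sample_rate=48000, n=480):
    """Generate a short pure sine tone for experimenting with the gate."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

      With these assumptions, `looks_like_speech_band(tone(1000), 48000)` passes and `looks_like_speech_band(tone(21000), 48000)` is rejected.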

      That said, the article is less clear about this, but I suspect the sounds aren't actually outside of the human voice/hearing range, but rather disguised in other sounds. It's not that you hear silence while your voice assistant hears a command. It's more that you hear music, or white noise, or something else, while it hears a command. This is harder, because computers simply don't "hear" the same way that humans do, so it's no surprise that you can come up with sounds that trick the computer into thinking they are the right words, while a human doesn't hear it.

      The most obvious solution is voiceprints, which I'm shocked aren't already widely in use; the technology is decades old at this point. Sure, it doesn't help against a determined attacker who can record and synthesize your voice, but it has 2 big advantages in this case: 1) it's likely much harder to disguise a voice command as something else if it also has to match a voiceprint. 2) you immediately eliminate all attacks that target multiple people at once (ads on TV or radio, YouTube videos targeted at a wide audience, etc.)

  3. Bug or Backdoor? by cyberchondriac · · Score: 2

    TFA seems to indicate they believe this to be an unexpected and curious flaw in the software, but the fact that this works as well as it does, from up to 25 feet away, is inaudible to humans, and nearly all these PA devices can hear and respond to these types of ostensibly surreptitious commands.. well, maybe I'm paranoid, but maybe they just stumbled onto another NSA backdoor. Or even a Google/Apple/Amazon backdoor.
    I find this creepy and suspicious as hell.

    --

    Look back up at my post, now look back down, you're on the Internet. Now look back up. I'm a signature.
    1. Re:Bug or Backdoor? by Carewolf · · Score: 3, Insightful

      TFA seems to indicate they believe this to be an unexpected and curious flaw in the software, but the fact that this works as well as it does, from up to 25 feet away, is inaudible to humans, and nearly all these PA devices can hear and respond to these types of ostensibly surreptitious commands.. well, maybe I'm paranoid, but maybe they just stumbled onto another NSA backdoor. Or even a Google/Apple/Amazon backdoor.
      I find this creepy and suspicious as hell.

      No, just a result of masquerading corporate spy devices as smart home devices with AI. They are not smart and they are not working for you.

  4. Play it backwards by jittles · · Score: 4, Funny

    Researchers at Berkeley said that they can modestly alter audio files "to cancel out the sound that the speech recognition system was supposed to hear and replace it with a sound that would be transcribed differently by machines while being nearly undetectable to the human ear."

    But did these so-called researchers see what Siri, Alexa, and Google Assistant do when they play the audio clip backwards? What kind of half-assed research is this?

  5. They already are controlled by inaudible commands by DogDude · · Score: 2

    They're already controlled by inaudible commands. Ethernet packets are silent. Do people think they "control" these things? How fucking stupid do you have to be to think that? Am I living in Douglas Adams's reality, where white mice are really running experiments on humans?

    --
    I don't respond to AC's.
  6. Well, all that depends on a bunch of factors... by MindPrison · · Score: 4, Insightful

    Hi, former technician here.

    I've built enough robotic devices, listening devices, and radio communication devices to tell you that you don't really need to worry TOO much about all of that, at least not for now. Here's why:

    1) For this to be at all possible, the devices involved must meet a range of technical specifications and capabilities. For example, say you have a mobile speaker that is specced to work from 20 Hz to 20 kHz; most of these will fail above 10 kHz anyway, and for their purpose they don't need to be better than that. Headphones, however, are an entirely different case.

    2) I've tested numerous microphones as small as 2-3 mm, and most of these failed to pick up frequencies above 20 kHz. As a young person, you could potentially hear up to 24 kHz (I could pick up 23 kHz sounds when I was 18 and worked in an electronics store; we tested with a function generator and a piezo speaker specced well above 28 kHz). Today I can pick up around 16.5-17 kHz, which is not bad for my age, and on the plus side, I don't need expensive headphones anymore.

    3) We're talking about sounds inaudible to human ears here, so we're above the 20 kHz range; to be entirely safe, we should be above 25 kHz. Very few phones, televisions, computer speakers and whatnot are capable of vibrating, or picking up vibrations, at those frequencies, so this kind of communication in that part of the spectrum would fail drastically.

    What you COULD do, though, is use the upper audible frequency spectrum, say just above 10 kHz, and mix it with existing sounds, timed correctly with a proper known synchronization (remember the old modems and their sounds? Now imagine a much higher pitch). Albeit quite slow, it would still be possible to use it to trigger commands, communicate short messages, etc. Anything needing more bandwidth than this would be impractical. You wouldn't notice this; the sound would technically be possible to pick up if it went on too long, but with just split-second bursts, in a sequence not spaced too closely, you'd be able to get away with it, possibly disguised by music or voice. You'd still need some form of "trigger" sequence for the receiver to pick it up and start reading, otherwise you'd get timing errors. Kinda like "fast Morse code" if you like.
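
    A rough Python sketch of that "fast Morse code" idea, purely illustrative. Every constant here is an assumption (a 15 kHz carrier, 10 ms symbols, a made-up 5-bit trigger sequence), and the decoder assumes the recording is already aligned to symbol boundaries, which a real receiver would have to work out for itself. Bits are sent as short tone bursts (1) or silence (0), and the receiver uses the Goertzel algorithm to measure energy at the carrier frequency, ignoring everything until it sees the trigger.

```python
import math

SAMPLE_RATE = 48000          # assumption
CARRIER_HZ = 15000           # assumption: high-pitched but still audible tone
SYMBOL_LEN = 480             # 10 ms per bit at 48 kHz
PREAMBLE = [1, 0, 1, 1, 0]   # hypothetical "trigger" sequence
THRESHOLD = (SYMBOL_LEN / 4) ** 2  # a full-scale burst gives (SYMBOL_LEN / 2) ** 2


def encode(bits):
    """Emit the trigger followed by the payload as tone bursts and silences."""
    samples = []
    for bit in PREAMBLE + list(bits):
        for i in range(SYMBOL_LEN):
            phase = 2 * math.pi * CARRIER_HZ * i / SAMPLE_RATE
            samples.append(math.sin(phase) if bit else 0.0)
    return samples


def goertzel_power(block, freq_hz, sample_rate):
    """Squared magnitude of one frequency component (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def decode(samples):
    """Return the bits that follow the trigger, or None if it never appears."""
    bits = [
        1 if goertzel_power(samples[i:i + SYMBOL_LEN], CARRIER_HZ, SAMPLE_RATE) > THRESHOLD else 0
        for i in range(0, len(samples) - SYMBOL_LEN + 1, SYMBOL_LEN)
    ]
    for start in range(len(bits) - len(PREAMBLE) + 1):
        if bits[start:start + len(PREAMBLE)] == PREAMBLE:
            return bits[start + len(PREAMBLE):]
    return None
```

    At 10 ms per bit this carries only 100 bits per second, which fits the point above: enough for a short trigger or message, impractical for anything needing real bandwidth.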

    If you're worried about eavesdropping, you should be far more concerned with your home's windows. Those are like giant eardrums: sound makes the glass vibrate, and light reflected off it (a laser, for instance) carries that small vibration back to whoever is listening. This tech has been known for years; you just don't hear about it very often.

    --
    What this world is coming to - is for you and me to decide.
  7. User Account Permissions by The+MAZZTer · · Score: 2

    The basic form of this problem was solved long ago by using user accounts and permissions to give everyone their own preferences and storage spaces and dictate who has access to what resources. It just needs to be extended to these assistant devices by using voice recognition. Then any attack would have to be personalized for you, which rules out any attack trying to cast a wide net. Personalized attacks would have to be addressed by having the assistant verify that it sounds like a real voice from a previously identified user and not a synthetic voice that's been shifted into an inaudible range or whatever.
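
    Something along these lines, as a hypothetical Python sketch. The user names, command names and the `speaker_id` argument are all made up for illustration; it assumes some separate voice recognition step has already decided who is speaking (or `None` if it can't tell), which is the hard part and is not shown.

```python
# Hypothetical per-user permissions, in the spirit of OS user accounts.
PERMISSIONS = {
    "alice": {"play_music", "send_text", "unlock_door", "purchase"},
    "bob": {"play_music", "send_text"},
}

# Assumption: unrecognized voices may still do harmless things.
ANONYMOUS_PERMISSIONS = {"play_music"}


def is_allowed(speaker_id, command):
    """Check a command against the permissions of the identified speaker.

    `speaker_id` is None when the voice did not match any enrolled user.
    """
    if speaker_id is None:
        allowed = ANONYMOUS_PERMISSIONS
    else:
        allowed = PERMISSIONS.get(speaker_id, set())
    return command in allowed
```

    A broadcast attack (a TV ad, a YouTube video) would arrive with no matching speaker, so `is_allowed(None, "unlock_door")` is refused while `is_allowed("alice", "unlock_door")` goes through.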

  8. Re: Not news by Ronin+Developer · · Score: 2

    Did you read the article or just jump on the idea that prior research in this area negates the latest findings?

    The article credits the Chinese teams for their research in 2016. However, this story references new and recently published research applicable to real world attacks using almost any audio source. Security implications of this ongoing research are worrisome.