Encrypted VoIP Meets Traffic Analysis

Posted by CmdrTaco on Tuesday March 15, 2011 @03:42AM from the i-only-speak-in-binary dept.

Der_Yak writes "Researchers from MIT, Google, UNC Chapel Hill, and Johns Hopkins published a recent paper that presents a method for detecting spoken phrases in encrypted VoIP traffic that has been encoded using variable bitrate codecs. They claim an average accuracy of 50% and as high as 90% for specific phrases."

Re:Bleh by Anthony+Mouse · 2011-03-15 03:29 · Score: 5, Informative

I'm pretty sure that identifying a specific word with 50% accuracy is better than random chance. There are more than two words in the English language.
So...obvious solution then? by Anthony+Mouse · 2011-03-15 03:30 · Score: 4, Interesting

Use fixed-bitrate encoding for VoIP.
1. Re:So...obvious solution then? by Bengie · 2011-03-15 03:58 · Score: 2
  
  until someone gets a warrant to string tap you. You'd think the string connecting the two cans is protected by quantum randomness from the string theory, but it is not.
2. Re:So...obvious solution then? by bsquizzato · 2011-03-15 04:00 · Score: 3, Interesting
  
  Not so obvious --- now you have a much less efficient use of bandwidth to deal with.
  The article describes the method used to detect phrases ...
  
  At a high level, the success of our technique stems from exploiting the correlation between the most basic building blocks of speech—namely, phonemes—and the length of the packets that a VoIP codec outputs when presented with these phonemes. Intuitively, to search for a word or phrase, we first build a model by decomposing the target phrase into its most likely constituent phonemes, and then further decomposing those phonemes into the most likely packet lengths. Next, given a series of packet lengths that correspond to an encrypted VoIP conversation, we simply examine the output stream for a sub-sequence of packet lengths that match our model.
  Essentially, you gather enough information about how a VBR codec could encode a speech phrase you are looking for, then predict where it was spoken by looking at the "data bursts" being sent in the media stream. We'll need to research a way to "scramble" this predictability that's more efficient than using fixed bitrates, which eats up un-needed bandwidth.
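A toy sketch of the matching step described in the quoted passage, in Python. It is not the paper's method: it just slides a window over the observed packet lengths and counts near-matches. The function name, the tolerance, the threshold, and all the packet lengths are made up for illustration.

```python
def find_phrase(packet_lengths, model, tolerance=2, min_match=0.8):
    """Return start offsets where the observed lengths resemble the model.

    packet_lengths: observed sizes of the encrypted packets, in order.
    model: expected packet sizes for the target phrase (hypothetical values).
    tolerance: how many bytes a packet may differ and still count as a match.
    min_match: fraction of packets in a window that must match.
    """
    hits = []
    n = len(model)
    for start in range(len(packet_lengths) - n + 1):
        window = packet_lengths[start:start + n]
        close = sum(
            1 for seen, expected in zip(window, model)
            if abs(seen - expected) <= tolerance
        )
        if close / n >= min_match:
            hits.append(start)
    return hits


# Made-up numbers: a five-packet "phrase" hidden in a short stream.
model = [38, 46, 60, 52, 40]
stream = [20, 22, 39, 45, 61, 50, 41, 21, 20]
print(find_phrase(stream, model))  # [2]
```

The point is only that encryption hides the payload but not the sequence of sizes, so anything that flattens or randomizes the sizes breaks this kind of search.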
3. Re:So...obvious solution then? by Anonymous Coward · 2011-03-15 04:09 · Score: 5, Informative
  
  OpenSSH had a similar problem. It would leak information about your login password through the timing and size of its packets:
  http://www.ece.cmu.edu/~dawnsong/papers/ssh-timing.pdf
  I believe their solution was to introduce random NOP packets into the stream. This approach could work here too.
4. Re:So...obvious solution then? by Cthefuture · 2011-03-15 04:22 · Score: 4, Interesting
  
  Actually, most people are using G.711 these days, which is in fact a fixed-bitrate codec (it's the same codec used on your normal "hard" voice line).
  But most VoIP providers do not offer SRTP or any encryption whatsoever, so this whole thing is not even a question. More than likely anyone can listen in on your VoIP calls. We need to put more pressure on VoIP providers to offer encryption.
  
  --
  The ratio of people to cake is too big
5. Re:So...obvious solution then? by Jah-Wren+Ryel · 2011-03-15 05:03 · Score: 2
  
  We'll need to research a way to "scramble" this predictability that's more efficient than using fixed bitrates, which eats up un-needed bandwidth.
  Any fix is going to "waste" some amount of bandwidth.
  One solution to this attack may be to semi-randomly inject "nops" to bridge phoneme breaks. So instead of being able to identify individual phonemes by bandwidth spikes, attackers will be limited to identifying entire word clusters - like filling the "space" between the phonemes in the first three words of a sentence to make it look like one really long phoneme.
  But perhaps something more exotic might work, like randomly re-ordering chunks of audio so that they are transmitted somewhat out of order and then re-ordered on the receiving end. That probably won't use up much extra bandwidth but would increase latency.
  
  --
  When information is power, privacy is freedom.
6. Re:So...obvious solution then? by Kjella · 2011-03-15 06:10 · Score: 2
  
  Not so obvious --- now you have a much less efficient use of bandwidth to deal with.
  Enough to matter? According to my cell phone bill, I had over 100MB of data traffic last month. That's about 10 hours of 24 kbps CBR encoded voice, which is the highest possible CBR setting speex has. If it's on my DSL/cable/whatever line, who cares? Even if I did that 24x7 for a month it'd be 7-8 GB and I'm pretty sure even a teenage girl with mouth diarrhea has to sleep sometimes. If that's what it takes, I don't see CBR as being a dealbreaker.
  
  --
  Live today, because you never know what tomorrow brings
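A quick check of the bandwidth figures in Kjella's reply, in Python. It assumes 24 kbps of pure codec payload and decimal megabytes and gigabytes, and it ignores RTP/UDP/IP header overhead, which would raise the real numbers somewhat. The function names are just for illustration.

```python
BITRATE_BPS = 24_000                # 24 kbps CBR, the figure from the comment
BYTES_PER_SECOND = BITRATE_BPS / 8  # 3000 bytes of voice payload per second


def hours_for(megabytes):
    """Hours of CBR voice that fit in the given number of (decimal) megabytes."""
    return megabytes * 1_000_000 / BYTES_PER_SECOND / 3600


def gigabytes_for(days):
    """Gigabytes used by talking nonstop for the given number of days."""
    return BYTES_PER_SECOND * 86_400 * days / 1_000_000_000


print(round(hours_for(100), 1))     # 9.3  -> "about 10 hours" per 100 MB
print(round(gigabytes_for(30), 1))  # 7.8  -> "7-8 GB" for a 24x7 month
```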
Re:Bleh by gstoddart · 2011-03-15 03:30 · Score: 2

So on average that can't do any better than chance. Wow such great results!
I think if half the time you can identify a phrase in a supposedly encrypted stream ... that's better than 'chance'.

--
Lost at C:>. Found at C.
Stalin's Dream II by ackthpt · 2011-03-15 03:31 · Score: 2

Teh Recognisining.
"I'd like to order pizza, with pepperoni, pineapple, mushroom and an Iludium Pu-36 space modulator delivered to Hall of Justice."

--

A feeling of having made the same mistake before: Deja Foobar
1. Re:Stalin's Dream II by bmo · 2011-03-15 03:47 · Score: 2
  
  http://www.youtube.com/watch?v=7A4HeawmE6A
  Not knowing what an Illudium Pu-36 Explosive Space Modulator is means you had a deprived childhood.
  --
  BMO
Re:Bleh by batquux · 2011-03-15 03:31 · Score: 4, Funny

Come on, 50% is better than most unencrypted voice recognition!
Re:Bleh by bennomatic · 2011-03-15 03:38 · Score: 4, Interesting

This reminds me of the guy Colbert interviewed regarding the Large Hadron Collider who thought there was a 50% chance that it would destroy the universe. When questioned as to how he got those odds, he said, "Well, there's two options... either it will happen or it won't happen. 50%."

--
The CB App. What's your 20?
Re:Bleh by zill · 2011-03-15 03:39 · Score: 4, Funny

A'LA'IH
Duh! by Anonymous Coward · 2011-03-15 03:43 · Score: 2, Insightful

When you want to secure something, you must think carefully about how you might be leaking information. You can't just slap some encryption on and call it a day.
Re:Bleh by Chrisq · 2011-03-15 03:47 · Score: 5, Funny

Once they discover a method to wiretap encrypted video calls, that would open a new era in the porn scene.
...
I'm pretty sure that identifying a specific word with 50% accuracy is better than random chance. There are more than two words in the English language.
Maybe he's talking about porn films. 90% of the words seem to be "oh" or "yes" (or so I am told).
3 years old work by slashdotmsiriv · 2011-03-15 03:59 · Score: 2

The conference version of the paper appeared in IEEE S&P 2008.
http://cs.unc.edu/~fabian/papers/oakland08.pdf
Re:Bleh by lwsimon · 2011-03-15 04:02 · Score: 2

I remember following this logic... when I was three. No shit, I have a vivid memory of trying to figure out how proportions worked. I knew that a tossed penny would give a 50/50 split, but that other problems with two states (e.g., when I threw a rock, I'd either hit the matchbox car or I wouldn't) weren't 50/50. I gave up, and figured it out later, when I was five or so.

--
Learn about Photography Basics.
then it's shitty encryption by cellocgw · 2011-03-15 04:25 · Score: 2

The definition (somewhere in the 'net archives) of encryption quality is how distinguishable the encrypted message is from random noise. Clearly, setting bitrates, or any other parameter, based on the input is not random.
Pick a better algorithm and/or suck it up and waste a little bandwidth.

--
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
Re:Bleh by ciderbrew · 2011-03-15 04:34 · Score: 4, Funny

The pitch is the main thing in the art form.
A low German voice - "ooohhh yaaaaa", over and over. Then you have the high pitched Japanese squeak sound - "ii, ii, ii, kimochi". Which really gets annoying these days. It took a few years; but it IS annoying.
Re:Bleh by NotQuiteReal · 2011-03-15 04:51 · Score: 2

The two phrases are "can you hear me?" and "I have a bad connection, let me call you back."

--
This issue is a bit more complicated than you think.
RTP blinding by WaffleMonster · 2011-03-15 05:08 · Score: 2

A few solutions...
Add some number of pad bytes to each packet to fill in the blanks.
Tweak existing high-complexity codecs (iLBC, Speex, etc.) to maintain a constant bitrate by dynamically scaling quality to even out the per-packet bits.
Use a fixed-bitrate codec (most of these really suck from a bandwidth-efficiency vs. quality perspective).
Switch the variability to the time domain, adding jitter to mask the signal and control the latency/security tradeoff.
SRTP scares me because it was invented for a single narrow purpose. I would much prefer DTLS to secure RTP streams: being very similar to TLS, it has received much more scrutiny than SRTP likely ever will.
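A minimal sketch of the first option (pad bytes), in Python, assuming the padding is applied to the codec output before encryption so that every ciphertext comes out the same size or one of a few sizes. The 64-byte bucket and the function names are hypothetical. Storing the pad count in the last byte is just one convenient convention.

```python
def pad_payload(payload: bytes, bucket: int = 64) -> bytes:
    """Pad payload up to the next multiple of `bucket` bytes.

    At least one pad byte is always added; the final byte records how many
    pad bytes there are (including itself), so the bucket must be <= 255.
    """
    if not 1 <= bucket <= 255:
        raise ValueError("bucket must be between 1 and 255")
    pad_len = bucket - (len(payload) % bucket)  # always in 1..bucket
    return payload + bytes(pad_len - 1) + bytes([pad_len])


def unpad_payload(padded: bytes) -> bytes:
    """Strip the padding added by pad_payload."""
    pad_len = padded[-1]
    return padded[:-pad_len]


# Frames of 38 and 60 bytes become indistinguishable by length.
print(len(pad_payload(b"x" * 38)), len(pad_payload(b"x" * 60)))  # 64 64
```

The cost is the same tradeoff argued over upthread: the smaller the set of output sizes, the less an eavesdropper learns and the more bandwidth is spent on padding.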
useless, and easy countermeasures by t2t10 · 2011-03-15 05:17 · Score: 2

First of all, statements like "50% accuracy" are nearly useless; you need to know both precision and recall. And to the degree that "50% accuracy" tells you anything, it tells you that the system is pretty bad.
Finally, the countermeasure for this is the same as the countermeasure for other automated speech analysis techniques: play some singing or theater in the background.
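To make the precision/recall point concrete, here is a small Python sketch. The counts are invented for illustration and are not taken from the paper. Both hypothetical detectors are right about half the time when they flag a phrase, yet one finds 90% of the real occurrences and the other finds 10%.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: fraction of flagged phrases that were really spoken.
    Recall: fraction of really spoken phrases that were flagged."""
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical: the phrase is actually spoken 50 times in both cases.
print(precision_recall(45, 45, 5))   # (0.5, 0.9)
print(precision_recall(5, 5, 45))    # (0.5, 0.1)
```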