Encrypted VoIP Meets Traffic Analysis
Der_Yak writes "Researchers from MIT, Google, UNC Chapel Hill, and Johns Hopkins published a recent paper that presents a method for detecting spoken phrases in encrypted VoIP traffic that has been encoded using variable bitrate codecs. They claim an average accuracy of 50% and as high as 90% for specific phrases."
I'm pretty sure that identifying a specific word with 50% accuracy is better than random chance. There are more than two words in the English language.
Better stick to a constant bitrate then :)
People only use two phrases when they talk?
Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
Use fixed-bitrate encoding for VoIP.
I think if half the time you can identify a phrase in a supposedly encrypted stream ... that's better than 'chance'.
Lost at C:>. Found at C.
Teh Recognisining.
"I'd like to order pizza, with pepperoni, pineapple, mushroom and an Iludium Pu-36 space modulator delivered to Hall of Justice."
A feeling of having made the same mistake before: Deja Foobar
Come on, 50% is better than most unencrypted voice recognition!
Especially when being wiretapped.
I think there's a big difference in the probabilities of a coin toss and the probability of guessing the correct phrase of who-knows-how-many alternatives.
This reminds me of the guy Colbert interviewed regarding the Large Hadron Collider who thought there was a 50% chance that it would destroy the universe. When questioned as to how he got those odds, he said, "Well, there's two options... either it will happen or it won't happen. 50%."
The CB App. What's your 20?
People only use two phrases when they talk?
The phrases that it detects are "Badda-bing" and "Badda-boom."
A'LA'IH
When you want to secure something, you must think carefully about how you might be leaking information. You can't just slap some encryption on and call it a day.
Once they discover a method to wiretap encrypted video calls, that will open a new era in the porn scene.
...
I'm pretty sure that identifying a specific word with 50% accuracy is better than random chance. There are more than two words in the English language.
Maybe he's talking about the porn film. 90% seem to be "oh" or "yes" (or so I am told).
The conference version of the paper appeared in IEEE S&P 2008.
http://cs.unc.edu/~fabian/papers/oakland08.pdf
I remember following this logic... when I was three. No shit, I have a vivid memory of trying to figure out how proportions worked - I knew that a penny tossed would give a 50/50 split, but that other problems with two states - e.g., when I threw a rock, I'd either hit the Matchbox car or I wouldn't - weren't. I gave up, and figured it out later, when I was five or so.
Learn about Photography Basics.
You mean when you vary a quality of your signal (in this case bitrate) based on content, people can read information about the content from those variations??? OMFG!
No it does not work like that (Wire tapping encrypted video calls).
It does not tap the signal, but increases your odds when guessing whether something was communicated in a specific manner.
Hivemind harvest in progress..
The definition (somewhere in the 'net archives) of encryption quality is how distinguishable the encrypted message is from random noise. Clearly setting bitrates, or any other parameter, based on the input, is not random.
Pick a better algorithm and/or suck it up and waste a little bandwidth.
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
Oops ... wait a minute ...
Google is involved in this? Perhaps encryption could help them improve the accuracy of transcription in Google Voice...
I'm hoping it's best at picking up obvious spy phrases, like "the eagle has landed", "the moon fish squicks wickedly at midnight", "long is the gap between cacti"... Somehow I think it's probably best at "hello".
The pitch is the main thing in the art form.
A low German voice - "ooohhh yaaaaa", over and over. Then you have the high-pitched Japanese squeak sound - "ii, ii, ii, kimochi", which really gets annoying these days. It took a few years, but it IS annoying.
The two phrases are "can you hear me?" and "I have a bad connection, let me call you back."
This issue is a bit more complicated than you think.
Did you note that they specified variable bit rate? In this case, I'll bet it had more to do with the timing and flow of the packets and bytes than with the actual content of the bytes. When there's a pause in a person's speech, there is a pause in the network traffic. Imagine someone trying to send Morse code through an encrypted voice channel. Someone watching a bandwidth graph sampled at a high enough frequency would know exactly what coded message you sent regardless of the compression or encryption algorithm used (as long as the compression is variable bit rate). Due to the way voice data is compressed, increases or decreases in traffic could imply certain changes in tone, pitch, volume, inflection, etc. Tracked at a very high frequency, changes in the flow of bytes could give plenty of clues as to what is being said whether the traffic is encrypted or not. In general, encryption algorithms don't change the number or flow of bytes, just the content of the bytes.
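To make the Morse code thought experiment concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the frame sizes are made up, vbr_encode is a toy stand-in for a real variable-bitrate codec, and encrypt is just an XOR with a random pad standing in for any length-preserving cipher. It is not the method from the paper, only the basic leak.

```python
import os

# Hypothetical frame sizes in bytes; real codecs use different values.
SPEECH_BYTES = 40
SILENCE_BYTES = 6

def vbr_encode(activity):
    """Toy VBR 'codec': one frame per flag, large for speech, small for silence."""
    return [os.urandom(SPEECH_BYTES if talking else SILENCE_BYTES)
            for talking in activity]

def encrypt(frame):
    """Stand-in for a length-preserving cipher: XOR with a fresh random pad."""
    pad = os.urandom(len(frame))
    return bytes(a ^ b for a, b in zip(frame, pad))

def eavesdrop(packets, threshold=20):
    """The observer never decrypts anything; it only thresholds packet lengths."""
    return [len(p) > threshold for p in packets]

# Talk/pause pattern the speaker produces (think dots, dashes, and gaps).
activity = [True, False, True, True, True, False, True, False]
packets = [encrypt(f) for f in vbr_encode(activity)]
recovered = eavesdrop(packets)
print(recovered == activity)  # True: the pattern survives encryption
```

The ciphertext bytes are random, but the lengths are untouched, so the talk/pause pattern comes straight through.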
A few solutions...
Add some number of pad bytes to each packet to fill in blanks.
Tweak existing high-complexity codecs (iLBC, Speex, etc.) to maintain a persistent bitrate by dynamically scaling quality to even out the per-packet bits.
Use a fixed-bitrate codec (most of these really suck from a bandwidth efficiency vs. quality perspective).
Switch variability to the time domain, adding jitter to mask the signal and control the latency/security tradeoff.
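For the first suggestion (pad bytes), a minimal Python sketch might look like the following. PADDED_SIZE, pad_frame, and unpad_frame are hypothetical names, and the 2-byte length prefix is just one way to let the receiver strip the padding. The padding has to be applied before encryption, otherwise the pad bytes themselves are visible.

```python
import struct

# Hypothetical fixed payload size; must be at least the largest frame plus 2.
PADDED_SIZE = 64

def pad_frame(frame: bytes, size: int = PADDED_SIZE) -> bytes:
    """Prefix the true length (2 bytes, network order), then zero-fill to size."""
    if len(frame) + 2 > size:
        raise ValueError("frame too large for padded size")
    return struct.pack("!H", len(frame)) + frame + bytes(size - 2 - len(frame))

def unpad_frame(padded: bytes) -> bytes:
    """Read the length prefix and return only the original frame bytes."""
    (n,) = struct.unpack("!H", padded[:2])
    return padded[2:2 + n]

frames = [b"\x01" * 6, b"\x02" * 40, b"\x03" * 23]
padded = [pad_frame(f) for f in frames]
print({len(p) for p in padded})  # {64}: every packet is now the same size
```

The cost is bandwidth: every frame is sent at the worst-case size, which gives back most of what the variable bitrate was saving.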
SRTP scares me because it was invented for a single narrow purpose. I would much prefer the use of DTLS to secure RTP streams; being very similar to TLS, it has received much more scrutiny than SRTP likely ever will.
First of all, statements like "50% accuracy" are nearly useless; you need to know both precision and recall. And to the degree that "50% accuracy" tells you anything, it tells you that the system is pretty bad.
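To show why a single "accuracy" number hides things, here is a small Python sketch with made-up counts (they are not from the paper): a detector that fires 30 times, is right 5 times, and misses 5 real occurrences of the phrase.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of alarms that were right. Recall: share of real hits found."""
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts for illustration only.
p, r = precision_recall(true_positives=5, false_positives=25, false_negatives=5)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.17 recall=0.50
```

Here the detector finds half of the real occurrences, yet five out of six alarms are false, so "50%" alone says very little.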
Finally, the countermeasure for this is the same as the countermeasure for other automated speech analysis techniques: play some singing or theater in the background.
Well, assuming that he has no knowledge about how the thing works and has no other information, his computation of probabilities is technically correct :)
Thus you increase latency, which is the single most important thing in a phonecall.
Nexidia has been selling proprietary tech to do this for years
On any digital signal, comparing against a random source of bits should get you 50% accuracy.
-fb Everything not expressly forbidden is now mandatory.
I'm sure there's a mathematical/statistical reason why 50% accuracy is better than guessing in this case, but that would be very counter-intuitive. Same with as high as 90% under certain conditions. I could get to 90% accuracy if I could select out everything that reduced my accuracy as well. I don't doubt the full article explains better though. I'm not suggesting MIT, Google, etc scientists are stupid.
Seems that I started to detect a pattern between the current TFA and this one.
Now, DHS, I know I'm not at MIT, but other cases showed I don't need to... So, just where is my grant for advanced research of the subject?
Questions raise, answers kill. Raise questions to stay alive.
This should have got at least one +funny.
You mean it doesn't amount to "fuck" and "shit"? The media and the internet have fooled me again!
+Raider of the lost BBS
How many words are there in the English language? Many tens of thousands at least.
Many tens of thousands???
I hope English is your second language.
There are over 1 MILLION English words in common and uncommon use.
[ http://www.languagemonitor.com/no-of-words/ ]
Yes.... many, many, many tens of thousands.
-AI
FWIW, in response to TFA... I realize their research is on phrases, which very quickly reduces the set, since many of those words would only exist in very few spoken phrases.
For me, it is far better to grasp the Universe as it really is than to persist in delusion
but they're recognizing individual words, from a set of many thousands of potential words, half the time or better.
That's really quite impressive. And you're an idiot.
From a set of many thousands of words...
and he's the idiot?
-AI
I'd tap that.