Compressed VoIP Calls Vulnerable To Bugging
holy_calamity writes "Security researchers at Johns Hopkins report that a variable bit-rate compression scheme being rolled out on VoIP systems leaves encrypted calls vulnerable to bugging. Simpler syllables are squeezed into smaller data packets, with more complex ones taking up more space; the researchers built software that uses this to spot phrases of interest in encrypted calls simply by measuring packet size."
Easy Solution. Music in the background.
Anyone wanting to avoid detection could just follow what my German-speaking grandparents do when they don't want us kids listening in on the conversation: randomly switch languages on different topics (though I think that this is sometimes also because some concepts are easier to express in a given language).
:-)
Random switches between languages would probably confuse the heck out of filters guessing compressed data. That or you could just learn Russian... I don't think they *have* any simple-syllable words in Russian.
I would think that a very slight randomization of the packets with filler would add a trivial amount of data to the packet and would tend to interfere with their analysis. I'm sure after a certain point of added bytes and randomization, you would change their margin of error such that the process wasn't useful or effective anymore.
FTFA
So, ummm, what we should do to, umm, well, protect ourselves from, ummm, yaknow, eavesdroppers, heh-heh, is well, make sure there's enough, ummmmmmm, yaknow, like extra noise, like, mixed in, dude.
Just st-st-stuh-stutter when you talk. And use a lot of, uh, you know, um, non-word sounds between, uh, like, your phrases. And don't use any complexificated words without Bushifying them first. Better yet, only speak in Klingon.
Or maybe you shouldn't say anything on VoIP that you don't want anyone else to hear.
steampunk web design
Encrypt the data first, then compress it.
http://www.mhall119.com
First, the article mixes things up:
Vowels are actually simpler to compress than consonants (because of spectral complexity: consonants use many more different frequencies. They are mostly noise and have a more "random"-like waveform, making them harder to compress). They got it completely in reverse.
Then TFA doesn't show a method to magically guess what is being said over an encrypted channel only by looking at the bitrates; it only says that it finds some predetermined pattern in a given set of samples to test against. The whole thing would only be able to answer some very simple questions like "did the words XYZ appear in the conversation? Or did ABC appear in the conversation?" - with a rather bad success rate unless those words are long and complex enough - which hardly makes it enough to obtain personal information or otherwise efficiently spy on someone.
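To make the "predetermined pattern" idea concrete, here is a deliberately naive sketch. It is not the researchers' method; it just slides a template of packet sizes over an observed sequence and reports the closest window. The function name and all the byte values are made up for illustration.

```python
def best_match(observed, template):
    """Return (offset, distance) of the window in `observed` closest to `template`.

    Distance is the sum of absolute differences between packet sizes.
    """
    n = len(template)
    if n == 0 or n > len(observed):
        raise ValueError("template must be non-empty and no longer than observed")
    best_offset, best_dist = 0, float("inf")
    for offset in range(len(observed) - n + 1):
        window = observed[offset:offset + n]
        dist = sum(abs(a - b) for a, b in zip(window, template))
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist


# Hypothetical encrypted-packet sizes in bytes, as seen on the wire.
observed = [20, 20, 38, 46, 28, 62, 62, 46, 20, 38, 28, 20]
# Hypothetical size pattern for the phrase the eavesdropper is hunting for.
template = [28, 62, 62, 46]

print(best_match(observed, template))  # (4, 0)
```

Note that the eavesdropper only ever learns "something shaped like my template appeared around offset 4", never the plaintext itself.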
Then the whole system has a lot of shortcomings:
- As said before, it assumes that the spy knows exactly that some phrase has to be said. If the spy doesn't guess exactly what words he must search for, the attack fails (the users may be speaking in a foreign language to begin with).
- It assumes that the speech-generator-made needle they are looking for in the haystack will be close to what is actually said. The users may have an accent and pronounce words differently (cf. aluminum vs. aluminium, etc.)
- And worst of all, it assumes that the granularity of the packets will be small enough that the phonemes will have an influence on the bit rate. In reality, short packets carry a big bandwidth overhead, while longer packets increase the latency. But lots of VoIP users are happy with a 500ms latency because it really diminishes the overhead. At 500ms you can have a couple of words in a single packet. The whole packet will tend to have a bandwidth close to the average (there will be small differences between phonemes, but these will all be packed into the same packet and will average out).
- It fails to take into account an interleaved video stream. Video conferencing is really popular, and its own bandwidth will completely dwarf the bandwidth used by audio. So unless the VoIP uses 2 separate streams (some VoIP systems do), and only encrypts at the stream level, and the transmission is happening over an unencrypted channel (no sane person should do that), this method will fail epically.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
I don't see how *encrypted* these calls are if they always compress the same way. Doesn't sound like they are encrypted beyond ROT-13 to me!
Fortunately I am in receipt of email at this very moment where a nice foreign man claims he can and will increase the size of my packet. Whew!
Merely utilize the stupendous wealth of complex language alternatives located in the voluminous expanse of your thesaurus to inflate your unimportant topics of conversation to prodigious lengths, and leave the vital ones to sound so simple they don't pay them any notice!
Anyway, what's the use of this? "Oh wow, they must be talking about something interesting. Now if I only knew what they were saying..." The simple fact that the communication is being encrypted would allude to that.
Ust-jay eak-spay in ode-cay.
It must have been something you assimilated. . . .
First, the paper was testing the Speex codec, and is based in principle on looking at codecs which use variable bit-rate CELP, a compression scheme which is tailored to speech, not music (music sounds terrible through one of these codecs, because their dictionaries are filled with speech sounds). Having music in the background is only likely to confuse the codec, making the speech sound terrible too, possibly to the point of unintelligibility.
The conclusions do not apply to more standardized codecs like G.711 and G.729a, which use fixed size packets.
The paper itself can be downloaded from here. Get it quick, before the IEEE figures this out and makes the author remove it so they can extort their fee.
"National Security is the chief cause of national insecurity." - Celine's First Law
Voice codecs are designed to support a given level of audio quality subject to bit rate and computational complexity limitations. Most codecs are fixed-rate, or fixed-rate with silence suppression. Encryption isn't part of their design; it's somebody else's problem, and many VOIP systems aren't encrypted anyway (for instance, connections between an office phone and a PBX usually aren't.) Variable bit rate codecs are sometimes a good choice, depending on the kind of sounds you're trying to compress and the networks you're transmitting them on, and they're at least an alternative to the usual fixed-rate codecs.
Encryption systems usually aren't designed to deal with real-time message streams or timing attacks. Typically VOIP encryption protocols are designed for constant bit rate codec output, which is what most codecs provide, and the codecs usually package up 10, 20, or 30ms audio samples into a data packet for transmission over IP.
The problem occurs when you're choosing your codec and encryption separately, and you take a crypto system designed for fixed-rate codecs and use a variable-bit-rate codec instead. It's difficult to keep people from doing that sort of thing, especially if they're using huge-overhead approaches like VOIP inside IPSEC as opposed to VOIP systems with the crypto built in. It's also difficult to prevent people from making bad choices like that when they're using open-source software applications, as opposed to proprietary phones that only have the small set of codecs the manufacturer built in (typically uncompressed G.711, or G.729 or a GSM codec, all of which are fixed-rate except for silence suppression.)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
2)It's not that compressed VOIP would be inherently more or less secure than uncompressed - if you want it secure, you encrypt it. The problem is that if you use a crypto system that works just fine with fixed-rate voice (either uncompressed or with a fixed-rate codec, which most codecs are) and use a variable-bit-rate codec instead, suddenly lots of information leaks out through the timing, because the crypto system wasn't hiding the size or timing of the voice packets. So no, your decent VPN isn't taking care of it, because it wasn't designed to, and using a VPN instead of VOIP-specific encryption makes it easier for you to use whatever codec you like. Also, IPSEC is really inefficient for VOIP, and SSL or SSH are worse, because VOIP gives you a stream of lots of very small packets, and each layer of protocol (RTP, UDP, IP, IPSEC, etc.) adds more overhead - an 8kbps voice codec typically takes 24-28kbps of IP if you don't encrypt it, and maybe double if you do.
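For anyone who wants to check the overhead arithmetic, a rough sketch follows. The header sizes are the standard minimums (12-byte RTP, 8-byte UDP, 20-byte IPv4); the 20 ms packetization interval and the function name are my assumptions, and link-layer or crypto overhead is not included.

```python
def ip_bitrate_kbps(codec_kbps, frame_ms, header_bytes):
    """Bit rate on the wire (kbps) for a fixed-rate codec, one frame per packet."""
    payload_bytes = codec_kbps * 1000 / 8 * frame_ms / 1000
    packets_per_second = 1000 / frame_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


RTP_HEADER = 12   # bytes, fixed RTP header without extensions
UDP_HEADER = 8    # bytes
IPV4_HEADER = 20  # bytes, no IP options

# 8 kbps codec, assumed 20 ms of audio per packet: 20 payload bytes + 40 header bytes.
print(ip_bitrate_kbps(8, 20, RTP_HEADER + UDP_HEADER + IPV4_HEADER))  # 24.0
```

So two thirds of every unencrypted packet is already headers before any encryption layer adds its own.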
Bill Stewart
Voice codecs are lossy, so they'll happily compress your encryption data to something smaller, treating it as if it were audio samples from a human vocal tract. Unfortunately, you won't get all the bits back when you uncompress it, so decrypting the data isn't going to reconstruct anything resembling the original voice stream :-)
Bill Stewart
Send fixed size packets, splitting longer syllables into more packets and packing multiple short syllables into single packets.
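A simpler variant of the same idea is to pad every codec frame up to one fixed size before it gets encrypted, so the ciphertext length no longer tracks the frame length. A minimal sketch, assuming a made-up 64-byte packet size and a one-byte length prefix (neither comes from any real VoIP spec):

```python
FRAME_SIZE = 64  # assumed fixed packet payload size in bytes (hypothetical)


def pad_frame(frame, size=FRAME_SIZE):
    """Prefix the frame with its length and zero-pad it to exactly `size` bytes."""
    if len(frame) > min(size - 1, 255):
        raise ValueError("frame too large for fixed-size packet")
    return bytes([len(frame)]) + frame + bytes(size - 1 - len(frame))


def unpad_frame(padded):
    """Recover the original frame from a padded packet."""
    length = padded[0]
    return padded[1:1 + length]


short_frame = b"\x01\x02\x03"
long_frame = bytes(range(40))
# Both come out the same length, so an observer sees identical packet sizes.
print(len(pad_frame(short_frame)), len(pad_frame(long_frame)))  # 64 64
```

The padding has to happen before encryption, and the price is that every packet now costs as much bandwidth as the largest one.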
It's a traffic monitoring problem.
It doesn't matter how good the cryptosystem you use to call the Pizza Hut nearest the Pentagon is, if you just need to count the trucks leaving the Pizza Hut to tell when there's a burst of late night activity so you can tell the invasion is about to start.
Bugger.
The entropy for a perfectly random coin toss will always be one bit. The formula, if I'm remembering right, is -sum(p_i * log(p_i)) where the p's are the probabilities of the various possible outcomes. In the case of a fair coin toss, these are both 0.5 and the outcome is 1, or 1 bit.
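A quick check of that formula, taking the log to base 2 so the answer comes out in bits (the function name is just for illustration):

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy -sum(p * log2(p)) in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(shannon_entropy_bits([0.5, 0.5]))  # 1.0
```

A biased coin, say `[0.9, 0.1]`, comes out below one bit, which is exactly the slack a compressor can exploit.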
If the stream you're compressing has patterns in it, it is purely by coincidence and overall, the average entropy of any number of these streams will turn out to be 1 if you sample enough of them. Furthermore, if you do have a perfectly random string of bits, zlib, gzip, and all the rest will deliver a bigger file because of the overhead necessary for those file formats.
Try it on the command line, dd if=/dev/urandom of=random_bits bs=1024 count=100 && gzip random_bits. Getting a smaller file out of that is more improbable than being attacked by a shark while being struck by lightning while you're holding a winning lottery ticket.
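If you don't have a shell handy, roughly the same experiment can be sketched in Python with the standard gzip module; `os.urandom` stands in for the kernel's random device here, and the 100 KiB size just mirrors the dd invocation.

```python
import gzip
import os

# 100 KiB of random bytes from the operating system.
data = os.urandom(100 * 1024)
compressed = gzip.compress(data)

# The gzip header, trailer, and block framing make the output larger than the input.
print(len(data), len(compressed))
```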
This is very similar to traffic analysis attacks on SSH (like this one) where packet sizes and inter-arrival times can indicate which keys you are typing.
Effective, practical counter-measures against good traffic analysis techniques are very difficult - especially if the attacker has enough traffic to work with (i.e. many conversations, many sessions, etc.).
The use of the word "bugging" implies that others can listen and understand what is being said. From the abstract all they can do is identify parts that might be interesting. There is a big difference there.
Because it isn't just simple encryption.
compression/decompression
http://en.wikipedia.org/wiki/Codec
Totally not even how VoIP works. You're making the assumption that chunk #123 actually got there. There's no ACK packets in VoIP; if a packet is received out of sequence it's dropped. That's that "jitter" that happens when the line breaks up a bit every now and again. It's your packets not all taking the same route and getting to the destination device out of order.
You have to remember: VoIP is a real-time protocol, and keeping up with real time is the paramount concern, not necessarily absolute accuracy.
Whatever solution there is to this problem it has to work on packets as individual items. It can't work on a whole conversation because that's just not how phone conversations happen. If it really were that simple, we could just email each other encrypted mp3s and the system would work beautifully.
Seriously try what I said to try if you've got a linux/unix/mac system to try it on. You'll come up just slightly larger than the original file pretty much every single time.
Considering the output of a coin toss to be a random variable, and the string of bits to be a randomly variant process of probability 0.5, the probability of any given pattern is 2^(-n) where n is the length of the pattern in bits. Square it to give the probability of that pattern repeating. In order to come up with a file that's smaller than the original file you need many patterns repeating many times. Really. The entropy of a random stream is really high, and you will never compress a file beyond the file's entropy limits. It's just not possible unless you throw information away.
You didn't actually get the clear binary data (the original wave form or the original non-crypted compressed stream).
By comparing the two pieces you have matched together you can't infer the key that was used to encrypt one into the other, because one actually ISN'T the encrypted version of the other. They AREN'T the same data. They just happen to show some similarities in rhythm.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Use longer packets.
- It saves your bandwidth because of less overhead (that's why people are *already* doing it).
- It has only a small impact upon latency.
- With long enough packets, the differences between sounds average out, and nothing can be eavesdropped based on the phonemes' compression ratio.
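The averaging effect can be sketched with a toy simulation. Everything here is an assumption for illustration: the per-frame sizes are invented, frames are drawn independently (real speech frames are correlated, so the smoothing would be weaker in practice), and 25 frames per packet corresponds to 500 ms only if each frame covers 20 ms.

```python
import random
import statistics


def packet_sizes(frame_sizes, frames_per_packet):
    """Group consecutive frames into packets and return each packet's total size."""
    usable = len(frame_sizes) - len(frame_sizes) % frames_per_packet
    return [sum(frame_sizes[i:i + frames_per_packet])
            for i in range(0, usable, frames_per_packet)]


def relative_spread(sizes):
    """Standard deviation of packet sizes divided by their mean."""
    return statistics.pstdev(sizes) / statistics.mean(sizes)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical variable-bit-rate frame sizes in bytes.
frames = [rng.choice([10, 20, 28, 38, 46, 62]) for _ in range(5000)]

short_packets = packet_sizes(frames, 1)   # one frame per packet
long_packets = packet_sizes(frames, 25)   # 25 frames per packet

# The relative spread of the long packets is much smaller.
print(relative_spread(short_packets), relative_spread(long_packets))
```

The size variation never goes to zero, so this makes the attack harder rather than impossible.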
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
If we admit that the conditions are good enough for the trick to work (short packets, no background noise, no additional data interweaved with the voice stream)
Using strong understatement, TFA was basically saying that the success rate is useless unless the chunks are long streams of unambiguous words.
Not hunting for keywords, but hunting for complete key sentences. Like two engineers (say nuclear, one of them Iranian) discussing some formula. You know the formula, so you can generate a sentence that might appear.
But then what's the likelihood that they indeed used that sentence and didn't express the idea differently, or used a different type of jargon, or used brand names, or spoke another language to begin with (the first engineer was Japanese), etc...
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Hey AC: don't be an asshuile. We are all on the same team here no? You are right, but the irony is that because you are right, you are frustrated that people don't get it, and you react in a way that reduces the fraction of readers that will get it.
It's worth noting that the wrongest part of the dintech post you're criticizing has nothing to do with music. It's "Easy Solution"... as if. So is it going to be "give a man a fish" or "teach a man to fish?"
--- Nothing clever here: move along now...