Slashdot Mirror


Chapel Hill Computational Linguists Crack Skype Calls

mikejuk writes "You might think of linguistics as being interesting but not really useful. Now computational linguistics [PDF of original paper] has been used to crack Skype encryption and reconstruct what is being said in a VoIP call. What is surprising is that though they are encrypted, the frames that make up a Skype call contain clues about what phonemes are being spoken."

26 of 156 comments

  1. Speach recognition by city · · Score: 3, Insightful

    My Google Voice voicemail transcription gets about 1 out of 4 words correct. Can Google please buy this company already.

    --
    I am a v1ral sig. Plse c0py me and h3lp me spread. Thank y0u?
    1. Re:Speach recognition by Anonymous Coward · · Score: 2, Funny

      Do you speak as well as you spell?

    2. Re:Speach recognition by newcastlejon · · Score: 2, Funny

      I hope so; city's spelling was flawless.

      You'd best learn what grammar is before you try to be a grammar nazi.

      --
      If God forks the Universe every time you roll a die, he'd better have a damned good memory.
    3. Re:Speach recognition by drb226 · · Score: 3, Funny
      speach

      Scottish Gaelic. Noun speach f (genitive speacha, plural speachan)
      1. wasp

      Like newcastlejon said, his Scottish Gaelic spelling was flawless. I always hate it when Google doesn't recognize my wasps.

  2. Side channel attack by betterunixthanunix · · Score: 5, Informative

    The wording in TFS is a little misleading; they did not "crack Skype encryption," they found an exploitable side channel in Skype. The crypto itself has not been cracked, but it was being used in a way that leaked lots of information.

    --
    Palm trees and 8
    1. Re:Side channel attack by Anonymous Coward · · Score: 2, Interesting

      The simple description is: By looking at the size of the encrypted data packets you can guess what phonemes were spoken. Yes, that's all there is to it. They are just looking at how much data is sent and guessing what might be said that reasonably fits in that size.

      An obvious simple fix would be to vary the length of the packets with random padding (using a cryptographically secure random algorithm to determine the length). It would add overhead but probably not that much considering how small these packets are in the first place (they typically don't use the full allotted bandwidth).
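
A minimal sketch of the random-padding idea described above, assuming a hypothetical cap of 32 padding bytes per packet (`MAX_PAD`, `padded_length` and `observe` are illustrative names, not anything from Skype or the paper). It models only the on-wire length an eavesdropper would see.

```python
import secrets

MAX_PAD = 32  # hypothetical upper bound on padding bytes per packet


def padded_length(frame_len: int, max_pad: int = MAX_PAD) -> int:
    """Return the on-wire length after adding 0..max_pad random padding bytes."""
    # secrets.randbelow draws from the OS's cryptographically secure source.
    return frame_len + secrets.randbelow(max_pad + 1)


def observe(frame_len: int, samples: int = 1000) -> list[int]:
    """Lengths an eavesdropper would see if the same frame size kept recurring."""
    return [padded_length(frame_len) for _ in range(samples)]
```

Note that the observed length is never smaller than the real one, so with enough repeated observations the smallest value seen tends toward the true frame size. Random padding blurs the signal rather than removing it.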

    2. Re:Side channel attack by blair1q · · Score: 2

      if your encryption leaves the message where it can be read without decrypting it, then it was never actually encrypted

      skype is using a lot more bandwidth than they need to. like single-sideband radio, they can drop at least half the channels they're sending and the information will still be perfectly intelligible on the other end. they've effectively done that by sending superfluous encrypted gibberish on their "main" channel.

      the bonus is, their method of sending the message in the side channel is probably patentable.

    3. Re:Side channel attack by betterunixthanunix · · Score: 2

      A simpler fix would be to use a different method of compression, which does not vary the length of its output frames.

      --
      Palm trees and 8
    4. Re:Side channel attack by thePowerOfGrayskull · · Score: 4, Interesting

      There's a reason that SSH has inserted random padding into its packets since its inception. You would think that the folks at Skype might've done just a bit more research...

    5. Re:Side channel attack by NoSig · · Score: 3, Informative

      If the padding is random you'll decrease the amount of information leaked, but there may still be enough information leaked to reconstruct some conversations. What you really need for total security from this attack is to eliminate the side-channel completely, such as by sending packets of the same size and with the same frequency no matter how much data you've actually got that needs sending. That is a form of padding too, but it is better than random.
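
The same-size approach can be sketched roughly as follows, assuming a hypothetical 64-byte frame and a 2-byte length prefix (both are illustrative choices). The padding would be applied before encryption so that every ciphertext comes out the same length.

```python
FRAME_SIZE = 64  # hypothetical fixed on-wire size in bytes


def pad_fixed(payload: bytes, size: int = FRAME_SIZE) -> bytes:
    """Prefix a 2-byte length, then zero-fill up to exactly `size` bytes."""
    if len(payload) > size - 2:
        raise ValueError("payload too large for one fixed-size frame")
    filler = bytes(size - 2 - len(payload))
    return len(payload).to_bytes(2, "big") + payload + filler


def unpad_fixed(frame: bytes) -> bytes:
    """Recover the original payload from a fixed-size frame."""
    n = int.from_bytes(frame[:2], "big")
    return frame[2:2 + n]
```

The cost is bandwidth: every frame is sent at the size of the largest one the codec can produce.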

  3. Encrypting a wave by Anonymous Coward · · Score: 2, Informative

    Of course, since the data basically represents sound waves, there is a certain level of predictability and pattern in the data, unlike normal data which is much more random.

    It would have to be a special encryption to get rid of this pattern using a more dynamic algorithm that changes as it progress (which can make it annoying to decrypt or simpler to detect), or one that spreads the data over a greater amount of output (making the patterns somewhat harder to find, though it still might be possible). That is difficult in a time-sensitive app like Skype, which encrypts and sends the data as it receives it.

    1. Re:Encrypting a wave by betterunixthanunix · · Score: 2

      normal data...is much more random.

      Actually, most data used in practice is not uniformly random. Text, images, and even computer programs tend to have significant biases.

      It would have to be a special encryption to get rid of this pattern using a more dynamic algorithm that changes as it progress

      http://en.wikipedia.org/wiki/Stream_cipher

      We know how to get these things right, and the problem with Skype was not the type of data, but rather the way in which that data was compressed.

      --
      Palm trees and 8
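
A toy illustration of why the compression matters more than the cipher: a stream cipher outputs exactly as many bytes as it is given, so variable-length frames stay variable-length after encryption. The keystream below (SHA-256 in counter mode) and the frame sizes are assumptions made for the example, not a vetted cipher and not Skype's actual scheme.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it again decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Made-up variable-bit-rate frame sizes; each frame gets its own toy key.
frames = [b"x" * 20, b"y" * 35, b"z" * 27]
sizes_seen = [len(xor_encrypt(b"k" + bytes([i]), f)) for i, f in enumerate(frames)]
# sizes_seen == [20, 35, 27]: the lengths pass straight through the cipher.
```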
  4. plague of any compressed voip conversation by youn · · Score: 2

    I remember reading something similar with SIP over an encrypted channel... I guess it is the plague of all compressed communication even if encrypted... the only way to bypass that is to use an uncompressed protocol and not blank out the silence. I guess what's new is they've done it with Skype.

    --
    Never antropomorphize computers, they do not like that :p
    1. Re:plague of any compressed voip conversation by afidel · · Score: 2

      I believe if you use a CBR codec like G.711 without VAD or CNG you should be ok.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
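
For a sense of what constant bit rate means here, the arithmetic for G.711 works out as below. The 8 kHz sample rate and 8 bits per sample are properties of G.711; the 20 ms packetization interval is an assumption (a common choice, but configurable).

```python
SAMPLE_RATE_HZ = 8000   # G.711 samples narrowband audio at 8 kHz
BITS_PER_SAMPLE = 8     # each sample is companded to 8 bits
PACKET_MS = 20          # assumed packetization interval

bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE         # 64000 bits per second
payload_bytes = bitrate_bps // 8 * PACKET_MS // 1000   # 160 bytes per packet
packets_per_second = 1000 // PACKET_MS                 # 50 packets per second
```

With voice activity detection and comfort noise generation turned off, every packet carries the same 160-byte payload whether the speaker is talking or silent, so payload size tells an observer nothing about the content.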
  5. Skype's encryption sucks by HBI · · Score: 2

    The reason why is that any serious encryption attempt of IP traffic would make all packets a constant size, significantly below expected MTU size (taking into account tunnels). This attack would not exist in that scenario. They are measuring the payload size of IP packets and matching it to phonemes spoken.

    I probably shouldn't blame them for this, but it's barely worth the effort of encrypting the traffic if it is this easy to sniff out the words being spoken.

    --
    HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
    1. Re:Skype's encryption sucks by subreality · · Score: 4, Informative

      The reason why is that any serious encryption attempt of IP traffic would make all packets a constant size, significantly below expected MTU size (taking into account tunnels). This attack would not exist in that scenario.

      It's actually harder than that. You also have to generate the packets at an even rate as well, or you'll still have some leakage.

      Even after you do that, the presence or absence of a stream of packets will at the very least indicate if a call is in progress; to defend against that, you have to *always* transmit the stream.

      Even then you're leaking information about the maximum amount of data you could be communicating.

      The goalposts keep moving right on down the field when you're talking about side channels. You just have to pick the point where you're comfortable.
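
A rough sketch of the always-transmit idea, assuming a hypothetical 64-byte packet and a one-byte flag that marks real audio versus filler (the flag would sit inside the encrypted payload so an observer cannot read it). Timing is simulated with a tick counter rather than a real clock, and length framing for the real frames is left out.

```python
from collections import deque

PACKET_SIZE = 64  # hypothetical fixed packet size in bytes


def constant_rate_stream(frames, ticks, size=PACKET_SIZE):
    """Emit exactly one `size`-byte packet per tick, real data or cover traffic."""
    queue = deque(frames)
    packets = []
    for _ in range(ticks):
        if queue:
            frame = queue.popleft()
            if len(frame) > size - 1:
                raise ValueError("frame does not fit in one packet")
            body = b"\x01" + frame   # flag byte 1: real audio
        else:
            body = b"\x00"           # flag byte 0: cover traffic
        packets.append(body.ljust(size, b"\x00"))
    return packets
```

For example, `constant_rate_stream([b"a" * 10, b"b" * 30], 5)` yields five packets of identical size, the last three being filler. What still leaks is the ceiling: at most `size - 1` bytes of real data per tick.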

    2. Re:Skype's encryption sucks by indeterminator · · Score: 2

      From TFA: A solution might be to break the data up into fixed sized frames but this would make it more difficult to reconstruct the data if there was packet loss.

      And even then, the data rate would leak some information about the content.

      The only trivial solutions for zero leakage are either to use constant rate encoding, or to use some kind of padding to make the data rate constant. Non-trivial solutions would include some random data rate variations to obfuscate the data rate of actual payload content. Unfortunately, all these methods will waste bandwidth.

    3. Re:Skype's encryption sucks by HBI · · Score: 2

      "Somebody's talking" is information that it'll be hard to conceal without the measures you cite. I'd be ok with that, generally. Having 60% of what I say easily ferreted out is not ok, however.

      --
      HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
  6. Huh? by tthomas48 · · Score: 4, Insightful

    No, I find linguistics pretty useful. Especially since it has some pretty 1:1 relationships with computer programming. And Larry Wall was a linguist. And what kind of lead-in is that?

  7. Similar work in a December 2010 paper by guusbosman · · Score: 2

    A December 2010 paper, "Uncovering Spoken Phrases in Encrypted Voice over IP Conversations", takes a similar approach.

    The article was published in ACM Transactions on Information and System Security, PDF version.

    The paper details a gap in the security of VBR compressed encrypted VoIP streams. The authors had earlier found that it is possible to determine the language that is spoken on such a VoIP call, based on packet lengths. Now they have expanded their research and show that it's possible to detect entire spoken phrases during a VoIP call. On average, their method achieved recall of 50% and precision of 51% for a wide variety of phrases spoken by a diverse collection of speakers (some phrases are easier to detect than others; the recall varies from 0% to 98%, depending on length of the phrase and the speaker). In other words: they can detect fairly well if a certain phrase is being used in a conversation, even though the VoIP conversation is encrypted.
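
For readers unfamiliar with the two metrics, here is how recall and precision are computed. The counts are hypothetical, picked only so the results land near the averages quoted above; they are not taken from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported detections that were really the phrase."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual occurrences of the phrase that were detected."""
    return tp / (tp + fn)


# Hypothetical counts: true positives, false positives, false negatives.
tp, fp, fn = 50, 48, 50
avg_precision = precision(tp, fp)   # 50 / 98, about 0.51
avg_recall = recall(tp, fn)         # 50 / 100 = 0.5
```

So with these made-up numbers, about half of the phrase's occurrences are caught, and about half of the alarms raised are correct.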

  8. Linguistics not really useful. The ignorance by jmcbain · · Score: 3, Insightful

    The ignorance of the statement "You might think of linguistics as being interesting but not really useful" is simply astounding. Linguistics provides the foundation and formal frameworks for grammar, syntax, morphology, phonetics, and semantics that allow us to better understand language. From that basis, computational linguistics is seen simply as an application of linguistics, and computational linguistics of course leads to information retrieval, automatic speech recognition, text classification, and other fields that are among the most important computing topics of the 21st century. Ignorantly saying linguistics is interesting but not useful is like saying physics and chemistry are interesting but not useful.

    1. Re:Linguistics not really useful. The ignorance by moogaloonie · · Score: 2

      Yet it is no less true that someone reading that statement may indeed hold that opinion. I've always found it very interesting... How else might we ever develop human/animal translators?

  9. Re:overheard in a private jet hanger by somersault · · Score: 2

    It's hardly "newly rich" - it's been rich for quite some time. I'd call this more a "desperate grab for relevance".

    --
    which is totally what she said
  10. Re:Original Slashdot Story by nickersonm · · Score: 2

    Yes; this is follow-up work to the paper in that earlier article.

    Also important to note, neither paper is specific to Skype; their work is on encrypted VoIP in general. But apparently /. prefers things having to do with Skype for some reason.

  11. Fsck You, Slashdot by theshibboleth · · Score: 3, Interesting

    "You might think of linguistics as being interesting but not really useful" Way to go Slashdot, insult one of the most important fields in existence. Do the editors and readers really not realize how closely comp ling is related to AI? I have confidence that eventually computational linguistics will crack speech/language in general and lead to computers that can learn languages as readily as human infants. This will be momentous because it would allow communication between computers and humans. Now it wouldn't solve the consciousness problem, but it would be a step in the right direction.

  12. Re:Codec as the weak point by blair1q · · Score: 2

    Okay, so, then, what are the teachers in the Charlie Brown specials saying?

    Huh? Mr. Smarty-pants?