Chapel Hill Computational Linguists Crack Skype Calls
mikejuk writes "You might think of linguistics as being interesting but not really useful. Now computational linguistics [PDF of original paper] has been used to crack Skype encryption and reconstruct what is being said in a VoIP call. What is surprising is that though they are encrypted, the frames that make up a Skype call contain clues about what phonemes are being spoken."
My Google Voice voicemail transcription gets about 1 out of 4 words correct. Can Google please buy this company already?
I am a v1ral sig. Plse c0py me and h3lp me spread. Thank y0u?
The wording in TFS is a little misleading; they did not "crack Skype encryption," they found an exploitable side channel in Skype. The crypto itself has not been cracked, but it was being used in a way that leaked lots of information.
Palm trees and 8
cunning!
Looks like their karma isn't so good these days!
Seems like the Skype buy wasn't such a good thing for MS... it's been what, a week or two, and already it's been down and compromised?
Of course, since the data basically represents sound waves, there is a certain level of predictability and pattern on the data unlike normal data which is much more random.
Removing this pattern would take a special encryption scheme: either a more dynamic algorithm that changes as it progresses (which can make it annoying to decrypt, or simpler to detect), or one that spreads the data over a larger amount of data (making the patterns somewhat harder to find, though still possibly detectable). That is difficult in a time-sensitive app like Skype, which encrypts and sends as it receives the data.
"8.5 billion USD doesn't buy you as much as it used to"
I remember reading something similar about SIP over an encrypted channel... I guess it is the plague of all compressed communication, even if encrypted... the only way to bypass that is to use an uncompressed protocol and not blank out the silence. I guess what's new is they've done it with Skype.
Never anthropomorphize computers, they do not like that
The reason why is that any serious encryption attempt of IP traffic would make all packets a constant size, significantly below expected MTU size (taking into account tunnels). This attack would not exist in that scenario. They are measuring the payload size of IP packets and matching it to phonemes spoken.
I probably shouldn't blame them for this, but it's barely worth the effort of encrypting the traffic if it is this easy to sniff out the words being spoken.
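The constant-size idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not anything Skype does: the fixed size of 128 bytes, the 2-byte length prefix, and the names `pad_frame` and `unpad_frame` are all assumptions made up for the example. The padding would be applied to each codec frame before encryption, so every packet on the wire has the same payload length.

```python
import os
import struct

# Hypothetical fixed payload size in bytes; assumed to exceed the largest codec frame.
FRAME_SIZE = 128

def pad_frame(payload: bytes, size: int = FRAME_SIZE) -> bytes:
    """Prefix a 2-byte big-endian length, then fill with random bytes up to `size`."""
    if len(payload) > size - 2:
        raise ValueError("payload too large for the fixed frame size")
    filler = os.urandom(size - 2 - len(payload))
    return struct.pack(">H", len(payload)) + payload + filler

def unpad_frame(frame: bytes) -> bytes:
    """Read the length prefix and return only the original payload."""
    (length,) = struct.unpack(">H", frame[:2])
    return frame[2:2 + length]

# Simulated VBR codec output: frame length varies with the sound being encoded.
vbr_frames = [b"a" * 23, b"b" * 57, b"c" * 41]
padded = [pad_frame(f) for f in vbr_frames]

sizes_before = [len(f) for f in vbr_frames]  # lengths differ, which is the side channel
sizes_after = [len(f) for f in padded]       # all equal to FRAME_SIZE
```

The cost is bandwidth: every frame is sent at the maximum size, which gives up most of what the variable-bit-rate codec saved.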
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
Cypress Hill Computational Linguistics
TFA states that this is possible due to the codec that is used:
the best...compression for voice data makes use of the structure of speech
So using a not-optimized-for-speech codec (e.g. mp3 or wav) would defeat this.
TFA was TLDR, but a quick question to those of you with knowledge to understand this... Did a particular language help? Does this work on all languages? Are some languages more secure than others?
IE - Esperanto - Easy to break, but languages with Click Consonants are harder?
I8-D
No, I find linguistics pretty useful. Especially since it has some pretty 1:1 relationships with computer programming. And Larry Wall was a linguist. And what kind of lead in is that?
A December 2010 paper, "Uncovering Spoken Phrases in Encrypted Voice over IP Conversations", takes a similar approach.
The article was published in ACM Transactions on Information and System Security, PDF version.
The paper details a gap in the security of VBR compressed encrypted VoIP streams. The authors had earlier found that it is possible to determine the language that is spoken on such a VoIP call, based on packet lengths. Now they have expanded their research and show that it's possible to detect entire spoken phrases during a VoIP call. On average, their method achieved recall of 50% and precision of 51% for a wide variety of phrases spoken by a diverse collection of speakers (some phrases are easier to detect than others; the recall varies from 0% to 98%, depending on length of the phrase and the speaker). In other words: they can detect fairly well if a certain phrase is being used in a conversation, even though the VoIP conversation is encrypted.
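For anyone unsure what recall and precision mean here, a small sketch. The counts below are made up purely to reproduce the quoted averages; they are not taken from the paper, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Precision: share of detections that were right.
    Recall: share of real occurrences that were detected."""
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Hypothetical numbers: the phrase is spoken 100 times, the detector fires
# 98 times, and 50 of those detections are correct.
precision, recall = precision_recall(true_pos=50, false_pos=48, false_neg=50)
```

With these invented counts, recall is 50/100 and precision is 50/98, roughly the 50% and 51% quoted above: about half of the spoken phrases are caught, and about half of the alarms are real.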
Not sure how it works with voice, but I know with text, if you have a part of the message, it's a lot easier to break the encryption method - assuming it's breakable. Security is just a cat and mouse game, anyway. Someone finds a hole, someone plugs the hole, then someone finds another hole...etc. Fun stuff though!
Encrypted VoIP Meets Traffic Analysis
If you can compress the data stream from the packet contents to just the lengths of the packets and still recover the word stream, that suggests two things: A) vocal inflection is worth 100 words per syllable, and B) you're not compressing enough in the first place. Yet there's a reason why compression sucks: the low latency requirement. Compression over 5 minute speech blocks would blow this side channel away.
Were it not for the human tension of a conversation amounting to a group of people mutually waiting to speak (sometimes not so well), this wouldn't be so much of a problem in the first place.
Skeptic: My, what short packets you have!
Skype: All the better to interrupt you with.
Skeptic: What a juicy side-channel that makes.
Skype: Facebook rocks. Shut up and keep talking. I know it's you Alice, under that Hood.
http://xkcd.com/114/
The ignorance of the statement "You might think of linguistics as being interesting but not really useful" is simply astounding. Linguistics provides the foundation and formal frameworks for grammar, syntax, morphology, phonetics, and semantics that allows us to better understand language. From that basis, computational linguistics is seen simply as an application of linguistics, and computational linguistics of course leads to information retrieval, automatic speech recognition, text classification, and other fields that are among the most important computing topics of the 21st century. Ignorantly saying linguistics is interesting but not useful is like saying physics and chemistry are interesting but not useful.
Fuck Computational Linguists!
http://xkcd.com/114/
I was hoping that Skype had been cracked so we can start using 3rd party messengers!
I call it 'The Aristocrats'
"You might think of linguistics as being interesting but not really useful" Way to go Slashdot, insult one of the most important fields in existence. Do the editors and readers really not realize how closely comp ling is related to AI? I have confidence that eventually computational linguistics will crack speech/language in general and lead to computers that can learn languages as readily as human infants. This will be momentous because it would allow communication between computers and humans. Now it wouldn't solve the consciousness problem, but it would be a step in the right direction.
First glance: "Computational linguist's crack pipe kills"
First thought: "I guess you would have to smoke crack to want to spend your life as a computational linguist"
You mean Skype wasn't smart enough to mix in other sounds while encrypting the original sound?! That is just retarded. Note that I am not a mathematician or any sort of "really smart guy." But I can definitely picture in my mind why this would be somewhat trivial. Vocal sound is primarily frequency modulated which means that the flow of signal will vary in density on a constant carrier. If you mix up the numbers, you will still see a great deal of fidelity in the variations of the frequencies of data regardless of the accuracy of the "decryption" involved. (a decryption of this sort would only need to be approximate to achieve results.)
And from the very beginning, I saw this possibility and presumed everyone else did as well. But if the signal were combined with another sound pattern which the receiving end would know how to properly remove after decryption, there would be a great deal less likelihood that an audio extraction could be made from the encrypted stream.
I have to wonder why this isn't being done. It is simply too obvious to patent.
To demonstrate the obvious. What do you expect when using high complexity VBR codecs with no blinding of any kind. I sincerely hope this was not news to anyone.
Surely this was the first ever legitimate use of the phrase "cunning linguist"
I found it somewhat surprising that the Town name was used to identify the University. Would you say Ann Arbor or Ithaca or New Haven? You might say Berkeley or Princeton. So, I guess you might say Chapel Hill. OK, never mind.
Voice recognition still sucks, and those guys have UNencrypted data. Neat concept, but reliable enough for what?
This is not the exact same thing, but it's a great example of how encryption alone is not enough and it must be done right.
Block cipher modes of operation
Scroll down til you see the penguins.
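The penguin effect can be imitated in a few lines. This is only a toy, under loud assumptions: `toy_block` is a truncated keyed hash standing in for a block cipher (it is not invertible and not secure), and `chained` merely imitates the chaining idea behind modes like CBC. The point it illustrates is that encrypting each block independently maps equal plaintext blocks to equal ciphertext blocks.

```python
import hashlib

BLOCK = 16  # block size in bytes

def toy_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for a block cipher: a deterministic keyed function.
    # NOT invertible and NOT secure; for illustration only.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ecb_like(key: bytes, data: bytes) -> list:
    """Transform every block independently, as ECB mode does."""
    return [toy_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def chained(key: bytes, data: bytes, iv: bytes) -> list:
    """Mix the previous output block into each input block before transforming it."""
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(x ^ y for x, y in zip(data[i:i + BLOCK], prev))
        prev = toy_block(key, mixed)
        out.append(prev)
    return out

key = b"k" * 16
data = b"A" * 16 + b"B" * 16 + b"A" * 16  # first and third blocks are identical

ecb = ecb_like(key, data)                  # ecb[0] == ecb[2]: the repetition shows through
cbc = chained(key, data, iv=b"\x00" * 16)  # all three output blocks differ
```

In the image on that page, large areas of identical pixels become large areas of identical ciphertext, which is why the outline survives.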
Cwm, fjord-bank glyphs vext quiz
Or maybe narcissistic personality disorder.
How is 'you might think of linguistics as...' an insult?
That's saying *SOME* people may think that, *NOT ALL* people.
Either you're Dr. Sheldon Cooper or you're trolling.
Well isn't that just a coincidence?!?!! Microsoft goes and buys Skype, and all of a sudden the protocol becomes decrypted. Could this be because the Americans require that these protocols be insecure? The Americans sure would like to be able to listen in on those terrorist Skype calls... I may be paranoid but I think the government and Microsoft work very closely together!
the sort of person who belittles things he or she doesn't really understand or care about.
I know I do, now.
Either that, or it's Michael Larabel from Phoronix... Except the submission didn't actually have a half-dozen links back to previous articles on the same topic. But it's the same "Everyone sensible should be interested in what I'm interested in, and be bored by those things I don't understand" writing style.
Paul "TBBle" Hampson
Paul.Hampson@Pobox.Com
Go Linguists!
oh, them cunning linguists!
One more reason to hate them. Relevant: http://www.xkcd.com/114/
So, as I understand it, it may not be the obvious weakest potential link that has been compromised (the cipher itself, for example) but rather a detail of implementation that paved the way for their successful attack, right? If Skype fragmented the encrypted data stream into variable-sized frames with rather, umm, unpredictable (bear with me here) sizes, the attack, as stated by the researchers themselves I believe, could not be mounted in its current form? The entire weakness rests on the fact that it was relatively easy, far easier than brute-forcing the cipher, to guess the phonemes even from the encrypted frames.
Am I making sense here? Trying to verify if I have understood the main point of their research paper...
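A rough sketch of the point being asked about: a length-preserving cipher hides the bytes of each frame but not its size. Everything here is an assumption for illustration. `toy_encrypt` is a hash-derived XOR keystream, not Skype's cipher, and it reuses the same keystream for every frame, which no real system should do. The frame sizes are invented.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def toy_encrypt(key: bytes, frame: bytes) -> bytes:
    """XOR the frame with the keystream; applying it twice decrypts."""
    ks = keystream(key, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

# Hypothetical VBR codec frames: the size depends on the sound being encoded.
frames = [b"\x01" * 20, b"\x02" * 46, b"\x03" * 33]
ciphertexts = [toy_encrypt(b"secret", f) for f in frames]

# What an eavesdropper sees without the key: the exact frame sizes.
observed_sizes = [len(c) for c in ciphertexts]
```

The contents are unreadable, but `observed_sizes` matches the plaintext frame sizes exactly, and that sequence of sizes is the only input the attack described in this thread needs.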
"You might", and apparently you're someone who "might not". It's the lead-in for its intended audience, which is non-linguists. And among non-linguists, it is possible that people might find it interesting but not useful. Perfectly accurate, audience specific.
You'd think the linguists complaining about this would be able to parse out the "...and you might not" which is implied.
http://it.slashdot.org/story/11/03/15/1513257/Encrypted-VoIP-Meets-Traffic-Analysis
. . .these are very cunning linguists.
"If your parents never had children, chances are you won't either." -Dick Cavett