Codebreaking - Taking the First Step?
Master Spy asks: "Here's something that the Slashdot community might be able to help with. If you receive a message in code, how do you take the first step? Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine so all they had to do was work out the positions of the rotors using brain power, the Bombe or later the Colossus machine. American codebreakers also knew the basic details about the methods the Japanese used. Now, however, things are more complicated. Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' How do you know how the message has been encrypted? It could be an Enigma machine, it could have been XOR'd with a second message or a one-time pad, or it could use some form of software encryption such as Blowfish or DES. Before you start ripping the message apart for decoding, how do codebreakers find out what method has been used to encode the message?"
It also helps if you have a basic idea of what's encrypted, i.e. what kind of plaintext message you're dealing with. A .doc has a different signature than a JPEG or flat ASCII or HTML, etc. Some encryption software relies on headers or footers added to the encrypted data in order to sanity-check decryption. Again, look also at the medium that the message is transmitted through -- TCP/IP traffic to port 443 speaks volumes about what algorithms are being used. Transmissions received in the 2.4 GHz band also speak volumes about what algorithms you may be dealing with. Finally, never trust the developer to do the 'right thing' with algorithm selection -- look at Adobe's algorithm selection for its ebooks. Look for a pattern in what you're dealing with. It can't hurt to generate a dictionary of known ciphertext file patterns a la the *nix 'file' command. Lacking a certain amount of information about what you're dealing with (message length, source of the ciphertext, etc.), though, you're SOL.
Anyway, I haven't had to deal with much of the kind of encryption that protects data from a government, mostly just the kind that delays your kid sister, so YMMV...
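To make the "dictionary of known file patterns" idea concrete, here is a minimal Python sketch of the same trick the 'file' command uses: compare the first bytes of a blob against a table of known signatures. The table below is a small illustrative sample, and the names SIGNATURES and identify are made up for this example; a real tool would carry far more entries.

```python
# Hypothetical, deliberately tiny signature table (well-known magic numbers).
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip data"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc)"),
    (b"-----BEGIN PGP MESSAGE-----", "ASCII-armored PGP message"),
]


def identify(data: bytes) -> str:
    """Return a label for the first signature that matches the start of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4 ..."))   # matches the PDF entry
    print(identify(b"sdjek dYqkP"))    # no header to recognise
```

This only helps when the container or the decrypted plaintext carries a recognisable header; raw ciphertext with no framing will simply come back as "unknown".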
Here's a partial answer:
(1) There is always the possibility that you simply won't. In fact, a properly used one-time pad cipher is indistinguishable from noise. It's also a major pain in the ass to use, because you must somehow transmit as many bits of key as you want to send bits of message, and your one-time pad is only as good as your method of transmitting the key.
(2) If there is some kind of message in the signal and a cipher is involved other than a one-time pad or something isomorphic to one, then there will be some degree of redundancy in it. This is a theorem of information theory. Statistical measures will eventually reveal that the redundancy exists.
(3) At that point, there are lots of approaches. A good, readable and interesting introduction to these, along with the history of such things, is David Kahn's The Codebreakers. Bruce Schneier's Applied Cryptography is a good, more technical introduction for the computer geek. I've also heard good things about the Handbook of Applied Cryptography, but I don't actually know the book.
But as someone notes above, it's an inherently hard problem to simply identify the cipher, and modern ciphers like RSA are, as far as we know, computationally intractable because the best known attacks require factoring a very large composite number (the product of two large primes).
(4) You give up and hire a pretty young woman to talk the marine guards into letting you at the code room. (Details of this approach are left as an exercise for the interested reader.) Sometimes the old fashioned ways are best.
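The "statistical measures" in point (2) can be sketched in a few lines of Python. Two common first-pass measures are Shannon entropy over bytes and the index of coincidence over letters. The function names here are made up for the example, and the reference values in the comments (about 0.067 for English text, about 0.038 for uniformly random letters) are rough textbook figures, not thresholds to rely on.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random from text are the same.

    Only A-Z is counted. English plaintext (and simple substitution
    ciphertext) sits near 0.067; uniformly random letters sit near 1/26.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    sample = "sdjek dYqkP 1Nt$% GGl9) MHrYD +++"
    print(shannon_entropy(sample.encode("ascii")))
    print(index_of_coincidence(sample))
```

With a message as short as the one in the question, neither number means much; these measures only start to separate "redundant" from "noise-like" once there is a reasonable amount of intercepted text.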
The only thing you can expect to learn is who's communicating with whom (and when, and how much information is exchanged; you probably know this already), and what protocol they're using (it's probably unbreakable).
Chances are, if you are intercepting an encrypted stream, you are intercepting an unbreakably encrypted stream.
Perhaps you are thinking that if only you knew what protocol the stream is using, you might look online and see if that protocol has been cracked.
Don't waste your time.
The chances are approximately 0 that the stream you are intercepting is using a protocol that has been cracked, or that it is using a keyspace you can brute-force for under a few hundred thousand dollars, or in under a matter of years.
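For a feel of the brute-force arithmetic, here is a back-of-the-envelope sketch in Python. The rate of one billion keys per second is purely an assumed figure for illustration, not a measurement of any real hardware, and years_to_exhaust is a name invented for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(key_bits: int, keys_per_second: float) -> float:
    """Years needed to try every key in a keyspace of 2**key_bits keys."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    assumed_rate = 1e9  # hypothetical: one billion trial decryptions per second
    for bits in (40, 56, 128):
        print(bits, "bits:", years_to_exhaust(bits, assumed_rate), "years")
```

At that assumed rate a 56-bit keyspace (DES) takes on the order of a couple of years to exhaust, while a 128-bit keyspace takes many orders of magnitude longer than anyone can wait. On average you would expect to hit the key about halfway through, which does not change the picture.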
Sorry -- you have a higher chance (almost infinitely higher -- as I said, the chance you will succeed in what you are asking to do is approximately 0) of port-scanning the machine at the source or the destination and 0wning it than you do of breaking the stream.
I don't say this to mean you should give up -- just that you're phrasing your question wrong. Don't discount the 0wning avenue of attack.
For every million desktop machines communicating over TCP/IP, only a matter of a few dozen will have 0 exploitable security weaknesses. (However, most security weaknesses are unknown.)
Find out what kind of machine is at the source and the destination, then 0wn one of them. Chances are almost overwhelming that it's possible, if not with a remote exploit, then through social engineering. (Send an attachment that will be opened on either end of the communication, or induce either end to visit a web page in a browser that is exploitable, which is basically every browser except Lynx.)
If they browse with Netscape or Internet Explorer, chances are almost overwhelming that they can be owned.
It's not that hard to get someone to browse to a certain page, if you know anything at all about who that person is.
Back to your original question: gone are the days that protocols were breakable by any hotshot think tank. Today only implementations are, and rarely at the level you're trying to address. Don't break the code -- break into the system.
Hope this helps.
Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine so all they had to do was work out the positions of the rotors using brain power, the Bombe or later the Colossus machine.
I think you're simplifying these first steps too much.
When the UK first intercepted a message like 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' , they had little idea what it was. Encryption? Only part of a transmission?
It took them months (years?) to work through the encryption system. In the beginning, they didn't even know there was an Enigma machine. They broke the code by brute force: many people trying many different methods. If the Nazis changed the key, you had to start all over from the beginning.
It took a long time for the codebreakers to figure out that there even was an Enigma machine, find a machine and figure out how it worked. It took time and effort, and people died retrieving the machine.
Once the codebreakers had worked with Enigma for some time, it sometimes became really simple to determine whether a transmission was Enigma (or other encryption) code.
If it wasn't cleartext, it was code. You knew certain things about the code: The key was transmitted first, the key was 6 characters long.
Some code had fingerprints: One guy always transmitted 'HIT#@!', where #@! = 'LER', and so you used "HITLER" to break the rest of the code. Someone else always used a German woman's name (maybe his girlfriend's) and sent "GRE$$!" where $$! = "TTA", so "GRETTA".
So take a step back: if you can't determine the nature of the code you are seeing, it will be very hard to crack.
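The "fingerprint" idea above is what codebreakers call a crib: a guessed piece of plaintext. One well-documented Enigma property (not mentioned in the comment above, so treat this as background) is that the machine never enciphered a letter to itself, so any alignment where a crib letter matches the ciphertext letter at the same position can be ruled out. A minimal Python sketch, with a made-up ciphertext and a hypothetical function name:

```python
def possible_crib_positions(ciphertext: str, crib: str) -> list[int]:
    """Return start offsets where the crib could line up with the ciphertext.

    An offset is rejected if any crib letter equals the ciphertext letter
    directly above it, since Enigma never mapped a letter to itself.
    """
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    positions = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        if all(c != p for c, p in zip(window, crib)):
            positions.append(start)
    return positions


if __name__ == "__main__":
    # Invented ciphertext, purely for illustration.
    print(possible_crib_positions("HXTLQRZZ", "HITLER"))
```

This only narrows down where the guessed word might sit; finding the actual rotor settings from a placed crib was the job of the Bombe.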
"Can of worms? The can is open... the worms are everywhere."
"what does this say?"
qv7qrc77qrrx777qrrrs7777qrrrrg77777qrrrrrbv7bqrbck bqqbvkbqrgkbqhskbqpckbqbqvmr
rcmhhgmjjbgmppyctbqbivayrpga7bbhjbqbxmawhhbqawwqx7 77kbqbrjhrbvaprkatrkhrca
aamwhmwhwhwhwhwhrapkqrmpkqc7bhbhwgawiiqwiqbv