Codebreaking - Taking the First Step?
Master Spy asks: "Here's something that the Slashdot community might be able to help with. If you receive a message in code, how do you take the first step? Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine, so all they had to do was work out the positions of the rotors using brain power and the Bombe (Colossus, built later, was aimed at the separate Lorenz cipher rather than Enigma). American codebreakers also knew the basic details of the methods the Japanese used. Now, however, things are more complicated. Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' How do you know how the message has been encrypted? It could be an Enigma machine, it could have been XOR'd with a second message or a one-time pad, or it could use some form of software encryption such as Blowfish or DES. Before you start ripping the message apart, how do codebreakers find out what method was used to encrypt it?"
Well, if this were easy, codebreaking wouldn't be any fun. Don't forget that both the Germans and the Japanese had a variety (tens if not hundreds) of different ciphers in circulation, so it wasn't exactly as simple as assuming it's Enigma or Purple. :)
As to how it's done, that comes down to analysing the text: frequency analysis of 1-grams, 2-grams, etc. Simple substitution will exhibit one fingerprint (though different languages will obviously differ), something like a Playfair or Vigenère square will have another, and DES-encrypted text a completely different structure. Obviously, on a small enough sample there may not be enough information to latch onto...
But with a larger sample, it's mostly a combination of good tools, experience, and guesswork.
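A minimal sketch of the 1-gram/2-gram counting described above, in Python (my choice of language; the function names are hypothetical). It only produces the profile; comparing that profile against known language statistics is where the experience comes in.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping n-grams: n=1 for single symbols, n=2 for pairs, etc."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def frequency_profile(text: str, n: int = 1, top: int = 10) -> list:
    """Return the `top` most common n-grams with their relative frequencies."""
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    return [(gram, count / total) for gram, count in counts.most_common(top)]


if __name__ == "__main__":
    # The intercepted message from the question, with the group spaces removed.
    ciphertext = "sdjekdYqkP1Nt$%GGl9)MHrYD+++"
    for n in (1, 2):
        print(n, frequency_profile(ciphertext, n, top=5))
```

On the 28-character message the profile is nearly flat (only "+" three times and "d", "k", "Y", "G" twice each repeat), which is exactly the small-sample problem mentioned above.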
Do the statistical analysis on the encrypted data, in several ways. If all you get is a seemingly random stream of data with an equal distribution of all values, then you've got a raw stream encrypted by a modern, quite strong cipher.
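One way to make "equal distribution of all values" measurable is sketched below, assuming the ciphertext is available as raw bytes. Shannon entropy and a chi-squared test against a flat histogram are my choice of tests, not ones the poster names, and the function names are hypothetical. The chi-squared figure is only meaningful with a reasonably large sample (on the order of a kilobyte or more).

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 8.0 means every byte value is equally likely."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def chi_squared_uniform(data: bytes) -> float:
    """Chi-squared statistic of the byte histogram against a flat distribution
    over all 256 values. Random data tends to land near 255 (the degrees of
    freedom); text or weakly enciphered text scores far higher."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


if __name__ == "__main__":
    sample_text = b"the quick brown fox jumps over the lazy dog " * 100
    random_bytes = os.urandom(len(sample_text))  # stand-in for strong-cipher output
    for label, data in (("text", sample_text), ("random", random_bytes)):
        print(label, round(shannon_entropy(data), 2), round(chi_squared_uniform(data)))
```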
;)
Good luck
Bastard Operator From 193.219.28.162
Step One: "Acquire more samples."
When you have less data than a smallish key (and that message has no more than 28 * 8 = 224 bits, probably much less), the data can (most likely) decrypt to anything at all with the proper key. If that's all you really have, then you need to pursue non-code-breaking methods of finding out what it is.
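To see why a short message can "decrypt to anything", here is a sketch that assumes, purely for illustration, the message was XORed with a one-time pad. For any plaintext of the right length there is a key that produces it, so the ciphertext alone cannot tell you which plaintext is real. The function names and the two guessed plaintexts are hypothetical.

```python
def key_for_plaintext(ciphertext: bytes, wanted_plaintext: bytes) -> bytes:
    """Return the XOR (one-time-pad) key that turns `ciphertext` into
    `wanted_plaintext`. Works for any plaintext of the same length."""
    if len(ciphertext) != len(wanted_plaintext):
        raise ValueError("lengths must match")
    return bytes(c ^ p for c, p in zip(ciphertext, wanted_plaintext))


def xor_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR each ciphertext byte with the matching key byte."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    intercepted = b"sdjekdYqkP1Nt$%GGl9)MHrYD+++"  # 28 bytes = 224 bits
    # Two equally "valid" readings of the same intercept (both 28 bytes long).
    for guess in (b"ATTACK AT DAWN FROM THE EAST", b"ALL QUIET ON THE WESTERN FRT"):
        key = key_for_plaintext(intercepted, guess)
        assert xor_decrypt(intercepted, key) == guess
        print(key.hex(), "->", guess.decode())
```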
And of course what to do next depends on the characteristics of that additional data. A lot of cryptanalysis assumes you know the encryption method; this is because it's "easy" to obtain by reading code, but "easy" is a relative term. It's easier than just guessing, but still hard. Without knowledge of the algorithm, you need to get lucky and hope they used one with a distinctive signature. If they didn't, you're probably out of luck with a single person's resources, because all of the "good" algorithms should be effectively indistinguishable from noise after encryption.
Code breaking is hard by its very nature. You're trying to find an unknown message by inverting (or short-circuiting) an unknown process.
If you think of things mathematically, you're looking for a plaintext p in the set of all possible plaintexts P and some function f from the set of all ciphertexts to the set of all plaintexts where f(c)=p. This means both f and p are unknown, and while multiple solutions may exist, they are likely of "measure zero" in two very large spaces. (Let's assume we have a suitable measure for such things, and not worry about the real details.)
To a mathematician, finding a general solution to the above would be a Fields Medal-winning sort of thing. The reality is that you need more information. If you have a large message, you should start by checking letter/symbol counts, followed by the counts of various character pairings, etc. The goal is often to come up with a statistical model to see if you can build a plausible f. Another approach is to try common functions (XOR with various values, etc.) on the stream and see what happens. Sometimes that'll give you a clue. But most of the time it involves a little luck, a little intuition and a lot of perseverance.
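"Try common functions (XOR with various values)" can be sketched as a brute force over all 256 single-byte XOR keys, keeping the output that looks most like English. This assumes the plaintext is English prose and the key is one repeated byte; the scoring rule is a crude hypothetical heuristic of my own, not a standard one.

```python
import string

# Crude heuristic: characters we expect in English prose, with extra credit
# for the most frequent letters and the space.
ENGLISH_CHARS = set(string.ascii_letters + " .,'")
COMMON_CHARS = set("etaoin shrdlu")


def score_english(candidate: bytes) -> float:
    """Average per-byte score; higher means more English-like."""
    score = 0.0
    for b in candidate:
        ch = chr(b)
        if ch.lower() in COMMON_CHARS:
            score += 2
        elif ch in ENGLISH_CHARS:
            score += 1
        elif ch not in string.printable:
            score -= 5  # control bytes are a strong sign of a wrong key
    return score / max(len(candidate), 1)


def break_single_byte_xor(ciphertext: bytes) -> tuple:
    """Try all 256 one-byte keys; return (best key, decrypted bytes)."""
    best = max(range(256), key=lambda k: score_english(bytes(c ^ k for c in ciphertext)))
    return best, bytes(c ^ best for c in ciphertext)


if __name__ == "__main__":
    secret = bytes(c ^ 0x5A for c in b"meet me at the usual place at ten")
    print(break_single_byte_xor(secret))
```

The same "try everything cheap, score the output" loop is how many quick checks are automated; it fails quietly on anything stronger, which is itself a clue.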
1. Get some books. Schneier's book is the best starting point.
... move up the scale until you can understand the way more sophisticated codes are broken (for instance, differential cryptanalysis). It gets harder at this point and well outside the realm of practicality, but if you get this far, you will be able to break any cryptographically weak cipher (which includes the products of many companies, unfortunately).
2. Learn statistics (and basic number theory). You can discover a lot about a message by its statistical properties.
3. BREAK LOTS OF CODES. Without experience, you are lost. Start by breaking substitution and Caesar ciphers (easy with statistics), then Vigenère/Gronsfeld ciphers (harder but still "crypto for dummies"), then try XOR ciphers (they can be solved easily in an interesting way)... then try to understand how WEP was broken, and how DeCSS defeats CSS.
4. If you become advanced enough, you can start reading papers on cryptanalysis. Many of them are surprisingly easy to understand once you know number theory. However, it is much more difficult to *discover* some of the stuff these guys come up with; it's pretty amazing.
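As a first exercise of the kind step 3 suggests, here is a sketch of breaking a Caesar cipher "with statistics": try all 26 shifts and keep the one whose letter counts best fit English, measured by a chi-squared statistic. The frequency table holds approximate, commonly quoted English letter percentages, and the function names are hypothetical; on very short or unusual texts the statistical guess can be wrong.

```python
import string

# Approximate relative frequencies (%) of letters in English text, A..Z.
ENGLISH_FREQ = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]


def shift_text(text: str, shift: int) -> str:
    """Caesar-shift the ASCII letters of `text`, leaving everything else alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared_english(text: str) -> float:
    """Lower means the letter distribution looks more like English."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    if not letters:
        return float("inf")
    n = len(letters)
    counts = [letters.count(c) for c in string.ascii_uppercase]
    return sum(
        (obs - n * pct / 100) ** 2 / (n * pct / 100)
        for obs, pct in zip(counts, ENGLISH_FREQ)
    )


def break_caesar(ciphertext: str) -> tuple:
    """Try all 26 shifts; return (shift used to encrypt, recovered plaintext)."""
    best = min(range(26), key=lambda s: chi_squared_english(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)


if __name__ == "__main__":
    secret = shift_text("the enemy will attack the northern bridge at dawn", 3)
    print(secret)
    print(break_caesar(secret))
```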
Anyway, to summarize: understand the statistics involved and PRACTICE until you can just look at a substitution cipher and understand what it says... just by the letter frequencies! If you are trying to break a simple code, you need lots of ciphertext to analyze.
And don't forget: sometimes you don't need to break a code at all. As a poster above wrote, sometimes context is enough. Sometimes an external clue will give the code away. How do you know what to look for? Experience!
If you are interested, I would suggest that you start by reading The Code Book by Simon Singh. It gives a good overview of the history of the battle between cryptography and cryptanalysis, and how ciphers have evolved to defeat methods of codebreaking. It's an interesting and entertaining read and you might gain some insight on how you would approach this particular cipher.
BTW, I have a truly marvellous solution to your cipher which this textarea is too small to contain.
You start with known encryption methods (simplest first) and by process of elimination you keep going until you get a clue. A good cryptologist has information about everything from Bacon's method to the most recent ciphers and their algorithms.
The folks who cracked Enigma started the same way. The Poles started the process and sent their information to England, where the work was completed.
A code fragment that short, though, would be darned impossible to crack unless you get more.
But you can already see patterns: word length, multiple "+" characters (maybe an indicator of end-of-phrase or something?).
But that's, basically, how you do it: educated guesses and grunt work (either by you or by a computer). Unless it's quantum encryption, which is spoiled as soon as you intercept it, so you can't decode it.
Check out The Code Book for some great, albeit basic, information about methodology and history.
A book titled 'System Identification And Key-Clustering', by Dr. I. J. Kumar, is available from Aegean Park Press. It deals with defining a methodology for identifying cryptosystems and narrowing the key space applicable to a given message. This is exactly what you want, but be warned: it is not for the faint of heart...
- Try to get more text coded with the same cryptosystem (and preferably the same key). Cracking anything based on 25 bytes of ciphertext is going to be hard.
- Look for statistics. Run character statistics. Do they look like normal text, only with different symbols? If so, you have a monoalphabetic substitution cipher, crackable in 5 seconds by a computer or 5 minutes by hand. Repeat for digraphs and trigraphs. Any result different from "all combinations equally likely" (or close) gives you a hint.
- Try XORing the text with a copy of itself shifted various places left and right. Observe how many nulls you get at various displacements. If you get a jump in nulls for a certain shift, you're likely dealing with a periodic substitution cipher. Again, it's easily crackable if the period is not too long and you have enough ciphertext. (Enough here is something like 20 times the period, so if the period is 50 you'd need roughly a kilobyte of ciphertext to attack it easily.)
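The two statistical checks in the list above can be sketched in Python as follows (function names are hypothetical). The index of coincidence is my stand-in for "does it look like normal text with different symbols?", not something the poster names: a monoalphabetic substitution only relabels letters, so it keeps the value near the English level (about 0.066) instead of the roughly 0.038 of random letters. `shifted_xor_zeros` is a direct version of the shift-and-XOR test, since a null byte at displacement d just means the two bytes were equal.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters picked at random from the text are equal."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def shifted_xor_zeros(data: bytes, max_shift: int) -> dict:
    """For each displacement d, XOR the data with itself shifted by d and
    count the zero bytes, i.e. positions where data[i] == data[i + d]."""
    return {
        d: sum(1 for a, b in zip(data, data[d:]) if a ^ b == 0)
        for d in range(1, max_shift + 1)
    }


if __name__ == "__main__":
    plaintext = (b"it was the best of times it was the worst of times "
                 b"it was the age of wisdom it was the age of foolishness ") * 20
    key = b"LEMON"  # hypothetical repeating key with period 5
    ciphertext = bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))
    zeros = shifted_xor_zeros(ciphertext, 12)
    # With enough ciphertext, multiples of the period should float to the top.
    print(sorted(zeros, key=zeros.get, reverse=True)[:3])
```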
If the text looks completely random under every statistical analysis you can think of, and stays that way even when XORed with itself shifted various ways, odds are you're dealing with something a bit more serious, and you'll need more expertise than you can gain from an "Ask Slashdot" article to crack it. Good luck!