Codebreaking - Taking the First Step?
Master Spy asks: "Here's something that the Slashdot community might be able to help with. If you receive a message in code how do you take the first step? Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine so all they had to do was work out the positions of the rotors using brain power, the Bombe or later the Colossus machine. American codebreakers also knew the basic details about the methods the Japanese used, but now things are more complicated. Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' How do you know how the message has been encrypted? It could be an Enigma machine, it could have been XOR'd with a second message or a one-time pad or it could use some form of software encryption such as Blowfish or DES. Before you start ripping the message apart for decoding how do codebreakers find out what method has been used to encode the message?"
Let the medium decide how to approach this.
I.e., if it's over the net, look at the wrappers.
If it's over the radio, look at the spectrum.
It's what's around the message that will break the message.
Sigs are dangerous coy things
Well, if this was easy, codebreaking wouldn't be any fun. Don't forget that both the Germans and the Japanese had a variety (tens if not hundreds) of different cyphers in circulation, so it wasn't exactly as simple as assuming it's Enigma or Code Purple. :)
As to how it's done, that has to do with analysing the text: frequency analysis of 1-grams, 2-grams, etc. Simple substitution will exhibit one fingerprint (though different languages will obviously differ), something like a Playfair or Vigenère square will have another, and DES-encrypted text a completely different structure. Obviously, on a small enough sample there may not be enough information to latch onto...
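A minimal Python sketch of the 1-gram/2-gram counting mentioned above, assuming only letters matter (case and non-letters are ignored); the function names are made up for illustration:

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping n-grams of letters, ignoring case and non-letters."""
    letters = [c for c in text.upper() if c.isalpha()]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))


def top_ngrams(text: str, n: int, k: int = 5) -> list:
    """Return the k most common n-grams with their relative frequencies."""
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    return [(gram, count / total) for gram, count in counts.most_common(k)]


if __name__ == "__main__":
    sample = "ATTACK AT DAWN ATTACK AT DUSK"
    print(top_ngrams(sample, 1))  # single-letter frequencies
    print(top_ngrams(sample, 2))  # digraph frequencies
```

Comparing these tables against known letter and digraph frequencies for the suspected language is the fingerprinting step; a flat table hints that something stronger than simple substitution is in play.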
But with a larger sample, it's mostly a combination of good tools, experience, and guesswork.
The better you know what's out there to use, the better the chance of recognizing what you're up against.
Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++'
Yeah, 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' showed up on my SETI@home screen too.
This is clearly the signature of the Grays from Cygnus Prime. You don't want to communicate with them.
The Grays of Cygnus Prime are evil. They will put chips in your head.
They will use the chips to make you do bad things. Like posting to Slashdot.
Opinions on the Twiddler2 hand-held keyboard?
Do the statistical analysis on the encrypted data, in several ways. If all you get is a seemingly random stream of data with an equal distribution of all values, then you've got a raw stream encrypted by a modern, quite strong cipher.
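One such statistical check, sketched in Python: the empirical Shannon entropy of the byte distribution, which approaches 8 bits/byte when all byte values are equally likely. The function name is hypothetical. High entropy is necessary but not sufficient (compressed data also scores near 8), and a sample of n bytes can never score above log2(n), so a 28-byte sample tops out around 4.8 bits.

```python
import math
import os
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 50
    random_bytes = os.urandom(4096)
    print(f"English text: {shannon_entropy_bits_per_byte(english):.2f} bits/byte")
    print(f"Random bytes: {shannon_entropy_bits_per_byte(random_bytes):.2f} bits/byte")
```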
;)
Good luck
Bastard Operator From 193.219.28.162
Step One: "Acquire more samples."
When you have less data than a smallish key (and that message has no more than 28 * 8 = 224 bits, probably much less), the data can (most likely) decrypt to anything at all with the proper key. If that's all you really have, then you need to pursue non-codebreaking methods of finding out what it is.
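The idea that too little ciphertext can decrypt to anything is formalized by Shannon's unicity distance, U = H(K)/D: the key entropy in bits divided by the plaintext's redundancy per character. A rough Python sketch; the redundancy figures assume an English information rate of about 1.5 bits per character, a commonly quoted estimate rather than a measured constant, and the names are illustrative:

```python
import math


def unicity_distance(key_bits: float, redundancy_bits_per_char: float) -> float:
    """Shannon's estimate U = H(K) / D: roughly how many ciphertext characters
    are needed before only one key yields meaningful plaintext."""
    if redundancy_bits_per_char <= 0:
        return math.inf  # no redundancy (e.g. random plaintext): never unique
    return key_bits / redundancy_bits_per_char


# Assumed figures (rough estimates, not measurements):
ENGLISH_RATE = 1.5                                 # bits of information per character
ASCII_REDUNDANCY = 8 - ENGLISH_RATE                # English stored as 8-bit bytes
LETTER_REDUNDANCY = math.log2(26) - ENGLISH_RATE   # letters only, A-Z

if __name__ == "__main__":
    for name, key_bits in [("56-bit key", 56), ("128-bit key", 128)]:
        u = unicity_distance(key_bits, ASCII_REDUNDANCY)
        print(f"{name}: about {u:.1f} ASCII characters of ciphertext")
    # Simple substitution: 26! possible keys, so H(K) = log2(26!), about 88 bits.
    u = unicity_distance(math.log2(math.factorial(26)), LETTER_REDUNDANCY)
    print(f"simple substitution: about {u:.1f} letters")
```

Under those assumptions even a simple substitution cipher needs roughly 28 letters before a unique solution is expected, which is about the length of the sample in the question. U is only a theoretical threshold, not the amount of text a practical break requires.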
And of course what to do next depends on the characteristics of that additional data. A lot of cryptanalysis assumes you have knowledge of the encryption method; this is because it's "easy" to obtain by reading code, but "easy" is a relative term. It's easier than just guessing, but still hard. Without knowledge of the algorithm, you need to luck out and hope they used one with a distinct signature. If they didn't, you're probably basically out of luck on a single person's resources, because all of the "good" algorithms should be effectively indistinguishable from noise after encryption.
"Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++'"
I transmit a polite reply saying, "No, I am NOT interested in being your love monkey no matter how much you lust after me." Gee whiz! The things people say when they think nobody is listening!
Code breaking is hard by its very nature. You're trying to find an unknown message by inverting (or short-circuiting) an unknown process.
If you think of things mathematically, you're looking to find a plaintext p in the set of all possible plaintexts P and some function f from the set of all ciphertexts C to P where f(c)=p. This means both f and p are unknown, and while multiple solutions may exist, they are likely of "measure zero" in two very large spaces. (Let's assume we have a suitable measure for such things, and not worry about the real details.)
To a mathematician, finding a general solution to the above would be a Fields Medal-winning sort of thing. The reality is that you need more information. If you get a large message, you should start checking letter/symbol counts, followed by the counts of various character pairings, etc. The goal is often to come up with a statistical model to see if you can build a plausible f. Another thing is to try common functions (XOR with various values, etc.) on the stream and see what happens. Sometimes that'll give you a clue. But most of the time it involves a little luck, a little intuition and a lot of perseverance.
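As a sketch of the "try common functions" idea: brute-forcing a single-byte XOR key in Python and ranking all 256 candidates with a crude English-likeness score. The scoring heuristic (`COMMON`, `english_score`) is an assumption made up for this example, not a standard measure.

```python
def xor_single_byte(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


# Crude English-likeness heuristic (an assumption, not a standard measure):
# frequent letters and spaces score 2, other printable ASCII scores 1,
# anything non-printable costs 5.
COMMON = set(b"etaoin shrdluETAOINSHRDLU")


def english_score(candidate: bytes) -> int:
    score = 0
    for b in candidate:
        if b in COMMON:
            score += 2
        elif 32 <= b < 127:
            score += 1
        else:
            score -= 5
    return score


def brute_force_single_byte_xor(ciphertext: bytes, top: int = 3) -> list:
    """Try all 256 keys; return the best (score, key, plaintext) tuples."""
    results = []
    for key in range(256):
        plain = xor_single_byte(ciphertext, key)
        results.append((english_score(plain), key, plain))
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    secret = xor_single_byte(b"meet me at the usual place at noon", 0x5A)
    for score, key, plain in brute_force_single_byte_xor(secret):
        print(score, hex(key), plain)
```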
1. Get some books. Schneier's book is the best starting point.
.. move up the scale until you can understand the way more sophisticated codes are broken (for instance differential cryptanalysis). It gets harder at this point and well outside the realm of practicality but if you get this far, you will be able to break any cryptographically weak cipher (which includes the products of many companies, unfortunately).
2. Learn statistics (and basic number theory). You can discover a lot about a message by its statistical properties.
3. BREAK LOTS OF CODES. Without experience, you are lost. Start by breaking substitution and Caesar ciphers (easy with statistics), then Vigenere/Gronsfeld ciphers (harder but still "crypto for dummies"), then try XOR ciphers (they can be solved easily in an interesting way)... then try to understand how WEP is broken... DeCSS
4. If you become advanced enough, you can start reading papers on cryptanalysis. Many of them are surprisingly easy to understand once you understand number theory. However, it is much more difficult to *discover* some of the stuff these guys come up with, it's pretty amazing.
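For the Vigenère/Gronsfeld step in item 3, a standard statistical tool is Friedman's index of coincidence (IoC): English text scores roughly 0.066, while uniformly random letters give about 1/26 ≈ 0.038. A Python sketch with illustrative function names that splits the ciphertext into columns for each candidate key length and averages their IoC:

```python
from collections import Counter


def only_letters(text: str) -> str:
    return "".join(c for c in text.upper() if "A" <= c <= "Z")


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random (without replacement) match."""
    letters = only_letters(text)
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def guess_period(ciphertext: str, max_period: int = 12) -> list:
    """(average column IoC, period) for each candidate key length, best first."""
    letters = only_letters(ciphertext)
    scores = []
    for period in range(1, max_period + 1):
        columns = [letters[i::period] for i in range(period)]
        avg = sum(index_of_coincidence(col) for col in columns) / period
        scores.append((avg, period))
    return sorted(scores, reverse=True)


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Letters-only Vigenère encryption, used here to make test data."""
    shifts = [ord(k) - ord("A") for k in only_letters(key)]
    letters = only_letters(plaintext)
    return "".join(
        chr((ord(c) - ord("A") + shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, c in enumerate(letters)
    )


if __name__ == "__main__":
    ct = vigenere_encrypt("DEFEND THE EAST WALL OF THE CASTLE AT DAWN", "LEMON")
    for avg, period in guess_period(ct, max_period=8)[:3]:
        print(period, round(avg, 3))
```

Multiples of the true key length score about as well as the key length itself, so the smallest high-scoring period is usually the one to try first; once the period is known, each column is just a Caesar cipher. Short texts like the demo sentence give noisy scores.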
Anyway, to summarize: understand the statistics involved and PRACTICE until you can just look at a substitution cipher and understand what it says... just by the letter frequencies! If you are trying to break a simple code, you need lots of ciphertext to analyze.
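Reading a Caesar shift "just by the letter frequencies" can be automated with a chi-squared comparison against expected English frequencies. A Python sketch; the frequency table is an approximate, commonly published one (exact values vary by corpus) and the helper names are illustrative:

```python
# Approximate English letter frequencies in percent, A..Z.
ENGLISH_FREQ = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074,
]


def shift_text(text: str, shift: int) -> str:
    """Caesar-shift A-Z by `shift` places (uppercasing); keep other characters."""
    out = []
    for c in text.upper():
        if "A" <= c <= "Z":
            out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(c)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Chi-squared distance between the letter counts of text and English."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n == 0:
        return float("inf")
    observed = [0] * 26
    for c in letters:
        observed[ord(c) - ord("A")] += 1
    total = sum(ENGLISH_FREQ)
    chi = 0.0
    for obs, pct in zip(observed, ENGLISH_FREQ):
        expected = n * pct / total
        chi += (obs - expected) ** 2 / expected
    return chi


def crack_caesar(ciphertext: str) -> tuple:
    """Return (shift, plaintext) for the shift whose decryption looks most English."""
    best = min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)


if __name__ == "__main__":
    secret = shift_text("DEFEND THE EAST WALL OF THE CASTLE", 3)
    print(crack_caesar(secret))
```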
And don't forget: sometimes you don't need to break a code at all. As a poster above wrote, sometimes context is enough. Sometimes an external clue will give the code away. How do you know what to look for? Experience!
If you are interested, I would suggest that you start by reading The Code Book by Simon Singh. It gives a good overview of the history of the battle between cryptography and cryptanalysis, and how ciphers have evolved to defeat methods of codebreaking. It's an interesting and entertaining read and you might gain some insight on how you would approach this particular cipher.
BTW, I have a truly marvellous solution to your cipher which this textarea is too small to contain.
It also helps if you have a basic idea of what's encrypted, i.e. what kind of plaintext message you're dealing with. A .doc has a different signature than a jpeg or flat ascii or html, etc. Some encryption software relies on headers or footers attached to the encrypted data in order to sanity-check decryption. Again, also look at the medium the message is transmitted through -- tcp/ip traffic to port 443 speaks volumes about what algorithms are being used (it's almost certainly SSL/TLS). Transmissions in the 2.4 GHz band also speak volumes about what algorithms you may be dealing with. Finally, never trust the developer to do the 'right thing' with algorithm selection -- look at Adobe's algorithm selection for its ebooks. Look for a pattern in what you're dealing with. It can't hurt to build a dictionary of known ciphertext file patterns a la the *nix 'file' command. Lacking a certain amount of information about what you're dealing with (message length, source of the ciphertext, etc.), though, you're SOL.
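A minimal sketch of the "dictionary of known file patterns, a la 'file'" idea in Python. The signatures listed are well-known leading bytes (for example, `openssl enc` writes the ASCII string `Salted__` before the salt when salting is used); the table is deliberately tiny and the function name is hypothetical. A match tells you about the wrapper, not about the cipher inside it.

```python
# Leading "magic" bytes for a few well-known formats.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive (also .docx/.xlsx/.jar)",
    b"\x1f\x8b": "gzip-compressed data",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy .doc/.xls)",
    b"Salted__": "OpenSSL enc output (salted)",
}


def identify(data: bytes) -> str:
    """Return a description of the first matching signature, longest first."""
    # Longer signatures are checked first so a short prefix cannot shadow them.
    for magic in sorted(SIGNATURES, key=len, reverse=True):
        if data.startswith(magic):
            return SIGNATURES[magic]
    return "unknown (possibly raw ciphertext or a headerless format)"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4 ..."))
    print(identify(bytes.fromhex("d0cf11e0a1b11ae1") + b"rest of a .doc"))
    print(identify(b"sdjek dYqkP 1Nt$% GGl9) MHrYD +++"))
```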
Anyway, I haven't had to deal with much of the kind of encryption that protects data from a government, mostly just the kind that delays your kid sister, so YMMV...
Take a bunch of encoded stuff and simply look at it; watch for patterns over the course of the data as a whole. For a small sample set like
'sdjek dYqkP 1Nt$% GGl9) MHrYD +++'
this is not going to do you much good :p, but if you have reams of encoded/encrypted data, just stare at it for a while, looking through it rather than at it (like those hidden-picture things), and after a while you will recognize patterns and have something to work with.
There is a fine line between the high-quality software engineer and mild autism. Ever watch 'Rain Man' or 'A Beautiful Mind' and think, hey, that guy would be a BAD ASS developer?
Helps to be able to think in 6+ dimensions when you are cracking codes, and a photographic memory helps too.
I should probably post this as an AC - last thing I need is the CIA / NSA figuring out what I am capable of
Glonoinha the MebiByte Slayer
First you djc,s dk%33R +++ (110), then you sD##N KDL:: Ds03k -332+. From there, it's a trivial matter of just 3!Wop mclDI a002g a!22# with the sklj3 V3iia aq@@1 +1867 -5309.
Duh.
You start with known encryption methods (simplest first) and by process of elimination you keep going until you get a clue. A good cryptologist has information about everything from Bacon's method to the most recent ciphers and their algorithms.
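A toy version of "simplest first, then eliminate" in Python: run a handful of trivial transforms and keep only those whose output looks like English. The candidate list and the common-word test are assumptions chosen for illustration; a real workflow would use far better language models and many more methods.

```python
import base64
import codecs


def atbash(text: str) -> str:
    """Mirror the alphabet (A<->Z, B<->Y, ...), keeping case."""
    out = []
    for c in text:
        if "a" <= c <= "z":
            out.append(chr(ord("z") - (ord(c) - ord("a"))))
        elif "A" <= c <= "Z":
            out.append(chr(ord("Z") - (ord(c) - ord("A"))))
        else:
            out.append(c)
    return "".join(out)


def try_base64(text: str) -> str:
    """Strict base64 decode to ASCII text, or "" if that fails."""
    try:
        return base64.b64decode(text, validate=True).decode("ascii")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return ""


CANDIDATES = [
    ("reversed", lambda t: t[::-1]),
    ("rot13", lambda t: codecs.encode(t, "rot13")),
    ("atbash", atbash),
    ("base64", try_base64),
]

# Hypothetical "looks like English" test: share of a few very common words.
COMMON_WORDS = {"the", "and", "of", "to", "at", "is", "in", "a"}


def looks_like_english(text: str) -> bool:
    words = text.lower().split()
    if not words:
        return False
    hits = sum(1 for w in words if w.strip(".,!?") in COMMON_WORDS)
    return hits / len(words) >= 0.2


def eliminate(ciphertext: str) -> list:
    """Names of the simple transforms whose output looks like English."""
    return [name for name, fn in CANDIDATES if looks_like_english(fn(ciphertext))]


if __name__ == "__main__":
    print(eliminate(codecs.encode("meet me at the gate at noon", "rot13")))
```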
The folks who cracked Enigma started the same way. The Poles started the process and sent their information to England, where it was completed.
A code fragment that short, though, would be darned impossible to crack unless you get more.
But you can already see patterns: word length, multiple "+" characters (maybe an indicator of end-of-phrase or something?).
But that's -- basically -- how you do it. Educated guesses and grunt work (either by you or by a computer). Unless it's quantum encryption, which is spoiled as soon as you intercept it, so you can't decode it.
Check out The Code Book for some great -- albeit basic -- information about methodology and history.
If you are an American, even posting this is probably a violation of some four-letter-acronym or terrorism-prevention law. I mean, why not just ask how to make a model rocket, why don't ya?
Stop thinking about the encrypted bits. Start thinking about who sent these bits and who these bits were sent to. Think about the application which created the data. Think about what purpose the data is going to be used.
Once you have this information, you'll be much better equipped to figure out what the basic structure underpinning the cipher is. For instance, if the data is part of a realtime encrypted stream, I'd think "stream cipher" and look at RC4 or SEAL. If the data's part of a pen-and-paper arrangement with all values mod 26, I'd think "Solitaire". If the data's a pen-and-paper arrangement meant for communicating between two deep-cover espionage agents, I'd think "one-time pad". If the data's something pulled off a disk drive, I'd think of Matt Blaze's ECB+OFB algorithm. Etc.
What it boils down to is, this question is pretty arbitrary. Very rarely will you have no metainformation about the plaintext. Seek out as much metainformation as you can, and use the metainformation to make educated guesses, cribs, etc.
Here's a partial answer:
(1) There is always the possibility that you simply won't. In fact, a properly used one-time pad cipher is indistinguishable from noise. It's also a major pain in the ass to use, because you must somehow transmit as many bits of key as you want to send bits of message, and your one-time pad is only as good as your method of transmitting the key.
(2) If there is some kind of message in the signal and a cipher is involved other than a one-time pad or something isomorphic to one, then there will be some degree of redundancy in it. This is a theorem of information theory. Statistical measures will eventually reveal that the redundancy exists.
(3) At that point, there are lots of approaches. A good readable and interesting introduction to these, along with the history of such things, is David Kahn's The Codebreakers. Bruce Schneier's Applied Cryptography is a good, more technical introduction for the computer geek. I've also heard good things about the Handbook of Applied Cryptography, but I don't actually know the book.
But as someone notes above, it's an inherently hard problem to simply identify the cipher, and modern ciphers like RSA are, as far as we know, computationally intractable because the best known attack requires factoring a very large number that is the product of two large primes.
(4) You give up and hire a pretty young woman to talk the marine guards into letting you at the code room. (Details of this approach are left as an exercise for the interested reader.) Sometimes the old fashioned ways are best.
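To make the factoring remark in (3) concrete: RSA's security rests on the difficulty of factoring the public modulus n, which is the product of two large primes. With a toy-sized n, trial division recovers the primes and therefore the private key. A Python sketch using the well-known textbook parameters p = 61, q = 53, e = 17 (the function names are illustrative; `pow(e, -1, phi)` needs Python 3.8 or later):

```python
def factor_by_trial_division(n: int) -> tuple:
    """Find a nontrivial factor pair of n. Only feasible for small n."""
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("n is prime or 1")


def recover_private_exponent(n: int, e: int) -> int:
    """Break toy RSA: factor the public modulus, then invert e mod phi(n)."""
    p, q = factor_by_trial_division(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse of e


if __name__ == "__main__":
    # Toy key from two small primes; real RSA moduli are 2048 bits or more.
    p, q, e = 61, 53, 17
    n = p * q                       # public modulus, 3233
    message = 65
    ciphertext = pow(message, e, n)
    d = recover_private_exponent(n, e)
    print(pow(ciphertext, d, n))    # prints 65
```

The same attack on a 2048-bit modulus is hopeless with trial division, which is exactly the point: the math holds up, so real attacks usually go after the implementation instead.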
If you receive a message in code how do you take the first step?
You do what everyone else here does when they come across a problem that may or may not fall under the category "News for Nerds. Stuff that Matters": you submit it to Ask Slashdot, of course! Don't worry: they'll print it. They'll print anything and it doesn't even have to be in the form of a question!
GMD
watch this
The only, only thing you can expect to learn is who's communicating with whom [and when / how much information is exchanged] (you probably know this already), and what protocol they're using (it's probably unbreakable).
Chances are, if you are intercepting an encrypted stream, you are intercepting an unbreakably encrypted stream.
Perhaps you are thinking that if only you knew what protocol the stream is using, you might look online and see if that protocol has been cracked.
Don't waste your time.
The chances are approximately 0 that the stream you are intercepting is using a protocol that has been cracked, or that it is using a keyspace you can brute-force for under a few hundred thousand dollars, or in under a matter of years.
Sorry -- you have a higher chance (almost infinitely higher -- as I said, the chance you will succeed in what you are asking to do is approximately 0) of port-scanning the machine at the source or the destination and 0wning it than you do of breaking the stream.
I don't say this to mean you should give up -- just that you're phrasing your question wrong. Don't discount the 0wning venue of attack.
For every million desktop machines communicating over TCP/IP, only a matter of a few dozen will have 0 exploitable security weaknesses. (However, most security weaknesses are unknown.)
Find out what kind of machine is at the source and the destination, then 0wn one of them. Chances are almost overwhelming that it's possible, if not with a remote exploit, then through social engineering: send an attachment that will be opened on either end of the communication, or induce either end to visit a web page in an exploitable browser (= basically every browser except Lynx).
If they browse with Netscape or Internet Explorer, chances are almost overwhelming that they can be owned.
It's not that hard to get someone to browse to a certain page, if you know anything at all about who that person is.
Back to your original question: gone are the days that protocols were breakable by any hotshot think tank. Today only implementations are, and rarely at the level you're trying to address. Don't break the code -- break into the system.
Hope this helps.
You social-engineer the NSA or another TLA with teraflop codebreaking capability into helping you crack the message. For example, consider the following method used by an Idahoan to get his potato field plowed:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
An old man lived alone in southern Idaho. It was early spring, and he wanted to spade a garden plot to prepare it for planting potatoes. But it was very hard work and he just didn't have the energy.
You see, his only son, who would have helped him, was in prison.
The old man wrote a letter to his son and mentioned his predicament.
A week later, he received a note back, which said, "For heaven's sake, Dad, whatever you do, don't dig up that section of the garden! That's where I buried the GUNS!"
The next morning, bright and early, a dozen police showed up and dug up the entire garden, without finding any guns.
Looking out the kitchen window, the old man thought "Now, what in the world is going on here?" Confused, he wrote another letter to his son telling him what happened, and asked him for advice.
Another week passed and his son's reply arrived in the mailbox. The old man carried the letter up to the house, sat down at the kitchen table and read, "Now plant your potatoes, Dad. It's the best I could do for you under the circumstances."
NO CARRIER
Damn line noise...
Good old memories!
You're all bastards!
Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine so all they had to do was work out the positions of the rotors using brain power, the Bombe or later the Colossus machine.
I think you're simplifying these first steps too much.
When the UK first intercepted a message like 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' , they had little idea what it was. Encryption? Only part of a transmission?
It took them months (years?) to work through the encryption system. In the beginning, they didn't even know there was an Enigma machine. They broke the code by brute force: many people trying many different methods. If the Nazis changed the key, you had to start all over from the beginning.
It took a long time for the codebreakers to figure out that there even was an Enigma machine, find a machine and figure out how it worked. It took time and effort; people died retrieving the machine.
Once the codebreakers had been working on Enigma for a while, it sometimes became really simple to determine whether a transmission was Enigma (or other encryption) code.
If it wasn't cleartext, it was code. You knew certain things about the code: the key was transmitted first, and the key was 6 characters long.
Some code had fingerprints: one guy always transmitted 'HIT#@!', where #@! = 'LER', and so you used "HITLER" to break the rest of the code. Someone else always used a German woman's name (maybe his girlfriend's), "GRE$$!", where $$! = "TTA", so "GRETTA".
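Cribs like these are only useful once you know where they sit in the ciphertext. One well-documented trick for Enigma exploits the fact that the machine never enciphered a letter to itself: slide the crib along the ciphertext and discard any alignment where a crib letter coincides with the ciphertext letter at the same position. A Python sketch; the function name is illustrative and the ciphertext in the usage line is made up:

```python
def possible_crib_positions(ciphertext: str, crib: str) -> list:
    """Offsets where `crib` could align with `ciphertext`, using the rule
    that Enigma never enciphers a letter to itself."""
    ct = ciphertext.upper()
    crib = crib.upper()
    positions = []
    for offset in range(len(ct) - len(crib) + 1):
        window = ct[offset:offset + len(crib)]
        # Any position where crib letter == ciphertext letter rules this offset out.
        if all(c != p for c, p in zip(window, crib)):
            positions.append(offset)
    return positions


if __name__ == "__main__":
    sample = "XQZRTBNMPLKHGFDSAWE"  # made-up ciphertext, for illustration only
    print(possible_crib_positions(sample, "HITLER"))
```

Each surviving offset is a candidate alignment that then has to be tested against possible machine settings; the elimination step just shrinks that search.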
So take a step back: if you can't determine the nature of the code you are seeing, it will be very hard to crack.
"Can of worms? The can is open... the worms are everywhere."
Thanks Cliff, now everyone knows Osama's slashdot nick.
HURD - Hurd's Under Research & Development
this is obviously perl code
'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' decrypts to,
'I send you this file in order to have your advice'
Beta sux! Join the Slashcott! http://hardware.slashdot.org/comments.pl?sid=4760465&cid=46173047
sdjek dYqkP 1Nt$% GGl9) MHrYD +++
Two things give it away:
The spaces are too regular. You'd be quite hard pressed to form a coherent sentence with a space falling after every fifth character.
So then perhaps the spaces are irrelevant. Then the next questionable aspect is the last three +++'s. Now, if your code didn't at least work in groups of three, the mathematical likelihood of three +'s occurring in a row would be small.
So then, what would make the most sense is some kind of consistent bit manipulation, at least in cycles of three characters. But the double GG and the unique characters (%$) make that unlikely too.
So what makes the most sense? Just random typing.
Look at the first set of characters:
sdjek
Just type it a few times... It's quite natural. You might as well have used asdf (I bet your typing style isn't perfect... you probably favor your right hand).
If you examine each other character grouping, you'll see that none of them are very hard to reach.
Also, it gets the KIS approval which in most circumstances, is the winning vote.
int func(int a);
func((b += 3, b));
Now, Pinky, with this new encryption scheme that deliberately resembles random typing, we shall take over the world!
A book titled 'System Identification And Key-Clustering', by Dr. I. J. Kumar is available from Aegean Park Press. It deals with defining a methodology for identifying cryptosystems and narrowing the key space applicable for a given message. This is just what you want, but be warned - it is not for the faint of heart...
- Try to get more text coded with the same cryptosystem (and preferably the same key). Cracking anything based on 25 bytes of ciphertext is going to be hard.
- Look for statistics. Run character statistics. Do they look like normal text, only with different symbols? If so, you have a monoalphabetic substitution cipher, crackable in 5 seconds by a computer or 5 minutes by hand. Repeat for digraphs or trigraphs. Any result different from "all combinations equally likely" (or close) gives you a hint.
- Try to XOR the text with a copy of itself shifted various places left and right. Observe how many nulls you get with various displacements. If you get a jump in nulls for a certain shift, you're likely dealing with a periodic substitution cipher. Again, easily crackable if the period is not too long and you have enough ciphertext. (Enough here is something like 20 times the period. So if the period is 50 you'd need a kilobyte of ciphertext to easily attack it, more or less.)
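The shifted-XOR test is essentially an autocorrelation count. A Python sketch (the function names are hypothetical) that reports, for each shift, the fraction of positions where the ciphertext XORed with its shifted copy gives a zero byte, plus a repeating-key XOR helper to generate test data:

```python
def coincidences_by_shift(data: bytes, max_shift: int = 20) -> dict:
    """For each shift s, the fraction of positions i where
    data[i] XOR data[i + s] == 0, i.e. where the two bytes are equal."""
    rates = {}
    for shift in range(1, max_shift + 1):
        pairs = len(data) - shift
        if pairs <= 0:
            break
        zeros = sum(1 for i in range(pairs) if data[i] ^ data[i + shift] == 0)
        rates[shift] = zeros / pairs
    return rates


def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with key repeated to its length (a periodic cipher)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


if __name__ == "__main__":
    text = b"attack the eastern gate at noon and retreat to the trees at dusk " * 4
    ct = repeating_key_xor(text, b"KEYS")
    for shift, rate in coincidences_by_shift(ct, 12).items():
        print(shift, round(rate, 3))
```

On repeating-key ciphertext of real text, the rate tends to jump at the key length and its multiples; on the output of a modern cipher it should stay near 1/256 for every shift.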
If the text looks completely random under all the statistical analysis you can think of, and stays that way even when XORed with itself shifted various ways, odds are you're dealing with something a bit more serious, and you'll need more expertise than you can gain from an "Ask Slashdot" article to crack it. Good luck!