Codebreaking - Taking the First Step?
Master Spy asks: "Here's something that the Slashdot community might be able to help with. If you receive a message in code how do you take the first step? Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine so all they had to do was work out the positions of the rotors using brain power, the Bombe or later the Colossus machine. American codebreakers also knew the basic details about the methods the Japanese used but now however things are more complicated. Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' How do you know how the message has been encrypted? It could be an Enigma machine, it could have been XOR'd with a second message or a one-time pad or it could use some form of software encryption such as Blowfish or DES. Before you start ripping the message apart for decoding how do codebreakers find out what method has been used to encode the message?"
Let the medium decide how to approach this.
I.e., if it's over the net, look at the wrappers.
If it's over the radio, look at the spectrum.
It's what's around the message that will break the message.
Sigs are dangerous coy things
Well, if this was easy, codebreaking wouldn't be any fun. Don't forget that both the Germans and the Japanese had a variety (tens if not hundreds) of different cyphers in circulation, so it wasn't exactly as simple as assuming it's Enigma or Code Purple. :)
As to how it's done, that has to do with analysing the text: frequency analysis of 1-grams, 2-grams, etc. Simple substitution will exhibit one fingerprint (though different languages will obviously be different), something like a Playfair or Vigenère square will have another, and DES-encrypted text a completely different structure. Obviously, on a small enough sample there may not be enough information to latch onto...
But with a larger sample, it's mostly a combination of good tools, experience, and guesswork.
The better you know what's out there to use, the better the chance of recognizing what you're up against.
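A minimal sketch of the n-gram counting mentioned above, assuming plain Python and a made-up sample string. The function names are illustrative only. The index of coincidence is one common summary of the fingerprint: it is roughly 0.066 for English text and roughly 0.038 (1/26) for uniformly random letters, so a simple substitution keeps the high value while a good polyalphabetic or modern cipher flattens it.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count overlapping n-grams among the letters of text (case-folded)."""
    letters = [c for c in text.lower() if c.isalpha()]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))

def index_of_coincidence(text):
    """Probability that two letters drawn at random from text are the same."""
    counts = ngram_counts(text, 1)
    total = sum(counts.values())
    if total < 2:
        return 0.0
    return sum(k * (k - 1) for k in counts.values()) / (total * (total - 1))

# Hypothetical sample; a real attack wants far more text than this.
sample = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
print(ngram_counts(sample, 1).most_common(3))
print(ngram_counts(sample, 2).most_common(3))
print(round(index_of_coincidence(sample), 4))
```

On a sample this short the numbers are noisy, which is the point made above about small samples.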
Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++'
Yeah, 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' showed up on my SETI@home screen too.
This is clearly the signature of the Grays from Cygnus Prime. You don't want to communicate with them.
The Grays of Cygnus Prime are evil. They will put chips in your head.
They will use the chips to make you do bad things. Like posting to Slashdot.
Opinions on the Twiddler2 hand-held keyboard?
Do the statistical analysis on the encrypted data. In several ways. If all you get is a seemingly random stream of data with an equal distribution of all values, then you've got a raw stream encrypted by a modern, quite strong cipher.
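One way to check for "equal distribution of all values" is a chi-square statistic over byte counts. This is only a sketch under assumptions: the inputs are stand-ins (os.urandom playing the role of strong ciphertext, a repeated English phrase playing the role of weakly protected data), and the function name is made up.

```python
import os
from collections import Counter

def chi_square_uniform(data):
    """Chi-square statistic of byte counts against a flat 256-value distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

random_like = os.urandom(65536)  # stand-in for output of a strong cipher
english = b"the quick brown fox jumps over the lazy dog " * 1500

# For flat data the statistic hovers around 255 (the degrees of freedom);
# for plain text it is far larger because most byte values never occur.
print(chi_square_uniform(random_like))
print(chi_square_uniform(english))
```

A flat histogram does not prove the cipher is strong; it only says this particular test found nothing to grab onto.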
;)
Good luck
Bastard Operator From 193.219.28.162
Step One: "Acquire more samples."
When you have less data than a smallish key (and that message has no more than 28 * 8 = 224 bits, probably much less), the data can (most likely) decrypt to anything at all with the proper key. If that's all you really have, then you need to pursue non-code-breaking methods of finding out what that is.
And of course what to do next depends on the characteristics of that additional data. A lot of cryptanalysis assumes you have knowledge of the encryption method; this is because it's "easy" to obtain by reading code, but "easy" is a relative term. It's easier than just guessing, but still hard. Without knowledge of an algorithm, you need to luck out and hope they used one with a distinct signature. If they didn't, you're probably basically out of luck on a single person's resources, because all of the "good" algorithms should be effectively indistinguishable from noise after encryption.
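The "can decrypt to anything" point is easiest to see in the extreme case where the key is as long as the message and is simply XORed in. A small sketch, assuming that XOR scheme (the thread does not say what the sample actually uses) and two made-up guesses at the plaintext:

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

ciphertext = b"sdjek dYqkP 1Nt$% GGl9) MHrYD +++"

# Two hypothetical plaintexts, space-padded to the ciphertext length.
guess_a = b"attack at dawn".ljust(len(ciphertext))
guess_b = b"retreat at dusk".ljust(len(ciphertext))

# For each guess there is a key that "decrypts" the ciphertext to it.
key_a = xor_bytes(ciphertext, guess_a)
key_b = xor_bytes(ciphertext, guess_b)

print(xor_bytes(ciphertext, key_a))
print(xor_bytes(ciphertext, key_b))
```

Both keys are equally valid as far as the ciphertext is concerned, so nothing in the 33 characters alone can tell the guesses apart.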
"Suppose you are listening to a transmission and you receive the following: 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++'"
I transmit a polite reply saying, "No, I am NOT interested in being your love monkey no matter how much you lust after me." Gee whiz! The things people say when they think nobody is listening!
Code breaking is hard by its very nature. You're trying to find an unknown message by inverting (or short-circuiting) an unknown process.
If you think of things mathematically, you're looking to find a plaintext p in the set of all possible plaintexts P and some function f from the set of all ciphertexts to the set of all plaintexts where f(c)=p. This means both f and p are unknown, and while multiple solutions may exist they are likely of "measure zero" in 2 very large spaces. (Let's assume we have a suitable measure for such things, and not worry about the real details.)
To a mathematician, finding a general solution to the above would be a Fields Medal winning sort of thing. The reality is that you need more information. If you've got a large message you should start checking letter/symbol counts, followed by the counts of various character pairings, etc. The goal is often to come up with a statistical model to see if you can build a plausible f. Another thing is to try common functions (xor with various values, etc.) on the stream and see what happens. Sometimes that'll give you a clue. But most of the time it involves a little luck, a little intuition and a lot of perseverance.
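"Try common functions and see what happens" can be as simple as XORing the stream with every one-byte value and scoring the results. The sketch below assumes the plaintext is English and uses a deliberately crude score; the test message and key 0x5A are made up for the demonstration.

```python
# Byte values that are very common in English text (letters plus space).
COMMON = set(b"etaoinshrdlu ETAOINSHRDLU")

def score(candidate):
    """Crude English-likeness: fraction of bytes that are common letters or spaces."""
    if not candidate:
        return 0.0
    return sum(b in COMMON for b in candidate) / len(candidate)

def best_single_byte_xor(ciphertext):
    """Try all 256 one-byte XOR keys; return (key, plaintext) with the best score."""
    trials = ((k, bytes(b ^ k for b in ciphertext)) for k in range(256))
    return max(trials, key=lambda kp: score(kp[1]))

# Hypothetical message, "encrypted" with a single-byte XOR.
secret = bytes(b ^ 0x5A for b in b"meet me at the usual place at noon")
key, plain = best_single_byte_xor(secret)
print(hex(key), plain)
```

If no key scores noticeably better than the rest, that is a clue too: the stream is probably not a single-byte XOR of English.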
1. Get some books. Schneier's book is the best starting point.
2. Learn statistics (and basic number theory). You can discover a lot about a message by its statistical properties.
3. BREAK LOTS OF CODES. Without experience, you are lost. Start by breaking substitution and Caesar ciphers (easy with statistics), then Vigenere/Gronsfeld ciphers (harder but still "crypto for dummies"), then try XOR ciphers (they can be solved easily in an interesting way)... then try to understand how WEP is broken... DeCSS... move up the scale until you can understand the way more sophisticated codes are broken (for instance differential cryptanalysis). It gets harder at this point and well outside the realm of practicality but if you get this far, you will be able to break any cryptographically weak cipher (which includes the products of many companies, unfortunately).
4. If you become advanced enough, you can start reading papers on cryptanalysis. Many of them are surprisingly easy to understand once you understand number theory. However, it is much more difficult to *discover* some of the stuff these guys come up with, it's pretty amazing.
Anyway, to summarize, understand the statistics involved and PRACTICE until you can just look at a substitution cipher and understand what it says... just by the letter frequencies! If you are trying to break a simple code you need lots of ciphertext to analyze.
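For the easy end of that scale, here is a sketch of breaking a Caesar cipher purely from letter frequencies. The frequency table is approximate textbook values for English, the function names are made up, and the sample sentence is hypothetical; the approach assumes English plaintext and a reasonable amount of text.

```python
# Approximate English letter frequencies in percent, a through z.
ENGLISH = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
           6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]

def caesar_shift(text, k):
    """Shift ASCII letters forward by k places, leaving other characters alone."""
    out = []
    for c in text:
        if "a" <= c <= "z":
            out.append(chr((ord(c) - 97 + k) % 26 + 97))
        elif "A" <= c <= "Z":
            out.append(chr((ord(c) - 65 + k) % 26 + 65))
        else:
            out.append(c)
    return "".join(out)

def chi_square_english(text):
    """How far the letter counts of text are from typical English (lower is closer)."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    n = len(letters)
    if n == 0:
        return float("inf")
    total = 0.0
    for i, pct in enumerate(ENGLISH):
        expected = n * pct / 100
        observed = letters.count(chr(97 + i))
        total += (observed - expected) ** 2 / expected
    return total

def crack_caesar(ciphertext):
    """Return (shift, plaintext) for the shift whose decryption looks most English."""
    best = min(range(26), key=lambda k: chi_square_english(caesar_shift(ciphertext, -k)))
    return best, caesar_shift(ciphertext, -best)

message = "the enemy will attack at dawn so send the reserves to the northern gate before then"
print(crack_caesar(caesar_shift(message, 7)))
```

The same scoring idea carries over to each column of a Vigenere ciphertext once the key length is known.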
And don't forget: sometimes you don't need to break a code at all. As a poster above wrote, sometimes context is enough. Sometimes an external clue will give the code away. How do you know what to look for? Experience!
Enlist in the NSA and enroll in the National Cryptologic School.
This is a question that probably takes a CS PhD to be able to answer. Different encryption schemes have different susceptibilities to different attacks. For example, if you are using a one-time pad stream cipher using a pad that has never been used before, you are totally SOL as an attacker. It isn't breakable without that pad. Period. If you are using some of the more sophisticated ciphers that have short keys (block ciphers), then there are sophisticated statistical analyses that can be performed to determine the likely method being used.
What you are referring to however is a situation where you don't know the encryption method. This is extra security through obscurity, which we know doesn't work very well. Many encryption schemes are very, very good, and you won't be able to attack them easily even with knowledge of what they are. Usually, for example, you need to know a bit of the message, in addition to the cipher, to be able to break it. For example, a bunch of emails may start with "From: xxxxx." If you have a lot of emails, encrypted similarly, you may be able to mount a reasonable attack, depending on the method used.
-Sean
If you are interested, I would suggest that you start by reading The Code Book by Simon Singh. It gives a good overview of the history of the battle between cryptography and cryptanalysis, and how ciphers have evolved to defeat methods of codebreaking. It's an interesting and entertaining read and you might gain some insight on how you would approach this particular cipher.
BTW, I have a truly marvellous solution to your cipher which this textarea is too small to contain.
It also helps if you have a basic idea of what's encrypted, i.e. what kind of plaintext message you're dealing with. A .doc has a different signature than a jpeg or flat ascii or html, etc. some encryption software relies on headers or footers to the encrypted data in order to sanity check for decryption. again, look also at the medium that the message is transmitted through -- tcp/ip traffic to port 443 speaks volumes about what algorithms are being used. transmissions received in the 2.4 GHz band also speak volumes about what algorithms you may be dealing with. finally, never trust the developer to do the 'right thing' with algorithmic selection -- look at adobe's algorithm selection for its ebooks. look for a pattern in what you're dealing with. it can't hurt to generate a dictionary of known ciphertext file patterns a la the *nix 'file' command. lacking a certain amount of information about what you're dealing with (message length, source of the ciphertext, etc), though, you're SOL.
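A dictionary of known patterns in the spirit of the 'file' command can start out very small. The sketch below is illustrative only: the table is nowhere near exhaustive and the function name is made up. The file-format magics are the standard ones; the last two entries are the prefix written by OpenSSL's enc command in salted mode and the first line of an ASCII-armored PGP message.

```python
# (leading bytes, description) pairs, checked in order.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip stream"),
    (b"Salted__", "OpenSSL enc output (salted)"),
    (b"-----BEGIN PGP MESSAGE-----", "ASCII-armored PGP message"),
]

def identify(data):
    """Return a description if data starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None

print(identify(b"Salted__" + bytes(8) + b"...rest of ciphertext..."))
print(identify(b"sdjek dYqkP 1Nt$% GGl9) MHrYD +++"))
```

A None result is the common case for raw ciphertext, which is why the surrounding protocol and medium matter so much.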
anyway, I haven't had to deal with much of the kind of encryption that protects data from a government, mostly just the kind that delays your kid sister, so ymmvg...
Take a bunch of encoded stuff and simply look at it, watch for patterns over the course of the data as a whole. For a small sample set
...
:p
'sdjek dYqkP 1Nt$% GGl9) MHrYD +++'
this is not going to do you much good, but if you have reams of encoded / encrypted data just stare at it for a while, look at it in a way that you look through it (like those hidden picture things) and after a while you will recognize patterns and have something with which to work.
There is a fine line between the high quality software engineer and mild autism. Ever watch 'Rain Man' or 'A Beautiful Mind' and think - hey that guy would be a BAD ASS developer
Helps to be able to think in 6+ dimensions when you are cracking codes, and a photographic memory helps too.
I should probably post this as an AC - last thing I need is the CIA / NSA figuring out what I am capable of
Glonoinha the MebiByte Slayer
First you djc,s dk%33R +++ (110), then you sD##N KDL:: Ds03k -332+. From there, it's a trivial matter of just 3!Wop mclDI a002g a!22# with the sklj3 V3iia aq@@1 +1867 -5309.
Duh.
You start with known encryption methods (simplest first) and by process of elimination you keep going until you get a clue. A good cryptologist has information about everything from Bacon's method to the most recent ciphers and their algorithms.
The folks who cracked the Enigma started the same way. The Polish started the process, sent info to England where it was completed.
A code fragment that short, though, would be darned impossible to crack unless you get more.
But you can already see patterns: word length, multiple "+" characters (maybe an indicator of end-of-phrase or something?).
But that's -- basically -- how you do it. Educated guesses and grunt work (either by you or computer). Unless it's Quantum encryption which is spoiled as soon as you intercept, so you can't decode it.
Check out The Code Book for some great -- albeit basic -- information about methodology and history.
If you are an American, even posting this is probably a violation of some 4-letter acronym or terrorist prevention law. I mean, why not just ask how to make a model rocket, why don't ya.
... but I think a time-honored tradition of inside knowledge would be applicable here. In short: spies. I know it's not as technical an answer as most were hoping for, but I would conclude that more information has been provided (including information on how to break encryption and decode messages, and what the "enemy" is using to encrypt messages) through secret operations and spying than anything else.
Think about the machines they used to decode messages in WWII... the patterns came from somewhere. I think the same applies today to a degree.
Stop thinking about the encrypted bits. Start thinking about who sent these bits and who these bits were sent to. Think about the application which created the data. Think about what purpose the data is going to be used.
Once you have this information, you'll be much better equipped to figure out what the basic structure underpinning the cipher is. For instance, if the data is part of a realtime encrypted stream, I'd think "stream cipher" and look at RC4 or SEAL. If the data's part of a pen-and-paper arrangement with all values mod 26, I'd think "Solitaire". If the data's a pen-and-paper arrangement meant for communicating between two deep-cover espionage agents, I'd think "one-time pad". If the data's something pulled off a disk drive, I'd think of Matt Blaze's ECB+OFB algorithm. Etc.
What it boils down to is, this question is pretty arbitrary. Very rarely will you have no metainformation about the plaintext. Seek out as much metainformation as you can, and use the metainformation to make educated guesses, cribs, etc.
Here's a partial answer:
(1) there is always the possibility that you simply won't. In fact, a properly used one time pad cipher is indistinguishable from noise. It's also a major pain in the ass to use, because you must somehow transmit as many bits of key as you want to send bits of message, and your one-time pad is only as good as your method of transmitting the key.
(2) If there is some kind of message in the signal and a cipher is involved other than a one-time pad or something isomorphic to one, then there will be some degree of redundancy in it. This is a theorem of information theory. Statistical measures will eventually reveal that the redundancy exists.
(3) At that point, there are lots of approaches. A good, readable and interesting introduction to these, along with the history of such things, is David Kahn's The Codebreakers. Bruce Schneier's Applied Cryptography is a good, more technical introduction for the computer geek. I've also heard good things about the Handbook of Applied Cryptography, but I don't actually know the book.
But as someone notes above, it's an inherently hard problem to simply identify the cipher, and modern ciphers like RSA are, as far as we know, computationally intractable because the only known attack requires factoring a very large prime number.
(4) You give up and hire a pretty young woman to talk the marine guards into letting you at the code room. (Details of this approach are left as an exercise for the interested reader.) Sometimes the old fashioned ways are best.
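The redundancy mentioned in (2) can be estimated crudely as Shannon entropy per byte. This sketch only looks at single-byte frequencies, so it will miss redundancy that shows up only in longer patterns; the inputs are stand-ins and the function name is made up.

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((k / n) * math.log2(k / n) for k in Counter(data).values())

english = b"the quick brown fox jumps over the lazy dog " * 200
noise = os.urandom(len(english))  # stand-in for one-time-pad output

# Text sits well below 8 bits per byte; the gap is the redundancy to exploit.
print(round(entropy_bits_per_byte(english), 3))
print(round(entropy_bits_per_byte(noise), 3))
```

A value close to 8 on a long sample is what "indistinguishable from noise" looks like at the single-byte level.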
If you receive a message in code how do you take the first step?
You do what everyone else here does when they come across a problem that may or may not fall under the category "News for Nerds. Stuff that Matters": you submit it to Ask Slashdot, of course! Don't worry: they'll print it. They'll print anything and it doesn't even have to be in the form of a question!
GMD
watch this
The only, only thing you can expect to learn is who's communicating with whom [and when / how much information is exchanged] ( you probably know this already ) , and what protocol they're using ( it's probably unbreakable ).
Chances are, if you are intercepting an encrypted stream, you are intercepting an unbreakably encrypted stream.
Perhaps you are thinking that if only you knew what protocol the stream is using, you might look online and see if that protocol has been cracked.
Don't waste your time.
The chances are approximately 0 that the stream you are intercepting is using a protocol that has been cracked, or that it is using a keyspace you can brute-force for under a few hundred thousand dollars, or in under a matter of years.
Sorry -- you have a higher chance (almost infinitely higher -- as I said, the chance you will succeed in what you are asking to do is approximately 0) of port-scanning the machine at the source or the destination and 0wning it than you do of breaking the stream.
I don't say this to mean you should give up -- just that you're phrasing your question wrong. Don't discount the 0wning venue of attack.
For every million desktop machines communicating over TCP/IP, only a matter of a few dozen will have 0 exploitable security weaknesses. (However, most security weaknesses are unknown.)
Find out what kind of machine is at the source and the destination, then 0wn one of them. Chances are almost overwhelming that it's possible, if not with a remote exploit, then through social engineering. (Send an attachment that will be opened on either end of the communication, or induce either end to visit a web page in a browser that is exploitable, which is basically every browser except Lynx.)
If they browse with Netscape or Internet Explorer, chances are almost overwhelming that they can be owned.
It's not that hard to get someone to browse to a certain page, if you know anything at all about who that person is.
Back to your original question: gone are the days that protocols were breakable by any hotshot think tank. Today only implementations are, and rarely at the level you're trying to address. Don't break the code -- break into the system.
Hope this helps.
I gotta say that a non-halfassed encryption mechanism is going to have a fingerprint that can only be perceived as "random". To do otherwise would mean that the encryption system is really, really weak.
May we never see th
You social-engineer the NSA or other TLA with teraflop codebreaking computer-capability into helping you crack the message. For example, consider the following method used by an Idahoan to get his potato field plowed:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
An old man lived alone in southern Idaho. It was early spring, and he wanted to spade a garden plot to prepare it for planting potatoes. But it was very hard work and he just didn't have the energy.
You see, his only son, who would have helped him, was in prison.
The old man wrote a letter to his son and mentioned his predicament.
A week later, he received a note back, which said, "For heaven's sake, Dad, whatever you do, don't dig up that section of the garden! That's where I buried the GUNS!"
The next morning, bright and early, a dozen police showed up and dug up the entire garden, without finding any guns.
Looking out the kitchen window, the old man thought "Now, what in the world is going on here?" Confused, he wrote another letter to his son telling him what happened, and asked him for advice.
Another week passed and his son's reply arrived in the mailbox. The old man carried the letter up to the house, sat down at the kitchen table and read, "Now plant your potatoes, Dad. It's the best I could do for you under the circumstances."
khfs hskdf woeiyr ngusdt lsdfhuyttr *^+hgf 1khh^ jshdf +++
NO CARRIER
Damn line noise...
Good old memories!
You're all bastards!
Back in the days of WWII it was easier. The codebreakers at Bletchley Park already knew that the messages were encoded using an Enigma machine so all they had to do was work out the positions of the rotors using brain power, the Bombe or later the Colossus machine.
I think you're simplifying these first steps too much.
When the UK first intercepted a message like 'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' , they had little idea what it was. Encryption? Only part of a transmission?
It took them months (years?) to work through the encryption system. In the beginning, they didn't even know there was an Enigma machine. They broke the code by brute force: many people trying many different methods. If the Nazis changed the key, you had to start all over from the beginning.
It took a long time for the codebreakers to figure out that there even was an Enigma machine, find a machine and figure out how it worked. It took time and effort, and people died retrieving the machine.
Once the codebreakers had worked with Enigma for a time, it sometimes became really simple to determine whether a transmission was Enigma (or other encryption) code.
If it wasn't cleartext, it was code. You knew certain things about the code: The key was transmitted first, the key was 6 characters long.
Some code had fingerprints: One guy always transmitted 'HIT#@!', where #@! = 'LER', and so you used "HITLER" to break the rest of the code. Someone else always used a German woman's name (maybe his girlfriend) and sent "GRE$$!" where $$! = "TTA", so "GRETTA".
So take a step back, if you can't determine the nature of the code you are seeing, it will be very hard to crack.
"Can of worms? The can is open... the worms are everywhere."
The look of the encrypted message gives a good idea about the method it has been encoded with. Examples include:
* Character range (alphanumeric - other)
* Length
* Special characters found often in the encryption
It needs background: if you have seen the type before, you can distinguish it a bit.
And then there is the context the code is brought in.
Anyway, code breaking is usually used for malicious stuff nowadays I guess.
"What you 'seek' is what you get!"
As others have pointed out, the way this is done in practice is by looking at who sent the message to whom and the circumstances around it.
That said, it's worth pointing out that cryptographers take a keen interest in the more academic form of this question, and their defined criterion is that a cipher is only good if it's not feasible to distinguish ciphertext from uniformly distributed random data. Stated slightly more formally: if you can find a way to distinguish ciphertext from random data with probability p > .5 (keeping in mind that guessing at random will make you right half of the time, assuming half of the messages you're presented are ciphertext and half are random) then the cipher is considered broken. This means that even if you can only correctly pick out every one-millionth ciphertext, and you have no clue what that message is, or what key was used, the cipher is still "broken".
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Thanks Cliff, now everyone knows Osama's slashdot nick.
HURD - Hurd's Under Research & Development
this is obviously perl code
> ciphers like RSA are, as far as we know,
> computationally intractable because the only known
> attack requires factoring a very large prime
> number.
I believe you meant to say breaking RSA requires factoring a very large *composite* number.
'sdjek dYqkP 1Nt$% GGl9) MHrYD +++' decrypts to,
'I send you this file in order to have your advice'
Beta sux! Join the Slashcott! http://hardware.slashdot.org/comments.pl?sid=4760465&cid=46173047
sdjek dYqkP 1Nt$% GGl9) MHrYD +++
translates to
"Enlarge your penis in 5 easy steps!"
How did I get this? I think it's because that's the e-mail I get most nowadays. So it's most likely to be that. QED
~ kjrose
...visit your local library. :) No really, I recommend checking out the National Cryptologic Museum just off of 295 in Maryland - http://www.nsa.gov/museum/index.html
- RR
I should put something clever here. Maybe someday.
There is cryptography and then there is traffic analysis. Especially with old style military radio traffic this can give you a lot of info which is then useful in determining an attack vector for the cryptanalysts.
Just looking at some traffic can give you a very good sense about the facing unit in terms of size. A company has a different information flow than a brigade or regiment etc. Obviously these things can be spoofed, but that's all part of the game. The unit size then gives you an indication of the equipment and coding mechanism involved.
All that information, as well as results from human intelligence gathering, was used at Bletchley Park to break not only the Enigma code but also various other codes.
sdjek dYqkP 1Nt$% GGl9) MHrYD +++
Two things give it away:
The spaces are too regular. You'd be quite hard pressed to form a coherent sentence with any character occurring every 5n characters.
So then perhaps the spaces are irrelevant. Then the next questionable aspect is the last three +++'s. Now, if your code didn't at least work in groups of three, the mathematical likelihood of three +'s occurring in a row would be small.
So then, what would make most sense is some kind of consistent bit manipulation at least in cycles of three characters. But then the double GGs and unique characters ($%) make that unlikely too.
So what makes the most sense? Just random typing.
Look at the first set of characters:
sdjek
Just type it a few times... It's quite natural. You might as well have used asdf (I bet your typing style isn't perfect... you probably favor your right hand).
If you examine each other character grouping, you'll see that none of them are very hard to reach.
Also, it gets the KIS approval which in most circumstances, is the winning vote.
int func(int a);
func((b += 3, b));
echo /dev/urandom /dev/ttyS0 /dev/urandom /dev/ttyS0
Tarsnap: Online backups for the truly paranoid
Now, Pinky, with this new encryption scheme that deliberately resembles random typing, we shall take over the world!
A book titled 'System Identification And Key-Clustering', by Dr. I. J. Kumar is available from Aegean Park Press. It deals with defining a methodology for identifying cryptosystems and narrowing the key space applicable for a given message. This is quite what you want, but be warned - it is not for the faint of heart...
- Try to get more text coded with the same cryptosystem (and preferably the same key). Cracking anything based on 25 bytes of ciphertext is going to be hard.
- Look for statistics. Run character-statistics. Do they look like normal text, only with different symbols ? If so you have a monoalphabetic substitution-cipher, crackable in 5 seconds by a computer or 5 minutes by hand. Repeat for digraphs or trigraphs. Any result different from "all combinations equally likely" (or close) gives you a hint.
- Try to xor the text with a copy of itself shifted various places left and right. Observe how many nulls you get with various displacements. If you get a jump in nulls for a certain shift, you're likely dealing with a periodic substitution-cipher. Again easily crackable if the period is not too long and you have enough ciphertext. (Enough here is something like 20 times the period. So if the period is 50 you'd need a kilobyte of ciphertext to easily attack it, more or less.)
If the text looks completely random under all statistical analysis you can think of, and stays that way even when xored with itself shifted various ways, odds are you're dealing with something a bit more serious, and you'll need more expertise than you can gain from an "ask slashdot" article to crack it. Good luck!
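The shifted-xor test from the list above can be sketched in a few lines. Everything here is an assumption for the sake of a demo: the plaintext is a well-known opening line, the "cipher" is a repeating-key XOR with a made-up 5-byte key, and the function names are illustrative.

```python
def repeating_xor(data, key):
    """XOR data with key repeated as often as needed (a periodic cipher)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def coincidences(data, shift):
    """Count positions where data xor (data shifted by `shift`) gives a null byte."""
    return sum(1 for a, b in zip(data, data[shift:]) if a ^ b == 0)

def coincidence_profile(data, max_shift):
    """Fraction of nulls for each shift from 1 to max_shift."""
    if max_shift >= len(data):
        raise ValueError("max_shift must be smaller than the data length")
    return {s: coincidences(data, s) / (len(data) - s) for s in range(1, max_shift + 1)}

text = (b"it was the best of times it was the worst of times it was the age of wisdom "
        b"it was the age of foolishness it was the epoch of belief it was the epoch of incredulity")
key = b"krypt"  # hypothetical key, period 5
cipher = repeating_xor(text, key)

# With enough ciphertext, shifts that are multiples of the period tend to stand out.
for shift, rate in coincidence_profile(cipher, 12).items():
    print(shift, round(rate, 3))
```

At a shift equal to the period the key cancels out, so the null count is exactly that of the plaintext compared with itself; that is where the jump comes from. On a sample this small the profile is noisy, which matches the "20 times the period" rule of thumb.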
...It looks like the init string for my modem.
Good judgement comes from experience, and experience comes from bad judgement.
- W. Wriston, former Citibank CEO
"The attachment foo.doc was garbled. Please re-send in .txt format"
Liberty uber alles.
Sadly, there's no way to put a > or other HTML metacharacter in the Subject line of a slashdot post, regardless of the post style setting.
The best you can do is embed an HTML entity as text and hope the receiving browser sorts it out (which it probably will, but this still isn't quite kosher).
Yes...now I am going to fix my typos...
/dev/urandom /dev/ttyS0
/dev/cua0 (iirc...I think that's what they used to be?)
cat >
or if you're wanting to stay old school...
You're all bastards!
*sigh*
I'm not posting to this again, no matter WHAT! I've already wasted enough bandwidth.
You're all bastards!
"I think I'll stop here."
The real reason the British were able to break the Enigma codes was that a Polish underground agent had stolen an Enigma. Every day the German weather forecast had the same first line, which gave them the exact settings for the machine... Don't overestimate the British... Hey, they invented the concentration camps during WW I.... The Germans "only" took it one step further...
Anyway, there are a number of ways to go around your problem... The problem is that you need 4 things to decipher the code:
1. Message Format (has the message been split into multiple parts and rearranged?)...
2. Used algorithm (DES etc.).
3. Key length.
4. The language of the message.
If these are not known you have the following option: Acquire a number of PCs and either get code breaking software (if you can get hold of it) or write it yourself. Using these machines, set each of them to try breaking the message using different algorithms. The best software for doing this has access to a number of dictionaries in order to check whether it is on the wrong track. This will take some time depending on the machines... Have fun!
Never buy Sony CDs - they will open up your computer to anyone..
I would start like this:
- Try to compress (zip/gzip) - compressibility is a sign of bad crypto.
- Have a look at the auto-correlation - if you see a comb pattern then it is probably something like XOR, Vigenère, addition mod 256 or similar. CrypTool can break those algorithms automatically.
- Have a look at character frequency, 2-grams, n-grams
- Apply some tests for random data - good crypto should produce data indistinguishable from random data
- If the data looks random you might need some hints on the algorithm.
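The first item in the list is quick to try with Python's standard zlib module. A sketch, with os.urandom standing in for well-encrypted data and a run of identical bytes standing in for the worst case:

```python
import os
import zlib

def compression_ratio(data):
    """Compressed size divided by original size; around 1.0 or more means incompressible."""
    if not data:
        raise ValueError("need at least one byte")
    return len(zlib.compress(data, 9)) / len(data)

print(compression_ratio(b"a" * 10000))        # highly compressible
print(compression_ratio(os.urandom(10000)))   # essentially incompressible
```

Ciphertext that shrinks noticeably is leaking structure; ciphertext that does not shrink has merely passed the cheapest test on the list.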
All the tests suggested can be performed with CrypTool. If the crypto is strong you will need some more insight, but in many practical cases bad crypto is used, e.g. in Psion Word.
First, as everyone has said, the more raw material the better. Then you look for patterns; the way the Russian Cold War-era code was broken (project Venona) was that the Russians got lazy/rushed and started to reuse their 'one-time-use' pads. If anyone cares, it was actually my grandfather Meredith Gardner who made the first breakthrough, and that's without the aid of any computers :)
1) Check for similar codes from the past. 2) Check against every code for similarity. 3) Finalize. 4) If that failed, repeat 2. 5) Decode.
If everything fails: 1) Ask around if anyone knows about a new coding system. 2) Verify the match. 3) Decode.
Or: 1) Get a longer code. 2) Try to crack open the code yourself. 3) Get shot by the government because you found their secret =P
He who controls the Source, controls the program!