Cracking PGP In the Cloud
pariax writes "So you wanna build your own massively distributed password cracking infrastructure? Electric Alchemy has published a writeup detailing their experiences cracking PGP ZIP archives using brute force computing power provided by Amazon EC2 and a distributed password cracker from Elcomsoft."
If only they'd thought of using distributed computing for the first post, instead of password cracking!
I was under the impression that crypto like PGP was based on stuff which would (in theory) take millions of years to crack even with every machine on earth dedicated to it?
Yes obviously cracking passwords scales linearly, we've known that for a long time. Oh, you could get 100 machines brute forcing instead of one, but what good is that? Either the password is crap and you crack it easily, or it's helluva complex and scaling it up 100x won't do a damn thing. In this case it looks like they just picked some random range and said "Hey, this is unfeasible on a single machine and doable on a cloud, let's do that" but they haven't produced any credible evidence it is in this range. Not unless semi-complex password possibility matches their corporate password policy or whatever.
Live today, because you never know what tomorrow brings
So you wanna build your own massively distributed password cracking infrastructure?
No
> I was under the impression that crypto like PGP was based on stuff which
> would (in theory) take millions of years to crack even with every machine on
> earth dedicated to it?
That's true if everything's equal. Including your passphrase. If the cipher
for encryption is 128-bit strong, then your password/passphrase needs to match
that. If it doesn't it's the weakest link, easier to attack than the actual
crypto algorithm and will take accordingly less time to crack.
Example: For a password composed only of lower-case a-z English characters,
you'd need 28 characters chosen in a true random fashion (think scrabble tiles
pulled out of a hat) to actually achieve a strength of 128-bit, that matches a
128-bit crypto or hash algorithm.
The strength of TFA 'sweetspot' passwords was somewhere around 60-bits.
Since even RC5 has been broken at 64-bits (distributed.net - though it took
some time), such passwords are OK for low-priority stuff but not, if say, the
NSA is after you ;-)
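The 28-character figure can be checked with a few lines of arithmetic. This is a minimal sketch in Python, assuming each character is drawn uniformly and independently from the alphabet; the function name `min_length` is made up for illustration.

```python
def min_length(alphabet_size: int, bits: int = 128) -> int:
    """Smallest n such that alphabet_size**n >= 2**bits (exact integer math)."""
    target = 2 ** bits
    n, space = 0, 1
    while space < target:
        space *= alphabet_size
        n += 1
    return n

print(min_length(26))   # lower-case a-z only -> 28
print(min_length(62))   # a-z, A-Z, 0-9      -> 22
```

Passwords that people pick themselves are not uniform random draws, so these lengths are a lower bound on what a human-chosen password would need.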
I was under the impression that crypto like PGP was based on stuff which would (in theory) take millions of years to crack even with every machine on earth dedicated to it?
Yes, but the search space is significantly lower if you assume a password that's 1-8 latin alphanumeric characters, as this exercise did.
It's still 122 days on 10 VMs. One tenth of that on 100VMs.
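The "one tenth" figure is plain linear scaling of the 122-day estimate. A rough sketch, assuming the work divides evenly across VMs with no coordination overhead (an idealization); `days_needed` is a hypothetical helper, not anything from TFA.

```python
BASELINE_DAYS = 122   # estimate quoted above for the 1-8 character search
BASELINE_VMS = 10

def days_needed(vms: int) -> float:
    """Wall-clock days if the same total work is spread over `vms` machines."""
    return BASELINE_DAYS * BASELINE_VMS / vms

for vms in (10, 100, 1000):
    print(vms, "VMs:", round(days_needed(vms), 2), "days")
```

The total machine-hours stay the same however many VMs you rent, so at a flat hourly rate the bill does too; only the waiting time shrinks.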
One of the advertised features of ElcomSoft Distributed Password Recovery is that all network communications between password recovery clients and the server are securely encrypted. How is that possible, I wonder.
How do you know they weren't cracking a PGP'd zip archive containing secret documents about alien protein folding technology?
First of all, the article is a very nice summary of the issues involved with setting up a cloud to crack passwords - the nuts and bolts, if you will. I liked that the authors took the time to look into the economics of trying to crack passwords, how much money it would cost vs. how long it would take. Password cracking is one example of massively scalable computing, which is presumably why the NSA allegedly has had to keep upgrading the electrical infrastructure at their headquarters. Elcomsoft certainly made a splash with their PGP-cracking software and managing to harness the power of cheap GPU cards (which are set up for parallel processing) was a bit of genius. That said, even massive horsepower runs into a brick wall once the passphrases become long and the encryption algorithm is good.
On page 2 of the article, the authors nicely summarize the cost of cracking longer and longer passwords. Once passwords start incorporating special characters (per SPEC), the cost shoots sky high even for relatively short passwords (i.e. $10MM+ for a 9 character password, $1BN for a 10-character password, the US national debt for a 12-character password). The article so clearly lays out why the various law enforcement agencies have been focusing on being able to force folk to disclose their encryption keys. The cost of cracking a well-executed encryption scheme combined with a good password is simply too high. So, go ahead and use those special characters, upper and lowercase, etc. to make life interesting for would-be snoops. But realize that unless trends in privacy rights swing the other way, law enforcement will simply compel key disclosure, as they have for years in the UK, for example.
Lastly, the article underscores the value of keychain-type schemes that allow many long passphrases to be stored in an accessible format. Make it easy to have long, complex passphrases and it becomes more likely that people will actually use them.
Schneier had an interesting piece on deriving a limit of the necessary key length from thermodynamics. ... assuming your password is only bruteforce-able ... otherwise http://xkcd.com/538/
http://www.schneier.com/blog/archives/2009/09/the_doghouse_cr.html
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
In this case, it sounds like the customer was pretty glad they'd used weak passwords.
The implication is that they'd locked some files up in an encrypted zip, forgotten the password, and wanted the contents back.
If they'd chosen a stronger key, they'd not have got their files back.
TFA:
This analysis may be insightful as you develop your enterprise password policies, or choose your personal passwords.
(A good password policy is: don't forget your passwords!)
No, they've been approached by a client who's forgotten the password they used. The client's told them they used 1-8 alphanumerics in the password.
In this case, the mapping to a binary key is irrelevant to the size of the brute forcing task.
you'd need 28 characters chosen in a true random fashion (think scrabble tiles
pulled out of a hat) to actually achieve a strength of 128-bit, that matches a
128-bit crypto or hash algorithm.
Scrabble tiles would be an exceptionally bad choice.
Only if your communications with the cloud are in the clear. Why would they be?
I'm also a bit confused. I've never used PGP to make an encrypted zip file, but I use GnuPG to encrypt emails all the time and I, too, was under the impression that it was infeasible in practice to brute force the encryption.
Is the difference that with PGP/GnuPG email encryption, our passwords are merely decrypting our keys which are themselves fully 128 or 256 bits long or whatever? Whereas in this situation with the ZIP file there was no separate key - the password was the key? (I haven't read all of TFA)
The company surely did have the private PGP key lying around. They just forgot the password.
As an analogy, think of a safe. A good safe is hard to break in if you don't have the key. If you have the key, it's quite easy. Now you fear that someone could break in your house, get the key and open your safe. Therefore you put the key for the big safe into another, smaller safe. If you need to open the big safe, you first open the small safe, take out the key of the big safe and then open that.
Now if you have lost the key for the small safe, and the small safe is less secure than the big safe, you'll certainly not crack the big safe, but just the small safe in order to get the key of the big safe.
Now, the key for the small safe is your password, and the key of the big safe is the PGP key. If someone has access to the small safe (the password-protected PGP key), then the security of whatever is in the big safe is certainly limited by the security of the small safe.
Now with emails, the point is that the big safe (the encrypted email) is out in the public, while the small safe (the password-protected PGP key) is in your home (i.e. on your computer, which hopefully itself has appropriate protection against intruders).
So the security of your PGP encrypted mail is limited by the combination of the security of your computer and the security of your PGP password. If your computer is basically unprotected, and your PGP password is weak, then anyone can read your encrypted mail by simply breaking into your computer, copying the private PGP key, and breaking the password. If your computer is well-secured, the attacker will have a hard time getting your private PGP key, and if your PGP password is strong, the attacker will have a hard time breaking it if he manages to get the PGP private key.
The Tao of math: The numbers you can count are not the real numbers.
I thought the problem was that there was an infinite number of matching passphrases producing invalid results. Like, only a very simple hash or CRC - 1 or 2 bytes checks the validity of the passphrase to protect from common typos, but if you try even semi-hard, you will get a hash collision, the data decrypts, but it decrypts to garbage - a standard GIGO filter with a very weak anti-garbage protection on input.
This way, on top of one correct result you should get an infinite number of incorrect results and unless you have a clue what the correct result should look like and use some heuristics to distinguish it from garbage, you'll be no wiser than before... (and if it was additionally encrypted with anything that makes it look like white noise, there is simply no way to tell it apart from pure garbage.)
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Damnit, my password is all vowels again!
which is totally what she said
such passwords are OK for low-priority stuff but not, if say, the NSA is after you ;-)
If the NSA is after you, I would think the strength of your passwords is the least of your worries.
The best solution (if you are dealing with a desktop system) is to have the pass-phrase and keys on a USB key that also has a small GPS module. If the USB key is not close to where it should be (with a fairly big margin for the fact that cheap GPS modules aren't exactly accurate) it would erase the pass-phrase.
If they try to force you to hand over your password (e.g. UK RIP act), you just hand it over (to the guys who seized your computer and are now trying to use it somewhere else other than the required GPS location) and boom, the data is gone forever.
If you need to move house, just log in from the old house and reset the GPS then when you get to the new house, log in and put in the new coordinates.
To pick a trivial example.
Your password is 'password'.
Cracking algorithm attempts to open your encrypted archive using a list of, say, 20,000 english words. 'password' is 5th on the list. After 5 iterations, you notice that your decryption attempt has yielded data that looks like a valid zip archive, or contains english words. Result. You win the internets.
You can refine this.
1. Attempt a password list crack.
2. Attempt a Markov-chain based crack, looking for english-like words generated by your Markov Chain algorithm. Like, say. 'bibble' or 'foglet'. Tr
3. Repeat the above for all letter case combinations, and number/letter replacements - like B1bb7e, or f0Glet.
Et cetera,
The edge you have is that people often choose known words as passwords, or easy-to-remember nonsense words.
This reduces your password search space *hugely*.
For example, say your pgp doodad accepts up to 10 character passwords formed from any combination of letter case or number. 26 lowercase letters, 26 uppercase letters, 10 numbers. Your maximum search space would be the sum of all (26+26+10)^n, where n iterates from 1 to 10, or 853,058,371,866,181,866, or 8.5x10^17. This is the size of the set of all possible mixed case alphanumeric passwords up to a maximum length of 10. You would have to try each of these combinations to fully search this space. This is called 'brute forcing'.
It is a *much* larger number of passwords than the 20,000 in your dictionary list....
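The brute-force figure above is easy to reproduce. A small sketch in Python, which has exact big integers; the names `ALPHABET` and `search_space` are made up for illustration.

```python
ALPHABET = 26 + 26 + 10   # lower case, upper case, digits

def search_space(max_len: int, alphabet: int = ALPHABET) -> int:
    """Number of passwords of length 1..max_len over the given alphabet."""
    return sum(alphabet ** n for n in range(1, max_len + 1))

total = search_space(10)
print(f"{total:,}")        # 853,058,371,866,181,866
print(f"{total:.1e}")      # 8.5e+17
```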
So, you use the search space limiting techniques *first*, which will yield a result in 95% of all cases. Then, you try brute force, or give up.
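The ordering described above (plain dictionary words first, then case and number/letter variants) can be sketched as a candidate generator. This is only an illustration under assumptions: the substitution table is a guess that happens to cover the B1bb7e and f0Glet examples, the Markov-chain step is left out, and nothing here is tied to any real cracking tool.

```python
from itertools import product

# Assumed substitution table, just enough to cover the examples in the comment.
SUBS = {"i": "1", "l": "7", "o": "0", "e": "3", "a": "4", "s": "5"}

def variants(word):
    """Yield every case / digit-substitution variant of `word`."""
    options = []
    for ch in word.lower():
        choices = [ch, ch.upper()]
        if ch in SUBS:
            choices.append(SUBS[ch])
        # dict.fromkeys drops duplicates (e.g. for digits) but keeps order
        options.append(list(dict.fromkeys(choices)))
    for combo in product(*options):
        yield "".join(combo)

def candidates(wordlist):
    """Cheapest guesses first: plain words, then their variants.

    `wordlist` must be a list (it is walked twice).
    """
    yield from wordlist
    plain = set(wordlist)
    for word in wordlist:
        for v in variants(word):
            if v not in plain:
                yield v

print("B1bb7e" in set(variants("bibble")))   # True
print(next(candidates(["password"])))        # password
```

Even with every variant included, a 20,000-word list expands to a tiny fraction of the full brute-force space, which is the whole point of trying it first.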
I looked at EC2 for raw processing power earlier this year (my company needs to train a lot of neural nets) and it just isn't worth it, unless you only need the power short term. A high-performance EC2 node gives you 8 cores running at (very roughly) the equivalent of a 2GHz P4, and costs $0.68/hr == about $460 per month, which is only a little less than what an equivalent box (probably a 2.83GHz Core 2 Quad or similar) would cost you. Put power to run that box down at about $0.05 per hour and you can build your own local cluster of equivalent performance for around the same amount of money as you'll save in your first month and a half of operation.
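The break-even point can be worked out from the two hourly rates. A rough sketch: the $0.68 and $0.05 figures come from the comment above, while the box price is a hypothetical placeholder since no exact figure is given.

```python
EC2_RATE = 0.68      # $/hour for the EC2 node, from the comment
POWER_RATE = 0.05    # $/hour to power your own box, from the comment

def breakeven_hours(box_price: float) -> float:
    """Hours of continuous use after which buying beats renting."""
    return box_price / (EC2_RATE - POWER_RATE)

box_price = 650.0    # hypothetical purchase price, NOT from the comment
hours = breakeven_hours(box_price)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days of continuous use")
```

This ignores cooling, admin time and hardware failures, all of which push the break-even point further out.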
> If the encryption software works as advertised, they would need the private
> key file to exploit this.
You are confusing public key encryption (1 private key & 1 public key) with
conventional/symmetric encryption (gpg -c) where no separate key per se is
required. The encrypted file is all you have.
The best answer of all is "physical steganography" i.e. 802.11 NAS built into something that the cops are unlikely to seize (yet which has a legitimate need to be plugged in and doing what it does).
What a chore that they need to use Windows. For a brute force password guesser, most Slashdotters could write it in 10 lines of perl.
I think ten lines of Perl would be the ideal password somehow.
That's only a problem if you have no idea what the encrypted data might be. But in most reality-based cases, that's not the problem. You almost always have the clues you need.
In this case, for example, the file is a ZIP archive. That means the archive contains in the clear the original file names including any extensions, such as .jpeg, .bmp, .doc, .pdf, or whatever. All those file types have artifacts you can test for. They all have specific formats. They'll have version numbers, dimensions that must fall within reasonable boundaries, or other attributes that simply won't produce a coherent file unless they're correct.
For example, a JPEG image file is a container and is filled with markers identifying all the different sections. They all must be right or it won't display. So you'd start by looking for the SOI marker as the first two bytes of the file (0xffd8) or you'd throw it out. After the SOI you'd have to find another valid JPEG marker (two more bytes beginning with 0xFF.) So that's three bytes you'd have to match exactly, and the fourth byte would have to be on the list of valid markers. After you find the next marker, it'll probably be followed by a length (two or four more bytes). If that length is greater than your file size, it's a fail. Sure, if all that passes you'd have to decrypt more data to figure out if you're still in a valid file, but the chances of a false hit are now only about 1 in 16 million keys tested. You then farm all these "potentials" to a machine or other process dedicated to deeper examination of the candidates.
If I were writing this, I'd have enough smarts in the key tester to look for all possibilities within the first blocksize of the cypher. Anything that looked reasonable at that point would be exported to the "evaluate potentials" system.
Every data file has its structure. You just have to look for it.
John
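The first-block JPEG test described above can be sketched in a few lines of Python. Assumptions are flagged in the comments: the marker set is a plausible but not exhaustive list of what may follow SOI, the length field is read as the usual two-byte big-endian value, and `looks_like_jpeg` is a made-up name.

```python
# Markers assumed acceptable right after SOI (not an exhaustive list):
# APP0..APP15, DQT, DHT, SOF0, SOF2, DRI, COM.
PLAUSIBLE_MARKERS = frozenset(range(0xE0, 0xF0)) | {0xDB, 0xC4, 0xC0, 0xC2, 0xDD, 0xFE}

def looks_like_jpeg(head: bytes, file_size: int) -> bool:
    """Cheap plausibility test on the first decrypted bytes of a candidate."""
    if len(head) < 6:
        return False
    if head[0:2] != b"\xff\xd8":          # SOI must be the first two bytes
        return False
    if head[2] != 0xFF:                   # next marker starts with 0xFF
        return False
    if head[3] not in PLAUSIBLE_MARKERS:  # and must be a known marker code
        return False
    # Segment length is big-endian and counts its own two bytes.
    seg_len = int.from_bytes(head[4:6], "big")
    return 2 <= seg_len <= file_size - 4

# A typical JFIF header passes; all-zero garbage does not.
print(looks_like_jpeg(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00", 1000))  # True
print(looks_like_jpeg(b"\x00" * 16, 1000))                          # False
```

Anything that passes would still go to the slower "evaluate potentials" stage; this filter only has to be cheap and reject almost everything.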
It wasn't carbon, but the fuel consumed that was my first thought. Back when distributed.net was busy burning energy to win these pointless challenges, I did some rough calculations on the electricity required to solve it.
Turns out that the energy spent breaking RC5-64 used somewhere between 2 and 50 *trains* full of coal.
And that was only the energy directly consumed by the computers involved, and not any of the heating or cooling costs associated with it. And sure, more modern CPUs are more energy efficient, and I extrapolated the figures from a lot of published sources and made a lot of assumptions. But regardless of CO2 or greenhouse gasses or dirty coal or any of that environmental stuff, that's a lot of irreplaceable fossil fuel that's now gone.
I don't think it's sad or tainted to consider the overall impact of what you do. Saying "oh, I want to help search for E.T." is one thing. It may cost you an extra 1440 Wh/day, but you have the money, no big deal. But understanding that SETI@HOME is causing tens of thousands of people around the globe to collectively burn tons of fuel every day might make some of the volunteers rethink their decision. Ignoring that is the kind of perspective that thoughtlessly sucks up our finite resources.
And no, I don't consider alien hunting a valuable use of energy, at least not at this time in our history. Once we have fusion reactors or some other form of "free energy", all that will change.
Go ahead and crack keys, search for Extra Terrestrials, or fold proteins, or whatever you want to do with your box. Leave your lights on 24x7. Run the furnace and the air conditioner together. Just understand that what you do today has an impact, and consider the value of the outcome.
John
FTA, they mention that Amazon didn't allow them to create more than 9 instances, so they couldn't crack the passwords in less than 122 days. (a request to get suitable amounts of computing power was made, but takes time, is not enabled by default, and wasn't available at the time of writing?)
Dear Sir, Thank you for submitting your request to increase your Amazon EC2 limit. It is our intention to meet your needs. We will review your case and contact you within 3 - 5 business days.
Can we get a "-1 Wrong" moderation option?
Funny sig considering your post. Look, first thing; the authorities aren't stupid. The first thing they do is mirror your data, then test the passphrase you give them on the mirrored data. When your passphrase deletes the data, they still have a copy backed up, and now you've bought yourself a prison sentence, or worse.
As to entering your ATM passphrase backward, that doesn't work anywhere. Some guy tried to make it a standard, but the authorities, noticing immediately all the problems with it, chose not to implement it anywhere. If you think you're right, and want to prove me wrong, go to Snopes and look it up.
No problem, I've got a monitor full of post-it notes. So my policy must be excellent.