17% Smaller DES S-box Circuits Found
solardiz writes "DES is still in use, brute-force key search remains the most effective attack on it, and it is an attractive building block for certain applications (the key size may be increased e.g. with 3DES). Openwall researchers, with funding from Rapid7, came up with 17% shorter Boolean expressions representing the DES S-boxes. Openwall's John the Ripper 1.7.8 tests over 20 million combinations against DES-based crypt(3) per second on a Core i7-2600K 3.4 GHz, which roughly corresponds to a DES encryption speed of 33 Gbps."
A tight datapath-driven layout on standard S-boxes has the potential to recover more area. just sayin...
This means the NSA won't have hardware versions of this running until next Tuesday.
Who cares what some CPU can do. The real question is how fast does it run on a modern GPU? Generally speaking encryption cracking and such is way faster on a GPU due this problem being highly parallel in nature.
5DES!
Here it comes. :-)
"Flyin' in just a sweet place,
Never been known to fail..."
Why did it take so long to find a shorter boolean expression? Aren't there programs that take in a truth table and churn through all the expressions that can generate it? And isn't the S-box I/O size pretty small to begin with?
Information theory is life. The rest is just the KL divergence.
Who cares what some CPU can do. The real question is how fast does it run on a modern GPU? Generally speaking encryption cracking and such is way faster on a GPU due this problem being highly parallel in nature.
TFA has nothing to do with the target platform. If you can improve the algorithm itself by 17%, then it doesn't matter if that means (33/0.83)MB/s on a CPU or (333/0.83)MB/s on a GPU.
Granted, some optimizations may adapt worse (or better!) to the instructions available on a GPU than they do on a CPU, but the task of password cracking inherently parallelizes almost perfectly, so 17% just means 17%, period.
What I thought was interesting about the S-box, was the obfuscation they introduced by where they selected the look-up bits from. Just re-order the S-box and use a straight 6 bit index....
3DES can offer up to 112 bits of security; even your GPU could check one key per nanosecond with 10000-way parallelism, you are looking at something to the effect of 16464665330209 years of work to do a brute force key search.
Palm trees and 8
Actually, a lot of people care about CPUs. I spoke to someone from a penetration testing company the other day. They run a lot of password hash cracking. And they have 10x more CPUs (used for other purposes as well) than GPUs (bought specifically for password cracking). Given that performance of DES-based crypt(3) on GPUs is by far not as impressive as it is for other hash types, they typically test this sort of hashes on CPUs and not GPUs.
That said, yes, when we worked on the S-boxes, we thought of GPUs as well. One of our target sets of "logic gates" is specifically that of high-end AMD/ATI GPUs (it also works well for Cell, PowerPC/AltiVec, and AMD XOP, but we deliberately excluded gates/instructions that are present on only some of these four platforms). The author of one of the GPU-based cracking tools (for tripcodes) reported a 20% improvement on Radeon HD 5970 due to our new S-boxes. Andrey Belenko of ElcomSoft wrote in a tweet that "Effect for GPUs might be well above 20%, actually."
I calculate about ten to the fourteenth years worst-case, or half that average. Enough that brute force doesn't look very practical.
Yeah I should have been a bit clearer; that was a worst-case estimate. It is closer to ten to the thirteenth years by my calculations, but that is still far beyond the limits of practicality.
Palm trees and 8
Here are some specific performance numbers for DES-based crypt(3) on GPUs (for comparison, recall that we're reporting over 20M c/s on a CPU):
oclhashcat-plus is reported to achieve 55M on ATI HD 5970, only 25M on NVidia GTX570 at 1600 MHz core clock, 310M on 8x ATI HD 6970, 181M on 7x NVidia GTX580 (1594 MHz). The numbers for oclhashcat-lite are very similar (57M, 26M, 297M, 181M, respectively). These are off the hashcat website. This does not use our new S-boxes yet (I expect that future versions of *hashcat tools will).
Notice how the number for high-end NVidia is on par with that for our CPU, and for ATI is less than 3x better. Of course, GPUs do have an advantage, but it still does make sense to use CPUs as well, which a typical organization has more of and doesn't need to spend extra time to deploy, install drivers for, etc.
Now, our new S-boxes and other optimizations will provide better performance. Per discussions with a tripcode cracker author, I expect all the way up to 400M c/s on ATI HD 5970, which is close to its theoretical peak speed (approx. 80% of it per some estimates). This is a 20x improvement over our figure for the Core i7 CPU, which is significant. (There's a little room for improvement on the CPU as well, though - specifically, if we pre-generate or runtime-patch the code for each salt as opposed to using pointers at runtime like we do now. This kind of optimization is assumed in the 400M figure for the GPU. So with both having the optimization, the GPU's advantage will be less than 20x.)
Curiously, 400M c/s for 25 iterations of DES will mean that a single ATI HD 5970 with proper code will be able to crack 56-bit DES keys in just 42 days on average.
So, yes, GPUs have an advantage, and we have contributed to that as well.
Very cool. Any chance of using Intel's AES-NI instructions on the i5 and i7 to shortcut some of the work for the cpu version? Or are they too specific to AES? AES-NI can run AES (I forget exactly which one) at something like 3 GBytes/sec on an i7.
(And, of course, programming it directly into a FPGA, but that's a separate topic).
-Matt
Bitslicing tends to use lots of and/or/xor operations. GPU's aren't that much awesomer than CPU's in those operations.
AES-NI is definitely too specific to AES, not reasonably reusable for DES. Yes, we have achieved a speed for DES comparable to that of AES with AES-NI.
We're actually considering building a password hashing method on top of something like this, where bitslice DES has the advantage of being scalable to arbitrary SIMD vector widths and not requiring specialized instructions for efficient implementation. DES is also FPGA-friendly (more so than AES), and we have a project to implement password hashing for authentication servers equipped with FPGA boards:
http://www.openwall.com/lists/crypt-dev/2011/04/05/2 - project rationale
http://www.openwall.com/lists/crypt-dev/2011/05/09/1 - alternative approach
We're also considering Eksblowfish-like constructions, though - such as to make use of Xilinx Block RAMs (and thus require attackers to use more resources too).
BTW, not sure if I am speaking to the right Matt, but of the two SHA-crypt flavors the SHA-512 based one actually has a practical advantage over the SHA-256 one: more complete use of 64-bit CPUs in servers. So I think Dragonfly BSD's choice was a mistake. GPU implementations for both are being worked on, and the difference should be seen.
Bitwise operations are not an issue. Besides, we have versions of our S-box expressions that primarily use "bit select" instructions, such as ATI's BFI_INT - these work on PowerPC/AltiVec in JtR 1.7.8, but I think they will see even more use on high-end AMD/ATI GPUs (this is what we primarily had in mind).
The real issue is register pressure (bitslice DES needs a lot of registers) and memory latencies. In our S-box expressions, we tried to minimize not only gate count, but also the number of registers needed for storage of temporary values in a software implementation. This was among the criteria applied to choose a few best versions among thousands of same-gate-count expressions that we generated. We also cared about the amount of inherent parallelism available in a single instance of the code for each S-box, even though it sort of contradicted the preference to require fewer registers.
Both are no competition for AES at all. Also, 3DES is still of dubious security and DES can be brute-forced at home today. So this is really non-news, as it does not hold any relevance for people that care about having secure encryption.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Yah, that's me (DragonFly). You mean for the default password encryption? We switched the password encryption to sha-256 by default as one of the google code-in projects. I wasn't particularly involved but I'm sure you are correct. All I can say is that it was better than what we had before :-)
I guess SHA is turning out to not be quite as secure for its key-length as RSA was/is.
My personal position on password files is that it doesn't matter how good the encryption is. Social, physical, and psychological algorithms radically reduce the number of attempts needed to figure out someone's badly thought-out password and things like requiring a number, a capital letter, and a minimum of 8 or 10 chars just isn't good enough. Businesses can't really require ultra-secure passwords without pissing off their user base, so it's a losing proposition no matter how you twist it.
So we basically do not allow passwords at all on any of our development machines or servers, with exception to console access (a weak point to be sure but there isn't a whole lot I can do about it). The console server itself uses ssh. The master password file is *'d out.
Developers have to login via ssh or not at all, and are not allowed to use private keys on shared shell machines (they have to key-forward or not fan-out from the shared shell machine at all, which is more preferable). Then a security breech can't fan-out into multiple vectors as it does when an (encrypted) master password file is stolen from a computer and the attacker has an infinite amount of time to crack multiple passwords offline, or if the private key is stolen.
The biggest problem businesses have when their sites get hacked is not the closing of the security hole, but instead preventing the attacker from re-entering the system via an infinite number of broken password. Having to force millions of users to change their passwords is becoming a non-starter (I think Sony found that out the hard way).
Random passwords might be more workable but since random passwords cannot be remembered anyway one might as well go with a more secure and much longer public key.
That's my opinion anyhow :-)
-Matt
NTMLv1 was the problem and it hasn't been used unless talking to very old servers in a very long time. You could always turn off NTLM password storage (since XP at least) and in Vista and Windows 7 it is off by default.
NTMLv2 uses a challenge response system and so you can't offline crack it in the same way.
http://lkml.org/lkml/2005/8/20/95
Matt -
I have so many comments on what you wrote that I don't dare to post them. :-) But I'll say a few things:
Password policies still make sense to me when combined with modern (salted and stretched) password hashes, particularly for large user databases where each account is of relatively little value (your Sony example applies here). Rather than absolutely require certain character classes, users should also be given the option to use longer passphrases, where the number of required character classes can be reduced to 2. I think you have our passwdqc in DragonFly (via FreeBSD), right? Well, it includes passphrase support by default, starting with 3 words of combined length 11, including separator characters - or longer, indeed.
Thank you for describing your authentication methods policy for developers. We use a similar policy for multi-developer or multi-sysadmin projects: http://openwall.info/wiki/internal/ssh
- Alexander
If I encrypt a string with DES, and you intercept it but I didn't tell you it's DES, can you tell by inspecting the ciphertext that it's DES (and then use DES cracking tools to crack it)?
--
make install -not war