Slashdot Mirror


17% Smaller DES S-box Circuits Found

solardiz writes "DES is still in use, brute-force key search remains the most effective attack on it, and it is an attractive building block for certain applications (the key size may be increased e.g. with 3DES). Openwall researchers, with funding from Rapid7, came up with 17% shorter Boolean expressions representing the DES S-boxes. Openwall's John the Ripper 1.7.8 tests over 20 million combinations against DES-based crypt(3) per second on a Core i7-2600K 3.4 GHz, which roughly corresponds to a DES encryption speed of 33 Gbps."

17 of 45 comments (clear)

  1. ONLY 17%? by mozumder · · Score: 2

    A tight datapath-driven layout on standard S-boxes has the potential to recover more area. just sayin...

    1. Re:ONLY 17%? by solardiz · · Score: 5, Insightful

      We were not the first to generate and try to optimize Boolean expressions for the S-boxes. Other researchers worked on this before, starting 1997 when Eli Biham wrote his classic paper on bitslice DES. 17% is our improvement compared to those previous results. To me, it is impressive that after 14 years and numerous attempts by others, including successful ones, it was still possible to improve on the previous best results by as much as 17% at once. My gut feeling is that further improvements, while definitely possible, will be more limited. But the again, some people I spoke to had thought that our 17% was not possible.

  2. Stupid question from crypto-newb by DriedClexler · · Score: 3, Interesting

    Why did it take so long to find a shorter boolean expression? Aren't there programs that take in a truth table and churn through all the expressions that can generate it? And isn't the S-box I/O size pretty small to begin with?

    --
    Information theory is life. The rest is just the KL divergence.
    1. Re:Stupid question from crypto-newb by John+Meacham · · Score: 2

      Determining whether two boolean circuits are equivalent is a famously difficult problem to solve. In fact, it can be reduced to SAT (http://en.wikipedia.org/wiki/Boolean_satisfiability_problem) which was the very first algorithm that was shown to be NP-complete, which in general means it is impractical to solve on real computers.

      --
      http://notanumber.net/
    2. Re:Stupid question from crypto-newb by NoSig · · Score: 3, Interesting

      Huge SAT problems are routinely solved on computers. In fact the CPU in your computer was probably formally tested in software by solving a huge SAT problem. NP-completeness does not necessarily mean that a problem can't be solved in practice even if it is huge. Complexity theory does provide an approximation to what is tractable, but it isn't all that accurate.

    3. Re:Stupid question from crypto-newb by DriedClexler · · Score: 2

      Sure, but you're not being given an arbitrarily large input size; it's something like 8 input/8 output at most. The traveling salseman problem isn't horrendously intractable if you only have e.g. 10 cities.

      --
      Information theory is life. The rest is just the KL divergence.
    4. Re:Stupid question from crypto-newb by solardiz · · Score: 2

      6-to-4 is large enough that you can't realistically find a perfect solution (the absolute smallest gate count) on present computers and given present knowledge. You can do it for 5-to-1, though. Also, generic Boolean expression minimization tools produce relatively poor results for DES S-boxes; specialized algorithms are the way to go. IIRC, I tried Espresso - http://en.wikipedia.org/wiki/Espresso_heuristic_logic_minimizer - in late 1990s. It couldn't even get close to Matthew Kwan's results from 1998, where he used a specialized algorithm.

  3. Re:i7 what? Who cares? by pla · · Score: 3, Insightful

    Who cares what some CPU can do. The real question is how fast does it run on a modern GPU? Generally speaking encryption cracking and such is way faster on a GPU due this problem being highly parallel in nature.

    TFA has nothing to do with the target platform. If you can improve the algorithm itself by 17%, then it doesn't matter if that means (33/0.83)MB/s on a CPU or (333/0.83)MB/s on a GPU.

    Granted, some optimizations may adapt worse (or better!) to the instructions available on a GPU than they do on a CPU, but the task of password cracking inherently parallelizes almost perfectly, so 17% just means 17%, period.

  4. Re:i7 what? Who cares? by solardiz · · Score: 2

    Actually, a lot of people care about CPUs. I spoke to someone from a penetration testing company the other day. They run a lot of password hash cracking. And they have 10x more CPUs (used for other purposes as well) than GPUs (bought specifically for password cracking). Given that performance of DES-based crypt(3) on GPUs is by far not as impressive as it is for other hash types, they typically test this sort of hashes on CPUs and not GPUs.

    That said, yes, when we worked on the S-boxes, we thought of GPUs as well. One of our target sets of "logic gates" is specifically that of high-end AMD/ATI GPUs (it also works well for Cell, PowerPC/AltiVec, and AMD XOP, but we deliberately excluded gates/instructions that are present on only some of these four platforms). The author of one of the GPU-based cracking tools (for tripcodes) reported a 20% improvement on Radeon HD 5970 due to our new S-boxes. Andrey Belenko of ElcomSoft wrote in a tweet that "Effect for GPUs might be well above 20%, actually."

  5. Re:i7 what? Who cares? by solardiz · · Score: 4, Interesting

    Here are some specific performance numbers for DES-based crypt(3) on GPUs (for comparison, recall that we're reporting over 20M c/s on a CPU):

    oclhashcat-plus is reported to achieve 55M on ATI HD 5970, only 25M on NVidia GTX570 at 1600 MHz core clock, 310M on 8x ATI HD 6970, 181M on 7x NVidia GTX580 (1594 MHz). The numbers for oclhashcat-lite are very similar (57M, 26M, 297M, 181M, respectively). These are off the hashcat website. This does not use our new S-boxes yet (I expect that future versions of *hashcat tools will).

    Notice how the number for high-end NVidia is on par with that for our CPU, and for ATI is less than 3x better. Of course, GPUs do have an advantage, but it still does make sense to use CPUs as well, which a typical organization has more of and doesn't need to spend extra time to deploy, install drivers for, etc.

    Now, our new S-boxes and other optimizations will provide better performance. Per discussions with a tripcode cracker author, I expect all the way up to 400M c/s on ATI HD 5970, which is close to its theoretical peak speed (approx. 80% of it per some estimates). This is a 20x improvement over our figure for the Core i7 CPU, which is significant. (There's a little room for improvement on the CPU as well, though - specifically, if we pre-generate or runtime-patch the code for each salt as opposed to using pointers at runtime like we do now. This kind of optimization is assumed in the 400M figure for the GPU. So with both having the optimization, the GPU's advantage will be less than 20x.)

    Curiously, 400M c/s for 25 iterations of DES will mean that a single ATI HD 5970 with proper code will be able to crack 56-bit DES keys in just 42 days on average.

    So, yes, GPUs have an advantage, and we have contributed to that as well.

  6. Re:i7 what? Who cares? by m.dillon · · Score: 2

    Very cool. Any chance of using Intel's AES-NI instructions on the i5 and i7 to shortcut some of the work for the cpu version? Or are they too specific to AES? AES-NI can run AES (I forget exactly which one) at something like 3 GBytes/sec on an i7.

    (And, of course, programming it directly into a FPGA, but that's a separate topic).

    -Matt

  7. Re:Damn, right before the weekend by layer3switch · · Score: 2

    you mean 1974....

    --
    "Don't let fools fool you. They are the clever ones."
  8. Re:i7 what? Who cares? by solardiz · · Score: 2

    AES-NI is definitely too specific to AES, not reasonably reusable for DES. Yes, we have achieved a speed for DES comparable to that of AES with AES-NI.

    We're actually considering building a password hashing method on top of something like this, where bitslice DES has the advantage of being scalable to arbitrary SIMD vector widths and not requiring specialized instructions for efficient implementation. DES is also FPGA-friendly (more so than AES), and we have a project to implement password hashing for authentication servers equipped with FPGA boards:

    http://www.openwall.com/lists/crypt-dev/2011/04/05/2 - project rationale
    http://www.openwall.com/lists/crypt-dev/2011/05/09/1 - alternative approach

    We're also considering Eksblowfish-like constructions, though - such as to make use of Xilinx Block RAMs (and thus require attackers to use more resources too).

    BTW, not sure if I am speaking to the right Matt, but of the two SHA-crypt flavors the SHA-512 based one actually has a practical advantage over the SHA-256 one: more complete use of 64-bit CPUs in servers. So I think Dragonfly BSD's choice was a mistake. GPU implementations for both are being worked on, and the difference should be seen.

  9. Re:i7 what? Who cares? by solardiz · · Score: 2

    Bitwise operations are not an issue. Besides, we have versions of our S-box expressions that primarily use "bit select" instructions, such as ATI's BFI_INT - these work on PowerPC/AltiVec in JtR 1.7.8, but I think they will see even more use on high-end AMD/ATI GPUs (this is what we primarily had in mind).

    The real issue is register pressure (bitslice DES needs a lot of registers) and memory latencies. In our S-box expressions, we tried to minimize not only gate count, but also the number of registers needed for storage of temporary values in a software implementation. This was among the criteria applied to choose a few best versions among thousands of same-gate-count expressions that we generated. We also cared about the amount of inherent parallelism available in a single instance of the code for each S-box, even though it sort of contradicted the preference to require fewer registers.

  10. Re:i7 what? Who cares? by m.dillon · · Score: 2

    Yah, that's me (DragonFly). You mean for the default password encryption? We switched the password encryption to sha-256 by default as one of the google code-in projects. I wasn't particularly involved but I'm sure you are correct. All I can say is that it was better than what we had before :-)

    I guess SHA is turning out to not be quite as secure for its key-length as RSA was/is.

    My personal position on password files is that it doesn't matter how good the encryption is. Social, physical, and psychological algorithms radically reduce the number of attempts needed to figure out someone's badly thought-out password and things like requiring a number, a capital letter, and a minimum of 8 or 10 chars just isn't good enough. Businesses can't really require ultra-secure passwords without pissing off their user base, so it's a losing proposition no matter how you twist it.

    So we basically do not allow passwords at all on any of our development machines or servers, with exception to console access (a weak point to be sure but there isn't a whole lot I can do about it). The console server itself uses ssh. The master password file is *'d out.

    Developers have to login via ssh or not at all, and are not allowed to use private keys on shared shell machines (they have to key-forward or not fan-out from the shared shell machine at all, which is more preferable). Then a security breech can't fan-out into multiple vectors as it does when an (encrypted) master password file is stolen from a computer and the attacker has an infinite amount of time to crack multiple passwords offline, or if the private key is stolen.

    The biggest problem businesses have when their sites get hacked is not the closing of the security hole, but instead preventing the attacker from re-entering the system via an infinite number of broken password. Having to force millions of users to change their passwords is becoming a non-starter (I think Sony found that out the hard way).

    Random passwords might be more workable but since random passwords cannot be remembered anyway one might as well go with a more secure and much longer public key.

    That's my opinion anyhow :-)

    -Matt

  11. Re:DES is slow and 3DES is slower by solardiz · · Score: 2

    Slow? DES used to be slow prior to bitslicing. The 33 Gbps figure I mention is on par with that for AES using specialized instructions, but without reliance on such instructions. Sure, 3DES is 3x slower. But even for 3DES we get around 10 cycles/byte on one CPU core, which is on par with AES without specialized instructions. That said, data encryption with DES/3DES is in fact not the primary intended use for our results. We realize perfectly well that people want to hear "AES" these days.

    DES is being used for non-encryption a lot. Is authentication truly of no relevance to people that care about having secure encryption?

    Is security auditing or other work on/with existing systems that use DES as a component now not worthwhile? Should we treat them as black boxes? It is not realistic to expect all of them to be gone in a few years from now. So research on DES is still relevant. Granted, smaller S-box circuits don't directly enable an attack better than slightly faster key search, but they may be useful in further research, including in cryptanalysis of DES itself - e.g., bitslice implementations of DES were used for differential cryptanalysis of DES.

    There are side-channel attacks on AES. Sure, they are not always relevant, but so are the DES/3DES concerns you mention. In many cases, side-channel attacks are a practical threat.

    How many fully pipelined AES cores can you fit in an FPGA chip doing password hashing in an authentication server (with lots of parallelism included per one hash computation by our new hashing method)? And how many DES? The difference may be an order of magnitude, in favor of DES. And this means that our password hashes become this much slower to attack by CPUs/GPUs, compared to hypothetical hashes built on top of AES yet implemented in FPGA. (The small key and block sizes of DES may be dealt with by appropriate use of DES, and the slowdown is not a problem at all for this application - it's only efficient use of resources that matters.)

    We actually wanted to build a password hashing method on top of SHA-2 and/or AES - since this is what people want to hear - but it is so tempting to build upon DES and/or Blowfish instead, resulting in much better properties against a number of realistic attack scenarios (offline password cracking on different kinds of hardware) that we're seriously considering these. To make people happy, we might call this most important component "non-crypto", add a PBKDF2 with SHA-256 or SHA-512 step, and show how the cryptographic security of our hashing method as a whole only depends on the latter. Everyone is happy. But DES, if we use it in the "non-crypto" component, plays an important role.

    Summary: for some applications AES is better (perhaps for most of them), but for some DES is a better building block.

    Finally, circuit minimization has uses beyond DES, and similarly sized S-boxes exist in other ciphers. So advances in this area may have uses beyond DES.

  12. Re:Can You Tell It's DES? by owlstead · · Score: 2

    Nowadays, if the block size is 8 bytes, you can be pretty sure it is DES or triple DES. Keeping the algorithm secret has never worked well in cryptography.

    Decrypting single DES is now so easy* that the attacker would probably try it as one of the first things. DES is considered only good enough for real time crypto where offline analysis is not feasible or useful.

    So the short answer is yes.

    *as long as you've got enough to work on