17% Smaller DES S-box Circuits Found

ONLY 17%? by mozumder · 2011-07-01 06:41 · Score: 2

A tight datapath-driven layout on standard S-boxes has the potential to recover more area. just sayin...

Re:ONLY 17%? by Anonymous Coward · 2011-07-01 07:09 · Score: 1

That's true, but don't forget that on standard S-boxes the non-polynomial coefficients are 3A-congruent with the inverse configuration. You can't just generalise this across all sub-expressions, as Lehey proved. So you are right for some specific constraints on standard tightly-bound S-boxes, but that is not applicable in this case. You really have to focus on the general Booleans for any improvement. And 17% isn't too far from the theoretical limit of 22% either. So kudo's for the team!
Re:ONLY 17%? by solardiz · 2011-07-01 07:13 · Score: 5, Insightful

We were not the first to generate and try to optimize Boolean expressions for the S-boxes. Other researchers worked on this before, starting 1997 when Eli Biham wrote his classic paper on bitslice DES. 17% is our improvement compared to those previous results. To me, it is impressive that after 14 years and numerous attempts by others, including successful ones, it was still possible to improve on the previous best results by as much as 17% at once. My gut feeling is that further improvements, while definitely possible, will be more limited. But the again, some people I spoke to had thought that our 17% was not possible.
Re:ONLY 17%? by Anonymous Coward · 2011-07-01 07:27 · Score: 1

My hat is off to you for the longest stream of plausible technobabble I've ever heard.

Damn, right before the weekend by NoNonAlphaCharsHere · 2011-07-01 06:41 · Score: 1

This means the NSA won't have hardware versions of this running until next Tuesday.

Re:Damn, right before the weekend by osu-neko · 2011-07-01 07:22 · Score: 1

Don't be silly. They've had hardware versions of this running since 2006, they just didn't tell anyone. ;)

--
"Convictions are more dangerous enemies of truth than lies."
Re:Damn, right before the weekend by layer3switch · 2011-07-01 08:02 · Score: 2

you mean 1974....

--
"Don't let fools fool you. They are the clever ones."
Re:Damn, right before the weekend by Surt · 2011-07-01 09:45 · Score: 1

And they don't use this, they just use the backdoor.

--
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking

i7 what? Who cares? by Anonymous Coward · 2011-07-01 06:48 · Score: 0

Who cares what some CPU can do. The real question is how fast does it run on a modern GPU? Generally speaking encryption cracking and such is way faster on a GPU due this problem being highly parallel in nature.

Re:i7 what? Who cares? by Jeremiah+Cornelius · 2011-07-01 06:55 · Score: 1

5DES!

Here it comes. :-)

--
"Flyin' in just a sweet place,
Never been known to fail..."

Stupid question from crypto-newb by DriedClexler · 2011-07-01 07:09 · Score: 3, Interesting

Why did it take so long to find a shorter boolean expression? Aren't there programs that take in a truth table and churn through all the expressions that can generate it? And isn't the S-box I/O size pretty small to begin with?

--
Information theory is life. The rest is just the KL divergence.

Re:Stupid question from crypto-newb by TheLink · 2011-07-01 07:24 · Score: 1

Because most of the very smart people are more interested in doing other things?
--
- Too many replies beneath your current threshold
Re:Stupid question from crypto-newb by John+Meacham · 2011-07-01 07:25 · Score: 2

Determining whether two boolean circuits are equivalent is a famously difficult problem to solve. In fact, it can be reduced to SAT (http://en.wikipedia.org/wiki/Boolean_satisfiability_problem) which was the very first algorithm that was shown to be NP-complete, which in general means it is impractical to solve on real computers.

--
http://notanumber.net/
Re:Stupid question from crypto-newb by NoSig · 2011-07-01 07:32 · Score: 3, Interesting

Huge SAT problems are routinely solved on computers. In fact the CPU in your computer was probably formally tested in software by solving a huge SAT problem. NP-completeness does not necessarily mean that a problem can't be solved in practice even if it is huge. Complexity theory does provide an approximation to what is tractable, but it isn't all that accurate.
Re:Stupid question from crypto-newb by DriedClexler · 2011-07-01 07:32 · Score: 2

Sure, but you're not being given an arbitrarily large input size; it's something like 8 input/8 output at most. The traveling salseman problem isn't horrendously intractable if you only have e.g. 10 cities.

--
Information theory is life. The rest is just the KL divergence.
Re:Stupid question from crypto-newb by solardiz · 2011-07-01 08:00 · Score: 2

6-to-4 is large enough that you can't realistically find a perfect solution (the absolute smallest gate count) on present computers and given present knowledge. You can do it for 5-to-1, though. Also, generic Boolean expression minimization tools produce relatively poor results for DES S-boxes; specialized algorithms are the way to go. IIRC, I tried Espresso - http://en.wikipedia.org/wiki/Espresso_heuristic_logic_minimizer - in late 1990s. It couldn't even get close to Matthew Kwan's results from 1998, where he used a specialized algorithm.
Re:Stupid question from crypto-newb by Anonymous Coward · 2011-07-01 08:37 · Score: 0

N is the number of gates needed, not the number of input bits. Relative to the number of bits, the problem is actually double-exponential.
Re:Stupid question from crypto-newb by Anonymous Coward · 2011-07-01 09:05 · Score: 0

Why did it take so long to find a shorter boolean expression?

Because most of the very smart people are more interested in doing other things?
They were missing the right combination of very smart people, or they just didn't put in enough time.

Aren't there programs that take in a truth table and churn through all the expressions that can generate it?
A Karnaugh map is the direct approach. There's no program necessary. However, the result of a karnaugh map doesn't translate into efficient assembly language. It still takes a knowledgeable programmer to take a formula and turn it into optimized code.

Re:i7 what? Who cares? by pla · 2011-07-01 07:22 · Score: 3, Insightful

Who cares what some CPU can do. The real question is how fast does it run on a modern GPU? Generally speaking encryption cracking and such is way faster on a GPU due this problem being highly parallel in nature.

TFA has nothing to do with the target platform. If you can improve the algorithm itself by 17%, then it doesn't matter if that means (33/0.83)MB/s on a CPU or (333/0.83)MB/s on a GPU.

Granted, some optimizations may adapt worse (or better!) to the instructions available on a GPU than they do on a CPU, but the task of password cracking inherently parallelizes almost perfectly, so 17% just means 17%, period.

S-box re-ordering by CO_gun_toter · 2011-07-01 07:25 · Score: 1

What I thought was interesting about the S-box, was the obfuscation they introduced by where they selected the look-up bits from. Just re-order the S-box and use a straight 6 bit index....

Re:i7 what? Who cares? by betterunixthanunix · 2011-07-01 07:25 · Score: 1

3DES can offer up to 112 bits of security; even your GPU could check one key per nanosecond with 10000-way parallelism, you are looking at something to the effect of 16464665330209 years of work to do a brute force key search.

--
Palm trees and 8

Re:i7 what? Who cares? by solardiz · 2011-07-01 07:25 · Score: 2

Actually, a lot of people care about CPUs. I spoke to someone from a penetration testing company the other day. They run a lot of password hash cracking. And they have 10x more CPUs (used for other purposes as well) than GPUs (bought specifically for password cracking). Given that performance of DES-based crypt(3) on GPUs is by far not as impressive as it is for other hash types, they typically test this sort of hashes on CPUs and not GPUs.

That said, yes, when we worked on the S-boxes, we thought of GPUs as well. One of our target sets of "logic gates" is specifically that of high-end AMD/ATI GPUs (it also works well for Cell, PowerPC/AltiVec, and AMD XOP, but we deliberately excluded gates/instructions that are present on only some of these four platforms). The author of one of the GPU-based cracking tools (for tripcodes) reported a 20% improvement on Radeon HD 5970 due to our new S-boxes. Andrey Belenko of ElcomSoft wrote in a tweet that "Effect for GPUs might be well above 20%, actually."

Re:i7 what? Who cares? by SuricouRaven · 2011-07-01 07:33 · Score: 1

I calculate about ten to the fourteenth years worst-case, or half that average. Enough that brute force doesn't look very practical.

Re:i7 what? Who cares? by betterunixthanunix · 2011-07-01 07:43 · Score: 1

Yeah I should have been a bit clearer; that was a worst-case estimate. It is closer to ten to the thirteenth years by my calculations, but that is still far beyond the limits of practicality.

--
Palm trees and 8

Re:i7 what? Who cares? by solardiz · 2011-07-01 07:45 · Score: 4, Interesting

Here are some specific performance numbers for DES-based crypt(3) on GPUs (for comparison, recall that we're reporting over 20M c/s on a CPU):

oclhashcat-plus is reported to achieve 55M on ATI HD 5970, only 25M on NVidia GTX570 at 1600 MHz core clock, 310M on 8x ATI HD 6970, 181M on 7x NVidia GTX580 (1594 MHz). The numbers for oclhashcat-lite are very similar (57M, 26M, 297M, 181M, respectively). These are off the hashcat website. This does not use our new S-boxes yet (I expect that future versions of *hashcat tools will).

Notice how the number for high-end NVidia is on par with that for our CPU, and for ATI is less than 3x better. Of course, GPUs do have an advantage, but it still does make sense to use CPUs as well, which a typical organization has more of and doesn't need to spend extra time to deploy, install drivers for, etc.

Now, our new S-boxes and other optimizations will provide better performance. Per discussions with a tripcode cracker author, I expect all the way up to 400M c/s on ATI HD 5970, which is close to its theoretical peak speed (approx. 80% of it per some estimates). This is a 20x improvement over our figure for the Core i7 CPU, which is significant. (There's a little room for improvement on the CPU as well, though - specifically, if we pre-generate or runtime-patch the code for each salt as opposed to using pointers at runtime like we do now. This kind of optimization is assumed in the 400M figure for the GPU. So with both having the optimization, the GPU's advantage will be less than 20x.)

Curiously, 400M c/s for 25 iterations of DES will mean that a single ATI HD 5970 with proper code will be able to crack 56-bit DES keys in just 42 days on average.

So, yes, GPUs have an advantage, and we have contributed to that as well.

Re:i7 what? Who cares? by m.dillon · 2011-07-01 07:57 · Score: 2

Very cool. Any chance of using Intel's AES-NI instructions on the i5 and i7 to shortcut some of the work for the cpu version? Or are they too specific to AES? AES-NI can run AES (I forget exactly which one) at something like 3 GBytes/sec on an i7.

(And, of course, programming it directly into a FPGA, but that's a separate topic).

-Matt

Re:i7 what? Who cares? by Anonymous Coward · 2011-07-01 08:13 · Score: 0

Bitslicing tends to use lots of and/or/xor operations. GPU's aren't that much awesomer than CPU's in those operations.

Re:i7 what? Who cares? by solardiz · 2011-07-01 08:19 · Score: 2

AES-NI is definitely too specific to AES, not reasonably reusable for DES. Yes, we have achieved a speed for DES comparable to that of AES with AES-NI.

We're actually considering building a password hashing method on top of something like this, where bitslice DES has the advantage of being scalable to arbitrary SIMD vector widths and not requiring specialized instructions for efficient implementation. DES is also FPGA-friendly (more so than AES), and we have a project to implement password hashing for authentication servers equipped with FPGA boards:

http://www.openwall.com/lists/crypt-dev/2011/04/05/2 - project rationale
http://www.openwall.com/lists/crypt-dev/2011/05/09/1 - alternative approach

We're also considering Eksblowfish-like constructions, though - such as to make use of Xilinx Block RAMs (and thus require attackers to use more resources too).

BTW, not sure if I am speaking to the right Matt, but of the two SHA-crypt flavors the SHA-512 based one actually has a practical advantage over the SHA-256 one: more complete use of 64-bit CPUs in servers. So I think Dragonfly BSD's choice was a mistake. GPU implementations for both are being worked on, and the difference should be seen.

Re:i7 what? Who cares? by solardiz · 2011-07-01 08:35 · Score: 2

Bitwise operations are not an issue. Besides, we have versions of our S-box expressions that primarily use "bit select" instructions, such as ATI's BFI_INT - these work on PowerPC/AltiVec in JtR 1.7.8, but I think they will see even more use on high-end AMD/ATI GPUs (this is what we primarily had in mind).

The real issue is register pressure (bitslice DES needs a lot of registers) and memory latencies. In our S-box expressions, we tried to minimize not only gate count, but also the number of registers needed for storage of temporary values in a software implementation. This was among the criteria applied to choose a few best versions among thousands of same-gate-count expressions that we generated. We also cared about the amount of inherent parallelism available in a single instance of the code for each S-box, even though it sort of contradicted the preference to require fewer registers.

DES is slow and 3DES is slower by gweihir · 2011-07-01 08:58 · Score: 1

Both are no competition for AES at all. Also, 3DES is still of dubious security and DES can be brute-forced at home today. So this is really non-news, as it does not hold any relevance for people that care about having secure encryption.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:DES is slow and 3DES is slower by Anonymous Coward · 2011-07-01 09:12 · Score: 0

You may not use DES directly, but everytime you login to a Windows machine or use NTLM, the "LM" part is DES-based and benefits from this improvement. Besides, the goal here is /crack/ weak passwords through bruteforce, where speed is a huge concern.
Re:DES is slow and 3DES is slower by solardiz · 2011-07-01 09:44 · Score: 2

Slow? DES used to be slow prior to bitslicing. The 33 Gbps figure I mention is on par with that for AES using specialized instructions, but without reliance on such instructions. Sure, 3DES is 3x slower. But even for 3DES we get around 10 cycles/byte on one CPU core, which is on par with AES without specialized instructions. That said, data encryption with DES/3DES is in fact not the primary intended use for our results. We realize perfectly well that people want to hear "AES" these days.
DES is being used for non-encryption a lot. Is authentication truly of no relevance to people that care about having secure encryption?
Is security auditing or other work on/with existing systems that use DES as a component now not worthwhile? Should we treat them as black boxes? It is not realistic to expect all of them to be gone in a few years from now. So research on DES is still relevant. Granted, smaller S-box circuits don't directly enable an attack better than slightly faster key search, but they may be useful in further research, including in cryptanalysis of DES itself - e.g., bitslice implementations of DES were used for differential cryptanalysis of DES.
There are side-channel attacks on AES. Sure, they are not always relevant, but so are the DES/3DES concerns you mention. In many cases, side-channel attacks are a practical threat.
How many fully pipelined AES cores can you fit in an FPGA chip doing password hashing in an authentication server (with lots of parallelism included per one hash computation by our new hashing method)? And how many DES? The difference may be an order of magnitude, in favor of DES. And this means that our password hashes become this much slower to attack by CPUs/GPUs, compared to hypothetical hashes built on top of AES yet implemented in FPGA. (The small key and block sizes of DES may be dealt with by appropriate use of DES, and the slowdown is not a problem at all for this application - it's only efficient use of resources that matters.)
We actually wanted to build a password hashing method on top of SHA-2 and/or AES - since this is what people want to hear - but it is so tempting to build upon DES and/or Blowfish instead, resulting in much better properties against a number of realistic attack scenarios (offline password cracking on different kinds of hardware) that we're seriously considering these. To make people happy, we might call this most important component "non-crypto", add a PBKDF2 with SHA-256 or SHA-512 step, and show how the cryptographic security of our hashing method as a whole only depends on the latter. Everyone is happy. But DES, if we use it in the "non-crypto" component, plays an important role.
Summary: for some applications AES is better (perhaps for most of them), but for some DES is a better building block.
Finally, circuit minimization has uses beyond DES, and similarly sized S-boxes exist in other ciphers. So advances in this area may have uses beyond DES.
Re:DES is slow and 3DES is slower by owlstead · 2011-07-01 13:32 · Score: 1

AES is a pretty good algorithm, but I don't think it is good enough precisely because of the side channel weaknesses. I would imagine that another round of competition within the crypto community is necessary right after the SHA-3 competition. Currently, 3DES is probably a better choice for many functions than AES because of the possible side channel attacks.
Regarding the hashing, you are probably better off choosing Skein. It comes with it's own cipher (Threefish), which would also allow some more interesting modes than CBC (which is e.g. vulnerable to padding attacks). Choosing 3DES because it is better in some circumstances than AES still seems to be a step in the wrong direction, there are better candidates.
When I started off in the practical use of crypto, I was thinking that the larger cryptographic community was in front of crypto-analysis. 10 years on, I'm not so sure. It's not like the algorithms are that bad, it seems to me that there are so many ways to make a protocol vulnerable that the actual security has very very little to do with key size (or block size) for that matter.
My company is still using 3DES quite a lot, that's for certain.
Re:DES is slow and 3DES is slower by gweihir · 2011-07-03 22:05 · Score: 1

They still use DES???? Urgh. Ok, then this is indeed of interest.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:DES is slow and 3DES is slower by gweihir · 2011-07-03 22:11 · Score: 1

I do understand your arguments. The primary problem I have with it is that as it can be brute-forced, you need to take a lot more care when using it as building block. True, if you really understand what you are doing, it can be used securely in a number of ways. Using it as part of a hashing-scheme is valid. However, DES is entirely unsuitable for its primary purpose, namely encryption, today. And when doing hashing, it competes against a number of hash functions as well. The article was lacking these limitations. On the other hand, every real expert will see that and not be turned away from looking at this work by my comment ;-)

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:DES is slow and 3DES is slower by cool_arrow · 2011-07-13 07:49 · Score: 1

Question: given that you can't really prove the security of any encryption algorithm, why isn't enciphering data with multiple algorithms and independent keys more common? Is it too difficult to do or perhaps too slow and impractical?
Re:DES is slow and 3DES is slower by owlstead · 2011-07-13 12:30 · Score: 1

It's slow, impractical (we're having enough problems with protocols) and may not offer the same security as simply adding more rounds or complexity to existing algorithms. It's likely to take more memory (think embedded or smart card) as well. It may not help at all against many side channel attacks. And as I said, most of the time it's not the algorithm that's the problem. It's the system that it is deployed in that's vulnerable, not the algorithm itself.
Think XML encryption. Very nicely spec'ed, but try and use it online without cryptographically safe integrity checks and you may end up with a side channel attacks that takes 128 tries on average per byte (random oracle attack), regardless of the algorithm. I won't go into detail on how many systems deploy 2048 bit RSA keys, but are not kept up to date, leaving it vulnerable to any hacker or script kiddie that comes along. That's just from the top of my head, the list is endless.

Re:i7 what? Who cares? by m.dillon · 2011-07-01 09:04 · Score: 2

Yah, that's me (DragonFly). You mean for the default password encryption? We switched the password encryption to sha-256 by default as one of the google code-in projects. I wasn't particularly involved but I'm sure you are correct. All I can say is that it was better than what we had before :-)

I guess SHA is turning out to not be quite as secure for its key-length as RSA was/is.

My personal position on password files is that it doesn't matter how good the encryption is. Social, physical, and psychological algorithms radically reduce the number of attempts needed to figure out someone's badly thought-out password and things like requiring a number, a capital letter, and a minimum of 8 or 10 chars just isn't good enough. Businesses can't really require ultra-secure passwords without pissing off their user base, so it's a losing proposition no matter how you twist it.

So we basically do not allow passwords at all on any of our development machines or servers, with exception to console access (a weak point to be sure but there isn't a whole lot I can do about it). The console server itself uses ssh. The master password file is *'d out.

Developers have to login via ssh or not at all, and are not allowed to use private keys on shared shell machines (they have to key-forward or not fan-out from the shared shell machine at all, which is more preferable). Then a security breech can't fan-out into multiple vectors as it does when an (encrypted) master password file is stolen from a computer and the attacker has an infinite amount of time to crack multiple passwords offline, or if the private key is stolen.

The biggest problem businesses have when their sites get hacked is not the closing of the security hole, but instead preventing the attacker from re-entering the system via an infinite number of broken password. Having to force millions of users to change their passwords is becoming a non-starter (I think Sony found that out the hard way).

Random passwords might be more workable but since random passwords cannot be remembered anyway one might as well go with a more secure and much longer public key.

That's my opinion anyhow :-)

-Matt

NTLM hasn't been in active use for a while by YesIAmAScript · 2011-07-01 10:05 · Score: 1

NTMLv1 was the problem and it hasn't been used unless talking to very old servers in a very long time. You could always turn off NTLM password storage (since XP at least) and in Vista and Windows 7 it is off by default.

NTMLv2 uses a challenge response system and so you can't offline crack it in the same way.

--
http://lkml.org/lkml/2005/8/20/95

Re:NTLM hasn't been in active use for a while by solardiz · 2011-07-01 10:21 · Score: 1

NTMLv2 uses a challenge response system and so you can't offline crack it in the same way.
John the Ripper, in the -jumbo versions (community-enhanced), includes support for cracking of both NTLMv1 and NTLMv2 challenge/responses - see NETNTLM_README in the documentation and NETNTLM_fmt.c and NETNTLMv2_fmt.c in the source tree.
Re:NTLM hasn't been in active use for a while by Anonymous Coward · 2011-07-01 14:43 · Score: 0

LANMAN password storage is possible to disable, but it IS enabled by default in newer operating systems. Dump the hashes yourself and verify.

Re:i7 what? Who cares? by solardiz · 2011-07-01 10:10 · Score: 1

Matt -

I have so many comments on what you wrote that I don't dare to post them. :-) But I'll say a few things:

Password policies still make sense to me when combined with modern (salted and stretched) password hashes, particularly for large user databases where each account is of relatively little value (your Sony example applies here). Rather than absolutely require certain character classes, users should also be given the option to use longer passphrases, where the number of required character classes can be reduced to 2. I think you have our passwdqc in DragonFly (via FreeBSD), right? Well, it includes passphrase support by default, starting with 3 words of combined length 11, including separator characters - or longer, indeed.

Thank you for describing your authentication methods policy for developers. We use a similar policy for multi-developer or multi-sysadmin projects: http://openwall.info/wiki/internal/ssh

- Alexander

Can You Tell It's DES? by Doc+Ruby · 2011-07-01 13:07 · Score: 1

If I encrypt a string with DES, and you intercept it but I didn't tell you it's DES, can you tell by inspecting the ciphertext that it's DES (and then use DES cracking tools to crack it)?

--

--
make install -not war

Re:Can You Tell It's DES? by owlstead · 2011-07-01 13:38 · Score: 2

Nowadays, if the block size is 8 bytes, you can be pretty sure it is DES or triple DES. Keeping the algorithm secret has never worked well in cryptography.
Decrypting single DES is now so easy* that the attacker would probably try it as one of the first things. DES is considered only good enough for real time crypto where offline analysis is not feasible or useful.
So the short answer is yes.
*as long as you've got enough to work on
Re:Can You Tell It's DES? by Doc+Ruby · 2011-07-01 18:39 · Score: 1

So if I encrypt with DES, then run a trivial extra transform (eg. ROT12(destext)), it won't be exactly DES and just running undes(ciphertext) won't decrypt it.
My point isn't to be annoying. It's that there could be value to DES between parties that know it's "DES + ROT12", or whatever trivial function is added, despite DES itself being without real protection value. Especially since DES is now so fast to run.

--
--
make install -not war

Slashdot Mirror

17% Smaller DES S-box Circuits Found

45 comments