solardiz · Slashdot Mirror

Re:Useless + new attack vector on LKRG: A Loadable Linux Kernel Module for Runtime Integrity Checking (bleepingcomputer.com) · 2018-02-05 04:04 · Score: 1

20 years ago I used to build Linux 2.0.x without module support, and this sort of made sense. It still could make a little bit of sense, but the reality is that current kernels are huge (meaning an increase in vulnerability count, too) and most systems are running distro-provided kernels built with module support anyway.

If you want to protect the kernel from root, you need something other than current LKRG "main" branch. Something like Adam's in my opinion even more controversial LKRG "experimental" branch (in the same Bitbucket repo, feel free to explore), which implements what I call (in e-mails with Adam) BSD securelevel on steroids (oh and yes, I used securelevel 20 years ago too; gave up since), and which I find suitable only for a minority of users (sysadmins) who are able to configure that thing reasonably. Of course, it deals with further module (un)loading, and other "legitimate" ways that root could backdoor the kernel.

LKRG "main" as currently announced is for easier reasonable use on typical systems. It doesn't protect the kernel from root (authorized root, or root obtained via e.g. userspace exploits or, unfortunately, via a subset of kernel exploits that will bypass LKRG), but it detects some kernel vulnerability exploits (no, not only for known vulnerabilities - that's an error in the BleepingComputer & Slashdot story) and protects the kernel from those (as well as from other unauthorized changes, such as Rowhammer bitflips).

Your reference to April fools is spot on. This is our most controversial project ever (as the very first sentence of our announcement says), and on January 29 when we made the announcement I happened to say in a chat with Adam: "i wish it were closer to April 1, but that would have been too long a wait ;-)"

We're not delusional, and we try to do our best not to mislead the prospective users of LKRG (see also my other comments here, and the original announcement).

Re:good & bad? on LKRG: A Loadable Linux Kernel Module for Runtime Integrity Checking (bleepingcomputer.com) · 2018-02-05 03:41 · Score: 1

Interesting project, however now i wonder how many people will opt for using this module (which could be easily activated & used) instead of properly patching their systems.

Yes, we share this concern. What matters even more: will fewer or more systems get compromised as a result? Or even: will the cumulative damage of those compromises decrease or increase? We have no answers to these, yet we feel that an imperfect security measure like this may have its reasonable uses on some systems.

The module only detects known vulnerabilities, if you are running a tight ship, you should be all patched and what use does this have then?

"Only detects known vulnerabilities" is an error in the BleepingComputer article, but regardless - yes, ideally you should be all patched, but realistically you might not be and there might be yet unknown vulnerabilities where LKRG, as long as it's relatively unpopular, will likely just happen to defeat the usual/straightforward exploits for them. That said, we do in fact only recommend use of LKRG for systems where you expect not to be up-to-date with all patches anyway, and ask that admins of better maintained systems think twice before deciding on possibly using LKRG.

Re: Known vulnerabilities on LKRG: A Loadable Linux Kernel Module for Runtime Integrity Checking (bleepingcomputer.com) · 2018-02-05 03:28 · Score: 3, Insightful

Actually, it's one of several errors in BleepingComputer's rewording of our original announcement. I am grateful to them for reporting on our work and I understand that journalists have to reword for original content and copyright reasons, but this inevitably leads to errors, and we'd be happier with people reading our original announcement.

No, LKRG is not just for known vulnerabilities. It is both for currently known and for future vulnerabilities that are yet unknown, but it's limited in the vulnerability categories and exploitation/persistence methods that it will catch.

In the original announcement, we acknowledge that LKRG is highly controversial, can be bypassed, is limited in what it can do, and isn't always a good idea to use. We say that it provides merely the controversial notion of security through diversity (as long as LKRG, or a given branch of it, is not very popular), much like running an uncommon OS kernel would, but without the usual drawbacks of actually running an uncommon OS.

Indeed, that's not perfect security, unlike what fixing all security vulnerabilities would be - but realistically the Linux kernel is monolithic and so huge (and growing) and distros enable so much of its functionality by default (including with module auto-loading, ouch) that in practice it will always have plenty of vulnerabilities anyway, and a crutch like LKRG may fit some Linux installs just right, unfortunately. It would not make them "secure" - just reduce the percentage of successful compromises in the real world.

We try to give some guidelines on where LKRG may be beneficial (on systems that are not well-maintained or not promptly rebooted into new kernels anyway) and where it's probably not (on otherwise hardened and/or well-maintained and promptly updated/rebooted/live-patched systems). We're not delusional, and we try to do our best not to mislead the prospective users of LKRG.

More relevant links on Researchers Reverse-Engineer Dropbox, Cracking Heavily Obfuscated Python App · 2013-08-28 01:27 · Score: 1

Presentation slides (view online or download PDF), and links to the paper (PDF) and "dedrop" source code (GitHub):
http://www.openwall.com/presentations/WOOT13-Security-Analysis-of-Dropbox/

USENIX WOOT '13 web page dedicated to this talk, including video and audio (view/listen online or download the video .mp4 via a direct link from there):
https://www.usenix.org/looking-inside-drop-box

(Somehow the Slashdot story only links to a third-party article and to the paper PDF, but not to any of the authors' and the conference's web-based content.)

Re:why not adapt on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 14:12 · Score: 4, Interesting

The existing password hashing methods won't run on GPU well for user authentication, even when they do run well for cracking passwords. They lack sufficient parallelism within one hash computation. This is an issue I first raised in 1998, in pre-GPU context (it applies to recent CPUs as well, and the problem is getting worse with time).

A solution is to define a new password hashing method with sufficient (configurable) parallelism within one instance. We could then consider running it on GPU, unless it is GPU-unfriendly by other criteria. Do we really want to, though? GPUs in servers are not yet common, except in computing clusters. Their reliability may be lower than that of other typical server components. The drivers are currently relatively unreliable as well (although they may be reliable enough if running the same code, with no upgrades). Sure, computing clusters use them anyway, and get them to run reliably enough for their needs, but the extra hurdle and/or risk is there. Will we get embedded GPUs in typical servers soon? Will they be similar to current gamers' or HPC GPUs or not? This is not clear. Then there's Intel MIC, which delivers GPU-like performance, but is a lot closer to a CPU - it will require a lot of parallelism in the algorithm too, but it may run certain types of otherwise GPU-unfriendly code. Is this possibly a better target?

For current GPUs, a better strategy might be to make them inefficient - by using GPU-unfriendly hashes (for cracking, and for validation as well - as a side-effect).

We had a project last summer to research these kinds of possibilities, focusing on use of FPGA boards in authentication servers. This could optionally buy us GPU-unfriendliness (if we want to make things more difficult for attackers with GPUs, but not FPGAs, and for botnets, which almost surely will lack FPGAs). We even considered some moderate CPU-unfriendliness of the component that we'd put on FPGA. Specifically, we experimented with bcrypt on FPGA, as well as with much smaller Blowfish-like "non-crypto" cores (not actual Blowfish), so that we could hopefully fit hundreds or thousands of those per chip (and have them somewhat CPU-unfriendly as well). Yuri, our GSoC 2011 student working on this project, did have some of this implemented in an experimental fashion, and some of it even worked (on FPGA boards kindly provided by Pico Computing), but an outcome of the summer project was that this would be time-consuming to bring to desired levels of performance and reliability. At that point, the project was put on hold.

A simpler and cheaper alternative (if there are only a handful of customers for this) may be to use dedicated servers, existing HSMs, or microcontrollers for just the password hashing. Indeed, microcontrollers are super slow, so their only function would be to hold and apply a local parameter, with the rest of the hashing method implemented on the host's CPU and RAM. If dedicated servers are used, they would need to be separate from authentication servers - that is, they won't know usernames, won't have access to any database, won't have any persistent storage except for the local parameter, and the OS and software indeed. They will accept password, salt, and parameters (such as the configurable per-hash processing and memory cost settings), and provide the hash. Thus, their attack surface would be minimal and they'd provide an extra layer of security against network-based attacks. We'd do this with FPGA boards as well, and we'd also have the greater/unusual computational complexity as a security layer (in case the local parameter or its backup copy is leaked/stolen), but well - using typical and pre-existing server hardware, drivers, etc. is just simpler and cheaper unless we start a new business and expect to have plenty of customers (although that might be possible).
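
A minimal sketch of the narrow interface described above: the hashing box takes a password, a salt, and a cost setting, and returns only the hash. Everything named here is an assumption made for illustration: `hash_service` and `LOCAL_PARAMETER` are hypothetical names, PBKDF2-HMAC-SHA-256 is only a stand-in for whatever costly hashing method would actually be chosen, and applying the local parameter as an HMAC key is one possible way to do it, not a method taken from the comment.

```python
import hashlib
import hmac
import os

# Hypothetical local parameter: a secret kept only on the hashing box.
# Generated here for the demo; a real deployment would load a persistent one.
LOCAL_PARAMETER = os.urandom(32)


def hash_service(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Accept password, salt, and a cost setting; return only the hash."""
    # Stand-in for the costly part of the hashing method (an assumption).
    stretched = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    # Apply the local parameter as an HMAC key over the stretched value.
    return hmac.new(LOCAL_PARAMETER, stretched, hashlib.sha256).digest()


# The caller (the authentication server) stores the salt and the hash;
# the hashing box itself never sees usernames or the user database.
salt = os.urandom(16)
stored = hash_service(b"correct horse", salt, 10_000)
same = hmac.compare_digest(stored, hash_service(b"correct horse", salt, 10_000))
different = hmac.compare_digest(stored, hash_service(b"wrong guess", salt, 10_000))
print(same, different)
```

The point of the split is that a leaked hash database alone is not enough for offline guessing: the attacker also needs the local parameter, which lives only on the box with the minimal attack surface.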

Re:useless for strong passwords on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 13:21 · Score: 1

Yes, you don't need to reference a dictionary when you approve something as a passphrase. passwdqc does not reference a dictionary for that. However, you mentioned requiring "a couple of punctuation chars" - and this almost ensures at least 3 words (well, it could also be e.g. a single word mangled by inserting/replacing characters, which is why I said "almost"). passwdqc has a similar requirement, although it does not insist on punctuation specifically. For non-repeats, passwdqc insists on there being enough different characters for the required minimum length (not for the actual length - there's no problem with some repetition if the minimum would be reached without those portions anyway).

Re:useless for strong passwords on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 13:03 · Score: 2

Sure, but fully going from a passphrase to a phrase-derived short password likely reduces guessing entropy (not the same thing as Shannon entropy, by the way; the latter is even more obviously reduced, but is less relevant). In fact, I think some of the passwords that JtR cracks in its incremental mode (which considers character frequencies) are actually built using the first-letter-of-each-word method. Indeed, many of those passwords will happen to use a subset of possible characters only - those that are more common as first letters of words. In your example, this is defeated by the use of non-letters and capital letters (which may be less frequent than even the least frequent lowercase letters at these character positions), but still. Overall, I think c2RlfV&t would be a decent password for many kinds of uses (if you did not post it, indeed), but it would not provide equivalent security to that of the original passphrase (if that one were not posted as well).

Re:useless for strong passwords on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 12:21 · Score: 3, Informative

On a serious note, entropy grows with length less than linearly, and you've provided a good example of that. This means that there's little point in using a passphrase this long. A replacement for yours could be: "cannon to R,L,F of them Volley'd and thunder'd" - perhaps about as easy (or as difficult) to memorize and recall reliably, likely roughly the same guessing entropy, but much shorter to type.

Re:useless for strong passwords on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 12:01 · Score: 4, Informative

I've been using passphrases for about 12 years (and more than that if we count those passphrases on PGP and SSH keys as well), and I'm not growing out of it yet. I often use mixed-character-type passwords as well, and my phrases often use weird word separators, misspelled and/or partial words (less typing, same or better security if you do it right), different languages, etc. The number of words also varies (but with too few words other bits of complexity have to be introduced). For me, what is easier or harder to memorize varies depending on what kind of suitable idea I happen to have at a given time. Besides, the variety in password/phrase types buys me a few extra bits of entropy. Even an attacker who has read this comment or cracked a few of my passwords somewhere doesn't come up with one single pattern on password type that I use - because there are many. Thus, let your users choose between short but complicated passwords and longer but less complicated phrases. Similarly, let them choose between server-generated strings and user-chosen ones (the latter may be subject to policy enforcement). Our passwdqc tool set (PAM module, library, program for use from scripts) gives all of these options by default (but they can be disabled in any combination...) For server-generated strings, passwdqc uses 3-word phrase-like ones, with non-whitespace separators (out of a set of 8) and random word capitalization by default - that's 47 bits, which is currently sufficient in most user authentication contexts when used along with bcrypt hashes. With 4 words and the same approach, it's 64 bits ("pwqgen random=64" will do that) - but that is rarely needed with a decent password hash. (It is reasonable for data encryption keys, though - plus some 20 bits of stretching with a decent KDF.)

Re:Secure Remote Password verifier on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 11:32 · Score: 0

This comment about SRP is duplicate. I've replied to the other instance of it in detail, up in the thread. In short: SRP is great, but, no, it is not an alternative to better password hashing.

Re:PBKDF2 on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 11:25 · Score: 5, Informative

SRP is great, but it does not eliminate the need for better password hashing - rather, these things may/should be used together. It does not take breaking DH to merely probe candidate passwords against a stolen/leaked SRP verifier. The Wikipedia article you referenced says that "using of functions like PBKDF2 instead of H for password hashing is highly recommended", and they were referring to the password stretching aspect. Other properties of the hashing method are also relevant, just like they are to "regular" password hashes.
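
A rough sketch of why a leaked verifier can be probed offline without breaking DH. It assumes the verifier is built as in RFC 5054, x = SHA1(salt | SHA1(username ":" password)) and v = g^x mod N. The group below is a toy chosen only to keep the example short (real SRP uses the large standardized groups), and the username, salt, and candidate list are made up.

```python
import hashlib

# Toy group for illustration only; NOT a safe SRP group.
N = 2**127 - 1  # a Mersenne prime
g = 3


def srp_x(salt: bytes, username: str, password: str) -> int:
    """x = SHA1(salt | SHA1(username ":" password)), as in RFC 5054."""
    inner = hashlib.sha1(f"{username}:{password}".encode()).digest()
    return int.from_bytes(hashlib.sha1(salt + inner).digest(), "big")


def srp_verifier(salt: bytes, username: str, password: str) -> int:
    """v = g^x mod N: this is what the server stores."""
    return pow(g, srp_x(salt, username, password), N)


def probe(salt: bytes, username: str, verifier: int, candidates):
    """Offline guessing: recompute the verifier for each candidate password."""
    for guess in candidates:
        if srp_verifier(salt, username, guess) == verifier:
            return guess
    return None


salt = bytes.fromhex("beb25379d1a8581eb5a727673a2441ee")
leaked = srp_verifier(salt, "alice", "password123")
found = probe(salt, "alice", leaked, ["letmein", "password123", "qwerty"])
print(found)
```

Each guess costs two SHA-1 computations and one modular exponentiation, with no discrete logarithm involved. That is why a stretched (and preferably otherwise attacker-unfriendly) hash in place of plain H still matters.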

In fact, I complained to Tom Wu about SRP's use of non-iterated SHA-1 in 2000, and I had an e-mail exchange on a similar topic in SPEKE context with David Jablon in 1998 or so. Since then (or at about that time), the need for heavy to compute underlying hashes even along with zero-knowledge password proofs became widely recognized. I am not really into the latter topic, but I did my little bit to influence that field in that minor aspect (and I'm sure many others did as well).

Re:Inferior products always hit the news. on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 10:46 · Score: 3, Interesting

I fully expected a comment just like yours. :-) hashcat is in fact superior in many ways, but JtR is superior in many others. In the context of this story, since when does hashcat support sha512crypt and bcrypt on GPU? Last time I checked (just before releasing JtR 1.7.9-jumbo-6), it did not. I've just re-checked - as far as I can see, it still does not. So hashcat could not possibly be used for the comparison that this story is about, at this time.

My guess, based on recent hashcat user polls and atom's comments on the forums (yes, I sometimes skim over the topics), is that atom will in fact add support for sha512crypt on GPU soon (especially now that JtR has it, and hashcat "got to" compete and show a better speed, which it likely will) - in fact, even reusing our code is possible since we've BSD-licensed that portion, but I doubt that atom would do that. I am less certain about bcrypt. BTW, atom's expectation, stated on their forums, was that sha512crypt would be only 2-3 times faster on GPU than it is on CPU. We achieved 5.5x, which is thus not bad. Admittedly, the CPU code could be rewritten to use SIMD and be roughly twice as fast - thereby bringing us to the 2-3x expectation.

Also, some of us prefer Open Source, even if in some aspects a given implementation is inferior at a given time. Besides the current preferences/beliefs, guess what happens in case at some point atom loses interest in further hashcat development and does not release the sources under an Open Source license - or if something bad happens (I hope not!) preventing him from being able to do that? So far, hashcat is only ~2.5 years old and it is proprietary. (And yes, I am very impressed by what atom did in just 2 years.) John the Ripper has been around since 1996 and it is Open Source. BTW, this difference also means that hashcat can freely borrow low-level implementation ideas from us if atom wanted to (although I think he's good enough on his own not to use this option), whereas hashcat's EULA (as of the last time I checked, which was a long while ago) prevents us from doing the same even via reverse-engineering if we wanted to (although apparently this is not enforceable in many jurisdictions or in case the person never accepted the EULA; no, we don't rely on that and we don't RE hashcat).

Anyhow, I don't think there would be any issue in having a hashcat-focused news story if you or someone else posts one at a right time. :-)

Re:useless for strong passwords on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 10:01 · Score: 3, Interesting

I am all for passphrases. We've been supporting them in our passwdqc password/passphrase strength checking and policy enforcement tool (initially just a PAM module, then more) since I wrote it in 2000.

Implementation detail: when enforcing passphrase policy, we need to insist on some separators between words being present. passwdqc does, in order for the string to qualify as a passphrase rather than a password. Apparently, Dropbox does not, and I think that's a flaw. No wordlist can be comprehensive, and a separator-less passphrase is indistinguishable to a password/passphrase strength checker from a long and somewhat obscure dictionary word. Indeed, any passphrase (or a multi-word portion of it) can happen to be found in a dictionary (or on the web, etc.) as well - or just be reused by the user across multiple sites - but that's a somewhat different issue.
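
A simplified sketch of the separator requirement, not passwdqc's actual algorithm. The function name, the ASCII-only notion of a "word", and the default thresholds are assumptions made for illustration; no dictionary is consulted.

```python
import re


def looks_like_passphrase(candidate: str, min_words: int = 3,
                          min_length: int = 11) -> bool:
    """Hypothetical check: enough separator-delimited words, enough length."""
    # Treat any run of non-alphanumeric ASCII characters as a separator.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", candidate) if w]
    return len(words) >= min_words and len(candidate) >= min_length


with_separators = looks_like_passphrase("correct horse battery")
without_separators = looks_like_passphrase("correcthorsebattery")
print(with_separators, without_separators)
```

Without separators the second string counts as a single word, so it would have to pass the stricter password rules instead, which is the behavior argued for above.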

Re:PBKDF2 on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 09:45 · Score: 2

The design of PBKDF2, and the NIST publication you referenced, do not consider the difference in processing cost to defender vs. attacker, whereas that is precisely the aspect I've been focusing on in my analysis. PBKDF2 does nothing to bring the validation vs. cracking speed ratio close to 1.0.

Re:useless for strong passwords on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 09:26 · Score: 5, Interesting

The fact that not every password is likely to be cracked is precisely what makes password security audits with John the Ripper useful. If every password were getting cracked, there would be fewer legitimate uses for the tool. ;-)

Memorizing one 16-character mixed-case alphanumeric password is realistic, but it does not help you all that much unless it's a "master password" (e.g., used to access an encrypted password manager database or to generate other passwords from or to access an encrypted filesystem where you store other passwords in plaintext), because you'd have difficulty memorizing a large number of unique and dissimilar passwords of this kind. Either way, if you're developing a server application or administering a server where users can register with passwords (maybe as one of the authentication options, not necessarily the only one), it becomes sort of your responsibility to make your users' passwords less likely to be cracked, even if the server security is temporarily compromised (you should assume that this might happen). Note that many of your users' passwords might be weaker than you would have liked them to be, and you don't want to enforce too strict a password policy (as that's a tradeoff). This is where the choice of hashing method to use matters, letting you use a less strict password policy for the same level of security or/and resulting in fewer passwords getting cracked (even with no enforced policy, since some people will choose medium complexity passwords on their own).

Re:PBKDF2 on John the Ripper Cracks Slow Hashes On GPU · 2012-07-04 09:07 · Score: 5, Interesting

You make a valid point. I do intend to add a mention of PBKDF2 to a revised version of my presentation, and I am likely to use it or at least HMAC as a component if I design a new password hashing method - not so much because of actual need, but mostly to have an easy and convincing answer about cryptographic security. ;-) However, in the context of this announcement PBKDF2 is arguably less relevant, and it is inferior to the alternatives being considered specifically in the GPU-friendliness aspect (it is more GPU-friendly than all three of SHA-crypt, bcrypt, scrypt). In scrypt, PBKDF2 is used (with SHA-256) to provide/demonstrate cryptographic security, but mostly not computational cost, whereas the analysis here is about the latter, under assumption that all of the alternatives being seriously considered are sufficiently secure cryptographically.

This release of John the Ripper supports PBKDF2 on GPU as well - in the included WPA-PSK cracking code. The release announcement shows a 27x speedup over the also-included CPU code when going from FX-8120 CPU (8 threads) to HD 7970 GPU for WPA-PSK cracking (PBKDF2-HMAC-SHA-1), which clearly shows that it is very GPU-friendly. With SHA-512, it'd be a lot less GPU-friendly, but likely not even to the point of sha512crypt.
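
For reference, the WPA-PSK computation mentioned above is PBKDF2-HMAC-SHA-1 with the SSID as salt, 4096 iterations, and a 256-bit output. A minimal sketch using Python's standard `hashlib.pbkdf2_hmac`; the SSID, passphrase, and candidate list are made up for illustration.

```python
import hashlib


def wpa_psk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit WPA/WPA2 pre-shared key from a passphrase."""
    # PBKDF2-HMAC-SHA-1, SSID as salt, 4096 iterations, 32-byte output.
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(),
                               4096, 32)


def try_candidates(target: bytes, ssid: str, candidates):
    """Each guess repeats the full 4096-iteration derivation."""
    for guess in candidates:
        if wpa_psk(guess, ssid) == target:
            return guess
    return None


target = wpa_psk("sunshine2012", "HomeNet")
cracked = try_candidates(target, "HomeNet",
                         ["password", "sunshine2012", "letmein"])
print(cracked)
```

Every guess is a fixed chain of small SHA-1 computations with almost no memory use and no data-dependent branching, which is the kind of workload that maps well onto GPUs.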

Re:NTLM hasn't been in active use for a while on 17% Smaller DES S-box Circuits Found · 2011-07-01 10:21 · Score: 1

NTLMv2 uses a challenge response system and so you can't offline crack it in the same way.

John the Ripper, in the -jumbo versions (community-enhanced), includes support for cracking of both NTLMv1 and NTLMv2 challenge/responses - see NETNTLM_README in the documentation and NETNTLM_fmt.c and NETNTLMv2_fmt.c in the source tree.

Re:i7 what? Who cares? on 17% Smaller DES S-box Circuits Found · 2011-07-01 10:10 · Score: 1

Matt -

I have so many comments on what you wrote that I don't dare to post them. :-) But I'll say a few things:

Password policies still make sense to me when combined with modern (salted and stretched) password hashes, particularly for large user databases where each account is of relatively little value (your Sony example applies here). Rather than absolutely require certain character classes, users should also be given the option to use longer passphrases, where the number of required character classes can be reduced to 2. I think you have our passwdqc in DragonFly (via FreeBSD), right? Well, it includes passphrase support by default, starting with 3 words of combined length 11, including separator characters - or longer, indeed.

Thank you for describing your authentication methods policy for developers. We use a similar policy for multi-developer or multi-sysadmin projects: http://openwall.info/wiki/internal/ssh

- Alexander

Re:DES is slow and 3DES is slower on 17% Smaller DES S-box Circuits Found · 2011-07-01 09:44 · Score: 2

Slow? DES used to be slow prior to bitslicing. The 33 Gbps figure I mention is on par with that for AES using specialized instructions, but without reliance on such instructions. Sure, 3DES is 3x slower. But even for 3DES we get around 10 cycles/byte on one CPU core, which is on par with AES without specialized instructions. That said, data encryption with DES/3DES is in fact not the primary intended use for our results. We realize perfectly well that people want to hear "AES" these days.

DES is being used for non-encryption a lot. Is authentication truly of no relevance to people that care about having secure encryption?

Is security auditing or other work on/with existing systems that use DES as a component now not worthwhile? Should we treat them as black boxes? It is not realistic to expect all of them to be gone in a few years from now. So research on DES is still relevant. Granted, smaller S-box circuits don't directly enable an attack better than slightly faster key search, but they may be useful in further research, including in cryptanalysis of DES itself - e.g., bitslice implementations of DES were used for differential cryptanalysis of DES.

There are side-channel attacks on AES. Sure, they are not always relevant, but neither are the DES/3DES concerns you mention. In many cases, side-channel attacks are a practical threat.

How many fully pipelined AES cores can you fit in an FPGA chip doing password hashing in an authentication server (with lots of parallelism included per one hash computation by our new hashing method)? And how many DES? The difference may be an order of magnitude, in favor of DES. And this means that our password hashes become this much slower to attack by CPUs/GPUs, compared to hypothetical hashes built on top of AES yet implemented in FPGA. (The small key and block sizes of DES may be dealt with by appropriate use of DES, and the slowdown is not a problem at all for this application - it's only efficient use of resources that matters.)

We actually wanted to build a password hashing method on top of SHA-2 and/or AES - since this is what people want to hear - but it is so tempting to build upon DES and/or Blowfish instead, resulting in much better properties against a number of realistic attack scenarios (offline password cracking on different kinds of hardware) that we're seriously considering these. To make people happy, we might call this most important component "non-crypto", add a PBKDF2 with SHA-256 or SHA-512 step, and show how the cryptographic security of our hashing method as a whole only depends on the latter. Everyone is happy. But DES, if we use it in the "non-crypto" component, plays an important role.

Summary: for some applications AES is better (perhaps for most of them), but for some DES is a better building block.

Finally, circuit minimization has uses beyond DES, and similarly sized S-boxes exist in other ciphers. So advances in this area may have uses beyond DES.

Re:i7 what? Who cares? on 17% Smaller DES S-box Circuits Found · 2011-07-01 08:35 · Score: 2

Bitwise operations are not an issue. Besides, we have versions of our S-box expressions that primarily use "bit select" instructions, such as ATI's BFI_INT - these work on PowerPC/AltiVec in JtR 1.7.8, but I think they will see even more use on high-end AMD/ATI GPUs (this is what we primarily had in mind).

The real issue is register pressure (bitslice DES needs a lot of registers) and memory latencies. In our S-box expressions, we tried to minimize not only gate count, but also the number of registers needed for storage of temporary values in a software implementation. This was among the criteria applied to choose a few best versions among thousands of same-gate-count expressions that we generated. We also cared about the amount of inherent parallelism available in a single instance of the code for each S-box, even though it sort of contradicted the preference to require fewer registers.
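
A toy illustration of the bitslice idea and of a "bit select" (multiplexer) gate. The 3-input function below is made up and is not a DES S-box, and Python integers stand in for 64-bit registers. Each temporary such as `t` would occupy a whole register in a real implementation, which is where the register pressure comes from.

```python
LANES = 64
MASK = (1 << LANES) - 1  # one independent instance per bit position


def bit_select(a: int, b: int, c: int) -> int:
    """Per-lane multiplexer: take b where c is 1, else a."""
    return (a & ~c & MASK) | (b & c)


def to_slices(values, nbits: int):
    """Transpose per-instance values into nbits bitslices."""
    slices = [0] * nbits
    for lane, value in enumerate(values):
        for bit in range(nbits):
            if (value >> bit) & 1:
                slices[bit] |= 1 << lane
    return slices


def toy_circuit(x0: int, x1: int, x2: int) -> int:
    """Made-up 3-input function evaluated for all lanes at once."""
    t = x0 ^ x1                        # one XOR gate, 64 instances in parallel
    return bit_select(t, x0 & x1, x2)  # one AND gate plus one bit select


def reference(value: int) -> int:
    """The same function computed one instance at a time."""
    b0, b1, b2 = value & 1, (value >> 1) & 1, (value >> 2) & 1
    return (b0 & b1) if b2 else (b0 ^ b1)


inputs = [lane % 8 for lane in range(LANES)]
x0, x1, x2 = to_slices(inputs, 3)
out = toy_circuit(x0, x1, x2)
matches = all(((out >> lane) & 1) == reference(v)
              for lane, v in enumerate(inputs))
print(matches)
```

Three bitwise operations here evaluate 64 instances at once; a real S-box needs dozens of gates and several live temporaries, so gate count and register usage both matter.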

Re:i7 what? Who cares? on 17% Smaller DES S-box Circuits Found · 2011-07-01 08:19 · Score: 2

AES-NI is definitely too specific to AES, not reasonably reusable for DES. Yes, we have achieved a speed for DES comparable to that of AES with AES-NI.

We're actually considering building a password hashing method on top of something like this, where bitslice DES has the advantage of being scalable to arbitrary SIMD vector widths and not requiring specialized instructions for efficient implementation. DES is also FPGA-friendly (more so than AES), and we have a project to implement password hashing for authentication servers equipped with FPGA boards:

http://www.openwall.com/lists/crypt-dev/2011/04/05/2 - project rationale
http://www.openwall.com/lists/crypt-dev/2011/05/09/1 - alternative approach

We're also considering Eksblowfish-like constructions, though - such as to make use of Xilinx Block RAMs (and thus require attackers to use more resources too).

BTW, not sure if I am speaking to the right Matt, but of the two SHA-crypt flavors the SHA-512 based one actually has a practical advantage over the SHA-256 one: more complete use of 64-bit CPUs in servers. So I think DragonFly BSD's choice was a mistake. GPU implementations for both are being worked on, and the difference should be seen.

Re:Stupid question from crypto-newb on 17% Smaller DES S-box Circuits Found · 2011-07-01 08:00 · Score: 2

6-to-4 is large enough that you can't realistically find a perfect solution (the absolute smallest gate count) on present computers and given present knowledge. You can do it for 5-to-1, though. Also, generic Boolean expression minimization tools produce relatively poor results for DES S-boxes; specialized algorithms are the way to go. IIRC, I tried Espresso - http://en.wikipedia.org/wiki/Espresso_heuristic_logic_minimizer - in the late 1990s. It couldn't even get close to Matthew Kwan's results from 1998, where he used a specialized algorithm.

Re:i7 what? Who cares? on 17% Smaller DES S-box Circuits Found · 2011-07-01 07:45 · Score: 4, Interesting

Here are some specific performance numbers for DES-based crypt(3) on GPUs (for comparison, recall that we're reporting over 20M c/s on a CPU):

oclhashcat-plus is reported to achieve 55M on ATI HD 5970, only 25M on NVidia GTX570 at 1600 MHz core clock, 310M on 8x ATI HD 6970, 181M on 7x NVidia GTX580 (1594 MHz). The numbers for oclhashcat-lite are very similar (57M, 26M, 297M, 181M, respectively). These are off the hashcat website. This does not use our new S-boxes yet (I expect that future versions of *hashcat tools will).

Notice how the number for high-end NVidia is on par with that for our CPU, and for ATI is less than 3x better. Of course, GPUs do have an advantage, but it still does make sense to use CPUs as well, which a typical organization has more of and doesn't need to spend extra time to deploy, install drivers for, etc.

Now, our new S-boxes and other optimizations will provide better performance. Per discussions with a tripcode cracker author, I expect all the way up to 400M c/s on ATI HD 5970, which is close to its theoretical peak speed (approx. 80% of it per some estimates). This is a 20x improvement over our figure for the Core i7 CPU, which is significant. (There's a little room for improvement on the CPU as well, though - specifically, if we pre-generate or runtime-patch the code for each salt as opposed to using pointers at runtime like we do now. This kind of optimization is assumed in the 400M figure for the GPU. So with both having the optimization, the GPU's advantage will be less than 20x.)

Curiously, 400M c/s for 25 iterations of DES will mean that a single ATI HD 5970 with proper code will be able to crack 56-bit DES keys in just 42 days on average.
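
The arithmetic behind the 42-day figure, written out. It uses only numbers from this comment (the projected 400M c/s, 25 DES iterations per crypt(3) computation, and roughly 20M c/s on the CPU) plus the usual assumption that an exhaustive search covers half of the 2^56 keyspace on average.

```python
crypt_rate_gpu = 400e6  # projected crypt(3) computations per second on the GPU
crypt_rate_cpu = 20e6   # approximate figure quoted for the Core i7 CPU
des_per_crypt = 25      # DES iterations per DES-based crypt(3) computation

des_rate_gpu = crypt_rate_gpu * des_per_crypt  # raw DES encryptions per second

average_keys = 2**56 / 2  # on average, half the keyspace is searched
seconds = average_keys / des_rate_gpu
days_gpu = seconds / 86400

speedup = crypt_rate_gpu / crypt_rate_cpu
days_cpu = days_gpu * speedup

print(round(days_gpu, 1), round(speedup), round(days_cpu))
```

This ignores any per-key overhead beyond one DES encryption, so it is a best-case estimate for that projected speed.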

So, yes, GPUs have an advantage, and we have contributed to that as well.

Re:i7 what? Who cares? on 17% Smaller DES S-box Circuits Found · 2011-07-01 07:25 · Score: 2

Actually, a lot of people care about CPUs. I spoke to someone from a penetration testing company the other day. They run a lot of password hash cracking. And they have 10x more CPUs (used for other purposes as well) than GPUs (bought specifically for password cracking). Given that performance of DES-based crypt(3) on GPUs is by far not as impressive as it is for other hash types, they typically test this sort of hashes on CPUs and not GPUs.

That said, yes, when we worked on the S-boxes, we thought of GPUs as well. One of our target sets of "logic gates" is specifically that of high-end AMD/ATI GPUs (it also works well for Cell, PowerPC/AltiVec, and AMD XOP, but we deliberately excluded gates/instructions that are present on only some of these four platforms). The author of one of the GPU-based cracking tools (for tripcodes) reported a 20% improvement on Radeon HD 5970 due to our new S-boxes. Andrey Belenko of ElcomSoft wrote in a tweet that "Effect for GPUs might be well above 20%, actually."

Re:ONLY 17%? on 17% Smaller DES S-box Circuits Found · 2011-07-01 07:13 · Score: 5, Insightful

We were not the first to generate and try to optimize Boolean expressions for the S-boxes. Other researchers worked on this before, starting in 1997 when Eli Biham wrote his classic paper on bitslice DES. 17% is our improvement compared to those previous results. To me, it is impressive that after 14 years and numerous attempts by others, including successful ones, it was still possible to improve on the previous best results by as much as 17% at once. My gut feeling is that further improvements, while definitely possible, will be more limited. But then again, some people I spoke to had thought that our 17% was not possible.

Comments · 52