John the Ripper Cracks Slow Hashes On GPU
solardiz writes "A new community-enhanced version of John the Ripper adds support for GPUs via CUDA and OpenCL, currently focusing on slow-to-compute hashes and ciphers such as Fedora's and Ubuntu's sha512crypt, OpenBSD's bcrypt, encrypted RAR archives, WiFi WPA-PSK. A 5x speedup over AMD FX-8120 CPU per-chip is achieved for sha512crypt on NVIDIA GTX 570, whereas bcrypt barely reaches the CPU's speed on an AMD Radeon HD 7970 (a high-end GPU). This result reaffirms that bcrypt is a better current choice than sha512crypt (let alone sha256crypt) for operating systems, applications, and websites to move to, unless they already use one of these 'slow' hashes and until a newer/future password hashing method such as one based on the sequential memory-hard functions concept is ready to move to. The same John the Ripper release also happens to add support for cracking of many additional and diverse hash types ranging from IBM RACF's as used on mainframes to Russian GOST and to Drupal 7's as used on popular websites — just to give a few examples — as well as support for Mac OS X keychains, KeePass and Password Safe databases, Office 2007/2010 and ODF documents, Firefox/Thunderbird/SeaMonkey master passwords, more RAR archive kinds, WPA-PSK, VNC and SIP authentication, and it makes greater use of AMD Bulldozer's XOP extensions."
You make a valid point. I do intend to add a mention of PBKDF2 to a revised version of my presentation, and I am likely to use it or at least HMAC as a component if I design a new password hashing method - not so much because of actual need, but mostly to have an easy and convincing answer about cryptographic security. ;-) However, in the context of this announcement PBKDF2 is arguably less relevant, and it is inferior to the alternatives being considered specifically in the GPU-friendliness aspect (it is more GPU-friendly than all three of SHA-crypt, bcrypt, scrypt). In scrypt, PBKDF2 is used (with SHA-256) to provide/demonstrate cryptographic security, but mostly not computational cost, whereas the analysis here is about the latter, under assumption that all of the alternatives being seriously considered are sufficiently secure cryptographically.
This release of John the Ripper supports PBKDF2 on GPU as well - in the included WPA-PSK cracking code. The release announcement shows a 27x speedup over the also-included CPU code when going from FX-8120 CPU (8 threads) to HD 7970 GPU for WPA-PSK cracking (PBKDF2-HMAC-SHA-1), which clearly shows that it is very GPU-friendly. With SHA-512, it'd be a lot less GPU-friendly, but likely not even to the point of sha512crypt.
The fact that not every password is likely to be cracked is precisely what makes password security audits with John the Ripper useful. If every password would be getting cracked, there would be fewer legitimate uses for the tool. ;-)
Memorizing one 16-digit mixed-case alphanumeric password is realistic, but it does not help you all that much unless it's a "master password" (e.g., used to access an encrypted password manager database or to generate other passwords from or to access an encrypted filesystem where you store other passwords in plaintext), because you'd have difficulty memorizing a large number of unique and dissimilar passwords of this kind. Either way, if you're developing a server application or administering a server where users can register with passwords (maybe as one of the authentication options, not necessarily the only one), it becomes sort of your responsibility to make your users' passwords less likely to be cracked, even if the server security is temporarily compromised (you should assume that this might happen). Note that many of your users' passwords might be weaker than you would have liked them to be, and you don't want to enforce too strict a password policy (as that's a tradeoff). This is where the choice of hashing method to use matters, letting you use a less strict password policy for the same level of security or/and resulting in fewer passwords getting cracked (even with no enforced policy, since some people will choose medium complexity passwords on their own).
The existing password hashing methods won't run on GPU well for user authentication, even when they do run well for cracking passwords. They lack sufficient parallelism within one hash computation. This is an issue I first raised in 1998, in pre-GPU context (it applies to recent CPUs as well, and the problem is getting worse with time).
A solution is to define a new password hashing method with sufficient (configurable) parallelism within one instance. We could then consider running it on GPU, unless it is GPU-unfriendly by other criteria. Do we really want to, though? GPUs in servers are not yet common, except in computing clusters. Their reliability may be lower than that of other typical server components. The drivers are currently relatively unreliable as well (although they may be reliable enough if running the same code, with no upgrades). Sure, computing clusters use them anyway, and get them to run reliably enough for their needs, but the extra hurdle and/or risk is there. Will we get embedded GPUs in typical servers soon? Will they be similar to current gamers' or HPC GPUs or not? This is not clear. Then there's Intel MIC, which delivers GPU-like performance, but is a lot closer to a CPU - it will require a lot of parallelism in the algorithm too, but it may run certain types of otherwise GPU-unfriendly code. Is this possibly a better target?
For current GPUs, a better strategy might be to make them inefficient - by using GPU-unfriendly hashes (for cracking, and for validation as well - as a side-effect).
We had a project last summer to research this kind of possibilities, focusing on use of FPGA boards in authentication servers. This could optionally buy us GPU-unfriendliness (if we want to make things more difficult for attackers with GPUs, but not FPGAs, and for botnets, which almost surely will lack FPGAs). We even considered some moderate CPU-unfriendliness of the component that we'd put on FPGA. Specifically, we experimented with bcrypt on FPGA, as well as with much smaller Blowfish-like "non-crypto" cores (not actual Blowfish), so that we could hopefully fit hundreds or thousands of those per chip (and have them somewhat CPU-unfriendly as well). Yuri, our GSoC 2011 student working on this project, did have some of this implemented in an experimental fashion, and some of it even worked (on FPGA boards kindly provided by Pico Computing), but an outcome of the summer project was that this would be time-consuming to bring to desired levels of performance and reliability. At that point, the project was put on hold.
A simpler and cheaper alternative (if there are only a handful of customers for this) may be to use dedicated servers, existing HSMs, or microcontrollers for just the password hashing. Indeed, microcontrollers are super slow, so their only function would be to hold and apply a local parameter, with the rest of the hashing method implemented on the host's CPU and RAM. If dedicated servers are used, they would need to be separate from authentication servers - that is, they won't know usernames, won't have access to any database, won't have any persistent storage except for the local parameter, and the OS and software indeed. They will accept password, salt, and parameters (such as the configurable per-hash processing and memory cost settings), and provide the hash. Thus, their attack surface would be minimal and they'd provide an extra layer of security against network-based attacks. We'd do this with FPGA boards as well, and we'd also have the greater/unusual computational complexity as a security layer (in case the local parameter or its backup copy is leaked/stolen), but well - using typical and pre-existing server hardware, drivers, etc. is just simpler and cheaper unless we start a new business and expect to have plenty of customers (although that might be possible).