MD5crypt Password Scrambler Is No Longer Considered Safe
As reported here recently, millions of LinkedIn password hashes have been leaked online. An anonymous reader writes "Now, Poul-Henning Kamp a developer known for work on various projects and the author of the md5crypt password scrambler asks everybody to migrate to a stronger password scrambler without undue delay. From the blog post: 'New research has shown that it can be run at a rate close to 1 million checks per second on COTS GPU hardware, which means that it is as prone to brute-force attacks as the DES based UNIX crypt was back in 1995: Any 8 character password can be found in a couple of days. The default algorithm for storing password hashes in /etc/shadow is MD5. RHEL / CentOS / FreeBSD user can migrate to SHA-512 hashing algorithms.'" Reader Curseyoukhan was one of several to also point out that dating site eHarmony got the same treatment as LinkedIn. Update: 06/07 20:13 GMT by T : An anonymous reader adds a snippet from Help Net Security, too: "Last.fm has piped up to warn about a leak of their own users' passwords. Users who have logged in to the site were greeted today by a warning asking them to change their password while the site investigates a security problem. Following the offered link to learn more, they landed on another page with another warning."
rot13 isn't safe either.
Good info about storing passwords properly: http://www.f-secure.com/weblog/archives/00002095.html
"... RHEL / CentOS / FreeBSD user can migrate to SHA-512 hashing algorithms."
That's fine and dandy and all, but what about King Shit Debian? Pretty much all of my systems run something Debian-based in one form or another. Is this a simple change for Debian users too?
I have been trying since 1995 and have still not broken in on my Pentium P6
Seriously slashdot?
MD5 has questionned since 1996 and a strong attack on it was realized in 2004. It broke apart since. Nobody serious about security should still be using MD5 now.
Just read the second chapter of wikipedia on MD5 http://en.wikipedia.org/wiki/MD5
It astounds me that Linkedin and eHarmony used unsalted password hashes. That's much worse than using md5 (and, yes, you shouldn't use md5, but, still, first things first).
From the Linkedin Press Release :
The passwords are stored as unsalted SHA-1 hashes,
Come on, guys, get up to at least 1978 in your security policy.
Brute-force was solved decades ago. The local free-net here had a simple solution:
If you get your password wrong, you can't try again for 1 second. Every failure doubles the time required to try again.
Why doesn't everyone do that?
---
ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
Looks like it's time to change my password to "password1".
JADBP
There are 95 ASCII characters, which makes 95**8 = 6,634,204,312,890,625 possible 8 character passwords. At one million checks per second a brute force attack will take 6,634,204,312 seconds (210 years).
There's a fad going around right now to use ridiculously slow password hashing algorithms on the web, which the poster apparently has bought into:
If you do this you're opening your site up to an easy DoS attack - a few 10s of login requests per second will slow your server to a crawl. The place where slow hashing algorithms ought to be used is exactly the opposite of where they're used today: encryption of local files, where the user actually has to remember the password, unlike web passwords where you can just use 32 random characters and let your local browser remember it (preferably with the browser's password file encrypted with a slow hash).
Password selection depends on the place you're using the password.
For most websites, enter something like abc321, hit reset password and they kindly reset the password to something and email me the new relatively good password.
It doesn't need that much security, so those are stored in my email.
For places that need better passwords, $ md5sum - lot of random text pounded on the keyboard and result is something like 24a53bc05c6f216e340aa8d5dc08b605
That checksum becomes the password.
For places where I actually have to enter the password without copypaste, something generated like xkcd's battery horse staple correct.
There are no atheists when recovering from tape backup.
SHA-1 is not sufficient, but the summary refers to SHA-512, which is in the SHA-2 set. Now whether SHA-2 is sufficent, or if developers should be migrating to something like bcrypt or SRP is a bigger conversation.
In any case, while I would never tell someone to use MD5, the lack of a salt is much more egregious and made the leak much worse than it had to be.
There was another /. story a few months ago discussing password encryption. What I gleaned was that using md5 for sensitive data on your webserver is insufficient. So I'm a bit surprised (well, maybe not) to see a big site like linkedin cut corners with security. I personally use a Python implementation of Blowfish.
A witty saying proves you are wittier than the next guy.
Is there any reason whatsoever to NOT just use a random generator for web passwords and store them locally? Do you really need to access your LinkedIn account from someone else's computer?
cat /dev/urandom | base64 | tr +=/ 012 | head -c 20; echo
608b2d50a6521a27c12626cedfea0fc3
Whether MD5 is "secure" or not is irrelevant.
Machines that are accessed by users should not be the same servers storing the account security data. One of the key benefits to domain authentication provided by Kerberos and it's relatives is that the authentication data is isolated on a server that is supposed to be doing nothing but authentication and authorization.
That makes it damn hard to break into the security server to steal the password lists in the first place, regardless of what algorithms are used to hash the passwords. The problem is a poorly designed system, not a poorly equipped algorithm.
I do not fail; I succeed at finding out what does not work.
KeePass (or similar) + Dropbox (or similar) s the way to go... different password per site, I don't know the password, copy/paste updated everywhere.
First of all, WTF is a "password scrambler"? If you feel the need to dumb down the phrase "hash algorithm", you're probably submitting to the wrong site.
I LOLed at this article[1] on ZDNet this morning for its sensationalist, lowest-common-denominator "OMG computer hackery stuff" reporting, with its implied link between MD5's weakness (which has been known for years) and the LinkedIn breach (even though they use SHA1), and its ridiculous accompanying screen cap (running user-space tools while logged in as root, which no security-minded user would ever do, but hey "root@" at a shell prompt with lots of hackery output looks l33t).
And now here's basically the same thing on Slashdot. Yawn...
[1] http://www.zdnet.com/blog/security/md5-password-scrambler-no-longer-safe/12317
So, will someone point to a list of the stolen Linkedin accounts so we can see if our was one of them?
If only there were a website where they could connect with other security professionals, exchange ideas and maybe even find people to hire....
If telephones are outlawed, then only outlaws will have telephones.
"The default algorithm for storing password hashes in /etc/shadow is MD5. RHEL / CentOS / FreeBSD user can migrate to SHA-512 hashing algorithms."
FreeBSD has long (like, 10+ years) had support for Blowfish password hashes. Blowfish was a close second in the AES contest, and is quite strong. Enabling it only requires editing /etc/login.conf and afterwards updating any pre-existing passwords.
/dev/random
Algorithms designed to burn CPU, like PBKDF2 recommended in the article, are great if used correctly.
The important thing to remember is that you want to make the client burn CPU, NOT the server. If you let the client trivially initiate 10 http requests that cause a server's CPU to peg for 1 second each, you've created a nice DoS vector.
Are there any existing Javascript crypto libraries that safely offload this work to the client?
I seem to recall that a few years ago, people started being able to easily generate MD5 collisions too, so I wouldn't say this is real big news.
As the summary notes, 8 character passwords can be cracked pretty quickly. 15 Characters with the crappy password rules we've enforced for minimum 8 character passwords become hard for users. It's time we start demanding correcthorsebatterystaple style random word passwords with maximum lengths of 255 characters (and a minimum > 8 characters).
That and WTF the passwords were unsalted? Salt them and DON'T keep the salt in the database.
I put on my robe and wizard hat..
How many times does this have to be said? Having literally hundreds of thousands of sites on the web that require you to create an account with a password is not a good security model. This should be patently obvious to anyone who spends enough time on the web.
If any of you out there are devs (for consumer-facing web companies) out there - I beg of you to push your company to start supporting OpenID as a reliant party.
AccountKiller
- Specify a global salt which is not stored in the database, only in the code (so crackers don't have the full salt even when they own the database)
- Generate a random per user salt which may be stored in the database
- Store the password hashed, not in clear text, as hash(globalSalt + password + perUserSalt)
- Encrypt/Authenticate the connection to the login page over HTTPS
- Do some random stuff that makes your login system different from that of other sites (this is just so your website will not fall victim to automated tools and will require actual skill to crack).
If you do any less, your problem is not whether you use md5 or some other hash. In my experience from old code bases I maintained from prior coders an average of less than 1 of those bullet points is followed (and of course that number was meticulously calculated and not made up).
That said, implementing this should take even a Java programmer less than 200 lines of code.
The argument is that "we can do 1 million hashes in one second on a GPU, thus we can perform a dictionary attack in just a few days." It is an argument about execution speed versus size of the dictionary. The size of the dictionary is limited by the human brain, and is not going to change any time soon. The execution speed is expected to decrease as a function of Moore's law, GPU, etc. The solution cannot be to move to SHA1, or even SHA256, because these algorithms don't take much longer to run than MD-5. They can't, because they are used in scenarios where servers have to process millions of messages.
The solution is probably to do something special for password storage. Use salt of course. But also do something like "run SH-xxx N times" where N is a number that grows larger as Moore's law progresses.
The issue with doing the hash client-side is that now the hash has become the password. If someone steals the list of hashes it's game over, they can just emulate the client sending the hash and the server won't know that they didn't start from the password and perform a hash. The hash must be done server-side.
On a decent nvidia box here's the run output against a ... theoretical sha1
hashtable Status.......: Exhausted
Input.Mode...: Mask (?1?1?1?1?1?1?1)
Hash.Type....: SHA1
Time.Running.: 7 hours, 28 mins
Time.Left....: 0 secs
Time.Util....: 26894969.7ms/2475.6ms Real/CPU, 0.0% idle
Speed........: 130.9M c/s Real, 117.4M c/s GPU
Recovered....: 14745/6143150 Digests, 0/1 Salts
Progress.....: 3521614606208/3521614606208 (100.00%)
Rejected.....: 0/3521614606208 (0.00%)
HW.Monitor.#1: 0% GPU, 85c Temp
That's 130M c/s not 1M c/s and Sha1 isn't as fast as Md5 on hashcat.
I could see 1Mc/s on a middle tier single socket/core CPU though, that sounds about right.
I've been playing with a dedicated hash database that is on its own server, so hosts bounce a request off this appliance and get a "yes", "no", or "timeout". Too many "no"s in too short a time make the hash validation appliance refuse to give any answers for a period of time.
If done correctly, for someone to get the hash database, they would have to find a way to physically get access to the appliance, then dump the box. It isn't perfect (which is why a better algorithm like bcrypt should be used to store hashes), but having a layer of security whose sole purpose is to keep the list of hashes out of the hands of the blackhats means that after a breach is rectified, users are not forced to change all their passwords (unless their accounts were directly involved).
So one could trivially deny service to all of your users by bouncing a bunch of bogus login attempts off your front end?
Who told you my password?
I think the problem is overhyped. If you're an online site and some hacker already has the hashes and salts of user passwords to bruteforce, you're typically already so pwned it doesn't matter if you are using SHA2048 or whatever.
;).
For the same reason it doesn't matter that much even if I use 8 character passwords for noncritical online sites. I'd be flattered if the attackers are going to DDoS the site just to crack my password via the site's login page! If the site is famous enough I might even have enough warning to switch to a longer password when the DDoS attack hits the news
Whereas if they are bruteforcing my password offline - it means the site has already been compromised. And they are likely to be able to access the rest of my data in that site, possibly do actions using my account and perhaps with a bit more effort get the plaintext of my password the next time I log in.
So use different passwords for different sites, don't use passwords that are too short or obvious that they can be bruteforced online, but don't sweat making them super long unless its important or you're paranoid- since the site is more likely to get pwned before they bruteforce it online.
Getting pwned or compromised isn't a rare thing. I've signed up for different stuff using unique email addresses, and I've noticed spam coming to a few of those addresses. Maybe one day I'll have to create new slashdot/facebook/etc accounts when my current ones get pwned. Big deal.
For "offline" stuff like GPG, truecrypt, yes please do use strong and long passphrases.
Salt means rainbow tables become impractical. Salt also means a collision attack is also mitigated. Salt is not expected to be any more secret than hash, it just changes tho way it works.
XML is like violence. If it doesn't solve the problem, use more.
Someone whack the remaining slashdot users with a clue-stick.
The amount of total ignorance in the comments, has overflowed my desire to FTFY, across almost all posts... but here we go.
- Twofish (not Blowfish) was a strong running candidate in the AES
- This is not the same as using MD5 for passwords
- MD5 suffered from collisions, but it didn't really matter. If you prepended a random string, finding a collision is a much harder problem.
- This isn't a dictionary attack, it's a brute-force attack made possible by an advanced parallel implementation of a for-purpose cracking algorithm
The point being: running a password through function X n^8 times is easy, and getting easier (to the point where X=MD5-crypt is not really secure any more). (Where n is the keyspace., and 8 is the number of characters in your password)
Either bump n^8 up (by increasing your password length), or make X so convoluted that it makes it harder. This does *not* mean a longer output hash.
How much extra compute time is MD5 vs SHA1 vs SHA512: not much.
It would be interesting to try to create some cryptosystem providing: - a Hash(Key, Password) function; - a Compare(Key1, Key2, Hash1, Hash2) which returns true only when Password1=Password2 in Hash(Key1, Password1)=Hash(Key2, Password2). When a user registers an account: - Server gives the client a randomly generated challenge Key1; - Client provides H = Hash(Key1, Password) to the server; - Server stores Key1 and H. When a user logs in: - Server gives the client a randomly generated challenge Key2; - Client provides h' = Hash(Key2, Password); - Server computes Compare(Key1, Key2, h, h'). I'm not sure creating such a cryptosystem is possible, I have not given it much thought. It would be nice if Hash were expensive to compute but Compare were cheap. This has the added bonus that the server will never know your password.
The argument is that "we can do 1 million hashes in one second on a GPU, thus we can perform a dictionary attack in just a few days." It is an argument about execution speed versus size of the dictionary. The size of the dictionary is limited by the human brain, and is not going to change any time soon. The execution speed is expected to decrease as a function of Moore's law, GPU, etc. The solution cannot be to move to SHA1, or even SHA256, because these algorithms don't take much longer to run than MD-5. They can't, because they are used in scenarios where servers have to process millions of messages.
The solution is probably to do something special for password storage. Use salt of course. But also do something like "run SH-xxx N times" where N is a number that grows larger as Moore's law progresses.
True, all true, but entirely missing the point. Never store any customer data at all unencrypted, even password hashes. There are many ways to have your data pwnt, but bulk copy of the data at rest is the easiest and most common. That data (bulk data at rest) should never be unencrypted, ever.
Socialism: a lie told by totalitarians and believed by fools.
That is what one tries to avoid. The best is to have multiple blocking mechanisms. First is by IP address, so if someone is hacking user Alice's account, the site trying to hack in as Alice gets blocked on the IP level.
The hashing appliance is a work in progress. Either a delay between handing out replies for the same user or an outright lockout serve the same purpose. The trick is to slow down a dictionary attack, as well as make it difficult for an attacker to grab the list of user password hashes.
Storing is meaningless if companies ** cough ** virgin-mobile ** cough** send you your passwords or PINs in an email every time you change it. And they include your phone number and name in the same email. And when asked to stop it they claim to adhere to industry standard security standards.
You are correct that single password hashes when you have access to the salt is not harder. But breaking all of the password hashes that have different salts is much harder, even if you have access to the salts, as you cannot simply build one lookup table; your space/time requirements grow dramatically.
That's just the description of a challenge system, which works very well by the way, if you create the challenge function f(key,challenge) not too trivial
You missed the point.
Spoofed source IPs would defeat this method easily.
It's useless for an actual online brute-force attack, since the attacker clearly wants to receive a reply from his password attempts, but if the aim is just to DoS the authentication server, that's not a problem. You should send the client a token and wait to get it back before you start blowing CPU cycles on determining whether their password hash is legit.
DRM: Terminator crops for your mind!
There's ways to keep the data passed back and forth from being constant. A challenge-response system, where the server sends a random blob and requires the client to manipulate it using the password in a fashion that it can verify using the database-stored hash, without actually transmitting either the password or hash. Of course, now we're well beyond the simple "hash this and see if it matches" and into automatic handshaking protocols...
Does "run SH-xxx N times" actually grow in computational complexity as proportionally to O(N)? I don't happen to know enough about SHA-x to answer this, but I do remember that triple-DES did not actually increase complexity proportional to a tripled key length.
http://en.wikipedia.org/wiki/Meet-in-the-middle_attack
DRM: Terminator crops for your mind!
http://www.buzzfeed.com/jwherrman/the-23-most-depressing-leaked-linkedin-passwords
Why is Snark Required?
I'm always interested in the latest status of how fast the various popular hashes (and encryption) can be brute forced by:
- home computer / GPU
- super computer
- (blade) server cluster (enterprise/government scale)
- distributed.net-like project
- maybe ASICs or dedicated hardware? Like EFF DES cracker/Deep Crack
And the expectations for the next 20 to 50 years, assuming no revolutionary thing as quantum computing will emerge.
Does anyone know an up to date website dedicated to this? Articles tend to be obsolete in a few years.
You cannot spoof an IP in this case. You are establishing a TCP connection and sending login info over HTTP. The TCP connection starts with a 3 way handshake, which is not possible in the case of spoofed IPs. IP based throttling, and dedicate hash processor (may be someone could build a co-processor card for this) sounds great actually.
The hashing algorithm is far more important in offline attacks than in online attacks. Please don't suggest otherwise, as you don't appear to have a background in security (this isn't a personal attack, but people without security backgrounds should never give security advice).
If the table containing password hashes is compromised and leaked, the only thing preventing the plaintext being bruteforced is the strength of the algorithm. Salting prevents using a rainbow table, but you have to assume the salt is also compromised. If there's a global salt, it's just a matter of building a new rainbow table with the known shared salt: this is time consuming, but not that bad especially for weaker passwords IF THE ALGORITHM WAS CHOSEN POORLY. When each password has its own salt, an attacker must go one at a time, slowing the entire process down by a factor of [number of users in the compromised system].
The algorithm's speed comes into play here. MD5 and SHA1 are meant to be used as checksums and are designed run as fast as possible - hundreds of thousands of times per second on password-length strings with modern hardware. This of course disregards the fact that they are not cryptographically secure. SHA256, same situation: it's a bit slower and while it's currently still considered crytographically secure, it's designed for speed. I can compute a rainbow table for a dictionary attack in about two seconds (or two seconds per row, assuming unique per-row salts). Compare to something like bcrypt or PBKDF2* which include a number of rounds specifically to slow things down: with even a relatively low number of rounds ($08$ to $09$ in bcrypt, for example) modern hardware caps at about ten hashes per second. Now going down /usr/share/dict/words takes 25,000 seconds - seven hours - instead of 1-2 seconds. That timeframe is going to get exponentially longer when you consider variations, substitutions, mixed case, and multi-word passwords, assuming the password being attacked is dictionary-based at all.
You do raise a valid point about the other damage a data breach may cause, but aside from encrypted data (such as financial information), the password is the most damaging thing an attacker could retrieve since that probably grants access to a whole host of other sites since so many people re-use passwords. If you're OK with that, fine - go ahead and re-use crappy passwords everywhere. But for the love of security, please don't give advice on how people/companies should hash passwords.
*PBKDF2 is meant to be used as a key derivation function to convert a password into a cryptographically-secure encryption key. It's not the best choice for password digests, although because it includes a work factor like bcrypt is still relevant to the discussion, and is certainly a better choice than MD5 or SHA-family alone.
How are sites slashdotted when nobody reads TFAs?
Lack of a salt makes no difference. Salting is designed to defeat rainbow table attacks. However no actual criminals who are cracking passwords are using rainbow tables. It's all done using GPUs, which don't care about salts. Also the summary of TFA is wildly misleading. 1 million attempts per second? Uh, no. Last I checked oclhashcat-plus was capable of about 50 billion attempts per second on a 4xATI5970 rig. You can break most "normal" passwords in under one second with such hardware.
The most common way to encrypt such data is still easily bypassed if the site has an sql injection vulnerability. I agree you should encrypt all customer data, but don't forget that the data may be illicitly extracted through a process that needs access to the data unencrypted.
Ever logged in to a mail server before? You logged in with that method.
How it works:
1. Make standard hash of user's password.
2. Make random 'challenge'. Give to user.
3. User sends back HASH(HASH(password) + challenge)
4. Server computes HASH(Stored hash + challenge)
5. Authentication with no password sent or stored in cleartext.
If the server can decrypt it, an attacker can as well.
Its just a minor speed bump along the way.
Great... so instead of one password per site, someone just needs to log into your DropBox account and crack your (hopefully fairly strong) KeePass password, and they get everything -- not just all your passwords, but what sites they're for and what the associated usernames are. All sitting out there on a public server 24x7.
How strong is your KeePass password?
The hashing algorithm is far more important in offline attacks than in online attacks. Please don't suggest otherwise, as you don't appear to have a background in security (this isn't a personal attack, but people without security backgrounds should never give security advice).
Did you read the parent post?
1. Make standard hash of user's password.
2. Make random 'challenge'. Give to user.
3. User sends back HASH(HASH(password) + challenge)
4. Server computes HASH(Stored hash + challenge)
5. Authentication with no password sent or stored in cleartext.
Except now your "standard hash" is effectively the password, and it presumably is stored in cleartext (or at least reversible encryption) in order to do step 4.
And the hash is the password again...
If I have stolen $hash from server, I won't need to break that hash to emulate the user.
I just need to get challenge and return hash($stolen_hash + challenge) to the server. Congratulations, you re-invented saving "plaintext" passwords! ;)
It's The Golden Rule: "He who has the gold makes the rules."
The attack on multiple encryption relies on two things: 1) Input and output space are both finite 2) You can invert encryption under possible key values to "meet in the middle" Neither of these is true for hash functions (although in this specific password case you can put reasonable bounds on the size of the password to get (1) under assumptions).
even if you have access to the salts, as you cannot simply build one lookup table; your space/time requirements grow linearly in the number of distinct salts.
FTFY
All of security is just speedbumps along the attackers way. That's what security does, whether physical or IT: it delays the attacker by some amount.
If all of your customer data is plaintext at rest, then the attacker just needs to pwn any server anywhere that can see that data, and bulk-copy it off. That might not even raise any internal alarms, since it's justa file copy.
If all of your customer data is encrypted at rest, then the attacker needs to pwn each machine thatknows how to decrypt each part of that data, and not just to run an arbitrary process/shell on that server, but actually understand and interact with the process that holds the keys. That's a smaller hole. And if it's something like an SQL injection attack to get at DB data, you have to be vulnerable to that specific atack, and there's at least a chance that some massive query that returns all rows of all tables will get noticed, so even if you don't stop it maybe you at least know it just happened.
Socialism: a lie told by totalitarians and believed by fools.
Building a rainbow table to crack a list of hashes is a stupid thing to do. People who don't understand rainbow tables should not suggest building them.
If you're OK with that, fine - go ahead and re-use crappy passwords everywhere. But for the love of security, please don't give advice on how people/companies should hash passwords.
Uh, Mr. Security Expert? Take a chill pill and re-read his post.
The execution speed is expected to decrease as a function of Moore's law, GPU, etc.
Wouldn't the execution speed increase? Or did you mean that the execution time would decrease?
it is not MD5 we are talking but MD5crypt: http://en.wikipedia.org/wiki/Crypt_(Unix)#MD5-based_scheme
Which is 1000 times MD5, so you get not 50 billion attempts on your rig but 0.05 billion
Atari rules... ermm... ruled.
That method is nothing like I described and I doubt anyone serious about security is using it.
SASL does store the password in clear text though if you use CRAM-MD5 or DIGEST-MD5 authentication.
Use strings on the SASL db to confirm this.
I worked on a search engine in 95, and used md5 hashes (of content minus tags) as keys for pages to detect identical pages, and I remember having 2-3 md5 collisions out of maybe the first 100.000 indexed pages.
On second thought, the idea above is merely a zero-knowledge proof which can be easily achieved with homomorphic encryption. I think this could be achieved with javascript so website developers can easily integrate a more secure login with almost zero overhead server-side, and unnoticeable overhead client-side.
You seem to be replying to the wrong post.
Your question inspired me to write a proof of concept.
Do you care about the security of your wireless mouse?
The issue with doing the hash client-side is that now the hash has become the password. If someone steals the list of hashes it's game over, they can just emulate the client sending the hash and the server won't know that they didn't start from the password and perform a hash. The hash must be done server-side.
Not necessarily. If the salt or part of the salt was a "Proof of work" package, it could be offloaded to the client side.
i.e.
my password is stored in regular hash/salt format, the salt being a challenge string.
The proof of work would be to vary the challenge string on a pre-agreed upon mutation, and produce a hash of the mutated string so that the value starts with 0000. That hash added to a secondary salt based on the challenge string constitutes the actual salt. Naturally the proof of work can vary with hardware improvements.
Now I can off load most of that work to the client to calculate the salt for me.