MD5 Collision Source Code Released

← Back to Stories (view on slashdot.org)

MD5 Collision Source Code Released

Posted by Zonk on Tuesday November 15, 2005 @09:16AM from the collisiontacular dept.

SiliconEntity writes "The crypto world was shaken to its roots last year with the announcement of a new algorithm to find collisions in the still widely-used MD5 hash algorithm. Despite considerable work and commentary since then, no source code for finding such collisions has been published. Until today! Patrick Stach has announced the availability of his source code for finding MD5 collisions and MD4 collisions (Coral cache links provided to prevent slashdotting). MD4 collisions can be found in a few seconds (but nobody uses that any more), while MD5 collisions (still being used!) take 45 minutes on a 1.6 GHz P4. At last we will be able to implement various attacks which have been purely hypothetical until now. This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."

15 of 411 comments (clear)

Min score:

Reason:

Sort:

Replacement Hash Functions by Anonymous Coward · 2005-11-15 09:23 · Score: 5, Informative

Recommended replacements are SHA (preferably SHA-2), WHIRLPOOL and/or RIPEMD.

http://en.wikipedia.org/wiki/SHA-2
http://en.wikipedia.org/wiki/WHIRLPOOL
http://en.wikipedia.org/wiki/RIPEMD-160
Re:SHA1 by psykocrime · 2005-11-15 09:25 · Score: 5, Informative

So is SHA1 the recommended alternative?

No, see:

http://www.computerworld.com/securitytopics/securi ty/story/0,10801,99852,00.html

and

http://www.computerworld.com/softwaretopics/softwa re/story/0,10801,105875,00.html

I like this quote:

"SHA-1 is a wounded fish in shark-infested waters, but I'm more worried about MD5 because it's used everywhere," said Niels Ferguson, a cryptographer at Microsoft Corp. "Try to switch away from SHA-1 as quickly as you can, but switch away from MD5 first," he said when asked what recommendations he has regarding the algorithms during a panel discussion at the conference.

--
// TODO: Insert Cool Sig
"broken" does not mean broken by Edgewize · 2005-11-15 09:33 · Score: 5, Informative

This program is an efficient way to generate two source blocks with the same resulting MD5. This program does NOT allow you to match an arbitrary MD5 hash. That may come some day, but unless I've missed a very important paper somewhere, it has not happened yet.

This does not totally invalidate MD5 for verification. This attack still does not let you poison a torrent feed, etc, unless you are the author of the original source data and you engineered the data specifically to be vulnerable to this attack.
Re:bittorrent? by Wesley+Felter · 2005-11-15 09:34 · Score: 4, Informative

BitTorrent uses SHA-1.
Re:So what the hell do I do now? by yamla · 2005-11-15 09:35 · Score: 4, Informative

That's not what MD5 sums are used for. TCP/IP already has packet integrity. MD5 sums are indeed used to make sure you don't have a malware-filled ISO. The trick is that you grab the MD5 sum from a trusted source, then you can grab the ISO image from any mirror site. Assuming MD5 is safe (obviously not the case), you know your downloaded ISO is exactly the same as the one distributed from the central repository.

--

Oceania has always been at war with Eastasia.
Re:The end of edonkey by n0dalus · 2005-11-15 09:41 · Score: 4, Informative

No. This only helps you find collisions in two randomly generated strings.
It is still very difficult to produce a colliding file given a pre-existing file on the network.
It should also be noted that edonkey splits a file into 9500KB chunks, and then into smaller chunks again, and hashes each one. It would be far more difficult to produce a chunk that causes collisions on all three levels.
Anyway, I expect an eMule extension will come out soon to allow for sharing of SHA1 hashes between clients (if it doesn't already exist).
Re:So you found a collision, big deal by Krischi · 2005-11-15 09:50 · Score: 5, Informative

See this: http://www.cits.rub.de/MD5Collisions/

It demonstrates the generation of two postscript files with the same MD5 hash that nevertheless display completely differently.
Re:SHA1 by Anonymous Coward · 2005-11-15 09:50 · Score: 5, Informative

No, MD5 and SHA1 were found to have better than brute-force attacks within a few months of each other.
Crypto people are recommending SHA-256 or SHA-512 which is only like SHA-1 in name.
Obviously check your the hash length beforehand and make sure your database column is wide enough.
When migrating existing hashes to the new hash be careful not to store the old hash anywhere -- that can be the weak link in the chain. For example, generating passwords and having the MD5 around lets attackers generate valid inputs and then try them against the more computationally complex hash. It gives them an approach to attacking your stronger hash.
Take a copy of your database and hash all the existing passwords into SHA-512 form, and you'll need some way of distinguishing the MD5-to-SHA512 hashes from the SHA512 hashes, so add a date column with todays date in it. Then write a function "hashString" as a wrapper that can identify when something was hashed, and go down a different branch of code based on that.
The first branch does MD5 then SHA512, the second branch does SHA512, and it does this based on the date column.
And, of course, re-salt both branches.
Re:So you found a collision, big deal by andyh1978 · 2005-11-15 10:03 · Score: 4, Informative

Maybe someone could explain why collisions are a serious problem for MD5. Or at least in what instances they are. I can see that in some cases, such as password hashing this could be a problem.
It's not a problem in password hashing. There is still no feasible way to compute one of the infinite plaintexts that would generate a given MD5 from just the MD5. Rainbow Tables are the main threat there, but they're defeated by salting (e.g. HMAC-MD5) as you have to regenerate the tables all over again (and find the salt in the first place). It doesn't hurt to go to a larger, more complex hash, but for this purpose, there's no additional worries. It's still "preimage resistant".
Re:MD5 is not an encryption algo by swillden · 2005-11-15 10:06 · Score: 4, Informative

t's also the default algorithm to hash passwords (i.e. if you type in your password, it gets hashed into an MD5 sum which is then compared to what the MD5-ed password should be, thereby avoiding plaintext password storage).
Doesn't matter. This attack has no significant effect on password hashing, with or without salt.
This attack allows you to find a pair of plaintexts that hash to the same value; you don't get to pick either the plaintexts or the hash value. It does not help you find a plaintext that hashes to a given value. To use this to attack an unsalted password hashing system you would need to first generate a collision, then convince the target of your attack to set one of those plaintexts to be his/her password, then you could log in using the other plaintext. But if you can convince the target to use a particular password, why not just use that to log in?
This is not an insignificant cryptologic result, and people should move away from MD-5 as fast as practically possible (actually, people have been moving away from it for years due to some results against MD-4, which MD-5 is very similar to) but it doesn't really have any practical implications right now.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:SHA-1??? by poemofatic · 2005-11-15 10:12 · Score: 5, Informative

Huh? The SHA-2 family have been standardized, approved by NIST, and recommended by the NSA as part of their suite B for some time now. They are *much* more proven than Whirlpool and required for government use, whereas Whirlpool is not allowed for government use. Look at the SHA-512, SHA-384, SHA-256 CMVP instructions and validation lists before you say that NIST has not approved these hashes.

--
When in doubt, have a man come through a door with a gun in his hand.
Q and A by Sheepdot · 2005-11-15 10:13 · Score: 4, Informative

For those of you who store passwords as hashes in your web apps, I've developed a little "Q & A" post here that explains this as best as possible.

Question: Does this mean all my MD5 passwords for all my users can be cracked?
Answer: The short answer is yes, they can be cracked. The long answer is no, not if you used a salt, and the attacker has to get those md5 hashes first. You are not safe you are storing your user's password field input directly to the database ala the php/sql method of:
INSERT INTO users VALUES ('user','" . md5($password) . "');

Question: How should I remedy this?
Answer: Always use a salt or salts. For example in the case above you could use this php/sql method instead:
INSERT INTO users VALUES ('user','" . sha1(md5($password . '¥1i9k') . 'a-thirty-five-ch4racter-l0ng-str1ng' . md5($password)) . "');

Question: How/Why is this safer?
Answer: Collisions are only direct input for the md5 function to get the same md5 hash. So in the above case, $password was directly taken from the user and made into a hash. Assuming an attacker got an SQL injection in and grabbed the database, they could run this collision creator on a hash and produce an input that gave that exact hash.

But, this would be much more difficult with any code that introduced a salt. That is why the second code is better, it includes two salts that the attacker (through his SQL injection) is unable to account for.
Re:MD5 is not an encryption algo by Haeleth · 2005-11-15 12:14 · Score: 4, Informative

You don't seem to understand... Having the MD5 hash of a piece of software and the ability to generate a collision for that hash will--

Stop right there, because it's quite clear that the one who doesn't understand is you. Nobody has the ability to generate a collision for a given MD5 hash. All we have is the ability to generate two bits of random junk that share the same hash. This makes some attacks possible, but it does not make it possible for you to distribute a malicious version of someone else's software that has the same MD5 hash as their version.
MOD ME DOWN by swillden · 2005-11-15 13:59 · Score: 5, Informative

The parent comment, which I wrote, was based on a severe misunderstanding of the extent of the capability of the attack. In particular, I didn't realize that the attack could find collisions even with arbitrary, attacker-specified IVs. What that means is that it is indeed possible to generate x.509 certificates containing different keys but the same MD5 hash (and therefore the same signature). In fact, it's been done.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:SHA1 by Anonymous Coward · 2005-11-15 14:19 · Score: 4, Informative

If you can find a collision in one, you can, for pretty much the same work, find a collision in the second hash. Cascading hash functions like this DOES NOT WORK.

Look up the Joux Multicollision paper from 2004 CRYPTO. This is a famous result.

This scheme you propose has been broken by Joux.

-Matt