MD5 Collision Source Code Released

← Back to Stories (view on slashdot.org)

MD5 Collision Source Code Released

Posted by Zonk on Tuesday November 15, 2005 @09:16AM from the collisiontacular dept.

SiliconEntity writes "The crypto world was shaken to its roots last year with the announcement of a new algorithm to find collisions in the still widely-used MD5 hash algorithm. Despite considerable work and commentary since then, no source code for finding such collisions has been published. Until today! Patrick Stach has announced the availability of his source code for finding MD5 collisions and MD4 collisions (Coral cache links provided to prevent slashdotting). MD4 collisions can be found in a few seconds (but nobody uses that any more), while MD5 collisions (still being used!) take 45 minutes on a 1.6 GHz P4. At last we will be able to implement various attacks which have been purely hypothetical until now. This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."

29 of 411 comments (clear)

Min score:

Reason:

Sort:

shaken to our what? by tomstdenis · 2005-11-15 09:19 · Score: 3, Insightful

It's important news but not really that shocking. MD5 was not something professionals would recommend for a few years already.

Any half-way intelligent cryptographer would have suggested SHA-1, TIGER or perhaps HAVAL since quite some time already.

Tom

--
Someday, I'll have a real sig.
1. Re:shaken to our what? by ZachPruckowski · 2005-11-15 09:26 · Score: 2, Insightful
  
  It's important news but not really that shocking. MD5 was not something professionals would recommend for a few years already.
  
  But a household user can crack in an hour on a normal mid-line computer something that "a few years" ago professionals might have recommended. That's the news. If low-end PCs can crack encryption that's only a few years outdated, then the hounds are nipping at the heels of the cyptography industry. And imagine what hackers could do with more powerful computers (yes, I know there is a non-applicability issue with some forms of encryption). Or the government with all those Cray super-monstrosities.
MD5 broken, not useless by Anonymous Coward · 2005-11-15 09:21 · Score: 2, Insightful

Keeping in mind where MD5 is broken is important, so that good uses for this tool are not disposed of out-of-hand.

md5 is still good for keeping track of if your files have changed. It should not be used for document signing.
Why? by mboverload · 2005-11-15 09:23 · Score: 3, Insightful

Why not just use 2 different algorithms? Yes, it's possible. Or hell, use 3. Can some one tell me why not this isn't a standard practice? Even if one has a weakness, you still have the other to back it up

I use HMAC-SHA-256 with PasswordMaker.

https://passwordmaker.org/passwordmaker.html
1. Re:Why? by einhverfr · 2005-11-15 09:36 · Score: 5, Insightful
  
  Even if SHA1 and MD5 have attackable collisions the chances are very low that you can find a meaningful collision that affects both algorithms.
  
  --
  
  LedgerSMB: Open source Accounting/ERP
2. Re:Why? by jonabbey · 2005-11-15 10:07 · Score: 2, Insightful
  
  True, but you could use a hash function like SHA1('data').MD5('data'), where the . operator stands for string concatenation.
  
  The reason that this isn't generally done is that should not provide more security than a proper cryptographic hash algorithm that produces hashes as long as the two different hash algorithms concatenated together.
  
  If you want additional collision resistance, just generate a longer hash. I believe this is how people are advised to handle SHA-class algorithms right now.
  
  --
  - jon
  
  Ganymede, a GPL'ed metadirectory for UNIX
3. Re:Why? by einhverfr · 2005-11-15 10:24 · Score: 2, Insightful
  
  If you want additional collision resistance, just generate a longer hash. I believe this is how people are advised to handle SHA-class algorithms right now.
  
  Hmmm....
  
  I actually like the concatenation more. The reason is that the longer hash solution only provides protection provided that the hash algorythm is not broken. Once it is broken your entire system fails. In other words, the security doesn't have much of a concept of depth.
  
  If you use two independant hash algorythms that are different, then the chance that a given change will provide a collision on both hashes is quite remote. This means that one not only has to break both algorythms but has to break both *in the same way.* This means that you could quite possibly use two broken hash algorythms safely.
  
  --
  
  LedgerSMB: Open source Accounting/ERP
4. Re:Why? by jonabbey · 2005-11-15 11:48 · Score: 2, Insightful
  
  Now, with my proposal, one would include independant hashs which would be checked independantly. If either one fails, one assumes that the data has been tampered with. The issue is that it would be difficult to defeat both simultaneously for this specific type of check. Being able to do so on demand while editing the file in a meaningful way might well prove impossible.
  
  Yes, but only if you mean 'might well prove impossible' in the same way that it 'might well prove impossible' to break SHA-1 or MD5. There's nothing magical in the mathematics that makes a hash generated partially by SHA-1 and partially by MD5 harder to break than a hash generated by SHA-256 or the like, which generates a longer hash than SHA-1 or MD5 alone.
  
  Remember, as long as the domain of source files to be hashed includes all possible data files longer than the generated hash, there will be collisions in the function's range. This is true even in the SHA1('data').MD5('data') case. And as long as there are collisions, it's just a question of how difficult it is to find them.
  
  Yes, prefixing your MD5 hash with an SHA1 hash should make it much harder to find a collision in both algorithms simultaneously, but the very same difficulty could be achieved with a single hash algorithm which generates a longer hash. The magic is in the quality of the algorithm and the length of the output, not in the fact that two algorithms were put to use.
  
  --
  - jon
  
  Ganymede, a GPL'ed metadirectory for UNIX
5. Re:Why? by Lord+Ender · 2005-11-15 12:46 · Score: 2, Insightful
  
  Combining hashes is effectively just inventing a new hash algorithm with longer digests.
  
  As a mathematician, this "seems" like it would be intuitively much more difficult for someone to discover an attack against.
  
  As an engineer, this seems like it is obviously NOT a "good" solution.
  
  If H1() and H2() are both weak, the hybrid H12() is obviously stronger than either, but also fundamentally flawed.
  
  --
  A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
bittorrent? by rayde · 2005-11-15 09:24 · Score: 4, Insightful

doesn't bittorrent use md5 to verify the sections of files it has downloaded? will this facilitate poison seeds? or does BT use something more complex than md5?
Re:So what the hell do I do now? by DreadSpoon · 2005-11-15 09:26 · Score: 5, Insightful

Do nothing.

MD5 has not been invalidated for those uses. Checking the MD5 sum of an ISO download is not done for security purposes, it's done so that you can make sure you didn't get a bad byte or two somewhere in that 650MB. I mean, if hackers could upload a malware-filled ISO to the FTP server, they could upload a new MD5SUMS file too, right?
This is misleading - MD5 is still useful by hoggoth · 2005-11-15 09:28 · Score: 5, Insightful

This new algorithm does not ruin the usefulness of MD5 hashes. The algorithm can generate two documents that have the same MD5 hash, an MD5 collision. But it can NOT generate an MD5 collision starting with an existing document. In practical terms, this means a file that has been signed with an MD5 hash is STILL secure. Nobody can replace the file with a different file that will have the same MD5 hash. However someone can prepare in advance two documents with the same MD5 hash and trick someone into believing one document is really the other. So if you trust the original source (a Linux distro for example) you can be confident you are downloading the original document.

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
1. Re:This is misleading - MD5 is still useful by MoogMan · 2005-11-15 09:49 · Score: 2, Insightful
  
  Nobody can replace the file with a different file that will have the same MD5 hash.
  
  Yet.
Collisions do not mean the end of MD5 by afaik_ianal · 2005-11-15 09:31 · Score: 5, Insightful

This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want.

No, no, no. This does not allow an attacker to generate any collision they like. They cannot find data that collides with a piece of data I provide them with. All they can do is provide me with 2 pieces of data that happen to collide.

This means that an attacker can theoretically provide 2 different documents to people with the same hash, but they cannot easily produce a document that has the same hash as a document I have written.

(Disclaimer: I haven't actually been able to RTFA (it's /.'d), but unless they have made an enormous breakthrough since this was last reported, this attack has very little implications for those of us who use MD5).
Will people stop saying this? by ivan256 · 2005-11-15 09:33 · Score: 2, Insightful

This more than anything should be the final stake in the heart of MD5

No, no it won't be. It won't be, because MD5 is useful for many things where the existance of an "easy" (in quotes because easy is relative) method of generating colisions is irrelavant.

It won't even kill off the use of MD5 checksums as a signature for verifying authenticity, because if your data is smaller than the checksum there may not be a colision at all, and an exploit wouldn't matter.

This is an important discovery, but it doesn't make MD5 useless any more than CRC32 is useless.
Got Salt? by Anonymous Coward · 2005-11-15 09:34 · Score: 2, Insightful

OK, so clearly a scripted attack against MD5 is bad.

But aren't most people using MD5 using salted (as opposed to unsalted) hashes? (for those unclear on the difference, a "salted" has basically uses a local seed as part of it's MD5 hash, in addition to the value to be encrypted)

Doesn't seem likely that salted hashes can be easily broken by this technique, although clearly it's a concern that, should the salt value become known, all your passwords, etc, become breakable...
MD5 and verification by n0dalus · 2005-11-15 09:34 · Score: 4, Insightful

Just because collisions can be generated doesn't mean that MD5 is dead.
It might only take minutes to calculate two random strings with the same hash, but it would still take a very long time to calculate a second string that collides with a pre-existing string. So even though it is now cryptographically weak, it can still be used effectively to check the integrity of files.
MD5 is not an encryption algo by Junta · 2005-11-15 09:36 · Score: 1, Insightful

It is a hash algo. It's used not to protect the content of anything, just to provide a method to validate content integrity, to show nothing accidental or intentional happened to change it.

--
XML is like violence. If it doesn't solve the problem, use more.
1. Re:MD5 is not an encryption algo by swillden · 2005-11-15 11:49 · Score: 2, Insightful
  
  For one: any system that uses one-way hashed password storage or key generation mechanism, as many sites use, is now some percentage easier to violate because multiple strings may resolve to same hash.
  It's the case with all one-way hashes that multiple strings hash to the same value. Since an n-bit hash function can only produce 2^n distinct hash values, if there are more than 2^n possible input plaintexts, than multiple plaintexts hash to single values. In fact, since MD5 and similar can hash inputs of any length, on average there are an infinite number of plaintexts that hash to any given n-bit value.
  The existence of this attack doesn't change the existence or distribution of collsions... we always knew they existed. It just means that we can find some of them. Ideally, we'd like it to be practically impossible to find *any*, even though we know they exist.
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
SHA-1??? by jd · 2005-11-15 09:39 · Score: 3, Insightful

SHA-1 has known attacks, although none have (yet) proven to be useful for an exploit. The SHA-2 family (eg: SHA-256) are "unproven" and not part of the FIPS-180 standard (so cannot be used for US Government work), but I would regard them as being "probably safer" than SHA-1 for secure work.

TIGER is good, as is Whirlpool. Whirlpool has the advantage that it uses AES as the basis, and AES is regarded as secure. It was also certified for secure work by NESSIE - a European group trying to do for the EU what NIST does for the US - which means that it's probably certified for use in secure environments in Europe.

According to the Hashing Function Lounge, there are other hashing functions that have not been broken (eg: cellhash and fft-hash) but these are sufficiently obscure that a lack of a known exploit may be through lack of study and not through the presence of good security. It would make them good for beating skript-kiddies, as they won't have the skills to find exploits and those skilled enough at finding them aren't studying those algorithms much. (I don't like security through obscurity, but technically these aren't obscured as anyone CAN study the algorithm.)

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:SHA1 by flosofl · 2005-11-15 10:04 · Score: 3, Insightful

For the love of _______! (fill in appropriate name for your particular beliefs)

Someone mod the GP post Funny, before we get more "informative" posts. It looked like a tongue-in-cheek comment to me. I actually laughed. Then I saw the follow-ups from the Unfunny Brigade... It was a joke!

Seriously, who doesn't know about the SHA-1 weakness by now.

--
"This calls for a very special blend of psychology and extreme violence" - Vyvyan "The Young Ones"
Re:WTF this source is useless by plalonde2 · 2005-11-15 10:16 · Score: 2, Insightful

It's thoroughly documented: mind you you need to start with the reference given in the first comment. Variable names and comments relate to that paper.
Not all code documentation lives in the source.
Re:WTF this source is useless by Anonymous Coward · 2005-11-15 10:21 · Score: 1, Insightful

Not all code documentation lives in the source.

True, but if you aren't going to have useful documentation in the source why bother at all?
breaking torrents? by OrangeTide · 2005-11-15 10:22 · Score: 4, Insightful

Ah! That's a very good point.

now if you you were a software company you could put torrents out (I assume they use blocks of MD5sum), and then after the torrent becomes popular start randomly seeding people with blocks that hash correctly but are complete garbage (since you can't pick what exactly you hash). if you do it right you would have games out there that would still mostly run. but would crash, or have garbled game data, etc.

I'm not sure if this is really all that useful. but this exploit certainly seems to make it easy to do.

--
“Common sense is not so common.” — Voltaire
1. Re:breaking torrents? by dougmc · 2005-11-15 11:13 · Score: 3, Insightful
  
  That's what I was thinking -- this being used to break torrents and other p2p setups.
  Though to be fair, most games seem to come in the form of a compressed archive of some sort -- either a bunch of .rar files (for warez) or a .exe file with .cab files (for Windows installers) or something similar. In that case, the corruption would be detected at installation, though it wouldn't be easy to determine from the torrent exactly which blocks are corrupt.
  In short, MD5 being broken (and now the code being available) is very bad. I expect to see the anti-piracy vigilantes jumping on this very quickly to create code to totally break bit torrent and similar things -- it would come up and look like a seed, but would be spewing garbage that the other clients couldn't detect as garbage. Of course, such poisoning would also end up being used to corrupt completely legitimate torrents as well, just because people think it's fun.
  (The fix? Update things like bittorrent to use hash routines that haven't been cracked yet, or to use multiple hash routines on the same blocks.)
Re:So you found a collision, big deal by iabervon · 2005-11-15 10:24 · Score: 3, Insightful

This generates "weak collisions", which are where the attacker finds a pair of texts which hash to the same value, not "strong collisions", which are where the attacker finds a text that hashes to the same value as a text chosen by the user. So, someone could now set their password to some string, and then be able to type a different string to get in. (Except that neither are likely to be ascii text, making it a bit tough to type).

The actual issues are for document signing; the attacker could give you one document to sign, and use the signature on a different document with the same hash. There are smaller issues in the case where code expects no two documents to have the same hash. Obviously, collisions must exist, but the code to handle the case is likely not to be well-tested, since test cases were previously impractical to find.
Re:So what the hell do I do now? by Aranth+Brainfire · 2005-11-15 10:40 · Score: 3, Insightful

"That's not what MD5 sums are used for."

Heh, sorry, but you lose. I use them for that.

I've never had a problem with an "infected" iso, but I have had one or two that downloaded wrong and flat out failed to work after I burned them. Sure enough, I check the md5 sum and it's not correct- download the iso again from the same site and it works.

So I guess maybe either my hard drive is funky (or some other system failure on one end) or this "packet integrity" isn't so great.

--
"Quoting yourself is stupid." -Me
Re:Easy cracking of linux user passwords by Anonymous Coward · 2005-11-15 11:42 · Score: 1, Insightful

Rubbish. This finds a collision between two "randomly" generated streams of data. Even if it could generate a collision for a known piece of data you would still need the original document - namely the password and salt, and if you already have the password then you don't need to find a collision.

It won't magically produce a stream which hashes to a desired value - that's still the (infeasible) domain of brute force.

Sure there are implications as mentioned with the two postscript documents (and other areas, I'd imagine), but unless you have a hand in producing the original document(s) it's not _such_ a biggie.
Re:Q and A by PhiRatE · 2005-11-15 12:04 · Score: 3, Insightful

No, a salt can be public knowledge, a salt is there to prevent the use of a pre-hashed dictionary, it does not help in the event of a matcher.

Way back when the hash was unix crypt(), everyone got a little nervous about the idea that someone with a lot of computing power to burn might just fill up a large hard-drive with a map of plaintext->hash for the entire oxford dictionary for example. If they did that, then all they'd need to do subsequently to figure out the plaintext for any of the values they did the crypt()'ing for would be to look through the rive for the crypt value that matched the crypt value in the password file.

The idea of a salt is simple, it changes the hashed result in a computably repeatable fashion, but one that requires that the entire hash has to be performed with the salt in mind in order to get the correct end result.

So using a completely public bit of knowledge is fine. What is preferable is that the salt be random yet repeatable. That's why you see unix passwd files have the salt prepended to the hash (in crypt, it use to be the first 2 characters for example), you generate it randomly once, then whenever you want to see if the plaintext you've been given is the same, you grab the salt, hash the plaintext with it, and then check to see if the result matches the original hash.

In summary, a salt protects you from the problem of a pre-computed hash dictionary, nothing else.

--
You can't win a fight.