MD5 Collision Source Code Released
SiliconEntity writes "The crypto world was shaken to its roots last year with the announcement of a new algorithm to find collisions in the still widely-used MD5 hash algorithm. Despite considerable work and commentary since then, no source code for finding such collisions has been published. Until today! Patrick Stach has announced the availability of his source code for finding MD5 collisions and MD4 collisions (Coral cache links provided to prevent slashdotting). MD4 collisions can be found in a few seconds (but nobody uses that any more), while MD5 collisions (still being used!) take 45 minutes on a 1.6 GHz P4. At last we will be able to implement various attacks which have been purely hypothetical until now. This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."
It's important news but not really that shocking. MD5 was not something professionals would recommend for a few years already.
Any half-way intelligent cryptographer would have suggested SHA-1, TIGER or perhaps HAVAL since quite some time already.
Tom
Someday, I'll have a real sig.
Keeping in mind where MD5 is broken is important, so that good uses for this tool are not disposed of out-of-hand.
md5 is still good for keeping track of if your files have changed. It should not be used for document signing.
Why not just use 2 different algorithms? Yes, it's possible. Or hell, use 3. Can some one tell me why not this isn't a standard practice? Even if one has a weakness, you still have the other to back it up
I use HMAC-SHA-256 with PasswordMaker.
https://passwordmaker.org/passwordmaker.html
doesn't bittorrent use md5 to verify the sections of files it has downloaded? will this facilitate poison seeds? or does BT use something more complex than md5?
Do nothing.
MD5 has not been invalidated for those uses. Checking the MD5 sum of an ISO download is not done for security purposes, it's done so that you can make sure you didn't get a bad byte or two somewhere in that 650MB. I mean, if hackers could upload a malware-filled ISO to the FTP server, they could upload a new MD5SUMS file too, right?
This new algorithm does not ruin the usefulness of MD5 hashes. The algorithm can generate two documents that have the same MD5 hash, an MD5 collision. But it can NOT generate an MD5 collision starting with an existing document. In practical terms, this means a file that has been signed with an MD5 hash is STILL secure. Nobody can replace the file with a different file that will have the same MD5 hash. However someone can prepare in advance two documents with the same MD5 hash and trick someone into believing one document is really the other. So if you trust the original source (a Linux distro for example) you can be confident you are downloading the original document.
- For the complete works of Shakespeare: cat
This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want.
/.'d), but unless they have made an enormous breakthrough since this was last reported, this attack has very little implications for those of us who use MD5).
No, no, no. This does not allow an attacker to generate any collision they like. They cannot find data that collides with a piece of data I provide them with. All they can do is provide me with 2 pieces of data that happen to collide.
This means that an attacker can theoretically provide 2 different documents to people with the same hash, but they cannot easily produce a document that has the same hash as a document I have written.
(Disclaimer: I haven't actually been able to RTFA (it's
This more than anything should be the final stake in the heart of MD5
No, no it won't be. It won't be, because MD5 is useful for many things where the existance of an "easy" (in quotes because easy is relative) method of generating colisions is irrelavant.
It won't even kill off the use of MD5 checksums as a signature for verifying authenticity, because if your data is smaller than the checksum there may not be a colision at all, and an exploit wouldn't matter.
This is an important discovery, but it doesn't make MD5 useless any more than CRC32 is useless.
OK, so clearly a scripted attack against MD5 is bad.
But aren't most people using MD5 using salted (as opposed to unsalted) hashes? (for those unclear on the difference, a "salted" has basically uses a local seed as part of it's MD5 hash, in addition to the value to be encrypted)
Doesn't seem likely that salted hashes can be easily broken by this technique, although clearly it's a concern that, should the salt value become known, all your passwords, etc, become breakable...
Just because collisions can be generated doesn't mean that MD5 is dead.
It might only take minutes to calculate two random strings with the same hash, but it would still take a very long time to calculate a second string that collides with a pre-existing string. So even though it is now cryptographically weak, it can still be used effectively to check the integrity of files.
It is a hash algo. It's used not to protect the content of anything, just to provide a method to validate content integrity, to show nothing accidental or intentional happened to change it.
XML is like violence. If it doesn't solve the problem, use more.
TIGER is good, as is Whirlpool. Whirlpool has the advantage that it uses AES as the basis, and AES is regarded as secure. It was also certified for secure work by NESSIE - a European group trying to do for the EU what NIST does for the US - which means that it's probably certified for use in secure environments in Europe.
According to the Hashing Function Lounge, there are other hashing functions that have not been broken (eg: cellhash and fft-hash) but these are sufficiently obscure that a lack of a known exploit may be through lack of study and not through the presence of good security. It would make them good for beating skript-kiddies, as they won't have the skills to find exploits and those skilled enough at finding them aren't studying those algorithms much. (I don't like security through obscurity, but technically these aren't obscured as anyone CAN study the algorithm.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
For the love of _______! (fill in appropriate name for your particular beliefs)
Someone mod the GP post Funny, before we get more "informative" posts. It looked like a tongue-in-cheek comment to me. I actually laughed. Then I saw the follow-ups from the Unfunny Brigade... It was a joke!
Seriously, who doesn't know about the SHA-1 weakness by now.
"This calls for a very special blend of psychology and extreme violence" - Vyvyan "The Young Ones"
Not all code documentation lives in the source.
Not all code documentation lives in the source.
True, but if you aren't going to have useful documentation in the source why bother at all?
Ah! That's a very good point.
now if you you were a software company you could put torrents out (I assume they use blocks of MD5sum), and then after the torrent becomes popular start randomly seeding people with blocks that hash correctly but are complete garbage (since you can't pick what exactly you hash). if you do it right you would have games out there that would still mostly run. but would crash, or have garbled game data, etc.
I'm not sure if this is really all that useful. but this exploit certainly seems to make it easy to do.
“Common sense is not so common.” — Voltaire
This generates "weak collisions", which are where the attacker finds a pair of texts which hash to the same value, not "strong collisions", which are where the attacker finds a text that hashes to the same value as a text chosen by the user. So, someone could now set their password to some string, and then be able to type a different string to get in. (Except that neither are likely to be ascii text, making it a bit tough to type).
The actual issues are for document signing; the attacker could give you one document to sign, and use the signature on a different document with the same hash. There are smaller issues in the case where code expects no two documents to have the same hash. Obviously, collisions must exist, but the code to handle the case is likely not to be well-tested, since test cases were previously impractical to find.
"That's not what MD5 sums are used for."
Heh, sorry, but you lose. I use them for that.
I've never had a problem with an "infected" iso, but I have had one or two that downloaded wrong and flat out failed to work after I burned them. Sure enough, I check the md5 sum and it's not correct- download the iso again from the same site and it works.
So I guess maybe either my hard drive is funky (or some other system failure on one end) or this "packet integrity" isn't so great.
"Quoting yourself is stupid." -Me
Rubbish. This finds a collision between two "randomly" generated streams of data. Even if it could generate a collision for a known piece of data you would still need the original document - namely the password and salt, and if you already have the password then you don't need to find a collision.
It won't magically produce a stream which hashes to a desired value - that's still the (infeasible) domain of brute force.
Sure there are implications as mentioned with the two postscript documents (and other areas, I'd imagine), but unless you have a hand in producing the original document(s) it's not _such_ a biggie.
No, a salt can be public knowledge, a salt is there to prevent the use of a pre-hashed dictionary, it does not help in the event of a matcher.
Way back when the hash was unix crypt(), everyone got a little nervous about the idea that someone with a lot of computing power to burn might just fill up a large hard-drive with a map of plaintext->hash for the entire oxford dictionary for example. If they did that, then all they'd need to do subsequently to figure out the plaintext for any of the values they did the crypt()'ing for would be to look through the rive for the crypt value that matched the crypt value in the password file.
The idea of a salt is simple, it changes the hashed result in a computably repeatable fashion, but one that requires that the entire hash has to be performed with the salt in mind in order to get the correct end result.
So using a completely public bit of knowledge is fine. What is preferable is that the salt be random yet repeatable. That's why you see unix passwd files have the salt prepended to the hash (in crypt, it use to be the first 2 characters for example), you generate it randomly once, then whenever you want to see if the plaintext you've been given is the same, you grab the salt, hash the plaintext with it, and then check to see if the result matches the original hash.
In summary, a salt protects you from the problem of a pre-computed hash dictionary, nothing else.
You can't win a fight.