MD5 To Be Considered Harmful Someday
Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."
Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially higher.
-nB
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
It doesn't have to be harmful to break a ptp system. There's a pretty common exploit on Kazaa where people have a file just containing random junk that registers as a match to a popular file. If you download taht file, and get any portion of it from the fake, the file is corrupted and useless. Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.
He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).
Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.
Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.
Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
Your statement is ironic in the extreme. The big risk here is NOT P2P apps. Here's the real risk.
Using one of these collision generators, I can create two x.509 certificate requests which have the same MD5 hash. One request says, "I am John Smith, kshdfkhs8i76y238888888" and the other request says, "I am Microsoft Corp., oiushir87dsfhgkjshdfg"
Now, I get Verisign to issue me a certificate for the first request. Since the hash is the same, I can rewrite the certificate to say that I am Microsoft Corp, and nobody will ever be able to tell the difference. Now, I am able to sign code as if I were Microsoft, and Dominate The Earth.
You're using a different definition of a secure hash than everybody else. It's rather obvious that for files larger than the length of the hash (128 bit for md5), there must be quite a lot sharing the same hash, for a given file length about 2^(filelength in bits - hashlength in bits). However for a hash to be considered secure, it's only required that finding two files with the same hash must be as hard as trying (in md5's case 2^127 different files), but in md5's case you can compute those collisions much cheaper under certain circumstances.
Another condition is obviously that the message should not be reconstructable from the hash.
Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.
What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudo random number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, but had to learn the resulting number sequence was far from more (pseudo-)random, in fact much worse than the the standard 1-line modulo function.
Another case of (false sense of) security through obscurity?
--
Try Nuggets , our SMS search engine which uses question answering technology; now available across the UK.