Practical Exploits of Broken MD5 Algorithm
jose parinas writes "A practical sample of an MD5 exploit can be found, with source code included,in codeproject, a site for .Net programmers.
The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack.
At Microsoft, the MD5 hash functions are banned.
The main problem is that the attack is directed to the distribution of software process, as you can understand reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum."
...better use Tiger or Whirlpool (based on AES). AFAIK there are no known vulnerabilities or attacks for these two yet.
Do not be alarmed. This is only a test.
This seems to work on the assumption that you want to do some harm with a program you created yourself, you can't actually take a random RPM and turn it into an evil RPM with the same MD5. So, yes, it's bad, but it's not as bad as you might think.
Considering this is such a "well known" result, you would think that MD5 should have been abandoned long ago. Is this true for other popular hash functions?
"Yields falsehood when preceded by its own quotation" yields falsehood when preceded by its own quotation.
This completely misses the point of cryptographic hashes.
The point is that it is supposed to be difficult to find another data set which hashes to the same value without doing a brute-force search. Of course you will get collisions, but the changes are (supposed to be) 1 in 2^80 with MD5 or 1 in 2^128 with SHA-1.
The exploits mentioned above are that the algorithms (MD5 and to some extent SHA-1) have been broken to allow you to construct a piece of data which hashes to the same value as the original. This is VERY different from the fact that you get collisions.
(sigh) Insightful, my ass... Checksums are NOT reversible. The main trick here is to replace one file with another and leave the hash/checksum the same by patching the fake file. (For practically every file format there exists a spare space where this patching could be done.)
RPM uses both MD5 and SHA1, the chances of finding a collision that satisfies both hashes is small, even if both MD5 and SHA1 are compromised since the hash the data differently.
rpm -Kvv xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm /var/lib/rpm/Packages rdonly mode=0x0 /var/lib/rpm/Packages /var/lib/rpm/Pubkeys rdonly mode=0x0
./updates-released/packages/xorg-x11-libs-6.8.2-37 .FC4.49.2.i386.rpm: /var/lib/rpm/Pubkeys /var/lib/rpm/Packages
D: Expected size: 2655615 = lead(96)+sigs(344)+pad(0)+data(2655175)
D: Actual size: 2655615
D: opening db index
D: locked db index
D: opening db index
D: read h# 278 Header sanity check: OK
D: ========== DSA pubkey id b44269d0 4f2a6fd2 (h#278)
Header V3 DSA signature: OK, key ID 4f2a6fd2
Header SHA1 digest: OK (f37bf5cb97db696f14133b90e23f2455b9f94587)
MD5 digest: OK (8eda29837b6992876bd867df03b3b8af)
V3 DSA signature: OK, key ID 4f2a6fd2
D: closed db index
D: closed db index
D: May free Score board((nil))
The point is to reduce the set among which to do an exhaustive search (one small hash bucket versus all known files on the system), and not to verify some kind of signature.
Any successful attack on the hash would only be useful to make the system slow and unefficient (by making an excessive number of files end up in one bucket), but cannot corrupt it.
Misunderstandment. I never made a statement about MD5 not being "free". But MD5 is vulnerable, that's why I pointed out some alternatives.
Do not be alarmed. This is only a test.
But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum. It's hard to deliver a payload when you're limited to tricking a target into downloading what would be (essentially) a random string of ones and zeroes.
At Toorcon this year, Dan Kaminsky showed how to generate two valid, nicely rendered, html files with the same hash . Basically, he injects javascript into the page to remove the rubbish at the begining of the file. But how often to do you view the source of a page you're visiting. It'd be hard for a layperson to notice this. Make no mistake about it, the collision attack is very dangerous.
Simon
The NIST is having a two-day workshop in Gaitherburg, Maryland (USA) on October 31-Nov 1. Xiaoyun Wang will be giving a keynote speech, and there'll be plenty of technical material to go around. The workshop website is: www.csrc.nist.gov/pki/HashWorkshop/program.htm. I don't work for NIST or anything, but I thought this was interesting and they haven't really done a good job getting the word out about this conference.
Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird. --Nietzsche
It doesn't seem to make it any easier to take an arbitrary original and just magic up another file which has the same hash.
Yes it does.
Write boring code, not shiny code!
This is a complicated issue. Generally, the security offered by an encryption algorithm isn't measured by its popular usage, but by the amount of time qualified professional cryptographyers/mathematicians/hackers have studied it without finding a critical vulnerability. My claim is probably too broad: there is no magical formula that determines how secure an algorithm is. But in depth work by professionals does endear confidence in an algorithm.
As a general rule of thumb, it is wise to use an algorithm that has been seriously studied for 10-20 years. At this point, it is modern enough to withstand modern brute force attacks, and (hopefully) understood well enough to ensure that there are no structural vulnerabilities. If it is much older than that and still studied, it is likely because a flaw has been found and people are trying to push it as far as it goes.
After all, I am strangely colored.
I think you're missing what the OP is saying here. The OP is suggesting to use something like MD5+SHA1, algorithms with different techniques for generating hashes so that you decrease the probability of creating a collision that works for both, not using something like double MD5
Marxism is the opiate of dumbasses
No, it doesn't.
You linked an example which takes a document A, and produces two documents A1 and A2 where
A1 looks like A, but A2 does not look like A
and MD5(A1) = MD5(A2)
BUT, critically, MD5(A1) is not MD5(A)
So neither of the documents created is an imposter. Arbitrary payloads are still protected by MD5. If you don't agree, simply reply with a link to a file that has the MD5 hash 7a0a25a5c71fa2639a41ee743aa5e2b7
No-one can do this yet, and they may never be able to do it. But MD5 has failed one of its original design requirements and that's why people should stop using it and divert resources to ensuring that its replacements are safer.
No, you can't. You misunderstood the nature of the break.
The break allows you to create two special "twin" documents A1 and A2 with different content where MD5(A1) and MD5(A2) are the same
BUT critically it doesn't allow you to take arbitrary document B (written by me) and create document B1 where MD5(B) = MD5(B1)
So you cannot create files that are impostors for my existing documents, and thus people who trust me can continue to trust my MD5 sums until a further break occurs. That break may come in weeks, or years, or never.
You've been posting your incorrect summary throughout this thread, please stop that.
said this before...
... LAST YEAR.
Dan Kaminsky is actually the dude who came up with the Stripwire idea.
Tom
Someday, I'll have a real sig.
Lots of file types allow for arbitrary junk at certain places.
For example, a very basic form of steganography: cat a .zip file to the end of a .gif file. The result is a valid file that can be displayed as an image (which ignores trailing junk), and decompressed with zip (which ignores leading junk).
Most file formats I've written don't care about junk at the end of the file. It'll be stripped off if you load and then save, but the program won't notice or complain. One program even preserves records it doesn't recognise (which could be secret messages, or just random crap).
As far as I know, the technique used for finding these MD5 collisions, cannot be performed with a GIVEN hash. So it's not possible to create, say, a copy of an already available RPM, add malicious code to it, and easily find some data to add to it to generate the same hash. This is not possible.
The only thing the current 'crack' does is create two RANDOM input files that generate the same hashed output. So it's only useful for someone who can control both the 'original' and the 'malicious' version of the data which is being protected by an MD5 hash.
So the dangers here are kind of limited though you could still do a lot of damage with it.
Using multiple hashes is a known way to increase security (some protocols use it) - it *does* work because the number of potential collisions is reduced - if you can create a collission with the MD5 (far from simple, in fact) it's extremely unlikely to *also* collide with the SHA1 hash - the number of plaintexts where they both collide is significantly lower.
No, even with this exploit, it is still very difficult for the bad-faith programmer to find a bad.bin with the same hash as the good.bin generated by the good-faith programmer.
However, what the bad-faith programmer can do with this exploit is take good.bin and produce two new files, good2.bin and bad.bin. good2.bin executes identically to good.bin, but has the same hash as bad.bin. Note that good.bin and good2.bin _DO_NOT_ have the same hash.
Now what the bad-faith programmer must do is convince the good-faith programmer to distribute good2.bin as good.bin. This seems quite difficult. However, if the bad-faith programmer achieves this, he can then distribute bad.bin, demonstrating that it has the correct hash.
Does this make sense? Do you see where your argument is wrong?
AFAIK this is an attack on the underlying Merkel-Damgard paradigm which both SHA-1 and MD5 (amoungst others) employ. The paradigm goes as follows:
...
:-)
IV | Intialisation vector of n-bits
MB_i | Message Black i of n-bits
HB_i | Hashblock i of n-bits
f:(IV , MB_i) -> HB_i | is the underlying compression function which takes both an IV and a message block as input and outputs a hashblock.
First the orginal messaage is split up (and padded if need be) into n-bit blocks. Then f is applied with an IV and MB_1 as input resulting in HB_1. f is then applied iteratively each time tacking the next message block as input while using the previouse hash block as the IV input.
f(IV, MB_1) = HB_1
f(HB_1, MB_2) = HB_2
f(HB_s-1, MB_s) = HB_s = H(Message)
Merkle and Damgard proved that the over all construction is collision resistent given that the compression function f is collision resistant.
As the parent post pointed out though the last block had better include the over all message length. If this is not the case then by extendeing 2 different but colliding messages x,y with the same plain text q the input to the compression function becomes identical since H(x) = H(y) = IV input for f. If on the other hand the length of the orginal message is included in the last block then even though the IV input for f is the same for f(H(x),q_1) and for f(H(y),q_1)..., the final message block (q_s) will again be different resulting in a different final hash block.
If on the other hand len(x) = len(y) then the problem persists since both IV and message blocks will be the same when the final iteration of the algorithm is reached.
Infact this attack is even stronger since by the same reasoning one can see that to produce H(x+q) all one needs is to know is H(x) (and len(x) if that is included in the last message block). No other information is needed about the orginal message x! H(x) is simply inserted as the IV for f when hashing q and so the iteration is "jump started" just where x finishes. (If the length is included in the last block then all that need be used is len(x)+len(q).)
Disclaimer: Not to 100% sure about all this though so please feel free to correct me if i'm wrong...
No it doesn't.
The Wang vector pair floating about at the moment, when prepended to 2 useable files will produce a MD5 collision of the said files. BUT - as a result of doing this you are also going to corrupt these files and make them unuseable (executeables, MP3's etc, obviously not text documents).
All the proof-of-concept article shows is the two attack vectors by Wang in use with 3 simple programs. You will notice the "md5extractor", which needs to be in place to remove the arbitrary vector data before the evil good.exe becomes dangerous. This exact procedure doesn't apply to most software distribution actually, how are you going to get the extractor on the victims computer in the first place?
This could be a problem is somebody can produce an attack vector pair that does produce a valid executeable/PE header or and MP3 header. But these have structure and leave much less room for the vector, may place restrictions on the payload, and might not even be possible.
The webpage thing described in the comment you link to is pretty harmless. Who the hell usines a MD5 hash on a HTML documents? Misleading documentation? Browser exploits? Unlikely.
The fact remains if you were to try and use this method you would really be doing, and what you will have to do, is nothing more than trick the user by normal means (human failure).
Coincedentally, for use in authentication you would be a fool NOT to be running sanity checks on input anyway. For use in authentication, salted and sanity checked input to MD5 should is still very very safe.
I can't see a reason why P2P applications implemented for networks using MD5 file verification can't start popping off bytes at the beginning of downloads (the first block) and try it with another payload to detect and reject people using this type of multicollision attack. In addition these applications could check for valid MP3, AVI, JPEG, headers etc.
The author of the "MD5 - Someday to be considered harmful" paper is correct. MD5 is risky for some purposes, P2P networks still using MD5 without any smarts may be ruined, but the hash far from dead if used carefully backed up by other checks. What makes people think moving to SHA-1 or Whirlpool is going to solve these problems (OK with SHA-1 different types of attacks) in the longer term?
Relying on the hash mechanism alone is just a bad habit to get into. People are switching because it's just best to play it safe when people (myself included) don't understand the full significance of the attacks produced this year.
RPM only uses MD5 to check for corruption of the type you might find during download. RPM actually uses GPG or PGP to sign the generated RPMs for security, and GPG is (afaik) capable of using nearly any hashing algorythm including ones that are yet to be invented. So as far as security is concerned RPM doesn't use MD5 but rather uses whatever hashing algorythm the GPG key that signed the RPM was generated with.
Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
Whirlpool is not "based on AES". It shares a few similiarities (and a designer) but it is a distinct algorithm in its own right. It has a larger block size, different S-boxes, a different linear component, a different key schedule and so on.
I would interpret "based on AES" as meaning it actually uses AES itself (perhaps in tandem Davis-Meyer mode or similar).
I like Whirlpool but it's not fast. I think Tiger is quite a bit faster.
Bernstein's cache timing attacks, and the ever-growing gap between processor speeds and memory access times, mean that table-based primitives (which both of these are) are going out of style.
Xenu loves you!
Yes, for example Whirlpool (http://www.iso.org/iso/en/CatalogueDetailPage.Cat alogueDetail?CSNUMBER=39876&scopelist=)
Mods, parent is not insightful, parent is wrong. And it's not hard to understand why the parent has a hard time seeing how MD5 is involved when they have an incorrect idea of how it works.
As a bit of a hint to NightHwk1, extremely carefully check the first bytes of those two pages. They are indeed not identical and do indeed hash to the same MD5 value.
The rest of the "trick" of course hinges on the fact that a single bit change to a Turing Machine can completely alter the resulting output. "Trick" is put in quotes because that isn't really a "trick", it's a fundamental truth about computing and is the reason why twiddling even a few bits in an MD5-hashed block (which is all this break seems to be able to do) is such a big deal. This is how they demonstrate it constructively, since most people aren't programmers and won't understand that.
It's not nearly as scary as swaping one document for another, or one binary for another, but it's still quite useful.
Think of P2P networks. Gnutella uses SHA1 hashes to verify files, file mirrors, and the final downloaded file. If some RIAA employees could find the SHA1 hash of a very popular song, and generate junk with the same hash, they could have people downloading their junk (pardon the pun) and the P2P app wouldn't know it was corrupt, and wouldn't have any way to avoid the junk file.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant