Slashdot Mirror


Practical Exploits of Broken MD5 Algorithm

jose parinas writes "A practical sample of an MD5 exploit can be found, with source code included,in codeproject, a site for .Net programmers. The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack. At Microsoft, the MD5 hash functions are banned. The main problem is that the attack is directed to the distribution of software process, as you can understand reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum."

28 of 253 comments (clear)

  1. So if you need a freely available hash algorithm by MadMoses · · Score: 4, Informative

    ...better use Tiger or Whirlpool (based on AES). AFAIK there are no known vulnerabilities or attacks for these two yet.

    --

    Do not be alarmed. This is only a test.
  2. A quick note by Darren+Winsper · · Score: 4, Informative

    This seems to work on the assumption that you want to do some harm with a program you created yourself, you can't actually take a random RPM and turn it into an evil RPM with the same MD5. So, yes, it's bad, but it's not as bad as you might think.

    1. Re:A quick note by provolt · · Score: 3, Informative

      The signature doesn't do anything to solve the problem. If I create an evil tarball that has the same hash as the good tarball, the signature will verify properly for both files. When you download the file, you won't know if it's the good or evil one.

      GPG signs the hash. With a strong hash function, it's as good as signing the tarball. With a weak hash functions, one signature would match for many files.

  3. H(x) == H(y) - H(x + q) == H(y + q) ? by strcmp · · Score: 3, Informative
    There is a known result about MD5 hash function, is this: If MD5(x) == MD5(y) then MD5(x+q) == MD5(y+q) So, if you have a pair of messages, x and y, with the same MD5 value, you can append a payload q, and the MD5 value keeps the same, the size of q is arbitrary.

    Considering this is such a "well known" result, you would think that MD5 should have been abandoned long ago. Is this true for other popular hash functions?

    --
    "Yields falsehood when preceded by its own quotation" yields falsehood when preceded by its own quotation.
  4. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 5, Informative

    This completely misses the point of cryptographic hashes.

    The point is that it is supposed to be difficult to find another data set which hashes to the same value without doing a brute-force search. Of course you will get collisions, but the changes are (supposed to be) 1 in 2^80 with MD5 or 1 in 2^128 with SHA-1.

    The exploits mentioned above are that the algorithms (MD5 and to some extent SHA-1) have been broken to allow you to construct a piece of data which hashes to the same value as the original. This is VERY different from the fact that you get collisions.

  5. Re:Checksums are always going to be vulnerable by ceeam · · Score: 4, Informative

    (sigh) Insightful, my ass... Checksums are NOT reversible. The main trick here is to replace one file with another and leave the hash/checksum the same by patching the fake file. (For practically every file format there exists a spare space where this patching could be done.)

  6. Actually RPM uses MD5 and SHA1 by seifried · · Score: 5, Informative

    RPM uses both MD5 and SHA1, the chances of finding a collision that satisfies both hashes is small, even if both MD5 and SHA1 are compromised since the hash the data differently.

    rpm -Kvv xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm
    D: Expected size: 2655615 = lead(96)+sigs(344)+pad(0)+data(2655175)
    D: Actual size: 2655615
    D: opening db index /var/lib/rpm/Packages rdonly mode=0x0
    D: locked db index /var/lib/rpm/Packages
    D: opening db index /var/lib/rpm/Pubkeys rdonly mode=0x0
    D: read h# 278 Header sanity check: OK
    D: ========== DSA pubkey id b44269d0 4f2a6fd2 (h#278)
    ./updates-released/packages/xorg-x11-libs-6.8.2-37 .FC4.49.2.i386.rpm:
    Header V3 DSA signature: OK, key ID 4f2a6fd2
    Header SHA1 digest: OK (f37bf5cb97db696f14133b90e23f2455b9f94587)
    MD5 digest: OK (8eda29837b6992876bd867df03b3b8af)
    V3 DSA signature: OK, key ID 4f2a6fd2
    D: closed db index /var/lib/rpm/Pubkeys
    D: closed db index /var/lib/rpm/Packages
    D: May free Score board((nil))

  7. Re:filesystems... by ArsenneLupin · · Score: 5, Informative
    Reiserfs (and lots of other filesystems ... and lots of other system components in general) might use hash functions to speed up lookups. These hash functions however do not need to be cryptographically secure. The hash (which is usually very short, so much that brute force would be feasible) is only used as an index into an array of "buckets". Each bucket may contain multiple files, and the system still uses a bit-by-bit comparison on the full names to find the correct entry.

    The point is to reduce the set among which to do an exhaustive search (one small hash bucket versus all known files on the system), and not to verify some kind of signature.

    Any successful attack on the hash would only be useful to make the system slow and unefficient (by making an excessive number of files end up in one bucket), but cannot corrupt it.

  8. Re:So if you need a freely available hash algorith by MadMoses · · Score: 2, Informative

    Misunderstandment. I never made a statement about MD5 not being "free". But MD5 is vulnerable, that's why I pointed out some alternatives.

    --

    Do not be alarmed. This is only a test.
  9. Re:Checksums are always going to be vulnerable by Ckwop · · Score: 5, Informative

    But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum. It's hard to deliver a payload when you're limited to tricking a target into downloading what would be (essentially) a random string of ones and zeroes.


    At Toorcon this year, Dan Kaminsky showed how to generate two valid, nicely rendered, html files with the same hash . Basically, he injects javascript into the page to remove the rubbish at the begining of the file. But how often to do you view the source of a page you're visiting. It'd be hard for a layperson to notice this. Make no mistake about it, the collision attack is very dangerous.


    Simon

  10. Speaking of hashing algorithms... by scovetta · · Score: 2, Informative

    The NIST is having a two-day workshop in Gaitherburg, Maryland (USA) on October 31-Nov 1. Xiaoyun Wang will be giving a keynote speech, and there'll be plenty of technical material to go around. The workshop website is: www.csrc.nist.gov/pki/HashWorkshop/program.htm. I don't work for NIST or anything, but I thought this was interesting and they haven't really done a good job getting the word out about this conference.

    --
    Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird. --Nietzsche
  11. Re:Checksums are always going to be vulnerable by Pieroxy · · Score: 2, Informative

    It doesn't seem to make it any easier to take an arbitrary original and just magic up another file which has the same hash.

    Yes it does.

  12. Re:So if you need a freely available hash algorith by poopdeville · · Score: 5, Informative
    I am no cryptography expert so I can not read and understand those algorithms. But the fact that there are no known vilnerabilities for an algorithm doesn't make it secure. Maybe they are just not used as much as other well known algorithms. And therefore nobody has found vulnerabilities for them yet?

    This is a complicated issue. Generally, the security offered by an encryption algorithm isn't measured by its popular usage, but by the amount of time qualified professional cryptographyers/mathematicians/hackers have studied it without finding a critical vulnerability. My claim is probably too broad: there is no magical formula that determines how secure an algorithm is. But in depth work by professionals does endear confidence in an algorithm.

    As a general rule of thumb, it is wise to use an algorithm that has been seriously studied for 10-20 years. At this point, it is modern enough to withstand modern brute force attacks, and (hopefully) understood well enough to ensure that there are no structural vulnerabilities. If it is much older than that and still studied, it is likely because a flaw has been found and people are trying to push it as far as it goes.

    --
    After all, I am strangely colored.
  13. Re:Checksums are always going to be vulnerable by ocelotbob · · Score: 2, Informative

    I think you're missing what the OP is saying here. The OP is suggesting to use something like MD5+SHA1, algorithms with different techniques for generating hashes so that you decrease the probability of creating a collision that works for both, not using something like double MD5

    --

    Marxism is the opiate of dumbasses

  14. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 3, Informative

    No, it doesn't.

    You linked an example which takes a document A, and produces two documents A1 and A2 where

    A1 looks like A, but A2 does not look like A
    and MD5(A1) = MD5(A2)

    BUT, critically, MD5(A1) is not MD5(A)

    So neither of the documents created is an imposter. Arbitrary payloads are still protected by MD5. If you don't agree, simply reply with a link to a file that has the MD5 hash 7a0a25a5c71fa2639a41ee743aa5e2b7

    No-one can do this yet, and they may never be able to do it. But MD5 has failed one of its original design requirements and that's why people should stop using it and divert resources to ensuring that its replacements are safer.

  15. Re:But... by Anonymous Coward · · Score: 1, Informative

    No, you can't. You misunderstood the nature of the break.

    The break allows you to create two special "twin" documents A1 and A2 with different content where MD5(A1) and MD5(A2) are the same

    BUT critically it doesn't allow you to take arbitrary document B (written by me) and create document B1 where MD5(B) = MD5(B1)

    So you cannot create files that are impostors for my existing documents, and thus people who trust me can continue to trust my MD5 sums until a further break occurs. That break may come in weeks, or years, or never.

    You've been posting your incorrect summary throughout this thread, please stop that.

  16. Ahem... by tomstdenis · · Score: 2, Informative

    said this before...

    Dan Kaminsky is actually the dude who came up with the Stripwire idea. ... LAST YEAR.

    Tom

    --
    Someday, I'll have a real sig.
  17. Re:compressed content safe (?) by Shano · · Score: 2, Informative

    Lots of file types allow for arbitrary junk at certain places.

    For example, a very basic form of steganography: cat a .zip file to the end of a .gif file. The result is a valid file that can be displayed as an image (which ignores trailing junk), and decompressed with zip (which ignores leading junk).

    Most file formats I've written don't care about junk at the end of the file. It'll be stripped off if you load and then save, but the program won't notice or complain. One program even preserves records it doesn't recognise (which could be secret messages, or just random crap).

  18. There are some limitations to this attack by GekkePrutser · · Score: 5, Informative

    As far as I know, the technique used for finding these MD5 collisions, cannot be performed with a GIVEN hash. So it's not possible to create, say, a copy of an already available RPM, add malicious code to it, and easily find some data to add to it to generate the same hash. This is not possible.

    The only thing the current 'crack' does is create two RANDOM input files that generate the same hashed output. So it's only useful for someone who can control both the 'original' and the 'malicious' version of the data which is being protected by an MD5 hash.

    So the dangers here are kind of limited though you could still do a lot of damage with it.

  19. Re:Checksums are always going to be vulnerable by Tony+Hoyle · · Score: 2, Informative

    Using multiple hashes is a known way to increase security (some protocols use it) - it *does* work because the number of potential collisions is reduced - if you can create a collission with the MD5 (far from simple, in fact) it's extremely unlikely to *also* collide with the SHA1 hash - the number of plaintexts where they both collide is significantly lower.

  20. Re:Not a problem for software distribution.. by Anonymous Coward · · Score: 1, Informative

    No, even with this exploit, it is still very difficult for the bad-faith programmer to find a bad.bin with the same hash as the good.bin generated by the good-faith programmer.

    However, what the bad-faith programmer can do with this exploit is take good.bin and produce two new files, good2.bin and bad.bin. good2.bin executes identically to good.bin, but has the same hash as bad.bin. Note that good.bin and good2.bin _DO_NOT_ have the same hash.

    Now what the bad-faith programmer must do is convince the good-faith programmer to distribute good2.bin as good.bin. This seems quite difficult. However, if the bad-faith programmer achieves this, he can then distribute bad.bin, demonstrating that it has the correct hash.

    Does this make sense? Do you see where your argument is wrong?

  21. Merkle Damgard: Why len(x) and len(y) matter by 0ptix · · Score: 2, Informative

    AFAIK this is an attack on the underlying Merkel-Damgard paradigm which both SHA-1 and MD5 (amoungst others) employ. The paradigm goes as follows:

                                          IV | Intialisation vector of n-bits
                                      MB_i | Message Black i of n-bits
                                      HB_i | Hashblock i of n-bits
    f:(IV , MB_i) -> HB_i | is the underlying compression function which takes both an IV and a message block as input and outputs a hashblock.

    First the orginal messaage is split up (and padded if need be) into n-bit blocks. Then f is applied with an IV and MB_1 as input resulting in HB_1. f is then applied iteratively each time tacking the next message block as input while using the previouse hash block as the IV input.

    f(IV, MB_1) = HB_1
    f(HB_1, MB_2) = HB_2 ...
    f(HB_s-1, MB_s) = HB_s = H(Message)

    Merkle and Damgard proved that the over all construction is collision resistent given that the compression function f is collision resistant.

    As the parent post pointed out though the last block had better include the over all message length. If this is not the case then by extendeing 2 different but colliding messages x,y with the same plain text q the input to the compression function becomes identical since H(x) = H(y) = IV input for f. If on the other hand the length of the orginal message is included in the last block then even though the IV input for f is the same for f(H(x),q_1) and for f(H(y),q_1)..., the final message block (q_s) will again be different resulting in a different final hash block.

    If on the other hand len(x) = len(y) then the problem persists since both IV and message blocks will be the same when the final iteration of the algorithm is reached.

    Infact this attack is even stronger since by the same reasoning one can see that to produce H(x+q) all one needs is to know is H(x) (and len(x) if that is included in the last message block). No other information is needed about the orginal message x! H(x) is simply inserted as the IV for f when hashing q and so the iteration is "jump started" just where x finishes. (If the length is included in the last block then all that need be used is len(x)+len(q).)

    Disclaimer: Not to 100% sure about all this though so please feel free to correct me if i'm wrong... :-)

  22. Re:Checksums are always going to be vulnerable by baadger · · Score: 3, Informative

    No it doesn't.

    The Wang vector pair floating about at the moment, when prepended to 2 useable files will produce a MD5 collision of the said files. BUT - as a result of doing this you are also going to corrupt these files and make them unuseable (executeables, MP3's etc, obviously not text documents).

    All the proof-of-concept article shows is the two attack vectors by Wang in use with 3 simple programs. You will notice the "md5extractor", which needs to be in place to remove the arbitrary vector data before the evil good.exe becomes dangerous. This exact procedure doesn't apply to most software distribution actually, how are you going to get the extractor on the victims computer in the first place?

    This could be a problem is somebody can produce an attack vector pair that does produce a valid executeable/PE header or and MP3 header. But these have structure and leave much less room for the vector, may place restrictions on the payload, and might not even be possible.

    The webpage thing described in the comment you link to is pretty harmless. Who the hell usines a MD5 hash on a HTML documents? Misleading documentation? Browser exploits? Unlikely.

    The fact remains if you were to try and use this method you would really be doing, and what you will have to do, is nothing more than trick the user by normal means (human failure).

    Coincedentally, for use in authentication you would be a fool NOT to be running sanity checks on input anyway. For use in authentication, salted and sanity checked input to MD5 should is still very very safe.

    I can't see a reason why P2P applications implemented for networks using MD5 file verification can't start popping off bytes at the beginning of downloads (the first block) and try it with another payload to detect and reject people using this type of multicollision attack. In addition these applications could check for valid MP3, AVI, JPEG, headers etc.

    The author of the "MD5 - Someday to be considered harmful" paper is correct. MD5 is risky for some purposes, P2P networks still using MD5 without any smarts may be ruined, but the hash far from dead if used carefully backed up by other checks. What makes people think moving to SHA-1 or Whirlpool is going to solve these problems (OK with SHA-1 different types of attacks) in the longer term?

    Relying on the hash mechanism alone is just a bad habit to get into. People are switching because it's just best to play it safe when people (myself included) don't understand the full significance of the attacks produced this year.

  23. RPM is still relatively safe... by PAjamian · · Score: 2, Informative

    RPM only uses MD5 to check for corruption of the type you might find during download. RPM actually uses GPG or PGP to sign the generated RPMs for security, and GPG is (afaik) capable of using nearly any hashing algorythm including ones that are yet to be invented. So as far as security is concerned RPM doesn't use MD5 but rather uses whatever hashing algorythm the GPG key that signed the RPM was generated with.

    --
    Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
  24. Whirlpool is not based on AES by Paul+Crowley · · Score: 2, Informative

    Whirlpool is not "based on AES". It shares a few similiarities (and a designer) but it is a distinct algorithm in its own right. It has a larger block size, different S-boxes, a different linear component, a different key schedule and so on.

    I would interpret "based on AES" as meaning it actually uses AES itself (perhaps in tandem Davis-Meyer mode or similar).

    I like Whirlpool but it's not fast. I think Tiger is quite a bit faster.

    Bernstein's cache timing attacks, and the ever-growing gap between processor speeds and memory access times, mean that table-based primitives (which both of these are) are going out of style.

  25. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 1, Informative
  26. Re:Yadda Yadda by Jerf · · Score: 2, Informative

    Mods, parent is not insightful, parent is wrong. And it's not hard to understand why the parent has a hard time seeing how MD5 is involved when they have an incorrect idea of how it works.

    As a bit of a hint to NightHwk1, extremely carefully check the first bytes of those two pages. They are indeed not identical and do indeed hash to the same MD5 value.

    The rest of the "trick" of course hinges on the fact that a single bit change to a Turing Machine can completely alter the resulting output. "Trick" is put in quotes because that isn't really a "trick", it's a fundamental truth about computing and is the reason why twiddling even a few bits in an MD5-hashed block (which is all this break seems to be able to do) is such a big deal. This is how they demonstrate it constructively, since most people aren't programmers and won't understand that.

  27. Re:Checksums are always going to be vulnerable by evilviper · · Score: 2, Informative
    But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum.

    It's not nearly as scary as swaping one document for another, or one binary for another, but it's still quite useful.

    Think of P2P networks. Gnutella uses SHA1 hashes to verify files, file mirrors, and the final downloaded file. If some RIAA employees could find the SHA1 hash of a very popular song, and generate junk with the same hash, they could have people downloading their junk (pardon the pun) and the P2P app wouldn't know it was corrupt, and wouldn't have any way to avoid the junk file.
    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant