Slashdot Mirror


MD5 Proven Ineffective for App Signatures

prostoalex writes "Marc Stevens, Arjen K. Lenstra, and Benne de Weger have released their paper 'Vulnerability of software integrity and code signing applications to chosen-prefix collisions for MD5'. It describes a reproducible attack on MD5 algorithms to fake software signatures. Researchers start off with two simplistic Windows applications — HelloWorld.exe and GoodbyeWorld.exe, and apply a known prefix attack that makes md5() signatures for both of the applications identical. Researchers point out: 'For abusing a chosen-prefix collision on a software integrity protection or a code signing scheme, the attacker should be able to manipulate the files before they are being hashed and/or signed. This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process.'"

31 of 117 comments (clear)

  1. Nothing new by grumbel · · Score: 5, Insightful

    Unless I am missing something this is really nothing new. The same has been demonstrated with a webpage and javascript years ago, i.e. two different webpages producing the same MD5, doing it again with an .exe doesn't really sound all that interesting, especially since the attacker still needs to manipulate both the good .exe and the evil .exe and when he has access to the good .exe you are toast anyway.

    This of course doesn't mean we should continue to use MD5, but the attack is really of rather theoretical nature.

    1. Re:Nothing new by Anonymous Coward · · Score: 2, Informative
    2. Re:Nothing new by Bert64 · · Score: 5, Interesting

      If he has access to the good exe *before* it's signed, why not simply replace it with the malicious one so that the malicious one gets signed and distributed instead of the good one...

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    3. Re:Nothing new by Anonymous Coward · · Score: 5, Informative

      No, this is different. In the case of the colliding webpages, bit level inspection immediately reveals what's going on: both "good" and "bad" version are included in the webpages, with an if-statement to choose which one to display.

      When you inspect these binaries at bit level, they contain only the "good" or the "bad" version, and some random data appended to it to make the MD5 hash of the files collide. This technique thus also works for file formats which don't have control statements such as "if" or "file starts at offset". See also: http://www.win.tue.nl/hashclash/Nostradamus/, scroll down to: "Didn't Daum and Lucks do something like this in 2005?"

      Marc Stevens already constructed these "chosen-prefix" collisions for X.509 Certificates, see the HashClash project page. What's new in these results, is that it did not require massively distributed computing efforts, only one Playstation 3 and less than two days of computation. There is no paper available yet as to how he achieved this major optimization, but his MSc thesis gives a clue: see "future work" at the end of section 7.4.

    4. Re:Nothing new by MathFox · · Score: 4, Informative

      This is a different kind of attack: the "old" collision prefix attack had two blocks X and Y with the same hash that allowed one to create two programs:
            X; if (X) then GOOD else EVIL
      and
            Y; if (X) then GOOD else EVIL
      but the evil code would be in the signed good program, it would not be run.

      The new attack is different: it is a method to generate blocks GX and EX for two random files such that the files GOOD+GX and EVIL+EX hash to the same checksum.

      --
      extern warranty;
      main()
      {
      (void)warranty;
      }
    5. Re:Nothing new by The+New+Andy · · Score: 2, Informative
      This is a different attack. The previous attacks meant that you could make two files with the same MD5 by making them completely identical, except for one small block which was any known collision.

      This attack means that you get to choose the two files, and the attack generates two blocks to append to the original files which mean they hash to the same value.

      So the exploits before have been:

      File 1:

      x = [A]
      if (x == [A]) {
      do one thing
      }
      else {
      do something else
      }

      File 2:

      x = [B]
      if (x == [A]) {
      do one thing;
      }
      else {
      do something else;
      }
      Where [A] and [B] are blocks which collide (and they are aligned on block boundaries for calculating the hash).

      The new attack is exploited like this::

      File 1:
      do one thing;
      ignore the rest of the file;
      [C]

      File 2:
      do something else;
      ignore the rest of the file;
      [D]
      Where [C] and [D] are generated by this algorithm. This means that on a quick glance, the code doesn't look completely silly and there is no trace of what the hidden content says (just a bit of random stuff at the end of a file which might suggest that it is there)
    6. Re:Nothing new by grumbel · · Score: 2, Informative

      ### The attacker does not need access to good.exe.

      He *does* need access to good.exe. You can't generate a file that matches a given MD5, what you however can is generate two files that have the same MD5 and different content, both good.exe and evil.exe contain appended data to make the sums match. Its still a weakness, but a much less critical one then being able to generate a file for a given MD5.

    7. Re:Nothing new by kasperd · · Score: 3, Interesting

      After having read the actual article I realize that there in fact is something new in it. The slashdot story put all the focus on software signing, which is not the interesting part of the article. The interesting part of the article is, that they have found a new and stronger way to produce collisions. For one thing it is going to be a lot less obvious that a file is crafted. The original attack required all the colliding files to contain all the meaningful content with some psuedorandom content to select between them. The new attack doesn't require this, in fact you could even produce collisions beteween files of different formats. Like a jpg file and an exe file with the same md5 hash. But still it is just a collision attack, it produces collisions between two crafted files. They don't produce collisions between a collision between an arbitrary original file and one crafted file.

      --

      Do you care about the security of your wireless mouse?
    8. Re:Nothing new by Kjella · · Score: 2, Interesting

      Sneaking it past security control perhaps? Here's good.exe, run it in a sandbox all you like and it won't do anything funny. Then mark this MD5-sum as good and add it to the list of trusted installers, while I'll replace it with evil.exe before distribution/installation in the production environment.

      For a pracical example:
      1. Become a kernel contributor on some obscure driver.
      2. Add a magic number somewhere, which is the good twin.
      3. Wait for this to flow upstream to Linus, then downstream to all the distros.
      4. Find a way to hack a mirror of your distro of choice
      5. Replace the signed kernel with your trojaned kernel, that'll still be signed
      6. Wait for people to install trojaned systems (enterprise systems!)
      7. Profit (there is no ???)

      Of course, this assumes you can use it knowing just the little magic bits. If you need to be the one compiling both good and evil using the exact same source, then it's very limited.

      --
      Live today, because you never know what tomorrow brings
  2. Well everyone's boned then by tietokone-olmi · · Score: 3, Interesting

    [...] This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process.

    An attack that requires insider access? Well colour me frightened!

    Or don't. That's more accurate anyhow.

    1. Re:Well everyone's boned then by Anonymous Coward · · Score: 2, Interesting

      Perhaps you should read this article with particular reference to the table 'Stages in the life cycle of cryptographic hash functions'. By the way you are one or two stages behind.

  3. Well, duh! by YA_Python_dev · · Score: 4, Insightful

    The problem has nothing to do with salt, and can be certainly temporarily "fixed" switching to SHA-1 or, even better, SHA-2. But the real root of the problem here is that, for the attack to work, someone signed as trusted a binary file that contained malicious code in the first place, even if in a disable form.

    Let me explain that. First, this is very old news: we know since 2004 that collision can be found in MD5 hashes (two different files with the same md5sum), and there now are tools that can generate collisions in seconds. All you need is a common prefix and suffix for both files and two block of 128 bytes that are generated automatically and you can insert between the prefix and the suffix to create the two files.

    Applying this to pretty much any file type that can contain binary data (even XML 1.1!) is trivial. For an executable file you can simply insert code in your prefix/suffix that looks at the pseudo-random 128 bytes and does radically different things depending on it. This as already been demonstrated for HTML+JS and even for postscript files.

    Bottom line: if you have an executable file from an untrusted source it may contain bad things (the attack described requires that both the original signed file and the file that you are actually executing are generated by the same hostile source).

    --
    There's a hidden treasure in Python 3.x: __prepare__()
    1. Re:Well, duh! by hotrodent · · Score: 2, Insightful
      Something I've never understood about this problem: Why is the following not an easy "fix"?


      1) Generate an MD5 hash for a file.
      2) Generate an SHA-2 hash.
      3) .. more as needed ...
      4) Concatenate the results for a "super hash"
      5) Profit?

      Surely to manipulate 2 (or more) schemes to ensure the super hash is the same on a tampered file would be _many_ orders of magnitude harder?
      Trying to make the SHA-2 match would destroy all the previous work done to make the MD5 match, then fixing the MD5 would change the SHA-2 again.
      IANAC (cryptographer) so excuse my ignorance on this if I'm missing something.

  4. Not accurate, not new by Niten · · Score: 4, Insightful

    MD5 collision attacks aren't really new, although this is a powerful example. An equally meaningful example of a collision attack on the algorithm, in the form of two different PostScript files with the same MD5 hash, was provided at least two years ago (IIRC).

    The key to understanding the limits of this demonstration's significance is to realize that a collision attack is quite different from a prefix attack. These researchers were able to create a pair of executables having the same hash value by specially constructing them as such; crafting a new executable to match a specific hash value corresponding to some other party's executable is vastly more difficult to achieve.

    So while this demonstrates MD5 to be useless for uses where the purported signatory is to be included in our threat analysis -- as has already been demonstrated to us by other researchers -- the algorithm is still relatively safe if our only goal is to ensure that a given executable almost certainly came from a specific party (rather than showing that it is a specific executable from said party). In other words, one could conceivably use MD5 to verify that the Ubuntu packages on that FTP server were in fact produced by Canonical. So no, demonstration does not mark MD5 as completely useless for code signing; the most common applications of code signing are entirely unconcerned with collisions in the hash function.

    In conclusion: the title is terribly misleading, or possibly just misinformed. Boo! Hiss!

  5. Birthday Attack by tangent3 · · Score: 4, Insightful

    This is an example of a Birthday Attack. 1. Attacker generates Good.exe and Evil.exe which hashes to the same MD5 2. Attacker passes Good.exe to the key owner to sign 3. Key owner signs and release Good.exe and Good.exe.MD5 4. Attacker releases Evil.exe as Good.exe This of course, requires some serious social engineering to work. MD5 is outdated, yes, but at the moment it is still resilient against a normal attack where an attacker has to generate an Evil.exe to hash to the same MD5 as an already-available Good.exe

    1. Re:Birthday Attack by Anonymous Coward · · Score: 4, Informative

      Sorry but you are wrong. The attack uses two md5 inputs which collide to construct two programs which are otherwise identical. The program can then be contrived to exhibit different behaviour depending on which of the two colliding inputs was used. This is nothing to do with the birthday paradox (except that it may have been used to find the collisions in the first place). Otherwise you description of the attack is accurate.

  6. use two hash functions by m2943 · · Score: 4, Informative

    The particular scenario they describe is irrelevant; MD5 checksums aren't intended to protect against that. If the attacker can manipulate the original file, he can usually simply alter it to become malicious itself.

    The case that matters is producing a program with the same checksum as a given program, without the ability to manipulate the correct program beforehand. That's still hard.

    Nevertheless, code signing mechanisms in general should probably be prepared for flaws in hash functions. It might be best always to use two hash functions and to have some strategy of migrating. That way, if one hash function gets compromised, there is still another one in place and can be used until the original one has been replaced.

    1. Re:use two hash functions by mollymoo · · Score: 3, Interesting

      The particular scenario they describe is irrelevant; MD5 checksums aren't intended to protect against that. If the attacker can manipulate the original file, he can usually simply alter it to become malicious itself.

      The problem as I see is that the harmless version can be released and gain trust. That version can be tested and inspected, even checking the binary wouldn't reveal malicious code because there wouldn't be any malicious code to find - no dodgy looking system calls, for example. Just a chunk of seemingly random data, which could be disguised as a lookup table, compressed image or whatever. At some later point, after the harmless version has gained trust, its use has become more widespread and the rate of downloads has increased correspondingly, it can be replaced by the malicious version. So while you could initially release a malicious version, being able to first release a harmless version can widen the impact of an attack.

      --
      Chernobyl 'not a wildlife haven' - BBC News
  7. Ah yes, this again by Effugas · · Score: 4, Interesting

    OK, it's pretty damn cool to see people 'round here referencing my work on Javascript MD5 collisions :)

    The relevant links are:

    http://www.doxpara.com/research/md5/t1.html
    http://www.doxpara.com/research/md5/t2.html ...and the original paper:

    http://www.doxpara.com/research/md5/md5_someday.pdf

    I'm pretty sure I talked about third party attestation in that paper.

    A more interesting point was made to me just the other day, which is that there's always enough ambient entropy in any real world system to deviate between trusted and untrusted behavior. In other words, for a turing complete app, you *can't* create a meaningful hash, because you aren't capturing all bits that will drive the execution flow. So, getting code signed really doesn't assert anything other than a business relationship. App signatures don't actually work, for any arbitrarily good hash.

    1. Re:Ah yes, this again by Henry+V+.009 · · Score: 2, Insightful

      A more interesting point was made to me just the other day, which is that there's always enough ambient entropy in any real world system to deviate between trusted and untrusted behavior. In other words, for a turing complete app, you *can't* create a meaningful hash, because you aren't capturing all bits that will drive the execution flow. So, getting code signed really doesn't assert anything other than a business relationship. App signatures don't actually work, for any arbitrarily good hash.
      That is simply wrong. For any "arbitrarily good" hash there exist collisions out there between bad_app and good_app. However, if finding bad_app or bad_app2, etc., is computationally impracticable (which is the definition of a good hash), then the hash is quite useful.

      Currently md5 is fairly broken, in that a person can specially prepare good_app and bad_app. However, it is not yet completely broken (like CRC, if it were ever used as a secure hash), in that it is not yet possible for someone to take an arbitrary md5-signed file (like the recently released ubuntu iso, to give an example) and generate a collision. The birthday paradox makes the former much easier than the later.
  8. This is a REMOTE attack, and reasonably potent by CarpetShark · · Score: 3, Informative

    An attack that requires insider access? Well colour me frightened!


    If you'd read the article, you'd see that one of the (prominent) possible attack scenarios listed is that of software distribution: distribute a good file, with the intent of replacing it later. For example, in debian, even with MD5 checksums on all your data, and tools reporting what's changed during the software update, this would still allow downloading infected files, without noticing.

    It's a danger both from malicious distributors, and from hacked distribution sites.
    1. Re:This is a REMOTE attack, and reasonably potent by quigonn · · Score: 4, Funny

      If you'd read the article,

      Reading the article? THIS IS SLASHDOT!!!!!!1!
      --
      A monkey is doing the real work for me.
  9. ONE block, surely by CarpetShark · · Score: 2, Interesting

    Surely the point is that, if you can generate two blocks that do this, then you can generate one block to pair with a previously known block -- such as something in open source code.

    1. Re:ONE block, surely by jthill · · Score: 5, Informative

      TFA points out specifically that no one knows how to target a specific hash code. All they can do is make two files converge on the same hash code by inserting data into *each* of them.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
  10. Use GnuPG instead by gweihir · · Score: 4, Insightful

    As many projects have done for years. md5 sums as crypto-protection are more or less a historic way to do it.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  11. Door locks are insecure if you can get a key too! by Nezer · · Score: 2, Insightful

    This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process. Isn't this a bit like saying that door locks are insecure although you may need access to a party trusted with the keys in order to exploit? Aren't these "trusted parties" *always* a potential weak-link in the security chain?
  12. Not a real life scenario... by Matthieu+Araman · · Score: 3, Insightful

    Real life scenario :

    developper A produce software X(for example openssh), calculate hash of program X and sign the hash with his PGP key.
    He then put all these files on mirrors servers on Internet (but not his private PGP key !)

    One mirror is hijacked by B.
    B wan't to replace X by X' with the same hash than X

    This article doesn't provide anything as it says MD5(X+a)=MD5(Y+a), which imply you have to change A in the first place which can't be done easily (and if you can change the original program, then what's the point ?)

  13. Re:So MD5SUM veriefies downloads only by grumbel · · Score: 2, Informative

    An MD5 checksum file alone serves no other purpose then to check that the download is correct, since an attacker that can upload a changed file could also just change the MD5 checksum file. Things look a little different if you get the MD5 from a different trusted source or when the MD5 file is signed by a GPG key.

  14. what it really means .. by rs232 · · Score: 2, Interesting

    "if you can change the original program, then what's the point ?)"

    Well, what it means is that an evil software megacorporation could publish a digitally signed app that could be replaced with another presumably nefarious prog later on ..

    Re:Not a real life scenario...

    --
    davecb5620@gmail.com
  15. Re:Bost projects I've seen.. by gweihir · · Score: 2, Insightful

    I agree that today basically the only use of md5's is integrity checks against transmission and storage errors. I sometimes use them on backups.

    You are quite right, that md5 does not provide and connection to the signer. With a PGP/GPG signature, once I have the correct public key, I can verify all and every signature made with it. And if I do not have the correct key, the first genuine signature will result in an error. Howeber I guess most people do not bother. Even if it is easy. For a kernel download, e.g., it adds about 5 seconds.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  16. Re:Bost projects I've seen.. by AnyoneEB · · Score: 2, Informative

    As I understand it, the normal way to generate a digital signature is to use a hash algorithm like MD5 or SHA1 and then encrypt the hash with a private key. Then you verify by hashing the file and decrypting the signature with the public key and checking to see if they match. Therefore, distributing signatures instead of hashes is orthogonal to the discussion at hand. If the hash is broken, then the signature is broken, too.

    See Wikipedia for more information on digital signatures.

    --
    Centralization breaks the internet.