MD5 Proven Ineffective for App Signatures
prostoalex writes "Marc Stevens, Arjen K. Lenstra, and Benne de Weger have released their paper 'Vulnerability of software integrity and code signing applications to chosen-prefix collisions for MD5'. It describes a reproducible attack on MD5 that can be used to fake software signatures. The researchers start off with two simple Windows applications, HelloWorld.exe and GoodbyeWorld.exe, and apply a chosen-prefix collision attack that makes the MD5 hashes of both applications identical. The researchers point out: 'For abusing a chosen-prefix collision on a software integrity protection or a code signing scheme, the attacker should be able to manipulate the files before they are being hashed and/or signed. This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process.'"
Need more salt.
Perhaps they should consider using something other than SHA-1? SHA-2, anyone?
Unless I am missing something, this is really nothing new. The same was demonstrated with a web page and JavaScript years ago, i.e. two different web pages producing the same MD5. Doing it again with an .exe doesn't really sound all that interesting, especially since the attacker still needs to manipulate both the good .exe and the evil .exe, and if he has access to the good .exe you are toast anyway.
This of course doesn't mean we should continue to use MD5, but the attack is really of rather theoretical nature.
An attack that requires insider access? Well colour me frightened!
Or don't. That's more accurate anyhow.
"This may mean that the attacker needs insider access to the party operating the trusted software integrity protection or code signing process.'"
Let's see now, the attacker already has access to the machine and is probably the one creating or comparing the MD5. Is the problem really with MD5?
The problem has nothing to do with salt, and can certainly be temporarily "fixed" by switching to SHA-1 or, even better, SHA-2. But the real root of the problem here is that, for the attack to work, someone signed as trusted a binary file that contained malicious code in the first place, even if in a disabled form.
Let me explain. First, this is very old news: we have known since 2004 that collisions can be found in MD5 hashes (two different files with the same md5sum), and there are now tools that can generate collisions in seconds. All you need is a common prefix and suffix for both files plus two automatically generated 128-byte blocks, which you insert between the prefix and the suffix to create the two files.
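The prefix / collision block / suffix layout described above works because MD5 is an iterated (Merkle-Damgård) hash: once two messages reach the same internal state at a block boundary, any common suffix keeps their hashes equal. Real MD5 collisions need dedicated tools, so this sketch swaps in a hypothetical toy hash with a 16-bit chaining value (built from SHA-256 purely for illustration), small enough that a brute-force birthday search finds a colliding block pair instantly. The names `toy_hash` and `find_collision_blocks`, the 8-byte block size and the IV are all assumptions, not part of MD5.

```python
import hashlib

BLOCK = 8     # toy block size in bytes (real MD5 uses 64-byte blocks)
IV = 0x1234   # arbitrary toy initial chaining value

def compress(state: int, block: bytes) -> int:
    """Toy compression function with a deliberately tiny 16-bit output."""
    digest = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")

def chain(state: int, data: bytes) -> int:
    """Run the compression function over block-aligned data."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(message: bytes) -> int:
    """Merkle-Damgard hash: pad with 0x80, zeros and the bit length, then chain."""
    padded = message + b"\x80"
    while len(padded) % BLOCK:
        padded += b"\x00"
    padded += (8 * len(message)).to_bytes(BLOCK, "big")
    return chain(IV, padded)

def find_collision_blocks(prefix: bytes) -> tuple[bytes, bytes]:
    """Birthday search for two different blocks that collide after `prefix`."""
    start = chain(IV, prefix)
    seen: dict[int, bytes] = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "big")
        state = compress(start, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1

prefix = b"MZ-toy!!"  # hypothetical 8-byte header, block aligned
suffix = b"...common code that inspects the collision block..."
a, b = find_collision_blocks(prefix)
good = prefix + a + suffix
evil = prefix + b + suffix
print(good != evil, toy_hash(good) == toy_hash(evil))  # True True
```

Because the padding depends only on the total length, and both files have the same length, any suffix can be appended after the colliding blocks without breaking the collision.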
Applying this to pretty much any file type that can contain binary data (even XML 1.1!) is trivial. For an executable file you can simply put code in your prefix/suffix that looks at the pseudo-random 128 bytes and does radically different things depending on them. This has already been demonstrated for HTML+JS and even for PostScript files.
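A minimal sketch of the "look at the 128 bytes and behave differently" trick. It assumes a hypothetical file layout in which the shared suffix carries a copy of the block used in the good build, so the good file finds a match and the evil file (whose collision block differs) does not. The offsets, the 8-byte block and the names `build_image` and `run` are illustrative, not taken from the paper.

```python
PREFIX = b"HEADER.."   # hypothetical fixed-size prefix (8 bytes)
BLOCK_LEN = 8          # size of the interchangeable collision block in this toy

def build_image(collision_block: bytes, good_block: bytes) -> bytes:
    """prefix || collision block || common suffix (which embeds a copy of the good block)."""
    assert len(collision_block) == len(good_block) == BLOCK_LEN
    suffix = b"CODE" + good_block
    return PREFIX + collision_block + suffix

def run(image: bytes) -> str:
    """The shared code compares the live block with the reference copy in the suffix."""
    start = len(PREFIX)
    live = image[start:start + BLOCK_LEN]
    reference = image[-BLOCK_LEN:]
    return "benign behavior" if live == reference else "malicious behavior"

good_block = bytes(range(8))        # stand-ins for two colliding blocks
evil_block = bytes(range(8, 16))
print(run(build_image(good_block, good_block)))  # benign behavior
print(run(build_image(evil_block, good_block)))  # malicious behavior
```

The suffix is byte-for-byte identical in both builds, so if the two blocks were a real MD5 collision pair, both files would share one MD5 while their behavior diverges.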
Bottom line: if you have an executable file from an untrusted source it may contain bad things (the attack described requires that both the original signed file and the file that you are actually executing are generated by the same hostile source).
There's a hidden treasure in Python 3.x: __prepare__()
MD5 collision attacks aren't really new, although this is a powerful example. An equally meaningful example of a collision attack on the algorithm, in the form of two different PostScript files with the same MD5 hash, was provided at least two years ago (IIRC).
The key to understanding the limits of this demonstration's significance is to realize that a collision attack is quite different from a (second-)preimage attack. These researchers were able to create a pair of executables having the same hash value by specially constructing them as such; crafting a new executable to match a specific hash value corresponding to some other party's executable is vastly more difficult to achieve.
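To put rough numbers on that difference: for an ideal n-bit hash, a generic collision search needs about 2^(n/2) evaluations (the birthday bound), whereas matching one fixed, given hash needs about 2^n. For MD5 (n = 128) that is roughly 2^64 versus 2^128; the known MD5 collision attacks beat 2^64 by a wide margin, but no comparably practical second-preimage attack has been published. The sketch below shows the gap on a hypothetical 16-bit hash (SHA-256 truncated, purely as a stand-in); the function names are illustrative.

```python
import hashlib

BITS = 16  # deliberately tiny output so both searches finish quickly

def tiny_hash(data: bytes) -> int:
    """SHA-256 truncated to BITS bits; a toy stand-in for a weak hash."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - BITS)

def collision_search() -> tuple[int, bytes, bytes]:
    """Birthday search: any two distinct inputs with equal hashes will do."""
    seen: dict[int, bytes] = {}
    tries = 0
    while True:
        msg = b"file-%d" % tries
        tries += 1
        h = tiny_hash(msg)
        if h in seen:
            return tries, seen[h], msg
        seen[h] = msg

def second_preimage_search(target: bytes) -> tuple[int, bytes]:
    """Must hit one fixed value: the hash of somebody else's file."""
    goal = tiny_hash(target)
    tries = 0
    while True:
        msg = b"forgery-%d" % tries
        tries += 1
        if tiny_hash(msg) == goal and msg != target:
            return tries, msg

c_tries, m1, m2 = collision_search()
p_tries, forged = second_preimage_search(b"someone else's release.exe")
print(f"collision after {c_tries} tries, second preimage after {p_tries} tries")
```

Typically the collision turns up after a few hundred tries while the second preimage takes tens of thousands, though the exact counts depend on the inputs.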
So while this demonstrates MD5 to be useless where the purported signatory must be included in our threat analysis (as other researchers have already shown us), the algorithm is still relatively safe if our only goal is to ensure that a given executable almost certainly came from a specific party (rather than showing that it is a specific executable from said party). In other words, one could conceivably use MD5 to verify that the Ubuntu packages on that FTP server were in fact produced by Canonical. So no, this demonstration does not mark MD5 as completely useless for code signing; the most common applications of code signing are entirely unconcerned with collisions in the hash function.
In conclusion: the title is terribly misleading, or possibly just misinformed. Boo! Hiss!
This is an example of a Birthday Attack:
1. Attacker generates Good.exe and Evil.exe, which hash to the same MD5.
2. Attacker passes Good.exe to the key owner to sign.
3. Key owner signs and releases Good.exe and Good.exe.MD5.
4. Attacker releases Evil.exe as Good.exe.
This, of course, requires some serious social engineering to work. MD5 is outdated, yes, but at the moment it is still resilient against a normal attack, where an attacker has to generate an Evil.exe that hashes to the same MD5 as an already-available Good.exe.
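The four steps above can be simulated end to end. This is a sketch under loud assumptions: a 24-bit truncation of SHA-256 stands in for MD5, an HMAC over the digest stands in for the key owner's public-key signature, and appending a counter stands in for whatever harmless tweaks (padding, unused bytes) an attacker would vary. Comparing variants of both files until two share a digest is the birthday approach named in the comment; for an n-bit hash it needs on the order of 2^(n/2) variants per side.

```python
import hashlib
import hmac

BITS = 24  # toy digest size; real MD5 digests are 128 bits

def weak_digest(data: bytes) -> bytes:
    """SHA-256 truncated to BITS bits, standing in for a broken hash."""
    return hashlib.sha256(data).digest()[: BITS // 8]

def find_pair(good: bytes, evil: bytes) -> tuple[bytes, bytes]:
    """Step 1: grow variants of both files until a good and an evil one share a digest."""
    goods: dict[bytes, bytes] = {}
    evils: dict[bytes, bytes] = {}
    i = 0
    while True:
        tag = b"\x00" + str(i).encode()  # hypothetical 'harmless' trailing bytes
        g, e = good + tag, evil + tag
        dg, de = weak_digest(g), weak_digest(e)
        goods.setdefault(dg, g)
        evils.setdefault(de, e)
        if dg in evils:
            return g, evils[dg]
        if de in goods:
            return goods[de], e
        i += 1

SIGNING_KEY = b"key owner's secret"  # hypothetical; stands in for a private key

def sign(data: bytes) -> bytes:
    """Step 3: the key owner signs only the digest (HMAC as a signature stand-in)."""
    return hmac.new(SIGNING_KEY, weak_digest(data), hashlib.sha256).digest()

def verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(data), signature)

good_exe, evil_exe = find_pair(b"print('hello')", b"delete_everything()")
signature = sign(good_exe)          # steps 2-3: owner reviews and signs Good.exe
print(verify(evil_exe, signature))  # step 4: Evil.exe passes as Good.exe -> True
```

The signer only ever sees `good_exe`; the forgery works purely because the signature covers the weak digest rather than the file itself.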
The particular scenario they describe is irrelevant; MD5 checksums aren't intended to protect against that. If the attacker can manipulate the original file, he can usually simply alter it to become malicious itself.
The case that matters is producing a program with the same checksum as a given program, without the ability to manipulate the correct program beforehand. That's still hard.
Nevertheless, code signing mechanisms in general should probably be prepared for flaws in hash functions. It might be best always to use two hash functions and to have some strategy of migrating. That way, if one hash function gets compromised, there is still another one in place and can be used until the original one has been replaced.
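A minimal sketch of the "two hash functions" idea: record an MD5 and a SHA-256 digest for each file and accept the file only if both match. The manifest layout and function names are hypothetical. One caveat worth knowing: Antoine Joux's 2004 multicollision result shows that concatenating iterated hashes is not much harder to collide than the stronger one alone, so the benefit is mostly the fallback-while-migrating role described above rather than a large extra margin.

```python
import hashlib

ALGORITHMS = ("md5", "sha256")  # assumption: any pair supported by hashlib works the same way

def manifest_entry(data: bytes) -> dict[str, str]:
    """Record one hex digest per algorithm."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}

def verify(data: bytes, entry: dict[str, str]) -> bool:
    """Accept only if every recorded digest matches; breaking one hash is not enough."""
    current = manifest_entry(data)
    return all(current[name] == entry[name] for name in ALGORITHMS)

release = b"contents of good.exe"
entry = manifest_entry(release)
print(verify(release, entry))                 # True
print(verify(release + b" tampered", entry))  # False
```

Retiring MD5 later just means removing it from `ALGORITHMS`; the SHA-256 digests already recorded keep protecting every entry.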
OK, it's pretty damn cool to see people 'round here referencing my work on Javascript MD5 collisions :)
...and the original paper:
The relevant links are:
http://www.doxpara.com/research/md5/t1.html
http://www.doxpara.com/research/md5/t2.html
http://www.doxpara.com/research/md5/md5_someday.pdf
I'm pretty sure I talked about third party attestation in that paper.
A more interesting point was made to me just the other day: there's always enough ambient entropy in any real-world system to switch between trusted and untrusted behavior. In other words, for a Turing-complete app, you *can't* create a meaningful hash, because you aren't capturing all the bits that will drive the execution flow. So getting code signed really doesn't assert anything other than a business relationship. App signatures don't actually work, for any arbitrarily good hash.
If you'd read the article, you'd see that one of the (prominent) attack scenarios it lists is software distribution: distribute a good file with the intent of replacing it later. For example, in Debian, even with MD5 checksums on all your data and tools reporting what's changed during the software update, this would still allow infected files to be downloaded without anyone noticing.
It's a danger both from malicious distributors, and from hacked distribution sites.
Surely the point is that, if you can generate two blocks that do this, then you can generate one block to pair with a previously known block -- such as something in open source code.
As many projects have done for years. md5 sums as crypto-protection are more or less a historic way to do it.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I always thought that MD5SUMS were there only to verify whether a download was successful or not.
Don't fight for your country, if your country does not fight for you.
Real-life scenario:
Developer A produces software X (for example OpenSSH), calculates the hash of X and signs the hash with his PGP key.
He then puts all these files on mirror servers on the Internet (but not his private PGP key!).
One mirror is hijacked by B.
B wants to replace X with X' having the same hash as X.
This article doesn't help B, as it says MD5(X+a)=MD5(Y+a), which implies you have to change A's original program in the first place, which can't be done easily (and if you can change the original program, then what's the point?).
Create trusted_program.
Take trusted_program and /bin/false, and use this technique to generate trusted_program2 and false2.
Post both trusted_program2 and false2 on your web page along with their shared md5sum and invite the user to download them (presumably the user trusts you and your web server or he wouldn't download them in the first place.)
The user is now confident that you cannot replace trusted_program2 with malicious_program without changing the md5sum, because this technique only works with two prefixes, not three.
"if you can change the original program, then what's the point ?)"
Well, what it means is that an evil software megacorporation could publish a digitally signed app that could later be replaced with another, presumably nefarious, program.
Re:Not a real life scenario...
davecb5620@gmail.com
As others have pointed out, there is nothing new in this. The same has been demonstrated with other languages before. For example, a few years ago it was demonstrated with PostScript, and as far as I know that was the first demonstration with meaningful content. While that may have come as a surprise to some people, it was only a minor curiosity to people who understand how MD5 works.
Doing this with .exe files is less significant than doing it with PostScript files, for the following reason. You are not likely to sign an .exe file from an untrusted source, because there is no way to verify whether the content is malicious, and most people know this. In fact, that is the very reason for having signatures in the first place. With PostScript files it is different: to most people a PostScript file is just a document, and a document you can read, after which you know exactly what the content is. At least I can understand why people would think that way. So less social engineering is required to get your crafted PostScript file signed than to get a crafted .exe file signed.
If you can get somebody to sign your carefully crafted .exe file, then this attack doesn't matter anyway, because once your malicious code is in the signed executable, there are lots of other ways to trigger it. Having an interchangeable piece of pseudorandom binary data in the file itself is a neat way to trigger your code, but the trigger could just as well be based on external factors such as timing, the IP address of the machine, the existence of certain other files on the system, or just some secret sequence of inputs. It is all just a matter of putting a backdoor into a piece of code, and attacking MD5 in this way is not even the most convenient backdoor you could make.
If it were possible to make a crafted file that would match the MD5 of some existing file over which you had no control, then there would be a lot more reason to worry. Luckily, that is not the case yet. Still, the demonstration of collisions does serve as a warning that MD5 is weak and that somebody may be able to break it completely at some point. There has been reason to worry about that ever since collisions were first demonstrated. The construction of additional collisions with meaningful content doesn't change that threat in any way. If you were not worried before this news but you are worried afterwards, it is because you didn't understand the threat.
Okay so someone was a bit late to learn that MD5 collisions are indeed possible. Congrats, you're still retarded!
It's not exactly hard to understand that a 128-bit hash is going to be less unique than a multi-kilobyte executable. I believe 3rd grade math has that covered. With processor speed increasing steadily, these things become easier to break with each passing day.
-Billco, Fnarg.com
Still got him his moderation boosts, tho.
I see a lot of comments about how, since this attack requires access to the file both before and after signing, this is a non-issue. In most cases you're right, but get creative.
You have a lengthy verification process for new software - you check it over thoroughly to make sure it can be trusted, and after you certify it as trustworthy you sign it and only need to re-certify if the signature changes next time you download it from me.
I deliver a new version of the software to you (the "good" version), and you certify and sign it (using MD5, unfortunately for you). I swap in the "evil" one, and the next time you download it, sure enough, the signature verifies it's fine.
What if you even had a virus scanner that used MD5's on executables for lazy re-scanning when they'd been modified?
I'm not sounding the "holy crap we're doomed" alarms, just pointing out that if you can take two different files and get the same "signature" from them, it's not a very good "signature", now is it?
Wrong. True of other breaches, not this one.
From MSDN Library:
(emphasis added)
If you want security, it has to be in effect 100% of the time, not just here and there WHEN we have time for it and don't bypass it to improve performance.
The issue here is not whether MD5 is vulnerable but whether it is being used all the time like it needs to be.
Anything and everything that is executable needs a signature that can be verified before it is executed, and until that standard is made mandatory, RATs will continue to have a festival, which will only get worse, and fast.
NO SIGNATURE? NO EXECUTE.
Cryptography such as these digital signatures is pretty good these days: proper use will render any attack on the cryptography itself a poor choice of options.
But Bruce Schneier notes in his recent book that all too often cryptography is like putting a post in the middle of a field and hoping the attacker runs into it. If there is any way around the post, the attacker will just take the easy way out and never bother with the cryptography. He's not playing your game; he's playing a different game, and he is governed only by the opportunities left open to him.
Signatures should be required on all eMails as well and any eMail without a signature that you recognize and approve should go into quarantine so you can dispose of it.
Back in the day I remember always being told that a single hash function was never secure for verifying information, and that for security you should use two or more -different- algorithms. While an attacker may manage to manipulate the data to collide in a single function, it's that much more difficult to manipulate the data to collide in two entirely different hash spaces.
Did this concept change over the years, or is it just me? heh
or you could just crack the MD5 check itself, since you are modifying the program anyways, seems kinda pointless.
Use public-private key signing rather than hashes. A hash is pretty limited: for *every* file to transfer, there must be a checksum on the client side that got there through a 'secure' means. Signing means clients just need to be confident they got your public key once; from then on, your signatures can be proven/disproven on files without the need for further guaranteed-secure means.
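To make the contrast concrete, here is textbook RSA hash-then-sign with toy parameters: the publisher keeps the private exponent, and a client that has obtained the public key once can check any number of later files. The primes (the Mersenne primes 2^61-1 and 2^89-1), the lack of padding and the reduction of the SHA-256 digest modulo n are simplifications that make this insecure; it only illustrates the key-distribution argument, and real systems should use a vetted signature scheme.

```python
import hashlib

# Toy RSA key pair. Insecure on purpose: small primes, no padding scheme.
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (pow with -1 needs Python 3.8+)

def digest_int(data: bytes, n: int) -> int:
    """Hash the file, then reduce the digest into the RSA group (toy simplification)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    """Publisher side: requires the private exponent D."""
    return pow(digest_int(data, N), D, N)

def verify(data: bytes, signature: int, public_key: tuple[int, int]) -> bool:
    """Client side: requires only the public key, obtained securely once."""
    n, e = public_key
    return pow(signature, e, n) == digest_int(data, n)

public_key = (N, E)
release = b"contents of release 1.0"
sig = sign(release)
print(verify(release, sig, public_key))         # True
print(verify(release + b"!", sig, public_key))  # False
```

Only `public_key` ever has to reach clients over a trusted path; every later file travels with its own signature.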
About the only place I see MD5 sums used much is for large ISO files: get the MD5 sum from the distribution site, then grab the ISO from a mirror and make sure it's OK. For apt and yum, where signatures are checked automatically, I'm pretty certain they use public-private key signing.
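For the ISO workflow described here, checking a mirror download against the checksum from the distribution site takes a few lines of Python's hashlib. This is a sketch: the function names are illustrative, and the file name and published checksum would come from your distribution. As the thread points out, a match only shows the download equals what the site published.

```python
import hashlib

def file_hexdigest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a possibly multi-gigabyte file in chunks instead of reading it all at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare against the checksum copied from the distribution site."""
    return file_hexdigest(path, algorithm) == published_hex.strip().lower()

# Hypothetical usage, with the hex string copied from the site's checksum file:
# ok = matches_published("distro.iso", "<hex digest from the site>")
```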
XML is like violence. If it doesn't solve the problem, use more.
Think voting machines. So far that has been the most requested approach, a verified hash code from open source, that is verified on each machine...
Surely a hybrid MD5+SHA1 signature would prove better? You can find weaknesses in each, but putting them together and the likelihood of the both weaknesses appearing at the same time would be greatly diminished. Other than extra CPU requirements, are there any issues with this approach?
Jumpstart the tartan drive.
The grandparent wasn't trying to say that hashes can't identify applications. Instead it was trying to say that there's no such thing as a good app and a bad app. It's impossible to tell what an app will do in advance since very few apps can be entirely understood without understanding their entire runtime environment which is impossible.
However, the grandparent is still wrong, IMHO. App signing says a) this is the app that I built; you can trust it to behave that way, and b) this app was built by XX, whose reputation you can check up on. In general, a good application is designed so that it behaves properly independent of the different environmental inputs, within the scope of "normal" computer behavior (radiation attacks, for example, not included). An MD5 sum is not telling you that the formal proof of the application is good (there probably isn't one); instead it allows you to predict how the application was designed.
Although AES-256 is not a hashing algorithm, I've seen it applied in hashing. Since it is a block cipher, when you encrypt a file, at the last iteration you have a final 128-bit block (AES always uses 128-bit blocks; the 256 refers to the key size), which is used as a digest. If you change anything in the file, the change will propagate to all following blocks (if encryption is done in CBC mode), so the last block (i.e. the digest) will be different.
The saddest poem
There seems to be yet another massive explosion of XOR thinking that ignores the possibility of using more than one hash to sign an object, whether it be code, text or other data.
It has occurred to a few people--just a few people--to sign objects with both MD5 and SHA-1.
It seems that it is more difficult to get both MD5 and SHA-1 collisions by quite some orders of magnitude. Someday, perhaps, it can be done but not today. Well, no one has said so, at least.
Anyone for some OR thinking?
me. --a by-product of public education
Oh, the discussion basically says that hash-only is broken anyway, even if the hash remains secure. Therefore, breaking this specific hash does not matter a lot.