Practical Exploits of Broken MD5 Algorithm
jose parinas writes "A practical sample of an MD5 exploit can be found, with source code included,in codeproject, a site for .Net programmers.
The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack.
At Microsoft, the MD5 hash functions are banned.
The main problem is that the attack is directed to the distribution of software process, as you can understand reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum."
If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions.
Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.
liqbase
...better use Tiger or Whirlpool (based on AES). AFAIK there are no known vulnerabilities or attacks for these two yet.
Do not be alarmed. This is only a test.
Now we know why people distribute modified game ISOs on the net and check it with md5 :P In all technicality, couldnt this mean that someone could land a virus on someone else's machine because the person trusted the hash?
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
Isn't this the problem with all algorithms? Only way to know if something is something... is to see something instead of its checksum?
As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?
I guess since the article explains some issues against md5 security, the only answer would be to trust the source that is supplying the hash in the first place? Coming down to the fact that a system is only as secure as its user?
At Microsoft, the MD5 hash functions are banned.
they use crc instead!
This seems to work on the assumption that you want to do some harm with a program you created yourself, you can't actually take a random RPM and turn it into an evil RPM with the same MD5. So, yes, it's bad, but it's not as bad as you might think.
Considering this is such a "well known" result, you would think that MD5 should have been abandoned long ago. Is this true for other popular hash functions?
"Yields falsehood when preceded by its own quotation" yields falsehood when preceded by its own quotation.
This isn't a problem for software distribution, really, since the good.bin file needs to start with a vector designed to enable a collision. A good-faith programmer wouldn't include that vector.
It is a problem for stuff like contracts; you draw up two versions of a contract, a good one and an evil one, let someone sign the good one, and later keep them to the clauses in the evil one.
So while there IS a very big problem, the example is a bit contrived.
AFAIK there are no known vulnerabilities or attacks for these two yet
I am no cryptography expert so I can not read and understand those algorithms. But the fact that there are no known vilnerabilities for an algorithm doesn't make it secure.
Maybe they are just not used as much as other well known algorithms. And therefore nobody has found vulnerabilities for them yet?
thomasdamgaard.dk.
Two pages, same hashes, etc. (This is the guy who wrote the MD5 someday paper.)
http://www.doxpara.com/t1.html
http://www.doxpara.com/t2.html
RPM uses both MD5 and SHA1, the chances of finding a collision that satisfies both hashes is small, even if both MD5 and SHA1 are compromised since the hash the data differently.
rpm -Kvv xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm /var/lib/rpm/Packages rdonly mode=0x0 /var/lib/rpm/Packages /var/lib/rpm/Pubkeys rdonly mode=0x0
./updates-released/packages/xorg-x11-libs-6.8.2-37 .FC4.49.2.i386.rpm: /var/lib/rpm/Pubkeys /var/lib/rpm/Packages
D: Expected size: 2655615 = lead(96)+sigs(344)+pad(0)+data(2655175)
D: Actual size: 2655615
D: opening db index
D: locked db index
D: opening db index
D: read h# 278 Header sanity check: OK
D: ========== DSA pubkey id b44269d0 4f2a6fd2 (h#278)
Header V3 DSA signature: OK, key ID 4f2a6fd2
Header SHA1 digest: OK (f37bf5cb97db696f14133b90e23f2455b9f94587)
MD5 digest: OK (8eda29837b6992876bd867df03b3b8af)
V3 DSA signature: OK, key ID 4f2a6fd2
D: closed db index
D: closed db index
D: May free Score board((nil))
The point is to reduce the set among which to do an exhaustive search (one small hash bucket versus all known files on the system), and not to verify some kind of signature.
Any successful attack on the hash would only be useful to make the system slow and unefficient (by making an excessive number of files end up in one bucket), but cannot corrupt it.
Wouldn't work. TFA points out a way to generate to packages, a good one, and an evil twin. Both are constructed by the algorithm.
It does not however show how to make an evil twin for an existing package. This considerably lessens its danger.
Think about it: the only guy who could pull this off would be the original author. But then, the original author could achieve that same goal much more easily than by "breaking" MD5.
Misunderstandment. I never made a statement about MD5 not being "free". But MD5 is vulnerable, that's why I pointed out some alternatives.
Do not be alarmed. This is only a test.
The NIST is having a two-day workshop in Gaitherburg, Maryland (USA) on October 31-Nov 1. Xiaoyun Wang will be giving a keynote speech, and there'll be plenty of technical material to go around. The workshop website is: www.csrc.nist.gov/pki/HashWorkshop/program.htm. I don't work for NIST or anything, but I thought this was interesting and they haven't really done a good job getting the word out about this conference.
Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird. --Nietzsche
While this is a very serious attack, it doesn't yet mean that every use of MD5 is totally unsafe.
... yet :-)
It is feasible now to generate 2 different pieces of data with the same MD5 hash. As many file formats allow one to embed invisible 'junk', it is possible to create a 'good' version and an 'evil' version with the same MD5 hash.
BUT it is NOT (yet) feasible to create a piece of data with a given MD5 hash. This means that if you can not modify the 'good' version, you can't create a matching 'evil' version.
An example of where the usage of MD5 isn't broken are *nix passwords. Your password is hashed with MD5 (a salt is added to your password too, but that's not important here), and that hash value is stored. Anyone who can supply a password (doesn't have to be the same one!) which has the same MD5 hash, is allowed access.
So if it would be feasible to generate data with a given MD5 hash, one could easily generate a matching password, when given the MD5 hash (which you can often easily acquire, especially in NIS/yp environments). But luckily, this is not possible
So you'd better start protecting those hashes (that's what a shadow password file does), en better yet, move to a better algorithm. Like the Blowfish algorithm that OpenBSD has been using for years now.
The problem is that a broken algorithm just makes that piece of social engineering a lot easier.
If I just told you you can download the latest auto-installer of the latest WoW patch from www.i-pwn-ur-puter.ru instead through the slow Blizzard installer, you might think "uh, wtf, I think I'll play it safe anyway and get it directly from Blizzard. I trust them more than I trust a warez and script-kiddie site."
Now picture that I tell you "and here's a link to the MD5 sum on Blizzard's site. You can check for yourself that the the file on our site is the original file and it hasn't been tampered with." In fact, I would even _urge_ you to make a habit to check all your downloads against the original MD5 sums, for your own good.
It already looks a lot safer and more legitimate. Well, maybe not to _you_, but to a lot of people it does.
That's the whole problem. That false sense of security makes the "if we can convince you to run our insecure extractor code" part a helluva lot easier.
A polar bear is a cartesian bear after a coordinate transform.
Once again, OSX proves to be more secure!
*ducks*
surprisingly many stories hashes to the same value..
This is a complicated issue. Generally, the security offered by an encryption algorithm isn't measured by its popular usage, but by the amount of time qualified professional cryptographyers/mathematicians/hackers have studied it without finding a critical vulnerability. My claim is probably too broad: there is no magical formula that determines how secure an algorithm is. But in depth work by professionals does endear confidence in an algorithm.
As a general rule of thumb, it is wise to use an algorithm that has been seriously studied for 10-20 years. At this point, it is modern enough to withstand modern brute force attacks, and (hopefully) understood well enough to ensure that there are no structural vulnerabilities. If it is much older than that and still studied, it is likely because a flaw has been found and people are trying to push it as far as it goes.
After all, I am strangely colored.
And to think that they're still working on getting MD5 digests in TCP packets...the algorithm could be as useless as the checksums they currently use by the time this change would have become widely accepted (now it's probably never going to happen).
Please correct me if I got my facts wrong.
Not necessarily - it depends on how much the author is trusted to begin with. Certain types of software are very closely checked by the open source community, and any trojan will be discovered if it exists in the package.
Say I write a bit of security software (for which most people take the time to compare checksums). As a relative nobody, lots of people are going to scrutinise the source code before using it. Any new release will also be checked. Only after it's been scrutinised and built up a reasonable userbase is it worth switching it for the evil version - otherwise, the evil version would be discovered early and nobody would use it.
Someone like Schneier could probably release a trojan directly and people would install it without thinking (I trust he's too responsible for that, though). For the rest of us, this gives a feasible way to sneak in evil code without anyone checking it.
said this before...
... LAST YEAR.
Dan Kaminsky is actually the dude who came up with the Stripwire idea.
Tom
Someday, I'll have a real sig.
Lots of file types allow for arbitrary junk at certain places.
For example, a very basic form of steganography: cat a .zip file to the end of a .gif file. The result is a valid file that can be displayed as an image (which ignores trailing junk), and decompressed with zip (which ignores leading junk).
Most file formats I've written don't care about junk at the end of the file. It'll be stripped off if you load and then save, but the program won't notice or complain. One program even preserves records it doesn't recognise (which could be secret messages, or just random crap).
As far as I know, the technique used for finding these MD5 collisions, cannot be performed with a GIVEN hash. So it's not possible to create, say, a copy of an already available RPM, add malicious code to it, and easily find some data to add to it to generate the same hash. This is not possible.
The only thing the current 'crack' does is create two RANDOM input files that generate the same hashed output. So it's only useful for someone who can control both the 'original' and the 'malicious' version of the data which is being protected by an MD5 hash.
So the dangers here are kind of limited though you could still do a lot of damage with it.
AFAIK this is an attack on the underlying Merkel-Damgard paradigm which both SHA-1 and MD5 (amoungst others) employ. The paradigm goes as follows:
...
:-)
IV | Intialisation vector of n-bits
MB_i | Message Black i of n-bits
HB_i | Hashblock i of n-bits
f:(IV , MB_i) -> HB_i | is the underlying compression function which takes both an IV and a message block as input and outputs a hashblock.
First the orginal messaage is split up (and padded if need be) into n-bit blocks. Then f is applied with an IV and MB_1 as input resulting in HB_1. f is then applied iteratively each time tacking the next message block as input while using the previouse hash block as the IV input.
f(IV, MB_1) = HB_1
f(HB_1, MB_2) = HB_2
f(HB_s-1, MB_s) = HB_s = H(Message)
Merkle and Damgard proved that the over all construction is collision resistent given that the compression function f is collision resistant.
As the parent post pointed out though the last block had better include the over all message length. If this is not the case then by extendeing 2 different but colliding messages x,y with the same plain text q the input to the compression function becomes identical since H(x) = H(y) = IV input for f. If on the other hand the length of the orginal message is included in the last block then even though the IV input for f is the same for f(H(x),q_1) and for f(H(y),q_1)..., the final message block (q_s) will again be different resulting in a different final hash block.
If on the other hand len(x) = len(y) then the problem persists since both IV and message blocks will be the same when the final iteration of the algorithm is reached.
Infact this attack is even stronger since by the same reasoning one can see that to produce H(x+q) all one needs is to know is H(x) (and len(x) if that is included in the last message block). No other information is needed about the orginal message x! H(x) is simply inserted as the IV for f when hashing q and so the iteration is "jump started" just where x finishes. (If the length is included in the last block then all that need be used is len(x)+len(q).)
Disclaimer: Not to 100% sure about all this though so please feel free to correct me if i'm wrong...
It's dumb to ban MD5. It is still, and will continue to be a useful tool in situations where a cursory comparison is needed for reasons other than security. It would be silly, stupid even, to use a larger hash that requires more (slower) computation in these situations. Banning MD5 can mean one of two things: Either it's a stupid publicity stunt that will result in slower code overall, or that Microsoft doesn't trust it's developers to be smart enough to know when a particular hash is appropriate.
RPM only uses MD5 to check for corruption of the type you might find during download. RPM actually uses GPG or PGP to sign the generated RPMs for security, and GPG is (afaik) capable of using nearly any hashing algorythm including ones that are yet to be invented. So as far as security is concerned RPM doesn't use MD5 but rather uses whatever hashing algorythm the GPG key that signed the RPM was generated with.
Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
Whirlpool is not "based on AES". It shares a few similiarities (and a designer) but it is a distinct algorithm in its own right. It has a larger block size, different S-boxes, a different linear component, a different key schedule and so on.
I would interpret "based on AES" as meaning it actually uses AES itself (perhaps in tandem Davis-Meyer mode or similar).
I like Whirlpool but it's not fast. I think Tiger is quite a bit faster.
Bernstein's cache timing attacks, and the ever-growing gap between processor speeds and memory access times, mean that table-based primitives (which both of these are) are going out of style.
Xenu loves you!
I just read the article, and i was thinking :If we can replace the extractor, then why bother creating an evil file with the same checksum ? The extractor can do the "evil patch" at extraction. The method in this article isn't very usefull for evil purposes.
I've been reading the posts in this thread and I've noticed that there are two types of posters here: the ones who got it 100% right, and the clueless ones (there appear to be little or no posters in the middle ground).
Now, the clueless ones are thinking of lots of "attacks" using this vulnerability, some of them really wrong. Since this has the potential of getting lots of people to do stupid things (like not trusting MD5 when they should), let's talk a little bit about the vulnerability and its effects.
First of all: this is not new. There was an article here explaining the same attack a few months ago (about x.509 certificate collisions and how to fake postscript orders, if you know what I am refering to, please post a link).
The attack goes like this:
You have a block B1 that is known to collide with another block B2.
You have some custom made code that looks like this:
-----BEGIN SNEAKY CODE---------------
If DATA[1] = DATA[2] then
do something good
else
do something bad
end
DATA[1]
DATA[2]
-----END SNEAKY CODE-----------------
The trick is that since there's a collision between B1 and B2 and MD5 makes the hash by reading sequentially, the hash for the whole program will be the same whether you fill DATA[1] and DATA[2] with B1 or B2 (in any combination). Since the code is DESIGNED to do different things depending on the collision area, by changing the contents of DATA[1] and DATA[2] you can have programs that do "good" or "bad" things, with the same hash. Please note it's been DESIGNED with that in mind.
From now on I'll talk on absolute terms, while in reality there is a very small probability of things being right for an attack without being planned that way, so keep in mind that before saying "but that's not the whole truth.....".
Now let's discuss what's possible to do and what's not:
1.Oh no! Now, someone will create a virus that has the same hash than my favorite app!
False: the app (or installer) would have to have been designed with that "feature" in advance.
2.MD5 is worthless and should not be used anymore.
False: MD5 is useless in the situation presented above. There are some very good uses of MD5 that are safe (like access control: this attack does nothing practical to you salted MD5 shadow file). MD5 should probably be watched for other undesirable properties, though. An alternate cryptographically secure function should be kept in reserve.
3.I'll use another hash function, I'll be invulnerable to this attack.
(somewhat) False: You'll be invulnerable until someone finds ONE collision in your new hash function (it might take a long time but....). Then you'll be vulnerable again. But now we all know what can be done with ONE collision. What you're thinking is probably good, but it's no silver bullet.
4.Microsoft will forbid the use of MD5 and DES, and use SHA-1 and AES. We should do the same.
(somewhat) True: Not for the reasons you're thinking though. If MS is really doing this, this attack is a lame excuse to do it. MD5 is still useable for some things, and SHA-1 is not much better than MD5 in the things related to this attack. IIRC these collisions were found using an attack derived from an attack on SHA-1. Right now, SHA-1 collisions can be found in 2^63 operations (and the clock is ticking). We should probably consider using a new hash function someday, but leave the decision to the cryptologists. About AES, it's about time. DES can be brute forced in reasonable time, and that's been like that for a few years. 3DES is slow. That's the reason for the AES contest, we should use since we have it.
5.Someone could distribute some sort of binary and the switch it so it does lots of damage to unsuspecting people.
True: That's exactly what the attack is about. Maybe you were wrong to trust [insert a name here].
6.Who should be doing what and when?
If you work in crypto, you probably k
GPG 0x1B479C78