Implications Of The Recent Hash Function Attacks
An anonymous reader writes "Cryptography Research has issued a Q&A that explains the security implications of the hash function
collision
attacks recently announced at CRYPTO 2004. Apparently the consequences can be catastrophic for certain kinds of code signing and digital signatures, but MD5 sums for checking binaries are (mostly) OK. While the
speculation that SHA-1 is about to fail seems to be overblown, updating the many legacy systems and protocols that rely on MD5 is going to be a massive undertaking."
Yes, too bad ECC is not a hashing algorithm and has absolutely nothing to do with this, or else we'd be set.
But what I dont get is this still doesn't allow somebody to arbitrarily pick whatever sum they want for their code right? I mean still the chances of somebody writing some trojan'd program and magically somehow getting the sum's to match is extreemly small and/or really computationally expensive.
Right. The breaks that were announced of the variety: "Here are two totally contrived documents that hash to the same value (which I can't control)." The attack does not allow someone to "hit" a desired hash value. So for the use you described, MD5 is still OK (so far).
And too bad that ECC is a) not provably secure and b) is rumored to have been broken already.
I am disrespectful to dirt! Can you see that I am serious?!
I *think* this is the difference the article talked about between a preimage attack and a collision attack. An attacker still can't find a message (program, whatever) that produces the same hashcode as your one - but they *can* produce two messages which produce the same hashcode as each other. One of those may be perfectly acceptable, the other not - but once a user/system/whatever has accepted (or signed) the first one, they've effectively signed the second one.
That's only my impression based on the article, and I wouldn't like to swear to it - but it does make a certain amount of sense.
I think the SHA-1 speculation originated because of the SHA-0 collision, not the MD5 collision.
SHA-0 obviously is related to SHA-1. So although no one has yet extended the SHA-0 collision to SHA-1, it is conceivable it might be possible.
FTA...
SSL3.0/TLS does use both, and is therefore immune to this attack.
Also FTA, SHA-1 is still believed to be secure, and Cryptography Researchers does not believe it is in danger to this same attack.
I am disrespectful to dirt! Can you see that I am serious?!
Because wildcards are not necessarily a bad thing. The concept is that you have a single SSL accelerator in front of a whole pool of servers, and it absorbs the "security context" of all the hosts behind it.
If you want universal SSL deployment, this is one of the ways you get it.
--Dan
ECC has been cracked...it just takes roughly the combined powers of 50 PCs to do it. This is a really old link but its valid for this thread.
http://cristal.inria.fr/~harley/ecdl7/readMe.html
Is here.
"Wouldn't just turning on the computer affect the checksum of the entire disc if just a single file was accessed, thereby changing its last accessed date, or if a single temp file was modified?"
Correct, usually what happens when a computer is confiscated is this:
1.) Power is removed. IE, plug pulled on desktop or battery removed on laptop. If you turn the power off, APM or ACPI will kick in and write to the disk.
2.) Disk is removed and a chain of custody form is written.
3.) Disk is checksummed and imaged, either using a standard computer or a forensics machine that is designed to image disks. The disk does not have to be mounted to do this, you can get a raw dd without mounting a disk and without accessing any files.
4.) Forensic analysis is performed on images, usually on copies of the images. When evidence is found, the checksum is checked again to make sure that this image is the same as what was on the disk.
Hash functions map a bigger space (unbounded strings, for example) to a smaller space (64 bits, perhaps), so collisions are inevitable. The linked article is not significant because it points out that MD-5 has collisions, since this is trivially true as I've attempted to make clear. It's significant because researchers have found a way to find multiple inputs which both hash to the same value. Since they have not found out how to create an input string that hashes to a value of their choice (preimage problem) MD-5 is not fundamentally broken.
t 21/lect21.html or by googling for "hash functions collisions probablility".
"When there are collisions in the algorithm, the checksum cannot prove, beyond a reasonable doubt, that the data has not been tampered with." This preceding sentence demonstrates a remarkable lack of understanding about hash functions. Collisions are inevitable, see above. How hash functions work is by making the hash values unpredictable and spread out evenly over the space of the hash values, given random input. Read up: http://www.cs.sunysb.edu/~skiena/214/lectures/lec
It's based on the birthday party paradox.
For two randomly chosen hashes, the chances of them being the same is 1/p where p is the maximum size of the hash.
However, if you pick n hashes at random, then the chance that any two of them match is approximately n^2/2p, since any one of the 'n' could match with any other of the 'n'.
So if p is 1/(2^160) and you generate 2^80 hashes of random (or partly random) data, then theres about a probability of 1 that two of them match each other. 2^80 is still a big number, but they've managed to reduce it further with some clever tricks, and modern processors can do billions of operations per second.
So, if you write two programs one evil, one good, and then add 2^80 different random fillers on the end of each, chances are, two of the good/bads will have the same hash.
But the chances of any of the bads matching a given hash that somebody hands you is only 2^80/2^160 which is 1/2^80- much too small.
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"I think you're confusing symmetric and asymmetric cryptography. Current browsers use "128-bit SSL", which refers to 128-bit symmetric keys, which are still very secure (as long as the algorithm isn't weak and the implementation isn't flawed). 1024 or 2048-bit keys for asymmetric crypto are considered secure, I believe.
But yes, 40-bit SSL is too weak to use. I don't think 512-bit RSA is considered very secure either.
512 bits made from 2 hashes, one weak and one strong will be weaker than a single 512 bit hash from the stronger algorithm.
Yes, but remember that this is a collision attack, not a preimage attack. You can find two pieces of plaintext that have the same hash, but you don't get to choose what the hash is. Thus it is still computationally difficult to find a document (even garbage) that has the same hash as some preexisting document.
...interesting if true.
I write 2 programs, lets call one "Cool Whizzy Must Have Util" and the other "Soul Sucking Destruction" and I tweak and tweak one of the binaries so that they have the same checksum.
Then I release the first one, everybody is eventually using it.
So then on my servers, I replace the first one with the 2nd one.
Gotcha!
Is that the danger here?
Some forms of ECC have been 'broken', Len Adlemann (A of RSA) showed that ECC in dimensions higher than 2 was no more secure. He has been working on some further attacks and thinks that ECC as a whole might be vulnerable.
I don't like ECC for two reasons. The first is that ECC is a very new field of mathematics, new results come regularly. It is entirely possible that someone would find an efficient means of transforming ECC problems into discrete math problems and come up with a solution.
The other reason is that ECC is patented up the wazoo. The most efficient ways of using ECC are patented and if you can't use them there is no efficiency advantage over RSA in a discrete field so why bother?
The hash algorithm thing is massively overblown. MD5 was already toast. SHA1 was due to be withdrawn in 2010 in any case and has already been superceded by SHA-256 and SHA-512. New versions of DSA for the larger hash sizes are also due.
It remains to be seen whether the construction of SHA-256 needs to be adjusted in the light of the MD5 results. It may well be that it shares the same vulnerability as SHA-1 and we should forget about the new hash functions and move straight to something else. Alternatively all might be right with the world. We do not know yet.
A lot of people are suggesting a competition similar to the AES competition for a new digest algorithm. There is already something underway for stream ciphers. This seems like a good plan, not least since the cryptographers seemed to have fun with the last one.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
create random garbage with the same md5 hash
A collision attack doesn't let you create random garbage that generates a given md5 hash. It lets you generate two pieces of random garbage (or nongarbage) that generate the same hash. It can't be used to attack a third party's existing hash.
"What I fail to see is how finding 2 things that give the same hash is a big deal."
wow, didn't expect to see that here. Well, in very simple terms, assume that you have a an OS where the password is saved in a file, say passwd, as a hash. Someone come along and finds another word that makes the same hash. Now when that person logs in as you, he can use that other 'password' and not your password and still login.
The war with islam is a war on the beast
The war on terror is a war for peace
"SHA1 is a totally different algorithm, so it's still perfectly safe."
Yes and no. MD5 collisions are not SHA1 collisions, and the attack that generated the MD5 collisions doesn't seem to be applicable to SHA1, or its authors would have published collisions on SHA1. The published collisions on several other algorithms: HAVAL-128, MD4, and RIPEMD. They also say that their method will work against SHA0. All these hash functions share similar design principles. It seems highly probable that the MD5 attack will have at least some applicability to SHA1 even though it isn't directly an attack against SHA1. Also, other researchers have published results against SHA1. In particular, Biham and Chen con produce collisions on reduced versions of SHA1 with up to about 40 rounds (the full hash function has 80). That isn't a break of the full hash function, and there's no guaranteed it can be extended to more rounds, but it looks worrisome.
"This attack produces two messages with the same hash, no guarantee what that hash would be, instead of one message with a chosen desired hash, so it isn't a threat to real systems."
That's just stupid. "No practically-findable collisions" is one of the design requirements for a secure hash function. Protocols using secure hash functions are based on the assumption that the functions used are secure hash functions. If your hash function doesn't guarantee collision resistence, then your protocols must be assumed to be broken unless you can go back and prove, for every protocol, "This one is still secure even if we use something that is not a real secure hash function."
One way a hash collision could be useful, for instance, would be against some signature schemes where the secret key is revealed if you ever sign an identical message more than once. People who use those schemes are careful to avoid signing the same message twice... but if you had two different messages and they had the same hash, it's quite possible to imagine that you could be tricked into signing the same hash more than once (because people sign hashes, not actual messages) and making trouble for yourself. Similarly, if you use hash output for initialization vectors in cipher modes that use those, the result could be encrypting two messages with the same keystream, which means an attacker can probably recover both messages (and then use them as stepping-stones to breaking the rest of your system).
Also, a fast way of finding collisions may well be extensible to a somewhat-slower, but still faster-than-brute-force, way of finding the preimages that you think an attacker really wants.
"This attack depends on the messages having a special form; they don't look like real plaintext, so it isn't a threat to real systems."
One of the conditions for a secure cryptographic system is that you don't depend on the plaintext having (or NOT having) a specific form. If your system doesn't work regardless of the content of the data I put through it, then I will punt on your system, and recommend to my clients some other system that will actually work. It's also not clear that the attack on MD5 really does require a specific form... those strings look randomly-generated to me, even though the XOR difference of them clearly is not. Maybe with just a little more work they can produce collisions of two meaningful and interesting strings with opposite meanings.
"All hash functions have collisions, so it was bound to happen and isn't a threat."
The important question is whether people can actually find collisions. With a good hash function, collisions should be rare enough that nobody has any reasonable chance of finding them on purpose any time soon. Wang, Feng, Lai, and Yu can find collisions on MD5 deliberately, with practical amounts of computer power. They have done this more than once, and have at least outlined a plausible theoretical explanation of how they can do it. That means MD5 does not provide the guarantees that a secure hash function must
SSL does this. It is not a very good idea.
The problem is that MD5 and SHA-1 are both variations of MD4. Each one has an extra cycle. SHA-1 has in addition a mysterious expansion function that blocks many attacks and has five chaining variables rather than four. But at root there is no real difference. Both use the exact same functions for the transformation.
There would be slightly more point to using SHA-1 with a hash algorithm with an entirely different construction mechanism. But even then the keying mechanism is not very satisfactory.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Slight correction: AFAIK RSA-512 was not broken, it was brute-forced. There is a huge difference between the two.
Breaking a combination lock is figuring out that you can hear the tumblers go *click* when you hit the right number. It will take you twenty seconds and five tries to get the right combination.
Brute-Forcing a combination lock is trying every combination from 00-00-00 through 99-99-99 until you get the right one. You will get the right combination, it will just take you long enough that someone will notice you.
Just to give you back a little bit of a warm-fuzzy feeling about RSA strength, realize that every bit added doubles the brute-force keyspace. So if you can brute-force 40-bit SSL in 10 seconds, you can do 41-bit SSL in 20 seconds, but it'll take 98 billion-billion years for the same computer to do 128-bit SSL.
For the combo lock analogy, it would be adding on another number to guess, a 4 number lock instead of 3, which would give you 100x as much work (original amount of work to get numbers A-B-C with D=00, then lather, rinse, repeat until D=99). If the combo lock were truly broken instead, it would take you about a minute and seven tries, instead of 100x as long.
Using two hashes in conjunction does not work as well as you would expect it to work. There are at least half a dozen posters here proposing this idea, so I will try to explain in some detail why it does not work.
In general an n-bit hash can be collided in 2^(n/2) time using the birthday paradox attack. When you concactenate two hashes of lengths n and m bits, you get a hash of length n+m bits. However, this (n+m)-bit hash can in fact be collided in m*2^(n/2) + 2^(m/2) time (assuming n is greater than or equal to m). This is only slightly more effort than it takes to collide both hashes separately. In the case of SHA-1 and MD5, n is 160 and m is 128, so colliding both hashes would take 128*2^80 + 2^64 = 2^87.00000017 effort versus 2^80 effort for SHA-1 alone.
It must be especially stressed that m*2^(n/2) + 2^(m/2) is considerably smaller than the attack time of 2^((n+m)/2) which you would normally expect from a well designed hash having n+m output bits.
So how does the attack on two hashes work, you ask? It exploits a curious property of the birthday attack which says that generating a multicollision (three or more messages all with the same hash) by brute force takes only marginally more effort than generating a single collision. Specifically, generating a 2^(m/2) way multicollision takes only m/2 times as much effort as generating a single collision. So what you do to collide two hash functions is: you generate a huge multicollision in the first hash function, and then from that set you look randomly for a pair that collides the second function. It seems very counterintuitive, but the point is you can break the hash functions one by one instead of having to break both of them at once. Strength in numbers doesn't apply here.
If one of the hash functions (say MD5) has a better than brute force attack, then that can be used to speed up the attack against both hash functions by the same factor. The only uncertainty is if both of the hash functions have better than brute force attacks; in this case it would depend on the particulars of the attacks as to whether one can make them interact in such a way as to break both. However, no matter what, the idea of concactenating two hash functions has such low security compared to designing a good hash function of the same length from scratch that it is unlikely that this concept will ever be useful from a pure cryptography standpoint.
For more information on multicollisions and attacking concactenated hash functions, see A. Joux "Multicollisions in Iterated Hash Functions", Proceedings of Crypto 2004, LNCS 3152.
More info on the implications at Educated Guesswork. (It isn't my work, so anonymously it is.)
NIST (http://csrc.nist.gov/cryptval/) ran the AES contest (http://csrc.nist.gov/CryptoToolkit/aes/index.html ). They would be the body to run future contests of the same sort.
BTW, NIST never approved MD5 for government use (as per FIPS), but they have been validating implementations of SHA-1 for several years. NIST also now validates SHA-224, SHA-256, SHA-384, and SHA-512, each essentially a longer version of SHA-1 ("160").
http://eprint.iacr.org/2004/199.pdf
In the United States, the National Institute of Standards and Technology determines what is and what is not to be considered secure enough for federal data processing using the definition below. I highlighted the part where MD5 would run into trouble because a method has been discovered to predict collisions in MD5. (NIST never classified MD5 as a "secure" hash.)
From the NIST site, FIPS 180-2 (http://www.nist.gov):
Federal Information Processing Standards Publication 180-2
3. Explanation: This Standard specifies four secure hash algorithms - SHA-1, SHA-256, SHA-384, and SHA-512 - for computing a condensed representation of electronic data (message). When a message of any length < 264 bits (for SHA-1 and SHA-256) or < 2128 bits (for SHA-384 and SHA-512) is input to an algorithm, the result is an output called a message digest. The message digests range in length from 160 to 512 bits, depending on the algorithm. Secure hash algorithms are typically used with other cryptographic algorithms, such as digital signature algorithms and keyed-hash message authentication codes, or in the generation of random numbers (bits).
The four hash algorithms specified in this standard are called secure because, for a given algorithm, it is computationally infeasible 1) to find a message that corresponds to a given message digest, or 2) to find two different messages that produce the same message digest. Any change to a message will, with a very high probability, result in a different message digest. This will result in a verification failure when the secure hash algorithm is used with a digital signature algorithm or a keyed-hash message authentication algorithm.
Sure there are bound to be collisions, but the number of outputs is so large (2^160 for SHA) that the odds of you ever accidentally "finding" one should be slim to none.
What is important about the recent results is that it is possible to generate hash collisions. This means that an adversary can make two things that have the same hash value.
Using the example from the article, suppose an adversary creates two messages with the same has value:
1. "I, Bob, agree to pay Charlie $ 5000.00 on 4/12/2005."
2. "I, Bob, agree to pay Charlie $18542841.54 on 9/27/2012."
If the adversary can get you to sign the original message (by selling you something for $5000), the adversary could then claim that you agreed to the second message because the messages have the same hash.
When the dcfldd version of dd is used, the investigator can acquire the MD5 hashes of chunks of data; even if the whole disk hash changes, you can still have valid pieces of the disk images.
Here's an actual example of two different binary files having the same md5sum.
With a collision attack, you can perform an attack that matters - here's an example:
Imagine that Microsoft won't sign any audio drivers for Windows XP that allow raw audio data to be output to disk. Also imagine that you are the driver release engineer at Creative (Sound Blaster division) and you want to release a driver that can do that.
What you do is build both drivers (one that Microsoft will sign, and one that you want to release with the "unacceptable" feature) with a large static data buffer that isn't used in the binary. You then try to modify both buffers in such a way as the two files will have the same hash (doesn't matter what hash, just that it's the same). This will take about 2^40 worth of work for MD5 instead of the 2^64 that it should take because of this security issue.
Once you've created your two binaries with the same hash, you send the acceptable binary to Microsoft and they sign it. Then, in the release section of your website you post the other binary with the signature you got from Microsoft... and the signature verifys just like they signed it.
There is also a break in the digital check situation, *if* the digital check protocal has random padding (many do) *and* the payee generates the check (also possible).
-- The act of censorship is always worse than whatever is being censored. Always.
DSA and GPG aren't hashes. They're signature schemes (well, GPG implements PGP, an encoding of either RSA or ElGamal signatures), and they use other hashes in their checksums, like SHA1 and MD5. Of course, you can make GPG do RIPEMD160, which IIRC has not been broken yet either.
I hereby place the above post in the public domain.
I wasn't there this year. A friend told me that the embarrassing thing was that the Chinese paper was REJECTED from the conference. They presented their results at the rump session.
Of course, it would have helped the paper's chances of being accepted if:
a) They had actually presented the methods they used
and/or
b) The results had been correct. Initially, they were not finding collisions for MD5, but what they *thought* was MD5 (due to a translation error).
So what the reviewers read was a claimed attack on MD5, with no details, and their examples did not work. If I were reviewing that paper, I would have rejected it too. They didn't correct the paper until (IIRC) the day before the rump session.
The Q/A does not say they have the same hash, but rather say "Suppose they both have the same hash..." The keyword is "Suppose"....
For example, suppose the attacker (Charlie) discovers that the message "I, Bob, agree to pay Charlie $ 5000.00 on 4/12/2005." has the same hash as "I, Bob, agree to pay Charlie $18542841.54 on 9/27/2012."
Or, you could get it to hash right by making other, less noticable changes. Extra spaces between words or at the end of the document, extra non-displaying bytes, etc., that won't give away the fact that the original has been tampered with.
Notice that the two plaintext messages that were found by Joux were very similar, large sections of each message were identical. If you can create a plaintext message that looks superficially similar to the original (except that $1995 is now $2995) in significantly less time than would be required using brute force, then that certainly is a big problem.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
The paper used an incorrect (wrong endianness, I think) implementation of MD5. You can reverse every chunk of 4 bytes in the data in the paper, or just look around for someone else who did the same thing.
TANSTAAFI: There Ain't No Such Thing As A Free iPod.
Yes, some crypto people are already saying you should change the whitespace in any pre-generated document you are asked to digitally sign.
Changing spelling or punctuation would also protect against collision attacks.
Nothing to see here; Move along.
for a given algorithm, it is computationally infeasible to find two different messages that produce the same message digest.
This is the trick to figure out if MD5 is secure in how you use it. If I'm using md5 to sign a 4 character string, then md5 is secure for that reason because there are no collisions in the 4 billion input strings (its insecure because I can reverse it because I easily do 2^32 md5s)
There are a number of applications where md5 is still secure. One of them is hmac since the sending end controls the random seed and there isn't any place to inject and funny bits.
This isn't a killer for md5. Its a killer for md5 in some cases but not all. For now I'm going to continue to use md5 for a number of uses and one of them is integrity checking of what I downloaded from open source binaries. If they can hack the web site to put a virus in the binary, can't they also update the md5 displayed on the web site? In which case its a great tool to verify the download is what was on the web site. I have binary checkers that use md5 to make sure my binaries aren't corrupted on my systems. Right now I md5 every binary that gets backed up I may consider adding a seed to the front of the md5 I use. Someone may come up with a funny RH7 binary of ps that has the same md5 as the stock one, but they aren't going to come up with one that has the same hash that also has the same hash when I tack on "xyzzy" on the front.