SHA-0 Broken, MD5 Rumored Broken
An anonymous reader writes "Exciting advances in breaking hash functions this week at the CRYPTO conference. SHA-0 has
definitely been broken (collision found in the full function). Rumors are that at the informal rump session, a researcher will announce a collision in full MD5 and RIPEMD-128. And Ed Felten is speculating about collisions in SHA-1! Many systems, especially those that use cryptography for digital signatures are most at risk here."
No GNU has been Hurd during the making of this comment.
I looked up hash collision (e2) and hash function (e2) at Wikipedia and Everything2, which clarified the summary quite a bit.
doesn't mean that much.
Carefully crrafted binary data can be made to have the same checksum.
This is not a generalised attack where I can create binary data to have a CHOSEN checksum.
Therefore, if you verify your downloads by checksum, I can't generate a fake download with the same checksum.
First step is MATCHING some checksums (this has been done)
The next step is CHOOSING the chekcsum (aka DEADBEEF attack)
The next step is MANIPULATING, i.e. adding junk to a given binary file to allow you to choose the cheksum. that's the scary one!
- substitute trojaned binary
- append some binary junk so the checksum matches
- profit!!!
Nothing to worry about yet, sort of like the first proof-of-concept brute force crack of DES.
Yes, it can be done under some circumstances.
Yes, eventually processing power and methods may improve to make this a valid attack
Yes, you can sleep soundly tonight.
No, it is not possible to extract the original data from the digest. Hash algorithms have nothing to do with compression.
However, it might be possible to construct a file to a given hash. In that case, MD5 sums become worthless to check wether a downloaded file is what independent sources claim it is.
The security implications are quite severe. Also, P2P clients that use MD5 to identify unique files will be vulnerable to spoofing by malicious users.
All they found here were two values C, D for which H(C) = H(D). Anyone who thinks this was suprising needs to take a look at the idea of a hash:
The existence of collisions is obvious. The point is that it should be so phenomenally difficult to find a collision that you can't ever come up with one in a sane amount of time.
If they could find two data values with less than, say, 80 bits of data that both hash to the same, then that's something, 'cause that could be someone's password.
That would be unlikely to be a big deal. For passwords, I can just brute force the password, maybe precompute a dictionary of password hashes.
May we never see th
Indeed, here's another novel argument from Bruce Schneier's book....
in regards to the strength of 256-bit encryption:
now, the annual energy output of our sun is about 121 * 10^41 ergs. this is enough to power about 2.7 * 10^56 single bit changes on our ideal computer; enough state changes to put a 187-bit counter through all its values. if we build a dyson sphere around the sun and captured all of its energy output for 32 years, without any loss, we should power a computer to count up to 2 ^ 192. of course, it wouldn't have the energy left over to perform any useful calculations with this counter.
but that's just one star, and a measly one at that. a typical supernova releases something like 10^51 ergs. (about a hundred times as much energy would be released in the form of neutrinos, but i let them go for now.) if all this energy could be channeled into a single orgy of computation, a 219-bit counter could be cycled through all of its states.
these numbers have nothing to do with the technology of the devices; they are the maximum that thermodynamics will allow. and they strongly imply that brute-force attracks against 256-bit keys will be infeasable until computers are built from something other than matter and occupy something other than space.
bruce schneier, applied cryptography, p 158
Good question. :-)
The complexity of the attack was around 2^51.
SHA-0 is a 160-bit hash. Now, you might at first think that means it would take aorund 2^160 tries to get two strings that collided, but because of the birthday paradox (what's the probability that two people on a football field have the same birthday) it's actually supposed to be the square root of that, or 2^80.
Unfortunately, 2^51 is much, much less than 2^80. Which means that this crack was not just brute force (which is kinda redundant, since the definition of a crack to a cryptosystem is a violation like this without using brute force, but whatever). That means they've found a mathematical technique for making this 'easy'. Generally, once the idea is out there for an easy crack, people can gnaw on it and make it more general or easier, bringing this down to every-day-concern level.
I've had this sig for three days.
Of course it will be possible to find a collision given enough time. There are only 2^160 possible sha0 hashes.
You say 2^160 like it's some tiny number.
To demonstrate how large that truly is, let me do a little thing that my cryptography prof did when I took the course several years ago:
2^160 = 1.4615016373309029182036848327163e+48 possibilities. (We'll be nice to the people following along and say 1.46 x 10 ^ 48 possibilities.) Now, let's say you can successfully generate one unique hash per second, that's 1.46 x 10 ^ 46 seconds.
But just how long is that exactly?
Well, let's be nice (for the sake of making the math easy), and say that there are 100 seconds in a minute. That's 1.46 x 10^46 minutes. Now, let's do the same thing for minutes in an hour: that's ~ 1.46 x 10^44 hours.
Now we reach hours in a day. I'm feeling really generous, so we'll say 100 hours in a day. That's 1.46 x 10 ^ 42 days.
365 (or 366) days in a year is too close to 3 x 10 ^ 2, and dividing by 3s is just messy, so we'll say 1 000 days in a year. That gives us 1.46 x 10 ^ 39 years.
Chances are good that our sun will have burnt itself out long before you've managed to generate every single possible hash.
However, what makes this particular case more interesting is that they've found a way to get a collision without brute forcing their way through every possible hash. It's not particularly useful yet, as it's still a 2^51 problem, (approx 2.25 x 10 ^15), so it's hardly trivial enough for you to do on your home PC, but it's a step in the right direction.
"I won't mod you down - I feel the need to call you a twit explicitly, rather than by implication."
First, a little about SHA and SHA-1. SHA (0) was developed as a national standard hash function. Curiously, a last minute change was proposed by the NSA which resulted in SHA-1. The change was puzzling the cryptographers in academia. After some time, an attack was discovered on some classes of cryptographic hashes which SHA-1 turned out to be curiously immune to.
All that aside, one should view cryptographers a bit like plumbers. Cryptographers have a bag full of equipment (algorithms), which they combine in many ways to accomplish what they want. After much research and testing, certain ways of combining those pipes, valves, and spigots are 'proven' to work well, within given assumptions, such as expecting an L-joint not to leak. SHA-1 likewise is an integral component in many cryptographic systems.
SHA, MD5, RIPEM, and others are cryptographic hash functions. Cryptographers expect certain things out of a hash function. Its job is to take a variable amount of input bytes, and give you back a small, fixed number of bytes. Most importantly:
- The function is 'one-way'. You can't determine from the output what any of the input was.
- The probability that two inputs hash to the same output should be exceedingly low. In particular, an ideal hash function has a randomly distributed output for a large set of inputs.
- Given a hash output, it should be extremely difficult to construct another document which also hashes to that output.
In this case, it seems that it may be much more easy than the designers had hoped to break the second condition. This tends to mean that 3 is easier as well, which has ramifications for security protocols.So, in summary, this discovery is a bit like finding out that an L-joint which has been used reliably by plumbers the world over is much more likely to leak than previously thought.
But we haven't seen the results yet...
The key is not that they found a collision, but that they did so with substantially less than brute-force effort. Substantially less here means 30 - 130 orders of magnitude less (depending on whether you call SHA-0 a 2^160 space or a 2^80 space because of the birthday paradox mentioned in the parent article).
--Pat / zippy@cs.brandeis.edu
I see that I forgot the birthday paradox, which means that brute forcing a hash is much closer to a 2 ^ 80 problem.
So here's the calculations for that:
2^80 = 1.208925819614629174706176 x 10^ 24, so we use 1.2 x 10 ^ 24.
1.2 x 10 ^ 24 seconds / 100 seconds/minute = 1.2 x 10 ^ 22 minutes
1.2 x 10 ^ 22 minutes / 100 minutes/hour = 1.2 x 10 ^ 20 hours
1.2 x 10 ^ 20 hours / 100 hours/day = 1.2 x 10 ^ 18 days
1.2 x 10 ^ 18 days / 1000 days/year = 1.2 x 10 ^ 15 years
A much smaller number, but you're still not likely to get it before the sun goes out.
"I won't mod you down - I feel the need to call you a twit explicitly, rather than by implication."
http://www.google.com/search?q=%22attack+relies+on +the+idea+of+producing+duplicates%22&num=20&hl=en& lr=&ie=UTF-8&safe=off&filter=0
You're so wrong it's funny :)
:) there *is* no further analysis you can make. lets take the following:
Sorry, didn't mean to mock, it's just amusing whenever these one-time pad things come up and everyone starts jumping up and down yelling "unbreakable" and others start going "no, 'cos, like, we could brute-force it.."
You can't brute-force a one-time pad. That's the point. There are many weaknesses to OTP, related to key exchange, but you can't brute force it, because you have no way of knowing if you're right, or even if you're close. The possible set of plaintexts from a properly OTP encrypted message is the complete set of possible plaintexts of that size (or smaller, plausibly).
Let us take the following ciphertext:
aaa
I have encrypted this with a one time pad. Now, it's a pretty short message. We could brute force all the possible combinations on your regular computer pretty much instantly. Anyone care to guess what the message might be?
Of course not, because it could be *any* 3 letter combination, assuming that I'm sticking to letters. Any attempt to contextually analyse it is flawed because you will never be able to prove you got it right. Let's say that we know the message is english, and we can therefore reduce the number of possibilities down to all 3 letter english words.
Woohoo. It doesn't help, it doesn't get us any closer to knowing exactly what it is, because there is no next step, the only information that can aid us in the decryption of a one time pad is information from "outside" the decryption. In this case, two items of information are available to us, the length (3 letters) and the fact that it is english, but the actual ciphertext itself is of no value whatsoever. It doesn't matter if those a's were z's or q's or anything else, we can't do anything with them unless we have the OTP.
"Decrypt candidates with "bad" and "moo" in them would definitely merit further analysis"
This is always the point where things go wrong
abskjhsglkjlssdkglkjsfdlkgjfld
Now lets imagine that we knew it was coming from bank robbers. Sweet, so, what can we do? again, the *only* information we have is the length. It could say this:
I am going to rob a bank in WA
Or this:
I am going to rob a bank in CA
Or this:
I am going to rob a bank in NZ
And there is no way to prove what it actually says at all. It might say:
I am going to buy some flowers
Again, you'll never know unless you have the key, there's just no way to tell.
You can't win a fight.
They didn't get a real MD5 collision. The authors of that paper got the IV vector endianness wrong on the MD5 algorithm and therefore the algorithm used wasn't MD5. However, the IV vectors don't seem to have any significance on the MD5, so I wouldn't think it'd be hard for them to produce a real collision after fixing up the algorithm.
Okay, the story goes like this:
NIST (working with the NSA, but in a GOOD way) develops the SHS, of which SHA is part, during their effort to develop the DSS (of which the DSA standard is part). The standard is published, and everybody rejoices.
A little while later, NIST says, "Hey! There's a new version of SHA! We're calling it SHA-1. There are, uh, 'security concerns' with the old SHA. We can't tell you what it is, but don't use it. By the way, the main difference between SHA and SHA-1 is the introduction of a shift instruction. Buh-bye!"
The obvious inference here is that the NSA's crypto folks found a nasty problem with the original algorithm, and had it changed in the interest of keeping the Secure Hash Algorithm, well, secure.
Now, this catches the interest of cryptologists. They spend a bunch of time analyzing the algorithm, and a couple of folks found that there are a few attacks-- very theoretical, very impractical-- that could find a collision in slightly less time than a brute force search or a birthday attack.
With that, everybody stops trusting the original SHA. Most people decide to use SHA-1 or MD5 instead-- but SHA-1 has a longer hash length than MD5 (SHA-1 is designed for ~80 bits of security, MD5 designed for ~64 bits), so a lot of people decide to trust SHA-1 instead.
What's happened at CRYPTO '04 is a significant improvement on the attacks discovered previously. A birthday attack on the SHA-0 algorithm would normally be on the order of 2^80 memory and 2^80 time. This attack used 2^51 time and significantly less memory-- in other words, this attack is somewhere in the area of 500 million times faster than previously known possible.
As for the idea that "SHA-1 is not to be trusted", well-- maybe the more paranoid folks avoid it, but a lot of applications still use SHA-1:
For the record, SHA-1 is used in PGP (but not exclusively, and not necessarily as the default hash algorithm), in SSL (again, this can be configured), in a number of TripWire-like programs, by Gentoo's "emerge" system, the *BSDs' "ports" trees, and-- this is important, here-- as the only officially recognized hash for use in the digital signature standard (DSS).
So an attack on SHA-1 would be VERY significant.
Some quick numbers on this:
First let's start with something that might return some "sensible" (i.e. not ridiculously high) numbers. On the Apple II, Basic programmers had access to an incredibly low resolution mode with 40x40 pixels and 16 colors. Assuming we only use 2 colors (say, black and white), there are:
2^(40*40) = 4.44 x 10^481 possible screen images.
Whew! Already far beyond the 2^256 limit discussed. But out of curiosity, we can look at some other numbers. Using the full 16-color support of this low-res mode:
16^(40*40) = 3.90 x 10^1926
How many possible terminal screens are there, assuming only alphanumeric (and space) characters?
(26+26+10+1)^(80*24) = 5.41 x 10^3454
And some other modes of interest:
320 x 200, 2 colors: 8.31 x 10^19265
320 x 200, 256 colors: 2.27 x 10^154127
640 x 480, 256 colors: 2.07 x 10^739811
After this, direct computation was far too slow, but we can get rough estimates:
640 x 480, 16-bit color:
640*480*log10(65536) = 10^1,479,622
800 x 600, 16-bit color:
800*600*log10(65536) = 10^2311910
1024 x 768, 16-bit color: 10^3787833
And finally...
1024 x 768, 32-bit color: 10^7575677
Yep, a 1 with 7.5 million zeroes behind it. So we may all have to wait awhile before we see a computer sequentially generate a picture of alternate post-Caesar Earth. Still, an interesting thought.
This person's intro into this "news flash" is so out of wack I don't know where to begin!
_ __
Lets start with SHA-0 collisions, methods for determining collisions in the original
SHA algorithm have been known since 98'. In 95' the NSA issued a modification to the
SHA-0 (original algorithm) which became SHA-1, the modification was to counter an attack
unknown to open researchers known as parallel shifting. The two French guyz which found
collision in SHA-0 in 98' were the first open researchers to encounter this attack method.
A side point I would like to make is that the NSA made a similar change back in the early
70's to the IBM DES algorithm which until 89-90 was not known why such a change was needed.
The attack the modification was protecting against was differential cryptanalysis. early 90s
there was a 20 year difference in knowledge late 90s there was only 3 years difference in
knowledge GO FIGURE!
So in theory the SHA news is old news, as far as MD5 and RipeMD, well anyone that has the
slightest clue of the mathematics behind message digests will know that all MDs derive from
the same logic and same mathematics, and that a flaw found in one algorithm can be easily
transferred to another algorithm if that a particular algorithm hasn't already dealt with
that specific attack/flaw.
And on a final note:
"Many systems, especially those that use cryptography for digital signatures are most at risk here."
Tell me of system in the world that uses security and does not make use of hash functions?
Arash Partow
_______________________________________________
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
http://www.partow.net
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.