SHA-0 Broken, MD5 Rumored Broken
An anonymous reader writes "Exciting advances in breaking hash functions this week at the CRYPTO conference. SHA-0 has
definitely been broken (collision found in the full function). Rumors are that at the informal rump session, a researcher will announce a collision in full MD5 and RIPEMD-128. And Ed Felten is speculating about collisions in SHA-1! Many systems, especially those that use cryptography for digital signatures are most at risk here."
I picked the wrong week to quit sniffing MD5 hashes.
d008960fa6b395dca1c8362165bb31be
Can a crypto-geek sum up the consequences for all of us dummies? Thanks.
No GNU has been Hurd during the making of this comment.
I'm a neophyte, so excuse my ignorance, but how does the fact that a full-time researcher (working on the SHA-0 algorithm), using a computation requiring: (direct quote follows):
... in order to find two different input values which produce the same output value (I presume that's what they did.. I could be wrong) make _any_ sort of practical difference?
The computation was performed on TERA NOVA (a 256 Intel-Itanium2 system developped by BULL SA, installed in the CEA DAM open laboratory TERA TECH). It required approximatively 80 000 CPU hours. The complexity of the attack was about 2^51.
Your bank will buy enron stock with your accounts, your credit card will explode, and your mind will begin to melt. Nuclear missiles will spontaneously launch and direct themselves to your house. Bush will be exposed as a witless robot when he begins to utter swahili at a press conference. The Martians will arrive from their base on the dark side of the moon, and the War of the Worlds will begin. Super-Bowl half-time will be unceremoniously interrupted when terrorists will arrive to sear off Janet Jackson's nipple with a laser in the name of Allah.
In fact, advances in computer speed tend to favor people encrypting data, rather than those trying to decrypt it. For example, increasing CPU speed by a factor of four or five will generally allow you to use a key two or three times as large, and still get the same performance. However, it definitely won't let you crack a key twice as large.
Suppose your faster CPU inspires you to move from 128-bit keys to 256-bit. What happens to the guy trying to decrypt your message? He now has to work 68,056,473,384,187,692,692,674,921,486,354,000,000 times as long, even if he buys the 5x faster CPU. Ouch!
doesn't mean that much.
Carefully crrafted binary data can be made to have the same checksum.
This is not a generalised attack where I can create binary data to have a CHOSEN checksum.
Therefore, if you verify your downloads by checksum, I can't generate a fake download with the same checksum.
First step is MATCHING some checksums (this has been done)
The next step is CHOOSING the chekcsum (aka DEADBEEF attack)
The next step is MANIPULATING, i.e. adding junk to a given binary file to allow you to choose the cheksum. that's the scary one!
- substitute trojaned binary
- append some binary junk so the checksum matches
- profit!!!
Nothing to worry about yet, sort of like the first proof-of-concept brute force crack of DES.
Yes, it can be done under some circumstances.
Yes, eventually processing power and methods may improve to make this a valid attack
Yes, you can sleep soundly tonight.
All they found here were two values C, D for which H(C) = H(D). Anyone who thinks this was suprising needs to take a look at the idea of a hash:
The existence of collisions is obvious. The point is that it should be so phenomenally difficult to find a collision that you can't ever come up with one in a sane amount of time.
If they could find two data values with less than, say, 80 bits of data that both hash to the same, then that's something, 'cause that could be someone's password.
That would be unlikely to be a big deal. For passwords, I can just brute force the password, maybe precompute a dictionary of password hashes.
May we never see th
maybe I should have read this first: http://www.freedom-to-tinker.com/archives/000661.h tml
"And now the rumor is that somebody has extended Joux's method to find a collision in SHA-1."
"The finding of a single collision in SHA-1 would not, by itself, cause much trouble, since one arbitrary collision won't do an attacker much good in practice. But history tells us that such discoveries are usually followed by a series of bigger discoveries that widen the breach, to the point that the broken primitive becomes unusable. "
Obtaining the original data is hardly the point of breaking the hash. You can't recreate the Illiad from 2048 bits for God's sake.
An attacker's goal would be to substitute something else for the original data and make you trust it.
ZZ
Indeed, here's another novel argument from Bruce Schneier's book....
in regards to the strength of 256-bit encryption:
now, the annual energy output of our sun is about 121 * 10^41 ergs. this is enough to power about 2.7 * 10^56 single bit changes on our ideal computer; enough state changes to put a 187-bit counter through all its values. if we build a dyson sphere around the sun and captured all of its energy output for 32 years, without any loss, we should power a computer to count up to 2 ^ 192. of course, it wouldn't have the energy left over to perform any useful calculations with this counter.
but that's just one star, and a measly one at that. a typical supernova releases something like 10^51 ergs. (about a hundred times as much energy would be released in the form of neutrinos, but i let them go for now.) if all this energy could be channeled into a single orgy of computation, a 219-bit counter could be cycled through all of its states.
these numbers have nothing to do with the technology of the devices; they are the maximum that thermodynamics will allow. and they strongly imply that brute-force attracks against 256-bit keys will be infeasable until computers are built from something other than matter and occupy something other than space.
bruce schneier, applied cryptography, p 158
Yep -- that's right. I'm not a crypto expert. Hell -- I'm a layman compared to most /.'ers, and my user number proves it (all 7 embarrassing digits of it). But I do know this -- if Slashdot crypto geeks are concerned about it, then we've reached the point of...
CARRYING A MIDGET AROUND.
Yes, it's true. Every person with encrypted data on Earth will soon have to carry around a Level 10 Anthromorphic Hexidecimal Midget Encryption System. Or "Midget Key" for short. The midget will become part of every computer purchase where the user requires high encryption, secured communications, etc. Families without sufficient room to accommodate and feed the midget will have to run computers with the old and vulnerable encryption technologies.
Meanwhile, those of us with a Midget Key will need to have his/her encryption midget with us at all times. The midget will encrypt data locally by locking a portable hard drive to his/her wrist and preventing anyone OTHER THAN THE OWNER of said local data from accessing it again. To facilitate this local midget encryption, each encryption midget will be equipped with:
- body armor
- handgun
- lightweight sub-machine gun
- tactical nuclear or convential explosive self destruct device
Addtionally, each encryption midget will be required to communicate with all other encryption midgets around the world using special genetically encoded phones that cannot be replicated outside of the midget gene pool. The phone will be surgically embedded in the arm of each encryption midget and require a drop of said midget's body temperature saliva to activate the phone (a.k.a. spit on the arm to make the call).
Why encryption midgets? They're:
- portable
- eat less than an encryption giant and/or an encryption obese person
- tough as nails
Why tough as nails? If you've watched The Amazing Race at all this season on CBS, you have witnessed a midget drag her whiney, lazy cousin around the world. She has become the envy of other teams featuring health nuts, ex-Marines, and super-Christians. Who wouldn't entrust their data with a badass little person that can grab a live electrified cattle fence somewhere in South America, cuss about it, and STILL manage to continue the race?
Get me THAT encryption midget, and you'll never get a hold of MY data!
IronChefMorimoto
[Note -- if the midget from the show mentioned above has been eliminated from said show, then our data is doomed. I've missed the last several episodes, so all may be lost.]
Not really...nobody expects this trend in processing performance to continue forever. There are in fact provable limitations given things such as the number of atoms available in the universe that can be harnessed for computation... ignoring little details like quantum computation and such :-). These limitations may seem "out there" but in fact they aren't nearly as unrealistic as you might think. Exponentials grow FAST.
It's trivial to continue scaling the computing requirements required to break encryption schemes by simply adding more bits. That is, assuming the encryption/hash scheme itself doesn't have some fatal flaw which may allow for a sub-exponential cracking algorithm.
First, a little about SHA and SHA-1. SHA (0) was developed as a national standard hash function. Curiously, a last minute change was proposed by the NSA which resulted in SHA-1. The change was puzzling the cryptographers in academia. After some time, an attack was discovered on some classes of cryptographic hashes which SHA-1 turned out to be curiously immune to.
All that aside, one should view cryptographers a bit like plumbers. Cryptographers have a bag full of equipment (algorithms), which they combine in many ways to accomplish what they want. After much research and testing, certain ways of combining those pipes, valves, and spigots are 'proven' to work well, within given assumptions, such as expecting an L-joint not to leak. SHA-1 likewise is an integral component in many cryptographic systems.
SHA, MD5, RIPEM, and others are cryptographic hash functions. Cryptographers expect certain things out of a hash function. Its job is to take a variable amount of input bytes, and give you back a small, fixed number of bytes. Most importantly:
- The function is 'one-way'. You can't determine from the output what any of the input was.
- The probability that two inputs hash to the same output should be exceedingly low. In particular, an ideal hash function has a randomly distributed output for a large set of inputs.
- Given a hash output, it should be extremely difficult to construct another document which also hashes to that output.
In this case, it seems that it may be much more easy than the designers had hoped to break the second condition. This tends to mean that 3 is easier as well, which has ramifications for security protocols.So, in summary, this discovery is a bit like finding out that an L-joint which has been used reliably by plumbers the world over is much more likely to leak than previously thought.
But we haven't seen the results yet...
No, it's the OTP key that needs to be random. The main problems with OTP are that the key needs to be as long as the message, can only be used once, and a secure channel is needed to distribute the key.
If you do OTP right, it is unbreakable* because the only possible attack is to brute force it by trying every possible key, and trying every possible key on an n character cyphertext gives you every possible n character plaintext, with no way of telling which one is right. (That is, if you had the 16 character cyphertext "bhgisngukfgxd gyt" you would get all possible 16 character strings as possible plaintexts, including "attack US friday", "shoot Osama soon" and "I like chocolate", and you would have no way to tell which was the actual plaintext.)
*except for rubber-hosing, but that affects all crypto systems, and is a weakness of the people involved, not the crypto.
I see that I forgot the birthday paradox, which means that brute forcing a hash is much closer to a 2 ^ 80 problem.
So here's the calculations for that:
2^80 = 1.208925819614629174706176 x 10^ 24, so we use 1.2 x 10 ^ 24.
1.2 x 10 ^ 24 seconds / 100 seconds/minute = 1.2 x 10 ^ 22 minutes
1.2 x 10 ^ 22 minutes / 100 minutes/hour = 1.2 x 10 ^ 20 hours
1.2 x 10 ^ 20 hours / 100 hours/day = 1.2 x 10 ^ 18 days
1.2 x 10 ^ 18 days / 1000 days/year = 1.2 x 10 ^ 15 years
A much smaller number, but you're still not likely to get it before the sun goes out.
"I won't mod you down - I feel the need to call you a twit explicitly, rather than by implication."
http://www.google.com/search?q=%22attack+relies+on +the+idea+of+producing+duplicates%22&num=20&hl=en& lr=&ie=UTF-8&safe=off&filter=0
You're so wrong it's funny :)
:) there *is* no further analysis you can make. lets take the following:
Sorry, didn't mean to mock, it's just amusing whenever these one-time pad things come up and everyone starts jumping up and down yelling "unbreakable" and others start going "no, 'cos, like, we could brute-force it.."
You can't brute-force a one-time pad. That's the point. There are many weaknesses to OTP, related to key exchange, but you can't brute force it, because you have no way of knowing if you're right, or even if you're close. The possible set of plaintexts from a properly OTP encrypted message is the complete set of possible plaintexts of that size (or smaller, plausibly).
Let us take the following ciphertext:
aaa
I have encrypted this with a one time pad. Now, it's a pretty short message. We could brute force all the possible combinations on your regular computer pretty much instantly. Anyone care to guess what the message might be?
Of course not, because it could be *any* 3 letter combination, assuming that I'm sticking to letters. Any attempt to contextually analyse it is flawed because you will never be able to prove you got it right. Let's say that we know the message is english, and we can therefore reduce the number of possibilities down to all 3 letter english words.
Woohoo. It doesn't help, it doesn't get us any closer to knowing exactly what it is, because there is no next step, the only information that can aid us in the decryption of a one time pad is information from "outside" the decryption. In this case, two items of information are available to us, the length (3 letters) and the fact that it is english, but the actual ciphertext itself is of no value whatsoever. It doesn't matter if those a's were z's or q's or anything else, we can't do anything with them unless we have the OTP.
"Decrypt candidates with "bad" and "moo" in them would definitely merit further analysis"
This is always the point where things go wrong
abskjhsglkjlssdkglkjsfdlkgjfld
Now lets imagine that we knew it was coming from bank robbers. Sweet, so, what can we do? again, the *only* information we have is the length. It could say this:
I am going to rob a bank in WA
Or this:
I am going to rob a bank in CA
Or this:
I am going to rob a bank in NZ
And there is no way to prove what it actually says at all. It might say:
I am going to buy some flowers
Again, you'll never know unless you have the key, there's just no way to tell.
You can't win a fight.
Actually, you can do interesting and dangerous things with variants of your first step, not even progressing to step two. The MD5 collisions (well, almost collisions) are largely the same input data that has differences in only a few places. Now imagine that I have two messages that say something like this:
- "Joe will send Dr. Blue $10. Confirmation number 1234567."
- "Joe will send Dr. Blue $100000. Confirmation number 6451234."
Now lets say I can manipulate the confirmation numbers in those two messages so that they have the same hash value -- I don't care what the hash is, as long as it's the same in both cases. Then I send you the $10 message.If you agree, you sign it. But you realize that digital signatures don't actually sign the message, right? They sign the hash of the message, so I can later produce the $100000 message, with your signature, and it will verify that you signed that message!
Or say Microsoft signs the hash of Windows XP SP3 with their special key. I tweak the firewall to allow in a nasty backdoor, put data in the padding to give it the same hash as before, and put it out on a BitTorrent site for hundreds of unsuspecting geeks to download. The signature verifies fine, so it must be OK, right?
Karma: Segmentation fault (tried to dereference a null post)
For more fun see Ultimate physical limits to computation by Seth Lloyd
They didn't get a real MD5 collision. The authors of that paper got the IV vector endianness wrong on the MD5 algorithm and therefore the algorithm used wasn't MD5. However, the IV vectors don't seem to have any significance on the MD5, so I wouldn't think it'd be hard for them to produce a real collision after fixing up the algorithm.
hmmmm....
I have heard rumors of a cypher on the street called SHA-X. It's not mathematically strong, as you so eloquently put, but it's supposed to be really good, really stong stuff. And is really asymmetrical, meaning it takes less time to decypher the message after encryption. Unfortunately it uses a semi-random keysize, so you never know the strength until you try to decrypt. It also has a key that destroys itself 48 hours later so Alice or Bob can't even tell you were ever encrypted. Only problem is the algorithm tends to overuse one particular register resulting in spontaneous cpu burnout.
But hey, if you got extra cpus...
How otfen does this have to be said:
- odd is development
- even is release
use ROT13, tripple-ROT13, quintupple-ROT13 for DEVELOPMENT WORK ONLY!
For release work, use double, quadruple, hextuple-ROT13
Okay, the story goes like this:
NIST (working with the NSA, but in a GOOD way) develops the SHS, of which SHA is part, during their effort to develop the DSS (of which the DSA standard is part). The standard is published, and everybody rejoices.
A little while later, NIST says, "Hey! There's a new version of SHA! We're calling it SHA-1. There are, uh, 'security concerns' with the old SHA. We can't tell you what it is, but don't use it. By the way, the main difference between SHA and SHA-1 is the introduction of a shift instruction. Buh-bye!"
The obvious inference here is that the NSA's crypto folks found a nasty problem with the original algorithm, and had it changed in the interest of keeping the Secure Hash Algorithm, well, secure.
Now, this catches the interest of cryptologists. They spend a bunch of time analyzing the algorithm, and a couple of folks found that there are a few attacks-- very theoretical, very impractical-- that could find a collision in slightly less time than a brute force search or a birthday attack.
With that, everybody stops trusting the original SHA. Most people decide to use SHA-1 or MD5 instead-- but SHA-1 has a longer hash length than MD5 (SHA-1 is designed for ~80 bits of security, MD5 designed for ~64 bits), so a lot of people decide to trust SHA-1 instead.
What's happened at CRYPTO '04 is a significant improvement on the attacks discovered previously. A birthday attack on the SHA-0 algorithm would normally be on the order of 2^80 memory and 2^80 time. This attack used 2^51 time and significantly less memory-- in other words, this attack is somewhere in the area of 500 million times faster than previously known possible.
As for the idea that "SHA-1 is not to be trusted", well-- maybe the more paranoid folks avoid it, but a lot of applications still use SHA-1:
For the record, SHA-1 is used in PGP (but not exclusively, and not necessarily as the default hash algorithm), in SSL (again, this can be configured), in a number of TripWire-like programs, by Gentoo's "emerge" system, the *BSDs' "ports" trees, and-- this is important, here-- as the only officially recognized hash for use in the digital signature standard (DSS).
So an attack on SHA-1 would be VERY significant.
Some quick numbers on this:
First let's start with something that might return some "sensible" (i.e. not ridiculously high) numbers. On the Apple II, Basic programmers had access to an incredibly low resolution mode with 40x40 pixels and 16 colors. Assuming we only use 2 colors (say, black and white), there are:
2^(40*40) = 4.44 x 10^481 possible screen images.
Whew! Already far beyond the 2^256 limit discussed. But out of curiosity, we can look at some other numbers. Using the full 16-color support of this low-res mode:
16^(40*40) = 3.90 x 10^1926
How many possible terminal screens are there, assuming only alphanumeric (and space) characters?
(26+26+10+1)^(80*24) = 5.41 x 10^3454
And some other modes of interest:
320 x 200, 2 colors: 8.31 x 10^19265
320 x 200, 256 colors: 2.27 x 10^154127
640 x 480, 256 colors: 2.07 x 10^739811
After this, direct computation was far too slow, but we can get rough estimates:
640 x 480, 16-bit color:
640*480*log10(65536) = 10^1,479,622
800 x 600, 16-bit color:
800*600*log10(65536) = 10^2311910
1024 x 768, 16-bit color: 10^3787833
And finally...
1024 x 768, 32-bit color: 10^7575677
Yep, a 1 with 7.5 million zeroes behind it. So we may all have to wait awhile before we see a computer sequentially generate a picture of alternate post-Caesar Earth. Still, an interesting thought.
This person's intro into this "news flash" is so out of wack I don't know where to begin!
_ __
Lets start with SHA-0 collisions, methods for determining collisions in the original
SHA algorithm have been known since 98'. In 95' the NSA issued a modification to the
SHA-0 (original algorithm) which became SHA-1, the modification was to counter an attack
unknown to open researchers known as parallel shifting. The two French guyz which found
collision in SHA-0 in 98' were the first open researchers to encounter this attack method.
A side point I would like to make is that the NSA made a similar change back in the early
70's to the IBM DES algorithm which until 89-90 was not known why such a change was needed.
The attack the modification was protecting against was differential cryptanalysis. early 90s
there was a 20 year difference in knowledge late 90s there was only 3 years difference in
knowledge GO FIGURE!
So in theory the SHA news is old news, as far as MD5 and RipeMD, well anyone that has the
slightest clue of the mathematics behind message digests will know that all MDs derive from
the same logic and same mathematics, and that a flaw found in one algorithm can be easily
transferred to another algorithm if that a particular algorithm hasn't already dealt with
that specific attack/flaw.
And on a final note:
"Many systems, especially those that use cryptography for digital signatures are most at risk here."
Tell me of system in the world that uses security and does not make use of hash functions?
Arash Partow
_______________________________________________
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
http://www.partow.net
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.