SHA-0 Broken, MD5 Rumored Broken
An anonymous reader writes "Exciting advances in breaking hash functions this week at the CRYPTO conference. SHA-0 has
definitely been broken (collision found in the full function). Rumors are that at the informal rump session, a researcher will announce a collision in full MD5 and RIPEMD-128. And Ed Felten is speculating about collisions in SHA-1! Many systems, especially those that use cryptography for digital signatures are most at risk here."
So what does this really mean for the average web surfer and computer user?
... inevitable that eventually all crypto will be broken? Surely it's only a matter of time.
What are the possible rammifications if MD5 is confirmed to have a collision? What other hashing algo's are there that may take its place? And even if it does have collisions, is that serious enough to break many security systems already in place?
thisnukes4u.net
I'm a neophyte, so excuse my ignorance, but how does the fact that a full-time researcher (working on the SHA-0 algorithm), using a computation requiring: (direct quote follows):
... in order to find two different input values which produce the same output value (I presume that's what they did.. I could be wrong) make _any_ sort of practical difference?
The computation was performed on TERA NOVA (a 256 Intel-Itanium2 system developped by BULL SA, installed in the CEA DAM open laboratory TERA TECH). It required approximatively 80 000 CPU hours. The complexity of the attack was about 2^51.
I am curious about 2 things:
Firstly, perhaps someone here on slashdot.org might be able to shed some light on the particular encryption algorythms mentioned in the article...thier basic differences, weaknesses and strengths.
Secondly, I am assuming that a "collision" is not as seriuos as a crack...what is a collision, and what are the ramifications of this in my ssh sessions and the like...
I have read about and used encryption, but have never really delved into it. I would venture to say that a good many of us would benefit from some enlightenment.
Has it?
Reading the list, it looks like some people in China managed to produce a hash collision using unknown means...
Although the mention in the paper that the result was obtained in one hour (using a P690, whatever that is) sounds scary... There is very little information though, for example we don't know if the message was special in any way.
Basically, they have managed to find two particular messages which collide, but I'm not yet convinced that they know how to make a collision for any abritary message.
But then... I'm not a cryto person.
>I just don't think this is anything special.
From the article;
The finding of a single collision in SHA-1 would not, by itself, cause much trouble, since one arbitrary collision won't do an attacker much good in practice. But history tells us that such discoveries are usually followed by a series of bigger discoveries that widen the breach, to the point that the broken primitive becomes unusable. A collision in SHA-1 would cast doubt over the future viability of any system that relies on SHA-1; and as I've explained, that's a lot of systems. If SHA-1 is completely broken, the result would be significant confusion, reengineering of many systems, and incompatibility between new (patched) systems and old.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
I would just like to mention that one of the biggest problems of it becoming feasible to find a collision in MD5 is that a lot of routers use MD5 to authenticate routing updates with one another. If a hash is sniffed and the password is cracked then it becomes a trivial matter to inject bad routing updates and crash large networks especially if the inter-ISP BGP links are cracked. Its not quite as simple as I put it here but it is possible.
The paper that claims to have broken these says that the computation took 1 hour (with something like 15 seconds for subsequent collisions). Being able to generate collisions that quickly is a huge deal, and it has many applications -- it is often possible to "pad" a message with arbitrary data, so the data don't even have to have any particular format. It is scary to see all of these algorithms fall in the same week...
What this means for passwords is simple. People don't decrypt your password and compare it to a stored copy. They hash it, and then store the hash. When you log in, they hash your attempt, and if the hashes match, then the assumption is that the passwords matched, and you are let in. Hashes are very difficult to reverse, which is why they are used. The chances of two passwords producing the same hash is 1/2^2048. However, either one can be substituted for the other. We just trust in the extreme unlikelyness that two passwords would have the same hash and go on our merry ways.
Now suppose that someone has the hash of your password. They may be extremely unlikely to find your password, but they can find something just as good, if a bit unwieldly, since there's no guarantee that the substitute password is just as short as yours. If you don't mind a million character password, then there are likely about 2^8,386,560 passwords that will spoof yours.
"Anyone who attempts to generate random numbers by deterministic means is living in a state of sin." -- John von Neumann
why not do both a md5sum and a sha-0 on the same data? Isn't it more difficult to spoof a binary and have matching collisions in both hashes at the same time? If either of the md5sum or the sha-0 don't match then you know that the data has been tampered with, right?
One of the properties necessary to a good hash function is that one finds it extremely difficult to find two documents that hash to the same value. These guys managed to do it. That in itself isn't such a big deal; we knew that throwing enough compute cycles at the problem would eventually allow someone to find a collision.
But that period of time should be measured in decades. The author of this paper did it in just a little over 13 days. I'm not quite sure what he did; something about neutral bits... which sounds an awful lot like a way to predict which bits can be switched together to produce the same hash.
Cryptographic strength is based on np complete problems. The problems that you have to solve to break a hash function are typically characterized as one that can only be solved in less than polynomial time by having an oracle function; that is, some way to know the result of a calculation without having to actually do that calculation.
Neutral bits sounds like an oracle function. 80 000 CPU hours on a 256-way system is just a little over 14 days. Now I'm just waiting to see if SHA-1 is vulnerable. (let's hope not!!!)
I am disrespectful to dirt! Can you see that I am serious?!
Basicly you're saying that if SHA-1 alone is not good enough then use some combination of cryptographic hash functions and that will be better.
Well, OK. But it does not address the reason that so many people hope that the attack on MD5 and SHA-1 are not solid. These algorithms have a number of attractive characteristics. They are not hugely expensive to compute (the algorithm above is). They can be used to calculate a (supposedly) unique signature for an input. And the signature does not consume a large number of bits (another problem with the algorithm proposed above).
On a related note: I wonder if the attack on these algorithms has something to do with the number of bits used as input. Intuitively it seems that the larger the number of bits in the input stream, the more unique (widely distributed) the hash should be. So if you used small inputs then collision might be more likely.
One time pads. If the pad's bits are generated by truly stochastic events (say, radioactive decay) and the pad is longer than the plaintext, there is provably no way to ever ever ever ever recover the plaintext without the key; the only method is to simply guess all possible plaintexts.
All's true that is mistrusted
If there is a fundamental flaw in the hash algorithm itself, simply using it more often will probably not solve the problem.
It absolutely is incredibly hard to make an encryption algorithm more secure. Just "doing some math with the hashes" is the type of bit-twiddling at which cryptologists both wince and sneer. "Then I'll multiply the second one by three, then add them together! Then modulo it 17! Then oohoohooh, square root the whole thing and drop the first digit! No one will _ever_ figure this out!" Crap like this does not add any new cryptographic strength, just a dependency on a secret algorithm. And any method which relies on a secret algorithm is hopelessly flawed.
There is still considerable debate in the cryptographic community about whether 3des is actually any stronger than des. Many people feel that if an attack is found to be effective against the des algorithm, the extra layers of stirring the bits around will not make the plaintext any more secure.
I'm afraid that Schneier covered this succinctly: "Anyone who creates his or her own cryptographic primitive is either a genius or a fool. Given the genius/fool ratio for our species, the odds aren't very good."
What has happened here (IIUC) is that somebody has shown that it's possible to create a new file with the same hash key as some other file. Which means that I could put up a file for download and provide a hash key for it, but someone could "mirror" the file and change it to something different which still generates the same hash key. Which means that the "guarantee" that the hash-key gives is broken - a receiver can no longer be sure that a matching hash key means a matching file.
Reality is defined by the maddest person in the room
What if you find a hole in the sanity checks, or if there aren't any sanity checks at all? It's highly likely there are many programs out there which "trust" data which passes authentication, and don't bother with much in the way of checking. Now this is obviously a short sighted thing to do, but given that SHA-1 and friends are apparently not broken, I wouldn't be surprised.
Personally, I think breaking authentication has far more effect than breaking encryption. Often all you need is a garbage message to be accepted, e.g:
Use an faked md5 to put out rootkitted.tgz? Odds are that any other message with the same hash isn't going to be a valid.tgz.
No, but it will cause the application to fail in an unexpected way: an authenticated .tar.gz which doesn't untar. How about a scarier example: run you an apparently "safe" signed client-side piece of executable code off a web page, except it's actually garbage and it causes your web browser to crash. Or your web server does an automatic update from the vendor only to have its installation corrupted. Or simply inserting garbage but authenticated packets into a network stream thereby causing it to disconnect (or even crash one or both sides).
An interesting thing I notice in the SHA-0 collision is that there aren't that many bits changed - the message looks largely the same. I wouldn't be surprised if you could apply the technique to signed executable code such that a collision is found which would crashes a program in just the right place and just the right way to execute arbitrary code, e.g fork a shell.
Unless the hashes have a lot on common, it seems like this would be a simple way to extend the useful life of both hashes.
This PDF explains the MD5 Collision: http://eprint.iacr.org/2004/199link