MD5 To Be Considered Harmful Someday
Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."
I guess I know what I'll be coding now - SHA-1 just got a lot more important in my priority list... :(
I can only hope I live that long.
Aha! So it was MD5 and not MP5...
The owls are not what they seem
So does this mean that it's possible to find a useful MD5-equivalent file for any file? Just because someone alters a file does not mean they have done anything destructive. Would one be able to take a binary, make a change of some sort, and then run a tool to determine the block of data to add to the binary to both allow the change to take effect and cancel out the MD5 change? How complex would it be to construct this tool?
You are not the customer.
mc5 was harmful to your ears...
Is there a translator from ultra-nerd to english?
Does this mean that I can take a given binary file, like kde.rpm, and change it so that its' contents are different, and harmful, yet the MD5 hash is the same? This has never been done before? Sounds like an interesting 4th-year engineering project to me. I'd think it'd be really easy to mess up a file and keep the same MD5 hash (like screwing up an avi file shared ona p2p network). But to actually change the file so that it behaves a certain way, wow.
By examining the MD5 hash using a sophisticated Fourier schema followed by deconvolution with a bit binary-inequal collision analysis, it is quite obvious I have no freaking clue what this stuff is about.
I am glad somebody does.
No trees were harmed in the composition of this; however, numerous electrons were inconvenienced.
Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".
So the RIAA can corrupt Kazaa and Gnutella music downloads by sharing garbage files with matching checksums.
http://en.wikipedia.org/wiki/Md5
:)
here is a very good link about the algo...
I've seen this exact thing in the wild already. One of my customer is storing large amounts of mpeg movies. They were using md5sum to verify file contents for network transfers. This worked great until there was a power failure on of the remote systems. Everything passwd the md5sum perfect, except halfway through the clip you would lose video. I haven't relied on md5sum for over a year now as a result.
"The similarities of sysadmins and drug dealers: both measure stuff in K's, and both have users."
If your cursor finds a menu item followed by a dash,
And your double-clicking icon puts your window in the trash,
And your data is corrupted 'cause the index doesn't hash,
Then your situation's hopeless and your system's gonna crash!
He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).
Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.
Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.
Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
Buy it instead.
I'm kidding. I download Linux distro's via bittorrent all the time and willfully ignore the MD5 sum, and now I come to find that it's compromisable. OH SNAP!
I can envision something being added on to bittorrent to prevent bad blocks being passed around via ... magic. Okay. I can't really envision it, but I'm sure it can be done.
On to the next topic I'm not qualified to comment on...
*is run over by rotten tomatoes*
Other weaknesses were reported in various other secure digests, including MD4, RIPEMD, HAVAL-128, SHA-0.
SHA-256 is a good replacement. SHA-1 should be fine but if you are going through the trouble of an upgrade, why not make it sufficiently future proof?
heres a very good link for the "very good link about the algo"
Two files with the same MD5 hash at once. Aaw yeah.
The exploit has used the properties of the MD5 algorithm to create two executable files with the same MD5 sum.
Anytime you use a hash especially for passwords you are only in fact giving the intruder additional password combinations that will work.
proof:
md5(x1) = some md5 string
md5(x2) = same md5 string
where x1 thru x100000... could be any number of possible passwords that will equal the same hash.
I say this under the assumtion that "someday" somebody will figure out a way to take a good file and make an MD5 equivalent file to poison a torrent with. The way I read the blurb, it sounds like both files were specifically crafted to create a collision.
[This is the author]
I've been doing some analysis on MD5 collision announced by Wang et al. Short version: Yes, Virginia, there is no such thing as a safe hash collision -- at least in a function that's specified to be cryptographically secure. The full details may be acquired at the following link:
http://www.doxpara.com/md5_someday.pdf
A tool, Stripwire, has been assembled to demonstrate some of the attacks described in the paper. It may be acquired at the following address:
http://www.doxpara.com/stripwire-1.1.tar.gz
Incidentally, the expectations management is by no means accidental -- the paper's titled "MD5 To Be Considered Harmful Someday" for a reason. Some people have said there's no applied implications to Joux and Wang's research. They're wrong; arbitrary payloads can be successfully integrated into a hash collision. But the attacks are not wildly practical, and in most cases exposure remains thankfully limited, for now. But the risks are real enough that responsible engineers should take note: This is not merely an academic threat, systems designed with MD5 now need to take far more care than they would if they were employing an unbroken hashing algorithm, and the problems are only going to get worse.
Some highlights from the paper:
* The attack itself is pretty limited -- essentially, we can create "doppelganger" blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum.
* MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. This leads to...
* Attacks are possible using only the proof of concept test vectors released by Wang -- the actual attack is not necessary.
* Stripwire emits two binary packages. They both contain an arbitrary payload, but the payload is encrypted with AES. Only one of the packages ("Fire") is decryptable and thus dangerous; the other ("Ice") shields its data behind AES. Both files share the same MD5 hash.
* Digital Signature systems are vulnerable, as they almost always sign a hashed representation of data rather than the data itself.
* This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.
* HMAC's invulnerability has been slightly overstated. It's definitely possible, given the key, to create two datasets with the same HMAC. Attacker possession of the key violates MAC presumptions, so the impact of this is particularly questionable.
* Very interesting possibilities open up once the full attack is made available -- among other things, we can create self-decrypting executables (fire.exe and ice.exe) that exhibit differential behavior based on their internal colliding payloads. They'll still have the same MD5 hash.
* Several doppelgangers may (relatively quickly, as per Joux) be computed within a single multicollision-friendly block. As such, the particular selection of doppelganger sets within a file can itself be made to represent data. It's relatively straightforward to embed a 128 bit signature inside an arbitrary file, in such a way that no matter the value of the signature, a constant MD5 hash is maintained. This is curiously steganographic.
* Many popular P2P networks (and innumerable distributed content databases) use MD5 hashes as both a reliable search handle and a mechanism to ensure file integrity. This makes them blind to any sign
Was there a cash money prize for someone who could produce a pair of strings with the same MD5 sum? I remember reading or possibly hearing that No One had Ever discovered a hash collision with MD5. Is this the first ever hash collision found?
--grendel drago
Laws do not persuade just because they threaten. --Seneca
I found out the darker and moister it was, the more powerful it was. Of course any form of hash was much stronger then regular leaf pot and you could get far more reusable resin when you were done with the pipe.
For the most part, big deal, all of current security procedures will be harmful ie compromised in the future...
Using MD5 with SHA1, or even the older MD2 or MD4 will reduce the probability of creating a compatable binary with the same checksum to virtually zero.
If only one checksum is required then just XOR the resulting checksums from each algorithm.
Slicing and dicing the file into many different overlaping blocks and then computing a md5 sum for each block would be much much stronger.
....
a. MD5 sum entire file
b. MD5 sum each 10K block in the file (overlap blocks by ZZZ bytes)
c. break file into different blocks such as every 40th byte in the file
Can someone explain to me how this is something more complicated than the birthday paradox?
(heh, Wang)
Blaze a trail to the New World
Well, an md5sum has only 16^32 possible values, possibly less (I don't know the algorithm exactly), so it was only a matter of time.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
Whenever you use a hash, you should always record the size along with the file. That way, any "doppleganger blocks" will stick out like a sore thumb.
I've made the switch to SHA1 for all my website work where I'm storing passwords and session IDs and the like. It's just a simple change from 32 to 40 characters, and a search/replace for:
md5_hex to sha1_hex
I was a bit suprised recently when I tried to
sort 15,000 jpg image files to remove duplicates.
Since the names varied, and it is not uncommon
to have many images with the same length, it
seemed like a good idea to use md5 hashes.
So I coded it up in python to do the md5 hash,
and then stuffed the file name into a table
keyed by the md5 hash. Big suprise when multiple
different files landed in the same hash.
Some property of jpgs probably reduces the
randomness of the files and increases the
probability of hash collisions.
It's all about the famous IMPS (Infinite Monkey Protocol Suite, RFC2795), if you give the CPU enough data, and enough time, it will eventually crack what ever encryption you are using. To be safe, deploy a stegography technique to hide your data, such as in the meta field of a picture, or within some MIME data on a harmless looking email, but be clever about it (theres a lot of junk in harmless looking SWAP file).
Why UNIX?
If you have a 4-byte hash value of a 1000-byte character string, you have 4+ billion possible hash values. But the number of possible 1000-byte strings? What's 256^1000? So, on average, there would be 256^4 / 256^1000 source values that would generate each individual hash value. The only way to avoid this is to have a hash value that is as long as the source value, which eliminates much of the benefits of a hash.
I like the idea someone else posted about using multiple algorithms because it would dramatically reduce the number of possible values that would generate the same hash value using 2 different methods.
Look at the Whirlpool hash. Its fast and amenable to hardware implementation.
Not that it comes down to that, of course, because we've got SHA-1 and other longer hashing algorithms to choose from, but in the scenario above I believe there is less risk with MD5.
Testing the equality of hash codes is not a replacement for testing equality. It should be used as an optimization to limit the set of items for which equals need be applied. e.g.
// IMPORTANT!
Thing search(Thing thingToFind) {
thingList = hashSearch(hash(thingToFind));
for each thing in thingList
if (thing == myThing)
return thing;
return null;
}
Which is the same as just one insecure algorithm ;]
You can't handle the truth.
"Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822.3."
I think you mean Yoda.
Besides, Dr. Spock is the baby doctor.
When will it be out?
Any plans for MD64?
n/m
Think of the children! We cannot allow some MD5 to put our children in the harm's way! Initiate operation AntiMD5, deploy the bombers.
You can't handle the truth.
It's also safe to truncate 40 byte SHA-1 down to 32 byte if need be.
However, there is no real need to use weak algorithms (unless you're in the Government and need to use the FIPS standard). Unless you're doing something time-critical, just use SHA-1 and Whirlpool.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
If you have the legal right to do so, please some samples, or at least submit them to the cryptographic community for analysis.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
verify eq cmp or verify eq diff --speed-large-files -q
I personally think any clients that don't support magnet links should really start consider adopting those -- DC++, most Gnutella clients, and Shareaza already do.
A magnet link can look like something like this:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZ
Beware: In C++, your friends can see your privates!
Like, say, Windows?
Yeah, yeah, Obligatory Microsoft Slam. But I don't see any great stampede to isolate and neutralize the threat that this Typhoid Mary of OS's presents to communication stability. Why are we preparing for a threat that barely exists, when we refuse to deal with the present, much greater threat?
I've always called this dirt. Some folks call it padding, but it's essentially the same. However up until now, no one has known how to do it with md5 and come up with the same sum. This is nothing more than a hack to me.
The bottom line is that if you care enough about security this will only be a problem for a day or so when md5 is finally solved completely and the hashes don't take five days to brute force -- they could be solved in a matter of microseconds. As soon as that happens, the next hash system goes up and it would not likely be so easy to break, unless the whole hashing process is somehow cracked wide open (which would be very bad for a number of reasons).
What we'll likely see is that with the greater increases in CPU speeds available to end users, the hash breaking systems can be tested easier until a final reverse algorhythm is available for md5. There already is a site that will break your hash and give you *something* with the same hash, and it takes a couple days.
The dangers of knowledge trigger emotional distress in human beings.
If you can modify the shadow password suite to support SHA-1 and/or Whirlpool, that would be wonderful. The greatest potential use for the attack is the discovery of aliases for passwords and other login tokens, as then you don't care if what you produce is nonsense. It just has to hash to the same key.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
This is slightly modified quote from our beloved Office Space.
I haven't had time to think this out, so I'm throwing it out for you guys:
Is doing a 2nd pass helpful?
In other words, if
ABCfireDEF
and
ABCiceeDEF
hash the same, does
ABCfireDEFABCfireDEF
necessarily, or even frequently, hash to
ABCiceeABCiceeDEF?
Even if it were to provide protection, in practical terms,
1) other hashing althorithms are likely faster than "MD5 times two"
2) there may be some - hopefully a lot fewer - places in a long file where where
ABCfireDEFABCfireDEF
can be replaced with
ABCiceeDEFABCmeltDEF
I'm beginning to like the "pick two algorithms and call me in the morning" approach.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
It's well known that encrypting twice doesn't always improve your security, so it wouldn't be surprising if hashing twice didn't always improve your security either.
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
MD5 every single byte (or better yet every bit) in the file and put the result alongside the original file. Voila! No smartypants would ever attempt to temper with that, bwahahahahah!
really. and it's about time this stupid meme died. it's all too easy to toss it about for dramatic effect when attention whoring.
Dijkstra Considered Dead
The real threat for it is for the p2p network. not because you will download a different version of the file, but the file is broken in various pieces. Imagine somebody released, dunno, some linux distro, over bittorrent. you will download the real file, and the fake one at the same time, leaving the final file useless, and probably ruining the release.
I could see game and movie companies releasing bogus versions of their pirated apps to break the releases. after all, you cant de-compress corrupt zips, rars and etc. the musics would have cracks and noises because of the bogus versions. P2P as it is now would be ruined
Jesus, I hope I'm wrong... I would have to acctualy buy MS crap.
www.adultswim.com has information on the TV show of Sealab 2021 if anyone is interested.
Quite interesting to see someone go around and claim words as their own. In that case, I claim the word "the". Anyone who wishes to use the word "the" can submit enough money to fill a Library of Congress for a lincense.
Ah, you found me!
They only compare on P2P file names. In fact, they apparently only compare very loosely on that.
Time to move on. And, while we're at it, let's skip SHA1 etc. as well and go directly to SHA512. That should last for a while.
quidquid latine dictum sit altum videtur.
He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).
Suppose you're storing passwords as encrypted hashes, so that intercepting the hashes doesn't tell you what the password is. But if you can generate a password to match that MD5...
You know that GPG keys are identified and signed by their MD5 hashes? Suppose that I can generate a GPG key that would be identified as yours, and distributed it.
Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.
There's a coin-flipping protocol that goes as follows. Suppose that Alice and Bob want to flip a coin (over the Internet), but they don't trust each other.
Now, suppose that Alice generated multiple files in step 1. When Bob makes his guess, she tries to pick a file that will make her win. If she generated only two files, completely randomly, this would let Alice win 75% of the time.
These are just the first ideas I thought of. If I were looking for other problems, I'd think about undeniable signatures, keysigning (which as GPG and X.509 SSL are heavily based on) and other specialized signature systems. In particular, I expect that the first type of crack could cause issues with SSH keys, both user keys (used for authentication) and host keys (to prevent man-in-the-middle attacks).
Digital signatures are used for much more than just testing for file tampering.
That was hilarious
Don't you hate meta-sigs?
The first case is that if you have any given file, you can take a subblock of that file and swap it with one of a number of other subblocks and keep the same checksum. Actually, that's not news, what's news is that you can COMPUTE what those sub blocks are.
The second case is that if you have two files that share the same checksum, you can add the same arbitrary code to both files and they will still share (a new) checksum.
The exploit in the first case is that if you can pick a subblock and then compute new "doppelganger" blocks that you can replace the original subblock with without changing the file's checksum, you can (if not now, someone will probably figure out how to do it in the future) also pick a sub-block and compute new sub blocks that result in a given checksum Y. So I take my original file, and replace some subblock with a subblock of my choosing. This results in a new checksum. Then all I have to do is pick a different subblock and compute a replacement block for that that restores the checksum back to the original value.
Basically, make one change to install my malicious code, then use my ability to compute subblock changes that result in particular checksums to swap a different subblock to restore the original checksum value.
The exploit for the second case is I write a block of malicious code, and then create another file with harmless code that has the same checksum. I then add a real program to the harmless code and distribute it. Some time later, I add the same program to the malicious code (resulting in the same checksum as harmless code + program) and I start distributing the file with the malicious code instead of the file with harmless code.
paintball
Eggs, .NET, Apple, whatever.. it's basically ripe to become another in-joke here at Slashdot. In Korea, only old people are considered harmful someday.
Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with.
Well duh, that's what separates GOOD protective schemes from BAD ones, and what makes GOOD ones so valuable - they're the ones you can't circumvent.
Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files.
Which two files? If you're comparing the file you downloaded from an untrusted source to the file you downloaded from the trusted source, why did you bother downloading the file from the untrusted source in the first place?
paintball
This tricky DOS virus that appeared in the early 90', appended certain number of 0's to the executable, so it would have exactly the same CRC.
:)
Expect a couple (+ ???) of years of cryptanalysis on MD5, and you'll just get that.
Still, what _I_ want to see is an ANTI-MD5 algorithm. Given an MD5 Hash, obtain a certain string that will generate it. Now _THAT_ would be something neat
"..to be considered harmful someday"? I mean, gimme a break, dude, if you think its harmful just write "MD5 is harmful", but "to be considered...someday"?
YA/.IJ - Yet Another Slashdot In Joke
The point is not that there is collision. The point is that given a particular file, you can compute other files that collide with that file in an amount of time many orders of magnitude less than it would take if you randomly tried files until you hit one that happened to collide.
paintball
Be honest. How many of you have checked the MD5 sums on a file with a TRUSTED source, as opposed to from the same source you got the file? How many of you do this regularly?
THAT is the biggest problem with MD5 for most users.
These posts express my own personal views, not those of my employer
Where time isn't critical (eg: creating and validating checksums for files), I'd say use both. The overhead isn't great, and you'd get much more security.
Where time is critical AND you don't have to be concerned with computers not under your control, use Whirlpool. Rijndael is fast, SHA-1 is slow. Whirlpool also offers a longer hash string than SHA-1.
In any other situation, use SHA-1. Whirlpool might turn out to be the greatest algorithm out there, but that doesn't help if you're trying to talk to a remote computer that doesn't support it.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
okay, so if MD5 has issues that can now be explioted, or at least tampered with, is it time for a new checksum? being clueless in this respect can we call it MD6?
why do you waste your time writing this stuff? graduate from high school and come back later.
But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash.
Duh!
An analogy: the results of 4x8 and 16x2 are both 32. Now take the results of both and multiply by 3. Great Scott! We get 96 for both!
Even SHA is constructed this way. Since they're byte-wise and sequential with the input, you'll always get this problem. The only way to eliminate this behavior is to random-seek the file or input (which means you need to know up front how many bytes are going in -- not always possible), but if you did it this way, the hash would always come out differently, thus making the hash useless. If you wanted to keep the seek strategy the same each time to make it useful, then the "additive" problem would still occur -- instead of being at the end of the file, it would just be woven into the seek pattern.
There's nothing new here.
It has been said time and time again by the paranoid, and we all knew that this was possible. MD5 is supposed to be used to verify that the thing you have transfered isnt corrupted. THAT IS ALL :) none of this "omg lets use it for passwords because its Really Secure", because it isnt :P
MD5 isn't the right way to detect tampering.
Asymetric key encryption is. You have a trusted authority issue a key, and sign it. You can verify the key file hasn't been tampered with, and they can revoke the certificate. It's harder to fake, even if they can spoof a MD5, because you can connect out to the certificate authority and ask if the certificate is valid, and they can revoke them if there's problem.
If you're downloading a package, presumably you're wanting to know that it's the right package, and that it's from whomever it says its from. An MD5 hash does not get you that capability, asymetric encryption does.
Only someone who knows your distribution's private key can produce your distribution's packages. The public key is verified by a neutral 3rd party that can revoke accidents or hackers once they're discovered.
So you know both that the distribution wasn't tampered with (because it decrypts) and that the distribution is from who they say it is from (because it decrypts), and you have a neutral 3rd party vouching for the facts who has a vested interest in retaining business by not letting people fake other people's signatures.
If your code is acting bloated, and is running rather slow, it's likely and predicted that some loops you will unroll.
I think everone is missing the analysis done by the author of this attack (including the author).
The point is that the malicious code is in both versions of the downloaded package. The swappable payload just serves as an activation key. The actual mal-code can be present in both packages 'hidden inside an AES encrpted data section.'
Yes MD5's have collisions, so do any current or future hashes or checksums. The point is it's still very very unlikely that anyone can insert arbitrary and functional code into *someone else's* binary package and get the same checksum.
It is still safe to use MD5 for passwords (remember to salt though), it's still fine for checksumming files.
The issue is whether a signed/hashed piece of code has a hidden bad payload or not. A hash will not protect you from mal-ware if the creator of the package did this intentionally. The trigger could simply be a date check, or your ip address or some other external data. The only effect of making two versions of the same package is that you can deploy the package to everyone without detection after the first package passed review. But like I said, you can pass a tainted package past review if the payload isn't triggered. Finally, the security of a checksum assumes that the package is somehow reviewed by a credible source, and that they catch all possible obfuscated attacks.
From what I read, creating a hash-equivalent file for one hashing mechanism requires different and incompatible changes to those required for making a hash-equivalent file for a second hashing mechanism.
Would it therefore be safe to calculate two hashes (MD5 and SHA-1) for the file, distribute the file and the two (or three, or four) separate hashes, and use the incompatible nature of the hashes as an extra "guarantee" of data integrity?
But for someone putting money on it (me) it's safest to take into account the possibility that the OP would try to come up with a collision using a birthday attack.
But doesn't Warner own IP in the birthday attack? Snopes seems to think so.
that I know for sure that the MD11(formerly DC-10) is harmful today. It has been for over 30 years.
...somebody might press the RED BUTTON!
Wake me up when paranoia levels are back to normal
I'll take Secure Hasing Trivia for $500, Alex.
Q: What is d41d8cd98f00b204e9800998ecf8427e ?
Here is another paper that examines the implications of the Wang paper as well as some analysis of the collision.
Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".
That's a silly conclusion to reach.
It's like saying that we should give up on cryptography because some has found a way to crack the Caesar cipher.
Just because ONE IMPLEMENTATION of a technology is flawed does not mean the entire technology is flawed.
Hashing is a VERY useful process and only ONE particular implementation is being challenged here. The fundamental concept is still sound and extremely useful. Abandoning the entire idea would just be foolish.
Life is too short to proofread.
That might have been true before 2000, but the effort to calculate both algorithm in parallel ( not SMP, just passing the input bytes to each function ) is negligible with modern CPUs, both PC and embedded. The limiting bottlenecks are the network, hard drive and memory, not the CPU power required
gokeln wrote: "You do not get exponentially improved security"
When anyone uses the phrase "exponentially improved security", it triggers a link to Avoiding bogus encryption products: Snake Oil FAQ.
I was going to use that example in my post, too. Ceasar cypher keys being a group and all :)
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
I just couldn't read your sig with out telling you how wrong you must be. I REFUSE to accept that our president can preform math of any sort with any great skill.
end-of-line
What do you say to the man that has nothing? Cast it away!!
How might OSPF be affected by this? I use md5 authentication in exchanging routing updates in my networks...
Say someone created two files with some simple but "useful" app installed. One version was fine. The other version had some sort of time release backdoor installed. 1 year later after the hype really picks up. *BOOM*
Example scenario: some OS has a list of MD5 hashes of all of it's core files (drivers etc). At boot, it checks to see if the files are untampered, and if they are warns the user or halts the boot.
Now imagine you could maliciously inject bad code (worm/trojan/virus/spyware) into this system, and you had a tool that knew exactly what extra random data to add to make the file have the same MD5 hash after your addition.
If that becomes possible, then Hell will freeze over. Unless, of course, people don't use MD5 and switch to SHA1.
The potential solution space that exists to create a binary that generates a differing binary that generates the same checksum for each algorithm is going to be a lot smaller than switching to less unproven single algorithm such as SHA-256.
That's OK..just invent MD6!
Presumably Verisign would check the request, and see it is of the form that can be spoofed. So then when they pad it, they would pad it with info that makes it not spoofable. Or else they'd refuse to sign it and make you use a slightly different form that isn't spoofable either.
You're making up problems that don't exist.
I don't see the big deal. What happened to CRC?
All that means is that we will move to another digest algorithm when this one will become too easy to tamper with. A lot of people already provide sha-1 or a signiture of the files they provide. And that will be our solution until it fails. Just like cryptography.
Strong -> stong -> weaker -> weak -> time to move on
So instead of just the MD5 value you would release a binary with an md5 and the md5 of the same file with an arbitrary string added to the end.
Then you run the md5 tool with the added string as a flag and it gives you 2 MD5 results.
Hell, make the md5 generate a little xml file with the original md5, an arbitrary random text string and the md5 with the random string added.
Then mod MD5 to look for filname.md5 next to the filename and say weather the file passes the test with the command 'md5 filename'.
Good times. Gives MD5 a few years, and perhaps we need to do the same for all current and future hash algoriths now as well, always use this vector approach to make manipulation of the values cost prohibitive.
So people are off trying to crack the RSA key for signing X-box software by Microsoft. That doesn't matter anymore, all they need to do is find an MD5 signature which is identical to one of the many pre-existing X-box games MD5 signatures, viola, you have a working Linux implementation without any mod chips.
I think this company allready try to profit from it(finnish text)?
Does this have implications for SHA-1, or is it still secure?
I watch Brit Hume on Fox News
I'm guessing using two different hashing methods and putting both checksums up would make it harder to use exploits like this.
Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.
What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudo random number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, but had to learn the resulting number sequence was far from more (pseudo-)random, in fact much worse than the the standard 1-line modulo function.
Another case of (false sense of) security through obscurity?
--
Try Nuggets , our SMS search engine which uses question answering technology; now available across the UK.
No. Hashes are typically much shorter than files (or even bits of files) and using hashes (or hash trees) is still much more efficient than sending all the data twice.
Also, it also allows network error detection which is essentially free and more robust than just sending data twice and comparing -- if the network for has a nasty habit of throwing bits away during long sequences of 0s, your method might not detect it, but a good hash would (with overwhelming likelihood).
HAND.
Correction: Gnutella is using SHA1, not MD5.
Moreover, Gnutella is using an alternate hash (Tiger) to construct a hash tree, and the root of that hash tree is appended to the SHA1 of the file to make a unique "bitprint", as understood by Bitzi (http://www.bitzi.com/).
Impressive! Thank you.
--grendel drago
Laws do not persuade just because they threaten. --Seneca
MPAA-hired thugs spamming the P2P networks with utter crap have been doing stuff like this for a long time
28bc8c78881b2f89bbeab4f9bb8fbeda is the md5 of "clue", I can bet my hash farm and a hash laced brownie that you'll never find anything other than "clue" that hashes to the same md5, at least not "someday" in your life time.
Oh, I thought he said MC5...
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Okay, is anyone truly surprised at this?
:P
MD5's are known to be collidable for appropriately complex constructs. It only goes to reason that the same constructs can have the same MD5.
I've seen programs that sort through these for the last few years, although - obviously, they are processor dependant in the extreme.
What is truly interesting imho, is the programs that have been created here and there to *spoof* MD5's by rearranging content to hit collisions so that you can play with the content...
bastard!
The Bitprint used by Bitzi.com and other places contains both SHA-1 and Tiger Tree, separated by a period. Unfortunately, they're being stupid and only publishing the SHA-1 in their magnet links, so those of us who use Tiger Tree are out of luck.
My life's goal is to get a score of +3!