Slashdot Mirror


MD5 To Be Considered Harmful Someday

Effugas writes "I've completed an applied security analysis (pdf) of MD5 given Xiaoyun Wang et al's collision attack (covered here and here). From an applied perspective, the attack itself is pretty limited -- essentially, we can create 'doppelganger' blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum. But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. Wang released the two files needed (but not the collision finder itself). A tool, Stripwire, demonstrates the use of colliding datasets to create two executable packages with wildly different behavior but the same MD5 hash. The faults discovered are problematic but not yet fatal; developers (particularly of P2P software) who claim they'd like advance notice that their systems will fail should take note."

401 comments

  1. damn by YetAnotherDave · · Score: 1, Interesting

    I guess I know what I'll be coding now - SHA-1 just got a lot more important in my priority list... :(

    1. Re:damn by networkBoy · · Score: 4, Insightful

      Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially higher.
      -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    2. Re:damn by LiquidCoooled · · Score: 2, Informative

      There will ALWAYS be collisions with any kind of hashing algorythm.

      Thats the nature of the game we play.

      The only hashing system without collisions is sending the original file itself.

      --
      liqbase :: faster than paper
    3. Re:damn by user9918277462 · · Score: 1

      Gentoo has already considered this problem as it affects the Portage system (two months ago, no less). Portage distfiles are only validated by md5 sums (no gpg signing yet, though it's being worked on) so this could be a serious problem at some point.

    4. Re:damn by WolfWithoutAClause · · Score: 5, Interesting
      There will ALWAYS be collisions with any kind of hashing algorythm.

      Yes, but a good hash makes it *extremely* difficult to find them. MD5 is looking pretty mediocre right now.

      --

      -WolfWithoutAClause

      "Gravity is only a theory, not a fact!"
    5. Re:damn by frission · · Score: 1

      this will make file checking just take that much longer. have you done md5 sums on iso files...for say, a linux distro? it takes awhile, even on a p4 3ghz with 1 gb of ram. but now you want to do it twice (2 diff algos)?

    6. Re:damn by bigbadbob0 · · Score: 1

      >>Another option is to hash against two very different algorithms, that even if both are partially insecure, the chances of being able to trick both are exponentially hig Is it? Can you prove it? You shouldn't be able to because it simply isn't true. Collisions will exist.

    7. Re:damn by networkBoy · · Score: 1, Informative

      "The only hashing system without collisions is sending the original file itself."

      That's not hashing:
      Producing hash values for accessing data or for security. A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    8. Re:damn by The+Milkman · · Score: 1, Insightful

      Which is the same as just one secure algorithm.

    9. Re:damn by MatanZ · · Score: 1

      The time is usually wasted in reading the file from disk, so calculating two hashes concurrently will take about the same time.

    10. Re:damn by networkBoy · · Score: 2, Informative

      "You shouldn't be able to because it simply isn't true."

      I have to disagree with you here.
      If I have algo A and algo B:
      I hash with algo A and get a value which I store.
      I hash with algo B and get a value which I store.
      While the security does not add up to A^B it does ammount to > A+B, which is still better than A or B only. (I really wish I had my Crypto reference books handy)

      Other posters mentioned it was more work and equiv to one secure algo, both those statements are true; as I pointed out this was an alternative to writing a new SHA-1 algo.
      -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    11. Re:damn by TCM · · Score: 2, Insightful

      A P4 at 2.6GHz does >300MB/s md5 according to openssl speed md5. As noted, you probably wait for the data to be fetched from disk rather than the checksum to be computed.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    12. Re:damn by MindStalker · · Score: 1

      Problem is there is no secure hash, its just not possible.

    13. Re:damn by Meostro · · Score: 1

      It depends on the two hash algos and how much/little they inter-relate.

      For algorithms that are provably unrelated, it should approach A*B.

      However, if they're closely tied to each other (not sure, but probably MD5 and MD4?), it could be as little as A+B.

    14. Re:damn by Hanji · · Score: 2, Insightful

      That was his point.
      Any hash that maps a large (infinitely large, in most cases) space onto a finite ones will always have collisions, it's just a question of how easy they are to find. If you don't want to have have collisions, you have to send the whole file.

      --
      A Minesweeper clone that doesn't suck
    15. Re:damn by chialea · · Score: 1

      I'm having trouble tracking down the paper right now, but this is not necessarily true for even unrelated hash functions. In fact, you can even treat one of them like a random oracle (which is perfect), and fail.

      Lea

    16. Re:damn by SpaceLifeForm · · Score: 1

      The combo of MD5 and GPG signature would be a good example of dissimilar algorithms.

      --
      You are being MICROattacked, from various angles, in a SOFT manner.
    17. Re:damn by MighMoS · · Score: 1

      Personally, I think that saying "this could be a problem at some point" for Gentoo would be a little overboard. Small scripts (like lila's overlay) already use gpg signing. Gentoo devels tend to stay on top of things and develop new features increadably rapidly. If another year goes by and Gentoo is still MD5 hashing though, then I'd say we have a problem.

    18. Re:damn by networkBoy · · Score: 1

      If you find that paper I'd love to read it :-) networkboy at sign gmail dot com -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    19. Re:damn by Anonymous Coward · · Score: 0

      Of course no hash is perfect, but that doesn't mean you can't set some reasonable goals. Making it unlikely to find two strings that have the same length and hash but only differ by a small substring (small being relative to the length of the whole string) is a reasonable goal if you ask me.

    20. Re:damn by chialea · · Score: 1

      Take a look at paper published at Crypto 2004. My glasses are broken, so I can't.

      Lea

    21. Re:damn by goatpunch · · Score: 1
      There will ALWAYS be collisions with any kind of hashing algorythm.
      How about using two different hash algorythms on the same data?

      This article says that the following can be true:

      (MD5(x) == MD5(y)) && (x != y)
      Intuitively it seems to me that it'd be difficult to find y for a given x where:
      (MD5(x) == MD5(y)) && (SHA1(x) == SHA1(y)) && (x != y)
      Not putting all the eggs in one basket, so to speak.
    22. Re:damn by canavan · · Score: 4, Insightful

      You're using a different definition of a secure hash than everybody else. It's rather obvious that for files larger than the length of the hash (128 bit for md5), there must be quite a lot sharing the same hash, for a given file length about 2^(filelength in bits - hashlength in bits). However for a hash to be considered secure, it's only required that finding two files with the same hash must be as hard as trying (in md5's case 2^127 different files), but in md5's case you can compute those collisions much cheaper under certain circumstances.

      Another condition is obviously that the message should not be reconstructable from the hash.

    23. Re:damn by Anonymous Coward · · Score: 0

      I can't remember where, or why, but that Bruce Scheiner guy said using multiple hashes doesn't lower the odds much. Sorry for the crappy post, it's just fud, I know, but if anyone does want to hunt it down I'd appreciate it.

    24. Re:damn by Meostro · · Score: 1

      Or MD5 and SHA-1, as pointed out in the paper. The two files generate the same MD5 hash, but disparate SHA-1 hashes.

    25. Re:damn by jyoull · · Score: 2, Insightful

      Intuitutively, yes, and i'm just pulling this out of the air here cuz i'm supposed to be working on something else, but solve a slightly different problem.

      If you are concerned that the result-space of a hash algorithm is going to lead to collisions (this is not an ordinary concern because the algorithms claim to have dealt with this for us already) then using two very different hash functions in concert will definitely expand the range of possible results and reduce the probability of collisions by Asize * Bsize.

      If however you are concerned about bad guys faking things up, then there is a slightly different problem...

      A == MD5
      B == SHA

      Hashing to both A and B yields a huge range of results.
      However, if A known or suspected to be broken, then you're down to the security provided by B. A is out of the picture.

      If A can be tossed then, then you're totally reliant on B for a safe hash. If that's true, then you didn't need A at all and you'd better be confident that B is gonna do what you need it to do, cuz A don't dance.

      Note to experts: Please do not grade harshly.

    26. Re:damn by somethinghollow · · Score: 0, Redundant

      I am / was redesigning my website, with a database overhaul. A friend of mine convinced me to store passwords in the database w/ some sort of encryption. I'd had experience with the MD5 hashes somehow matching, so I looked for an alternative method. SHA-1 was it.

      PHP has the built-in function but my server doesn't support it. Instead I used another implementation: http://www.tecknik.net/sha-1/

      I thought there was even a JavaScript implementation somewhere, though I can't seem to find it.

      SHA-1, if nothing else, is cooler because it's more fun to say. "Em dee five" versus "Sha, One" which is alot like "Sha, right!" from Wayne's World.

    27. Re:damn by Breakfast+Pants · · Score: 1

      You are basically doing a hash of twice the length you were previously doing. Since you know an entire half of your hashing system is insecure it would be better to find a more secure one and just double the bit length.

      --

      --

      WHO ATE MY BREAKFAST PANTS?
    28. Re:damn by Discoflamingo13 · · Score: 1

      Of course the collisions exist - the point is that the person doing it sleeps easier at night. Security is not an algorithm or a product, it's a way of life. Until somebody comes up with a more collision-proof algorithm, I think the solution is a pretty good stop-gap.

    29. Re:damn by TheLibero · · Score: 1

      I totally agree. That's why the ultimate secure way of doing encryption (which nobody does, because it has "0" usability) is to have your text, and get a key that will be used only for once which has the same length as your text. XORing both would do as long as your key won't be used anywhere else!

      --
      "Evil thrives when good men do nothing"
    30. Re:damn by uncqual · · Score: 2, Interesting

      Agreed that folding A with B after A has been compromised is a nonstarter.

      However, isn't there merit to the general idea of combining two 'dissimilar' (i.e., the theoretical basis for the 'security' of each share as few attributes as feasible) hash functions X and Y where both X and Y are currently though to be 'secure'?

      In those very common cases where the hash is protecting something with decreasing time value (such as software binaries which become obsolete in a few years) or where the original source is secure and it is only alterations of copies that are a concern (perhaps historical legal documents), this buys time if X or Y (but not both) is compromised (or seem more likely to be compromised - as MD5 is now)?

      For example, if X=MD5, the concerns about MD5 would result in an effort to discontinue its use and replace it with a new hash function Z and start using Y+Z. While the community switches to a new combination, everything is still pretty safe (as safe as Y in this case) during this transition period.

      It seems that in the private sector (spooks aside), it's quite unlikely that X and Y would both be effectively compromised simultaneously and with little warning unless the notion of dissimilarity is also compromised. Fortunately, the skills needed to compromise such functions seem to be concentrated in the guys with the white hats so they tend to publish their preliminary results.

      Disclaimer: I'm not a cryptomaniac so my observations are worth substantially less than you paid for them.

      --
      Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
    31. Re:damn by gokeln · · Score: 1

      This is actually untrue. Though it seems intuitive that the chances are exponentially higher, it turns out they are actually only linearly higher. Thus, the method you recommend is not advisable. Instead, choose a better hash algorithm, such as SHA-256.

      --

      There's no time to stop for gas, we're already late.
    32. Re:damn by onemorechip · · Score: 1

      You might not even need to use different algorithms. What if you simply permute the data you are checking?

      Call the original data D, and call the aliased data (that has the same md5sum) E.

      md5sum(D) = md5sum(E)

      Now permute D and E by some measure, such as reversing the bits in each byte, to get D' and
      E'. What are the chances that

      md5sum(D') = md5sum(E')?

      The goal is to make it harder to construct an alias data set that satisfies both conditions.

      Of course the technique is extensible by creating more permutations.

      Another possibility is to append a 32-bit CRC to the data before running md5. I suspect

      md5sum({D,CRC(D)}) = md5sum({E,CRC(E)})

      would be very hard to solve.

      --
      But, I wanted socialized health insurance!
    33. Re:damn by rpresser · · Score: 1

      If it's a hash, the message CANNOT be reconstructable from it. After all, as you just stated, there are collisions, even if they are hard to find. So any process to reconstruct a message from a hash would have to be able to reconstruct ALL the colliders.

      Is my thinking way off somehow?

    34. Re:damn by jyoull · · Score: 1

      The idea of "buying time" is kind of interesting. I guess it would only apply if you really think that two algorithms are both vulnerable, and you don't know which is weaker.

      I think there is also an underlying premise that the crypto algorithms in common use today aren't going to be "lost" suddenly due to some unexpected development.

    35. Re:damn by grumbel · · Score: 2, Interesting

      ### If it's a hash, the message CANNOT be reconstructable from it

      Depends on the hash function, if the input length is equal or smaller then output length reconstructing the message might be possible, ie:

      $ echo "hello" | md5sum
      b1946ac92492d2347c6235b4d2611184 -

      Should be easily reversible with a dictonary attack. However in the more common case a hash maps a large input domain, to a much smaller output domain, so yep, hashes are not reversible unless input is somehow smaller then the hash itself.

    36. Re:damn by Zeinfeld · · Score: 1
      The problem with the SSL approach of use SHA+MD5 is that they are both essentially the same algorithm with only minor differences. Both were extensions of MD4 with an extra round added and both share the same constant tables. SHA-1 has an extra feature added that makes it a lot harder to use the current round of cracking techniques. So I doubt that there is likely to be a case where SHA1 is broken and MD5 is not.

      A much better approach would be to use SHA-2, aka SHA-256, & SHA-512 which have much better all round properties.

      This is probably what the industry will end up adopting in time but at this point everyone has pretty much stopped using MD5 except in carefully controlled and understood applications. One of the issues that folk often miss is that one of the reasons CAs invest money in high physical security is so that the strength of their crypto is NOT their last line of defense. There are still a lot of applications that simply cannot use anything other than MD5, it would be good if folk upgraded but simply turning off the crypto will not have the desired effect, they will simply run with no security.

      At this point the compromises are a long long way from enabling a criminal attack. This is where there is a tension between complacency and common sense. Nobody need lay awake at night worrying about the strength of MD5, even MD4 which has been 'broken' for a decade has still not been broken to the point where someone can take a signed document and work out how to alter it. A forger CAN produce two documents with the same hash but that is not quite the same thing.

      On the other hand people should not think that they can use Md5 forever. Now is a good time to start the transition. Its a bit like Y2K, there is no necessity for a catastrophe if there is planning.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    37. Re:damn by goatpunch · · Score: 2, Insightful
      However, if A known or suspected to be broken, then you're down to the security provided by B. A is out of the picture.

      If A can be tossed then, then you're totally reliant on B for a safe hash. If that's true, then you didn't need A at all and you'd better be confident that B is gonna do what you need it to do, cuz A don't dance.

      I'm not so sure, the description of the MD5 attack didn't say it was completely compromised, only that it was possible to find padding bits that could allow malicious content to be inserted without changing the checksum of the whole.

      If the orginal data is: "abcdefg", I can insert some malicious bits 'M', along with some padding bits 'X', and ensure that the result of A would be the same: A("abcdefg") == A("abcdMXfg").

      Assuming that I found a similar attack on B, I could find padding bits 'Y' such that B("abcdefg") == B("abcdMYfg").

      Of course to fool both checksums, X must equal Y. Finding a value that keeps both checksums the same might turn out to be a non-trivial problem.

      PGP is easily broken by a brute force attack given enough time, it's just that the amount of time isn't feasible at the moment.

    38. Re:damn by apdt · · Score: 1

      How do you think a gpg signature works?

      It simply takes a hash of the file, and encrypts that with your private key, so that anyone with your public key can decrypt it and verify that the file is what was signed.

      I can't remember which hash algorithm gpg uses by default, but I believe earlier versions of pgp used MD5.

      --
      I lay awake last night wondering where the sun had gone, then it dawned on me.
    39. Re:damn by apdt · · Score: 1

      You're implying that it is possible to infer something about the size of the source from the hash itself, but if you were given just the hash in your example, you'd have no idea whether that was created from a short string, or a 2G file.

      Unless you know the source is small by some other means, the fact that it is small makes no difference to how easy it is to reverse the hash.

      --
      I lay awake last night wondering where the sun had gone, then it dawned on me.
    40. Re:damn by Anonymous Coward · · Score: 0

      So any process to reconstruct a message from a hash would have to be able to reconstruct ALL the colliders.

      The point is that it should be difficult for you to find ANY of those colliders. If you can find one easily, the hash is not secure.

    41. Re:damn by Chrispy1000000+the+2 · · Score: 1

      Not necessarily. Even small hashes can be made unpredictable.

      What if one got the hex value for each character in decimal, and when computing the hex, used the h value twice, the l value 4 and 2 times, the o value once and half of the e value, (basically multiplied by a function depending on where the initial value was located) and then, when computing the final hex, used these numbers more than once in the calculation.

      Doing this, and getting the results from modular division, the number in the hundredths placeholder of the original value over the total value...

      There are lots of methods to create a seemingly random but predictable string from an input, and even with low-level hashes, its completely unpredictable. Even if one tried to use a small enough input value for the computation, a good hash sum generator would have compensations for this, ie, the total string value is changed to 10^(e+1) - x.yz...*10^e then the result is multiplied by the an original hash value.

      If you don't think I am making any sense, look at this example. I am making a hash value of two different inputs, one is '100' (a) and the other is '101' (b)
      Method: [10^(e+1) - x.yz...*10^e]*[(ones placeholder +3)-(hundreds placeholder*45) modular divided by 34)] Then the result is modular divided by 104.5. If the result is negative, it is converted to a positive value. Then, the value is doubled string-wise, ie xy --> xyxy and is put to the power of (x+y+sum of original char's*.1). Then it is rounded to the nearest one.

      a) [1000-100] * (3-135)/34]/104.5
      hash = (900 * -30)/104.5
      hash = 39(39)^(12.1)
      hash = 3.1927900973916559543780941642248*10^43 (calc won't go any further)

      b) [1000-101] * (4-134)/34]/104.5
      hash = (899 * -28)/104.5
      hash = 92(92)^(11.2)
      hash = 2.7721903833553211943079134140892*10^44 (ditto)

      And that was just a simple hash, that I created off the top of my head. Defiantly longer than the input value, and one could probably just modular divide to 32 decimals for convenience. Now imagine what a person who actually makes hash algorithms could do. (This is only my second ever pseudo-hash.)

      --
      Sig
    42. Re:damn by whereiswaldo · · Score: 1

      Plus you can mix algorithms with differing computational demands. ie. adler-32 and MD4, like rsync.

      http://rsync.samba.org/tech_report/

    43. Re:damn by Anonymous Coward · · Score: 1, Interesting

      $ dd if=/dev/urandom bs=1M count=300 > yoink
      300+0 records in
      300+0 records out
      314572800 bytes transferred in 63.270583 seconds (4971865 bytes/sec)
      $ time md5sum yoink
      a4a846995ccf65be3bfcd9e69325e69a yoink

      real 0m2.482s
      user 0m2.342s
      sys 0m0.131s

      This is about 121MB/s.
      Northwood P4 at 2.8 800FSB, DDR400
      linux 2.6.8

      Same test, SHA1: 77.3MB/s
      cksum: 246MB/s

    44. Re:damn by wirelessbuzzers · · Score: 1

      However for a hash to be considered secure, it's only required that finding two files with the same hash must be as hard as trying (in md5's case 2^127 different files), but in md5's case you can compute those collisions much cheaper under certain circumstances

      Actually, you can do it in 2^64-ish time with a "birthday attack": you hash about 2^64 different files, and compare the hashes; you should get about one pair that turns out the same. This is like the phenomenon that once there are at least 23 people in a room, odds are >1/2 that two of them have the same birthday.

      It should still take about 2^127 files to find something which hashes to a particular value, a much more difficult problem.

      --
      I hereby place the above post in the public domain.
    45. Re:damn by grumbel · · Score: 1

      ### Unless you know the source is small by some other means, the fact that it is small makes no difference to how easy it is to reverse the hash.

      If I run a dictonary attack on that 'hello'-hash I will quite quickly get the result that 'hello' is a input that matches to the hash. You are right that there are unlimited other values that match to the same hash, so I don't have actually reversed it, but only found a value that matches to the same one. However since its almost impossible to find a hash-collision in md5 by pure luck, I would say that the chance that I found the original input to the hash function is rather high. And for many common cases having a input that matches to the same hash is as good as having the original input, ie. for passwords it doesn't matter if I know the real password, its already enough when I have a password that matches to the hash.

    46. Re:damn by Loonacy · · Score: 1

      If someone is distributing a file and they provide hashes so you can check the validity of mirrors, if you proved 2 different hashes that's MUCH MUCH better than only producing 1.
      Suppose for the sake of argument that someone breaks SHA-1 as much as MD5. If I have an MD5 has and a SHA hash, and you try to install a trojan and modify the original to have a bunch of junk to keep the same MD5, you're likely not going to have the same SHA. But if you change it so it matches the same SHA hash, then that will change the MD5 hash. It's HIGHLY unlikely that you'd find a way to fool both of them at the same time.

    47. Re:damn by merlin_jim · · Score: 1

      As long as we're enumerating properties of hash algorithms:

      not only should the message not be constructable from the hash, but the hash should give no information about the distribution of semantic units (i.e. characters) in the original bitstream.

      The corollary of which is this: changing any random bit should result in the hash changing in completely random and unpredictable ways. How do you define unpredictable? The quickest way to predict the results of changing that bit is to actually recompute the hash.

      I believe Wang's collision finder is based on an algorithm that found a way to predict changes to the hash without having to actually rehash the whole message.

      --
      I am disrespectful to dirt! Can you see that I am serious?!
    48. Re:damn by hyc · · Score: 1

      exponentially lower, you mean.

      By default Tripwire comes with a whole slew of hash algorithms, and back in the day when I was deploying it on our packaged servers, we alway enabled all of them. The cost in computation time is really trivial.

      Of course, the other obvious conclusion from this is that we might as well design one new hash algorithm that generates 512, 1024, or 2048 bit hashes, rather than running a handful of separate hash algorithms that each produce 128 bit hashes.

      --
      -- *My* journal is more interesting than *yours*...
    49. Re:damn by stoborrobots · · Score: 1

      I thought there was even a JavaScript implementation somewhere, though I can't seem to find it.

      FWIW...

      http://www.alltheweb.com/search?q=javascript+SHA-1
      http://www.pajhome.org.uk/crypt/md5/sha1src.html

    50. Re:damn by Vo0k · · Score: 1

      The original, compressed?

      --
      Anagram("United States of America") == "Dine out, taste a Mac, fries"
  2. Two files with the same md5 hash? by Anonymous Coward · · Score: 5, Funny

    I can only hope I live that long.

    1. Re:Two files with the same md5 hash? by Phleg · · Score: 4, Informative
      --
      No comment.
    2. Re:Two files with the same md5 hash? by UTPinky · · Score: 2, Funny

      drice@pinky:/tmp$ echo "Hello World" > file1
      drice@pinky:/tmp$ echo "Hello World" > file2
      drice@pinky:/tmp$ md5sum file1
      e59ff97941044f85df5297e1c302d260 file1
      drice@pinky:/tmp$ md5sum file2
      e59ff97941044f85df5297e1c302d260 file2

      Cheap, I know...

      --
      I'm only paranoid because everyone is against me...
  3. MP5 harmful? No way! by October_30th · · Score: 5, Funny

    Aha! So it was MD5 and not MP5...

    --
    The owls are not what they seem
    1. Re:MP5 harmful? No way! by jusdisgi · · Score: 1

      Yeah, the type of collisions caused by an MP5 have been known to be harmful for some time...

      --
      Given a choice between free speech and free beer, most people will take the beer.
  4. Exploit? by Limburgher · · Score: 4, Interesting

    So does this mean that it's possible to find a useful MD5-equivalent file for any file? Just because someone alters a file does not mean they have done anything destructive. Would one be able to take a binary, make a change of some sort, and then run a tool to determine the block of data to add to the binary to both allow the change to take effect and cancel out the MD5 change? How complex would it be to construct this tool?

    --

    You are not the customer.

    1. Re:Exploit? by Anonymous Coward · · Score: 0

      Read the fscking blurb, it's not even possible to find a non-useful MD5-equivalent file for any file. You have to have control over both files for a collision to happen. ...for the moment.

    2. Re:Exploit? by Ayaress · · Score: 3, Insightful

      It doesn't have to be harmful to break a ptp system. There's a pretty common exploit on Kazaa where people have a file just containing random junk that registers as a match to a popular file. If you download taht file, and get any portion of it from the fake, the file is corrupted and useless. Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.

    3. Re:Exploit? by Pxtl · · Score: 1

      The blurb is unclear, but it sounds like you have to have created the original file with the original intent that a portion would be "swapped". So, a content creator could create a legitimate piece of content and get that reviewed, and then create a malicious piece of content with the same md5. Not quite as nasty as hacking an existing md5'd file, but still nasty.

      Of course, I could be completely misunderstanding the information at hand.

    4. Re:Exploit? by Pxtl · · Score: 2, Insightful

      Wait - no - I'm completely fscking wrong.

      I think the point was this.

      file1: xxxxxxxxxx
      file2: xxxxxyyyyy

      %file1 = %file2

      now, user downloads the file from p2p, and one user gives him file1 and one user gives him file2.

      you get:
      file3: xxxxxyyxxx
      which has the same has file1 and file2. That's the problem - not only can two files have the same hash, the combination of those files (as would occur with a corrupted P2P file) also has the same hash.

      Or I still could be wrong.

    5. Re:Exploit? by anthony_dipierro · · Score: 2, Informative

      As the length of the file is sent in addition to the MD5, in the vast majority of cases it's going to be impossible to find a file which gives you the same length and MD5. I guess as the size of media files increases this gets more and more likely, but if it ever starts affecting more than 0.0000000001% of files you can just increase the length of the hash.

    6. Re:Exploit? by owlstead · · Score: 3, Informative

      Yep, you are still wrong. I've tried some code and the (precalculated) different blocks MUST be at the start of the code. So it's more like

      file1: xxxxccccccc....
      file2: yyyyccccccc....

      %file1 = %file2

      Which is the example given in the article.

      However, Wang said she could get to a collision from any intermediate hash code within the hour (according to the article). That would mean:

      file1: ccccxxxxccccccccc......
      file2: ccccyyyyccccccccc......

      %file1 = %file2

      Where xxxx and yyyy are (pre?)calculated and cccc.... is the payload.

      If _I_ am not mistaken.

    7. Re:Exploit? by esukafurone · · Score: 1

      The exploit on kazaa exists because files are not fully hashed for some reason... (More info: http://en.wikipedia.org/wiki/FastTrack )

    8. Re:Exploit? by Poltras · · Score: 1
      Well, I don't get it, so AT LEAST I'm sure I'm not mistaken.

      I'll have to re-read my md5 hash math... :(

    9. Re:Exploit? by TheLibero · · Score: 1
      How complex would it be to construct this tool?

      I think it's fairly easy to implement. The problem is with the computing and memory resourced needed. Since we are dealing with binary data, you'd expect something to genertae 2^n varaitions of data, where (n) is the length of the original file in bits.
      So, we will be trying to brute force the first file with all possible combinations and try to match a hash function for both.

      But now I think (n) doesn't have to be identical!

      It might be MUCH more problematic to start from 1-n, and potentially we might try beyond (n), till you get the match!

      --
      "Evil thrives when good men do nothing"
    10. Re:Exploit? by owlstead · · Score: 1

      Maybe there is a misunderstanding. If you refer to iiiii and jjjjj as 64 byte blocks you can calculate xxxxx and yyyyy to attach to iiiii and jjjjj so you get the same MD5 hash for certain weak iiiii and jjjjj. In the article you must place iiiiixxxxx and jjjjjyyyyy at the start of the code to get the same intermediate hash result. So both iiiiixxxxxcccccc and jjjjjyyyyyccccc result in the same hash. ccccc can be ANY code at all. Problem is that iiiiixxxxx and jjjjjyyyyy have no meaning. Note that normal (symetric) keys or random values have no meaning either.

      It does not work if you put ccccc in front of the calculated blocks because that would distort the algorithm. Wang might be able to create bbbbbiiiiixxxxxccccc and bbbbbjjjjjyyyyyccccc within the hour, where b is any code.

      I'm going to bed, I'm starting to get a headache. Tomorrow I am going to re-read the original Wang article (Ouch). Note that if you came this far without reading the original article, kudo's to you :)

    11. Re:Exploit? by Anonymous Coward · · Score: 0

      Sounds like a problem with Kazaa, rather than a problem with hashing per se. If a client is sending corrupt data, detect it and blacklist them...

    12. Re:Exploit? by Anonymous Coward · · Score: 0

      This paper is just an attempt to explain the implications of the MD5 collision discovery. Basically, I can take a file A and modify it in two different ways to produce files B and C with MD5(B) = MD5(C) but B != C. If I am clever than I can do this in such a way that B appears benign, but C is malicious. I can do this today using the published collisions in a somewhat clumsy manner. If I knew the (as yet unpublished) algorithm to generate the collisions, then I could do it in a more flexible manner. None of this is particularly challenging.
      If you don't understand how this impacts P2P networks, digital signatures, HMAC, and software distribution schemes that rely on MD5 as a secure hash, then read the paper.

    13. Re:Exploit? by Surt · · Score: 1

      Actually, they explicitly mention that they can replace sub-blocks with identically sized sub-blocks generating the same md5. So you get the same file length and md5.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    14. Re: Exploit? by Omniscient+Ferret · · Score: 3, Informative

      Somebody could use these fake files to "poison" popular torrents, making it very unlikely that anybody on them will get uncorrupted files.

      This would worry me, except that BT uses SHA1, not MD5, so this is irrelevant. MD5 has seemed suspect for years, & Bram's the sort to pay attention to that sort of thing.

      I checked; Edonkey is based on MD4. Gnutella variants might use MD5.

    15. Re:Exploit? by anthony_dipierro · · Score: 1

      Yeah, I realized after I wrote that that there is always going to be a big enough chance of a random collision, as the hash is always going to be smaller than the actual file.

    16. Re:Exploit? by wdr1 · · Score: 2, Informative
      I thought the same at first, but they have a pretty clear example showing that's not the case. From here:
      file1.dat:

      00000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
      00000010 2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
      00000020 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
      00000030 08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
      00000040 96 0b 1d d1 dc 41 7b 9c e4 d8 97 f4 5a 65 55 d5
      00000050 35 73 9a c7 f0 eb fd 0c 30 29 f1 66 d1 09 b1 8f
      00000060 75 27 7f 79 30 d5 5c eb 22 e8 ad ba 79 cc 15 5c
      00000070 ed 74 cb dd 5f c5 d3 6d b1 9b 0a d8 35 cc a7 e3

      MD5(file1.dat) = a4c0d35c95a63a805915367dcfe6b751
      file2.dat:

      00000000 d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
      00000010 2f ca b5 07 12 46 7e ab 40 04 58 3e b8 fb 7f 89
      00000020 55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 f1 41 5a
      00000030 08 51 25 e8 f7 cd c9 9f d9 1d bd 72 80 37 3c 5b
      00000040 96 0b 1d d1 dc 41 7b 9c e4 d8 97 f4 5a 65 55 d5
      00000050 35 73 9a 47 f0 eb fd 0c 30 29 f1 66 d1 09 b1 8f
      00000060 75 27 7f 79 30 d5 5c eb 22 e8 ad ba 79 4c 15 5c
      00000070 ed 74 cb dd 5f c5 d3 6d b1 9b 0a 58 35 cc a7 e3

      MD5(file2.dat) = a4c0d35c95a63a805915367dcfe6b751
      --
      SlashSig Karma: Excellent (mostly affected by moderatio
    17. Re:Exploit? by CryoPenguin · · Score: 2, Informative

      That's only a problem because Kazaa uses one checksum for the whole file. BitTorrent checksums each chunk (typically 256KB - 1MB), so it can detect corrupt data, throw out only little collateral real data, and blacklist whoever sent you the corrupt bits. Then it redownloads the chunk from someone else, and the user never needs to know.

    18. Re: Exploit? by Effugas · · Score: 1

      Kazaa uses MD5, in a moderately broken mode no less. See:

      https://lists.berlios.de/pipermail/gift-fasttrac k- devel/2004-February/000023.html

      --Dan

    19. Re: Exploit? by Lehk228 · · Score: 1

      Moderately Broken!??? they only hash parts of the file to save timefreaking stupid should at least do a complete hash on all downloaded files since it is handling the whole file and complete hash all the files using spare CPU time, do a quick hash to start but only "trust" a full hash.

      --
      Snowden and Manning are heroes.
    20. Re:Exploit? by Lehk228 · · Score: 1

      I think it takes a few corrupt bits to blacklist someone, not sure though (and can vary per client)

      --
      Snowden and Manning are heroes.
    21. Re: Exploit? by lachlan76 · · Score: 1

      Gnutella is SHA-1. So is XBOX, unfortunately.

    22. Re: Exploit? by Omniscient+Ferret · · Score: 1

      they only hash parts of the file to save timefreaking stupid ...

      The older Kazaa only did the first part of the file. Looking at the docs in the parent post, the newer one does the whole file. It looks a little like what I remember of the tiger tree hash idea.

    23. Re: Exploit? by Lehk228 · · Score: 1

      it hashes parts of the whole thing, skipping out rapidly as the file gets bigger, good for the appearance of speed when adding files to your shared directory, bad for not sucking hard with file corruption.

      --
      Snowden and Manning are heroes.
    24. Re: Exploit? by Omniscient+Ferret · · Score: 1

      I might be misreading it, or misremembering, but it resembles tiger tree hash, only with MD5. With the right block size, this might catch the doppelganger blocks.

      Hm. The doppelganger blocks were 64 bytes long. That would require even tinier blocks to catch. Storing or passing along the complete tree could take more room than the original.

  5. MC5 by Anonymous Coward · · Score: 0

    mc5 was harmful to your ears...

  6. In english by ValuJet · · Score: 4, Funny

    Is there a translator from ultra-nerd to english?

    1. Re:In english by LiquidCoooled · · Score: 1

      Yes, when you run the md5 hashing algorythm over certain files, its possible that you can change the file and still get the same hash.

      Its a small blindspot that isn't catered for in the mixing process.

      --
      liqbase :: faster than paper
    2. Re:In english by Anonymous Coward · · Score: 1, Funny

      Is there a translator from ultra-nerd to english?

      A computer thingy has an owie.

    3. Re:In english by Anonymous Coward · · Score: 0

      I'm sorry - I don't understand the question.

    4. Re:In english by Anonymous Coward · · Score: 0

      Seriously, I tried it against babelfish and couldn't come up with squat.

    5. Re:In english by Anonymous Coward · · Score: 0

      You're sure you're visiting the correct site ? :)

    6. Re:In english by Anonymous+Brave+Guy · · Score: 4, Informative

      Short version: A common technology for verifying that a file you've downloaded is legitimate and untampered-with, known as MD5, isn't as secure as people thought.

      Slightly longer version: MD5 is a way of generating a checksum -- a single, comparable value -- from a file. Ideally it is supposed to give you different numbers for different files, so if a web site advertises the checksum a file should have, you can compare that with one generated from the file you actually got to see whether the file you've downloaded has been modified, potentially maliciously.

      The research shows that it is possible for someone to construct a drop-in replacement for the file you thought you had that generates the same MD5 checksum as the original, so anyone attempting to validate the file this way would think they had the real thing. If it turns out that you can construct a damaging replacement for a common file -- perhaps an installer for a popular application like Firefox or OpenOffice that's usually downloaded from a public server -- then this could open a loophole for viruses, worms, etc. that would slip through the security net often used by cautious people when downloading such programs.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    7. Re:In english by supersmike · · Score: 1
      ...so if a web site advertises the checksum a file should have, you can compare that with one generated from the file you actually got to see....

      If someone can push a hacked file up to some site for download, couldn't they just hack the advertised MD5 too? What sort of precautions are in-place to defeat this?

    8. Re:In english by Meostro · · Score: 1
      If someone can push a hacked file up to some site for download, couldn't they just hack the advertised MD5 too? What sort of precautions are in-place to defeat this?
      This is why people are supposed to get their distros and digests from two or more different sources. It's nice to check my MD5 sums to see if what I have matches what the site had, but it's an entirely separate problem to verify that either one of those items is the correct item.
    9. Re:In english by MindStalker · · Score: 2, Insightful

      Important part is you have to construct both files. So if you were the distributor of said common file and people trusted you, you could produce two versions of said file, with same hash. But producing a identical hash to a file you can't change has currently not been successful.

    10. Re:In english by Taladar · · Score: 1

      Someone could hack a (or just open a new) mirror for the file but if you get the md5 from the original site you can check wether you have the same file as the one on the original server without costing them the bandwidth of all downloads of the file itself.

    11. Re:In english by MighMoS · · Score: 1

      When major distributions user mirrors, that's where the MD5s come in handy. Like siteX gives me the correct MD5 sum for its file. But doesn't host it. So I have to go to a mirror to get the file. If the mirror was hacked, then the MD5 would be different. And from my experience, I've seen that its the mirrors that usually get hacked. If someone really wanted to trick the user they'd have to hack siteX's websites, and most the mirrors.

    12. Re:In english by owlstead · · Score: 1

      Yes, and they have currently only shown that you can calculate collisions (the damaging replacement) for certain weak data, and you can only replace that by a certain calculated value. This value is meaningless most of the time. But Wang (one of the cryptoanalists that found the problem) promisses more interesting holes.

      I've halready come up with two bash scripts that prints out "MD5 is good, SHA1 is bad" and one that is _supposed_ to print "MD5 is bad, SHA1 is good". The scripts have the same MD5 hash value. The scripts are easy to program: make a switch if you find any changed bits and ignore the errors at the start.

      The problem was a number that translated into the '(' character, which created an unrecoverable error.

    13. Re:In english by mattdm · · Score: 1

      If someone can push a hacked file up to some site for download, couldn't they just hack the advertised MD5 too? What sort of precautions are in-place to defeat this?

      Well, for example, the Fedora Linux iso MD5SUM files are GPG signed.

    14. Re:In english by kirkjobsluder · · Score: 1

      Short short version: use software that incorporates sha1 over md5. configure software to use sha1 over md5 when you have a choice.

      On the other hand, it seems that sha1 has become a default for most applications anyway.

  7. What does this mean by Jonny_eh · · Score: 1

    Does this mean that I can take a given binary file, like kde.rpm, and change it so that its' contents are different, and harmful, yet the MD5 hash is the same? This has never been done before? Sounds like an interesting 4th-year engineering project to me. I'd think it'd be really easy to mess up a file and keep the same MD5 hash (like screwing up an avi file shared ona p2p network). But to actually change the file so that it behaves a certain way, wow.

    1. Re:What does this mean by PatrickThomson · · Score: 1

      No.
      It means that you can create two files with the same hash but different contents. An example would be, you release your own (specially tailored) rpm for a completely harmless thing, then a few weeks later silently swap it for the malicious one with the same md5sum.

      --
      I am one of many. My idea is not unique, nor do I expect my voice alone to sway you. I speak in a chorus of opinion.
    2. Re:What does this mean by Kunnis · · Score: 1

      The first thing I thought about is something like Redhat is distributed via Bittorrent. What happens if someone replaces - say gcc and login to do the old "my special login has a backdoor, and my gcc detects when I'm compiling login, and puts the exploit it, and my gcc detects when gcc is compiling, and puts the exploit into gcc" trick, and then hops onto the torrent network, (perhaps with some special mods so they will distribute the "hacked" sections more often) and then start distributing the exploit. Thoughts? You would even know the IP address of the guy that downloaded the chunk with the exploit, and then he would start distributing the hacked version. Now I'll admit it's a bit far fetched, but I also never underestimate the power of REALLY determined people. I also realize it would be tricky keeping it small enough, and getting it to recompress and... But still, someone might figure it out. And with how big the Linux community is, if it's possible, someone will do it. Kunnis

    3. Re:What does this mean by Anonymous Coward · · Score: 1, Informative

      Yes, that's eaxctly what that means.

      Just ask yourself : how many values can a MD5-hash (checksum) have, and how many (different) files are there on this world of ours.

      If the number of files exeedes (is more than) the number of checksum-values, logic dictates that there must be several files that generate the same MD5-hash (checksum) ...

      But Having multiple files with the same MD5-hash is not the problem. The problem is that someone could choose which MD5-hash his (program-)file should generate. And that would mean he could replace any file he wants with another one ...

      And that's just what the MD5-hash should make impossible :-(

    4. Re:What does this mean by Discoflamingo13 · · Score: 1
      the old "my special login has a backdoor, and my gcc detects when I'm compiling login, and puts the exploit it, and my gcc detects when gcc is compiling, and puts the exploit into gcc" trick

      Ah, yes - the classic "Reflections on Trusting Trust" trick. Scary how often that never gets mentioned when it should be - so kudos to you for bringing it up.

    5. Re:What does this mean by Anonymous Coward · · Score: 0

      It was cited in the paper.

    6. Re:What does this mean by Anonymous Coward · · Score: 0

      It's disturbing how much "kde.rpm" looks like "kiddie porn"

  8. Good analysis by overbyj · · Score: 5, Funny

    By examining the MD5 hash using a sophisticated Fourier schema followed by deconvolution with a bit binary-inequal collision analysis, it is quite obvious I have no freaking clue what this stuff is about.

    I am glad somebody does.

    --
    No trees were harmed in the composition of this; however, numerous electrons were inconvenienced.
    1. Re:Good analysis by eric2hill · · Score: 1

      You didn't decouple your Heisenberg compensator. It's much easier after that.

      --
      LOAD "SIG",8,1
      LOADING...
      READY.
      RUN
    2. Re:Good analysis by Quixote · · Score: 1
      Here's a summary for ya:
      Fire burns. Ice remains frozen. Fire and Ice have the same MD5 hash.

      Happy?
      (From page 3 of the paper)

    3. Re:Good analysis by ArsonSmith · · Score: 1

      hmm, could you dumb it down a bit?

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    4. Re:Good analysis by ichimunki · · Score: 1

      Also may need to change the endianness of your I/O stream.

      --
      I do not have a signature
    5. Re:Good analysis by Wanker · · Score: 3, Funny

      Hibbert: Homer, I'm afraid you'll have to undergo a coronary bypass operation.
      Homer: Say it in English, Doc.
      Hibbert: You're going to need open heart surgery.
      Homer: Spare me your medical mumbo jumbo.
      Hibbert: We're going to cut you open and tinker with your ticker.
      Homer: Could you dumb it down a shade?

      http://www.tvtome.com/tvtome/servlet/GuidePageServ let/showid-146/epid-1355/

    6. Re:Good analysis by Anonymous Coward · · Score: 0

      Oh the ironing!

  9. Let's face it by mordors9 · · Score: 2, Insightful

    Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".

    1. Re:Let's face it by Anonymous Coward · · Score: 0

      And to help facilitate this, to help ascertain the source, you want asymmetric (public key) encryption to sign files.

    2. Re:Let's face it by jd · · Score: 2, Interesting
      Let's say you break a file into blocks, encrypt those blocks with Rijndael or Serpent using a chaining method that authenticates the prior block, digitally sign the result using (seperately generated) RSA, DSA and ECC signatures in turn, and generate SHA-1 and Whirlpool checksums of both the encrypted and unencrypted file.


      True, you'd spend longer validating and decrypting the unholy mess generated than you'd spend downloading it, but I think you'd be fairly safe in assuming that the file was what it claimed to be.


      In other words, it isn't impossible to be so close to 100% secure as to make no odds. It is merely a question of whether it's worth it. Is the expense of protecting something greater than the value of what you're protecting? If so, nobody is going to bother, no matter how simple the task of protecting is.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:Let's face it by Meostro · · Score: 0
      Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with.
      The entire point of cryptography is to make it hard enough that you won't bother attacking the algorithm, you'll just rough up the guy with the password or do something else to get your protected content.

      Things like AES and PGP have been checked through by smart people to make sure there are no obvious cheats, but every algorithm has a glaring weakness: brute force.

      If you try every single possible password / key / hash / whatever, you are guaranteed to get the answer. The only point of cryptography is to make it so that:

      Brute-force* is the fastest way to get what you want - there is no magic key that will get you there faster

      Brute-force will take too long for it to be useful to anyone.

      I can collide any SHA-1 hash you give me, guaranteed, with only 2^160 tries. The only problem with trying all 2^160 combinations is that it'll take more than a billion billion billion years.

      * = Most clever tricks included - if you have some clever trick that cuts your search space down by a billion times, that's great, but it'll still take a billion billion years. This is the case with things like Quadratic Sieves versus test division for prime factorization: both will get the answer, and QS is thousands / millions / billions of times faster than division, but it's still infeasible to factor 1000-bit composites.

    4. Re:Let's face it by archeopterix · · Score: 1
      The only point of cryptography is to make it so that:
      • Brute-force* is the fastest way to get what you want - there is no magic key that will get you there faster
      • Brute-force will take too long for it to be useful to anyone.
      This is of course true, but what makes cryptography interesting is that with current knowledge such kind safety is impossible to prove for any hash function. As long as we don't know whether NP is actually harder that P, no hash function is provably safe. Strong functions are different from weak functions only in that they don't fall to the known attacks.
  10. RIAA by Anonymous Coward · · Score: 0

    So the RIAA can corrupt Kazaa and Gnutella music downloads by sharing garbage files with matching checksums.

  11. md5 vs sha1 vs ? by alexandre · · Score: 4, Informative

    http://en.wikipedia.org/wiki/Md5

    here is a very good link about the algo... :)

    1. Re:md5 vs sha1 vs ? by WolfWithoutAClause · · Score: 1
      here is a very good link about the algo... :)

      You got a MD-5 crytographic hash to prove that? :-)

      --

      -WolfWithoutAClause

      "Gravity is only a theory, not a fact!"
  12. Already Happening by mxmasster · · Score: 0

    I've seen this exact thing in the wild already. One of my customer is storing large amounts of mpeg movies. They were using md5sum to verify file contents for network transfers. This worked great until there was a power failure on of the remote systems. Everything passwd the md5sum perfect, except halfway through the clip you would lose video. I haven't relied on md5sum for over a year now as a result.

    --
    "The similarities of sysadmins and drug dealers: both measure stuff in K's, and both have users."
    1. Re:Already Happening by Anonymous Coward · · Score: 0

      lieer

    2. Re:Already Happening by menscher · · Score: 3, Informative
      Bullshit.

      I'll give you $50 if you can back that claim up. I want to see two video files. They must start out the same, but have a difference about half-way through. And they have to have the same md5 sum. Just post where I can download the two files, and your paypal address.

      The way I see it, you've got a 1/2^64 chance of being right. So I'm risking $50/184467440737095516, which isn't a whole lot.

    3. Re:Already Happening by Anonymous Coward · · Score: 0

      What probably happened was that the file md5-verified correctly but wasn't completely written to your disk when the power failed. I wouldn't blame MD5 for that...

    4. Re:Already Happening by Johnboi+Waltune · · Score: 1

      I bet a lot of their subscribers were upset when they didn't get to see the money shot. ;)

      --
      "The advanced societies of the future will be driven by competing systems of psychopathology." -JG Ballard
    5. Re:Already Happening by kiltedtaco · · Score: 1

      Parent is well-founded. Good hashes, including MD5, are designed to create an "avalanche effect." When one bit changes in the file, it it supposed to change multiple bits in the hash output. Nobody has discovered an attack that avoids this avalanche effect.

      The chance that random data changes defeated md5 is astronomical.

    6. Re:Already Happening by cursion · · Score: 1

      um, no, youre risking $50. ... and the odds you'll have to pay out are 50 times the number you suggested ... but the average amount you and all the other yous in parallel universes would pay is the amount you listed.

      --
      remember when it was {of|for|by} the people?
    7. Re:Already Happening by owlstead · · Score: 1

      Maybe you used a (slightly) different decoder. I've seen this happening after switching between (different versions) of decoders. I'll bet another 50 if you can back up your claims, no problem. Unless Wang is working for your competitor.

    8. Re:Already Happening by goatpunch · · Score: 1

      Sounds like a bug in your system. Nothing works until it's been tested.

    9. Re:Already Happening by Anonymous Coward · · Score: 0

      no, youre risking $50.

      How do you know? Maybe he has 184467440737095516 people he's gonna split it with! :o)

    10. Re:Already Happening by Anonymous Coward · · Score: 0

      Bullshit indeed. Well, actually since MD5 is a 128-bit hash, the number of possible combinations is 2^128=340282366920938463463374607431768211456, and not 2^64=184467440737095516.

      With a few simple calculations, you can prove that it's practically impossible for a random corruption to cause a collision:

      Let's assume there are 6 000 000 000 computers in the world. (One for each living human)
      Let's assume each computer has 1 000 000 files on it.
      Let's assume every computer causes a corrupion in *every* file once a *day*.
      So, it's:
      p = (2^128)/(6000000000*1000000*365) = 155380076219606604321.1756198

      So, even with these pessimistic figures, it would take more than 10^20 years for the computers to produce one single collision. Divide it by a factor of x to account for some randomness. The universe has existed for about 13.7 × 10^9 years. Our universe will probably cease to exist before a collision occurs this way. :)

    11. Re:Already Happening by menscher · · Score: 1

      Your calculations are correct for an accidental collision. But for someone putting money on it (me) it's safest to take into account the possibility that the OP would try to come up with a collision using a birthday attack. And that takes 2^(N/2) operations, which is why I used 2^64 in my calculation.

    12. Re:Already Happening by Anonymous Coward · · Score: 0

      Bullshit.
      First go read the article about removing parts of files and replacing them with basically anything, (payload/emptyness) while acheiving the exact same MD5 result.
      I'll give you $50 if you can back that claim up/
      Done reading the article?? Now either give the man his 50$ or make a donation to the charity of your choice.

    13. Re:Already Happening by menscher · · Score: 1
      First, note that these files generated by the attack must be constructed to have a collision. You can't just take a given file and generate another one that collides with it. In particular, the attack is simply encrypting stuff using a key -- getting this to work on a video file is a different problem.

      I'm also not particularly concerned that the OP is going to construct a collision. I think it goes without saying that using either of the two known collisions (presented in the Wang paper) to construct an answer to my challenge is an obvious cheat. Of course, it would still be interesting to see it be done -- it just wouldn't get the $50. ;)

    14. Re:Already Happening by theLOUDroom · · Score: 1

      Bullshit.
      I'll give you $50 if you can back that claim up. I want to see two video files.


      Always nice to see someone get called for talking out their ass.

      This guy is absolutely correct that the chances of these two video files colliding is amazingly small. Although the article is about collisions in MD5 sums, a little common sense goes a long ways.

      If grandparent had these files when he said he did it would have been a very big deal. I imagine they could have been sold to researchers for a nice sum of money.

      --
      Life is too short to proofread.
    15. Re:Already Happening by Anonymous Coward · · Score: 0

      One of my customer is storing large amounts of mpeg movies

      pr0n?? Are you a customer of your customer?

    16. Re:Already Happening by peterhoeg · · Score: 1

      I know exactly what you mean! One of my "friends" also has a large collection of mpeg movies and halfway through he will pretty much always loose his.... I mean... Stop watching at least. Although in most cases that actually happens way before the half-movie mark.

  13. This is almost appropriate... by freeze128 · · Score: 3, Funny

    If your cursor finds a menu item followed by a dash,
    And your double-clicking icon puts your window in the trash,
    And your data is corrupted 'cause the index doesn't hash,
    Then your situation's hopeless and your system's gonna crash!

    1. Re:This is almost appropriate... by Anonymous Coward · · Score: 0

      And your screen is all distorted by the side effects of gauss,
      So your icons in the window are as wavy as a souse,
      Then you may as well reboot and go out with a bang,
      'Cause sure as I'm a poet, the sucker's gonna hang!

  14. Correct me if I'm wrong, but... by Sheetrock · · Score: 5, Insightful
    If I'm translating this properly, a malicious person can do two things with this knowledge:

    He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).

    Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

    Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.

    --

    Try not. Do or do not, there is no try.
    -- Dr. Spock, stardate 2822-3.




    1. Re:Correct me if I'm wrong, but... by jd · · Score: 1
      Many passwords are MD5 hashes. You may be able to use this to attack password files, by finding some string that hashes to the same password entry, in a reasonable length of time.


      It's not much of a threat to signature schemes, but there are enough places where MD5 is used as a one-way encryption of an access token that this attack poses a potential security hazard.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Correct me if I'm wrong, but... by CatsupBoy · · Score: 1

      Lets say I create two contracts with the same md5 sum, we agree on one of the contracts, and I send you the md5 sum for future proof of its authenticity. Afterwards, I maintain the contract and allow you to check it at any time with your md5 sum for verification.

      Later down the line, lets say I replace the original contract with the second one. You can still validate its authenticity via the same means, however in it would be new terms we did not agree to originally.

      Unless you can produce a copy of the doppleganger, your out of luck in court because you cant prove that i've created an alternative.

      Now, this sounds mathematically improbable that I could come up with two contracts that make sense and have the same md5sum, but none the less possible. And when dealing with exact mathematics, we prefer exact results.

    3. Re:Correct me if I'm wrong, but... by nautical9 · · Score: 1

      Couldn't a (newer) P2P app just use two different algorithms to do the checksum? Assuming both use very different schemas, even if one is compromised (like MD5), the other would not be? Not sure how computationally expensive the checksums are, but if this kind of sabotage becomes commonplace, it would make for an easy fix.

    4. Re:Correct me if I'm wrong, but... by chialea · · Score: 4, Insightful

      When you're dealing with cryptography, it should be very, very, very hard to find collisions. If you find enough of them, you can proabably find something bad with the same hash value. For example, if you sign a digital document that says you're going to pay me $1 for my pencil, and I find a suitable hash collision, I could make it look like you signed a promise to pay me $3,000 for some used tissue. I wouldn't rule out that someone could find a harmful collision for a program distributed online, and substitute a trojan. If the prize gives enough reward, people will throw a lot of computational power at it, and will likely hit pay dirt.

      Secondly, this is quite a signifigant break. Once a hash function has had an attack like this discovered, it often becomes completely useless not long down the road. I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.

      The upshot of this is that this hash function should NOT be considered secure any more. For now, if you are not protecting anything of high value, you're probably fine. Tomorrow? Possbily. But soon, you're not going to be protected at all, and so you should start worrying about that now, instead of when you're already in trouble.

      Lea

    5. Re:Correct me if I'm wrong, but... by Chris+Burke · · Score: 1

      any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints").

      True, it's impossible to have a hash without collisions. However it should be difficult (in a crypto computational complexity sense) to find such collisions. Apparently MD5 has a vulnerability where it is possible to compute in reasonable time sets of bits that can be swapped for each other in a file without changing the hash. That is definitely a weakness in the MD5 algorithm that is not inherent to hashing in general.

      The benefit to people trying to spread malware would be that they could create software that wasn't malware, then swap it with real malware while keeping the same MD5, using established trust against the victims. Not terribly likely, though, which is why not many people are going nuts over this.

      --

      The enemies of Democracy are
    6. Re:Correct me if I'm wrong, but... by HeghmoH · · Score: 2, Interesting

      Let's say I have a system that downloads files over the internet. To prevent man-in-the-middle attacks, I digitally sign the files. This prevents me from having to vet all of the code that deals with local files for buffer overflows. I implement the digital signatures by simply encrypting a hash of the file with an RSA private key when I create the file, and decrypting and verifying the hash on the receiving side.

      If I had decided to use MD5 for the hash in the digital signature, my scheme is now vulnerable. It's not too far-fetched to imagine that somebody could come up with a way to embed an exploit in one of the files while staying within the limitations imposed by this collision technique. Then if he can accomplish a man-in-the-middle attack, he's successfully used my automatic downloader to compromise somebody's machine. Not fun.

      This may not be completely feasible currently, but the technique shows that it may be possible in the future. If you're currently designing a system that you plan to have function for several years, you should not assume that MD5 is cryptographically secure.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    7. Re:Correct me if I'm wrong, but... by RedWizzard · · Score: 1

      No one is going to let you keep the sole copy of the contract. So what will happen is that you'll make the switch and then when you point out the new terms the other party will whip out their copy and realise that the MD5 sums are the same. They'll know you've tried to defraud them and they'll have a good chance of convincing a court since there will be likely be evidence that you were the one to generate the contracts. So it won't work.

    8. Re:Correct me if I'm wrong, but... by bogado · · Score: 1

      I have not read the article, but what I got is that you can create two messages with the same MD5 by appending (or pre-pending) two different blocks into it. I guess it is is useless for passwords, but if it is easy maybe you should change the password to SHA1.

      --
      []'s Victor Bogado da Silva Lins

      ^[:wq

    9. Re:Correct me if I'm wrong, but... by anthony_dipierro · · Score: 1

      Unless you can produce a copy of the doppleganger, your out of luck in court because you cant prove that i've created an alternative.

      In civil court you don't have to prove something in order to win a case.

      Now, this sounds mathematically improbable that I could come up with two contracts that make sense and have the same md5sum, but none the less possible.

      It may or may not be possible. The set of contracts which make sense is finite, as a 500 gig contract doesn't make sense.

    10. Re:Correct me if I'm wrong, but... by Facekhan · · Score: 2, Insightful

      A lot of routing protocols use MD5 as the password hash for authentication. If someone found a collision for some of the important internet router BGP peers then large chunks of the internet could get routed the wrong way or just small parts like the subnet your bank controls could get routed to a some black hat's basement in Romania where he could setup a mock intranet and webserver to steal customer info through a spoofing attack.

    11. Re:Correct me if I'm wrong, but... by ReelOddeeo · · Score: 2, Interesting

      I was thinking the same thing.

      See the long list of cryptographic hash functions on Wikipedia. (Bottom of page.)

      If you used three or four different hash functions, to produce a long combined message digest (concatenated from the several message digests), it would seem to be much more difficult to produce files that collide. An attack resulting in a collision on one algorithm, is not likely to work for another algorithm. Finding two files that simultaneously fool two or more algorithms seems a much more difficult task than just fooling a single algorithm like MD5.

      This is of utmost importance. We can't have the RIAA infiltrating garbage files ("what the fiddlesticks do you think you're doing?") into our p2p networks.

      --

      Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
    12. Re:Correct me if I'm wrong, but... by kiltedtaco · · Score: 3, Informative

      The Wang et al attack does not apply to passwords. Their attack applied to situations where the md5 input plaintext was known. Collisions are nowhere near common enough when using less than 16 character inputs to md5 to provide a feasible means of cracking passwords. Nobody has ever found a collision with under 128 bits of input, and the attacks in the article take considerably more than that.

    13. Re:Correct me if I'm wrong, but... by jd · · Score: 1
      A common way for users to generate a "secure" password is to take a common word and append something to it. (More often, though, they just use common words.)


      If you can append payloads to the end of the alias and the original, and get the same result, then you may be able to use dictionary attacks on "extended" regular words.


      Actually, the article does talk about finding regular aliases to an unmodified original value. To me, that is by far the more worrisome aspect of the MD5 weakness. And, yes, I'll be switching to SHA-1 passwords at some point soon.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    14. Re:Correct me if I'm wrong, but... by Dracolytch · · Score: 1

      I agree...

      For those who don't understand "Variadic"

      First, we've always known in theory this is possible. The question is: How useful is it? Can me, replacing block 4356 of a file with a different block useful? Possibly, but not really. The only thing you could do is gunk up files, but people already do that by publishing fake legit files.

      Nowadays a lot of P2P applications are also moving to hash trees... Meaning that each X bytes of a file is hashed, and the combination of those hashes are hashed. Generally this is used for finding corrupt file parts (If the main hash doesn't go right, then you can find the parts of the tree that don't match right). It isn't a lot more computationally expensive in a real-world situation to just compare each leaf's hash to look for problems anyway.

      So, you're downloading a 10 gigabyte file (Movie). Say you hash every megabyte (10,000 hashes), that's an additional what... 160K? BFD. Then the MPAA^H^H^H^H offending party would have to write a payload that fits inside a megabyte and that section hashes right. Increase hash resolution as needed. A fingerprint that validates 100K would result in 1.6 Megabytes of overhead data. 10K would be 16 megabytes.

      Writing a payload that would hash correctly, and do anything in 10K of space would be exceptionally difficult in real-world scenarios. The overhead would only be about 1.6% of the filesize. Not a bad tradeoff.

      For the really paranoid, you can increase your fingerprint resolution to 1K (Probably about the same size as this post), and it would add 16% to your file. You'd start to notice the overhead at that point, but it'd be nearly impossible to do anything with a file that well fingerprinted.

      Of course, you can always try to attack the hash as it's being transferred as well, but if they invalidate your checksum, they could give you any old file they wanted anyway.

      ~D

      --
      This sig has been enciphered with a one-time pad. It could say almost anything.
    15. Re:Correct me if I'm wrong, but... by jd · · Score: 2, Informative

      Very true, or it could be used to poison the BGP routing tables. The problem with such an attack is that, once one router believes in the poisoned routes, the poison will spread very rapidly.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    16. Re:Correct me if I'm wrong, but... by Spy+Hunter · · Score: 1

      You miss the point. Of course hash functions will have collisions, it's inevitable. The point is that they should be computationally infeasible to find, and MD5 has just been broken in that respect. Read the article. Using the attack it is possible to get a benign executable digitally signed (using an MD5 hash of course, think ActiveX control), then later replace the executable with a malicious version that has the same MD5 hash. The attack still probably isn't practical for spreading malware, since as the author points out, the security model for ActiveX controls has never been a technical one, it is a legal one (if it attacks us we know who to sue). However, there may be other scenarios not yet thought of where this attack is feasible. Being able to swap out a constructed benign file with a constructed malicious one that has the same hash might break other crypto-systems. Also, it is quite likely that new attacks on MD5 will follow soon, meaning that its days as a useful hash are numbered. That's what the "big deal" is.

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
    17. Re:Correct me if I'm wrong, but... by Sheetrock · · Score: 1
      I work in cryptography, and the people I know have written off MD5. Heck, the people I know are also quite worried about SHA-1, and the current best attack against that one isn't nearly as strong.

      Is there a hash function they don't have serious qualms about? I've seen an intriguing way to work around the limitations present in MD5 involving the generation/validation of hashes for multiple (perhaps overlapping) blocks in a set of data, but this is moving away from the fixed-width representation of validity MD5 was designed for.

      --

      Try not. Do or do not, there is no try.
      -- Dr. Spock, stardate 2822-3.




    18. Re:Correct me if I'm wrong, but... by bogado · · Score: 1

      This is by far the worst atack I ever saw in any cripto-protocol. This is really-really-bad. I think passwords are the least of my preocupation right now.

      Almost all the distributions use MD5 to check the validity of files downloaded. (I'm not sure if pgp uses MD5). Changing this will be a really big head ache, and it will have to be done. :-/

      --
      []'s Victor Bogado da Silva Lins

      ^[:wq

    19. Re:Correct me if I'm wrong, but... by thing12 · · Score: 2, Informative
      Not really... the article is talking about relatively huge files which when you swap out blocks in order to result in the same MD5. Passwords on the other hand are much shorter than the MD5 in the first place and no two strings shorter than the MD5 length (128 bits) will give you the same MD5. The MD5 space is many orders of magnitude larger than the typical password space - even a 16 character password that uses mixed case, digits and punctuation (basically all of base64), that contains about 96 bits of information -- that's 4 billion times smaller than the space of MD5.

      What I'm getting at is that you'll probably always have to either know the password + the salt (if there is one), use brute force, or use a database of MD5's for all possible passwords in order to decrypt an MD5'd password. But since there are already MD5 databases, we're kind of past this part anyway.

      As for access tokens, MD5's are chosen simply because they're large and seemingly random which makes them "unguessable". Since they're just temporary anyway, guessability is all you're trying to prevent - you'll get a new one next time and the attacker will have to start over.

    20. Re:Correct me if I'm wrong, but... by chialea · · Score: 1

      You want to be really, really careful when trying to combine hash functions to gain more security. You never want to show up as one of those examples illustrating the principle of Don't Roll Your Own Crypto. I believe there was an interesting paper in crypto 2004 that dealt with this.

      SHA-1 is currently considered the best option that I'm aware of. The best attack is not practical for the full scheme, only for subsets. There have been several proposals that seem to fix the vulnerability, but I'd personally use SHA-1 for now and wait until at least crypto and eurocrypt next year to see what starts shaking out. I'm not active in this area, so I can't speculate as to what that will be.

      Lea

    21. Re:Correct me if I'm wrong, but... by Kanasta · · Score: 1

      Consequently, while this is of academic interest I don't see what the big deal is; any time you reduce a large file to a fingerprint you will inevitably run into problems like this because it is impossible to represent one-to-one every individual possible combination of a large set of data in smaller sets ("fingerprints"). You can reduce the risk by increasing the set domain with a larger variadic function but it is impossible to escape this constraint without using fingerprints as large as the data itself.

      The point is you should NOT be able to create a file that MD5sum's to the same result as another file. When you check the MD5, you are supposed to be able to say that it is either the same file, or by cosmic coincidence it happened to have the same MD5. When you DL a program, it is unlikely that a random file that runs the same way has the same MD5. But now, part of the data could have been tampered with without altering the final MD5 hash. What if your bank statements are checked with MD5?

    22. Re:Correct me if I'm wrong, but... by Anonymous Coward · · Score: 0

      He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result.

      As I understand it, this is not the case. One must have control of both files to use the particular attack described in the paper linked.

      Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

      This is correct. However, in addition to what you say, the two files in question must begin with essentially random data (the part with the same MD5), to which identical data must be appended.

      The "attack" described in the paper is quite simple: by taking two random files with identical MD5s (and by "random files" I mean that their contents are junk), using them as keys to encrypt another file, and appending the other file to both sets of random junk, you have created two encrypted versions of the same data, with identical MD5 sums, one of which carries its own key with it and the other of which does not.

      BFD. The number of cases in which this particular "attack" represents a security risk can be counted on the fingers of one cupboard.

      There is NO, repeat NO RISK that the attack described in the paper linked could be used to crack passwords, insert malware into a program (even your own program), or interfere with P2P networks. It can't be done, period.

      In other words, there is NO REASON WHATSOEVER to start moving away from MD5. It has not been shown to carry any greater risk than other hashing systems. SHA-1 is not necessarily any more secure. For all we know at present, MD5 may still be secure in ten years' time, while a critical weakness might be found in SHA-1 next week. Wouldn't you feel silly switching to SHA-1 tomorrow in that case?

    23. Re:Correct me if I'm wrong, but... by ars · · Score: 1

      I'm afraid you don't know much about MD5. Combining hashes IS how MD5 works!

      That's exactly what makes it work - you run the data through two different hashes, then combine the hashes - it's very hard to make data that happens to get the same result for both hashing functions, and that's how MD5, SHA, and all other secure hasing functions works.

      --
      -Ariel
    24. Re:Correct me if I'm wrong, but... by XO · · Score: 1

      Apparently you've never met my ex's lawyer! *received a "brief" big enough to completely fill my slightly larger than normal mailbox once*
      (and they call them BRIEFs.. boggle)

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
    25. Re:Correct me if I'm wrong, but... by WolfWithoutAClause · · Score: 1
      If I'm translating this properly...

      Dunno. I'm too busy barfing over your signature:

      "Try not. Do or do not, there is no try. -- Dr. Spock, stardate 2822.3."

      It was friggin' Yoda- it even sounds like him. You're a moron. :-)

      --

      -WolfWithoutAClause

      "Gravity is only a theory, not a fact!"
    26. Re:Correct me if I'm wrong, but... by SlashdotMeNow · · Score: 1

      A malicious person can create 2 files with the same checksum. So I can create a useful app and let it loose over P2P. Once everyone trusts the app I can exchange it with my malicious one. No-one will know that it's changed and will get 0wned! Yay!

    27. Re:Correct me if I'm wrong, but... by thogard · · Score: 1

      Of course MD5 uses a table of sines as a modification factor and there is no reason that table needs to start at sine(0) so a p2p program could say "give me the hash of block 34 using seed 12" which uses an MD5 with the table starting at sine(12) and that will break this sort of thing very quickly.

    28. Re:Correct me if I'm wrong, but... by Facekhan · · Score: 1

      Now that organized crime has begun to see the value in spoofing attacks and identity theft in general it will not be long before organized and well funded groups exploit the reality that it does not take that much money to build a computer capable of finding a collision for a md5 hash value in a reasonable amount of time. Perhaps 100k-500k. Well within the budget of both large organized crime syndicates, terrorists, and intelligence agencies.

    29. Re:Correct me if I'm wrong, but... by m50d · · Score: 1
      Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

      No, this does give an advantage to someone trying to spread malware. I create a cool little game and release it with an MD5 sum. Various important people say that it's a good game, works, and won't mess up your computer. Then I switch it with the one including some trojan. MD5 still matches. Profit. But I thought that attack existed anyway.

      --
      I am trolling
    30. Re:Correct me if I'm wrong, but... by GigsVT · · Score: 1

      Of course, if you are going to go as high as 16% overhead, you could just compute XORs for 20% slices, which gives you 20% overhead (like RAID5), but you only need to download 5 of the 6 slices. If something is corrupt (using a normal overall MD5), download the 6th parity slice and repair.

      So you not only get verification, you also get error recovery.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    31. Re:Correct me if I'm wrong, but... by ars · · Score: 1

      Switching to SHA-1 may not help very much. Although there is no such attack on SHA-1 right now, MD5 and SHA are all variants of the same algorithm.

      Actally ALL secure hashes are based on the same algorithm with only some modifications each time the hash has been broken. (Except MD4, which was new - but broke quickly.)

      --
      -Ariel
    32. Re:Correct me if I'm wrong, but... by ars · · Score: 1

      Everybody keeps asking this - but you really need to learn more about MD5. Combining hashes is EXACTLY how MD5 work!

      MD5 is based on the fact that it's hard to find two different backs that happen to have the same hash - this is what makes it secure.

      And be very careful when combining hashes, most hashes for all that the have different names (MD5, SHA, etc) are all based on the same algorithm. If you combine them poorly you may end up with something weaker then you started. (Since they can cancel each other out - since they are basically the same.)

      --
      -Ariel
    33. Re:Correct me if I'm wrong, but... by ReelOddeeo · · Score: 1

      You make a good point about multiple hashes being all very similar. If that is true, and I do not know this, then what you end up with is just a "wider" MD5.

      I do not understand how combining any number of hashes could end up weaker than the strongest hash.

      If I combine MD5 + weakHash1 + weakHash2, then the resulting hash is as strong as MD5. At a minimum I would have to expend the same effort to construct two files that are aliases under MD5, even if the cost were ZERO to get them to alias under weakHash1 and weakHash2 together. If weakHash1 and weakHash2 are any good at all, then the result is better than MD5 alone.

      For instance, if weakHash1 were just CRC32, then you must find two files that alias under MD5, but in addition they must have the same CRC32.

      Applied Cryptography, which I don't have with me at the moment, describes how to build a strong hash function out of a strong two-way cipher (such as AES or DES or the like). For instance, I could take strongTwoWayCrypto and make a strongHash function out of it. This hash function is not necessarily similar to MD5. Combining two hash functions cannot be worse than the strongest hash function, and is probably much better if the weakest hash function is any good at all.

      --

      Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
    34. Re:Correct me if I'm wrong, but... by ars · · Score: 1

      You didn't specify what "combine" means.

      If you just meant concatenate, i.e. abc and def = abcdef then you are correct.

      But usually you don't want to do that since it makes the hash longer, usually you use some function to combine them together, so that abc and def = tyu or something like that.

      Lets say you use xor as the combining function (which is the first impulse) and you are combining two similar hashes: 01001 and 00110 you end up with 10000 which is much weaker then what you started with.

      That's why combining similar hashes is dangerous - even if you don't use xor you need to be very sure you don't end up with one hash canceling out the other.

      Or if your combining function is no good it may give one hash more weight then the other - for example lets say you add two numbers as the combining function: 445 and 003 = 448. If one hash has a tendency to give larger hashes then then the other then it carries more weight then the other (2 out of 3 digits are from the first hash) - so if it's a weaker hash you end up making the final hash weaker then what you started with.

      --
      -Ariel
    35. Re:Correct me if I'm wrong, but... by ReelOddeeo · · Score: 1

      I did mean "concatenate" when I say "combine".

      The longer digest is insignificant in this day of quarter Terabyte drives, Gigabyte memory, multi Gigahertz processors, etc. Suppose my concatenated digest were 8K bits wide! That is only 1K bytes. Trivial really. Something you could add as an extended attribute into the directory entry of every file on your system.

      When exchanging P2P info about files, a very secure concatenated hash that fit into 1K bytes (and that is a long digest!) would still easily fit within a packet with a filename. Even better if the hash were say 1024 bits, and therefore only 128 Bytes.

      In addition to the benefits that a more secure hash would give P2P applications against the RIAA; other applications of secondary importance would also benefit. Banking, commerce, digital signatures, digital certificates, password hashes in /etc/shadow files, etc.

      --

      Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
  15. I have a novel solution by quamaretto · · Score: 1

    Buy it instead.

    I'm kidding. I download Linux distro's via bittorrent all the time and willfully ignore the MD5 sum, and now I come to find that it's compromisable. OH SNAP!

    I can envision something being added on to bittorrent to prevent bad blocks being passed around via ... magic. Okay. I can't really envision it, but I'm sure it can be done.

    On to the next topic I'm not qualified to comment on...

    --
    *is run over by rotten tomatoes*
    1. Re:I have a novel solution by Dan+Farina · · Score: 2, Informative

      I'm not sure if you are being sarcastic or something, but as said before, Bittorrent uses SHA-1, not MD5....

      So you are safe downloading linux for now via bittorrent. Besides, the chances of MD5 collisions happening from sheer luck/unluck are very slim. (after all, we've been using it for ages with no reports)

      The most dangerous factor to continued use of MD5 are malicious individuals.

  16. Not just MD5 by PureFiction · · Score: 2, Interesting

    Other weaknesses were reported in various other secure digests, including MD4, RIPEMD, HAVAL-128, SHA-0.

    SHA-256 is a good replacement. SHA-1 should be fine but if you are going through the trouble of an upgrade, why not make it sufficiently future proof?

    1. Re:Not just MD5 by jd · · Score: 1
      Problem with SHA-256 (and SHA-512) is that they're not considered sufficiently verified. The modification used to add the extra bits may or may not be sound. It's probably OK, but there's no assurance of that.


      Whirlpool (another 256-bit hash) is another excellent hash, but again isn't considered adequately tested to be considered "safe".


      Of course, you could opt for an unconventional hash. There are some algorithms based on cellular automata. Although some have known weaknesses, it would seem that this would be computationally more difficult to break.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Not just MD5 by chialea · · Score: 2, Informative

      SHA-1 has an attack that's somewhat troubling. I'd look to next year's crypto and eurocrypt conferences to see starts shaking out as the new standard.

      Still... I would switch out MD5 if you have a target that's worth pretty much anything at all. After a break like this, I'd expect MD5 to become basically useless pretty fast. Of course, I don't work in hash collisions, I work mostly in protocols...

      Lea

  17. link by saderax · · Score: 1

    heres a very good link for the "very good link about the algo"

    1. Re:link by Proteus · · Score: 1

      And here is a very good link that will keep you from getting your pants in a bunch when people forget to link URLs.

      As an aside, it's not that hard, people, just enclose your URL like this: <URL:http://your.url.here.com/>, and the magic of /. transforms it into http://your.url.here.com/

      --
      We may not imagine how our lives could be more frustrating and complex—but Congress can. – Cullen Hightower
    2. Re:link by Erasmus+Darwin · · Score: 1
      "And here is a very good link that will keep you from getting your pants in a bunch when people forget to link URLs."

      Your very good link won't protect me from people who can't link because my web browser is lynx.

  18. If I Had A Million Terabytes... by Tackhead · · Score: 5, Funny
    If I had a million terabytes of storage, y'know what I'd do?

    Two files with the same MD5 hash at once. Aaw yeah.

    1. Re:If I Had A Million Terabytes... by Anonymous Coward · · Score: 0

      y'know what I'd do?

      Not a girl that's for sure =P

    2. Re:If I Had A Million Terabytes... by Anonymous Coward · · Score: 0

      Since md5 is only 128 bits, you only need a maximum of 2x129 bits to store two files with an identical md5sum. In real filesystems, that would at least be 136 bits or 17 bytes each, or, more likely 512 bytes.

    3. Re:If I Had A Million Terabytes... by PitaBred · · Score: 1

      Not all matching MD5 summed files dig guys with lots of storage space, though

    4. Re:If I Had A Million Terabytes... by wren337 · · Score: 2, Funny

      Well, the type of files that would double up on a dude like me do.

  19. FYI by vigyanik · · Score: 1
    The MD5 algorithm is like a hash function and the MD5 sum is the hash key of the data in the file. If the hash key is smaller in size than the data item then collision (i.e. two data items having the same key) is always possible.

    The exploit has used the properties of the MD5 algorithm to create two executable files with the same MD5 sum.

  20. MD5 is obviously less secure by Anonymous Coward · · Score: 0

    Anytime you use a hash especially for passwords you are only in fact giving the intruder additional password combinations that will work.

    proof:

    md5(x1) = some md5 string
    md5(x2) = same md5 string

    where x1 thru x100000... could be any number of possible passwords that will equal the same hash.

    1. Re:MD5 is obviously less secure by pclminion · · Score: 2, Informative
      While this is technically true (there are an infinite number of passwords that hash to the same value), it's not of much practical use. There are only 256^8 possible 8-character passwords (less, if you don't allow un-typeable characters). There are 2^128 possible MD5 hashes. 256^8 is much, much, MUCH less than 2^128, thus the probability of discovering another valid passwords of 8 characters or less is negligible.

      Sure, there are an infinite number of possible passwords, but they're all impossibly huge. I can come up with a password which is one trillion characters long and which hashes to the right result, but that's not practical.

    2. Re:MD5 is obviously less secure by cryptor3 · · Score: 1

      What is this, some kind of techno-troll? What you've said is true, but like the other replies have implied, it's basically inconsequential.

  21. clarification: by Ayaress · · Score: 1

    I say this under the assumtion that "someday" somebody will figure out a way to take a good file and make an MD5 equivalent file to poison a torrent with. The way I read the blurb, it sounds like both files were specifically crafted to create a collision.

    1. Re:clarification: by XO · · Score: 1

      Well, obviously, the algorithm for MD5 is known, therefore, it would be relatively easily to write a program for a computer that would determine what would be necessary to build a different file with the same MD5-sum. Poisoning a torrent or P2P program would be an easy thing to do. Making a program that erases everything on your disk have the same MD5-sum as say, an IRC client, would take a little more work.

      The only way to avoid it is to straight byte-compare the files. An EASY way to make it less of a pain in the ass, if someone were to go about writing a program that would make spoofing MD5-sum files easy, would be to compare a random sampling of say, 1/10th of the bytes in the file. That's something that the bittorrent and P2P software would be able to handle easily, communicating back and forth between them, and letting some person on one end or the other know that their files are fuct.

      It seems that MD5-sums are more of a "human readable" way to get a quick comparison of wether two files are supposed-to-be-identical or not. However, though it is slightly more difficult to defeat than older methods, it's not all -that- difficult. So, since the P2P programs have no need for "human readable" ways to compare the files, as they can communicate between each other and determine the differences, we really don't have much of a need for MD5sum anymore.

      Make sense?

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
    2. Re:clarification: by Eivind · · Score: 1
      No. Makes no sense. Or more precisely makes "sense" only for a person with extremely limited knowledge of crypto and in this spesific case, cryptographically secure hashes.

      First, the fact that md5 is a published algorith does *not* imply that finding different files with identical hash is easy. Yes, you can just try different files until two matches, this will happen at the latest when you've tested 2^hashsize+1 files, and on the average when you've tested sqrt(2^hashsize) files. That would take forever.

      Your suggestion to randomly check say 10% of the file is dumb. This would mean if I changed some small thing in a file so that the file became garbled (or malicious) then in 90% of the cases you wouldn't discover it. It would also mean that you add a 10% overhead to all your downloads. (no show-stopper, I agree, but also not desireable)

      The solution is to use a secure hash. md5 has been suspect for quite some time now, sha1 is the current favourite. A secure hash must have the following properties:

      • Knowledge of the hash must convey no knowledge of the file beyond "could it be a hash from :this: file?"
      • It must be computationally hard to create two files with the same hash.
      • It must be computationally easy to compute the hash for any given file.
      • All of the above must be true even with complete knowledge of the hashing-algorithm.

      If we didn't have hashes with these properties we could toss away PGP, infact pretty much forget about digital signatures alltogether (since signing a complete message is very computationally expensive, what is typically done is signing the *hash* of a message). Same with SSL, SSH, and infact something like 95% of all cryptographic products in use today.

      Luckily there are secure hashes. Or atleast hashes we have no idea how to break. Ofcourse in principle sometime in the future a break for some of those hashes can be found, in which case we'll need to design better ones.

      But there's no indication that this is a eternal arms-race. There are problems that where thougth to be "hard" 30 years ago which are still pretty much equally hard. For example, factoring large primes is still a hard problem now, and there's no indication that will ever change. Some algorithmic progress is being made, and ofcourse Moores law means we'll need a few extra bits of hash every decade, but nothing I'd worry about.

    3. Re:clarification: by gedhrel · · Score: 1

      "Well, obviously..." or not. True, if you're a particular flavour of mathematician: "we just construct all possible files in sequence and compute the MD5 of each until we find a match".

      However, there isn't enough energy in the solar system to implement this approach.

      Finding a plaintext that has a preselected MD5 hash without using brute-force is still an unsolved problem; you can bet, however, that if a solution is found it won't be introduced with "well, obviously".

      You can only verify a file's integrity with absolute certainty, as you say, by making a comparison that uses the same information content as the original. However, in the real world, you must of course factor in the liklihood that spurious real-world events will disrupt your computation; ignoring for the moment hostile attacks against your verification protocol.

      Additionally, two files can have the same MD5 hash, be arbitrarily long, an ddiffer in only 6 bits. Your 10% comparison is doomed.

    4. Re:clarification: by cburley · · Score: 1
      factoring large primes is still a hard problem

      Is that you, Bill?!

      --
      Practice random senselessness and act kind of beautiful.
    5. Re:clarification: by Panaflex · · Score: 1

      Moores law may not apply to molecular or quantum computing. Same as the tube, transistor/chip revolutions.

      Pan

      --
      I said no... but I missed and it came out yes.
    6. Re:clarification: by XO · · Score: 1

      Well, since the hash computes over a certain size of blocks of the file, and you have to wholesale replace the -entire- block with a block that has the same MD5, to hash to the same overall MD5, then verifying a relatively randomized set of bytes, across the file, would give you a much better representation of "are user A and user B going to send me the same file, or files that are intentionally made to look alike but really aren't".

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
  22. The "Detailed Summary" by Effugas · · Score: 5, Informative

    [This is the author]

    I've been doing some analysis on MD5 collision announced by Wang et al. Short version: Yes, Virginia, there is no such thing as a safe hash collision -- at least in a function that's specified to be cryptographically secure. The full details may be acquired at the following link:

    http://www.doxpara.com/md5_someday.pdf

    A tool, Stripwire, has been assembled to demonstrate some of the attacks described in the paper. It may be acquired at the following address:

    http://www.doxpara.com/stripwire-1.1.tar.gz

    Incidentally, the expectations management is by no means accidental -- the paper's titled "MD5 To Be Considered Harmful Someday" for a reason. Some people have said there's no applied implications to Joux and Wang's research. They're wrong; arbitrary payloads can be successfully integrated into a hash collision. But the attacks are not wildly practical, and in most cases exposure remains thankfully limited, for now. But the risks are real enough that responsible engineers should take note: This is not merely an academic threat, systems designed with MD5 now need to take far more care than they would if they were employing an unbroken hashing algorithm, and the problems are only going to get worse.

    Some highlights from the paper:

    * The attack itself is pretty limited -- essentially, we can create "doppelganger" blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum.

    * MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash. This leads to...

    * Attacks are possible using only the proof of concept test vectors released by Wang -- the actual attack is not necessary.

    * Stripwire emits two binary packages. They both contain an arbitrary payload, but the payload is encrypted with AES. Only one of the packages ("Fire") is decryptable and thus dangerous; the other ("Ice") shields its data behind AES. Both files share the same MD5 hash.

    * Digital Signature systems are vulnerable, as they almost always sign a hashed representation of data rather than the data itself.

    * This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.

    * HMAC's invulnerability has been slightly overstated. It's definitely possible, given the key, to create two datasets with the same HMAC. Attacker possession of the key violates MAC presumptions, so the impact of this is particularly questionable.

    * Very interesting possibilities open up once the full attack is made available -- among other things, we can create self-decrypting executables (fire.exe and ice.exe) that exhibit differential behavior based on their internal colliding payloads. They'll still have the same MD5 hash.

    * Several doppelgangers may (relatively quickly, as per Joux) be computed within a single multicollision-friendly block. As such, the particular selection of doppelganger sets within a file can itself be made to represent data. It's relatively straightforward to embed a 128 bit signature inside an arbitrary file, in such a way that no matter the value of the signature, a constant MD5 hash is maintained. This is curiously steganographic.

    * Many popular P2P networks (and innumerable distributed content databases) use MD5 hashes as both a reliable search handle and a mechanism to ensure file integrity. This makes them blind to any sign

    1. Re:The "Detailed Summary" by BranMan · · Score: 2, Insightful

      It seems to me that at least the malicious nature of this vulnerability can be limited by using the size of the file as an additional check besides the MD5 hash. In other words, if you know how large the file is supposed to be in bytes, then there is less likelihood that something malicious can be passed off as the original - even if it has the same MD5.
      Is that right? I'm no expert by any means, but could this reduce the potential for a real attack to pretty much nill?

    2. Re:The "Detailed Summary" by javax · · Score: 1

      The FreeBSD ports and the DarwinPorts collection both use md5 verification for the source files they download (and probably many other source distrubution systems also do). If someone hacks one of the mirrors the source is distributed via, he could break into all clients...

    3. Re:The "Detailed Summary" by Effugas · · Score: 2, Insightful

      File size remains constant between Fire and Ice. Good thinking, though.

      The solution is to not use MD5.

    4. Re:The "Detailed Summary" by Anonymous Coward · · Score: 0

      Checking the file size would not help, since the attack does not allow changing the file size. You have to replace specially prepared data with other specially prepared data of exactly the same size. That's why it's considered "limited" at present time (until someone finds a way to extend it).

    5. Re:The "Detailed Summary" by Dracolytch · · Score: 2, Insightful

      I know a fair amount about this stuff, but obviously not as much as you...

      What do you think are the most viable alternatives? It seems to me that SHA-1 would suffer similar vulnerabilities. Does SHA-1 suffer from the appendable cascade issue?

      Do you think there is any way to avoid this kind of problem with hashes? I'm not really aware of any alternate techniques that wouldn't suffer from this same kind of attack eventually. Sure, you could develop related algorithms that increase the hash size, but then it looks like it'd just be an arms race between hashers and colliders.

      ~D

      --
      This sig has been enciphered with a one-time pad. It could say almost anything.
    6. Re:The "Detailed Summary" by Effugas · · Score: 2, Informative

      SHA-1 also uses a Merkle-Damgard construction, so yes, if we find a SHA-1 collision, it'll be appendable.

      SHA-1 has a much stronger design. It's starting to show cracks, though, so I don't recommend anything. Something based on AES will come, though -- maybe AES-OMAC, maybe Whirlpool. At the core of almost every hashing algorithm is just a block cipher anyway...

      --Dan

    7. Re:The "Detailed Summary" by __aaahtg7394 · · Score: 1

      The intuitive short-term solution, at least to me, is to use N different MD5sums for each file, taken at N different offsets within the file (each unique mod block size). Of course, this becomes moot when collision-finding becomes cheap, but it provides a much greater degree of safety than single MD5 for the near term, with the benefit of leveraging existing codebases and minimal additional overhead.

      That said, it's likely that moving away from MD5 is still a better idea.

    8. Re: The "Detailed Summary" by Omniscient+Ferret · · Score: 1

      The intuitive short-term solution, at least to me, is to use N different MD5sums for each file, taken at N different offsets within the file (each unique mod block size).

      It won't do the offset thing, but tiger tree hashing uses blocks, and combines them into another checksum. If the blocks were small enough, they'd catch this. I'm not sure about the block size involved in either, though; I suppose with a bit more CPU, you could get arbitrarily small blocks.

    9. Re:The "Detailed Summary" by BranMan · · Score: 2, Insightful

      The file size does indeed remain constant, but the attacker is constrained as to what he can substitute out of the original file while maintaining both the file size and the MD5 hash. Without the file size remaining constant too, the attacker can theoretically make the original file into a trojan. With the file size remaining constant, most likely the only thing an attacker can do is break the file - he doesn't have enough leeway to put something malicious in there. I think.

    10. Re:The "Detailed Summary" by Ninja+Programmer · · Score: 1

      Wait a second, wait a second. Consider the following code:

      char md5_collision_play_area[2048] = {0};

      if ((CRC16(md5_collision_play_area, 2048) & 1)) {
      return goodApplication();
      } else {
      return Virus();
      }

      Ok, so now you apply the Wang et al. mechanism to start filling the md5_collision_play_area with stuff to find a collision pair. Then you have a 50% chance of finding a pair such that one version of the application runs the goodApplication() while the other runs Virus(). This means that on average, you have to run the collision construction mechanism twice before you get a pair that execute that condition with opposite results.

      So then you publicize the file which runs the goodApplication() and publish the MD5 and advertise its functionality as the goodApplication() (which maybe you stole from somewhere.) Then one day you just switch it for its doppleganger (with the same MD5) which runs the virus version.

      Is this not a way to exploit the MD5 weakness *today*?

    11. Re:The "Detailed Summary" by Anonymous Coward · · Score: 0

      nope, the whole point of fire and ice is that you can put in something malicious but hidden;

      At some point early in the normal code path, you make a conditional jump based on a value in the middle of the replaced sequence. At the point where it ends up with the malware case, you decode a chunk of apparently random data into malware using the replaced sequence as part of the encryrption key (so that disassembling the binary will never show anything more suspicious than random data).

      In the non-malicious path you don't decrypt, but you do use the random data as a seed for some random number generators or something. This justifies its existence in the binary.

      Now you have two binaries with the same MD5. One is an innocent game which happens to have a large amount of random seed material. The other is a piece of malware with the evil AES encrypted payload.

  23. Cash Money? by Grendel+Drago · · Score: 1

    Was there a cash money prize for someone who could produce a pair of strings with the same MD5 sum? I remember reading or possibly hearing that No One had Ever discovered a hash collision with MD5. Is this the first ever hash collision found?

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:Cash Money? by Meostro · · Score: 2, Informative

      No, but I believe this one is:

      XiaoyunWang, Dengguo Feng, Xuejia Lai, and Hongbo Yu
      "Collisions for hash functions md4, md5,haval-128 and ripemd"

  24. I studied different hashes.. by Anonymous Coward · · Score: 1, Funny

    I found out the darker and moister it was, the more powerful it was. Of course any form of hash was much stronger then regular leaf pot and you could get far more reusable resin when you were done with the pipe.

    1. Re:I studied different hashes.. by gentleolas · · Score: 1

      Actually the strongest (Moonshine, CCup winner 2002/3 in Amst.) is quite dark, but seems dry as a stone....

  25. ______ with be harmful/obsolete in the future.. by l4m3z0r · · Score: 2, Insightful
    Really no big surprise, all currently implemented algorithms for security and digital signatures will be rendered useless or easily hackable in the future. The fact that a compromise seems to be soon is the part thats interesting.. however the parent fails to address how serious it is.

    For the most part, big deal, all of current security procedures will be harmful ie compromised in the future...

    1. Re:______ with be harmful/obsolete in the future.. by Drakonian · · Score: 1

      Yeah, big deal. There's no hope. Let's just use insecure ones and save ourselves the trouble. Hey, wanna buy something on my website?

      --
      Random is the New Order.
  26. Solution: Use more than one hash algorithm by NZheretic · · Score: 2, Informative
    This kind of attack can be mitigated into non-existance by just using two dissimilar hash algorithms.

    Using MD5 with SHA1, or even the older MD2 or MD4 will reduce the probability of creating a compatable binary with the same checksum to virtually zero.

    If only one checksum is required then just XOR the resulting checksums from each algorithm.

    1. Re:Solution: Use more than one hash algorithm by gokeln · · Score: 1

      This seems intuitive, but it is wrong. The resulting improvement in security is not worth the additional cost of computing both. You do not get exponentially improved security, but only an arithmetic improvement. What you should actully recommend is a better algorithm, such as SHA-256.

      --

      There's no time to stop for gas, we're already late.
    2. Re:Solution: Use more than one hash algorithm by m50d · · Score: 1
      No, that's nonsense. This attack lets you create not just two different binaries with the same md5, but arbitrarily many. Then you just use the attacks against md4 or md2 on this set of binaries. Actually I think for md4 you don't even need to do that, as you can make a binary for an arbitrary md4 fairly easily.

      SHA-1 plus MD5 is still secure, yes, but only because SHA-1 is still secure, you might as well just use SHA-1. If you want to improve security, use a double-length version of SHA-1, not SHA-1 plus MD5.

      --
      I am trolling
    3. Re:Solution: Use more than one hash algorithm by gedhrel · · Score: 1

      I think your reasoning is flawed. Let's say you have a set, {a, b, c, d} of strings all with the same MD5 hash.

      How, exactly, do you go from there to a set of strings with the same MD5 _and_ the same (some other hash)? Bear in mind that

      otherHash(a) != otherHash(b)

      with very high probability; there's nothing essentially "magic" about these strings, except with respect to a single hash algorithm and a single initialisation vector.

    4. Re:Solution: Use more than one hash algorithm by huntse · · Score: 1

      Sadly you are completely wrong in almost every detail. This type of superencryption doesn't necessarily improve things at all and may make things worse. In effect you're saying "The chocolate lock I have on my front door isn't a problem because I have a deaf dumb and blind guard dog. Each on his own wouldn't do the job but together they reduce the chances of being burgaled to virtually zero".

      Trivially, consider a "hash" algorithm (hash1) where you just add all of the ascii values of the letters together to get a hash value. This will obviously be insecure and easy to manufacture collisions.

      Now consider another "hash" algorithm (hash2) where you just subtract all of the ascii values of the letters together to get a hash value. This will obviously also be insecure and easy to manufacture collisions.

      Now, they're both bad so to make them more secure I add the results of hash1 and hash2 together. I then have a "super" hash function where any text will collide with any other text (because the result of adding the numbers together will always be zero). In other words, my well-intentioned change has made the resultant algorithm much worse.

      The point of this straw man example is that changes to crypto algorithms which seem intuitively to help often don't. By adding algo a and algo b together I don't get a*b I get c, which could be better or worse than a or b on their own.

    5. Re:Solution: Use more than one hash algorithm by DimGeo · · Score: 1

      It might be better to just concatenate several different checksums (using an appropriate separator).

    6. Re:Solution: Use more than one hash algorithm by m50d · · Score: 1

      The thing is that with this attack you don't have a set of four strings with the same MD5 hash, you have a set of as many strings as you want with the same MD5 hash. At that point finding two of these strings with the same (other hash) is no harder than finding any two strings with the same (other hash). And in fact it's easier than that because most hashing algorithms used today do have things in common - the probability that two strings with the same MD5 also have the same MD4 is much greater than the probability that two random strings have the same MD4.

      --
      I am trolling
    7. Re:Solution: Use more than one hash algorithm by Anonymous Coward · · Score: 0

      Does the first word of "dissimilar hash algorithms" mean anything to you?

    8. Re:Solution: Use more than one hash algorithm by gedhrel · · Score: 1

      That last claim's an interesting one. D'you have a pointer to the relevant papers?

      Although this: "At that point finding two of these strings with the same (other hash) is no harder than finding any two strings with the same (other hash)" isn't much of an improvement; in fact, I'd claim it's actually wrong. Because there may well be (intelligent) attacks to find two strings that MD4 the same; whereas the approach you outline relies on effectively discovering such a hit using a brute force attack.

    9. Re:Solution: Use more than one hash algorithm by m50d · · Score: 1
      I don't have pointers I'm afraid. But if you look at the design MD5 and to a lesser extent SHA1 are based on MD4, and they're related. I'm not sure anyone's found an exploitable pattern, but it is there. It may well not be the case if you use RIPEMD and MD5 or SHA, as I don't think there's a common heritage there, and I don't really know much about RIPEMD.

      As far as the second point is concerned, you're partly right, but for this particular attack there's enough variety in the set of strings you can get that any attack against "normal" (other hash) is almost certainly going to be effective. Given that most "intelligent" attacks are based around the idea of modifying small sections of a binary, and you can get a set with a small section which is arbitrary with the same MD5, I think these would still work against (other hash).

      The one place where combining hashes does make sense is if both are thought to be equally secure, to hedge your bets if one is broken. But adding a weak hash to a strong one does not make it any stronger, you're better off lengthening the strong hash. Switching to SHA-256 is a better way to improve security than combining MD5 and SHA1, and will probably be easier to implement as well.

      --
      I am trolling
  27. break it down by Anonymous Coward · · Score: 0, Interesting

    Slicing and dicing the file into many different overlaping blocks and then computing a md5 sum for each block would be much much stronger.

    a. MD5 sum entire file
    b. MD5 sum each 10K block in the file (overlap blocks by ZZZ bytes)
    c. break file into different blocks such as every 40th byte in the file ....

  28. birthday attack by kippy · · Score: 1

    Can someone explain to me how this is something more complicated than the birthday paradox?

    (heh, Wang)

    1. Re:birthday attack by Anonymous Coward · · Score: 0

      1) anything that uses MD5 assumes two people cannot have the same birthday.

      2) Up till now, they haven't found two people with the same birthday.

      3) DRM is built on the assumption that you can trust all people with the same birthday.

      4) replace "people" with "files", and "birthday" with "MD5 hash"

    2. Re:birthday attack by Asterixian · · Score: 1

      It's actually quite simple. Hash collisions are everywhere, no matter what hash function you use. They are all vulnerable to the birthday attack because if you use too many documents, you will have a non-trivial probability that two documents will have the same hash.

      However, this isn't the real problem. Hashes can be secure so long as the collisions are random and sparse. For a secure 128-bit hash function, this is the case. You can't just keep trying documents until you find a random collision. That would take longer than the age of the universe given today's computers.

      Actually, even if someone does find a random collision, it's not a real problem because software can be retrofit to detect constructions that use (or abuse) the known collision. Or the algorithm can be trivially changed so that the attacker has to start searching all over again.

      Hash functions really become insecure when someone finds a formula to generate two documents with the same hash quickly. Or, even worse, when someone finds a formula that, given two documents, specifies the modifications for one of the documents to collide with the other!

    3. Re:birthday attack by thebatlab · · Score: 1

      That's some real fine writing but he asked how it's MORE complex than simple collisions. As in "I don't see what the big deal is, it's obvious this would happen"

  29. Only 340282366920938463463374607431768211456 ? by wsanders · · Score: 1

    Well, an md5sum has only 16^32 possible values, possibly less (I don't know the algorithm exactly), so it was only a matter of time.

    --
    Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
    1. Re:Only 340282366920938463463374607431768211456 ? by Anonymous Coward · · Score: 0

      Hmm, do you know that every extra digit of a number makes it 10x bigger?

    2. Re:Only 340282366920938463463374607431768211456 ? by lphuberdeau · · Score: 1

      Well, if they had found a way to compress any file to something as small as a MD5 hash uniquely, there would be no need for broadband. A 100 mb files has a whole lot more possibilities than a short string, no maths required. Collisions are obvious. SHA1 has even more possibilities, but it's still not enough.

      So, why is this news? I guess someone wanted massive traffic on his website to get more people to see ads.

      --
      Qui ne va pas à la chasse n'a pas de gibier
      PHP Queb
    3. Re:Only 340282366920938463463374607431768211456 ? by Anonymous Coward · · Score: 0

      Collisions may be inevitable but the problem lies in the ability to purposefully generate them.

    4. Re:Only 340282366920938463463374607431768211456 ? by Chrispy1000000+the+2 · · Score: 1, Troll

      Here, let me sell you my pattented 'smallest hash file generator in the world!!!!' With any input, it produces a one digit binary hash. Completely flawless. Have +110% security!!! Just send some money to my bank acound in nigeria and I will send you money!!!!

      --
      Sig
  30. Very simple solution - record the size by Anonymous Coward · · Score: 0

    Whenever you use a hash, you should always record the size along with the file. That way, any "doppleganger blocks" will stick out like a sore thumb.

  31. Switching to SHA1 by AGTiny · · Score: 1

    I've made the switch to SHA1 for all my website work where I'm storing passwords and session IDs and the like. It's just a simple change from 32 to 40 characters, and a search/replace for:
    md5_hex to sha1_hex

    1. Re:Switching to SHA1 by AGTiny · · Score: 1

      I was thinking about this and I wonder how insecure MD5 actually is when you are feeding it pseudo-random data (to make a session ID) or to store a user's password. I don't believe those kinds of things are vulnerable to any known form of attack other than brute force.

      Better safe than sorry though I guess... SHA-1 is just fine for me.

  32. It actually easy to see this by Anonymous Coward · · Score: 2, Interesting

    I was a bit suprised recently when I tried to
    sort 15,000 jpg image files to remove duplicates.
    Since the names varied, and it is not uncommon
    to have many images with the same length, it
    seemed like a good idea to use md5 hashes.
    So I coded it up in python to do the md5 hash,
    and then stuffed the file name into a table
    keyed by the md5 hash. Big suprise when multiple
    different files landed in the same hash.
    Some property of jpgs probably reduces the
    randomness of the files and increases the
    probability of hash collisions.

    1. Re:It actually easy to see this by Anonymous Coward · · Score: 0

      You should post the images with hash collisions if you want anyone to believe your story.

    2. Re:It actually easy to see this by pclminion · · Score: 0, Flamebait

      You're full of shit. Put up or shut up.

    3. Re:It actually easy to see this by XO · · Score: 1

      hmm.. he's got a pretty good idea for me to do a quick PHP script to do this though , so I can at least weed out a few of my duplicate MP3 files... but then, there will be many that are actually same songs but different files, due to not being the same dataset... oh well.

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
    4. Re:It actually easy to see this by Anonymous Coward · · Score: 0

      I'd assume a simple checksum would have similar defects then, but it does not. I have literally millions of images that I am constantly doing duplicate file checks on, and do not have this problem.

      Even if you are telling the truth, MD5 is overkill for this sort of thing anyway. I can't imagine just how slow a python-MD5 combination JPEG comparator would be, compared to an ASM optimized simple checksum algorithm. Something on the order of a few thousand times, for no discernable benefit.

    5. Re:It actually easy to see this by Anonymous Coward · · Score: 0

      Exactly. He is full of it.

      I computed the MD5 hash codes on our collection of almost a billion web pages and images, and there were no MD5 collisions. You can talk about esoteric and theoritical attacks all you want, but I've seen it work well for huge data sets daily for years.

    6. Re:It actually easy to see this by Anonymous Coward · · Score: 0

      The collisions were probably just different shots of the same naked woman. No biggie.

  33. Time is of the essence by eneville · · Score: 1

    It's all about the famous IMPS (Infinite Monkey Protocol Suite, RFC2795), if you give the CPU enough data, and enough time, it will eventually crack what ever encryption you are using. To be safe, deploy a stegography technique to hide your data, such as in the meta field of a picture, or within some MIME data on a harmless looking email, but be clever about it (theres a lot of junk in harmless looking SWAP file).

    1. Re:Time is of the essence by PitaBred · · Score: 1

      Because any cracker worth his salt wouldn't be running the program through a debugger, and noticing it accessing the swap file when it's supposed to be loading data. With scheduling systems as they are, the program won't know it's running through a debugger. Hasn't Microsoft already proven time and again that security through obscurity doesn't work?

    2. Re:Time is of the essence by eneville · · Score: 1

      I'm thinking more in terms of hiding data, not regular access. If it's for regular access, just be sensible about how you store it, physically and logically, ie. dont give the national crime database server a public IP address.

      Networks which are obscure are often safer, if you run a network of the same OS then it's going to be easier to hack, if you run a machine where each system runs a different OS then it harder to hack away at.

      Security through obscurity is not infinatly safe, but it does slow things down. Perhaps long enough for the next encryption standard to be released.

  34. Inherent to any hashing mechanism by FearUncertaintyDoubt · · Score: 0
    Of course. I've run into this type of thing with all sorts of data hash/checksum type of algorithms. If there are a numerically smaller amount of hash values than source data values, it is a mathematical certainty that multiple source data from generating the same hash value.

    If you have a 4-byte hash value of a 1000-byte character string, you have 4+ billion possible hash values. But the number of possible 1000-byte strings? What's 256^1000? So, on average, there would be 256^4 / 256^1000 source values that would generate each individual hash value. The only way to avoid this is to have a hash value that is as long as the source value, which eliminates much of the benefits of a hash.

    I like the idea someone else posted about using multiple algorithms because it would dramatically reduce the number of possible values that would generate the same hash value using 2 different methods.

    1. Re:Inherent to any hashing mechanism by FearUncertaintyDoubt · · Score: 2, Insightful

      Before you flame, change 256^4 / 256^1000 to 256^1000 / 256^4.

    2. Re:Inherent to any hashing mechanism by Anonymous Coward · · Score: 0

      Well I think a better approach would be to use two different hashes, with a counter (similar to hmac), and multiple times...

      so do something like

      H = H1(m + 0) + H2(m + 0)

      for counter = 1 to 999
      H = H1(m + counter + H) + H2(m + counter + H)

      This should extend the life of some of these hashes.

    3. Re:Inherent to any hashing mechanism by HeghmoH · · Score: 1

      MD5 is not a hash-table hash, it is (was?) a cryptographic hash. It outputs 16 bytes of apparently-random data that depends on the input. If it obeyed good cryptographic properties, the odds of a collision would be incredibly low (1 in 10^38), and finding a collision would be impractical. It appears that MD5 does not, in fact, completely obey good cryptographic properties, and there are ways of generating a collision that do not depend on brute-force searches. It is not "of course" just because you see it all the time in your 4-byte checksum algorithms, any more than factoring a 4096-bit RSA key is "of course" because you can easily decide that 15 = 5 x 3.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    4. Re:Inherent to any hashing mechanism by gokeln · · Score: 1

      NO! A better approach is to use a well-designed algorithm, such as SHA-256. Combining two insecure hash algorithms arithmetically only gives you an arithmetically improved security. I know this seems unintuitive, but it is true!

      --

      There's no time to stop for gas, we're already late.
    5. Re:Inherent to any hashing mechanism by gokeln · · Score: 1

      ... using multiple algorithms because it would dramatically reduce the number of possible values that would generate the same hash value using 2 different methods.

      Except that this does not give you dramatically better security. A better approach is to use a well-designed algorithm, such as SHA-256. Combining two insecure hash algorithms arithmetically only gives you an arithmetically improved security (not exponential). I know this seems unintuitive, but it is true!

      --

      There's no time to stop for gas, we're already late.
    6. Re:Inherent to any hashing mechanism by onemorechip · · Score: 1
      Except that this does not give you dramatically better security.

      Actually it can. You are comparing uncompromised hashes only. Consider that as long as the best known approach is brute force, then the difficulty of finding pairs of data sets that have matching hashes using hash algorithm 1 is exponential w.r.t. the length of the hash. Now find a method that compromises the security of the hash (as these researchers did) and the difficulty becomes merely polynomial, perhaps even linear.

      But if you add another constraint (such as requiring both hash function 1 and hash function 2 to produce the same output for the two files), then you have the following situation: Your algorithm can create 2 files that match with hash 1. The probability that those 2 files also match with hash 2 -- provided hash 1 and hash 2 don't have some common periodicity or similar characteristic between them -- is 1 in 2^N, where N is the length in bits of hash 2. So you reseed your algorithm and try again. You must do this an average of 2^(N-1) times before you get a hit. Even having an algorithm for hacking hash 2 doesn't help, because both algorithms must be met with the same pair of files. So it does return you to exponential difficulty (a dramatic improvement over the compromised hash), at least until someone finds an algorithm that compromises the combination of hash 1 and hash 2.

      Note: The above argument might not hold in the case of preimage attacks (which the md5sum attack is not), but I can't see why it would not hold for collision attacks.

      --
      But, I wanted socialized health insurance!
  35. Look at Whirlpool by Anonymous Coward · · Score: 0

    Look at the Whirlpool hash. Its fast and amenable to hardware implementation.

  36. You have a point, except... by Anonymous Coward · · Score: 0
    With passwords, there is usually a length constraint and a character constraint (Ctrl-C, Enter, Backspace). If it came down to the choice of having only one correct possibility for a password and keeping the passwords in plaintext or having in some cases two and keeping the hashes in MD5 (mind you, on a system that limits users to five guesses per fifteen minutes and a ten-guess lockout), which would you prefer?

    Not that it comes down to that, of course, because we've got SHA-1 and other longer hashing algorithms to choose from, but in the scenario above I believe there is less risk with MD5.

  37. hash, equals by freality · · Score: 1

    Testing the equality of hash codes is not a replacement for testing equality. It should be used as an optimization to limit the set of items for which equals need be applied. e.g.

    Thing search(Thing thingToFind) {
    thingList = hashSearch(hash(thingToFind));
    for each thing in thingList
    if (thing == myThing) // IMPORTANT!
    return thing;
    return null;
    }

  38. nah by roman_mir · · Score: 1

    Which is the same as just one insecure algorithm ;]

  39. OT yet informative: Spock and Try by Anonymous Coward · · Score: 1, Informative

    "Try not. Do or do not, there is no try.
    -- Dr. Spock, stardate 2822.3."

    I think you mean Yoda.
    Besides, Dr. Spock is the baby doctor.

    1. Re:OT yet informative: Spock and Try by Anonymous Coward · · Score: 0

      I think you mean Yoda.

      And I think the joke is over your head.
      ...there is no try = Yoda
      Dr Spock = Baby doctor
      stardate = Star Trek

      It's not supposed to make any sense at all. Get it now? Let us know if you need more help with anything.

    2. Re:OT yet informative: Spock and Try by Anonymous Coward · · Score: 0

      Dr. Spock is the baby doctor.

      Wow, did his mommy drive him to medical school every day?

  40. I am holding out for MD6 by Anonymous Coward · · Score: 0

    When will it be out?

    Any plans for MD64?

  41. s/myThing/thingToFind/ by freality · · Score: 1

    n/m

  42. MD5 To Be Considered Harmful Someday by roman_mir · · Score: 1

    Think of the children! We cannot allow some MD5 to put our children in the harm's way! Initiate operation AntiMD5, deploy the bombers.

    1. Re:MD5 To Be Considered Harmful Someday by Anonymous Coward · · Score: 0

      Think of the children!

      that's soooo yesterday...

  43. Truncated SHA-1 Hash Also Secure by Effugas · · Score: 1

    It's also safe to truncate 40 byte SHA-1 down to 32 byte if need be.

  44. Very true by jd · · Score: 1
    The only exception being where the two algorithms have a coincidentally overlapping weakness. At which point, you're no better off.


    However, there is no real need to use weak algorithms (unless you're in the Government and need to use the FIPS standard). Unless you're doing something time-critical, just use SHA-1 and Whirlpool.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Very true by Anonymous Coward · · Score: 0

      What's Whirlpool? I've found a site about it but who endorses it over sha1, or should I use both, or what?

    2. Re:Very true by jd · · Score: 2, Informative
      NESSIE is an attempt to do for the European crypto standards what NIST does for the American crypto standards. Whirlpool was approved by NESSIE and is part of the ISO/IEC 10118-3:2003(E) standard. As such, it should be OK for use in Europe for secure systems, including military and Government applications.


      The ISO approval also carries some weight in industry, although after some rather disasterous specifications (such as ISO 9000), they have lost some of their image. However, there are plenty of organizations that would consider an ISO standard an absolute must.


      I don't know of anyone using Whirlpool for highly secure systems. It certainly wouldn't be ok in the US, as it's not a FIPS standard. France or Germany would be better bets.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:Very true by Froobly · · Score: 1

      I'm just curious, but what was so disasterous about ISO 9000? I'm not trying to argue the contrary, just interested in hearing some examples of things gone horribly wrong. I've never really come into contact with it, so I don't know.

    4. Re:Very true by jd · · Score: 1
      US Government instiutions, such as NASA, decided to get themselves certified ISO 9000 compliant. So far, so good. In order to do so within the timescale given and the budget available, they usually ended up re-interpreting the requirements to fit what they were already doing, rather than improving what they were doing to meet the requirements.


      What you ended up with was a largely-worthless certification that implied standards in process control that never existed.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  45. Would you consider posting samples? by davidwr · · Score: 1

    If you have the legal right to do so, please some samples, or at least submit them to the cryptographic community for analysis.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  46. md5sum ne verify by Anonymous Coward · · Score: 0

    verify eq cmp or verify eq diff --speed-large-files -q

  47. Maybe start consider adopting magnet links by Jugalator · · Score: 1
    Magnet links is an open standard (specification draft) and an alternative to primarly eDonkey2000, eMule, and Overnet hashes that are based on a precursor to the MD5 algorithm, and I doubt very safe anymore?

    I personally think any clients that don't support magnet links should really start consider adopting those -- DC++, most Gnutella clients, and Shareaza already do.

    A magnet link can look like something like this:

    magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO 5C

    ... see "sha1" above -- the standard doesn't define which hash algorithm to use.

    --
    Beware: In C++, your friends can see your privates!
  48. what about stuff that's harmful NOW? by ChipMonk · · Score: 1

    Like, say, Windows?

    Yeah, yeah, Obligatory Microsoft Slam. But I don't see any great stampede to isolate and neutralize the threat that this Typhoid Mary of OS's presents to communication stability. Why are we preparing for a threat that barely exists, when we refuse to deal with the present, much greater threat?

  49. Dirt by mfh · · Score: 2, Insightful

    I've always called this dirt. Some folks call it padding, but it's essentially the same. However up until now, no one has known how to do it with md5 and come up with the same sum. This is nothing more than a hack to me.

    The bottom line is that if you care enough about security this will only be a problem for a day or so when md5 is finally solved completely and the hashes don't take five days to brute force -- they could be solved in a matter of microseconds. As soon as that happens, the next hash system goes up and it would not likely be so easy to break, unless the whole hashing process is somehow cracked wide open (which would be very bad for a number of reasons).

    What we'll likely see is that with the greater increases in CPU speeds available to end users, the hash breaking systems can be tested easier until a final reverse algorhythm is available for md5. There already is a site that will break your hash and give you *something* with the same hash, and it takes a couple days.

    --
    The dangers of knowledge trigger emotional distress in human beings.
    1. Re:Dirt by MarkByers · · Score: 1

      >> There already is a site that will break your hash and give you *something* with the same hash, and it takes a couple days.

      Can you provide a link please?

      --
      I'll probably be modded down for this...
    2. Re:Dirt by Anonymous Coward · · Score: 0
  50. Shadow Password Suite by jd · · Score: 1

    If you can modify the shadow password suite to support SHA-1 and/or Whirlpool, that would be wonderful. The greatest potential use for the attack is the discovery of aliases for passwords and other login tokens, as then you don't care if what you produce is nonsense. It just has to hash to the same key.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Shadow Password Suite by vipw · · Score: 1

      I think poisoning torrents(or other swarm downloads) has more potential. They are much easier to find than password files.

  51. Reference explained by gkwok · · Score: 1

    This is slightly modified quote from our beloved Office Space.

  52. Is a two-pass just as vulnerable? by davidwr · · Score: 3, Interesting

    I haven't had time to think this out, so I'm throwing it out for you guys:

    Is doing a 2nd pass helpful?

    In other words, if
    ABCfireDEF
    and
    ABCiceeDEF
    hash the same, does
    ABCfireDEFABCfireDEF
    necessarily, or even frequently, hash to
    ABCiceeABCiceeDEF?

    Even if it were to provide protection, in practical terms,
    1) other hashing althorithms are likely faster than "MD5 times two"
    2) there may be some - hopefully a lot fewer - places in a long file where where
    ABCfireDEFABCfireDEF
    can be replaced with
    ABCiceeDEFABCmeltDEF

    I'm beginning to like the "pick two algorithms and call me in the morning" approach.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:Is a two-pass just as vulnerable? by Anonymous Coward · · Score: 0

      No! Your idea is reasonable.

      The problem is, that if you find x and y, such that MD5(x)=MD5(y) (ignoring things like blocksize....)

      then MD5(x _ q) = MD5(y _ q) for any q.
      _ means concatenation. This is indeed a serious flaw in MD5 for many of it's applications (as the author points out).

      But then you will have MD5(x _ q _ x _ q) != MD5 (y _ q _ y _ q) for most x,y with MD5(x)=MD5(y) and most q.
      But there might be other problems introduced by "Double MD5".

      Is SHA1 also only "one pass" ?
      I.E. SHA1(x) = SHA1(y) => SHA(x _ q) = SHA(y _ q) for all q?

    2. Re:Is a two-pass just as vulnerable? by Anonymous Coward · · Score: 0

      According to wikipedia SHA-1 also processes the file in chunks, so once you find a collision '''stripwire''' will also work with SHA-1.

    3. Re:Is a two-pass just as vulnerable? by man_ls · · Score: 1

      Mathematically/combinatorically, MD5(x_q_x_q) would reduce to MD5(x_q) anyway, as the q_x_q is taking the place of the appended term.

      Thus you'd be nowhere better.

    4. Re:Is a two-pass just as vulnerable? by Anonymous Coward · · Score: 0

      I dont see your point.

      Let's define 2MD5(s):=MD5(s_s).
      Let x,y be a MD5-Collision: MD5(x)=MD5(y)

      So why should 2MD5(x _ q) = MD5(x _ q _ x _ q) ?=? MD5(y _ q _ y _ q) = 2MD5(y _ q),
      since q_x_q != q_y_q ???

      A 2MD5 collison wouldn't be bad either.

      2MD5 seems to be stronger against the "appendable cascade construction"-flaw.

    5. Re:Is a two-pass just as vulnerable? by gokeln · · Score: 1

      A second pass is only arithmetically better (not exponentially so). I know this is unintuitive, but it is true. The better approach is to use a stronger hash algorithm such as SHA-256.

      --

      There's no time to stop for gas, we're already late.
    6. Re:Is a two-pass just as vulnerable? by Anonymous Coward · · Score: 0

      But once you find a collision, SHA-256 again has the
      "appendable cascade construction"-flaw.

      (ok, it might be much harder to find a collision, but it only has to be done "once").

    7. Re:Is a two-pass just as vulnerable? by Anonymous Coward · · Score: 0

      but if md5(x_q) = md5(y_q), (i.e., postpending the same string to x and y preserves the collision), then wouldn't md5(x_q_x) = md5(y_q_x) (i.e., postpending the same string to x_q and y_q preserves the collision)? Now here's the key question: Does prepending have the same effect as postpending? I.e., would md5(y_q_x) = md5(y_q_y)? (This I'm not sure of. I know it is true for a CRC calculation but it might not be universally true for md5.) If the second assertion is true, then it follows from combining the two that md5(x_q_x) = md5(y_q_y), and from there it is but a tiny step to md5(x_q_x_q) = md5(y_q_y_q).

    8. Re:Is a two-pass just as vulnerable? by gedhrel · · Score: 1

      You are incorrect. Let's say you have two strings, X and Y, such that

      MD5(X,q) = MD5(Y,q).

      If you'd read the initial paper you'd know that X and Y are dependent on the MD5 initial vector (IV): that is, the starting internal state of the MD5 routine.

      The internal state will _not_ be the same (that is, it's highly unlikely) at the point where you've hashed (X,q).

      So X and Y are unlikely to produce a hash collision given the new (and arbitrary) state vector of the MD5 machine.

  53. Not always... by Goonie · · Score: 2, Insightful
    How do you know that the algorithms aren't going to be weak for the same text pairs?

    It's well known that encrypting twice doesn't always improve your security, so it wouldn't be surprising if hashing twice didn't always improve your security either.

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
    1. Re:Not always... by Anonymous Coward · · Score: 0

      You mean all this time I've been encrypted my most sensitive messages twice with rot13 messages and it wasn't improving my security? Aha! I bet if I start encrypting four times with rot13, nobody will ever be able to find my secret volcano lair!

  54. My humble proposal for a solution. by xv4n · · Score: 1

    MD5 every single byte (or better yet every bit) in the file and put the result alongside the original file. Voila! No smartypants would ever attempt to temper with that, bwahahahahah!

  55. Dijkstra Considered Dead by reflective+recursion · · Score: 1

    really. and it's about time this stupid meme died. it's all too easy to toss it about for dramatic effect when attention whoring.

    --
    Dijkstra Considered Dead
  56. You are missing the point. by Carlbunn · · Score: 1

    The real threat for it is for the p2p network. not because you will download a different version of the file, but the file is broken in various pieces. Imagine somebody released, dunno, some linux distro, over bittorrent. you will download the real file, and the fake one at the same time, leaving the final file useless, and probably ruining the release.
    I could see game and movie companies releasing bogus versions of their pirated apps to break the releases. after all, you cant de-compress corrupt zips, rars and etc. the musics would have cracks and noises because of the bogus versions. P2P as it is now would be ruined

    Jesus, I hope I'm wrong... I would have to acctualy buy MS crap.

    1. Re:You are missing the point. by pclminion · · Score: 4, Insightful
      Jesus, I hope I'm wrong... I would have to acctualy buy MS crap.

      Your statement is ironic in the extreme. The big risk here is NOT P2P apps. Here's the real risk.

      Using one of these collision generators, I can create two x.509 certificate requests which have the same MD5 hash. One request says, "I am John Smith, kshdfkhs8i76y238888888" and the other request says, "I am Microsoft Corp., oiushir87dsfhgkjshdfg"

      Now, I get Verisign to issue me a certificate for the first request. Since the hash is the same, I can rewrite the certificate to say that I am Microsoft Corp, and nobody will ever be able to tell the difference. Now, I am able to sign code as if I were Microsoft, and Dominate The Earth.

    2. Re:You are missing the point. by jjon · · Score: 3, Funny
      [... complex plan snipped ...] Now, I am able to sign code as if I were Microsoft,

      You can just ask Verisign for a certificate in the name of Microsoft, and they'll give you one. Much simpler.

      It's happened in the past.

    3. Re:You are missing the point. by Alsee · · Score: 1

      Now, I am able to sign code as if I were Microsoft, and Dominate The Earth.

      What are we going to do tomorrow night?
      NARF!

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  57. "Doppelganger" "my term" by Maxite · · Score: 0, Flamebait
    ... essentially, we can create 'doppelganger' blocks (my term)... Doppelganger is a term that I have seen used on the TV show Sealab 2021 quite a few times.

    www.adultswim.com has information on the TV show of Sealab 2021 if anyone is interested.

    Quite interesting to see someone go around and claim words as their own. In that case, I claim the word "the". Anyone who wishes to use the word "the" can submit enough money to fill a Library of Congress for a lincense.

    --
    Ah, you found me!
  58. The RIAA is safe by Anonymous Coward · · Score: 0

    They only compare on P2P file names. In fact, they apparently only compare very loosely on that.

  59. Time to move on by slavemowgli · · Score: 1

    Time to move on. And, while we're at it, let's skip SHA1 etc. as well and go directly to SHA512. That should last for a while.

    --
    quidquid latine dictum sit altum videtur.
    1. Re:Time to move on by Anonymous Coward · · Score: 0

      no way, i would wait for SHA1.NET 3.0

  60. You're wrong. by Piquan · · Score: 4, Informative

    He can create a file that MD5sum's to the same result as a legitimate file, but does not have full control over the content or size of the result (making this a mostly useless avenue of exploitation except for people who want to spread trash on P2P networks -- I.E. it shouldn't particularly bother anyone except people who already don't care about security).

    Suppose you're storing passwords as encrypted hashes, so that intercepting the hashes doesn't tell you what the password is. But if you can generate a password to match that MD5...

    You know that GPG keys are identified and signed by their MD5 hashes? Suppose that I can generate a GPG key that would be identified as yours, and distributed it.

    Or he can create two files that MD5sum to the same result. But he has to have control over both files, which offers effectively no advantage to someone who is trying to spread malware or tamper with existing archives that have been MD5summed.

    There's a coin-flipping protocol that goes as follows. Suppose that Alice and Bob want to flip a coin (over the Internet), but they don't trust each other.

    1. Alice generates a file with random data.
    2. Alice sends Bob the MD5 hash of the file.
    3. Bob picks a bit in the file, and whether he thinks it's a 0 or a 1.
    4. Bob wins if and only if he picked right.
    5. To verify, Alice sends Bob the file she generated at the beginning.

    Now, suppose that Alice generated multiple files in step 1. When Bob makes his guess, she tries to pick a file that will make her win. If she generated only two files, completely randomly, this would let Alice win 75% of the time.

    These are just the first ideas I thought of. If I were looking for other problems, I'd think about undeniable signatures, keysigning (which as GPG and X.509 SSL are heavily based on) and other specialized signature systems. In particular, I expect that the first type of crack could cause issues with SSH keys, both user keys (used for authentication) and host keys (to prevent man-in-the-middle attacks).

    Digital signatures are used for much more than just testing for file tampering.

    1. Re:You're wrong. by cduffy · · Score: 1

      You know that GPG keys are identified and signed by their MD5 hashes? Suppose that I can generate a GPG key that would be identified as yours, and distributed it.

      Finding a collision is one thing. Finding a collision that happens to be a member of a valid GPG keypair is quite another. (IOW: Let's say you came up with a valid public key that has the same md5sum -- that doesn't net you a private key that matches it, or a way to make one).

      It's bad, but not that bad.

    2. Re:You're wrong. by Anonymous Coward · · Score: 1, Interesting

      You are correct that digital signatures w/ certificates is where the real risk lies for a flaw in MD5. Specifically, certificate chains are at risk, since you cannot truly validate a chain. This will have no affect on:

      "In particular, I expect that the first type of crack could cause issues with SSH keys, both user keys (used for authentication) and host keys (to prevent man-in-the-middle attacks)."

      The server has a store of acceptable public keys, third paty observable random data from both the client and server are combined and hashed using SHA-1, that data is then signed by the client's private key. Being able to come up with substitute md5 or sha-1 data will not affect the outcome of the signed data algorithm.

      For server authentication a certificate chain could be at risk. Although most SSH client implementations simply cache the public key, so they are vulnerable to first time active attacks.

      An algorithm like the following would be helpful for certificate chains. See original zero score post here (http://developers.slashdot.org/comments.pl?sid=13 1994&cid=11024443)

      H = H1(m + 0) + H2(m + 0)

      for counter = 1 to 999
      H = H1(m + counter + H) + H2(m + counter + H)

      return H;

      H1 and H2 are crypto hash algorithms. This approach is very strong, since both hashing algorithms are interleaved together. If one hashing algorithm is found to be weak, and other is strong, attacks will fail. Additionally, even if both algorithms are found to be weak interlacing and the number of rounds make it much more computationally difficult to attack. In comparison to DSA or RSA signature verification algorithms this additional work for generating the hash is very CPU light weight.

      Just finishing up the patent application. Done.

      Hey Moderator, this has to be worth more than a zero. Sheesh.

    3. Re:You're wrong. by Anonymous Coward · · Score: 0

      Suppose you're storing passwords as encrypted hashes, so that intercepting the hashes doesn't tell you what the password is. But if you can generate a password to match that MD5...

      It doesn't work this way at all. If all you have is the MD5, this 'attack' is useless. You must have a complete copy of the original, non-hashed input to generate a file that would have the same MD5.

    4. Re:You're wrong. by XO · · Score: 1

      so you replace their password data with passwords that MD5 to the same thing, but they are not the same thing. Boom, suddenly no one can login to the box. Sounds like that could be pretty devastating, too.

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
    5. Re: You're wrong. by Omniscient+Ferret · · Score: 1

      You know that GPG keys are identified and signed by their MD5 hashes?

      I vaguely remember that the short key ID used a flimsy hash, but to avoid that you only had to verify the key length.
      Looking around, MD5 looked dubious in 1997, and alternatives - like SHA1 and RIPEMD-160 - got incorporated to PGP.
      So. I don't buy this. PGP.com knows about it.

    6. Re:You're wrong. by Piquan · · Score: 1

      Finding a collision is one thing. Finding a collision that happens to be a member of a valid GPG keypair is quite another.

      This is true. I added the GPG bit at the last minute, and didn't give it due thought. I should have put this in the "speculations" paragraph, since I'm rather out-of-date on GPG.

      that doesn't net you a private key that matches it, or a way to make one

      If I were going to investigate this further, I'd consider whether a randomly-generated public key could easily be made into a private key. Specifically, the strength of a public key is in the difficulty of factoring (at least under RSA; I haven't kept up with what's being used). While factoring the product of two large primes is difficult, the more general case of factoring random numbers is much easier. Perhaps this could be exploited to replace a public key with one that can be easily factored. Again, I haven't kept up with the technology, so this is still just speculation, but I think it's a valid line of inquiry.

      More generally, you don't have to substitute the key with one that you create. Just substitute it with a weaker key.

      Being able to substitute public keys changes the threat model significantly, and in a way that the designers didn't account for. I still think that it seems entirely possible that this would be something that could generate a GPG attack.

    7. Re:You're wrong. by Piquan · · Score: 1

      It doesn't work this way at all. If all you have is the MD5, this 'attack' is useless. You must have a complete copy of the original, non-hashed input to generate a file that would have the same MD5.

      Traditionally, yes, and that's the point of using MD5 in password files. But I was specifically working under Sheetrock's scenario:

      He can create a file that MD5sum's to the same result as a legitimate file,

      Sheetrock felt that this sort of attack on MD5 would not have any significant impact, and I was disagreeing.

      Now, I'm not saying that the attack in the article allows this to happen. I haven't studied the attack in question closely, but it looks like it only allows "weak" collisions, which we all thought were on the way for MD5 anyhow. The password attack we're discussing here requires "strong" collisions, which it doesn't look like Wang's attack allows.

    8. Re: You're wrong. by Piquan · · Score: 1

      I vaguely remember that the short key ID used a flimsy hash, but to avoid that you only had to verify the key length.

      If you and I are remembering the same thing, that was under a different attack. IIRC, that wasn't even an attack on the hash, but just on the way that fingerprints were generated. But it's been a long time, so I may be misremembering.

      Wang's attack, the one in the article-- which doesn't, AFAICT, permit this sort of attack (when I mentioned GPG I was talking about a different type of attack that Sheetrock hypothesized)-- does use same-length data.

      Looking around, MD5 looked dubious in 1997, and alternatives - like SHA1 and RIPEMD-160 - got incorporated to PGP.

      Yes, although many legacy keys are still around-- mostly because people kept using PGP 2.6.2 for so long. My keyring is 10% legacy keys. Since PGP encourages you to get lots of signatures on your keys, then you have a decent chance of any given key having signatures that you can (under this sort of attack) subvert. And if it doesn't, then try to subvert up the keychain some; you have to swap out more keys, but there's more keys to choose from.

    9. Re:You're wrong. by Gemini · · Score: 2, Informative

      You know that GPG keys are identified and signed by their MD5 hashes?

      They're not. The old PGP 2.6 keys are, but GnuPG generates OpenPGP keys that use SHA-1.

      GnuPG will use an already generated PGP 2.6 key, but will not make more of them.

    10. Re: You're wrong. by Gemini · · Score: 1
      I vaguely remember that the short key ID used a flimsy hash, but to avoid that you only had to verify the key length.


      If you and I are remembering the same thing, that was under a different attack. IIRC, that wasn't even an attack on the hash, but just on the way that fingerprints were generated. But it's been a long time, so I may be misremembering.

      Here's the deal: old PGP 2.6 keys used MD5 as the hash algorithm to make a fingerprint. Due to a flaw in the design, the key bits, but not the length of the key was fed into the hash. Thus, it would be possible to manufacture a key with a fingerprint that matched someone elses. The trick was to 'slide' bits from one value to the next.

      i.e. is it:

      "1111111122222222" where the two values are "11111111" and "22222222"

      or is it:

      "1111111122222222" where the two values are "111111" and "1122222222"

      If you give the key length, it foils the attack since doing this trick ends up with a different sized key than the original.

      As you say, it is not an attack on the hash. It was a flaw in the old PGP 2.6 protocol. It hasn't really worked that way since 1997 or so, though some people still use their 2.6 keys.
    11. Re:You're wrong. by Anonymous Coward · · Score: 0

      Actually that is not what would happen. You would happen to have two passwords that both work... A cleartext password is converted using MD-5, and then compared (memcmp) with a stored MD5 password.

    12. Re:You're wrong. by Anonymous Coward · · Score: 0

      Ah, but you forgot about Chuck. Whenever Alice and Bob around around, Chuck's always lurking behind the next IP packet to screw everything up. Damn that Chuck! ...

      Someone needs to come up with better names than Alice, Bob, and Chuck. I smell a new paper: "Use of standardized names in security papers not a good idea; Alice, Bob, and Chuck considered harmful."

    13. Re:You're wrong. by Piquan · · Score: 1

      Ah, but you forgot about Chuck. Whenever Alice and Bob around around, Chuck's always lurking behind the next IP packet to screw everything up. Damn that Chuck! ...

      Usually, if someone's trying to screw things up, it's Mallory.

      There's actually a good reason to use the canonical names. It makes it easy to remember who's trying to do what. If I see "Joe and Jill are exchanging messages, and Jack is trying to read them..." then I have to remember their roles instead of concentrating on the protocol. If I see, "Jack sends Jill the hash he computed," then I may think that this is part of the normal flow rather than data injection, because I forgot that Jack is the evildoer.

      But after reading 50 papers about Alice, Bob, Eve, Mallory, and the rest of the gang, you have everybody's motives down and can concentrate on the data flow instead, which is the point.

    14. Re:You're wrong. by gedhrel · · Score: 1

      The attack in the initial paper does not permit you to discover a plaintext that has a desired hash. It permits you to discover two plaintexts that have the same hash; you have no control over what that hash will be.

      That is, for an abitrary (fixed) prefix string P, you can determine X=x(P) and Y=y(P) such that

      X != Y
      MD5(P,X) = MD5(P,Y)

      however
      MD5(P,X) != MD5(P) with very high probability.

    15. Re: You're wrong. by Omniscient+Ferret · · Score: 1

      If you and I are remembering the same thing, that was under a different attack. IIRC, that wasn't even an attack on the hash, but just on the way that fingerprints were generated.

      I think you're right. I was trying to work out where you were coming from.

      Yes, although many legacy keys are still around-- mostly because people kept using PGP 2.6.2 for so long. My keyring is 10% legacy keys.

      Oh. That's not good. I should have known that people would still be using old versions, and/or overriding the suggested expiration dates...

      So. The key signing mechanism for RSA keys uses the MD5 hash. And RSA got included with GnuPG 1.0.3.

      Looking around, there's someone who still recommends PGP 2.6 because the source code is smaller, so easier to audit.

      This is kind of depressing, isn't it?

  61. Mod parent up! by spuzzzzzzz · · Score: 1

    That was hilarious

    --

    Don't you hate meta-sigs?
  62. Well, in plainer english... by raehl · · Score: 1

    The first case is that if you have any given file, you can take a subblock of that file and swap it with one of a number of other subblocks and keep the same checksum. Actually, that's not news, what's news is that you can COMPUTE what those sub blocks are.

    The second case is that if you have two files that share the same checksum, you can add the same arbitrary code to both files and they will still share (a new) checksum.

    The exploit in the first case is that if you can pick a subblock and then compute new "doppelganger" blocks that you can replace the original subblock with without changing the file's checksum, you can (if not now, someone will probably figure out how to do it in the future) also pick a sub-block and compute new sub blocks that result in a given checksum Y. So I take my original file, and replace some subblock with a subblock of my choosing. This results in a new checksum. Then all I have to do is pick a different subblock and compute a replacement block for that that restores the checksum back to the original value.

    Basically, make one change to install my malicious code, then use my ability to compute subblock changes that result in particular checksums to swap a different subblock to restore the original checksum value.

    The exploit for the second case is I write a block of malicious code, and then create another file with harmless code that has the same checksum. I then add a real program to the harmless code and distribute it. Some time later, I add the same program to the malicious code (resulting in the same checksum as harmless code + program) and I start distributing the file with the malicious code instead of the file with harmless code.

  63. Everything is considered harmful eventually by Peter+Cooper · · Score: 1

    Eggs, .NET, Apple, whatever.. it's basically ripe to become another in-joke here at Slashdot. In Korea, only old people are considered harmful someday.

  64. Hello Master of Obvious. by raehl · · Score: 1

    Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with.

    Well duh, that's what separates GOOD protective schemes from BAD ones, and what makes GOOD ones so valuable - they're the ones you can't circumvent.

    Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files.

    Which two files? If you're comparing the file you downloaded from an untrusted source to the file you downloaded from the trusted source, why did you bother downloading the file from the untrusted source in the first place?

  65. Remember NATAS? by Spy+der+Mann · · Score: 1

    This tricky DOS virus that appeared in the early 90', appended certain number of 0's to the executable, so it would have exactly the same CRC.

    Expect a couple (+ ???) of years of cryptanalysis on MD5, and you'll just get that.

    Still, what _I_ want to see is an ANTI-MD5 algorithm. Given an MD5 Hash, obtain a certain string that will generate it. Now _THAT_ would be something neat :)

    1. Re:Remember NATAS? by Anonymous Coward · · Score: 0

      bigint i
      while(1) {
      if (md5sum(bitstringof(i)==WantedHash) break;
      i++;
      }

      Oh, you wanted it to be efficient, too?

  66. what kind of language is this? by khrtt · · Score: 1

    "..to be considered harmful someday"? I mean, gimme a break, dude, if you think its harmful just write "MD5 is harmful", but "to be considered...someday"?

    1. Re:what kind of language is this? by alienmole · · Score: 1

      First, the title is a reference to a famous computer science paper by Edsger Dijkstra, Goto Statement Considered Harmful.

      Second, this MD5 weakness is not yet actually harmful, because no-one has yet devised a successful attack which exploits it. However, it could be considered harmful someday.

  67. All your hash are belong to us by Anonymous Coward · · Score: 0

    YA/.IJ - Yet Another Slashdot In Joke

  68. You missed the point. by raehl · · Score: 2

    The point is not that there is collision. The point is that given a particular file, you can compute other files that collide with that file in an amount of time many orders of magnitude less than it would take if you randomly tried files until you hit one that happened to collide.

    1. Re:You missed the point. by onemorechip · · Score: 1
      The point is that given a particular file, you can compute other files that collide with that file

      No, that would be a preimage attack (read the FAQ). The md5sum attack is a collision attack: you can create two files that have the same md5sum, but you can't specify either of the two files in advance.

      --
      But, I wanted socialized health insurance!
  69. The biggest problem is still human falibility by syousef · · Score: 2, Insightful

    Be honest. How many of you have checked the MD5 sums on a file with a TRUSTED source, as opposed to from the same source you got the file? How many of you do this regularly?

    THAT is the biggest problem with MD5 for most users.

    --
    These posts express my own personal views, not those of my employer
    1. Re:The biggest problem is still human falibility by ilithiiri · · Score: 1

      Every time I update my system, it automatically checks the MD5 sum of the .tar.gz or .tbz2 of the source or binary I am currently downloading in order to update my system. That's the way Gentoo updates itself. It would be nicer, having seen those MD5-junk files, using better algorythms like SHA-256, though. IMHO it's ok for now: .tar.gz won't decompress cleanly if some bytes are corrupted.. It's the same as a linux distro: you download it and some bytes fail; you then burn the iso to cd, and it says that a file is corrupted. You should've checked the MD5 before *and* after burning :")

      --
      If anyone can hear me, slap some sense into me But you turn your head, and I end up talking to myself
    2. Re:The biggest problem is still human falibility by syousef · · Score: 1

      What you're talking about is a mechanism for checking for corruption, which I grant you is one benefit of using MD5.

      However if you're talking about security, verifying the MD5 matches an ISO is meaningless unless the MD5 sum is from a trusted source. In the case of your Gentoo upload you're trusting whatever server/mirror you're using to update hasn't been corrupted.

      --
      These posts express my own personal views, not those of my employer
  70. Almost forgot by jd · · Score: 3, Interesting
    Whirlpool is a 256-bit hashing algorithm, derived from the Rijndael encryption algorithm. Rijndael is known to be strong, and has been approved by NIST, but the conversion to a hash function has not been sufficiently tested.


    Where time isn't critical (eg: creating and validating checksums for files), I'd say use both. The overhead isn't great, and you'd get much more security.


    Where time is critical AND you don't have to be concerned with computers not under your control, use Whirlpool. Rijndael is fast, SHA-1 is slow. Whirlpool also offers a longer hash string than SHA-1.


    In any other situation, use SHA-1. Whirlpool might turn out to be the greatest algorithm out there, but that doesn't help if you're trying to talk to a remote computer that doesn't support it.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Almost forgot by wirelessbuzzers · · Score: 1

      Whirlpool is a 256-bit hashing algorithm, derived from the Rijndael encryption algorithm. Rijndael is known to be strong, and has been approved by NIST, but the conversion to a hash function has not been sufficiently tested.

      Um. No cryptographic algorithm is ever known to be strong, except for theoretical things like the one-time pad. Rijndael is believed to be strong, but who knows whether it will be broken in 20 years?

      Where time isn't critical (eg: creating and validating checksums for files), I'd say use both. The overhead isn't great, and you'd get much more security.

      Where time is critical AND you don't have to be concerned with computers not under your control, use Whirlpool. Rijndael is fast, SHA-1 is slow. Whirlpool also offers a longer hash string than SHA-1.


      Either way, make it an option which hash to use. That's fast, and if one is broken, you can fall back on the other.

      --
      I hereby place the above post in the public domain.
    2. Re:Almost forgot by Tassach · · Score: 1
      No cryptographic algorithm is ever known to be strong
      Half right. First off, "strong" isn't a binary state, it's a relative term. Any cypher except a properly implemented one-time pad can be broken, given enough time. A "strong" algorithm is one which will protect it's payload for enough time so that when it is eventually broken the information protected no longer has any value to the attacker. What constitutes "enough time" varies by application and the sensitivity of the data. DES might not be secure enough to protect your secret plans for world domination against the NSA, but it's more than adequate to keep your wife from reading your love letters to your mistress.

      Secondly, you *can* prove that a crypto algorithm is secure against known attacks, which is good enough for most purposes. Any algorithm which resists all known attacks with no loss of effective key length is, for all practical purposes, "strong".

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
  71. what's next.... by SatanMat · · Score: 1

    okay, so if MD5 has issues that can now be explioted, or at least tampered with, is it time for a new checksum? being clueless in this respect can we call it MD6?

    1. Re:what's next.... by tepples · · Score: 1

      is it time for a new checksum?

      Yes.

      being clueless in this respect can we call it MD6?

      No, we can call it SHA-256.

    2. Re:what's next.... by m50d · · Score: 1

      People who want secure hashes have already been using SHA-1 for some time. (Secure Hashing Algorithm, if you want to know)

      --
      I am trolling
  72. -5 load of crap by Anonymous Coward · · Score: 0

    why do you waste your time writing this stuff? graduate from high school and come back later.

    1. Re:-5 load of crap by mordors9 · · Score: 1

      Kind of puzzled by this. Actually my youngest child will be graduating this year.

  73. Duh by Anonymous Coward · · Score: 0

    But MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be applied to both files and they'll still have the same hash.

    Duh!

    An analogy: the results of 4x8 and 16x2 are both 32. Now take the results of both and multiply by 3. Great Scott! We get 96 for both!

    Even SHA is constructed this way. Since they're byte-wise and sequential with the input, you'll always get this problem. The only way to eliminate this behavior is to random-seek the file or input (which means you need to know up front how many bytes are going in -- not always possible), but if you did it this way, the hash would always come out differently, thus making the hash useless. If you wanted to keep the seek strategy the same each time to make it useful, then the "additive" problem would still occur -- instead of being at the end of the file, it would just be woven into the seek pattern.

    There's nothing new here.

  74. MD5 dosnt belong in security by Anonymous Coward · · Score: 0

    It has been said time and time again by the paranoid, and we all knew that this was possible. MD5 is supposed to be used to verify that the thing you have transfered isnt corrupted. THAT IS ALL :) none of this "omg lets use it for passwords because its Really Secure", because it isnt :P

  75. Asymetric Key Encryption by dbacher · · Score: 1

    MD5 isn't the right way to detect tampering.

    Asymetric key encryption is. You have a trusted authority issue a key, and sign it. You can verify the key file hasn't been tampered with, and they can revoke the certificate. It's harder to fake, even if they can spoof a MD5, because you can connect out to the certificate authority and ask if the certificate is valid, and they can revoke them if there's problem.

    If you're downloading a package, presumably you're wanting to know that it's the right package, and that it's from whomever it says its from. An MD5 hash does not get you that capability, asymetric encryption does.

    Only someone who knows your distribution's private key can produce your distribution's packages. The public key is verified by a neutral 3rd party that can revoke accidents or hackers once they're discovered.

    So you know both that the distribution wasn't tampered with (because it decrypts) and that the distribution is from who they say it is from (because it decrypts), and you have a neutral 3rd party vouching for the facts who has a vested interest in retaining business by not letting people fake other people's signatures.

    --
    If your code is acting bloated, and is running rather slow, it's likely and predicted that some loops you will unroll.
    1. Re:Asymetric Key Encryption by Anonymous Coward · · Score: 0

      Problem is that signing the whole file just isn't practical.

      A signature is made by first hashing the data in question (using for instance MD5 or SHA-1), and then signing it with your private key.

      The hash on it's own is fine for detecting tampering, the point of signatures is to prove who created the content.

    2. Re:Asymetric Key Encryption by m50d · · Score: 1

      The problem with that is that the "signature" will be larger than the file itself. Not good for 700mb isos over dialup.

      --
      I am trolling
    3. Re:Asymetric Key Encryption by m50d · · Score: 1

      Sorry to reply twice, but I've just realised why this is completely impractical. PGP uses session keys because encrypting a typical email takes too much computing power. The power or time needed grows exponentially with the size, a typical 700mb iso would take ~2^3000000 times as long as a typical 700b email to encrypt. That's why we always sign hashes rather than files themselves, which means attacks like this are important.

      --
      I am trolling
  76. ALL hashes will have this weakness by Euler · · Score: 1

    I think everone is missing the analysis done by the author of this attack (including the author).

    The point is that the malicious code is in both versions of the downloaded package. The swappable payload just serves as an activation key. The actual mal-code can be present in both packages 'hidden inside an AES encrpted data section.'

    Yes MD5's have collisions, so do any current or future hashes or checksums. The point is it's still very very unlikely that anyone can insert arbitrary and functional code into *someone else's* binary package and get the same checksum.

    It is still safe to use MD5 for passwords (remember to salt though), it's still fine for checksumming files.

    The issue is whether a signed/hashed piece of code has a hidden bad payload or not. A hash will not protect you from mal-ware if the creator of the package did this intentionally. The trigger could simply be a date check, or your ip address or some other external data. The only effect of making two versions of the same package is that you can deploy the package to everyone without detection after the first package passed review. But like I said, you can pass a tainted package past review if the payload isn't triggered. Finally, the security of a checksum assumes that the package is somehow reviewed by a credible source, and that they catch all possible obfuscated attacks.

  77. Double Hashes? by Grail · · Score: 1

    From what I read, creating a hash-equivalent file for one hashing mechanism requires different and incompatible changes to those required for making a hash-equivalent file for a second hashing mechanism.

    Would it therefore be safe to calculate two hashes (MD5 and SHA-1) for the file, distribute the file and the two (or three, or four) separate hashes, and use the incompatible nature of the hashes as an extra "guarantee" of data integrity?

    1. Re:Double Hashes? by gokeln · · Score: 1

      Only marginally so. If you're going to go to the trouble of using another hash, why not go all the way and select a really good one: SHA-256? Then, since you have a really good one, why not just dump MD5?

      --

      There's no time to stop for gas, we're already late.
    2. Re:Double Hashes? by Grail · · Score: 1

      I just feel more comfortable using two orthogonal hash systems. Otherwise you're putting all your eggs in one basket (albeit a newer, denser weave basket, but one basket all the same).

  78. Happy birthday? by tepples · · Score: 1

    But for someone putting money on it (me) it's safest to take into account the possibility that the OP would try to come up with a collision using a birthday attack.

    But doesn't Warner own IP in the birthday attack? Snopes seems to think so.

  79. I can tell you by Anonymous Coward · · Score: 0

    that I know for sure that the MD11(formerly DC-10) is harmful today. It has been for over 30 years.

  80. Yeah, and imagine... by karnat10 · · Score: 1

    ...somebody might press the RED BUTTON!

    Wake me up when paranoia levels are back to normal

  81. A: MD5 of an empty string by dstone · · Score: 1

    I'll take Secure Hasing Trivia for $500, Alex.

    Q: What is d41d8cd98f00b204e9800998ecf8427e ?

    1. Re:A: MD5 of an empty string by Refrozen · · Score: 0

      Does it have anything to do with goats who cannot see? Or tubs with girls?

    2. Re:A: MD5 of an empty string by MarkMcLeod · · Score: 0

      It's the hash of a zero-length string.

  82. More detail by Anonymous Coward · · Score: 0

    Here is another paper that examines the implications of the Wang paper as well as some analysis of the collision.

  83. Silly by theLOUDroom · · Score: 1

    Someone with the knowledge and will and time can come up with a way to circumvent almost any protective scheme they come up with. Likely the only way to really safeguard something like this is to do a very time consuming and cpu intensive comparison of the two files. It will come back to "do you trust the source of the file".

    That's a silly conclusion to reach.
    It's like saying that we should give up on cryptography because some has found a way to crack the Caesar cipher.

    Just because ONE IMPLEMENTATION of a technology is flawed does not mean the entire technology is flawed.


    Hashing is a VERY useful process and only ONE particular implementation is being challenged here. The fundamental concept is still sound and extremely useful. Abandoning the entire idea would just be foolish.

    --
    Life is too short to proofread.
  84. Computing both is negligible with current CPUs by NZheretic · · Score: 1
    gokeln wrote: "The resulting improvement in security is not worth the additional cost of computing both.".

    That might have been true before 2000, but the effort to calculate both algorithm in parallel ( not SMP, just passing the input bytes to each function ) is negligible with modern CPUs, both PC and embedded. The limiting bottlenecks are the network, hard drive and memory, not the CPU power required

    gokeln wrote: "You do not get exponentially improved security"

    When anyone uses the phrase "exponentially improved security", it triggers a link to Avoiding bogus encryption products: Snake Oil FAQ.

    1. Re:Computing both is negligible with current CPUs by gokeln · · Score: 1

      Sorry, I was unclear. What I meant to say was not the cost of the computation itself, but rather the cost of the effort to implement another equally-weak algorithm. Better to choose a single strong algorithm rather than go to the effort of implementing another weak one and combining MD5 with it arithmetically.

      WRT "exponentially improved security", there are instances when you do get it. For example, doubling the key size in some algorithms such as Rijndael. I agree it smells of snake-oil.

      --

      There's no time to stop for gas, we're already late.
  85. ROT-13 is a group... by Goonie · · Score: 1

    I was going to use that example in my post, too. Ceasar cypher keys being a group and all :)

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
  86. Off topic: Re:You are missing the point. by mrbuttboy · · Score: 1

    I just couldn't read your sig with out telling you how wrong you must be. I REFUSE to accept that our president can preform math of any sort with any great skill.

    end-of-line

    --
    What do you say to the man that has nothing? Cast it away!!
    1. Re:Off topic: Re:You are missing the point. by Alsee · · Score: 1

      Quit yer Bush bashing! Bush has extordinary mathematical skills! Who else on earth could count the internets?

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  87. OSPF? by Anonymous Coward · · Score: 0

    How might OSPF be affected by this? I use md5 authentication in exchanging routing updates in my networks...

  88. Attack over time? by Anonymous Coward · · Score: 0

    Say someone created two files with some simple but "useful" app installed. One version was fine. The other version had some sort of time release backdoor installed. 1 year later after the hype really picks up. *BOOM*

  89. Wow cowboy's... by pVoid · · Score: 1
    I know everyone is thinking "damn pirates, they can get their corrupted files", but another good example of something we don't want is OS signed files.

    Example scenario: some OS has a list of MD5 hashes of all of it's core files (drivers etc). At boot, it checks to see if the files are untampered, and if they are warns the user or halts the boot.

    Now imagine you could maliciously inject bad code (worm/trojan/virus/spyware) into this system, and you had a tool that knew exactly what extra random data to add to make the file have the same MD5 hash after your addition.

    If that becomes possible, then Hell will freeze over. Unless, of course, people don't use MD5 and switch to SHA1.

  90. Hashing and encryption by NZheretic · · Score: 1
    I don't quite agree. There is a major difference between encryption and hashing. Combining two or more weak encryption algorithms in serial does not really provide better encryption. However combining two or more relatively weak, but dissimilar hashing algorithms in parallel does deliver a magnitude ( note not exponential ) level of improvement.

    The potential solution space that exists to create a binary that generates a differing binary that generates the same checksum for each algorithm is going to be a lot smaller than switching to less unproven single algorithm such as SHA-256.

  91. MD6 by Danathar · · Score: 2, Funny

    That's OK..just invent MD6!

  92. they pad signatures... by Anonymous Coward · · Score: 0

    Presumably Verisign would check the request, and see it is of the form that can be spoofed. So then when they pad it, they would pad it with info that makes it not spoofable. Or else they'd refuse to sign it and make you use a slightly different form that isn't spoofable either.

    You're making up problems that don't exist.

    1. Re:they pad signatures... by Anonymous Coward · · Score: 0

      Sigh. YOU FAIL IT.

    2. Re:they pad signatures... by Anonymous Coward · · Score: 0

      You clearly have no idea what you're talking about. Where does this information that makes it magically non-spoofable come from? rofl

  93. What's the big deal? by demon_2k · · Score: 1

    I don't see the big deal. What happened to CRC?

    All that means is that we will move to another digest algorithm when this one will become too easy to tamper with. A lot of people already provide sha-1 or a signiture of the files they provide. And that will be our solution until it fails. Just like cryptography.

    Strong -> stong -> weaker -> weak -> time to move on

  94. I like that, an MD5 with a vector by Anonymous Coward · · Score: 0

    So instead of just the MD5 value you would release a binary with an md5 and the md5 of the same file with an arbitrary string added to the end.

    Then you run the md5 tool with the added string as a flag and it gives you 2 MD5 results.

    Hell, make the md5 generate a little xml file with the original md5, an arbitrary random text string and the md5 with the random string added.

    Then mod MD5 to look for filname.md5 next to the filename and say weather the file passes the test with the command 'md5 filename'.

    Good times. Gives MD5 a few years, and perhaps we need to do the same for all current and future hash algoriths now as well, always use this vector approach to make manipulation of the values cost prohibitive.

  95. Why is this good for Linux? by Anonymous Coward · · Score: 0

    So people are off trying to crack the RSA key for signing X-box software by Microsoft. That doesn't matter anymore, all they need to do is find an MD5 signature which is identical to one of the many pre-existing X-box games MD5 signatures, viola, you have a working Linux implementation without any mod chips.

  96. There is money to be made.. by Anonymous Coward · · Score: 0

    I think this company allready try to profit from it(finnish text)?

  97. What about SHA-1? by RussP · · Score: 1

    Does this have implications for SHA-1, or is it still secure?

    --
    I watch Brit Hume on Fox News
  98. Two hashes by LentoMan · · Score: 1

    I'm guessing using two different hashing methods and putting both checksums up would make it harder to use exploits like this.

  99. Cryptographic stacking potentially harmful by j.leidner · · Score: 3, Insightful
    Let's say you break a file into blocks, encrypt those blocks with Rijndael or Serpent using a chaining method that authenticates the prior block, digitally sign the result using (seperately generated) RSA, DSA and ECC signatures in turn, and generate SHA-1 and Whirlpool checksums of both the encrypted and unencrypted file. True, you'd spend longer validating and decrypting the unholy mess generated than you'd spend downloading it, but I think you'd be fairly safe in assuming that the file was what it claimed to be.

    Maybe, maybe not. The new technique would certainly be more difficult to analyse mathematically, but just stacking complicated but flawed methods does not necessarily result in a more secure method: typically, the security of the weakest link determines the security of the whole system.

    What you say reminds me of Don Knuth's experience when he wrote his first innocent 'super' pseudo random number generator (reported in his Art of Computer Programming, Volume 2, page 4: "Algorithm K" ;-): he composed all sorts of complicated operations, but had to learn the resulting number sequence was far from more (pseudo-)random, in fact much worse than the the standard 1-line modulo function.

    Another case of (false sense of) security through obscurity?

    --
    Try Nuggets , our SMS search engine which uses question answering technology; now available across the UK.

    1. Re:Cryptographic stacking potentially harmful by jd · · Score: 1
      It is certainly possible that my method would be flawed through a weak link. I attempted to avoid that by having verification and authentication at each level, and between each block of the file, so that tampering or falsifying files would be extremely difficult.


      It would also make transmission over unreliable lines easier, as you'd be able to spot and re-transmit corrupt blocks.


      On the other hand, there are now more algorithms which have the potential for being broken. There is also much more meta-data which may provide an attacker enough insight to avoid having to do a completely exhaustive key search.


      As I see it, the needs for assurance on reliability vs. the needs for assurance of unbreakability vs. the needs for sufficient efficiency to be usable are all conflicting requirements. What's needed is some kind of mapping function that can take your requirements and derive the best crypto solution for it.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  100. No. by warrax_666 · · Score: 1
    Make sense?


    No. Hashes are typically much shorter than files (or even bits of files) and using hashes (or hash trees) is still much more efficient than sending all the data twice.

    Also, it also allows network error detection which is essentially free and more robust than just sending data twice and comparing -- if the network for has a nasty habit of throwing bits away during long sequences of 0s, your method might not detect it, but a good hash would (with overwhelming likelihood).
    --
    HAND.
    1. Re:No. by XO · · Score: 1

      I was only referring to the use for P2P applications to determine if two different people were serving the same files, in the case of someone poisoning a P2P share.

      --
      "Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
  101. Gnutella is NOT using MD5 by ram4 · · Score: 1

    Correction: Gnutella is using SHA1, not MD5.

    Moreover, Gnutella is using an alternate hash (Tiger) to construct a hash tree, and the root of that hash tree is appended to the SHA1 of the file to make a unique "bitprint", as understood by Bitzi (http://www.bitzi.com/).

    1. Re: Gnutella is NOT using MD5 by Omniscient+Ferret · · Score: 1

      Thanks. I thought that tiger tree hashing could catch this with the right block size. Do you know the hash tree block size for Gnutella & Bitzi?

    2. Re: Gnutella is NOT using MD5 by ram4 · · Score: 1

      Yes, the basic blocks are 1K blocks. That means that for large files, the whole tree is not kept, only the topmost levels (the topmost 10, I'd say, but that depends on clients).

  102. Thanks! by Grendel+Drago · · Score: 1

    Impressive! Thank you.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  103. old news by CobwoyNeal · · Score: 1

    MPAA-hired thugs spamming the P2P networks with utter crap have been doing stuff like this for a long time

  104. Someday? maybe not in your life time. by vicaya · · Score: 1
    If you read the original paper, you'll find that it's simply not true that you could easily find a collision to a given hash. Wang et al's contribution is a more efficient algorithm than the canonical birthday attack to find random collisions in a family of hash functions (SHA-* and MD5 are the same family), while this guy's contribution seems to be only confusion. For the attack to be useful, you'll need to find a easy way to create hash collision to a given file.

    28bc8c78881b2f89bbeab4f9bb8fbeda is the md5 of "clue", I can bet my hash farm and a hash laced brownie that you'll never find anything other than "clue" that hashes to the same md5, at least not "someday" in your life time.

  105. Kick Out the Jams, Motherfucker! by Sloppy · · Score: 1

    Oh, I thought he said MC5...

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  106. Oh wow! by Anonymous Coward · · Score: 0

    Okay, is anyone truly surprised at this?

    MD5's are known to be collidable for appropriately complex constructs. It only goes to reason that the same constructs can have the same MD5.

    I've seen programs that sort through these for the last few years, although - obviously, they are processor dependant in the extreme. :P

    What is truly interesting imho, is the programs that have been created here and there to *spoof* MD5's by rearranging content to hit collisions so that you can play with the content...

  107. Stop-gap solution: hash offset files by AsciiNaut · · Score: 2, Interesting
    Here's a potential stop-gap solution: provide two md5sums per file, one of the whole file as normal, and one of the file offset by one byte. Let's look at the two hash-equivalent files cited by parent:
    $ cmp file1.dat file2.dat
    file1.dat file2.dat differ: char 20, line 1

    $ md5sum file1.dat file2.dat
    a4c0d35c95a63a805915367dcfe6b751 file1.dat
    a4c0d35c95a63a805915367dcfe6b751 file2.dat
    Whoops! Now examine hashes of the same files, omitting the first (identical) bytes:
    $ xxd -s 1 -ps file1.dat | xxd -r -p | md5sum
    84e6e0a21e2c4c9ef53f3762fdc90bc8 -
    $ xxd -s 1 -ps file2.dat | xxd -r -p | md5sum
    a63151008e5f8fc116ba947fd8af8c5a -
    Clearly this would not work with all collisions (nothing useful would), but it might hugely limit their frequency. And it would be relatively easy to tag on tables of 1-byte offset md5sums to existing md5sum tables out there.
  108. OMG, YOU KILLED AC! by Anonymous Coward · · Score: 0

    bastard!

  109. Bitprint by Fissure_FS2 · · Score: 1

    The Bitprint used by Bitzi.com and other places contains both SHA-1 and Tiger Tree, separated by a period. Unfortunately, they're being stupid and only publishing the SHA-1 in their magnet links, so those of us who use Tiger Tree are out of luck.

    --
    My life's goal is to get a score of +3!