Slashdot Mirror


Meaningful MD5 Collisions

mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."

6 of 312 comments (clear)

  1. Okay, I'm impressed. by ave19 · · Score: 4, Interesting

    At first I thought: Postscript! Well, obviously. To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide. If you tried to hide random data in a text file, it would be obvious to the person signing it. You need some format to hide the random bits from the viwer.

    I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

    Then I downloaded the files...

    There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.

    Collisions must be very easy to find! I am now offically very worried about this.

    --
    ...or maybe not.
  2. So you're saying I shouldn't implement MD5 ... by malcomvetter · · Score: 3, Interesting



    in my next big project?

    In all seriousness, I believe Schneier's right. We need a competition for a new hash function.

    Nah, let's just wait for 24 to drop the words "MD5" before we know it's really bad.

  3. Relative Resources: The Attackers' Advantage by G4from128k · · Score: 4, Interesting
    The core problem is that the resources of the defender are inevitably overwhelmed by the resources of the attacker.
    1. Relative expected values of gian vs. loss: The attacker thinks "I know I can gain a #BIG_NUM million dollars" and devotes their full effort to the attack. The defender thinks "I'm safe, there's a low probability, and I'm sure I'll catch the problem before it becomes real money, " and does not not devote effort to security becuase a Gartner report told him it was over-hyped. Thus, the attacker's perceived expected value is much higher than the defenders perceived expected loss and each invests accordingly.
    2. Rising Complexity: As IT systems become more complex, they become less secure. Each new device, new networking protocol, new physical layer, new OS feature, new networked application provides new opportunities for the attacker and a dilution of security resources for defenders.
    3. Time: The attacker has the advantage of time. New algorithms, new mathematical theories, new exploits, and faster processors all favor the attacker. What once was supposed to take the age of the universe to crack can be decrypted with a quickly declining number of networked (even zombied) PCs.
    4. Curse of Compatibility: Because so much crypto and security is networking related, it is subject to implementation delays caused by the need to be compatible. Defenders continue to use old, vulnerable systems to maintain compatibility with key partners. Patches don't solve the problem because the patch itself can introduce incompatibilities that make defenders leary of applying the patch with a very real chance of causing problems to avoid a hypothetical security issue.
    The bottom line is that the defender must protect all vulnerabilities while going about the day-to-day business of using the computer. In contrast, the attacker can devote full time to any weakness of their choice.
    --
    Two wrongs don't make a right, but three lefts do.
  4. From a prior discussion... by erroneus · · Score: 3, Interesting

    I forget where or when exactly, so please feel free to run a search if you care to... it was here on Slashdot though.

    There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.

    Am I missing something or are these two stories relevant to each other?

  5. Works for certificates, too by cpeikert · · Score: 4, Interesting

    Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

    Here's a link to the paper: Lenstra et al.

  6. Re:Common sense by iabervon · · Score: 3, Interesting

    Actually, the two documents are actually almost identical. The difference is only one block in the whole file, which essentially acts as a selector for which of the two sets of content is displayed. MD5 (like most hash functions) works on fixed-size blocks smaller than the average file. To hash a complete file, you hash the first block, feed that into the hash with the second block, feed that into the hash with the third block, and so forth. So they have two files, and the first blocks are the same, the second blocks are different but hash the same, and the rest of the files are the same. Of course, the second blocks are junk, but the postscript is expecting a block-sized arbitrary value at that point anyway, so it doesn't matter that there's junk there.

    So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.

    The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).