Slashdot Mirror


Practical Exploits of Broken MD5 Algorithm

jose parinas writes "A practical sample of an MD5 exploit can be found, with source code included,in codeproject, a site for .Net programmers. The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack. At Microsoft, the MD5 hash functions are banned. The main problem is that the attack is directed to the distribution of software process, as you can understand reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum."

14 of 253 comments (clear)

  1. hashtrust by gcnaddict · · Score: 4, Interesting

    Now we know why people distribute modified game ISOs on the net and check it with md5 :P In all technicality, couldnt this mean that someone could land a virus on someone else's machine because the person trusted the hash?

    --
    Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
    1. Re:hashtrust by Anonymous Coward · · Score: 1, Interesting

      Yes, possibly. However the original file must be created with the switch in mind. If you trusted person X's signature on the good file, and he swapped it with a colliding bad file, well, you trusted them anyways so this is just so much handwaving.

    2. Re:hashtrust by Anonymous Coward · · Score: 2, Interesting

      No, it's extremely hard. Probability of collision AND valid code is too damn low. Although this article proves that, with little social engineering, you can "emulate" (or cheat) a valid-code collision.

      No, it looks quite feasible.

      You need two ISOs, to each of which you have added a different block of random bytes that happen to hash to the same MD5, and you have also modified the game's setup program so that, depending on which block of bytes is there, it either installs the game normally, or it installs the game and a rootkit. You then start sharing the "good" ISO on a P2P network. It gets a bunch of downloads, a whole load of people add comments that say "works fine! no viruses!", and then you start sharing the evil version instead. Nobody can see any difference - but now some of the people downloading it are going to get owned.

      With little social engineering indeed. In fact, it barely takes any at all.

  2. But... by Auraiken · · Score: 2, Interesting

    Isn't this the problem with all algorithms? Only way to know if something is something... is to see something instead of its checksum?
    As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?

    I guess since the article explains some issues against md5 security, the only answer would be to trust the source that is supplying the hash in the first place? Coming down to the fact that a system is only as secure as its user?

  3. compressed content safe (?) by Anonymous Coward · · Score: 1, Interesting

    Heya,

    I thought that the known attacks would not work agains compressed content, because you had to add some "random" data to the content and while decompressing the decompressor would give an error. I know it isn't exact, but I thought that md5's of compressed content was safe. I'm not sure if rpm's, ... are compressed. I use archlinux and I know the contents of the packages is compressed.

    greetings,

    Michel

  4. Not a problem for software distribution.. by hhghghghh · · Score: 4, Interesting

    This isn't a problem for software distribution, really, since the good.bin file needs to start with a vector designed to enable a collision. A good-faith programmer wouldn't include that vector.

    It is a problem for stuff like contracts; you draw up two versions of a contract, a good one and an evil one, let someone sign the good one, and later keep them to the clauses in the evil one.

    So while there IS a very big problem, the example is a bit contrived.

  5. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Ckwop · · Score: 5, Interesting

    Is this true for other popular hash functions?


    No it is not. The newer hashes, such as Whirlpool, do not have this problem. You're correct in saying this is a "well known result" and every cryptographer worth his salt says that this fact constitutes a break of the algorithm. We've known since the middle of the nineties that breaking MD5 was within reach. The fact there has been so much inertia in getting people to change is quite incredible really.


    At Toorcon this year, Dan Kaminsky showed a way to create two different webpages that render properly in a browser but have the same MD5 hash. Anybody who thinks this attack is theortical and ignorable is grossly mistaken.


    Simon


  6. Yadda Yadda by Effugas · · Score: 5, Interesting

    Two pages, same hashes, etc. (This is the guy who wrote the MD5 someday paper.)

    http://www.doxpara.com/t1.html
    http://www.doxpara.com/t2.html

  7. Re:Checksums are always going to be vulnerable by gowen · · Score: 4, Interesting
    It typically takes less than five minutes to break MD5 so it is horribly broken.
    But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum. It's hard to deliver a payload when you're limited to tricking a target into downloading what would be (essentially) a random string of ones and zeroes.
    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  8. Re:Checksums are always going to be vulnerable by MaGogue · · Score: 2, Interesting

    You almost got it right. The reverse problem of a checksum is to produce the original file that will give the checksum.

    Actually, the functions are called hash functions, and they can be written as:
    H(file) = h
    H() is the hash function, h is the hash value (checksum) the reverse would be RH()
    RH(h) = file
    Now we can of course prove that reverse is not a unique function, since any fixed-length h has a limited number of possible values, whereas file can be of any length.
    Therefore, reverse has many possible solutions, and in a simple case like the real checksum of bits (having two values 1 and 0) has a simple reverse algorithm, too. Given the value of h, produce the file == easy.
    Complex hash functions like MD5 have the interesting property of difficulty of reversion : it is difficult to even produce any file that will compute to any given MD5 value, hence it is difficult to fake it.
    It has been known all along that reverse is possible, and has many solutions, but it was thought they are too difficult to find. Now it has turned out that some can be, and were, found.

  9. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Anonymous Coward · · Score: 1, Interesting

    > Why? Adding the payload changes the MD5 value. So H(x) != H(x+q), and an added payload may be detected. This becomes bad only if H(x) = H(y), where x is good and y evil. But since no-one has yet found such a pair (and the fact that finding such a pair is hard is why MD5 is adopted), this result doesn't devalue anything...

    Uh ?

    Say you have x and y, such as H(x) = H(y), with x and y beeing neither good or evil. Say that len(x) = len(y) = n. Say also that x[0] != y[0] (first byte differ)

    Say the binary format of you platform access garbage at the start of executables (this is, I think, the case for an ISO file).

    Say you write the payload q, such as it works in the following way:

    Look at the first byte of the file. If it is x[0] do something good. If it is y[0] do something evil.

    Now you have constructed a pair( x+q, y+q ) where (x+q) is good and (y+q) is evil.

    Easy.

    You can obfuscate the test, and with clever xor'ing, you can even make the bad code hidden (without the knownledge of y, you can't reconstruct the bad code). For instance, compute x[0]^{ opcode of 'ret' }. ('^' is XOR). Call this 'a'.

    y[0]^a will probably NOT be a ret or jump instruction, but will probably move data between registers. Now, devise some bad code (you have n-1 bytes for that). Then, define:

    q0 = a + {bad code}^y[1..n]

    Define q as code that fecthes the n first bytes, xor them to the n next bytes, and calls the result, then do something non evil.

    Good Binary:
    x + q0 + q
    n bytes, n bytes, len(q) bytes

    At execution, it will execute x[0]^q0[0], which by definition is the 'ret' opcode.

    Evil Binary
    y + q0 + q
    n bytes, n bytes, len(q) bytes

    At execution, it will execute one byte of garbage, followed by the reconstructed bad code.

    The good and evil binaries will have the same MD5. The good binary will even NOT contain ANY hidden bad code.

    But, when the first n first bytes of that binary are replaced by y, (y^q0)[1..n] will be that evil code.

    Cheers,

    --fred

  10. Re:File Integrity Checkers ? by Shano · · Score: 2, Interesting

    Not necessarily - it depends on how much the author is trusted to begin with. Certain types of software are very closely checked by the open source community, and any trojan will be discovered if it exists in the package.

    Say I write a bit of security software (for which most people take the time to compare checksums). As a relative nobody, lots of people are going to scrutinise the source code before using it. Any new release will also be checked. Only after it's been scrutinised and built up a reasonable userbase is it worth switching it for the evil version - otherwise, the evil version would be discovered early and nobody would use it.

    Someone like Schneier could probably release a trojan directly and people would install it without thinking (I trust he's too responsible for that, though). For the rest of us, this gives a feasible way to sneak in evil code without anyone checking it.

  11. Re:M$ Antihash by Hal_Porter · · Score: 1, Interesting

    That's not any less secure than a real CRC32 function in terms of how hard it is to fake.

    In both cases, if you modfy the data you just need to change a couple of bytes to fix up the
    checksum. Admittedly you only need to change 1 byte in this algorithm versus 4 for a CRC32,
    but either way it is a couple of minutes work.
    It is worse at spotting errors - notably that you could swap a couple of bytes and it
    wouldn't notice, unlike CRC32.

    Mind you, in a underpowered embedded system (clock speed speed to slow to use the non table CRC
    in real time, not enough ram for the CRC tables), a checksum like this is probably better than nothing
    and cheaper than a CRC computationally. Imagine a microcontroller USB bootloader for example. Compared
    to no checksum, this would catch errors like broken RAM or a bad uart, both of which tend to completely
    destroy data, but it can still receive as fast as the host can send, whereas a real CRC would be
    a bottleneck limiting speed somewhat.

    Good page on CRC's - you can usually use either the non table version - fast CPU, no much RAM,
    or the table one - slow CPU clock, at least 1K of Ram or Rom available.

    http://www.cl.cam.ac.uk/Research/SRG/bluebook/21/c rc/node6.html#SECTION00060000000000000000

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  12. Re:Checksums are always going to be vulnerable by RAMMS+EIN · · Score: 2, Interesting

    ``If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions.''

    Yes.

    ``Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.''

    No. If I run deflate or some other compression algorithm on a file, chances are it will come out smaller. Still, the compressed file maps one to one with the original.

    --
    Please correct me if I got my facts wrong.