Practical Exploits of Broken MD5 Algorithm
jose parinas writes "A practical sample of an MD5 exploit can be found, with source code included,in codeproject, a site for .Net programmers.
The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack.
At Microsoft, the MD5 hash functions are banned.
The main problem is that the attack is directed to the distribution of software process, as you can understand reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum."
Now we know why people distribute modified game ISOs on the net and check it with md5 :P In all technicality, couldnt this mean that someone could land a virus on someone else's machine because the person trusted the hash?
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
Isn't this the problem with all algorithms? Only way to know if something is something... is to see something instead of its checksum?
As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?
I guess since the article explains some issues against md5 security, the only answer would be to trust the source that is supplying the hash in the first place? Coming down to the fact that a system is only as secure as its user?
Heya,
... are compressed. I use archlinux and I know the contents of the packages is compressed.
I thought that the known attacks would not work agains compressed content, because you had to add some "random" data to the content and while decompressing the decompressor would give an error. I know it isn't exact, but I thought that md5's of compressed content was safe. I'm not sure if rpm's,
greetings,
Michel
This isn't a problem for software distribution, really, since the good.bin file needs to start with a vector designed to enable a collision. A good-faith programmer wouldn't include that vector.
It is a problem for stuff like contracts; you draw up two versions of a contract, a good one and an evil one, let someone sign the good one, and later keep them to the clauses in the evil one.
So while there IS a very big problem, the example is a bit contrived.
Is this true for other popular hash functions?
No it is not. The newer hashes, such as Whirlpool, do not have this problem. You're correct in saying this is a "well known result" and every cryptographer worth his salt says that this fact constitutes a break of the algorithm. We've known since the middle of the nineties that breaking MD5 was within reach. The fact there has been so much inertia in getting people to change is quite incredible really.
At Toorcon this year, Dan Kaminsky showed a way to create two different webpages that render properly in a browser but have the same MD5 hash. Anybody who thinks this attack is theortical and ignorable is grossly mistaken.
Simon
Two pages, same hashes, etc. (This is the guy who wrote the MD5 someday paper.)
http://www.doxpara.com/t1.html
http://www.doxpara.com/t2.html
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
You almost got it right. The reverse problem of a checksum is to produce the original file that will give the checksum.
Actually, the functions are called hash functions, and they can be written as:
H(file) = h
H() is the hash function, h is the hash value (checksum) the reverse would be RH()
RH(h) = file
Now we can of course prove that reverse is not a unique function, since any fixed-length h has a limited number of possible values, whereas file can be of any length.
Therefore, reverse has many possible solutions, and in a simple case like the real checksum of bits (having two values 1 and 0) has a simple reverse algorithm, too. Given the value of h, produce the file == easy.
Complex hash functions like MD5 have the interesting property of difficulty of reversion : it is difficult to even produce any file that will compute to any given MD5 value, hence it is difficult to fake it.
It has been known all along that reverse is possible, and has many solutions, but it was thought they are too difficult to find. Now it has turned out that some can be, and were, found.
> Why? Adding the payload changes the MD5 value. So H(x) != H(x+q), and an added payload may be detected. This becomes bad only if H(x) = H(y), where x is good and y evil. But since no-one has yet found such a pair (and the fact that finding such a pair is hard is why MD5 is adopted), this result doesn't devalue anything...
Uh ?
Say you have x and y, such as H(x) = H(y), with x and y beeing neither good or evil. Say that len(x) = len(y) = n. Say also that x[0] != y[0] (first byte differ)
Say the binary format of you platform access garbage at the start of executables (this is, I think, the case for an ISO file).
Say you write the payload q, such as it works in the following way:
Look at the first byte of the file. If it is x[0] do something good. If it is y[0] do something evil.
Now you have constructed a pair( x+q, y+q ) where (x+q) is good and (y+q) is evil.
Easy.
You can obfuscate the test, and with clever xor'ing, you can even make the bad code hidden (without the knownledge of y, you can't reconstruct the bad code). For instance, compute x[0]^{ opcode of 'ret' }. ('^' is XOR). Call this 'a'.
y[0]^a will probably NOT be a ret or jump instruction, but will probably move data between registers. Now, devise some bad code (you have n-1 bytes for that). Then, define:
q0 = a + {bad code}^y[1..n]
Define q as code that fecthes the n first bytes, xor them to the n next bytes, and calls the result, then do something non evil.
Good Binary:
x + q0 + q
n bytes, n bytes, len(q) bytes
At execution, it will execute x[0]^q0[0], which by definition is the 'ret' opcode.
Evil Binary
y + q0 + q
n bytes, n bytes, len(q) bytes
At execution, it will execute one byte of garbage, followed by the reconstructed bad code.
The good and evil binaries will have the same MD5. The good binary will even NOT contain ANY hidden bad code.
But, when the first n first bytes of that binary are replaced by y, (y^q0)[1..n] will be that evil code.
Cheers,
--fred
Not necessarily - it depends on how much the author is trusted to begin with. Certain types of software are very closely checked by the open source community, and any trojan will be discovered if it exists in the package.
Say I write a bit of security software (for which most people take the time to compare checksums). As a relative nobody, lots of people are going to scrutinise the source code before using it. Any new release will also be checked. Only after it's been scrutinised and built up a reasonable userbase is it worth switching it for the evil version - otherwise, the evil version would be discovered early and nobody would use it.
Someone like Schneier could probably release a trojan directly and people would install it without thinking (I trust he's too responsible for that, though). For the rest of us, this gives a feasible way to sneak in evil code without anyone checking it.
That's not any less secure than a real CRC32 function in terms of how hard it is to fake.
c rc/node6.html#SECTION00060000000000000000
In both cases, if you modfy the data you just need to change a couple of bytes to fix up the
checksum. Admittedly you only need to change 1 byte in this algorithm versus 4 for a CRC32,
but either way it is a couple of minutes work.
It is worse at spotting errors - notably that you could swap a couple of bytes and it
wouldn't notice, unlike CRC32.
Mind you, in a underpowered embedded system (clock speed speed to slow to use the non table CRC
in real time, not enough ram for the CRC tables), a checksum like this is probably better than nothing
and cheaper than a CRC computationally. Imagine a microcontroller USB bootloader for example. Compared
to no checksum, this would catch errors like broken RAM or a bad uart, both of which tend to completely
destroy data, but it can still receive as fast as the host can send, whereas a real CRC would be
a bottleneck limiting speed somewhat.
Good page on CRC's - you can usually use either the non table version - fast CPU, no much RAM,
or the table one - slow CPU clock, at least 1K of Ram or Rom available.
http://www.cl.cam.ac.uk/Research/SRG/bluebook/21/
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
``If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions.''
Yes.
``Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.''
No. If I run deflate or some other compression algorithm on a file, chances are it will come out smaller. Still, the compressed file maps one to one with the original.
Please correct me if I got my facts wrong.