Slashdot Mirror


Practical Exploits of Broken MD5 Algorithm

jose parinas writes "A practical sample of an MD5 exploit can be found, with source code included,in codeproject, a site for .Net programmers. The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack. At Microsoft, the MD5 hash functions are banned. The main problem is that the attack is directed to the distribution of software process, as you can understand reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum."

253 comments

  1. Checksums are always going to be vulnerable by LiquidCoooled · · Score: 3, Insightful

    If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions.

    Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.

    --
    liqbase :: faster than paper
    1. Re:Checksums are always going to be vulnerable by Asztal_ · · Score: 1

      Then would you also take a checksum of that checksum? o_O

    2. Re:Checksums are always going to be vulnerable by MaGogue · · Score: 0, Troll

      Of course, but the trick is to find algorithms that are hard to reverse, that is to find a plaintext for any given checksum.

    3. Re:Checksums are always going to be vulnerable by Ckwop · · Score: 5, Insightful

      If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions. Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.

      While this technical correct it is slightly misleading. The aim of a hash function is to make it has hard as possible to find a collision. For an n-bit has it takes roughly 2^(n/2) operations to find a collision. Any attack faster than this is considered a break of the hash function. It typically takes less than five minutes to break MD5 so it is horribly broken.

      While removing the possibility of collision all together is provably impossible you can design a hash function for which finding a collision is computationally infeasible. The standard size to achieve this today is 256-bits and the better designed functions like Whirlpool use this hash length.

      Simon

    4. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 5, Informative

      This completely misses the point of cryptographic hashes.

      The point is that it is supposed to be difficult to find another data set which hashes to the same value without doing a brute-force search. Of course you will get collisions, but the changes are (supposed to be) 1 in 2^80 with MD5 or 1 in 2^128 with SHA-1.

      The exploits mentioned above are that the algorithms (MD5 and to some extent SHA-1) have been broken to allow you to construct a piece of data which hashes to the same value as the original. This is VERY different from the fact that you get collisions.

    5. Re:Checksums are always going to be vulnerable by LostCluster · · Score: 1

      Yes, but ideally the collisions would be random collections of bits. They shouldn't make up anything meaningful, nevermind being a piece of malware.

    6. Re:Checksums are always going to be vulnerable by LentoMan · · Score: 2, Insightful

      Use of two or more different checksums algorithms would make it substantially harder to create a file with a fake hash.

    7. Re:Checksums are always going to be vulnerable by SimilarityEngine · · Score: 1

      Or, double the bit count?

      --
      Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    8. Re:Checksums are always going to be vulnerable by ceeam · · Score: 4, Informative

      (sigh) Insightful, my ass... Checksums are NOT reversible. The main trick here is to replace one file with another and leave the hash/checksum the same by patching the fake file. (For practically every file format there exists a spare space where this patching could be done.)

    9. Re:Checksums are always going to be vulnerable by gowen · · Score: 4, Interesting
      It typically takes less than five minutes to break MD5 so it is horribly broken.
      But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum. It's hard to deliver a payload when you're limited to tricking a target into downloading what would be (essentially) a random string of ones and zeroes.
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    10. Re:Checksums are always going to be vulnerable by ceeam · · Score: 1

      Sure! But the interesting question, IMHO, is _how_much_ more difficult it would be? If both hashes are "patchable" then what would be the difficulty to find a method to make _both_ hashes match simultaneously? (I'm not a cypherpunk, anyone?)

    11. Re:Checksums are always going to be vulnerable by Ckwop · · Score: 5, Informative

      But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum. It's hard to deliver a payload when you're limited to tricking a target into downloading what would be (essentially) a random string of ones and zeroes.


      At Toorcon this year, Dan Kaminsky showed how to generate two valid, nicely rendered, html files with the same hash . Basically, he injects javascript into the page to remove the rubbish at the begining of the file. But how often to do you view the source of a page you're visiting. It'd be hard for a layperson to notice this. Make no mistake about it, the collision attack is very dangerous.


      Simon

    12. Re:Checksums are always going to be vulnerable by m50d · · Score: 2, Insightful
      No, it wouldn't, *sigh*. Stop posting this, I'm tired of correcting it.

      As another reply said, however, doubling the bit count would improve security. But, a simple double-MD5 would have exactly the same problems. Therefore, the solution is to use a longer and more secure hash, like, for example, SHA-256.

      --
      I am trolling
    13. Re:Checksums are always going to be vulnerable by MaGogue · · Score: 2, Interesting

      You almost got it right. The reverse problem of a checksum is to produce the original file that will give the checksum.

      Actually, the functions are called hash functions, and they can be written as:
      H(file) = h
      H() is the hash function, h is the hash value (checksum) the reverse would be RH()
      RH(h) = file
      Now we can of course prove that reverse is not a unique function, since any fixed-length h has a limited number of possible values, whereas file can be of any length.
      Therefore, reverse has many possible solutions, and in a simple case like the real checksum of bits (having two values 1 and 0) has a simple reverse algorithm, too. Given the value of h, produce the file == easy.
      Complex hash functions like MD5 have the interesting property of difficulty of reversion : it is difficult to even produce any file that will compute to any given MD5 value, hence it is difficult to fake it.
      It has been known all along that reverse is possible, and has many solutions, but it was thought they are too difficult to find. Now it has turned out that some can be, and were, found.

    14. Re:Checksums are always going to be vulnerable by nagora · · Score: 1
      The exploits mentioned above are that the algorithms (MD5 and to some extent SHA-1) have been broken to allow you to construct a piece of data which hashes to the same value as the original.

      Well, sort of. It allows you to construct two pieces of data that hash to the same value. It doesn't seem to make it any easier to take an arbitrary original and just magic up another file which has the same hash.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    15. Re:Checksums are always going to be vulnerable by Pieroxy · · Score: 2, Informative

      It doesn't seem to make it any easier to take an arbitrary original and just magic up another file which has the same hash.

      Yes it does.

    16. Re:Checksums are always going to be vulnerable by ocelotbob · · Score: 2, Informative

      I think you're missing what the OP is saying here. The OP is suggesting to use something like MD5+SHA1, algorithms with different techniques for generating hashes so that you decrease the probability of creating a collision that works for both, not using something like double MD5

      --

      Marxism is the opiate of dumbasses

    17. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 3, Informative

      No, it doesn't.

      You linked an example which takes a document A, and produces two documents A1 and A2 where

      A1 looks like A, but A2 does not look like A
      and MD5(A1) = MD5(A2)

      BUT, critically, MD5(A1) is not MD5(A)

      So neither of the documents created is an imposter. Arbitrary payloads are still protected by MD5. If you don't agree, simply reply with a link to a file that has the MD5 hash 7a0a25a5c71fa2639a41ee743aa5e2b7

      No-one can do this yet, and they may never be able to do it. But MD5 has failed one of its original design requirements and that's why people should stop using it and divert resources to ensuring that its replacements are safer.

    18. Re:Checksums are always going to be vulnerable by Goaway · · Score: 1

      Still not true.

      Read any Slashdot discussion about hash algorithms to find this idea refuted again and again.

    19. Re:Checksums are always going to be vulnerable by m50d · · Score: 1

      Yes, that's exactly what the OP is saying and it's a stupid idea. By double MD5 I meant a double-length MD5 (twice as many bits in the hash), which although still a stupid idea is nowhere near as stupid as using MD5 and SHA1 together and expecting it to increase security.

      --
      I am trolling
    20. Re:Checksums are always going to be vulnerable by Tony+Hoyle · · Score: 2, Informative

      Using multiple hashes is a known way to increase security (some protocols use it) - it *does* work because the number of potential collisions is reduced - if you can create a collission with the MD5 (far from simple, in fact) it's extremely unlikely to *also* collide with the SHA1 hash - the number of plaintexts where they both collide is significantly lower.

    21. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0

      Mod comment : the real troll of this post is in the mods, not the post.

    22. Re:Checksums are always going to be vulnerable by Karhgath · · Score: 1

      Well, this is the usual naive way of seeing it. It also seems logical at first, since the hash is small, it obviously will get collision.

      However, when you start looking at it in more details, some things stands out. If you use a 256bit hash, you get about 10^77 possible combinaisons as a good algorithm should be able to fill all the hash space.

      Now, we are currently estimating at ~10^79 the number of atoms in the whole universe. 10^77 vs 10^79. Even without going into maths and analysis of hash algorithm, space distribution and all, it already gives you a pretty good idea. Now, since data isn't always physical(thoughts?), it's interesting to imagine if we can store more information than the total number of atoms in the universe. That's an interesting (but hardly on topic) question.

      So, a 256bit hash is pretty big. Increase it to 512bit and you get 10^154 combinaisons. Unless the algorithm is flawed or the distribution isn't uniform, I'd say that collisions are going to be pretty hard to get. Then again, that's without any distribution and statistical analysis, I'll leave that up to others.

      The conclusion is that our first thought about collision might not be as evident after thinking about the above.

    23. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0

      It is YOU who don't have a clue .. of course what is meant by 'reverse' is finding a file to match a given hash. Since the mapping x->h(x) is not injective, there is no unique value of h(-1)(h(x)) , but many, and h(-1) is also a mapping.
      if you have say hash1 = h(file1) , finding hash1 = h(file2) is actually finding one more value of h(-1) at point hash1.

    24. Re:Checksums are always going to be vulnerable by baadger · · Score: 3, Informative

      No it doesn't.

      The Wang vector pair floating about at the moment, when prepended to 2 useable files will produce a MD5 collision of the said files. BUT - as a result of doing this you are also going to corrupt these files and make them unuseable (executeables, MP3's etc, obviously not text documents).

      All the proof-of-concept article shows is the two attack vectors by Wang in use with 3 simple programs. You will notice the "md5extractor", which needs to be in place to remove the arbitrary vector data before the evil good.exe becomes dangerous. This exact procedure doesn't apply to most software distribution actually, how are you going to get the extractor on the victims computer in the first place?

      This could be a problem is somebody can produce an attack vector pair that does produce a valid executeable/PE header or and MP3 header. But these have structure and leave much less room for the vector, may place restrictions on the payload, and might not even be possible.

      The webpage thing described in the comment you link to is pretty harmless. Who the hell usines a MD5 hash on a HTML documents? Misleading documentation? Browser exploits? Unlikely.

      The fact remains if you were to try and use this method you would really be doing, and what you will have to do, is nothing more than trick the user by normal means (human failure).

      Coincedentally, for use in authentication you would be a fool NOT to be running sanity checks on input anyway. For use in authentication, salted and sanity checked input to MD5 should is still very very safe.

      I can't see a reason why P2P applications implemented for networks using MD5 file verification can't start popping off bytes at the beginning of downloads (the first block) and try it with another payload to detect and reject people using this type of multicollision attack. In addition these applications could check for valid MP3, AVI, JPEG, headers etc.

      The author of the "MD5 - Someday to be considered harmful" paper is correct. MD5 is risky for some purposes, P2P networks still using MD5 without any smarts may be ruined, but the hash far from dead if used carefully backed up by other checks. What makes people think moving to SHA-1 or Whirlpool is going to solve these problems (OK with SHA-1 different types of attacks) in the longer term?

      Relying on the hash mechanism alone is just a bad habit to get into. People are switching because it's just best to play it safe when people (myself included) don't understand the full significance of the attacks produced this year.

    25. Re:Checksums are always going to be vulnerable by brainnolo · · Score: 1

      But how often to do you view the source of a page you're visiting.

      Much more often than i check for their MD5, especially cause nobody posts the MD5 sum of their pages so you can't check them.

    26. Re:Checksums are always going to be vulnerable by JCCyC · · Score: 1

      Well, then that brings us to the 1,000,000,000 Dollar Question:

      Is there a BETTER hash function out there than MD5 and SHA1?

    27. Re:Checksums are always going to be vulnerable by Dwonis · · Score: 1

      Ever heard of SSL? PGP? S/MIME? They all use(d) MD5 at some point.

    28. Re:Checksums are always going to be vulnerable by ajs · · Score: 2, Insightful

      You are looking at this incorrectly. The problem is that, given file A, which has hash X, if it is possible to create file B which has hash X in a non-exaustive way, then the hash is weak. Most such techniques work well regardless of file B being a from-scratch random pile of data or a combination of well-formed data plus random data. So, for example, you might have an RPM with 3 files in it. Add a fourth file, and start permuting to find your similar hash.

      Now, as you point out, we're not yet to the point that this is trivial with MD5 (it has always been possible, just not practical). So, the only valid thing to do is to move to something more secure as soon as possible so that when we finally do discover a trivial exploit for MD5, we no longer care as much about it. This also helps us by forcing software vendors and projects to re-consider how they use hashing, and to make it easier to modularly replace the hashing functions they use. You will note that all of the oldest such software has already had to do this, and thus the task will be easier for them.

    29. Re:Checksums are always going to be vulnerable by The+Snowman · · Score: 1

      Back when MD5 started coming under heavy attack earlier this year, I figured an easy way to make cracking hash codes more difficult: use multiple hashes.

      For example, rather than publishing an MD5 hash of a file, publish a hash that is a concatenation of MD5, SHA1, and whatever else is considered sufficient by today's standards. Finding a weakness in one algorithm is difficult. Finding a weakness in two is more difficult. Exploiting both simultaneously would be monstrously difficult, to the point that it would not be feasible without some major breakthroughs in hacking these algorithms.

      Of course, hashing has some inherent issues. The only real secure method of encryption is a one-time pad. However, this is not feasible. Hashes are designed to make it very difficult to compromise a source string and not be noticed. Nothing is 100% foolproof.

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    30. Re:Checksums are always going to be vulnerable by RAMMS+EIN · · Score: 2, Interesting

      ``If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions.''

      Yes.

      ``Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.''

      No. If I run deflate or some other compression algorithm on a file, chances are it will come out smaller. Still, the compressed file maps one to one with the original.

      --
      Please correct me if I got my facts wrong.
    31. Re:Checksums are always going to be vulnerable by LightningBolt! · · Score: 1

      >>Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done about it, ever.''

      >No. If I run deflate or some other compression algorithm on a file, chances are it will come out smaller. Still, the compressed file maps one to one with the original.

      The parent is correct. Given a number of purely random files and a good compressor of patterned information, most, if not all, of are the "compressed" files will be larger than the original. It is mathematically impossible to create a compressor that can reduce the size of any arbitrary file. See the counting argument in section 9.2 of the compression FAQ for an explanation.

      --
      Old people fall. Young people spring. Rich people summer and winter.
    32. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0

      Incorrect. Checksums ARE reverseable as long as the hash is a sparse mapping.

      If you continually map 8 letters into a 64-bit hash, then you are looking at a plausible reverse mapping. (8 letters into a big hash? Gee, that doesn't sound like password files.)

    33. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 1, Informative
    34. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0

      You're are looking at it wrong. Say you have 2^512 possible source documents. Then you have a 256 bit output hash, leading to 2^256 possibly output values. To see how many source documents map to a single output hash value, you divide: 2^512 / 2^256 = = 2^(512-256) = 2^256. Which means that you have as many source documents per output value of your hash as you have output values. So you can obviously state that you will have collisions and they will happen and they will happen with a reasonable frequency. The real trick for a hash algorithm is to ensure that the resulting output retains "distinctness", ie. that source documents that are "distant" from each other have hash values that are also "distant" even if not in the magnitude or direction.

    35. Re:Checksums are always going to be vulnerable by teknopagan · · Score: 1

      Hehehe... the *Wang* vector pair... attack vectors by *Wang*

      That's just great. >:)

      --
      The Russian Mafia will mod you down just to see if the Moderate button works.
    36. Re:Checksums are always going to be vulnerable by poot_rootbeer · · Score: 1

      the algorithms (MD5 and to some extent SHA-1) have been broken to allow you to construct a piece of data which hashes to the same value as the original.

      However, in the majority of applied uses it's still extremely unlikely that the newly-constructed piece of data will have any of the OTHER properties of the original, apart from the MD5 sum.

      Sure, you can create an MP3 data stream that has the same MD5 checksum as a Beatles tune, but if you try to listen to it, odds are 0.999... that it's just going to sound like static.

    37. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0
      you failed to read the article.

      they demonstrated specifically an application to mp3 files, by having multiple different mp3 files with the same hash and they were intact enough to both be playable and yet carry around at least a 128 bit payload.

    38. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0

      All /. discussions on the subject I can find talk about first hashing with one algorithm and then hashing the hash you got with another. They do not explain why hashing the original file multiple times with different algorithms and validating against all of the hashes you get is weak.

    39. Re:Checksums are always going to be vulnerable by labratuk · · Score: 1

      It's hard to deliver a payload when you're limited to tricking a target into downloading what would be (essentially) a random string of ones and zeroes.

      Think again.

      --
      Malike Bamiyi wanted my assistance.
    40. Re:Checksums are always going to be vulnerable by evilviper · · Score: 2, Informative
      But all that enables you to do is replace an MD5'd file with garbage that happens to have the same MD5 sum.

      It's not nearly as scary as swaping one document for another, or one binary for another, but it's still quite useful.

      Think of P2P networks. Gnutella uses SHA1 hashes to verify files, file mirrors, and the final downloaded file. If some RIAA employees could find the SHA1 hash of a very popular song, and generate junk with the same hash, they could have people downloading their junk (pardon the pun) and the P2P app wouldn't know it was corrupt, and wouldn't have any way to avoid the junk file.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    41. Re:Checksums are always going to be vulnerable by brainnolo · · Score: 1

      But they do not calculate the checksum of the webpage, so this does not change my point. As reported by the article to create collision you must have both the original and "evil" data, and at this point there is really little to do.

    42. Re:Checksums are always going to be vulnerable by Guignol · · Score: 1

      It might be good as 'beter than nothing',
      but this is not good as 'the correct answer to the problem'
      We have two algorithms not statisfactory enough one with n bits, the other with p. we check 'both'. well that could just be called a new algorithms of length n+p.
      As an algorithm, its strength over any of the other 2 is that it generates longer hashes, but in the domain of n+p bits hashes, it starts with a severe handicap since it is easily 'divisible'.
      (in two separate algorithms to work on to better work an attack, for instance, if you want to search a hash similar to this one, you can already start by just looking the md5 part and as long as this one fails, you can discard, therefore attacking far more effetively than you could do with a properly engineered n+p bits algorithm).
      Plus the proper weaknesses of both of its parts certainly must make for clever ways to attack it once both parts are better 'cracked' (might not be true, but I don't think there's an obvious reason for it to be false either).
      In any case, I think it's fairly obvious that it would just be a poor n+p algorithm (but yes, of course, it would be better than just 'this n' or just 'this p', but if the weaknesses found leads to better exploits, this n+p could very well be worse than some entirely different (just) n or p algorithms)

    43. Re:Checksums are always going to be vulnerable by that+_evil+_gleek · · Score: 1


      >Unless you start to give your hash keys as the same size as the original file, there is not anything that can be done
      Which would be pointless, you might as well send the same data twice ( or cmp the files from /cd/tape/ etc ).

      But, then again, it is a hash , it is what is, and as hash, it maps 1 set of data into another set , the bins. Collisions are really by design. Even if the sizes were same, they could still be collisions, depends on the hash alg. One could look at the first byte of a file of N bytes long, the replicate it N times, spit that out as a hash and have a hash key the same size of as the file. Or step thru each byte of a file, and feed to isalpha, and spit out either a 1 or 0, and again have the same size. It all depends, Cryptocraphic hash, seem to me to be somewhat similiar to pseudo random algs, really.
      I wonder how a hash based on stdlib's rand() function would fair? I imagine theorists would deny it, but then the limitations of pseudo random algs are well known, while each new hash is presented as the 'virtually' unbreakable, etc.

      Also, f you get the SUM, MD5, etc files from the same place, what difference does it really make at all? They might as well jus use the simpler the algs, those geared to finding transmission errors, as the file with the hash could be compromised as well.

    44. Re:Checksums are always going to be vulnerable by pbhj · · Score: 1

      But if you skip a step and instead of working on A2 looking like A1, you have A2 looking like A then you've cracked it.

      This guys patching garbage at the head of a file to create an A2 (= garbage.A) that looks like A and has the same md5 hash value. If he can but the garbage inside A or at the end, then he has an attack vector as far as I can see. You'd think A2 were A, execute it and maybe later find it wasn't.

      Perhaps I shoudl RTFA now??

    45. Re:Checksums are always going to be vulnerable by rthille · · Score: 1

      Hell, I've got something better than that. I was using the 'sum' program (yeah, I should have used md5, but it's slower) to check if pictures on the website I'd inherited were the same with different names or not. There were two wildly different pictures (an owl and a landscape) which were the same size with the same sum. I was totally surprised.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    46. Re:Checksums are always going to be vulnerable by Anonymous Coward · · Score: 0

      The MD5 is fatally broken and should not be used for anything, it gives wrong impression at best and horribly broken results at worst.

      Check e.g. http://www.itconsult.co.uk/stamper.htm for an example of horribly broken one.

    47. Re:Checksums are always going to be vulnerable by JCCyC · · Score: 1

      Yes, for example Whirlpool

      Non-pay-for link: http://paginas.terra.com.br/informatica/paulobarre to/WhirlpoolPage.html

      Hey, it was made by a Brazilian! Cool!

  2. So if you need a freely available hash algorithm by MadMoses · · Score: 4, Informative

    ...better use Tiger or Whirlpool (based on AES). AFAIK there are no known vulnerabilities or attacks for these two yet.

    --

    Do not be alarmed. This is only a test.
  3. hashtrust by gcnaddict · · Score: 4, Interesting

    Now we know why people distribute modified game ISOs on the net and check it with md5 :P In all technicality, couldnt this mean that someone could land a virus on someone else's machine because the person trusted the hash?

    --
    Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
    1. Re:hashtrust by Anonymous Coward · · Score: 1, Interesting

      Yes, possibly. However the original file must be created with the switch in mind. If you trusted person X's signature on the good file, and he swapped it with a colliding bad file, well, you trusted them anyways so this is just so much handwaving.

    2. Re:hashtrust by MadMoses · · Score: 1

      Of course it's possible, that's why cryptographic hash algorithm designers try to make it as hard as possible to find an exactly equal hash value for a different file. That doesn't mean it can't be done. It just has to be hard enough that the result (screw some stranger's computer in your example) doesn't warrant the effort you have to put into the attack.

      --

      Do not be alarmed. This is only a test.
    3. Re:hashtrust by ocelotbob · · Score: 1

      This is absolutely what it means. Especially with something the size of a game .iso, the ability to put in a destructive payload is easy.

      --

      Marxism is the opiate of dumbasses

    4. Re:hashtrust by Metteyya · · Score: 1

      No, it's extremely hard. Probability of collision AND valid code is too damn low. Although this article proves that, with little social engineering, you can "emulate" (or cheat) a valid-code collision.

    5. Re:hashtrust by Anonymous Coward · · Score: 0

      In all technicality, couldnt this mean that someone could land a virus on someone else's machine because the person trusted the hash?
      Yes, you can have any MD5 you want, and relatively easily too. Whenever I encode video for my friends the files always have DEADBEEF as the hash so they know it's from me. =)

    6. Re:hashtrust by SillyNickName4me · · Score: 2, Insightful

      Any practical use of hash algorithms is extremely easy to fool with social engineering. All you have to do is get someone to trust your hash..

    7. Re:hashtrust by GauteL · · Score: 1

      Well, it only works on a specially prepared original, so you couldn't just change someone elses ISO into something harmful and still keep the same MD5 hash.

      However, you COULD possibly release a "good" ISO first with this preparation and get people to trust that source by word of mouth, and then you could swap this ISO for a bad ISO with the same MD5 sum.

      Still quite sinister, but not as easily exploitable as changing someone elses ISO.

    8. Re:hashtrust by Anonymous Coward · · Score: 2, Interesting

      No, it's extremely hard. Probability of collision AND valid code is too damn low. Although this article proves that, with little social engineering, you can "emulate" (or cheat) a valid-code collision.

      No, it looks quite feasible.

      You need two ISOs, to each of which you have added a different block of random bytes that happen to hash to the same MD5, and you have also modified the game's setup program so that, depending on which block of bytes is there, it either installs the game normally, or it installs the game and a rootkit. You then start sharing the "good" ISO on a P2P network. It gets a bunch of downloads, a whole load of people add comments that say "works fine! no viruses!", and then you start sharing the evil version instead. Nobody can see any difference - but now some of the people downloading it are going to get owned.

      With little social engineering indeed. In fact, it barely takes any at all.

    9. Re:hashtrust by Anonymous Coward · · Score: 0

      Probability of collision AND valid code is too damn low.

      That's a common misconception because you only see the hash and the malicious code. But in this case, the talk was about a whole ISO image that gets hashed. It's very easy (now) with MD5 to put in malicious code and then modify some other irrelevant part of the ISO file. Together, these two changes could yield the same hash.

      Even if you hashed every executable separately, nowaday's file formats for executable always have room for data that is essentially irrelevant when you execute that file.

      Essentially, bloat in file formats (comments in a HTML file, padding in a file, etc. etc.), i.e. data that remains invisible when you use that file format naturally, makes constructing malicious data and constructing a hash collision so much easier.

  4. But... by Auraiken · · Score: 2, Interesting

    Isn't this the problem with all algorithms? Only way to know if something is something... is to see something instead of its checksum?
    As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?

    I guess since the article explains some issues against md5 security, the only answer would be to trust the source that is supplying the hash in the first place? Coming down to the fact that a system is only as secure as its user?

    1. Re:But... by NetCow · · Score: 1

      As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?
      Where'd you come up with the "32 bits" thing? It's 128.
      I guess since the article explains some issues against md5 security, the only answer would be to trust the source that is supplying the hash in the first place?
      That's always been the case.
      Coming down to the fact that a system is only as secure as its user?
      Oversimplfied. A system is only as secure as its weakest component (and yes, that's oversimplified too).

    2. Re:But... by m50d · · Score: 1
      As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?

      Yes, but the point of a hash is that it's computationally infeasible to find them. If you're distributing a binary with a certain MD5 there's sure to be a file somewhere with the same sum, maybe it's a car manufacturer's specification PDF, but I can't write a trojan and get it to MD5 to the same thing *and* do what I want. Now you can.

      --
      I am trolling
    3. Re:But... by Auraiken · · Score: 1

      Where'd you come up with the "32 bits" thing? It's 128.

      Yeh sorry about that. Got confused with 32 character output.

    4. Re:But... by Anonymous Coward · · Score: 1, Informative

      No, you can't. You misunderstood the nature of the break.

      The break allows you to create two special "twin" documents A1 and A2 with different content where MD5(A1) and MD5(A2) are the same

      BUT critically it doesn't allow you to take arbitrary document B (written by me) and create document B1 where MD5(B) = MD5(B1)

      So you cannot create files that are impostors for my existing documents, and thus people who trust me can continue to trust my MD5 sums until a further break occurs. That break may come in weeks, or years, or never.

      You've been posting your incorrect summary throughout this thread, please stop that.

    5. Re:But... by Anonymous Coward · · Score: 0

      But MD5 is a 128 bit hash? Now, I know nothing specific about how to compute an MD5 hash... However, assuming it has a near-random distribution, and if we conservatively estimate that there have been less than 16 billion computers in existence, and that each produced less than 16 billion, then I guess you should have less than a 1 in 2^60 chance that someone has coincidentally produced a file with a duplicate MD5 hash. That seems pretty far from a sure thing.

    6. Re:But... by VolciMaster · · Score: 1
      As for md5... with only 32bits, it should've came up with repetitive hashes in end anyways?
      MD5 is a 160-bit algorithm, not 32.

      only answer would be to trust the source that is supplying the hash
      I have read a lot of Bruce Schneier's writings, and he points out over and over again that security relies on people, and people can fail. It is perfectly possible to create unbreakable algorithms and protocols. Impossible to guess passwords and keys.
      That's when the person wishing to gain access or dupe the user tries a different approach, like proverbial dumpster diving, social engineering, and other, non-technical attacks.

    7. Re:But... by owlstead · · Score: 1

      "MD5 is a 160-bit algorithm, not 32....I have read a lot of Bruce Schneier's writings"

      Obviously you haven't read enough :) SHA-1 is a 160 bit hash, MD5 is a 128 bits hash.

    8. Re:But... by VolciMaster · · Score: 1

      Sorry, I've seen a couple 160-bit implementations of MD5 to compare it directly with SHA-1. You're right, though, that the stock algorithm is 128-bit.

  5. M$ Antihash by CDMA_Demo · · Score: 4, Funny

    At Microsoft, the MD5 hash functions are banned.

    they use crc instead!

    1. Re:M$ Antihash by ceeam · · Score: 1

      Oh, here's their CRC function:

      BYTE CRC(BYTE *p, size_t sz)
      {
        BYTE res = 0;
        while (sz--) {
          res ^= *p++;
        }
        return res;
      }

      // I wish I were kidding.. Not in MS's code though, but I've seen exactly this.

    2. Re:M$ Antihash by Anonymous Coward · · Score: 0

      yea its pretty standard

    3. Re:M$ Antihash by Anonymous Coward · · Score: 0

      No they don't, they use something much better.

      Open source on the other hand does use MD5 (RTFA),

      "Some open source programs, like RPM, use MD5, and in many open source distributions MD5 is used as check sum"

      Who the fuck is insecure now zealot?

    4. Re:M$ Antihash by Anonymous Coward · · Score: 0

      Who the fuck is insecure now zealot?

      Judging by your post, you should speak to your shrink...

    5. Re:M$ Antihash by anglozaxxon · · Score: 0

      This just in: Microsoft has developed a new method of hashing files. It's been called "ParityBit.NET" and is scheduled for integration into currently existing and newly released Microsoft products.

    6. Re:M$ Antihash by DrSkwid · · Score: 2, Funny

      and your point is ?

      you'll be telling me Huffman encoding is dangerous next !

      --
      There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
    7. Re:M$ Antihash by Hal_Porter · · Score: 1, Interesting

      That's not any less secure than a real CRC32 function in terms of how hard it is to fake.

      In both cases, if you modfy the data you just need to change a couple of bytes to fix up the
      checksum. Admittedly you only need to change 1 byte in this algorithm versus 4 for a CRC32,
      but either way it is a couple of minutes work.
      It is worse at spotting errors - notably that you could swap a couple of bytes and it
      wouldn't notice, unlike CRC32.

      Mind you, in a underpowered embedded system (clock speed speed to slow to use the non table CRC
      in real time, not enough ram for the CRC tables), a checksum like this is probably better than nothing
      and cheaper than a CRC computationally. Imagine a microcontroller USB bootloader for example. Compared
      to no checksum, this would catch errors like broken RAM or a bad uart, both of which tend to completely
      destroy data, but it can still receive as fast as the host can send, whereas a real CRC would be
      a bottleneck limiting speed somewhat.

      Good page on CRC's - you can usually use either the non table version - fast CPU, no much RAM,
      or the table one - slow CPU clock, at least 1K of Ram or Rom available.

      http://www.cl.cam.ac.uk/Research/SRG/bluebook/21/c rc/node6.html#SECTION00060000000000000000

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    8. Re:M$ Antihash by Anonymous Coward · · Score: 0

      CRC is more commonly associated with ms-dos, and md5 is most commonly seen in open source distros...

  6. There's a simple solution by Anonymous Coward · · Score: 0, Funny

    MD5 --> MD6

  7. A quick note by Darren+Winsper · · Score: 4, Informative

    This seems to work on the assumption that you want to do some harm with a program you created yourself, you can't actually take a random RPM and turn it into an evil RPM with the same MD5. So, yes, it's bad, but it's not as bad as you might think.

    1. Re:A quick note by commodoresloat · · Score: 2, Funny
      you can't actually take a random RPM and turn it into an evil RPM

      Sure you can; all you need to do is set the evil bit.

    2. Re:A quick note by Antique+Geekmeister · · Score: 1

      RPM's are a bit more interesting. The content of the RPM is MD5 signed, but almost all RPM packages are also GPG signed as a package. This means that downloading things from redhat.com or mandrake.com, they have a GPG signature that is checked by most installers as a default and that the package is signed with, swearing that it was compiled by the people you love and trust.

      If you're not checking the GPG signatures, you have no idea what is in the package anyway: it could have been built by the 3l33t cracker kiddy of the week, and it's why the FTP or HTTP repositories for public RPM's need to be secure and managed. and why publicly distributed packages from random sources need to be built from source.

      Now, getting the MD5 signatures to match on a fake source tarball: that's clearly a bit of a risk to us downloaders, although that's been a risk for a while. It's nowhere near the risk of a badly secured site having the tarballs *AND* MD5 sums replaced. That's why really cautious sites publish MD5 sums and GPG or PGP sign those.

    3. Re:A quick note by provolt · · Score: 3, Informative

      The signature doesn't do anything to solve the problem. If I create an evil tarball that has the same hash as the good tarball, the signature will verify properly for both files. When you download the file, you won't know if it's the good or evil one.

      GPG signs the hash. With a strong hash function, it's as good as signing the tarball. With a weak hash functions, one signature would match for many files.

    4. Re:A quick note by Anonymous Coward · · Score: 0

      But that would change the MD5 sum!

    5. Re:A quick note by Anonymous Coward · · Score: 0

      But where are you getting the "known good" hash value from in the first place? If someone has access to evil-ize my RPM why wouldn't they also be able to re-generate the hash? It's different when the hash is GPG signed -- that at least leads back to a private key -- but a hash by itself is completely anonymous. I just don't see how file integrity checking relates to authenticating the source of the file.

    6. Re:A quick note by Antique+Geekmeister · · Score: 1

      GPG is also used to sign the tarball, at least with the source tarball sites I've been working with lately. Those sites involve security tools, so the authors are a bit pickier than most.

  8. compressed content safe (?) by Anonymous Coward · · Score: 1, Interesting

    Heya,

    I thought that the known attacks would not work agains compressed content, because you had to add some "random" data to the content and while decompressing the decompressor would give an error. I know it isn't exact, but I thought that md5's of compressed content was safe. I'm not sure if rpm's, ... are compressed. I use archlinux and I know the contents of the packages is compressed.

    greetings,

    Michel

    1. Re:compressed content safe (?) by ssj_195 · · Score: 1

      There is at least some truth to this, I think: creating a file with not only the same md5 as a given file but also the same file-type (e.g. plain-text if the original is plain-text; a valid .zip file if the original is a .zip file; a valid .iso if the original is a .iso, although I'm not sure on this last example as I don't know how strict the "structure" of a .iso is; etc) is far, far harder. In fact, it could potentially even be the case that there does not even exist a file with both the same hash and the same file-type as the original, or at least, not one that is not so vast as to be either computationally unfeasible, or just plain too darn big to download :)

    2. Re:compressed content safe (?) by m50d · · Score: 1
      I thought that the known attacks would not work agains compressed content, because you had to add some "random" data to the content and while decompressing the decompressor would give an error. I know it isn't exact, but I thought that md5's of compressed content was safe.

      I see "gzip: trailing garbage ignored" messages all the time when unpacking tarballs, should I be worrying about them? I know most users won't.

      Anyway, if it's zip compression then by design arbitrary content can be added to the start without resulting in an error. (This is done to make self-extracting programs possible). Make no mistake, this is a very serious break.

      --
      I am trolling
    3. Re:compressed content safe (?) by ssj_195 · · Score: 1

      Actually, having now read some of the comments, I retract this statement - one mentioned example about two renderable webpages with the same hash being created blew my mind. Amazing.

    4. Re:compressed content safe (?) by anthony_dipierro · · Score: 1

      Depends on the form of compression, I suppose, but you could probably create a compressed file where the garbage content uncompressed to other garbage content which was then removed (for instance, a compressed tarball where the garbage content was was in a single file which was subsequently ignored). If the decompressor gives an error while decompressing, then it isn't a very good compression algorithm, because that bit stream could have been dedicated to something valid instead of something invalid).

      As a really really simple example, imagine your compression algorithm is huffman coding. If the substitute data in each copy comes out to an integer number of uncompressed characters, then the rest of the file will be unaltered. If you can generate an unlimited number of substitutes, it's not very difficult to generate one which uncompresses without screwing up the rest of the file. Other compression algorithms work differently, of course, for example a gzipped file would probably have to be generated by hand to ensure that there were no backreferences to the modified data, but I believe gzip is a block algorithm, so in theory that makes things even easier.

    5. Re:compressed content safe (?) by Shano · · Score: 2, Informative

      Lots of file types allow for arbitrary junk at certain places.

      For example, a very basic form of steganography: cat a .zip file to the end of a .gif file. The result is a valid file that can be displayed as an image (which ignores trailing junk), and decompressed with zip (which ignores leading junk).

      Most file formats I've written don't care about junk at the end of the file. It'll be stripped off if you load and then save, but the program won't notice or complain. One program even preserves records it doesn't recognise (which could be secret messages, or just random crap).

    6. Re:compressed content safe (?) by Anonymous Coward · · Score: 0

      > I see "gzip: trailing garbage ignored" messages all the time when unpacking tarballs, should I be worrying about them? I know most users won't.

        I would! I've been using Linux for about five years now, and the only times I've seen that is when a tar.gz archive hasn't downloaded completely or is corrupted. In either case, I delete and redownload, I don't just shrug my shoulders and say "Aahhh, it's probably fine. ./configure away!"
        I'm wondering where you're getting all the bad .gz files that have taught you to think this way. What distro do you use, Corruptix? Hosednix? Linborked? Enquiring minds want to know.

  9. H(x) == H(y) - H(x + q) == H(y + q) ? by strcmp · · Score: 3, Informative
    There is a known result about MD5 hash function, is this: If MD5(x) == MD5(y) then MD5(x+q) == MD5(y+q) So, if you have a pair of messages, x and y, with the same MD5 value, you can append a payload q, and the MD5 value keeps the same, the size of q is arbitrary.

    Considering this is such a "well known" result, you would think that MD5 should have been abandoned long ago. Is this true for other popular hash functions?

    --
    "Yields falsehood when preceded by its own quotation" yields falsehood when preceded by its own quotation.
    1. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by igb · · Score: 4, Insightful
      It's almost universal to use the byte count as part of the checking of equivalence, either by storing it as a distinct item or by using it as inital or final salt to the calculation of the hash.

      ian

    2. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Ckwop · · Score: 5, Interesting

      Is this true for other popular hash functions?


      No it is not. The newer hashes, such as Whirlpool, do not have this problem. You're correct in saying this is a "well known result" and every cryptographer worth his salt says that this fact constitutes a break of the algorithm. We've known since the middle of the nineties that breaking MD5 was within reach. The fact there has been so much inertia in getting people to change is quite incredible really.


      At Toorcon this year, Dan Kaminsky showed a way to create two different webpages that render properly in a browser but have the same MD5 hash. Anybody who thinks this attack is theortical and ignorable is grossly mistaken.


      Simon


    3. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by gowen · · Score: 1

      Why? Adding the payload changes the MD5 value. So H(x) != H(x+q), and an added payload may be detected. This becomes bad only if H(x) = H(y), where x is good and y evil. But since no-one has yet found such a pair (and the fact that finding such a pair is hard is why MD5 is adopted), this result doesn't devalue anything...

      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    4. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Soul-Burn666 · · Score: 1

      Mod parent up!

      Most hash functions work by dividing the data to blocks, then hashing a block and combining it with the next block as input for the "small" hash function. So most functions have that property.

      However, like the parent stated, the last block of the data is padded with zeros and then the byte count of the original message. In case there's no place, a complete new block is added, padded with zeros and the byte count in the end.

      This doesn't work when x and y are of the same length and still have the same hash, though. This ofcourse requires finding the x/y collision in the first place.

      --
      ^_^
    5. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by m50d · · Score: 1
      Considering this is such a "well known" result, you would think that MD5 should have been abandoned long ago. Is this true for other popular hash functions?

      It's a useful property because it means you can MD5 data piece by piece. I don't want to think about how long it would take to MD5 a 700MB .iso if you had to use the file as a whole, it takes long enough as it is. It shouldn't allow you to break a hash function without other flaws, and I'd expect it to be the case for many hashes.

      --
      I am trolling
    6. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Anonymous Coward · · Score: 1, Interesting

      > Why? Adding the payload changes the MD5 value. So H(x) != H(x+q), and an added payload may be detected. This becomes bad only if H(x) = H(y), where x is good and y evil. But since no-one has yet found such a pair (and the fact that finding such a pair is hard is why MD5 is adopted), this result doesn't devalue anything...

      Uh ?

      Say you have x and y, such as H(x) = H(y), with x and y beeing neither good or evil. Say that len(x) = len(y) = n. Say also that x[0] != y[0] (first byte differ)

      Say the binary format of you platform access garbage at the start of executables (this is, I think, the case for an ISO file).

      Say you write the payload q, such as it works in the following way:

      Look at the first byte of the file. If it is x[0] do something good. If it is y[0] do something evil.

      Now you have constructed a pair( x+q, y+q ) where (x+q) is good and (y+q) is evil.

      Easy.

      You can obfuscate the test, and with clever xor'ing, you can even make the bad code hidden (without the knownledge of y, you can't reconstruct the bad code). For instance, compute x[0]^{ opcode of 'ret' }. ('^' is XOR). Call this 'a'.

      y[0]^a will probably NOT be a ret or jump instruction, but will probably move data between registers. Now, devise some bad code (you have n-1 bytes for that). Then, define:

      q0 = a + {bad code}^y[1..n]

      Define q as code that fecthes the n first bytes, xor them to the n next bytes, and calls the result, then do something non evil.

      Good Binary:
      x + q0 + q
      n bytes, n bytes, len(q) bytes

      At execution, it will execute x[0]^q0[0], which by definition is the 'ret' opcode.

      Evil Binary
      y + q0 + q
      n bytes, n bytes, len(q) bytes

      At execution, it will execute one byte of garbage, followed by the reconstructed bad code.

      The good and evil binaries will have the same MD5. The good binary will even NOT contain ANY hidden bad code.

      But, when the first n first bytes of that binary are replaced by y, (y^q0)[1..n] will be that evil code.

      Cheers,

      --fred

    7. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by cpeikert · · Score: 1

      The newer hashes, such as Whirlpool, do not have this problem.

      I don't think you are right about this. Although it does not exactly use the Merkle-Damgard structure, Whirlpool is still iterative. If you find a collision between two short messages of the same length, then Whirlpool's internal state collides. From there you can extend both messages with the same string, maintaining the same internal state, padding and byte counts be damned.

      Any iterative hash is going to have this issue. It really seems like there is nothing you can do to mitigate it, short of a massive design change (which would affect efficiency).

    8. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by kasperd · · Score: 1

      Is this true for other popular hash functions?

      First of all the way it is stated is a bit too general. It is not enough for x and y to cause a collision in the final hash, they must cause a collision in the internal state of the hashing algorithm. But all the hash collisions which have been found so far worked by finding a collision in the internal state, in which case the statement is true for any hash algorithm.

      You may find algorithms claimed not to be vulnurable to this kind of attack. The reality just is, that their internal state is larger than the final hash. So you can generate a hash collision without generating a collision of the internal state. Is that a strength? That really depends on how you define the strength of a hash function.

      Assume we have two hash functions that work the same except that one produce a hash value which is shorter than the internal state where the other use the entire internal state as hash. Which of the two is the strongest? You'd expect the one with the longest hash output to be the safest, and the way I described it there is no way you could find a collision for the long hash without also finding a collision for the short hash. However the long hash is not as much safer as you would expect from the length.

      Typically the internal state of a hash algorithm is of constant size. And it is clearly not practical to let the internal state grow as large as the input. For that reason there exist collisions in the internal state of any hash algorithm, and when one is found, all bets are off. So it is really a tradeoff between size of internal state, size of hash value, performance and security.

      --

      Do you care about the security of your wireless mouse?
    9. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Paul+Crowley · · Score: 1

      As stated, it's false. However, for certain messages, if |x| == |y| then H(x) == H(y) means H(x + q + r) == H(y + q + r) for a particular q that depends on |x|.

      This just means that if you can find one collision, you can find another; if finding collisions is implausibly hard then this property isn't a problem. However, there's a related property ("length extension") which is a problem, and it's one all Merkle-Damgard hash functions (ie practically all of them) have in common.

      To solve this problem, and resist various other attacks such as multicollision attacks, generic long message attacks and so forth, we need hash functions that don't use the Merkle-Damgard paradigm. This should be an interesting area of research for the next wee while. I wish I knew what to recommend in the meantime...

    10. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Paul+Crowley · · Score: 1

      You are right, Whirlpool does have this problem. The correct fix is to have an internal state markedly larger than the hash output size. Whirlpool has a 512-bit internal state, so a lot of attacks on the Merkle-Damgard structure are totally impractical; you can always truncate the output to be what you need. However, Whirlpool is not fast.

    11. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by cpeikert · · Score: 1

      Your suggestion is just pushing the dirt under the carpet, though.

      If you're going to have a large internal state, you might as well use the entire state as the output. To do otherwise is to open yourself to attacks on the "state -> output" compression function, which may be the weakest link (and probably is, given the short output). And if you're going to pay the computational cost to maintain so much state, it's a waste to throw it all away by choosing a weaker (shorter) output length.

    12. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by trappist123 · · Score: 1

      At Toorcon this year, Dan Kaminsky showed a way to create two different webpages that render properly in a browser but have the same MD5 hash. Anybody who thinks this attack is theortical and ignorable is grossly mistaken.

      Same thing has been done with postscript files, but neither is a result of breaking MD5. HTML and PS include the ability to embed logic in the page. Two identical web pages with javascript that says if(location.href==goodpage) { render one thing } else { render the other thing } will appear to defeat MD5.

    13. Re:H(x) == H(y) - H(x + q) == H(y + q) ? by Paul+Crowley · · Score: 1

      I disagree. First, a large part of the point of a hash function is to compress. The identity function is perfectly collision-free, but has the disadvantage that the output is as big as the input. For a lot of applications, a smaller hash output means smaller messages and more efficient use of bandwidth. Second, if the hash function is truly as good as we want them to be then it should be impossible to devise a halfway reasonable [1] compression function which results in a weak hash function when composed with your origina hash function. However, it could easily make the hash function stronger against attacks which are aimed at discovering something useful about the message given the hash.

      [1] by which I mean "if every input is equiprobable then every output is equiprobable". Truncation for example has this property.

      An extreme example is Panama hash. If you didn't have an output stage but output the entire state, the output would be *far* too big (about 1k IIRC) and vulnerable to all sorts of attacks to do with discovering useful things about the message. As it happens, Panama hash is weak, but (again IIRC) the attack causes a collision in the internal state, so outputting it all wouldn't help. I think the future of hash functions lies less in things like Whirlpool and more in things like Panama. Note that AES designers had a hand in both.

      I heard in 2002 that the AES designers were working on a hash/stream primitive that combined ideas from AES and from Panama, but no sign so far. And again, Bernstein's cache timing attacks make table-based ciphers that bit less attractive.

  10. Not a problem for software distribution.. by hhghghghh · · Score: 4, Interesting

    This isn't a problem for software distribution, really, since the good.bin file needs to start with a vector designed to enable a collision. A good-faith programmer wouldn't include that vector.

    It is a problem for stuff like contracts; you draw up two versions of a contract, a good one and an evil one, let someone sign the good one, and later keep them to the clauses in the evil one.

    So while there IS a very big problem, the example is a bit contrived.

    1. Re:Not a problem for software distribution.. by provolt · · Score: 2, Insightful

      This is certainly a problem with software distribution. Here's an example of why. The good-faith programmer creates good.bin. A Bad-faith programmer creates bad.bin with the same hash. All the bad-faith programmer has to do is find a way to swap bad.bin in for good.bin (hacking, misrepresentation, etc).

      Now everyone who gets bad.bin will hash the data, verify the signature and see that the signature verifies. When the signature verifies, everyone believes bad.bin is really good.bin.

    2. Re:Not a problem for software distribution.. by Anonymous Coward · · Score: 1, Informative

      No, even with this exploit, it is still very difficult for the bad-faith programmer to find a bad.bin with the same hash as the good.bin generated by the good-faith programmer.

      However, what the bad-faith programmer can do with this exploit is take good.bin and produce two new files, good2.bin and bad.bin. good2.bin executes identically to good.bin, but has the same hash as bad.bin. Note that good.bin and good2.bin _DO_NOT_ have the same hash.

      Now what the bad-faith programmer must do is convince the good-faith programmer to distribute good2.bin as good.bin. This seems quite difficult. However, if the bad-faith programmer achieves this, he can then distribute bad.bin, demonstrating that it has the correct hash.

      Does this make sense? Do you see where your argument is wrong?

    3. Re:Not a problem for software distribution.. by provolt · · Score: 1

      Yup that makes perfect sense. I took a shower right after I posted the comment and I realized that I was wrong. I came back to correct it, but you beat me to it.

      It doesn't give you a warm fuzzy about software distribution, but the attack is not as simple as I described.

    4. Re:Not a problem for software distribution.. by provolt · · Score: 1

      Someone needs to mod up the AC. He has a very good point.

    5. Re:Not a problem for software distribution.. by evilviper · · Score: 1

      This post is completely and very obviously wrong. The poster even came back and replied to say that he was wrong. Can someone please mod it down? and stop these moderators in M2, since they don't bother to even give a cursory check of the posts they are moderating? This is totally ridiculous. /. moderation has been very obviously broken over the past few months. Taco! Do something!

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  11. Re:So if you need a freely available hash algorith by MaGogue · · Score: 0, Redundant

    Unfortunately there is no way of guaranteeing they wont be found next month.

  12. filesystems... by Anonymous Coward · · Score: 0

    e.g. reiserfs does hash filenames too. does this mean that i would be able to overwrite or modify a file owned by root if i'm able to guess a filename which produces the same hash as this specific file?

    1. Re:filesystems... by ArsenneLupin · · Score: 5, Informative
      Reiserfs (and lots of other filesystems ... and lots of other system components in general) might use hash functions to speed up lookups. These hash functions however do not need to be cryptographically secure. The hash (which is usually very short, so much that brute force would be feasible) is only used as an index into an array of "buckets". Each bucket may contain multiple files, and the system still uses a bit-by-bit comparison on the full names to find the correct entry.

      The point is to reduce the set among which to do an exhaustive search (one small hash bucket versus all known files on the system), and not to verify some kind of signature.

      Any successful attack on the hash would only be useful to make the system slow and unefficient (by making an excessive number of files end up in one bucket), but cannot corrupt it.

  13. Pretty interesting.... by vidarlo · · Score: 1

    This is kinda interesting. Well, the user need to use my installer, but it might infect them, and they have no way to tell. But the interesting thing is... A couple of script/programs use md5sum, like rpm and such. How much work is it to change into sha1? AFAIK SHA1 is 160bits, whilst md5 is 128bit, so a bit more space in the rpm is needed. Apart from that?

    The solution to all this is gpg signing. I've heard little fuss about that... Yes, it is simply a longer signature, making it more difficult to break, but still..

    1. Re:Pretty interesting.... by Alioth · · Score: 1

      CentOS (and I suspect RHEL) uses GPG signing for getting updates and installing new packages from their repositories, so at least for CentOS - they are already using GPG signing. The first CentOS install I did was with 4, so I don't know how long they've been using GPG signatures.

    2. Re:Pretty interesting.... by Anonymous Coward · · Score: 0

      I still GPG sign my mails with sha1.

      hash = concat(sha1(data), md5(data))

      Makes finding a collision highly improbable and this is how I use md5 for the real world. md5 can also be a good choice for data integrity checks on an apps internal datastructures. Microsoft banning it outright doesn't demonstrate security awareness so much as it reflects poorly on their programmers.

    3. Re:Pretty interesting.... by Anonymous Coward · · Score: 0

      Are you just posting to hear yourself talk or what?

    4. Re:Pretty interesting.... by provolt · · Score: 1

      The solution is not GPG signing. Because public key signature algorithms (like RSA) are slow they can't really be used on large files. Signing and verifying a 700 MB iso with RSA would take hours even on the fastest desktop computers.

      GPG uses a signature algorithm to sign the hash of the data. If the hash function is strong, then it's infeasable to find another file that has the same hash. (Two different messages with the same hash is called a collision.) Researchers have been able to find collisions in MD5.

      If I sign a file and someone else is able to find a hash collision with the file, you cannot determine which file I intended to sign.

    5. Re:Pretty interesting.... by Anonymous Coward · · Score: 0

      FYI: SHA1 has been broken not so recently.

  14. Key sentence... by gowen · · Score: 1
    Now, suppose we have changed the extractor program, with our own version.
    In other words "if we can convince you to run our insecure extractor code", we can, erm, get you to run insecure code.

    Well, duh.
    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  15. Considered Harmful Considered Harmful by Anonymous Coward · · Score: 0, Offtopic

    My next paper is "Considered Harmful" Papers Considered Harmful

    1. Re:Considered Harmful Considered Harmful by m50d · · Score: 1

      Already been done, as has "Djikstra considered harmful", shortly after the original one.

      --
      I am trolling
  16. What About Cleaner File Formats by Anonymous Coward · · Score: 0

    What about using document formats that are plain text but transformed to a paper/screen viewable format? The original demonstration of this (two postscript documents) was done by adding specific garbage to the files, making it slightly clearer what was going on. With Word or Excel, you stand no chance of determining if a file has had garbage added to it or not.

    Banning MD5 is a decent PR move at Microsoft, as long as they teach their target audience of morons why they're doing it first, but don't mistake this for actual security, just because it happens to have real world security benefits.

    P.S. A lot of file formats will fall to this, including some used by open source projects. I'm not bashing Microsoft for being Microsoft; I'm bashing them for a) making a big deal out of something that everybody else took as a matter of course months ago, and b) holding this up as an example of them shifting focus towards being more secure.

  17. File Integrity Checkers ? by amodm · · Score: 1

    I wonder how does this affect the file integrity checkers. A lot of these softwares store hashes and use them to verify if a file has changed.

    So the next time someone installs a root kit, he just needs to do it in a way TFA points out.

    1. Re:File Integrity Checkers ? by ArsenneLupin · · Score: 3, Insightful
      So the next time someone installs a root kit, he just needs to do it in a way TFA points out.

      Wouldn't work. TFA points out a way to generate to packages, a good one, and an evil twin. Both are constructed by the algorithm.

      It does not however show how to make an evil twin for an existing package. This considerably lessens its danger.

      Think about it: the only guy who could pull this off would be the original author. But then, the original author could achieve that same goal much more easily than by "breaking" MD5.

    2. Re:File Integrity Checkers ? by Shano · · Score: 2, Interesting

      Not necessarily - it depends on how much the author is trusted to begin with. Certain types of software are very closely checked by the open source community, and any trojan will be discovered if it exists in the package.

      Say I write a bit of security software (for which most people take the time to compare checksums). As a relative nobody, lots of people are going to scrutinise the source code before using it. Any new release will also be checked. Only after it's been scrutinised and built up a reasonable userbase is it worth switching it for the evil version - otherwise, the evil version would be discovered early and nobody would use it.

      Someone like Schneier could probably release a trojan directly and people would install it without thinking (I trust he's too responsible for that, though). For the rest of us, this gives a feasible way to sneak in evil code without anyone checking it.

    3. Re:File Integrity Checkers ? by ArsenneLupin · · Score: 1
      Not necessarily - it depends on how much the author is trusted to begin with. Certain types of software are very closely checked by the open source community, and any trojan will be discovered if it exists in the package.

      And you somehow think that an inactive, but still present trojan would not be discovered? Unlikely.

      And yes, this is exactly what this so called exploit does: it constructs a package which has both codes present at once (the payload), and some small bit of code that examines the "header" (the bit that is different between both "twins") to chose which fork to activate. Somebody analyzing the source (or even reverse engineering the binary) will see what's up, even if he is examining the "good" code.

      Say I write a bit of security software (for which most people take the time to compare checksums). As a relative nobody, lots of people are going to scrutinise the source code before using it.

      Ok, so the shenanigan will be discovered on analysis...

      Only after it's been scrutinised and built up a reasonable userbase is it worth switching it for the evil version - otherwise, the evil version would be discovered early and nobody would use it.

      Any examination of the source of the "good" version will show the evil version as well. The only examination that won't turn up the shenanigan would be black box tests in a sandbox ("if code doesn't misbehave in sandbox, it won't misbehave in production or later"). However, if your goal is to protect against black box testing, there are easier ways than that. Such as time-dependant code (after a certain date, code will turn evil).

      Someone like Schneier could probably release a trojan directly and people would install it without thinking (I trust he's too responsible for that, though).

      Exactly.

      For the rest of us, this gives a feasible way to sneak in evil code without anyone checking it.

      If nobody's checking it anyways, there's no point in exploiting an MD5 collision. Just release the sometimes-but-not-always evil code, publish its MD5 sum, and be done with it.

  18. Re:So if you need a freely available hash algorith by amodm · · Score: 0, Redundant

    pardon me if i might sound redundant or ignorant, but why shouldn't md5 be considered a free algorithm ?

  19. sha1 creaky at the edges by samjam · · Score: 0, Redundant

    sha1 creaky at the edges, which, AFAIK is used by GIT.

    Perhaps SCO will get their source code into the kernel by financing SHA1 collisions?

    Sam

  20. MOD PARENT DOWN!!! by Anonymous Coward · · Score: 0

    Guys,

    I don't mean to bitch here, but the parent is just trying to karma whore, by stating why HE thinks is obvious, but if you read the insightful replies to his BS, you will see he is full of it...

    Actualy, just look at his posting history... Karma bitch...!

  21. Re:So if you need a freely available hash algorith by thomasdn · · Score: 2, Insightful

    AFAIK there are no known vulnerabilities or attacks for these two yet

    I am no cryptography expert so I can not read and understand those algorithms. But the fact that there are no known vilnerabilities for an algorithm doesn't make it secure.
    Maybe they are just not used as much as other well known algorithms. And therefore nobody has found vulnerabilities for them yet?

  22. Yadda Yadda by Effugas · · Score: 5, Interesting

    Two pages, same hashes, etc. (This is the guy who wrote the MD5 someday paper.)

    http://www.doxpara.com/t1.html
    http://www.doxpara.com/t2.html

    1. Re:Yadda Yadda by SimilarityEngine · · Score: 1

      Nice examples. I had a look at the page source for each and it's exactly the same technique: the content of both web pages is present in both HTML documents, with a script at the end which selects which data to display depending on which vector is at the start of the document. So, really, this thing with good.bin and evil.bin is no new breakthrough at all. It's just a more sensationalist way of making the same point.

      --
      Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    2. Re:Yadda Yadda by pomakis · · Score: 1

      Thank you! It's nice to see an actual real-life example of how MD5 hashes aren't as safe as once thought.

    3. Re:Yadda Yadda by NightHwk1 · · Score: 1, Insightful

      Both pages are exactly the same, but the javascript output changes depending on the URL used. This doesn't have anything to do with MD5, and it's hard to see how this would count as an exploit.

    4. Re:Yadda Yadda by Jerf · · Score: 2, Informative

      Mods, parent is not insightful, parent is wrong. And it's not hard to understand why the parent has a hard time seeing how MD5 is involved when they have an incorrect idea of how it works.

      As a bit of a hint to NightHwk1, extremely carefully check the first bytes of those two pages. They are indeed not identical and do indeed hash to the same MD5 value.

      The rest of the "trick" of course hinges on the fact that a single bit change to a Turing Machine can completely alter the resulting output. "Trick" is put in quotes because that isn't really a "trick", it's a fundamental truth about computing and is the reason why twiddling even a few bits in an MD5-hashed block (which is all this break seems to be able to do) is such a big deal. This is how they demonstrate it constructively, since most people aren't programmers and won't understand that.

    5. Re:Yadda Yadda by Shanep · · Score: 1

      Two pages, same hashes, etc. (This is the guy who wrote the MD5 someday paper.)

      Notice the garbage characters at the beginning of each .html file.

      This is what I would expect in at least one file, or to make it easier, both files. Making changes within the Least Significant Bits of images (where steganography is often used) or areas of files where padded data can be ignored, etc, is where I imagine people could hide this type of garbage that allows this exploitation.

      --
      War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
  23. Re:So if you need a freely available hash algorith by El_Muerte_TDS · · Score: 1

    No attack? what about brute force? It's an attack, not a good one, but it is an attack.

  24. Actually RPM uses MD5 and SHA1 by seifried · · Score: 5, Informative

    RPM uses both MD5 and SHA1, the chances of finding a collision that satisfies both hashes is small, even if both MD5 and SHA1 are compromised since the hash the data differently.

    rpm -Kvv xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm
    D: Expected size: 2655615 = lead(96)+sigs(344)+pad(0)+data(2655175)
    D: Actual size: 2655615
    D: opening db index /var/lib/rpm/Packages rdonly mode=0x0
    D: locked db index /var/lib/rpm/Packages
    D: opening db index /var/lib/rpm/Pubkeys rdonly mode=0x0
    D: read h# 278 Header sanity check: OK
    D: ========== DSA pubkey id b44269d0 4f2a6fd2 (h#278)
    ./updates-released/packages/xorg-x11-libs-6.8.2-37 .FC4.49.2.i386.rpm:
    Header V3 DSA signature: OK, key ID 4f2a6fd2
    Header SHA1 digest: OK (f37bf5cb97db696f14133b90e23f2455b9f94587)
    MD5 digest: OK (8eda29837b6992876bd867df03b3b8af)
    V3 DSA signature: OK, key ID 4f2a6fd2
    D: closed db index /var/lib/rpm/Pubkeys
    D: closed db index /var/lib/rpm/Packages
    D: May free Score board((nil))

    1. Re:Actually RPM uses MD5 and SHA1 by m50d · · Score: 1
      RPM uses both MD5 and SHA1, the chances of finding a collision that satisfies both hashes is small, even if both MD5 and SHA1 are compromised since the hash the data differently.

      Not true, firstly the two come from the same "family" and are related, and secondly most of this type of attacks allow you to generate arbitrarily many files matching the hash almost as easily as two. So you use your MD5 attack to generate a large set (2^60 or however many you need) of RPMs matching the MD5, then you use your SHA1 attack to find one among them which matches the SHA1 too, or vice versa.

      Fortunately there is no practical attack against SHA1 yet, but when there is RPM will be completely insecure.

      --
      I am trolling
    2. Re:Actually RPM uses MD5 and SHA1 by verbatim_verbose · · Score: 1

      Fedora does one better, and GPG signs packages (through yum). I'm not sure if other distributions do this, but I imagine this will be standard policy for most in the future. This provides a reliable way to tell that the package came from who you think it did (ie. the Redhat/Fedora dev crew), whereas the MD5/SHA1 above mainly just to verify file integrity so that a partially broken package (bad download, etc) doesn't hose your system.

    3. Re:Actually RPM uses MD5 and SHA1 by hey! · · Score: 1

      So you use your MD5 attack to generate a large set (2^60 or however many you need) of RPMs matching the MD5, then you use your SHA1 attack to find one among them which matches the SHA1 too, or vice versa.

      Sure, conceptually simple, but not necessarily computationally easy.

      I wouldn't say that using both MD5 and SHA1 is useless when compared to using SHA1 alone. For one thing, we don't know what the future SHA1 exploit will look like. If the number of files you need to generate a dual collision is quite large, it may be sufficiently impractical enough to buy a few years. Or files generated this way may have detectable characteristics.

      It's not a long term solution of course, but in security everything always boils down to matter of time.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    4. Re:Actually RPM uses MD5 and SHA1 by m50d · · Score: 1
      I wouldn't say that using both MD5 and SHA1 is useless when compared to using SHA1 alone. For one thing, we don't know what the future SHA1 exploit will look like. If the number of files you need to generate a dual collision is quite large, it may be sufficiently impractical enough to buy a few years.

      No it won't. If there's enough files required to make it impractical with MD5 as well, then there's enough files required to make it impractical anyway.

      Or files generated this way may have detectable characteristics.

      Again, adding MD5 makes no difference as to whether or not that is the case.

      It's not a long term solution of course, but in security everything always boils down to matter of time.

      Using both doesn't gain anyone any time.

      --
      I am trolling
    5. Re:Actually RPM uses MD5 and SHA1 by Physics+Dude · · Score: 1
      So you use your MD5 attack to generate a large set (2^60 or however many you need) of RPMs matching the MD5, then you use your SHA1 attack to find one among them which matches the SHA1 too, or vice versa.

      Generate 2^60 RPMs?

      Then TRY to find one that matches the SHA1?

      You're kidding, right?

      Obviously you and I have far different ideas of what "computationally feasable" means.

    6. Re:Actually RPM uses MD5 and SHA1 by Turmio · · Score: 1

      RPM uses both MD5 and SHA1, the chances of finding a collision that satisfies both hashes is small, even if both MD5 and SHA1 are compromised since the hash the data differently.
      Also SSL 3.0 and TLS 1.0 uses this kind of combination of different hash functions. You're still relatively secure even if one of the two functions would be completely compromised. Simple and efficient, I like it.

    7. Re:Actually RPM uses MD5 and SHA1 by m50d · · Score: 1

      The 2^60 number is for the current best (AFAIK) attack on SHA1, yes it's not feasible at the moment, but my point is that it's just as feasible to attack an MD5-SHA1 combo as SHA1 on its own.

      --
      I am trolling
    8. Re:Actually RPM uses MD5 and SHA1 by Anonymous Coward · · Score: 0

      It's kind of funny to see people being cought in the sensationalism and FUD spread by the media. MD5 hash is perfect for what it's being used. None of these phony "attacks" is saying anything new or unexpected. To produce two files with the same hash one needs to be able to write the hash and BOTH files. If one can change the ORIGINAL file AND the hash, thinkering with another file/hash is pointless and a waste of time. These "attacks" show that if you get something from an untrusted source it could be bad - hash cannot prevent that. It was not designed to prevent it either. The whole story has some minimal entertainment value but nothing beyond that.

    9. Re:Actually RPM uses MD5 and SHA1 by m50d · · Score: 1

      These attacks are still enough to make MD5 invalid for some applications, e.g. digital signing of contracts, and attacks only get better. You should move away from MD5, whatever you're doing.

      --
      I am trolling
    10. Re:Actually RPM uses MD5 and SHA1 by Physics+Dude · · Score: 1

      I think you're missing the point of all the other posters here: That the attacks would almost certainly be IMCOMPATIBLE. For example: Suppose an attack is found for SHA1 that is just like this one in nature. You seem to think that these two attacks can be used in conjunction with each other, but that just isn't the case. The modifications to extend the file while preserving SHA1 almost certainly WON'T preserve the MD5 stamp match previously created by the other attack. The same would be true in the reverse order. Bottome line: Even with attacks like this against both algorithms, if you're trying to use this on a file that gives both MD5 and SHA1, then you can't combine the attacks and create new files that match BOTH MD5 and SHA1 of the original. If you still don't understand this, please, please don't try to make a living in cryptography.

    11. Re:Actually RPM uses MD5 and SHA1 by m50d · · Score: 1
      I think you're missing the point of all the other posters here: That the attacks would almost certainly be IMCOMPATIBLE. For example: Suppose an attack is found for SHA1 that is just like this one in nature. You seem to think that these two attacks can be used in conjunction with each other, but that just isn't the case. The modifications to extend the file while preserving SHA1 almost certainly WON'T preserve the MD5 stamp match previously created by the other attack. The same would be true in the reverse order.

      If you look at the attack in more detail you'll find that's simply not true. Although they use only two vector values in the example, it's almost trivial to extend the method used to produce arbitrarily many vector values. Furthermore, you can find two vector values among a given set, provided it is large enough, just as easily as finding two "random" vector values. Therefore, if the SHA1 attack worked the same way you could combine both attacks in two ways - you could generate a whole load of vectors for MD5, then from them find two with matching SHA1, or you could do it the other way around. Even if the SHA1 attack takes a different form, you can still combine the two provided either one of two conditions hold: the SHA1 attack can be used to generate a large number of binaries with matching SHA1 which differ only in a single n-bit section, or the SHA1 attack can be used to find two binaries with the same SHA1 among a large number of binaries which differ only in a single n-bit section. All the attacks I know of satisfy both of these.

      If you still don't understand this, please, please don't try to make a living in cryptography.

      Any serious cryptographer will tell you combining MD5 and SHA1 is a stupid idea. Bruce Schneier (I've prolly misspelled his name) covered it very well in a fairly recent Crypto-gram.

      --
      I am trolling
    12. Re:Actually RPM uses MD5 and SHA1 by Physics+Dude · · Score: 1

      Sorry... my brain must have been off this morning. ;) I'll have to claim sleep deprivation and a cranky attitude from 5 days of a bad flu. ...and obviously a bit of cluelessness.

  25. Re:So if you need a freely available hash algorith by Anonymous Coward · · Score: 0

    No, so let's just abandon the idea with hashes and encyption algorithms altogether. :p

  26. Trusted source by flajann · · Score: 1
    This is an interesting exploit, but I consider it moot from the standpoint that if you have an untrustworthy source for software, there are other ways to hide viruses and malware with timed-release activation, say, on a certain date.

    Having said that, I find it highly amusing that Microsoft would ban MD5. Are they really trying to say we can't trust them as a source of software?

    <evil grin>

  27. Re:So if you need a freely available hash algorith by MadMoses · · Score: 2, Informative

    Misunderstandment. I never made a statement about MD5 not being "free". But MD5 is vulnerable, that's why I pointed out some alternatives.

    --

    Do not be alarmed. This is only a test.
  28. Re:So if you need a freely available hash algorith by FidelCatsro · · Score: 1

    There is also no way of knowing if your going to get hit by a bus next month .
    So all we can do is take sensible precautions to make sure it doesn't happen .

    --
    The only things certain in war are Propaganda and Death. You can never be sure which is which though
  29. SHA1 will fall soon by m50d · · Score: 1

    Attacks only get better, a theoretical vulnerability is one to worry about as they are almost always followed by a practical exploit like this. Move away from SHA1 before the same thing happens.

    --
    I am trolling
    1. Re:SHA1 will fall soon by Anonymous Coward · · Score: 0

      do you really think this is a practical exploit? jeez.

  30. WTF mate? by MukiMuki · · Score: 0, Offtopic

    Not a SINGLE post about Bittorrent (which uses MD5) and the **AA's? I'm rather dissapointed in you people.

    1. Re:WTF mate? by RPoet · · Score: 1

      Not a SINGLE post about Bittorrent (which uses MD5) and the **AA's?

      May be related to the fact that BitTorrent uses SHA-1, not MD5.

      --
      "Oppression and harassment is a small price to pay to live in the land of the free." -- Montgomery Burns.
    2. Re:WTF mate? by hoka · · Score: 1

      When I last looked into it the protocol allows for MD5 and SHA1, but defaults to SHA1.

  31. Re:So if you need a freely available hash algorith by gedhrel · · Score: 1

    There isn't sufficient energy in the solar system to complete a brute force attack.

  32. Re:from the tin-foil-hat-dept by Anonymous Coward · · Score: 1, Funny

    By the way:

    1) I am risking my life right now for writting this and I must leave the internet cafe in 4 minutes.

    2) See who claims that MD5 is insecure, follow the links and understand who knows the backdoor to the "alternative, more secure" hashing algorithm.

    GTG

  33. Re:So if you need a freely available hash algorith by Anonymous Coward · · Score: 0

    HASTA LA VISTA!

  34. Speaking of hashing algorithms... by scovetta · · Score: 2, Informative

    The NIST is having a two-day workshop in Gaitherburg, Maryland (USA) on October 31-Nov 1. Xiaoyun Wang will be giving a keynote speech, and there'll be plenty of technical material to go around. The workshop website is: www.csrc.nist.gov/pki/HashWorkshop/program.htm. I don't work for NIST or anything, but I thought this was interesting and they haven't really done a good job getting the word out about this conference.

    --
    Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird. --Nietzsche
  35. -1, Clueless by Jussi+K.+Kojootti · · Score: 1

    ... or maybe you (and the mod weho gave you Insightful) just have a strange sense of humour?

  36. Ah, but ..... by ajs318 · · Score: 1

    If the MD5sums for two files don't match, then you can still say the files definitely are not the same. So it's not entirely useless. You can still use it to check for non-deliberate file corruption {such as you might see if you have a faulty drive or motherboard}, since the example was so contrived as it could almost never happen by accident.

    Also, I don't see how you could apply the scheme through the usual layers of archiving and compression. If I have a file "tldpsk.tar.gz" which contains "photoindex", "camprobe", "photograb", "install", "copying", "readme" and "manifest" -- all of which are human-readable text files, and the manifest, which is also reproduced on the download page, contains the original MD5sums of all the other files -- and I take its MD5sum, sure I might be able to produce another file with the same MD5sum as "tldpsk.tar.gz". But the new "tldpsk_altered.tar.gz" probably won't uncompress cleanly {first alarm bell} because the extra data you added probably won't be a valid file within the tar archive. And even if it is a valid file, then the manifest will be wrong {second alarm bell}. If you added extra data to one or more of the inner files -- let's say you put something nasty into "camprobe" and something else into "install" -- then the chances are that these files now won't be perfectly human-readable {third alarm bell}, even if their MD5sums match the ones in the manifest and on the download page.

    PGP signatures can help, of course; but all a PGP signature really proves at the end of the day is that the file was signed by someone who knew the purported author's secret key. In an ideal world, of course, that means nobody but the author; but if the author of the package was unlucky enough to trust an MD5sum on a compromised file, that might not necessarily be the case .....

    --
    Je fume. Tu fumes. Nous fûmes!
    1. Re:Ah, but ..... by chefren · · Score: 1

      If the MD5sums for two files don't match, then you can still say the files definitely are not the same. So it's not entirely useless.

      But if they DO match? What then? Then you can't say anything. A hashing algorithm really is fairly useless for security once its broken.

    2. Re:Ah, but ..... by Prior+Puss · · Score: 1

      If you want to know for sure whether two files are identical, you're probably better off comparing them, byte-for-byte. After all, to compute a hash, you have to read in every byte anyway (albeit in blocks) AND compute complicated bitwise rotations and xors etc, so it's probably easier and faster just to compare byte-for-byte.

      I must admit, I've used a SHA-1 hash to conduct a file compare in my code, but that's just 'cos I'm lazy and I'd already written the code for a SHA-1 compare :)

    3. Re:Ah, but ..... by Anonymous Coward · · Score: 0

      Well there are cases where a hash is "better":
      -files on two systems without much bandwith between them or where one does not wish to send the file data over the line.
      -comparing a file to itself over time, one only needs to store the md5 hash of the file in the past and not the whole file. This seems like checking for corruption.

  37. Don't panic yet by Anonymous Coward · · Score: 4, Insightful

    While this is a very serious attack, it doesn't yet mean that every use of MD5 is totally unsafe.

    It is feasible now to generate 2 different pieces of data with the same MD5 hash. As many file formats allow one to embed invisible 'junk', it is possible to create a 'good' version and an 'evil' version with the same MD5 hash.

    BUT it is NOT (yet) feasible to create a piece of data with a given MD5 hash. This means that if you can not modify the 'good' version, you can't create a matching 'evil' version.

    An example of where the usage of MD5 isn't broken are *nix passwords. Your password is hashed with MD5 (a salt is added to your password too, but that's not important here), and that hash value is stored. Anyone who can supply a password (doesn't have to be the same one!) which has the same MD5 hash, is allowed access.

    So if it would be feasible to generate data with a given MD5 hash, one could easily generate a matching password, when given the MD5 hash (which you can often easily acquire, especially in NIS/yp environments). But luckily, this is not possible ... yet :-)

    So you'd better start protecting those hashes (that's what a shadow password file does), en better yet, move to a better algorithm. Like the Blowfish algorithm that OpenBSD has been using for years now.

    1. Re:Don't panic yet by Haeleth · · Score: 3, Insightful

      An example of where the usage of MD5 isn't broken are *nix passwords. Your password is hashed with MD5 (a salt is added to your password too, but that's not important here)

      Actually, the salt is extremely important. A large proportion of unsalted MD5 passwords can be cracked trivially by looking them up in pregenerated databases. MD5 alone is broken for passwords - it's only the salt that makes MD5 passwords still "good enough" for use on low-security machines... for the time being.

    2. Re:Don't panic yet by Traegorn · · Score: 0

      Well, the reason it's also not vulnerable for Passwords is that you'd still have to Brute force it - and at that point you could just Brute force ANY password.

  38. Why it IS a problem by Moraelin · · Score: 3, Insightful

    The problem is that a broken algorithm just makes that piece of social engineering a lot easier.

    If I just told you you can download the latest auto-installer of the latest WoW patch from www.i-pwn-ur-puter.ru instead through the slow Blizzard installer, you might think "uh, wtf, I think I'll play it safe anyway and get it directly from Blizzard. I trust them more than I trust a warez and script-kiddie site."

    Now picture that I tell you "and here's a link to the MD5 sum on Blizzard's site. You can check for yourself that the the file on our site is the original file and it hasn't been tampered with." In fact, I would even _urge_ you to make a habit to check all your downloads against the original MD5 sums, for your own good.

    It already looks a lot safer and more legitimate. Well, maybe not to _you_, but to a lot of people it does.

    That's the whole problem. That false sense of security makes the "if we can convince you to run our insecure extractor code" part a helluva lot easier.

    --
    A polar bear is a cartesian bear after a coordinate transform.
    1. Re:Why it IS a problem by archeopterix · · Score: 3, Insightful
      Now picture that I tell you "and here's a link to the MD5 sum on Blizzard's site. You can check for yourself that the the file on our site is the original file and it hasn't been tampered with." In fact, I would even _urge_ you to make a habit to check all your downloads against the original MD5 sums, for your own good.

      It already looks a lot safer and more legitimate. Well, maybe not to _you_, but to a lot of people it does.

      That's the whole problem. That false sense of security makes the "if we can convince you to run our insecure extractor code" part a helluva lot easier.

      The scheme described in the article enables you to: take a good file and generate:

      A. Another "good" file (one that generates the same exec while extracting).
      B. A "bad" file.
      Such that hash(A)=hash(B)

      For this scheme to work, you would first have to convince Blizzard to use your "good" file A for distribution (more exactly: computing the published hash). Hey Blizzard, I have a file that extracts the same files as your distribution, only has a different hash value, why don't you replace your file with my file? "Helluva lot" easier than just convincing them to distribute your "bad" file? I don't think so.

    2. Re:Why it IS a problem by aaronl · · Score: 1

      No, no you would not have to convince the company to do anything. All you have to do is convince the user that your "bad" file is the same as the company's "good" file because they both compute to the same hash. *That* is why this attack is so annoying and dangerous; you can make the file seem legit by generating a "bad" file with the same checksum.

      Once your file appears to be legit, since it matches the checksum on the real site, you can get yourself ranked in search engines. Most people find things by going to a search engine and running a query. Then even the mildly paranoid people that actually check MD5s, or installers that check them, can be fooled.

    3. Re:Why it IS a problem by Anonymous Coward · · Score: 0

      "No, no you would not have to convince the company to do anything. All you have to do is convince the user that your "bad" file is the same as the company's "good" file because they both compute to the same hash."

      But the files don't hash to the same value!

      What the explot allows you to do, as the grandparent pointed out, is take a file (good.bin) and create two new files (good2.bin and bad.bin) with the properties that good2.bin behaves identically to good.bin but hashes to bad.bin. The hash of good2.bin and bad.bin is _NOT_ the same as good.bin.

      Therefore, the grandparent is correct that you would still have to convince the software company to use your good2.bin file in preference to their own good.bin, which would not be easy.

    4. Re:Why it IS a problem by Physics+Dude · · Score: 1
      you can make the file seem legit by generating a "bad" file with the same checksum.s

      No you can't. Not with the same MD5 as someone else's exe. Please RTFA before posting. The attack does NOT allow you to produce a file that matches another arbitrary file's MD5.

  39. Re:So if you need a freely available hash algorith by commodoresloat · · Score: 2, Funny
    ...better use Tiger

    Once again, OSX proves to be more secure!

    *ducks*

  40. Re:So if you need a freely available hash algorith by MaGogue · · Score: 1

    Unfortunately for now there is no other way but "break and fix" cycling. However, sometimes I feel people are too easily persuaded to buy into a new thing since 'The old one has been proven vulnerable, and this brand new shiny thing looks promising'.

  41. Re:So if you need a freely available hash algorith by MadMoses · · Score: 1

    Brute force is always a possibility. But the possibiblity of someone having and willing to spend the resources, technology, money and time to screw you over by brute forcing a 256 bit Whirlpool hash is marginal.

    --

    Do not be alarmed. This is only a test.
  42. On Slashdot.. by Anonymous Coward · · Score: 5, Funny

    surprisingly many stories hashes to the same value..

    1. Re:On Slashdot.. by Jeff+DeMaagd · · Score: 1

      surprisingly many stories hashes to the same value..

      Is it zero?

  43. Re:So if you need a freely available hash algorith by poopdeville · · Score: 5, Informative
    I am no cryptography expert so I can not read and understand those algorithms. But the fact that there are no known vilnerabilities for an algorithm doesn't make it secure. Maybe they are just not used as much as other well known algorithms. And therefore nobody has found vulnerabilities for them yet?

    This is a complicated issue. Generally, the security offered by an encryption algorithm isn't measured by its popular usage, but by the amount of time qualified professional cryptographyers/mathematicians/hackers have studied it without finding a critical vulnerability. My claim is probably too broad: there is no magical formula that determines how secure an algorithm is. But in depth work by professionals does endear confidence in an algorithm.

    As a general rule of thumb, it is wise to use an algorithm that has been seriously studied for 10-20 years. At this point, it is modern enough to withstand modern brute force attacks, and (hopefully) understood well enough to ensure that there are no structural vulnerabilities. If it is much older than that and still studied, it is likely because a flaw has been found and people are trying to push it as far as it goes.

    --
    After all, I am strangely colored.
  44. Re:So if you need a freely available hash algorith by MadMoses · · Score: 1

    Haha! Good one!

    --

    Do not be alarmed. This is only a test.
  45. Multiple Checks by RAMMS+EIN · · Score: 1

    I wonder why we're not using multiple checks on files. In the ports systems of OpenBSD and NetBSD, checksums using several algorithms are stored for each file. Right now, I think only SHA1 and file size are checked, but I think the feasibility of an attack would be severely reduced if another alrorithm (say RMD160) were used instead. The same could be done with other systems (apt, BitTorrent, ...)

    --
    Please correct me if I got my facts wrong.
  46. TCP by RAMMS+EIN · · Score: 2

    And to think that they're still working on getting MD5 digests in TCP packets...the algorithm could be as useless as the checksums they currently use by the time this change would have become widely accepted (now it's probably never going to happen).

    --
    Please correct me if I got my facts wrong.
    1. Re:TCP by tomstdenis · · Score: 2

      It's ok as a checksum [e.g. non-malicious modification].

      If you need say authentication you need a MAC anyways. As far as I know HMAC-MD5 is not immediately attackable by the known attacks [though I wouldn't use it anyways].

      Tom

      --
      Someday, I'll have a real sig.
    2. Re:TCP by milosoftware · · Score: 1

      For TCP using MD5 is not feasible. The hardware just isn't up to it, you can't calculate MD5 as fast as your network card can transfer it. By the time you have hardware that can keep up, the network has also evolved and you're still not fast enough.

      Another thing is, with CRC64, any modification of 64 bits or less will be detected (as will any odd number). If you look at the example, only 5 bytes (40 bits at most) were different, but yielded the same MD5 checksum. CRC64 would have detected this modification.

      The point is: MD5 is (well, used to be) good for data integrety protection. It is hard to willingly modify the data in such a way that the hash remains unchanged. It is not so good at error detection - it cannot guarantee a level of protection.

      CRC is good at detecting (random, transmission) errors. You can calculate exactly what the odds are that a certain number of bits has changed. And you can guarantee that no errors up to 32 bits will pass unnoticed.

      --
      Musicians don't die. They just decompose.
    3. Re:TCP by Anonymous Coward · · Score: 0

      It clearly _IS_ feasible to use MD5 for authenticating IP packets; most IPsec implementations offer it and RFC 2402 (IPsec AH) explicitly mentions it. Note that IPsec is the technology on which most VPNs are based.

    4. Re:TCP by Anonymous Coward · · Score: 0

      Note that IPsec incurs, as a result of this (and other) processing, a significant penalty. Not so much as to make the network unusable, but it's generally a 60% or more hit on the CPU load. For a fast enough machine, you can still saturate a 100Mbit connection (dunno about gigabit). Hardware crypto accelerators are available to offload this to some extent, however

  47. Re:So if you need a freely available hash algorith by csrster · · Score: 1

    There's the brute force method. It only takes a month to run.

  48. Ahem... by tomstdenis · · Score: 2, Informative

    said this before...

    Dan Kaminsky is actually the dude who came up with the Stripwire idea. ... LAST YEAR.

    Tom

    --
    Someday, I'll have a real sig.
    1. Re:Ahem... by Anonymous Coward · · Score: 0

      Why not just sign with both md5 and sha1? The probability of generating collisions in 2 different hash functions is silly. I'm not saying that GPG shouldn't be added to portage but the combined hash functionality could be done within a week.

    2. Re:Ahem... by tomstdenis · · Score: 1

      If you're going to modify your system anyways (to use a hidden combo trick) you might as well just swap out a new hash.

      In the case of portage that could be done in parallel. E.g. keep using the MD5 hash and introduce a SHA-2 + DSA based signature. Then later on down the road make the switch and make the signatures mandatory.

      The problem is the Gentoo folk aren't cryptographers. They did MD5 hashing mostly to make sure you grabbed the right file off some random mirror (e.g. to avoid simple typos and what not). Not to stop attackers from modifying files.

      Tom

      --
      Someday, I'll have a real sig.
    3. Re:Ahem... by Anonymous Coward · · Score: 0

      Tom,

      I agree with what you're saying however my point was that code for concatenating 2 standard python hash functions could be added to the portage scripts more easily than support for GPG. If there is a genuine security concern, why are we still discussing this a year later? I expect someone who has never written any python could make the patch in a couple of hours.

    4. Re:Ahem... by tomstdenis · · Score: 1

      Why would you concatenate two hashes?

      My point is if your going to change the protocol in a NON BACKWARDS COMPATIBLE FASHION why not just change it for a proper hash like SHA-2 or Whirlpool?

      Tom

      --
      Someday, I'll have a real sig.
  49. Re:from the tin-foil-hat-dept by grimJester · · Score: 1, Funny

    Hey, wait up, you forgot your wallet, mr... Anderson?

  50. There are some limitations to this attack by GekkePrutser · · Score: 5, Informative

    As far as I know, the technique used for finding these MD5 collisions, cannot be performed with a GIVEN hash. So it's not possible to create, say, a copy of an already available RPM, add malicious code to it, and easily find some data to add to it to generate the same hash. This is not possible.

    The only thing the current 'crack' does is create two RANDOM input files that generate the same hashed output. So it's only useful for someone who can control both the 'original' and the 'malicious' version of the data which is being protected by an MD5 hash.

    So the dangers here are kind of limited though you could still do a lot of damage with it.

  51. Said it before, saying it again by cpeikert · · Score: 1

    The known attacks produce MD5 collisions with the same length!!

    Padding and byte-counts and so forth are a red herring and provide no mitigation. The basic colliding messages are all exactly 1 block long. From there you can do message-extension. Any hash function with this so-called Merkle-Damgard structure is vulnerable to length extension attacks. And yes, this is true of Whirlpool too (in contrast to another poster's assertion): if you find two messages with the same hash and the same length, you can extend them both with the same string and continue to get collisions.

  52. Secure is as secure does, sir. by lheal · · Score: 1

    "Secure" is both an adjective and a verb. We usually use it as an adjective, like "orange" or "happy".

    I've had several discussions with my wife over the years about something being "orange". She insists that some maize-like color is truly orange, but the color of the University of Illinois' basketball jersey is more of a red.

    Choose an algorithm that makes you feel secure. Understand that your chosen algorithm, whatever it may be, is not proved unbreakable. It's just that the work involved for someone to break it is greater than the value to them of doing so - and greater than the value to you.

    --
    Raise your children as if you were teaching them to raise your grandchildren, because you are.
  53. Merkle Damgard: Why len(x) and len(y) matter by 0ptix · · Score: 2, Informative

    AFAIK this is an attack on the underlying Merkel-Damgard paradigm which both SHA-1 and MD5 (amoungst others) employ. The paradigm goes as follows:

                                          IV | Intialisation vector of n-bits
                                      MB_i | Message Black i of n-bits
                                      HB_i | Hashblock i of n-bits
    f:(IV , MB_i) -> HB_i | is the underlying compression function which takes both an IV and a message block as input and outputs a hashblock.

    First the orginal messaage is split up (and padded if need be) into n-bit blocks. Then f is applied with an IV and MB_1 as input resulting in HB_1. f is then applied iteratively each time tacking the next message block as input while using the previouse hash block as the IV input.

    f(IV, MB_1) = HB_1
    f(HB_1, MB_2) = HB_2 ...
    f(HB_s-1, MB_s) = HB_s = H(Message)

    Merkle and Damgard proved that the over all construction is collision resistent given that the compression function f is collision resistant.

    As the parent post pointed out though the last block had better include the over all message length. If this is not the case then by extendeing 2 different but colliding messages x,y with the same plain text q the input to the compression function becomes identical since H(x) = H(y) = IV input for f. If on the other hand the length of the orginal message is included in the last block then even though the IV input for f is the same for f(H(x),q_1) and for f(H(y),q_1)..., the final message block (q_s) will again be different resulting in a different final hash block.

    If on the other hand len(x) = len(y) then the problem persists since both IV and message blocks will be the same when the final iteration of the algorithm is reached.

    Infact this attack is even stronger since by the same reasoning one can see that to produce H(x+q) all one needs is to know is H(x) (and len(x) if that is included in the last message block). No other information is needed about the orginal message x! H(x) is simply inserted as the IV for f when hashing q and so the iteration is "jump started" just where x finishes. (If the length is included in the last block then all that need be used is len(x)+len(q).)

    Disclaimer: Not to 100% sure about all this though so please feel free to correct me if i'm wrong... :-)

  54. Educate me please! by AchilleTalon · · Score: 1
    I just realize I don't know much about this stuff and when time comes to make a choice of an algorithm over another one, I just don't know which one is the better choice.

    Anyone here can provide some links/references to appropriate documentation to educate myself?

    --
    Achille Talon
    Hop!
  55. And that might just be easier than you think by Moraelin · · Score: 1

    For a start, it still massively enlarges the number of people you must trust.

    E.g., do you automatically trust everyone working at Blizzard? How about some disgruntled temp employee at the publisher? Do you trust everyone at the company who made the install program too? Etc.

    Due to the whole "if H(A) = H(B), then H(A+Q) = H(A+Q)", they don't even have to convince Blizzard of anything. As long as I control the A and B versions of the self-extractor module, Blizzard's own content is the Q in that equation. It doesn't matter. Automatically _any_ self-extracting patch made with executable A, can have that executable part at the front stripped and replaced with the malicious executable B, and the result will still match the MD5 sum. I don't have to convince them to change their self-installing patch on their site: it will just be automatically compatible with the exploit as it is.

    Basically the whole thing has been bumped from "if the hash matches, you're extremely probably getting the same version" to "you have to trust everyone along the whole bloody chain that they didn't silently plant a part they can swap-in later." I don't know about you, but the latter makes me a bit less comfortable.

    Then there are a lot of files that people install that aren't from the devs themselves.

    E.g., while getting Blizzard to include my modified files might be hard, it probably wouldn't be hard at all to get a site like Vidiot Maps to replace one of their maps with my own more detailed map of an area. They'd probably even thank me for it.

    Basically, yes, it does require a lot more effort and preparation than just generating a virus that matches a given MD5 sum from scratch. But you have to trust everyone along the chain that they haven't made that effort. If they did, that illusion that MD5 is safe as houses just becomes a liability.

    --
    A polar bear is a cartesian bear after a coordinate transform.
    1. Re:And that might just be easier than you think by njyoder · · Score: 1

      do you automatically trust everyone working at Blizzard? How about some disgruntled temp employee at the publisher? Do you trust everyone at the company who made the install program too?

      You have no choice but to trust them, that's not the issue here.

      As long as I control the A and B versions of the self-extractor module, Blizzard's own content is the Q in that equation.

      A and B are random garbage data that form an MD5 collision, they aren't valid executable data. Q is the only valid executable data and that's the data you have no control over.

      Look, you REALLY don't understand the article and many people here have misunderstood it just like you have and have been corrected, please just accept that you're wrong and move on.

      I'll explain it in no uncertain terms here:

      if H(A) = H(B), then H(A+Q) = H(B+Q)

      A & B are pre-generated random collisions that hash to the same values. They are random data. Thus, in order to make a 'good' and an 'evil' version of an executable, you have to two that both contain the same code, with them only differing by that random chunk of data (A & B). There's a conditional statement at the beginning of the code, if it encounters random chunk A, it executes the good code, if it executes random chunk B, it executse the bad code. The problem is though, the code is the Q in the equation--the part that Blizzard created, which you don't have control over.

      The hash for Blizzard's installer is actually H(Q), whereas you'd be creating H(A+Q)=H(B+Q). H(Q) is NOT the same hash as H(A+Q).

  56. Banning MD5 is stupid and small minded by ivan256 · · Score: 3, Insightful

    It's dumb to ban MD5. It is still, and will continue to be a useful tool in situations where a cursory comparison is needed for reasons other than security. It would be silly, stupid even, to use a larger hash that requires more (slower) computation in these situations. Banning MD5 can mean one of two things: Either it's a stupid publicity stunt that will result in slower code overall, or that Microsoft doesn't trust it's developers to be smart enough to know when a particular hash is appropriate.

    1. Re:Banning MD5 is stupid and small minded by Fweeky · · Score: 1

      Indeed. One of FreeBSD's block device encryption classes (phk's gbde) uses MD5 as a "bit blender" function for instance; just because something's using a hash function doesn't mean it's depending on it to be cryptographically secure; there are plenty of cases where you just want a nice distribution of seed bits to play with, like with BloomFilters and such.

    2. Re:Banning MD5 is stupid and small minded by ivan256 · · Score: 1

      Exactly. There are even more simple examples than that too. Like finding dupilcates in a large set of data. Since you're probably going to do a byte for byte comparison anyway when you get a hit, It may even make sense to use something weaker than MD5.

    3. Re:Banning MD5 is stupid and small minded by evilviper · · Score: 1
      It would be silly, stupid even, to use a larger hash that requires more (slower) computation in these situations.

      Since hash functions are all equally IO-bound, and not processor-bound, I can't see any reason NOT TO upgrade to a more CPU-intensive hash function for everything. At least, not on a 4GHz PC where there's tons of CPU power to go around.

      Can you?
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    4. Re:Banning MD5 is stupid and small minded by ivan256 · · Score: 1
      Yes. I'll make a list for you:

      • Not all computation is done on grossly overpowered machines. (Most systems are embedded)
      • High end storage is faster than the bus speeds on most systems, and many problem sets fit into memory, so you frequently find yourself CPU bound. (Try doing 100,000 8k IOPS on any normal machine... Yet enterprise storage is this fast.)
      • Many operations are latency dependant. (Checksuming user input protocols, for example)
      • Wasting cycles uses extra electricity and generates extra heat. (This is a big deal in datacenters)
      • Hash functions that generate hight bit count output can waste network bandwidth. This is a big deal in cases where you are IO bound. (Think of how much this would suck for things like rsync or BitTorrent)
      • It's more difficult to compare long hash output by eye. (Never underestimate the downside of poor HCI design.)


      I could come up with more all day. Use the right tool for the right job. In general, you'll be glad you did in the long run.
    5. Re:Banning MD5 is stupid and small minded by evilviper · · Score: 1
      (Most systems are embedded)

      Yes, and those embedded systems don't have multi-GB hard drives either, so they don't need as much CPU power to compute hashes on smaller files.

      Wasting cycles uses extra electricity and generates extra heat.

      The difference in electricity and heat output between MD5 and a newer hash is trivially small. Absolutely no-one will notice the difference. Just think about wasted CPU cycles to service the interrupts in comparison.

      Hash functions that generate hight bit count output can waste network bandwidth.

      Again, it's a trivially small difference, unless you're on a 2400baud connection.

      It's more difficult to compare long hash output by eye.

      I certainly don't agree with that. Comparing only 10 bytes of MD5 strings wouldn't be any better than comparing only 10 bytes of a Tiger hash. etc.

      Use the right tool for the right job. In general, you'll be glad you did in the long run.

      Using a slightly over-powered tool for the job (rather than one that is just barely adequate) works even better in the long-run.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    6. Re:Banning MD5 is stupid and small minded by ivan256 · · Score: 1

      You keep saying "trivial this" and "trivial that". Think about computing these things millions and millions of times.

      It can make a HUGE difference in power utilization, network protocol overhead, etc...

      Your philosophy is similar to the ones that cause new software to be so bloated and unwieldy.

      The difference in electricity and heat output between MD5 and a newer hash is trivially small. Absolutely no-one will notice the difference. Just think about wasted CPU cycles to service the interrupts in comparison.

      Sure they do. Some of them have gigabytes built in, some of them connect to storage networks with petabytes of storage attached to them. Millions of them route networks and can cram tens or even hundreds of gigabits of data through per second using processors so wussy they would make your gameboy laugh. Want to take a guess as to why we can afford to have millions of these devices in service, hell, why we can afford to keep them powered up?

      it's a trivially small difference, unless you're on a 2400baud connection.

      Doubling or quadrupling overhead is still a 100% or 300% increase in network utilization whether your link is gigabit, or 300 baud. Even over the fastest of links there are people out there who's job it is to sqeeze out every last byte of overhead. The point where things are "fast enough" isn't even visible on the horizon yet.

      The difference in electricity and heat output between MD5 and a newer hash is trivially small. Absolutely no-one will notice the difference. Just think about wasted CPU cycles to service the interrupts in comparison.

      Buddy, you have no idea. This is abolutely the type of thing I think about every day for a living. It does matter. It does make a difference. People build special purpose processors into their machines to compute hashes and checksums faster and with less electricity for a reason. These types of special purpose processors are even starting to find their way into everyday home PCs. They're especially important in datacenters or big research environments though. In these places saving 0.1% in computations could mean the difference between needing another 128 nodes in their cluster, needing another ton of air conditioning, or paying thousands of dollars more a year in electricity costs.

      More is not always better. Quite frequently it's worse.

    7. Re:Banning MD5 is stupid and small minded by ivan256 · · Score: 1

      Stupid windows clipboard. Obviously that first quote is from the wrong part of your comment... I meant to quote the part about not having gigabytes of storage in embedded devices... You'll figure it out, I'm sure.

    8. Re:Banning MD5 is stupid and small minded by evilviper · · Score: 1
      I don't think I've ever seen a bigger straw-man.

      Your philosophy is similar to the ones that cause new software to be so bloated and unwieldy.

      That's patently ridiculous. Software bloat does not come from using stronger cryptographic functions, or anything of that nature. Most people would be just FINE with that cause of slow-down... It's the pointless "didn't bother to remove the crap" code that causes real bloat. Switching from one (IO-bound) hash to another that is only slightly more computationally intense is not bloat in the slightest.

      Some of them have gigabytes built in, some of them connect to storage networks with petabytes of storage attached to them.

      And they're all running WinXP embedded, right? No? Some other Microsoft OS? How many of them run on CE or some such? Because we ARE talking about Microsoft.

      Doubling or quadrupling overhead is still a 100% or 300% increase in network utilization whether your link is gigabit, or 300 baud.

      300% increase in network utilization? Umm, no. Not unless all you're sending is a constant stream of hashes.

      Even over the fastest of links there are people out there who's job it is to sqeeze out every last byte of overhead.

      Once again, these links have Microsoft OSes at both ends, right?

      People build special purpose processors into their machines to compute hashes and checksums faster and with less electricity for a reason.


      They're especially important in datacenters or big research environments though.

      Once again, these datacenters are running Windows on all their machines of course, otherwise this would have absolutely no point in this discussion...

      And for some reason, in this specialized cluster of Windows system, they are apparently forced to use the built-in Windows cryptographic functions, and completely unable to code their own for their specific purposes.

      More is not always better. Quite frequently it's worse.

      Not in cryptography. A faster hash that has lots of collisions and can easily be 'fooled' is almost NEVER useful.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    9. Re:Banning MD5 is stupid and small minded by ivan256 · · Score: 1

      [...] cryptography [...] cryptography [...] cryptography

      See, there's your problem. Now I understand why you don't get it. Not everything you use a hash for is cryptography, or even security related. If you don't understand that, of course you won't understand what I'm saying. You need to understand this to see why you're so very, very wrong.

      300% increase in network utilization? Umm, no. Not unless all you're sending is a constant stream of hashes.

      I said 300% increase in overhead, not a 300% increase in utilization, however if you are familliar with the protocols I used as an example, you'd know that in the best case scenario, hashes are all you'd send over the network. Remember, hashes are used for a lot more than security.

      Once again, these datacenters are running Windows on all their machines of course, otherwise this would have absolutely no point in this discussion...

      Ever hear of a company called Egenera? They make very high density compute farms that run windows. They're not the only ones. So, yes. It is done. But you're the one who's pulling this windows requirement out of your ass.

      Not in cryptography. A faster hash that has lots of collisions and can easily be 'fooled' is almost NEVER useful.

      Yes I'm repeating myself, but it'll do you good. Many, if not most, of the uses of hash functions have nothing to do with security or cryptography.

    10. Re:Banning MD5 is stupid and small minded by evilviper · · Score: 1
      But you're the one who's pulling this windows requirement out of your ass.

      What in the hell are you talking about. This thread is all about: "At Microsoft, the MD5 hash functions are banned." The subject of this thread is: "Banning MD5 is stupid and small minded" How is this NOT directly about Microsoft? Do you have some sort of memory impairment?

      Many, if not most, of the uses of hash functions have nothing to do with security or cryptography.

      I understand that very well. Most often it's used to verify the integrity of data. The more often there will be a collision, the more often it will be wrong, the less useful it is. If not, why bother with MD5 in the first place, just stick with CRC32 which is even faster and smaller! After all, banning CRC is stupid and small minded...
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  57. Re:So if you need a freely available hash algorith by Martin+Blank · · Score: 1

    Hence the reason for going with SHA-256. It doesn't fix the issues with the SHA algorithm (which may eventually be found to break the whole SHA system, much as MD5's first clear weaknesses proved to merely be the first steps on the path to shatter its reliability in the course of only months), but rather covers up the weakness with additional difficulty that is still sufficient to deter attackers.

    --
    You can never go home again... but I guess you can shop there.
  58. RPM is still relatively safe... by PAjamian · · Score: 2, Informative

    RPM only uses MD5 to check for corruption of the type you might find during download. RPM actually uses GPG or PGP to sign the generated RPMs for security, and GPG is (afaik) capable of using nearly any hashing algorythm including ones that are yet to be invented. So as far as security is concerned RPM doesn't use MD5 but rather uses whatever hashing algorythm the GPG key that signed the RPM was generated with.

    --
    Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
  59. Whirlpool is not based on AES by Paul+Crowley · · Score: 2, Informative

    Whirlpool is not "based on AES". It shares a few similiarities (and a designer) but it is a distinct algorithm in its own right. It has a larger block size, different S-boxes, a different linear component, a different key schedule and so on.

    I would interpret "based on AES" as meaning it actually uses AES itself (perhaps in tandem Davis-Meyer mode or similar).

    I like Whirlpool but it's not fast. I think Tiger is quite a bit faster.

    Bernstein's cache timing attacks, and the ever-growing gap between processor speeds and memory access times, mean that table-based primitives (which both of these are) are going out of style.

    1. Re:Whirlpool is not based on AES by Anonymous Coward · · Score: 0

      I personally don't think Bernstein's cache timing attack is as serious of an issue as Bernstein thinks it is. I think it's an interesting attack which can be all but defeated by either changing AES' key schedule or making the order of numbers in the tables securely random.

    2. Re:Whirlpool is not based on AES by Paul+Crowley · · Score: 1

      If you think AES needs to be changed incompatibly to address Bernstein's attack then you're asserting that it's very serious indeed. You're basically talking about creating a new algorithm with some similarities to AES; such an algorithm would have to go through a long approval process before seeing use. If we were going to do that, why would we use a "tweaked" version of AES rather than a whole new algorithm?

    3. Re:Whirlpool is not based on AES by MadMoses · · Score: 1

      Whirlpool is not "based on AES". It shares a few similiarities (and a designer) [...]
      True, that's what I wanted to express. Maybe I should have used different words (non-native speaker here).

      I like Whirlpool but it's not fast. I think Tiger is quite a bit faster.
      Seconded. It also needs fewer memory resources.

      Since you seem to know a bit about the subject, do you know anything about the state of the announced Tiger2?

      --

      Do not be alarmed. This is only a test.
  60. WRONG! by Anonymous Coward · · Score: 0

    The good.bin file DOESN'T need to start with a "vector designed to enable a collision". Any good file with any resulting hash can be used. You put random crap into the BAD file to make it match the hash of the bad file. Most file formats (including executables, Word documents, PDFs, and common compression formats like .zip files) have some way to include some bytes somewhere that have no effect on the use of the file. Any such format is vulnerable to this attack.

  61. Public key encryption... by oliverthered · · Score: 1

    All you have to do is sign the checksums so that the recipient can tell where they've come from and the jobs done.

    --
    thank God the internet isn't a human right.
  62. replace both the "Evil file" AND the extractor ? by Jedi_Gunsmith · · Score: 2, Insightful

    I just read the article, and i was thinking :If we can replace the extractor, then why bother creating an evil file with the same checksum ? The extractor can do the "evil patch" at extraction. The method in this article isn't very usefull for evil purposes.

  63. About this attack by Ernesto+Alvarez · · Score: 3, Insightful

    I've been reading the posts in this thread and I've noticed that there are two types of posters here: the ones who got it 100% right, and the clueless ones (there appear to be little or no posters in the middle ground).

    Now, the clueless ones are thinking of lots of "attacks" using this vulnerability, some of them really wrong. Since this has the potential of getting lots of people to do stupid things (like not trusting MD5 when they should), let's talk a little bit about the vulnerability and its effects.

    First of all: this is not new. There was an article here explaining the same attack a few months ago (about x.509 certificate collisions and how to fake postscript orders, if you know what I am refering to, please post a link).

    The attack goes like this:

    You have a block B1 that is known to collide with another block B2.
    You have some custom made code that looks like this:

    -----BEGIN SNEAKY CODE---------------
    If DATA[1] = DATA[2] then
    do something good
    else
    do something bad
    end
    DATA[1]
    DATA[2]
    -----END SNEAKY CODE-----------------

    The trick is that since there's a collision between B1 and B2 and MD5 makes the hash by reading sequentially, the hash for the whole program will be the same whether you fill DATA[1] and DATA[2] with B1 or B2 (in any combination). Since the code is DESIGNED to do different things depending on the collision area, by changing the contents of DATA[1] and DATA[2] you can have programs that do "good" or "bad" things, with the same hash. Please note it's been DESIGNED with that in mind.

    From now on I'll talk on absolute terms, while in reality there is a very small probability of things being right for an attack without being planned that way, so keep in mind that before saying "but that's not the whole truth.....".

    Now let's discuss what's possible to do and what's not:

    1.Oh no! Now, someone will create a virus that has the same hash than my favorite app!

    False: the app (or installer) would have to have been designed with that "feature" in advance.

    2.MD5 is worthless and should not be used anymore.

    False: MD5 is useless in the situation presented above. There are some very good uses of MD5 that are safe (like access control: this attack does nothing practical to you salted MD5 shadow file). MD5 should probably be watched for other undesirable properties, though. An alternate cryptographically secure function should be kept in reserve.

    3.I'll use another hash function, I'll be invulnerable to this attack.

    (somewhat) False: You'll be invulnerable until someone finds ONE collision in your new hash function (it might take a long time but....). Then you'll be vulnerable again. But now we all know what can be done with ONE collision. What you're thinking is probably good, but it's no silver bullet.

    4.Microsoft will forbid the use of MD5 and DES, and use SHA-1 and AES. We should do the same.

    (somewhat) True: Not for the reasons you're thinking though. If MS is really doing this, this attack is a lame excuse to do it. MD5 is still useable for some things, and SHA-1 is not much better than MD5 in the things related to this attack. IIRC these collisions were found using an attack derived from an attack on SHA-1. Right now, SHA-1 collisions can be found in 2^63 operations (and the clock is ticking). We should probably consider using a new hash function someday, but leave the decision to the cryptologists. About AES, it's about time. DES can be brute forced in reasonable time, and that's been like that for a few years. 3DES is slow. That's the reason for the AES contest, we should use since we have it.

    5.Someone could distribute some sort of binary and the switch it so it does lots of damage to unsuspecting people.

    True: That's exactly what the attack is about. Maybe you were wrong to trust [insert a name here].

    6.Who should be doing what and when?

    If you work in crypto, you probably k

  64. Two tags by idonthack · · Score: 1

    Why did he use Javascript?

    <!-- --> is all you need, and it works in every browser.

    --
    Why is it that when you believe something it's an opinion, but when I believe something it's a manifesto?
  65. What would you recommend to use in place of MD5? by soft_guy · · Score: 1

    I'm working on a system that uses MD5 in order to cache some data so that it does not always have to be retrieved over the network - the client instead asks for the MD5 and checks to see if it is in its disk cache before doing the more expensive operation of pulling the whole dataset and adding it to the cache. We plan to ship the system with the cache pre-populated so that in the field there won't be any "cache misses" until we ship an update and then there will only be just the one.

    In this case, I think the attack vector would be as follows. Some malware opens the cache and replaces the full data with an alternate set of data wiht the same hash key. This would result in the client not operating correctly (because it has the wrong data.) Of course, they could be done even without the md5 being compromised because the program just assumes that the data in the cache file is valid. This leads me to believe I ought to re-run the MD5 on the data in the cache when the cache file is loaded in order to validate it (discarding any records that don't hash correctly).

    So, assuming I do that, I should probably change away from MD5. Which algorithm should I use in its place? Any recommendations from the good folks on slashdot?

    --
    Avoid Missing Ball for High Score
  66. Actually, rpm does it by buchanmilne · · Score: 1

    rpmbuild -ba --sign ...

    or

    rpm --addsign rpm ...

    yum may verify the signature, but then ... this is (and has been for a long time) common practice across all rpm-based distros and tools (urpmi/apt etc).

  67. Probably one of the SHA family by Just+Some+Guy · · Score: 1
    Which algorithm should I use in its place?
    SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512 are the required secure hash algorithms for use in U.S. Federal applications, including use by other cryptographic algorithms and protocols, for the protection of sensitive unclassified information. FIPS PUB 180-1 also encouraged adoption and use of SHA-1 by private and commercial organizations.

    Source: Wikipedia

    --
    Dewey, what part of this looks like authorities should be involved?
  68. FLAWED EXAMPLE by merreborn · · Score: 1

    That attack doesn't work!!

    This attack operates on one fact:
    if MD5(x) == MD5(y) then MD5(x + q) == MD5(x + q)

    Here's the rub: the MD5 hash of your infected patch is MD5(x + q). The MD5 hash of blizzard's patch is MD5(q)

    MD5(q) != MD5(x + q)

    1. Re:FLAWED EXAMPLE by Moraelin · · Score: 1

      No, the attack works flawlessly if it's seeded in advance. Try reading for a change, instead of just waving around the same tired "but you can't fake H(Q)!!!!111" battle banner.

      Any program can be seeded in advance with an A by an insider, so guess what? I'm not starting from H(Q), I'm proposing precisely to start from H(A+Q).

      If I'm a disgruntled employee at Blizzard or at Vivendi or at whoever makes InstallShield, I can plant an A in advance in the self-extractor Q1 they use for their patches. Now they make a patch Q2, and pack it in a self-extracting archive. Normally they'd end up with Q1+Q2. Except they're really getting A+Q1+Q2. They make an MD5 sum of that. Guess what? It's H(A+Q1+Q2). That's the sum they'll publish and against which you'll check. If I swap in my B later, guess what I'll have? H(B+Q1+Q2). Guess what? It matches the original MD5 sum perfectly.

      You know what's even more fun? Even if I then resign, chances are they won't wipe out all computers and reinstall all libraries and utilities immediately. I can leave such seeds all over the place: in the installer, in the libraries, in a ton of places. Chances are the next few patches will be swappable like that even after I'm not even working there any more.

      Now time to exploit it. Let's say their next patch still contains my see A. I download it and swap B instead. How do I get the users to download that? Well, I don't even have to do any social engineering. Blizzard's patcher is based on BitTorrent (and thus on MD5). All I have to do is be one of the nodes that's offering that patch for download. I don't even have to convince the users that it's safe to download from me, because that BitTorrent-based installer will check the MD5 sums and trust me anyway. So out of some 4 million users downloading that patch, a few tens of thousand download my malicious code without even knowing.

      That's the problem. It's _not_ as unsafe as "any Joe Random can fake H(Q)", but an insider can plant an A in advance for later use.

      And unlike just planting malicious code, which might be found, the seed A here is innocent-looking garbage. Even if someone disassembled it, they wouldn't find any malicious code. Because it's just the random garbage A that serves as a placeholder for the real malicious code. It's there just so later I can swap in the B unnoticed.

      --
      A polar bear is a cartesian bear after a coordinate transform.
    2. Re:FLAWED EXAMPLE by merreborn · · Score: 1

      If you have access to the trusted source's servers (blizzards, in this example), you don't need an MD5 vulnerability to do damage.

      This whole MD5 flaw is massively overhyped. If it had any *real* practical application, we'd be seeing reports of actual exploits, not more of these contrived proofs of concept.

  69. typo by merreborn · · Score: 1

    if MD5(x) == MD5(y) then MD5(x + q) == MD5(y + q)

    the rest still holds.

  70. Not as interesting as a previous attack. by Anonymous Coward · · Score: 0
    This is similar to, but less efective then the attack mentioned here: http://it.slashdot.org/article.pl?sid=05/06/10/174 9256&tid=172&tid=93

    The linked to article talks about matching PS documents. That is a much more serious attack. The real flaw though is that it depends upon tricking people. The concept is that you generate a special postscript file. The file is read and digitally signed by by somebody. Then you swap out part of the file with annother one with the same md5sum and the PS yeilds a different message. The flaw here is with the person who signed the document is in confusing the output of a program with the program itself. The fact is that the program could have detected the target's printer and printed a different message there then on everybody else's printer.

    Now, I have digressed. Anyway this attack, while related to the ps one, is flawed in that it depends on a combination of 2 macilious files, a malicious bin file, and an extactor that is malicous also.

    The postscript code is closer to this (which would not work for c, due to the way executables are, but at least shows the basic flow of the exploit style.)

    good.c:
    /* This must be compiled without optimization */
    char vec[]=
    {
    0xd1, 0x31, 0xdd, 0x02, 0xc5, 0xe6 , 0xee , 0xc4 , 0x69 , 0x3d, 0x9a , 0x06
    , 0x98 , 0xaf , 0xf9 , 0x5c , 0x2f , 0xca , 0xb5 , /**/0x87 , 0x12 , 0x46 , 0x7e
    , 0xab , 0x40 , 0x04 , 0x58 , 0x3e , 0xb8 , 0xfb , 0x7f , 0x89 , 0x55 , 0xad
    , 0x34 , 0x06 , 0x09 , 0xf4 , 0xb3 , 0x02 , 0x83 , 0xe4 , 0x88 , 0x83 , 0x25
    , 0x71 , 0x41 , 0x5a, 0x08 , 0x51 , 0x25 , 0xe8 , 0xf7 , 0xcd , 0xc9 , 0x9f ,
    0xd9 , 0x1d , 0xbd , 0xf2 , 0x80 , 0x37 , 0x3c , 0x5b , 0xd8 , 0x82 , 0x3e
    , 0x31 , 0x56 , 0x34 , 0x8f , 0x5b , 0xae , 0x6d , 0xac , 0xd4 , 0x36 , 0xc9
    , 0x19 , 0xc6 , 0xdd , 0x53 , 0xe2 , 0xb4 , 0x87 , 0xda , 0x03 , 0xfd , 0x02
    , 0x39 , 0x63 , 0x06 , 0xd2 , 0x48 , 0xcd , 0xa0 , 0xe9 , 0x9f , 0x33 , 0x42
    , 0x0f , 0x57 , 0x7e , 0xe8 , 0xce , 0x54 , 0xb6 , 0x70 , 0x80 , 0xa8 , 0x0d
    , 0x1e , 0xc6 , 0x98 , 0x21 , 0xbc , 0xb6 , 0xa8 , 0x83 , 0x93 , 0x96 , 0xf9
    , 0x65 , 0x2b , 0x6f , 0xf7 , 0x2a , 0x70
    };
    int main()
    {
    if (vec[123] == 0xab)
    {
    /* Do bad thing */
    }
    else
    {
    /* Do good thing */
    }
    return 0;
    }
    Remeber that that actually would not work, but shows the basic flow.

    Again the problem is that a signature or hash can mean only that the file has not been modified (and really even that is not garenteed as these attacks show), but gives no indication that the author is not malicious. These current attacks require the author to be malicious.

  71. Re:replace both the "Evil file" AND the extractor by Clueless+Moron · · Score: 1
    I just read the article, and i was thinking: If we can replace the extractor, then why bother creating an evil file with the same checksum? The extractor can do the "evil patch" at extraction.

    TFA is indeed not clear. I'm assuming he meant that somebody publishes a) an extractor and b) an archive, and with MD5's for them both.

    People download, run, it all works great. Based on this trust, you download them both and try them too. Their MD5's match what everyone says it should be.

    Unfortunately, in the meanwhile the site owner has replaced the good archive with the evil one. The MD5's are the same, but you will now get pwn3d.

    But the fundamental problem here is really just that you are treating the extractor as a trusted binary. The archive isn't an issue.

    The extractor could instead have been coded to do something evil simply based on today's date, your login name, or a random number. No MD5 fiddling needed.

  72. Why not multi-algo hashes? by emc3 · · Score: 1

    Why don't we just make it a standard practice to provide "verification" hash checksums from multiple algorithms? If you provide *both* the MD5 and the SHA1 hashes for a file, it will be many, many levels of magnitude more difficult to construct replacement data which hashes the same both ways. The collision space for this must be infintessimal?

    If everybody just switches to AES or somesuch, aren't we just postponing the problem until similar methods of attack are proven against its algorithms? By combining multiple hash algorithms, you gain a sort of independent oversight.

    --

    Ernest MacDougal Campbell III
    geek ramblings
  73. think again by DrSkwid · · Score: 1

    would you mind telling me what the CRC field is for in the gzip format then ?

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  74. No, YOU don't understand by Moraelin · · Score: 1

    No, you don't understand. No, I'm _not_ proposing to start from a Q and H(Q), so you can stop repeating that already.

    I'm proposing to start from an A that's been planted in advance, and thus from an A+Q. As long as my "seed" A is in there, I don't flippin' care what Blizzard's Q is there. I can replace my pre-planted A and with my pre-prepared B, and end up... guess what? With exactly a case of A+Q versus B+Q.

    E.g., an auto-extracting installer is exactly a case of A1+Q1. You have an installer A1 at the front which Blizzard _didn't_ write, and an archive at the end Q1 which Blizzard did create.

    If I control the A1 and have a B1 which hashes to the same value, I can swap it in the above scenario and keep the MD5 sum for the whole file unchanged.

    The executable itself, the A1 and B1 above will in turn, yes, be constructed like that. Out of a base installer Q, and two pieces of code A and B.

    Basically it's a case of A+Q+Q1 vs B+Q+Q1. Where only the Q1 part is under their control, Q is the actual installer, and A looks benign.

    Do you understand _now_? Again, I'm not saying I'm gonna start from scratch with a H(Q) and build a collision from there, I'm saying it's possible for an insider to plant a seed in advance that they can replace later.

    The same applies for anything else. I can for example distribute a pre-compiled library where one module is an A+Q case. The Q is the actual library module you wanted to use, and the A is planted there for future use. The moment you link that in your program, it becomes again an A+Q+Q1. Where Q1 is your code. Voila, I can now swap my B in and keep the MD5 sum intact for every single program you compiled with that library.

    That's the problem. No, you can't start from scratch, but basically it becomes feasible for one _hell_ of a lot of people to plant such pieces of padding in advance, for future use.

    --
    A polar bear is a cartesian bear after a coordinate transform.
  75. Your point is that an insider can do damage? by njyoder · · Score: 1

    I'm saying it's possible for an insider to plant a seed in advance that they can replace later[...]No, you can't start from scratch, but basically it becomes feasible for one _hell_ of a lot of people to plant such pieces of padding in advance, for future use.

    It's already possible without this MD5 attack. It's called modifying the code to put a trojan or subtle vulnerability in it. You're just proposing a needlessly complicated method of taking advantage of being on the inside.

    Your initial posting didn't mention anything about inside jobs, it described the attack entirely as if some outsider could launch this on their own.

    Please mod this guys posting down, "people on the inside can trojan executables" is not the slightest bit insightful, nor is it specific to MD5 attacks.

    1. Re:Your point is that an insider can do damage? by Moraelin · · Score: 1

      "It's already possible without this MD5 attack. It's called modifying the code to put a trojan or subtle vulnerability in it. You're just proposing a needlessly complicated method of taking advantage of being on the inside."

      The difference is that a trojan can be disassembled, or be noticed by a heuristic scan or whatever. If I plant BackOrifice in a library, someone might find it. If BackOrifice is B and I plant some random garbage A in advance, noone will find anything wrong with A. It's just random garbage that doesn't even do anything. It's just there as a placeholder for the real attack.

      It's just an extra attack. More complicated, yes, but it's an extra kind of attack nevertheless.

      "Your initial posting didn't mention anything about inside jobs, it described the attack entirely as if some outsider could launch this on their own."

      The message you were answering, though, did mention an executable made of two parts, and a very detailed description of what's replaced with what. But you didn't read that. You just jumped in to karma-whore with the "but you can't fake H(Q)!!!" thing. Without even understanding what you're talking about.

      "Please mod this guys posting down, "people on the inside can trojan executables" is not the slightest bit insightful, nor is it specific to MD5 attacks."

      You know what? It's not like I even care about karma or modding. In fact, I'd have only contempt to anyone who'd be scared into submission by such a lame thing. We're talking maths and security, not about enacting your fantasies of being the more popular prom-queen. If you think "right" and "wrong" in security are decided by who can get the mob to silence the other... I rest my case.

      Sure, mod me down if that bandages your little ego. In fact, go ahead and add me to your foes list or whatever, so you can get some more ego-masturbation the next time you get mod points. Like I care.

      But it nevertheless strikes me as _the_ most lame thing I've read in ages.

      You ran out of arguments, didn't you? And you're going for playing prom-queen instead. If you can't make a coherent point, you just have to get someone to silence the bad man who dares disaggree with you. Geesh, dude. Do you even realize how sad that is?

      --
      A polar bear is a cartesian bear after a coordinate transform.
    2. Re:Your point is that an insider can do damage? by njyoder · · Score: 1

      The difference is that a trojan can be disassembled, or be noticed by a heuristic scan or whatever.

      The same applies to your proposed attack. Your attack requires that the insider insert specific malicious code into Q to properly switch between the 'good' and 'evil' versions of code based on whether A & B are present.

      Seriously, you don't understand how this attack works. The executable differs ONLY by A & B, where A & B are random junk data that hash to the same value. Let me explain it again, this time in terms of how it's structured:

      =Data section=
      garbage_data = A or B

      =Code section= ...
          if garbage_data is A then do good()
          otherwise do evil() ...
      good() { ... }
      evil() { ... }

      END PROGRAM

      Get it now? The only difference between the two versions is the random garbage data (A & B). The good() and evil() code exists in both versions, it's just that only one is called in each version.

      If I plant BackOrifice in a library, someone might find it.

      And if you plant the malicious code in Q, which you absolutely must, someone might find it.

      If BackOrifice is B and I plant some random garbage A in advance, noone will find anything wrong with A.

      Jesus H. Christ on a stick. You still don't get it. BO couldn't possibly be B. As I've said, A & B are random junk data, they don't represent anything meaningful.

      The collisions found through this attack are found AT RANDOM. They are not found based on some data you created. That means A & B are random garbage. I will keep repeating that until eventually you get it.

      If you could, in fact, generate A (junk data) that hashed to the same value as meaningful code, then that would mean you wouldn't actually need an insider in the first place. Also, even if you could do that and attempted to do so, you'd still need some added code in the executable which could be detected with heuristics.

      Honestly, I have no idea what is going through your head, because you're not making any sense.

      But you didn't read that. You just jumped in to karma-whore with the "but you can't fake H(Q)!!!" thing. Without even understanding what you're talking about.

      I read it and it was wrong. You were under the mistaken impression that either A or B could be replaced with something meaningful. They can't. The executable is EXACTLY THE SAME aside from some junk data.

      We're talking maths and security...

      Which you don't seem to understand. Maybe you should take it as a hint that when everyone is telling you that you are wrong, maybe you are.

      You ran out of arguments, didn't you?

      No, I've presented mine very clearly. You, otoh, seem insistent on not listening to anything said.

      If you can't make a coherent point, you just have to get someone to silence the bad man who dares disaggree with you.

      Pot, kettle, black. You have yet to make a single coherent point. People are modding you down because you're wrong. You have made a common mistake that has been corrected many times in this post and yet you're one of the few who doesn't want to accept that they're wrong.

  76. His ass may be insightful ... by pbhj · · Score: 1

    But he, Sir, is not!

    As others have said, hashes (eg checksums) are numerical methods performed to produce a small number from a large one.

    The hash is complex so that when you have the small number it's hard to work out the large one. But imagine my hash p(n) takes number n between 10000 and 20000 and returns a single digit. The number of collisions is huge. I have a 1 in 10 chance of picking an "n" that gets the same hash as your number, "m" say. What I can't do is workout your "m".

    With passwords however. Most collissions aren't proper words. So a human can readily check which collision matches. Try a search for "jack the ripper".

  77. Tiger2 is just Tiger with different padding method by Paul+Crowley · · Score: 1

    According to the Tiger home page, Tiger2 is just Tiger using the MD5/SHA padding method. It's probably done to make it a more convenient drop-in replacement, rather than for any security reasons.

  78. Re:Tiger2 is just Tiger with different padding met by MadMoses · · Score: 1

    Ok, thanks!

    --

    Do not be alarmed. This is only a test.