Slashdot Mirror


Meaningful MD5 Collisions

mrogers writes "Researchers at Ruhr-Universität Bochum have found a way to produce MD5 collisions between human-meaningful documents. This could be used to obtain a digital signature on one document and then transfer it to another. The same technique is theoretically applicable to other hash functions based on the Merkle-Damgård structure, such as SHA-1." From the article: "Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called 'practitioners' turned them down as 'practically irrelevant'."

62 of 312 comments (clear)

  1. Common sense by gtrubetskoy · · Score: 2, Insightful


    Unless I'm missing something, all these guys are doing is using a format that can contain an infinite amount of extraneous information that has no effect on how it's ultimately rendered, so same thing can be done with a .doc, or an HTML file, and this isn't really a cryptographic discovery of hash function weakness of any kind, just common sense for most of us - the secure hash algorithms should be applied to the English (or whatever language) textual contents of the document, no the source code of it, such as PostScript used in the article, PDF, HTML or whatever. I guess the most important lesson here is that this technique can be applied to binaries pretty easily as well.

    1. Re:Common sense by ergo98 · · Score: 2, Informative

      and this isn't really a cryptographic discovery of hash function weakness of any kind

      It's an example of the hash collision weaknesses recently documented, giving a practical example of how it could be used for malicious purposes.

      Traditionally we haven't had to worry about nonsensical things like applying the hash only to easily verifiable English text, because the hash is supposed to practically protect against intentional searches for collisions.

    2. Re:Common sense by loose_cannon_gamer · · Score: 2, Insightful

      Your common sense seems a little ridiculous. Are we saying that all documents have to be reduced to text before applying our encryption? What about nontextual documents, like, say, process flowcharts, spreadsheets, powerpoint slides, multimedia?

      There are a lot of formats out there that allow additions of random undisplayed information, and so I presume that many of these formats are vulnerable to these attacks.

      I wonder how long it will take before there are exploits out there that take advantage of these techniques... Of course, I also wonder how many there already are...

      Oh well. The key concept behind security is, has been, and always will be trust. You should always ask yourself when you receive something from someone else how much you trust the source, and act accordingly.

      --
      In Soviet Russia, us are belong to all your base.
    3. Re:Common sense by iabervon · · Score: 3, Interesting

      Actually, the two documents are actually almost identical. The difference is only one block in the whole file, which essentially acts as a selector for which of the two sets of content is displayed. MD5 (like most hash functions) works on fixed-size blocks smaller than the average file. To hash a complete file, you hash the first block, feed that into the hash with the second block, feed that into the hash with the third block, and so forth. So they have two files, and the first blocks are the same, the second blocks are different but hash the same, and the rest of the files are the same. Of course, the second blocks are junk, but the postscript is expecting a block-sized arbitrary value at that point anyway, so it doesn't matter that there's junk there.

      So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.

      The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).

    4. Re:Common sense by Anonymous Coward · · Score: 3, Informative

      Isn't this obivous if you check the file sizes?

      The files are the same size.

      The cksum comand (which uses a 32 bit CRC) spits out the checksum and a file size. Why doesn't md5sum do the same thing?

      It does - The file size is used as part of the MD5 hash. The MD5 algorithm hashes the file, then appends the file size and hashes that too. If it didn't do this then you could create an MD5 collision just by appending zeros to the file.

  2. These are important attacks.. by Ckwop · · Score: 5, Insightful

    As an amateur cryptographer, I must say that labeling these attacks as 'practically irrelevant'
    is at the very least misguided and at worst a shocking display of incompetence.

    Stop the fixation with plain-text messages, most messages are not plain-text. Your average word document
    contains loads of invisible data that doesn't get rendered. Pdf's contain "junk" data that doesn't get rendered either. Would
    you notice a single bit difference in an MP3? Or a single pixel colour change in a jpeg? Hell, you can even do it in HTML <div style="visibility:hidden">Junk goes here</div>.
    Mark my words, people will find in the next couple of months find two meaningful computer
    documents that hash to the same value but are different byte-wise.

    People undervalue these attacks because the attacker has to generate the collision before hand to use it.
    To properly appreciate the power of these attacks consider the following senario.

    Imagine we're agreeing a contract of employement and I'm your employer.
    I give you the first word document that includes all the standard terms, however, I've also drafted
    a Word document that contains a load of draconian clauses like banning you from working in any IT position five years
    after leaving the company. By adding junk that doesn't render to both documents, I've managed to find to make the hash
    of the two documents collide. Thinking I'm a nice employer, you sign the first document, which you do by signing the hash of
    document. However, I now have your signature on BOTH documents. I now make sure the company IT system "forget" the first document
    and I've successfully screwed you.

    This is a human example, but there are other examples that apply in computer systems. The problem is that in many situations
    the attacker can choose when you encrypt. Say you encrypt your e-mail conversation with your friend using S/MIME, many people click
    "Reply" and the message body of the other persons method appears in the new message. Because of these attacks,
    It's now no certainty that an attacker couldn't use this fact to construct collisions that an attacker could use.

    As another security researcher said (paraphrased) It's like you're in building and you've just heard the fire alarm go off.
    You can't see smoke but it's time to make your way calmly to the exit. That sums up the position with SHA-1 and MD5. Swap out the primitives
    before you start seeing smoke.

    It's not like we don't have alternatives anyway. Whirlpool uses the same wide-trail design principles has AES. It's slower than MD-5 or SHA-1 but it's much better designed. And beside, people would do well to realise you have to spend CPU cycles to get security.

    Simon.

    1. Re:These are important attacks.. by pz · · Score: 2, Informative

      I now make sure the company IT system "forget" the first document and I've successfully screwed you.

      Not quite, beacuse of reasonable doubt and the fact that the hypothetical employee would have copies of the document.

      However, Alice can get Bob to sign an innocuous recommendation letter that in the hidden version is a power of attorney for Bob's bank accounts. Alice can then take the fraudulent letter to Bob's bank and with apparent legality, take all of his money. (The difference being that a third party is involved who is naive to the document Bob signed and his intents.) This is effectively the scenario suggested in TFA, and because there is no alternate version of the document, Bob has no recourse (unlike the parent poster's suggestion).

      To exploit this hole in MD5 (SHA-1, or whichever hashing function you like), you need to create two completely different documents, not two which are different versions of the same thing and can be compared by humans. That's the real hole: these documents need to be printed out (or rendered on a screen) and interpreted by humans who assume that the rendering process is trustworthy because it is complete and veracious. If humans natively read PostScript with the same ease we read English, for example, this kind of exploit wouldn't be possible.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    2. Re:These are important attacks.. by cpeikert · · Score: 2, Informative

      How exactly could they create junk documents that also match the expected filesize.

      The collisions that have been found for MD5 are for pairs of documents that are the same size. The size constraint is not a problem.

    3. Re:These are important attacks.. by deanoaz · · Score: 4, Informative

      If two documents exist with the same hash, then they were both produced by the same source, since there is no practical, known way of finding a collision without having control of the content of both documents. Therefore, your signed copy of the original document proves that the employer created both versions.

      --
      If 'the people' in Amendment 2 are 'the state' then Amendments 1, 2, 4, 9, and 10 benefit the state, not you.
    4. Re:These are important attacks.. by Pyrrus · · Score: 2, Insightful

      If you are using the md5 sum of a digital document, I imagine you would keep a digital copy of the document as well. You'd then go to court with your signed physical document, md5 signed digital document, and the 2nd digital document with the same md5 sum and junk data that appears to be inserted to cause a collision. The employer would have a bit of explaining to do.

    5. Re:These are important attacks.. by Basje · · Score: 4, Insightful

      The one that makes the claim. In the above example you go to work for another employer. The malicous employer slaps you with the fraudulent contract _claiming_ you cannot do that.

      Then you whip out the original document. Presto: both documents have the same signature. Now it's up to the party making the claim to prove which contract was signed. Impossible.

      The reverse is also true: the programmer works on a super secret project. He signs a secrecy declaration. If he can apply the exact same signature to his lunch agreement, and walks out with the sourcecode, his employer is screwed, as per the example above.

      Etc. An agreement is legally binding in any form. The only reason to have a signed copy of any agreement (digitally or paper) is to have proof of the agreement. If the signature is no longer proof, no agreement rises above verbal agreements, evidence wise.

      --
      the pun is mightier than the sword
    6. Re:These are important attacks.. by hritcu · · Score: 2, Insightful

      I think that the most important type of attack involves binaries. For example I could surpass tripwire and enable an exploit that was never visible in my binary code. Almost none of you would be able to tell the difference (a couple of bits), not even tripwire (md5), between the exploitable and non-exploitable version of the program. But the change in behavior could be different so that the exploit is possible.

      Steps:
      1. Write a very useful tool
      2. Compile it
      3. Tamper the file to get two almost identical versions, one exploitable, the other "secure"
      4. Release the source and the secure binary
      5. Wait for the tool to achieve widespread adoption based on its "security"
      6. Overwrite the secure version with the exploitable one on your server and log the IPs of the people downloading it
      7. Wait a little more
      8. Use the exploit and gain access to the exploitable machines
      9. Once you are in charge overwrite the exploitable version of the tool with the "secure" one - tripwire should not detect this
      10. Party

      --
      If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
    7. Re:These are important attacks.. by snorklewacker · · Score: 2, Insightful

      > Who'd have to explain things then?

      The "reasonable person" standard prevails. Remember, they have to explain the existence of that extra padding in the document. The jury, assuming it's composed of reasonable people ...

      sorry, had to take a moment. my co-workers are no doubt wondering what the hysterical laughter was about ...

      the jury is not likely to believe that you signed away your rights. Now if the employer's document specified a million-dollar signing bonus, then he pulled a switch on you to simply remove it, then the jury might be less inclined to believe you.

      Doubtful this sort of thing could happen with employment. A government contract on the other hand ...

      --
      I am no longer wasting my time with slashdot
    8. Re:These are important attacks.. by uncqual · · Score: 2, Interesting
      If I understand your point, in this case, one can reasonably assume the employer made the contract (it's unusual for employees to craft employment documents).

      However, more symmetric relationships (such as a merger of two companies or even an independent contractor providing IT services to a business) usually have both sides exchanging documents back and forth and they eventually end up with a version that requires no further revision - so it's not possible to figure out (with JUST the two "versions" of the agreement with the same hash) which side "produced" the final copy (and hence, was in a position to orchestrate the second bogus version).

      Am I missing something about the argument?

      --
      Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
    9. Re:These are important attacks.. by ars · · Score: 2, Informative

      Every time this comes up people mention the same idea - use two hashes for security!

      What people don't realize is that MD5 _IS_ two hashes already! That's how it works!

      To make it worse the hashes you mentioned MD5 and SHA-1 are based on exactly the same algorithm, so using it twice doesn't help you much.

      --
      -Ariel
    10. Re:These are important attacks.. by Fnord666 · · Score: 2, Informative
      You, the employee, are going to argue that the company created both documents with matching hashes, then had you sign the more benign one. The company is going to argue that you agreed to the draconian version, then created the benign version at a later time that has a matching hash and are just trying to get out of the obligation. The problem is that:
      1. Both versions are possible
      2. Explaining any of this to people who haven't figured out how to get out of jury duty isn't going to happen. Ever seen the glazed looks on jury members when expert witnesses testify?

      If I understand correctly, what this boils down to is that given an document and an MD5 hash, there is now a "reasonable" time based method of generating a second document with different content but a matching hash.

      For a hash based signature, there will exist documents that have matching hashes. This is refered to as the pigeonhole principle. If you have 10 pigeonholes to stuff messages in and 11 messages, one of the pigeonholes will get a second message.

      The linch pin of this process is the idea that it takes too much time to find or create a second document that has different content but the same hash value. In practice you want it to take so long that by the time a match is found, it no longer matters. What "no longer matter" means depends on the context.

      When an adversary can create/find a match in a couple of hours rather than centuries, all bets are off unless the signature expires in seconds.

      This really is an important result and has significant implications.

      --
      'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
    11. Re:These are important attacks.. by DickBreath · · Score: 2, Interesting
      To do what these guys did, you need a format that can:
      • Contain "junk" data that is never displayed.
      • Contain two different "messages" only one of which will be displayed.
      • Be able to select which message to display based on the result of a calculation involving the junk.


      Microsoft Office and OpenOffice.org documents both can contain executable content which can execute when the document is opened, and which can alter the contents of the document.

      I am not very familiar with Microsoft Office, but in OpenOffice.org, the default settings are to warn you when you open a document containing any macro code -- but the user can have turned off this feature.

      I don't know about MS Office's binary format.

      OpenOffice.org documents are simply Zip files of XML. (Yes, try renaming your OOo document's extention to ".zip" and unzipping it.) I know for a fact that I can take an OOo document written on Windows, move it to Linux, unzip it, and then re-zip it (using the "zip" command line tool) to get a smaller better compressed, but otherwise identical OOo document that opens in all versions of OOo. It may be possible to construct an OOo document that is a Zip, but where one or more zip file entries are completely UN-compressed, and therefore, where this technique could be used.
      --

      I'll see your senator, and I'll raise you two judges.
  3. no help by LiquidCoooled · · Score: 2, Insightful

    Anything you hash together will ALWAYS result in collisions.

    Extracting the formatting and code from the document will just make it EASIER to create a duplication.

    "Hello World!" might match with "Hello World!!!!!! this is extra stuff"

    at least leaving the exact formatting instructions in gives you a chance that the collision which leaves a compatible file is much more difficult than the hash of the simple raw text.

    --
    liqbase :: faster than paper
  4. Re:Wow...this is nerdy even for /. by k4_pacific · · Score: 2, Funny

    Basically, they used a high-powered particle accelerator to create MD5 collisions between human-meaningful documents, thus forging the missing link between thermodynamic and informational entropy.

    I hope that clears things up.

    --
    Unknown host pong.
  5. What are the alternatives? by CyricZ · · Score: 2, Insightful

    If MD5 is found to be insecure, what are the alternatives we can use when signing our open-source packages? Is there any other alternative that is even approaching the widespread use of md5sum?

    --
    Cyric Zndovzny at your service.
    1. Re:What are the alternatives? by specialbrad · · Score: 4, Informative

      The signing of open-source packages are to prevent download corruption usually. If a download is corrupted, the data will be different, and hence the hash will be different. Most of these attacks are malicious in that you have to go great lengths to find a collision to use. If your connection corrupts the download in such a way to produce a collision, your modem obviously hates you.

    2. Re:What are the alternatives? by rbarreira · · Score: 2, Informative

      I may be wrong but I think that for that purpose, the use of MD5 is still quite secure. What those researchers did was make 2 files with the same MD5, they didn't choose the md5 value itself. In order to crack the schemes you're mentioning the md5 value is a given value for which you want to generate another file (many times with the additional restriction that the file sizes must match).

      Read about collision attacks versus preimage attacks here.

      Unless you're assuming that at least one of the people responsible for redistributing the software have bad intentions?

      --

      The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
  6. Text of Letters by pete-classic · · Score: 3, Informative

    For those who can't convientently view PostScript files, the text of the two letters:

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    Alice Falbala fulfilled all the requirements of the Roman Empire
    intern position. She was excellent at translating roman into her gaul
    native language, learned very rapidly, and worked with considerable
    independence and confidence.
    Her basic work habits such as punctuality, interpersonal deportment,
    communication skills, and completing assigned and self-determined
    goals were all excellent.
    I recommend Alice for challenging positions in which creativity,
    reliability, and language skills are required.
    I highly recommend hiring her. If you'd like to discuss her attributes
    in more detail, please don't hesitate to contact me.
    Sincerely,
    Julius Caesar

    Julius. Caesar
    Via Appia 1
    Rome, The Roman Empire
    May, 22, 2005
    Order:
    Alice Falbala is given full access to all confidential and secret
    information about GAUL.
    Sincerely,
    Julius Caesar

  7. Explanation of the attack by swillden · · Score: 4, Informative

    What these researchers did was not to improve the known attacks on MD5, but to demonstrate a clever way of turning the known attack, generally considered to be of theoretical interest only, into an attack that could potentially really be used.

    The way they did it was to create a postscript document that actually contains two documents, one that the sender would be willing to sign and one that he presumably would not. The full text of both is contained in the file, but near the beginning of the file is a bit of code that compares two blocks of random-appearing bits, call them A and B. If A == B, the postscript interpreter will select the innocuous message and display that. If A != B, the interpreter will display the other message.

    The researchers then generated a pair of blocks with the same MD5 hash. In one copy of the postscript file, they used one of these blocks as both A and B. In the other copy, they used one block as A and the other as B. Because every bit of both documents before and after the two blocks is identical, and because those blocks hash to the same value, the documents hash to the same value.

    It's an interesting attack. It only applies to documents that are also programs, in some sense, but we use lots of document formats that fit that description.

    A simple countermeasure that makes such an attack more difficult is to compress the documents before signing.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  8. Re:Wow...this is nerdy even for /. by William_Lee · · Score: 2, Informative

    Thanks for clearing that up. Now it all makes perfect sense!

  9. Re:huh? by Captain+BooBoo · · Score: 2, Funny

    Are you kidding? ANY article with the word "hash" in it is going to grab the attention of even the weakest synapse. ôô

  10. do you know how big 2^128 is? by frovingslosh · · Score: 4, Insightful
    Collisions are NOT and accidential everyday occurance. What is being discussed here is a deliberate md5 match, created by making just the right changes to a document to intentionally get an md5 match.

    2^128 is huge. It's larger by far than the number of all the files in all of the computers in the world. It larger than the number of stars in the universe. Chance collisions will not become an everyday occurance. No accidental collision has ever been found yet. Switching to larger keys will not change anything. Sure, they might make it slightly harder to make a deliberate collision (although I don't know for a fact that they make it harder at all, there were some reports of someone in Japan being able to create a collision by hand with only pencil and paper), but just wait 2 months and the computing power will catch up with that. It's not a matter of the size of the hash function.

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:do you know how big 2^128 is? by RealityMogul · · Score: 5, Funny

      2^128 is huge. It's larger by far than the number of all the files in all of the computers in the world

      Pfft, let me show you my porn collection.

    2. Re:do you know how big 2^128 is? by QuestorTapes · · Score: 2, Insightful

      > Collisions are NOT and accidential everyday occurance.

      Probably.

      Playing the Devil's advocate here. Collisions would be impacted, possibly -very- strongly, by exactly how the GUIDs are generated. Just because there are 2^128 bits to play with doesn't mean we are using them all. There may be significant gaps; a large number of values may never be generated by the algorithm used.

      Combine with human errors in coding the algorithm and the odds of a collision may be much greater than expected.

  11. Kids in Finland don't agree by StreetFire.net · · Score: 5, Insightful

    Regarding being "practically irrelevant"

    "every time [some software engineer] says, 'nobody will go to the trouble of doing that,' there's some kid in Finland who will go to the trouble."
    Taken from Kevin' Mitnik's "The Art of intrusion"
    http://www.amazon.com/exec/obidos/tg/sim-explorer/ explore-items/-/0764569597/0/101/1/none/purchase/r ef%3Dpd_sxp_r0/104-8074733-7395136

  12. Possible solution by orb_fan · · Score: 2, Insightful

    Is the answer then to create a hash that is in fact the sum of two distinct hashing alogrithms? For example, use MD5 and SHA1. Since the alogrithms use different methods, the set of collisions from one would not overlap the set from the second (or rather the overlap would be vanishingly small). And if the overlap was larger than you wanted - just keep adding different hash alogrithms until you are satisfied.

    I realise that the computations involved aren't cheap, but it becomes a trade-off between security and speed - the more sure, the more time it will take.

    1. Re:Possible solution by orb_fan · · Score: 2, Informative

      You are assuming that I read hashing alogrithm threads, which I don't. If there is a reason why this wouldn't work, then please enlighten me.

  13. 7 bits difference by lavalyn · · Score: 2, Informative

    Compare the two files, and you see that there were only 7 bits that changed, all on the 5th most significant bit of the affected byte.

    I guess it's obvious from looking at the ps file in text form, but if it's that easy to mangle postscript to display two different layers (or is it changing comments or pointers? I am not a binary postscript parser.) I still don't think it's time to throw md5 away yet. Six months.

    What other hashing alternatives are there these days? SHA-1 apparently has the same kinds of weaknesses.

    --
    Doing the Right Thing should not be preempted by making a buck.
  14. Perhaps its time for another layer of protection. by vonstauf · · Score: 3, Funny

    We could just couple it with another widely used industry standard

    --
    " Yesterday upon the stair I met a man who wasn't there. He wasn't there again today. I wish that man would go away."
  15. Re:Wow...this is nerdy even for /. by jjares · · Score: 4, Informative

    Basically, when you do an md5 for a string, you transform an existing text with a variable length to a fixed length string. Now, imagine the variable text is 200bytes long, but the fixed string is 20 bytes long, you are obiously loosing information, and that there may be a combination of 200 bytes that produce the same 20 byte sequence, but the amount of combinations in 20 bytes (160 bits) make it highly unlikely that you will find a repeated sequence. What this investingators found is a way to replicate this sequences. The problem being that usually we check integrity with this md5 hashes, so teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one. This is specially nice for putting backdoors in source code downloaded from the net, as we often check it against an md5 hash.

  16. Okay, I'm impressed. by ave19 · · Score: 4, Interesting

    At first I thought: Postscript! Well, obviously. To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide. If you tried to hide random data in a text file, it would be obvious to the person signing it. You need some format to hide the random bits from the viwer.

    I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

    Then I downloaded the files...

    There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.

    Collisions must be very easy to find! I am now offically very worried about this.

    --
    ...or maybe not.
    1. Re:Okay, I'm impressed. by yeremein · · Score: 3, Informative

      To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide.

      That's actually not what they did. They generated two essentially random blobs of data that have the same hash. We'll call these X and Y. They then created a PostScript document containing BOTH messages, the one that Alice's boss would sign and the one he presumably would not sign. They inserted two copies of block X into one of these documents, and a one X and one Y into the other. The original document contained code that compared the two blocks, and if they were the same, caused one message to be rendered, or if they were different, caused the other message to be rendered. Thus both documents hash the same (since X hashes the same as Y), but you see different text when you view the files.

      This sort of attack would only work on documents that can contain code of some sort. It would not work on text files.

  17. So you're saying I shouldn't implement MD5 ... by malcomvetter · · Score: 3, Interesting



    in my next big project?

    In all seriousness, I believe Schneier's right. We need a competition for a new hash function.

    Nah, let's just wait for 24 to drop the words "MD5" before we know it's really bad.

  18. Relative Resources: The Attackers' Advantage by G4from128k · · Score: 4, Interesting
    The core problem is that the resources of the defender are inevitably overwhelmed by the resources of the attacker.
    1. Relative expected values of gian vs. loss: The attacker thinks "I know I can gain a #BIG_NUM million dollars" and devotes their full effort to the attack. The defender thinks "I'm safe, there's a low probability, and I'm sure I'll catch the problem before it becomes real money, " and does not not devote effort to security becuase a Gartner report told him it was over-hyped. Thus, the attacker's perceived expected value is much higher than the defenders perceived expected loss and each invests accordingly.
    2. Rising Complexity: As IT systems become more complex, they become less secure. Each new device, new networking protocol, new physical layer, new OS feature, new networked application provides new opportunities for the attacker and a dilution of security resources for defenders.
    3. Time: The attacker has the advantage of time. New algorithms, new mathematical theories, new exploits, and faster processors all favor the attacker. What once was supposed to take the age of the universe to crack can be decrypted with a quickly declining number of networked (even zombied) PCs.
    4. Curse of Compatibility: Because so much crypto and security is networking related, it is subject to implementation delays caused by the need to be compatible. Defenders continue to use old, vulnerable systems to maintain compatibility with key partners. Patches don't solve the problem because the patch itself can introduce incompatibilities that make defenders leary of applying the patch with a very real chance of causing problems to avoid a hypothetical security issue.
    The bottom line is that the defender must protect all vulnerabilities while going about the day-to-day business of using the computer. In contrast, the attacker can devote full time to any weakness of their choice.
    --
    Two wrongs don't make a right, but three lefts do.
  19. Consider it to be an example... by Otto · · Score: 3, Informative

    If you don't know what a hash function is, then turn in your nerd credentials now.

    Basically, they provided an example case where one of these recent methods to generate hash function collisions can be turned into a "real world" attack.

    It's a very simple example case, but it demonstrates the point effectively. The point is that these recently discovered methods to generate collisions quickly are a real threat to any software using them as a method for digital signatures and such.

    The real world application here is that it is possible, probably in several good ways, to generate a couple of different files that have the same hash and also have meaningful data in them. The attacks found that generate seemingly random data with the same hashes can be used in ways that will let them apply to non-random, purposefully designed data.

    The example they use is where some secretary gets her boss to sign a document, and then uses his signature on another document which gives her access she shouldn't have. It's a way to forge a digital signature on a document by having them sign another one that you specially crafted.

    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
  20. From a prior discussion... by erroneus · · Score: 3, Interesting

    I forget where or when exactly, so please feel free to run a search if you care to... it was here on Slashdot though.

    There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.

    Am I missing something or are these two stories relevant to each other?

  21. Works for certificates, too by cpeikert · · Score: 4, Interesting

    Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

    Here's a link to the paper: Lenstra et al.

  22. Re:Security Through Obscurity by Dachannien · · Score: 3, Informative

    We're talking about cryptographic hashes here, not encryption. Encryption is meant to be a reversible process, and is therefore one-to-one. In other words, there's no concern over collisions with encryption.

    With cryptographic hashes, you're throwing away nearly all of the data to obtain a hash (a number) which represents the larger data set in such a way that (hopefully) the hash will never turn up again in practical usage. The article here indicates that there are ways being devised to force two data sets to have a hash collision while keeping the practical parts of the data sets the same.

    As for accusing encryption of being "security through obscurity", you're misusing that term. If knowing the encryption algorithm allowed you instant access to all data encrypted with that algorithm, then yes, the only security present would be dependent upon the secrecy of the algorithm itself. But that's not the case here. Encryption typically works by public key exchange, meaning that a key (a number) used to encrypt messages is shared with the encrypting partner, while the key to decrypt and recover the data is kept private (is never transmitted). Recovering the private key through brute force is not a compromise of the algorithm itself - given enough time, any private key can be recovered, regardless of the algorithm, but by increasing the key size arbitrarily, the time taken to find that key can also be increased arbitrarily.

  23. Re:Security Through Obscurity by MyTwoCentsWorth · · Score: 2, Informative

    Dear me,

    Your ignorance is so appalling that I have a hard time deciding where to start.

    First - this is about digital signatures, not about encryption - sharing the documents (by sending them over electronic media or otherwise) is needed.

    Second - obfuscated algorithms ??? Open any book on encryption to learn that the strongest encryption algorhitms rely on completely documented algorhitms - security through obscurity is so 80's...

    My suggestion - try to have something worth saying before making suggestions.

    Happy Posting.

  24. Re:Explanation of the attack -- enforcement issues by Eclectic+Engineer · · Score: 2, Interesting

    It is an interesting attack, and IANAL, but I'd be curious about the legal ramifications. If I slip a carbon (ah... the way-back machine) in a stack of papers and ask someone to sign the top one without thus informing them, I think my stealth probably invalidates the additional document(s).

    You could argue that there's a noticeable difference between pen and carbon -- making the copy hard to enforce -- but I'd argue the digital version is even easier: at least in the PS example, both "copies" of the document need to be present to preserve the hash.

    In normal (pen/paper) signature situations, I get a copy of what I signed. The same ought to apply to digital sigs, resulting in a simple legal challenge to the validity of the document.

  25. Re:Wow...this is nerdy even for /. by Ann+Elk · · Score: 2, Insightful
    • Download something critical, say, the current Linux kernel from kernel.org.
    • Insert a trojan/backdoor/whatever.
    • Manipulate the tar archive so the hashes match. This is the subject of TFA.
    • Somehow upload the trojaned kernel back to kernel.org.
    • Since the hashes for the original kernel and the trojaned kernel are identical, they both appear valid when checked against the signature.
    • ????
    • Profit!

    Before someone starts bitching about "lack of trust" in open source, please replace kernel with security update and kernel.org with microsoft.com in the above text.

  26. On the contrary... by Chris+Pimlott · · Score: 5, Insightful

    This attack shows us all once again that there is that the procedures for using cryptography are as important as the mathematical theories and proofs on which cryptography is based. People like to believe that it's just the algorithm that's important, and once you have such an algorithm it's equally applicable to messages of all sorts and formats. As this shows, it's clearly not the case.

    You may believe it's common sense, but to the average user, encrypting a simple letter like the memos used in the article expressed as a Word document is no different than encrypting a simple text email. Heck, many of these users probably have no idea that much of the plain-looking email they send and recieve is actually HTML, which is capable of hiding beneath its rendered surface all sorts of additional information.

    When's the last time you saw an email program that read in a Word document, extracted just the plain text content, signed or encrypted it and then repackaged it into some new format in a cryptographically sound way that would automatically be reconstituted as a Word document on the other side? Most just have a handy "Sign" or "Encrypt" button that will happy accept .ps or .doc just as readily as a simple text file.

  27. The provided exploit documents can be edited! by rar · · Score: 3, Insightful

    What I haven't seen mentioned yet, and people perhaps haven't realized, is that in providing these two postscript files, they have essentially provided you with an postscript signature exploit kit!

    All you need to do is download the two postscript documents and do *exactly corresponding edits* in both of them, and you get two documents saying different things and still have the same md5sums!

    I just tried exchanging Alice's name for my own, and surely it did work.

    Now, if they released a pdf-file hack, I would be genuinely worried :)...

    1. Re:The provided exploit documents can be edited! by rar · · Score: 4, Informative

      It is the same document, just relying on differences in the document name (it appears) to generate the different pages.

      No, you have missed the point. Go back and rtfa again. The attack still works if you rename the documents to the same filename.

      The difference lies in a generated "binary cookie" in the beginning of the postscript documents. This "cookie" makes the postscript intepreter either select to show document 'A' or 'B'. The "thing" with the cookies are that they are carefully selected to be md5-colliding. Result: both documents have the same md5sum.

      You can change the rest of the documents freely if you make the same changes in both documents. The md5sum will change, but it will still be the same for both documents.

      So. No. It is indeed a md5 collission attack.

  28. Re:big oops by tota · · Score: 2, Informative

    sha-256 and later have not been found to be weak yet. It does not mean that they are unbreakable (or I should say: possible to derive collisions) but it is definitely better.

    Another solution that should work quite well is to combine hashes: (md5+sha1) is definitely much stronger as you would need to find a collision that works for both algorithms at the same time.
    Possible, but not likely.

    --
    TODO: 753) write sig.
  29. Not exactly useful for fraud... by argent · · Score: 4, Insightful

    Clever, but it means the attack is not a general way to forge an MD5-signed document... you couldn't use this (for example) to seed a P2P network with malicious files that look like safe ones. It only works if you generate both documents, and it can only be used maliciously if it's never examined by an expert: the signer can't retain a copy of the signed document or obtain a copy through discovery.

  30. Re:Security Through Obscurity by loose_cannon_gamer · · Score: 3, Informative

    I think that's a big shortsighted... I agree that if we let history take a crack at it, that any encryption put together by smart people will eventually be breakable by smart people.

    However, most data that I deal with day-to-day is time relevant. Do I care if someone figures out my credit card number on an account I closed 5 years ago? Is it terrible if someone hacks an old email only to find out I was begging a professor for a passing grade in 1997?

    Encryption is meant to hide things, and for many things, the need to hide is temporary. If the hidden thing stays hidden as long as it needs to stay hidden, there is nothing wrong with it.

    Know the limitations on the technology you use, and know the parties with which you exchange information. Those two rules alone, if followed, will probably provide more than adequate real-world defense. Perfect? No. Good enough. Statistically, yes.

    --
    In Soviet Russia, us are belong to all your base.
  31. Size matters (even now) by Beryllium+Sphere(tm) · · Score: 2, Informative

    >Switching to larger keys will not change anything.

    Switching to longer hash sizes (not keys) will help against the current generation of SHA attacks. The effect of the Wang attacks is to knock (if memory serves) 16 bits off the work factor to find a collision. In other words, finding a collision in SHA-1 is now 64K times easier than brute force. Make brute force harder and the attack gets harder.

    Some numbers: SHA-1 produces a 160-bit hash. If you're trying to find a collision as oposed to generating one the birthday paradox works in your favor and you only need on average 2^80 trials for brute force. Wang has lowered that number to 2^64, which is uncomfortably small by today's standards and reckless by tomorrow's. SHA-256 produces a 256-bit hash: you do the math.

    The big worry is that cryptographic attacks tend to inspire research that finds better attacks. There's an unquantifiable but real risk that SHA-1 could collapse altogether.

    The crypto geeks are speulating that these weaknesses are inherent in any hash algorithm that can be calculated incrementally. Right now the output after block N of input depends only on blocks 1..N of input. If the intermediate result for block N depended on blocks N+1..EOF then the current attacks wouldn't work. Neither would hashing an input stream. Such a change would destroy the ability to hash a file in parallel with reading it.

  32. Re:Wow...this is nerdy even for /. by Beryllium+Sphere(tm) · · Score: 2, Insightful

    >teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one

    Not exactly. Not unless the attacker could choose the first text. The new attacks allow you to create two documents that collide, but don't (yet) allow you to take an arbitrary document and make another that collides with it.

    That word "yet" is important. A lot of bright crypto people will now start working on "preimage" attacks and they've got some new tools to work with. Be afraid. In a calm constructive way of course.

  33. That won't solve it by gotr00t · · Score: 3, Insightful
    Think of it this way: You can hash _any_ file of _any_ size using either MD5 or SHA1, and these algorithms will hive you an alphanumeric hash, which has a limited number of permutations. Thare are an infinate number of unique files (assuming adequate storage space), yet there are finite unique hashes.

    Therefore, no matter how many algorithms you sum up using your described method, the number of collisions is still infinite in amount. It is not the algorithms that are flawed, rather, it is the fundamential concept of hashing that allows collisions to happen.

    I would assume that the way to reduce the number of collisions is by increasing the length of the hash itself so as to increase the number of unique hashes.

  34. Re:Just use multiple hash methods by skelly33 · · Score: 2, Informative

    "It is probably not possible to get a pair of files that will collide in all three of those hashes."

    Not impossible, just less probable. Combining algorithms effectively increases the size of your fingerprint. The probability of a duplicate is a function of original data size and its proportion to the fingerprint data size. The closer those two data sizes are, the lower the probability of duplicates.

    It may seem difficult to pull off now, but eventually someone will make a nice, clever script to do it and then the script-kiddies will have a field day.

  35. This has a solution by yfarjoun · · Score: 2, Informative
    This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".

    See http://en.wikipedia.org/wiki/Birthday_paradox for details.

    Basically if you have enough documents you will be able to find two with the same hash. Not to be mistaken with the difficulty of finding a document with the same hash as a GIVEN document. This cannot be avoided regardless of the hash.

    To foil a birthday attack, NEVER agree to sign a given document. One must ALWAYS add some random junk to the document (clear text or hidden). This should be implemented in sofetware that helps in signing, but if the option isn't given, you can add a clear text "random" string.

    Yossi

  36. Re:Misuse of MD5? by thelittlestbuddy · · Score: 2, Insightful
    I hope this post was a troll.

    But if it wasn't let's just clear a few things up. MD5 is a cryptographic hash function, and is intended for use in things like digital signatures, message authentication codes, and many, many other applications. The attack in the paper demonstrates that MD5 is not suitable for use with digital signatures.

    By the pigeon hole principle, your "magical fingerprint" can never exist for compressing arbitrary files.

  37. Re:Wow...this is nerdy even for /. by Jasin+Natael · · Score: 2, Interesting

    It bears mentioning that md5 doesn't account for the length of the file. So if someone were to try installing a backdoor into a program, and had a sophisticated enough piece of software using this method, comments, metadata, or other information could be used to 'pad out' the file to make it seem like the original -- even with source code files. Especially in the case of executables, they could just insert random crap at the end of an executable file, and make the md5 hash (and possibly the size) come out identical to the original. Some of these have already been demonstrated.

    While this collision will be a big deal for signing documents, it shouldn't have any effect on web security (Digest Authentication, for one, uses MD5 pretty extensively). The lesson: While MD5 is still reasonably difficult to collide, the time to find collisions (~5 hours) on a normal PC means that malicious uses are now practical.

    I'm not entirely sure what the implications are -- would it be suitable to sign documents using multiple cryptographic functions (such as a signature containing SHA, MD5, and CRC32 hashes, along with the original filesize)? Maybe perform a simple, arbitrary transformation on the text content and use that to generate a seperate, complimentary MD5 or SHA hash?

    Jasin Natael
    --
    True science means that when you re-evaluate the evidence, you re-evaluate your faith.
  38. Nonces And Blind Signatures by Anonymous Coward · · Score: 2, Informative

    I can't believe that nobody, including Bruce Schneier, has pointed out the well understood principle that you *never* blindly sign something provided to you by someone else.

    Secure schemes involving one party signing material provided by another party should always involve a signing nonce. This is a random bitstring appended to the material to be signed. In the example from the article, if Caesar had added a nonce before signing the "good" document, then Alice would not have been able to produce a "bad" document with a matching hash.

    However, as someone pointed out, this kind of attack is possible without using hash collisions. You simply have to convince your victim to sign something that is code-like (such as the postscript files used in the example, or an HTML document with javascript, or a Word doc with Visual Basic macros, or an executable) without making an expert examination of the complete contents of the code. The code-like "document" can be written to present one set of behavior under one set of circumstances and a completely different behavior under other circumstances. If the signer is not competent to examine the program and be fully aware of what it does, then the signature can have no meaning.

    The attack demonstrated in the article is *still* of no practical consequence in my opinion. It doesn't mean that signed code can't be trusted; trusted code is always signed by it's creator, meaning that if the signature is valid *and* you trust the creator, then you can trust the code. It is still impossible for someone who didn't create the signed code to create a different piece of code that has the same hash, so you don't have to worry about an attacker replacing signed trusted code with signed malicious code. As always, you still have to worry about misplaced trust in the code's creator.

    What the attack in the article *should* be trying to make clear is that a signature is only as trustworthy as the signer's ability and desire to evaluate what is being signed. A third party auditor that signs code after reviewing it doesn't ensure that the code is trustworthy, it only ensures that the auditor *claims* the code is trustworthy. What the implications are for the actual trustworthiness of the code depends on the trust you place in the intentions and abilities of the auditor.

  39. Re:well... by Dwonis · · Score: 2, Insightful
    substitute "the last bit of each byte" or "padbyte=rand(DVD-byte)" where rand() is a random-# generator and it's random enough.

    [Disclaimer: I'm not an expert at cryptography, but I like to think I understand it better than most non-mathematicians outside the field. It would be really nice if a crypto expert could clarify this, but I don't expect that to happen on Slashdot.]

    You are correct that your scheme would add some security, but not nearly as much as our intuition might lead us to believe.

    Let's say you are going to transmit an n-bit message. Even if you don't transmit any information, an attacker knows that there are 2^n possible messages that you could transmit. If we assume that your message is compressed as much as possible, then before you transmit the message, all 2^n messages are equally likely (from an attacker's point of view).

    So, you have n bits of data. We'll call this message P (the plaintext). Now, let's say you generate another n-bit random message, called K (the key). Finally, you xor P and K together to produce C (the ciphertext), which is also an n-bit message.

    The theory behind the one-time pad says that if and only if there are at least 2^n equally-likely possibilities for K, then someone who only knows C cannot learn anything about P.

    We can express this a different way. Let's say you have an invertible function, C = f(P), and:

    • P is 2^n bits long.
    • C is 2^n bits long.
    • There are at least 2^n equally-likely functions for the function f

    Note that the function f is just a generic representation of the one-time pad algorithm and the key K, so similarly, we cam say that an attacker who knows nothing about the function f cannot learn anything about P from only C.

    And that's the problem: every time you transmit a message (P) that isn't *completely* random, you give the attacker a little information about f, unless you completely change f every time you transmit a new message. This why you can never re-use a key in the one-time-pad system.

    So let's say you have a key, K, that has fewer than 2^n equally-likely possibilities. Then, there are fewer than 2^n possible functions f. If there are still 2^n possible values for P, then an attacker can learn some information about P from C. So, if you don't want that to happen, you need to have 2^n possible functions for f.

    So, you have 2^n equally-likely functions for f, and you need to use a different one for every message. In order to let the recipient know which function to use each time you transmit a message, you have to transmit at least n bits of information to the recipient. I think you can see where this is going...

    If you were going to write an algorithm to implement the function f, the optimally-compressed description of the algorithm would have to be at least n bits long, and would need to be replaced for every new message that you send. It doesn't matter if f is an algorithm based on a DVD library, or a really complicated program. In order for the one-time pad to work (an attacker learns nothing about P from C), you need make sure that there are at least an additional n bits of information that the attacker knows nothing about.

    So in your example, you'd still need to send a new 4GB (optimally-compressed) version of rand() for every 4GB message you send.

    Nice try, though. Keep it up!

  40. Re:Wow...this is nerdy even for /. by dossen · · Score: 2, Informative

    It is actually not entirely true, that md5 (and sha1) does not account for the size of the file. Both md5 and sha1 use the same Merkle-Damgård structure, where the same function is applied to a running "total", initially an initialization vector, and a fixed-size block of the input. For both hashes, the blocks are 512 bits long, and the last block is padded and ends with the length of the file, as a 64-bit integer. So unless one of your files is extremely long, size is taken into account. Doesn't mean that size-altering is impossible, but the linked attack for one does not alter the size.