Slashdot Mirror


MD5 Collision Source Code Released

SiliconEntity writes "The crypto world was shaken to its roots last year with the announcement of a new algorithm to find collisions in the still widely-used MD5 hash algorithm. Despite considerable work and commentary since then, no source code for finding such collisions has been published. Until today! Patrick Stach has announced the availability of his source code for finding MD5 collisions and MD4 collisions (Coral cache links provided to prevent slashdotting). MD4 collisions can be found in a few seconds (but nobody uses that any more), while MD5 collisions (still being used!) take 45 minutes on a 1.6 GHz P4. At last we will be able to implement various attacks which have been purely hypothetical until now. This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."

411 comments

  1. SHA1 by mysqlrocks · · Score: 5, Funny

    So is SHA1 the recommended alternative?

    1. Re:SHA1 by Mind+Booster+Noori · · Score: 3, Informative

      SHA-1 is not the sollution.

    2. Re:SHA1 by psykocrime · · Score: 5, Informative

      So is SHA1 the recommended alternative?

      No, see:

      http://www.computerworld.com/securitytopics/securi ty/story/0,10801,99852,00.html

      and

      http://www.computerworld.com/softwaretopics/softwa re/story/0,10801,105875,00.html

      I like this quote:

      "SHA-1 is a wounded fish in shark-infested waters, but I'm more worried about MD5 because it's used everywhere," said Niels Ferguson, a cryptographer at Microsoft Corp. "Try to switch away from SHA-1 as quickly as you can, but switch away from MD5 first," he said when asked what recommendations he has regarding the algorithms during a panel discussion at the conference.

      --
      // TODO: Insert Cool Sig
    3. Re:SHA1 by mboverload · · Score: 1

      It was a JOKE people!

    4. Re:SHA1 by Anonymous Coward · · Score: 5, Informative
      No, MD5 and SHA1 were found to have better than brute-force attacks within a few months of each other.

      Crypto people are recommending SHA-256 or SHA-512 which is only like SHA-1 in name.

      Obviously check your the hash length beforehand and make sure your database column is wide enough.

      When migrating existing hashes to the new hash be careful not to store the old hash anywhere -- that can be the weak link in the chain. For example, generating passwords and having the MD5 around lets attackers generate valid inputs and then try them against the more computationally complex hash. It gives them an approach to attacking your stronger hash.

      Take a copy of your database and hash all the existing passwords into SHA-512 form, and you'll need some way of distinguishing the MD5-to-SHA512 hashes from the SHA512 hashes, so add a date column with todays date in it. Then write a function "hashString" as a wrapper that can identify when something was hashed, and go down a different branch of code based on that.

      The first branch does MD5 then SHA512, the second branch does SHA512, and it does this based on the date column.

      And, of course, re-salt both branches.

    5. Re:SHA1 by flosofl · · Score: 3, Insightful

      For the love of _______! (fill in appropriate name for your particular beliefs)

      Someone mod the GP post Funny, before we get more "informative" posts. It looked like a tongue-in-cheek comment to me. I actually laughed. Then I saw the follow-ups from the Unfunny Brigade... It was a joke!

      Seriously, who doesn't know about the SHA-1 weakness by now.

      --
      "This calls for a very special blend of psychology and extreme violence" - Vyvyan "The Young Ones"
    6. Re:SHA1 by jacksonj04 · · Score: 1

      Surely if all you are doing is hash comparison for password security you may as well just hash once to MD5 and once to SHA-1. The chances of there being an input which is not the password which collides with both the MD5 and the SHA-1 hashes and which passes your password input validation (Under 20 chars, alphanumeric only etc) is so slim as to be negligable.

      --
      How many people can read hex if only you and dead people can read hex?
    7. Re:SHA1 by Crizp · · Score: 1

      Well sometimes it's quite hard getting tongue-in-cheek humor, or any kind, in text only. That's what smilies were invented for, but now everybody are afraid of using them. It's not 'hackerish' enough, it's 'lame'.

      They serve a purpose. You're so stupid!
      or:
      You're so stupid! :)

      see? Use them, but don't abuse them. Support the Smilies Please Return movement!

    8. Re:SHA1 by MntlChaos · · Score: 1
      you may as well just hash once to MD5 and once to SHA-1.

      all it has to do is collide with MD5 or the two MD5 hashes have to collide with SHA-1. Or are you suggesting storing two hashes and verifying that both hashes match?
    9. Re:SHA1 by jacksonj04 · · Score: 3, Informative

      Yes, so

      hash1 = md5(input1)
      hash2 = sha1(input1)

      Then to validate you run the input through the same thing. Like I said, the chances of a valid input colliding with both are ignorably slim.

      --
      How many people can read hex if only you and dead people can read hex?
    10. Re:SHA1 by Anonymous Coward · · Score: 0

      Real link at http://www.stachliu.com/collisions.html

      Despite the HTML quality, we do have sufficient hardware and bandwidth to accomidate the traffic :)

      -Patrick Stach

    11. Re:SHA1 by tacocat · · Score: 1

      I thought a potential solution was to use both MD5 and SHA since the probability of creating a collision on both algorithms to still meet the intended goal of signatures.

    12. Re:SHA1 by Anonymous Coward · · Score: 4, Informative

      If you can find a collision in one, you can, for pretty much the same work, find a collision in the second hash. Cascading hash functions like this DOES NOT WORK.

      Look up the Joux Multicollision paper from 2004 CRYPTO. This is a famous result.

      This scheme you propose has been broken by Joux.

      -Matt

    13. Re:SHA1 by Taimoor · · Score: 1

      Given the restrictions present in most login systems, this method WILL work. The odds of there being a single alphanumeric phrase with a lenght of under 20 chars which also collides with both hashes is essentally nil.

      To be exact, it's the sum of all possible solutions to one. You can use tools to speed your search for collisions, but it garauntees that only the true password will work, because it has to pass both hashes.

      --Nick

    14. Re:SHA1 by SteveAyre · · Score: 1

      What about this? (The one I use because I think it prevents hash collisions being found).

      hash = MD5(secretstring.input)

      So long as secretstring remains secret, the attacker now has to find values inputs which begin with secretstring which will cause a collision. And since they don't know the values of secretstring they wouldn't be able to use use the collision source code to find a collision in this case.

    15. Re:SHA1 by Anonymous Coward · · Score: 0

      What do you think a salt is, genius?

    16. Re:SHA1 by dtfinch · · Score: 1

      I really don't see how being able to find MD5 collisions can allow someone to exploit an MD5 password hashing scheme.

    17. Re:SHA1 by SteveAyre · · Score: 1

      Exactly that. :P Point is, it stops (or severely reduces) the risk of a collision being found.

    18. Re:SHA1 by Anonymous Coward · · Score: 0

      If you're doing something along these lines, you want to use an HMAC construction (see Google). It has several advantages over your scheme. It's interesting to note that MD5-HMAC and SHA1-HMAC are not broken by the recent attacks.

    19. Re:SHA1 by this+great+guy · · Score: 1

      What about this?
      hash = MD5(secretstring.input)

      This doesn't work. Because the MD5 attack is basically what we call a "free-start collision". In other words, whatever is your "secretstring", it will not prevent attackers from finding collisions. See the researchers' paper for more info http://eprint.iacr.org/2004/199.pdf.

    20. Re:SHA1 by this+great+guy · · Score: 1

      Given the restrictions present in most login systems, this method WILL work.

      I am sick of seeing all those uninformed posts made by people thinking that MD5 is only used in "login systems" to "hash passwords". I am sure you didn't know that, for example, MD5 is also used to digitally sign the X.509 v3 certificates used by some HTTPS websites. So I am forced to repeat what the grand parent said, for the n-th time: No, in most cases, cascading hash functions like this DOES NOT WORK. The minority of cases where such an inferior method may work is irrelevant. A better solution is to develop a new secure hashing algorithm.

    21. Re:SHA1 by Shanep · · Score: 1

      I am sick of seeing all those uninformed posts made by people thinking that MD5 is only used in "login systems" to "hash passwords".

      I am sick of people who see a small statement and then assume a bunch of crap beyond what was really said.

      --
      War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
    22. Re:SHA1 by this+great+guy · · Score: 1

      I am sick of people who see a small statement and then assume a bunch of crap beyond what was really said.

      Ok, maybe I should have been that aggressive. But you have to admit that Taimoor's post is ambiguous. That's why I wanted to warn everyone that MD5 is also used to do much more than merely hashing passwords...

    23. Re:SHA1 by Shanep · · Score: 1

      Okay, sorry for snapping. You may just be a great guy afterall.

      ; )

      --
      War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
    24. Re:SHA1 by cafard · · Score: 1

      :)

      --
      This post is awesome.
    25. Re:SHA1 by FlameboyC11 · · Score: 1

      That's a salt, which was mentioned much earlier.

    26. Re:SHA1 by Splab · · Score: 1

      While you might find a collision of the hash you will find it darn hard to get in to the system since your collision doesnt include the secret string - so what you are doing is hash = md5(secretstring.whateveryouthinkdidthetrick) - which won't give you the correct hash.

    27. Re:SHA1 by ultranova · · Score: 1

      Take a copy of your database and hash all the existing passwords into SHA-512 form, and you'll need some way of distinguishing the MD5-to-SHA512 hashes from the SHA512 hashes, so add a date column with todays date in it. Then write a function "hashString" as a wrapper that can identify when something was hashed, and go down a different branch of code based on that.

      I have a better idea. Make the file, table, or whatever that contains the password hashes unreadable to everyone except whatever login program needs to verify against them. That way it doesn't even matter if you store them in plaintext, since no attacker is ever going to see the hash he's aiming against. I think I'll patent this idea and call it "shadow passwords" ;).

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    28. Re:SHA1 by Anonymous Coward · · Score: 0

      so what you are doing is hash = md5(secretstring.whateveryouthinkdidthetrick) - which won't give you the correct hash.

      Yes it will give the correct hash, because the whole point of a free-start collision is that it works whatever is the secretstring:

      md5(secretstring.whateveryouthinkdidthetrick) = md5(secretstring.input)
      md5(secretstring2.whateve ryouthinkdidthetrick) = md5(secretstring2.input)

      You really don't care about what "secretstring" is. If I had time, I could demo to you a simple example with the tool that has been released...

    29. Re:SHA1 by alderX · · Score: 1

      I'am not shure if the paper you are referring to is about the same topic. Actually what jacksonj04 is proposing is to use multiple hash functions independently. This is based on the assumption that it is harder to find a collision that hits for every hash function at the same time *.

      * standing for:
          md5(input1) == md5(input2) && sha1(input1) == sha1(input2)

    30. Re:SHA1 by Anonymous Coward · · Score: 0

      The very concept of hashing is trying to minimise what can be stolen in case of a break-in. You fail it! :)

    31. Re:SHA1 by Anonymous Coward · · Score: 0
      Only if you were brute forcing it and trying english to start with, and this attack has nothing to do with it.

      If you're not a cryptographer, please refrain from commenting on this topic. By all means learn and ask questions but don't state things so definitively, please. You're blending in with people who know what they're talking about.

      --Matt

    32. Re:SHA1 by Pla123 · · Score: 1

      What is the point of using both MD5 and SHA-1?

      It would be better to use both SHA-1(M) and SHA-1(M+1)

      But then it would be the equvalent of SHA-"320" so why not use SHA-512...

      If you are going to use 2 hashes then just increase the bits of the better one.

  2. Managed to get just the last few lines... by Saint+Aardvark · · Score: 4, Funny
    ...before even the Coral cache was Slashdotted, and it turns out they've written it in LISP:

    ))))))) ))))))))

    (With sincere apologies to Bryce Jasmer.)

    1. Re:Managed to get just the last few lines... by efuzzyone · · Score: 1

      What is written in lisp?

      --
      Creativity uninhibited www.kreeti.com
    2. Re:Managed to get just the last few lines... by Anonymous Coward · · Score: 0

      Shut up, you're not funny.

    3. Re:Managed to get just the last few lines... by Anonymous Coward · · Score: 5, Funny

      I downloaded the source, but it doesn't seem to be working properly. Does anyone have an md5sum of the original so I can verify I got the right code?

      -confused

    4. Re:Managed to get just the last few lines... by multipartmixed · · Score: 1

      Thanks. Yours funny, the original was a new one on me and prompted a hearty belly-laugh.

      Too bad those government coders don't know about ]. ;)

      --

      Do daemons dream of electric sleep()?
    5. Re:Managed to get just the last few lines... by Ignorant+Aardvark · · Score: 1

      I must apologize, for up until this point I didn't know that I was a man of God.

  3. shaken to our what? by tomstdenis · · Score: 3, Insightful

    It's important news but not really that shocking. MD5 was not something professionals would recommend for a few years already.

    Any half-way intelligent cryptographer would have suggested SHA-1, TIGER or perhaps HAVAL since quite some time already.

    Tom

    --
    Someday, I'll have a real sig.
    1. Re:shaken to our what? by ZachPruckowski · · Score: 2, Insightful

      It's important news but not really that shocking. MD5 was not something professionals would recommend for a few years already.

      But a household user can crack in an hour on a normal mid-line computer something that "a few years" ago professionals might have recommended. That's the news. If low-end PCs can crack encryption that's only a few years outdated, then the hounds are nipping at the heels of the cyptography industry. And imagine what hackers could do with more powerful computers (yes, I know there is a non-applicability issue with some forms of encryption). Or the government with all those Cray super-monstrosities.

    2. Re:shaken to our what? by Mind+Booster+Noori · · Score: 1

      SHA-1 is not the sollution. Take a look at SHA-224, SHA-256, SHA-384, and SHA-512.

    3. Re:shaken to our what? by tomstdenis · · Score: 0, Redundant

      I was talking about 5 years ago. Though I wouldn't trust SHA-2 today anyways. The design is just not sound.

      It's practically secure [SHA-2] as to be useful today but I wouldn't be happy about it.

      Tom

      --
      Someday, I'll have a real sig.
    4. Re:shaken to our what? by Anonymous Coward · · Score: 0, Flamebait

      Christ, so now you're implying you're a professional cryptographer, Tom? A bug-ridden open source crapfest is not professional software development.

  4. 1.6GHz P4? by CCFreak2K · · Score: 0, Redundant

    My desktop PC is a Pentium 4 at 1.5GHz, and even that thing is considerably slower compared to my 1.5GHz Celeron-M. A modern PC could crack it even faster.

    --
    "Beware of he who would deny you access to information, for in his heart he dreams himself your master."
  5. MD5 broken, not useless by Anonymous Coward · · Score: 2, Insightful

    Keeping in mind where MD5 is broken is important, so that good uses for this tool are not disposed of out-of-hand.

    md5 is still good for keeping track of if your files have changed. It should not be used for document signing.

    1. Re:MD5 broken, not useless by slashdot-me · · Score: 1

      md5 is still good for keeping track of if your files have changed

      But only for non-security uses. Don't use MD5 in your intrusion detection system.

    2. Re:MD5 broken, not useless by Anonymous Coward · · Score: 0
      Even tracking file changes is only good on a trusted system. Otherwise a hacker can change the files... eg, a bittorrent part that has established the hash days ago and given the attacker time to generate another valid piece.

      So it'll help with accidental errors and's a hell of a lot better than CRC, but not reliably against malicious attacks.

    3. Re:MD5 broken, not useless by KinkoBlast · · Score: 1

      Yep. I've only seen it used twice in a security-based situation, and in both those cases in a downloader do veriy chunks. It still GPG-checked the sigs on the programs afterward (both were custom systems that were only used "in-house")

    4. Re:MD5 broken, not useless by ashridah · · Score: 1

      Exactly.
      MD5's collision issue has been known for a while, infact you could even download tools that will simultaneously generate a 'good' and 'bad' version of something with the same (yet arbitrary) MD5sum (look up stripwire on google).

      This doesn't stop MD5 from being useful, however, IF you have a pre-existing trust in a document, and already know its MD5sum. This typically requires a 'Second pre-image' style attack, whereas collisions are best used in 'Birthday party' attacks.

      A 'Second pre-image' requires the attacker to find a document matching the md5sum for an existing document. 'Birthday party' attacks allow them to generate both documents, with an arbitrary (but identical) MD5sum. This is the basis that still makes things like tripwire useful (when used properly), but at the same time, it means you need to trust the vendor far more, as a malicious vendor could give you a trojan in place of a normal binary as an undocumented part of an update.

      That said, MD5's relatively small keyspace makes it less desirable, since it's slowly becoming possible to brute force it in 'reasonable' amounts of time with 'reasonably' pricey hardware. Jacking up the keyspace's size is a wise, future proofing move. (provided you do it with a well known, trusted, algorithm, not some proprietary crap)

      ash

    5. Re:MD5 broken, not useless by Anonymous Coward · · Score: 0

      Yes and no.

      It is good for checking whether those of your files that are on a guaranteed-secure computer has changed.

      It's no good for system binaries any more. Attackers can modify your bootup scripts and add some junk and still get the same md5. Which sucks.

  6. So what the hell do I do now? by jeblucas · · Score: 4, Interesting

    I'm essentially crypto ignorant. About all I've known to do was verify MD5 hashes on downloads. Now that this is by-and-large pointless, how to check the veracity of things like Linux ISO's, video drivers, etc, ad inifintum?

    --
    blarg.
    1. Re:So what the hell do I do now? by k4_pacific · · Score: 1

      I wouldn't worry about it too much, generating MD5 collisions large enough to be useful requires a multi-million dollar particle accelerator, so I wouldn't worry too much until someone figures out how to modify a microwave oven to do this.

      --
      Unknown host pong.
    2. Re:So what the hell do I do now? by DreadSpoon · · Score: 5, Insightful

      Do nothing.

      MD5 has not been invalidated for those uses. Checking the MD5 sum of an ISO download is not done for security purposes, it's done so that you can make sure you didn't get a bad byte or two somewhere in that 650MB. I mean, if hackers could upload a malware-filled ISO to the FTP server, they could upload a new MD5SUMS file too, right?

    3. Re:So what the hell do I do now? by Anonymous Coward · · Score: 0

      This does not make validating the MD5 digest pointless. It's not like you can create arbitrary text and have it be hashed to any MD5. This just generates random text that will match a give MD5 digest.

      This is more of a concern for cryptography, not message hashing.

    4. Re:So what the hell do I do now? by TetryonX · · Score: 1

      Generally on larger files it becomes increasingly difficult to have the same file size AND md5 hash. On most repositories, md5 being cracked will have little effect since it doesn't actually matter much about whether or not the file is exactly what it says it is if you are getting it from a secure repository. IANAC, but I figure it would be a lot more difficult to generate hashes at a quick rate and match the correct filesize.

      File verification is more or less just a way to determine whether or not the file became corrupted from between the server and your computer, so again no worries. If the file was already poisoned on the server, I can almost guarentee that the md5 will reflect the md5 of the poisoned file, and not a specially crafted poisoned file.

      --
      [!] No, I can't see my comments. They are not worthy of +3 moderation.
    5. Re:So what the hell do I do now? by discordja · · Score: 1

      well for the day to day I don't think the impact is overly important. Mind you, no one has been suggesting using MD5 for a few years now for important security measures. (pretty soon no one will suggest it for SHA-1 .. it's a never ending cycle as processing capacity grows). So always download your files from reliable sources and check the hash against the known good provided by the distributor and more likely than not you will have a good cd image or installer or whatever. So I'd not concern too much, there is not going to be a shift in the community to abandon MD5 tomorrow for the mundane checks.

      --
      I stole this .sig
    6. Re:So what the hell do I do now? by yamla · · Score: 4, Informative

      That's not what MD5 sums are used for. TCP/IP already has packet integrity. MD5 sums are indeed used to make sure you don't have a malware-filled ISO. The trick is that you grab the MD5 sum from a trusted source, then you can grab the ISO image from any mirror site. Assuming MD5 is safe (obviously not the case), you know your downloaded ISO is exactly the same as the one distributed from the central repository.

      --

      Oceania has always been at war with Eastasia.
    7. Re:So what the hell do I do now? by harrkev · · Score: 1

      Wrong. If the ISO has some spare room, you could throw in a trojan or two, and just throw in some random data in a junk file to make the hashes match. This is a big deal.

      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    8. Re:So what the hell do I do now? by Zocalo · · Score: 3, Informative
      Most things use multiple checksums, for instance on the updated copy of Lynx I just grabbed for Fedora:

      $ rpm --checksig lynx-2.8.5-23.2.x86_64.rpm
      lynx-2.8.5-23.2.x86_64.rpm: (sha1) dsa sha1 md5 gpg OK
      $

      So, even if it is also possible to generate collisions for DSA and GPG keys as well as SHA1 and MD5, the chances of being able to generate a collision for all four checksums/signatures at the same time is quite likely infinitesimally small. And that's just for a random file, things are going to get much more complex if you want that random file to can pass as whatever format the original was supposed to be and actually deliver a payload that might do something useful for the cracker.

      --
      UNIX? They're not even circumcised! Savages!
    9. Re:So what the hell do I do now? by camcorder · · Score: 1

      What if they do not change the MD5 file but put another iso that has same MD5 hash with the original iso file? It might take considerably long time to detect that kind of intrusion. But it's evident that best way to increase trustworthiness of downloads is using asymetric hashing methods (aka signing).

    10. Re:So what the hell do I do now? by Buzz_Litebeer · · Score: 1

      The main problem is it shouldnt be possible to have a functional CD image that matches the MD5 hash of a legit ISO.

      Though... I guess if someone had the time they might be able to make something that IS like the regular image, tacks in some trojans, and then adds junk data to the ISO until it matches.

      But i dont think that its possible to do that.

      --
      If you don't vote, you don't matter, so don't waste your time telling me your opinion
    11. Re:So what the hell do I do now? by diegocgteleline.es · · Score: 1

      Which is why FTP mirrors never should mirror MD5SUMS files

    12. Re:So what the hell do I do now? by Lord+Kano · · Score: 1

      I'm a crypto beginner. I've read a couple of books on the subject but I'm still no cypherpunk.

      Basically, this means nothing for file validation.

      When digitally signing documents, typically get the MD5 hash of the document and then sign that. It's faster to sign a hash than it is to sign an entire document. Also, it *might* provide you with more security (of your secret key) when you sign a hash than the entire document. It's all theoretical, but having more signed data to analyze might possibly provide an attacker with a faster avenue to deduce your secret key, it's HIGHLY unlikely but possible.

      Being able to find MD5 collisions means that if you sign a message that says "I agree to terms ABC." I can generate a message that says "I agree to terms XYZ." and make it appear that you have agreed to something different. Or a really connected attacker could intercept your original message and substitute a false one in its place.

      LK

      --
      "Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
    13. Re:So what the hell do I do now? by ThereCanBeOnlyOne007 · · Score: 1

      Sure, they would spoil your Linix ISOs and replace them with warez Windows...

    14. Re:So what the hell do I do now? by Anonymous Coward · · Score: 0

      MD5 is asymetric

    15. Re:So what the hell do I do now? by Omnifarious · · Score: 1

      Then why does RedHat sign their ISO hashes? No, it's as much for security as for verification. And it's broken.

      IMHO, using MD5 hashes for absolutely anything at all should be stopped as soon as possible. All the possible situations in which the ability to generate collisions might be a problem are very difficult to all analyze. People designed stuff with the idea that hash functions had particular properties. If any of those properties are proven to not hold, the hash function should be considered invalid for all purposes.

      Perhaps, after somebody seriously experienced in protocol design analyzes it again with respect to some particular purpose it can be considered useful for that purpose again, but I do not consider random people on Slashdot qualified to make that analysis, especially in the few minutes between a question being posted and the answer given.

      Despite SHA-1 not being quite so broken, I still refuse to use it for anything were I seriously care about the security. All the stuff I do now is designed to use SHA-2, though I feel that has something of a shaky foundation.

    16. Re:So what the hell do I do now? by Aranth+Brainfire · · Score: 3, Insightful

      "That's not what MD5 sums are used for."

      Heh, sorry, but you lose. I use them for that.

      I've never had a problem with an "infected" iso, but I have had one or two that downloaded wrong and flat out failed to work after I burned them. Sure enough, I check the md5 sum and it's not correct- download the iso again from the same site and it works.

      So I guess maybe either my hard drive is funky (or some other system failure on one end) or this "packet integrity" isn't so great.

      --
      "Quoting yourself is stupid." -Me
    17. Re:So what the hell do I do now? by Kadin2048 · · Score: 1

      Actually it's sounding more and more like it would be possible, if you read and look at the example further up in the thread with PostScript files. I think it still impractical though for most malicious users, because 1) the file size is so much bigger, meaning that the problem would be more computationally intensive, and 2) you're not trying to find a purely "weak" collision anymore (two arbitrary files with the same hash), instead you're trying to figure out what data you can pad your trojans with in order to make its hash equal the legit ISO's. I don't think the example code above will do that, at least not out of the box. I haven't examined it yet though.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    18. Re:So what the hell do I do now? by swillden · · Score: 1

      Wrong. If the ISO has some spare room, you could throw in a trojan or two, and just throw in some random data in a junk file to make the hashes match.

      Not likely.

      This collision attack finds small blocks of plaintext that hash to the same value, not ISO-sized blocks. Now, the attacker can rely on the fact that if he can assure the real and bogus ISOs hash to the same value up to some point in the data stream and they are bit-for-bit identical after that point in the data stream, then the final hashes will be the same.

      If this were a pre-image attack (it's not, more on that later), then an attacker could pick a block of bytes in the original ISO that he would like to replace and generate a bunch of other plaintexts that hash to the same value as that block. These plaintexts would be more or less random, but perhaps he can find one of them that contains something usefully diabolical. Not very likely, but possible.

      To do what you describe, he would need to be able to generate a plaintext (junk) block such that the hash of the trojaned version up to and including the junk block matches the hash of the original ISO up to the same point. Further, the junk block has to avoid invalidating the structure of the ISO, either by being in a location where its value is irrelevant (which constrains the location of the block) or by looking like a valid data structure (which constrains the content of the block). Frankly, there's no guarantee that any such junk block even exists, and not even a pre-image attack would help an attacker find it.

      But it's worse than that, because the attacker doesn't have a pre-image attack; this is only a collision attack. So in order to make the attack work the attacker has to:

      1. Generate a bunch of collisions, looking for one with a plaintext that contains an appropriate sort of malicious chunk of code. Ideally, the other plaintext needs to contain apparently-useful code, but perhaps that's not necessary.
      2. Convince the distributor of the ISO to insert the non-trojaned (but probably pretty weird-looking) plaintext into the ISO.
      3. Distribute the trojaned ISO to the target(s).

      A more realistic attack would be something like the buggered Postscript file that was demonstrated. Perhaps the attacker could write a script or program that contains *both* the exploit code and the legit code, with a conditional that executes one or the other based on whether or not a static block of data equals a colliding plaintext. Then if he could get this program inserted into the ISO, with the plaintext value that selects the legit code, and if no one looked to notice the trojan, but just relied on black-box testing to verify that the code worked properly, then he could also distribute the ISO with the checked value swapped out for the other colliding plaintext. The two would hash to the same value, but behave differently.

      If the attacker can do that, though, it would be easier to just use some sort of trigger to activate the trojan code. That would have the advantage that it would work against every installation of the ISO... no need to try to make sure the target gets the "bad" one.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    19. Re:So what the hell do I do now? by Anonymous Coward · · Score: 0

      Careful, think before you disregard packet integrity. Packet integrity didn't fail here. The data arrived properly in the network stack. The corruption must have occured on the way to the disk.

    20. Re:So what the hell do I do now? by swillden · · Score: 3, Informative

      So, even if it is also possible to generate collisions for DSA and GPG keys as well as SHA1 and MD5, the chances of being able to generate a collision for all four checksums/signatures at the same time is quite likely infinitesimally small.

      Actually, only SHA1 and MD5 need collide. DSA and GPG aren't used for hashing. DSA is used for signing the hashes, and GPG just implements and structures all of the above.

      Still, you're right that finding collisions for both MD5 and SHA-1 on the same pair of files is extremely unlikely.

      And that's just for a random file, things are going to get much more complex if you want that random file to can pass as whatever format the original was supposed to be and actually deliver a payload that might do something useful for the cracker.

      Hammerhead, meet nail.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    21. Re:So what the hell do I do now? by Anonymous Coward · · Score: 0

      I've seen packet corruption caused by routers which had hardware problems. They'd accept a packet, change some fields (like setting the destination MAC address to the address of the next hop), and recalculate the checksum. This is all fairly normal, except when they manage to corrupt the packet before computing the new checksum. Then the corrupted data goes on to the next hop with a valid checksum, and *poof* your file is corrupted.

      Granted, I've only seen this once, and about 15 years ago, but it was so much bother to figure out what the problem was that I remember the whole thing clearly to this day.

    22. Re:So what the hell do I do now? by Anonymous Coward · · Score: 0
      I quote from Wikipedia:
      The TCP checksum is a quite weak check by modern standards. Data Link Layers with a high probability of bit error rates may require additional link error correction/detection capabilities. If TCP were to be redesigned today, it would most probably have a 32-bit cyclic redundancy check specified as an error check instead of the current checksum.
      ISOs are very large. Even if the probability of the TCP checksum given a false negative were small, the probability of one packet being screwed up is not trivial.
    23. Re:So what the hell do I do now? by Lord+Ender · · Score: 1

      This is cryptographically worhtless unless you either get a signed MD5sum or you get the sum from an SSL-secured server.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    24. Re:So what the hell do I do now? by dotgain · · Score: 1
      Now, the attacker can rely on the fact that if he can assure the real and bogus ISOs hash to the same value up to some point in the data stream and they are bit-for-bit identical after that point in the data stream, then the final hashes will be the same. Don't you mean "to the same STATE up the some point in the data stream"?

      We might be at the same hash output, but aren't there still many more bits of "state" in the engine that all have to be the same too?

      Honestly, I can't remember, but I've got a hunch this is true.

    25. Re:So what the hell do I do now? by swillden · · Score: 1

      We might be at the same hash output, but aren't there still many more bits of "state" in the engine that all have to be the same too?

      I don't think so, but it doesn't matter, as it turns out, because I was quite wrong about the severity of this attack.

      The attack allows the attacker to generate a collision even with arbitrary, attacker-chosen IVs. That means that as long as there's a 128-byte block at some multiple of 64 bytes offset that the attacker can modify to a randomish value without turning the file into an obviously corrupted document, and as long as the attacker can control both copies, the attacker can set every byte preceding the 128-byte block to any value he wants.

      Even something as transparent as a C file can potentially be buggered this way, as long as no human looks at it. Supposing I have a C file that contains innocuous, useful code and another C file that contains malicious code. I can open a comment block at the end of each file and then add garbage to pad both out to where the sizes are multiples of 64 bytes. Then, I can hash each of them to get the IVs and internal states (if any -- I also don't recall), which I then feed into the collision finder. It will produce a pair of 128-byte blocks which, when appended to the two files, cause the files to hash to the same value, same state, etc. After that, I can append arbitrary text and as long as I append the same thing to both, they'll both have the same hash. So, I just close the comment block. Assuming neither of the 128-byte blocks contain a "*/", and as long as the C compiler doesn't barf at seeing a bunch of 8-bit characters, both files will compile fine, and hash to the same value, but will produce very different results when executed.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    26. Re:So what the hell do I do now? by univgeek · · Score: 1

      TCP uses a very simple one's complement kind of integrity test. If an intermediate router decides to foul up a packet it can and the problem will not be detected.

      Even in Stevens you can find a table which has number of errored packets detected AFTER all the CRC (ethernet) in between.

      In summary, yes an MD5 or other file-level checksum IS required to verify sanity of download.

      --
      All bow to his Noodliness!! His Noodle Appendage has touched me!
    27. Re:So what the hell do I do now? by m50d · · Score: 1

      Verify the pgp signatures instead, and check they were done with a better algorithm

      --
      I am trolling
    28. Re:So what the hell do I do now? by stonecypher · · Score: 1

      More likely is that your downloading application simply dropped the connection early and didn't notify you. The entire point of TCP is reliable sockets, and when used correctly, it is indeed reliable. Putting things in quotation marks doesn't make them particularly damning.

      Surprisingly, there are more than these two things that you've picked out of thin air that can go wrong.

      --
      StoneCypher is Full of BS
  7. The cycle begins anew... by rcbarnes · · Score: 2, Funny

    Great. Now that MD5 is dead, the slow/theoretical attacks on SHA1 can be the focus of collision research. I look forward to changing hash algorythms again from SHA1 in a year. :-/

    --
    "Fight for lost causes. You may discover they weren't."
    1. Re:The cycle begins anew... by Mind+Booster+Noori · · Score: 1

      SHA-1 is not the sollution. Take a look at SHA-224, SHA-256, SHA-384, and SHA-512.

  8. Should I care? by SlashAmpersand · · Score: 5, Funny

    This is all really interesting theoretically, but who has the money to run a 1.6 GHz P4?

    1. Re:Should I care? by Anonymous Coward · · Score: 0

      Yeah that isn't funny at all.

    2. Re:Should I care? by digital-madman · · Score: 0

      By cost to run...I assume you mean cost of air conditioning for that P4??

      -Digital Madman

      --
      A bullet sounds the same in every language. So stick a fucking sock in it...
    3. Re:Should I care? by m50d · · Score: 1

      Oh, I have the money, I just need to move to antarctica first.

      --
      I am trolling
  9. Replacement Hash Functions by Anonymous Coward · · Score: 5, Informative

    Recommended replacements are SHA (preferably SHA-2), WHIRLPOOL and/or RIPEMD.

    http://en.wikipedia.org/wiki/SHA-2
    http://en.wikipedia.org/wiki/WHIRLPOOL
    http://en.wikipedia.org/wiki/RIPEMD-160

    1. Re:Replacement Hash Functions by From+A+Far+Away+Land · · Score: 1

      Might I suggest a pre-emptive upgrade, and isntead of replacing MD5 with MD6, go with MD7 or even MD8 to avoid the need to upgrade after cracks are found for MD6 or 7. Or just use Blowfish 1024bit?

    2. Re:Replacement Hash Functions by Anonymous Coward · · Score: 0

      Because that's exactly what they're expecting you to do!

  10. Why? by mboverload · · Score: 3, Insightful

    Why not just use 2 different algorithms? Yes, it's possible. Or hell, use 3. Can some one tell me why not this isn't a standard practice? Even if one has a weakness, you still have the other to back it up

    I use HMAC-SHA-256 with PasswordMaker.

    https://passwordmaker.org/passwordmaker.html

    1. Re:Why? by einhverfr · · Score: 5, Insightful

      Even if SHA1 and MD5 have attackable collisions the chances are very low that you can find a meaningful collision that affects both algorithms.

      --

      LedgerSMB: Open source Accounting/ERP
    2. Re:Why? by mboverload · · Score: 1

      Although I was thinking of a hash of a hash, that works as well and fits into my question.

    3. Re:Why? by nahdude812 · · Score: 1

      As I understand it:

      Every re-hashing increases the collision rate, so generally for highest security, you use one unbroken hashing algorithm. Yes, using more than one hashing algorithm would reduce the chances that your whole line of defense is invalidated one day, but it increases the chance that a single hole could be found.

    4. Re:Why? by mysqlrocks · · Score: 1

      Why not just use 2 different algorithms?

      If this was in fact more secure than perhaps it would have already been released bundled as one algorithm. Why make people use two sets of algorithms unless the goal is to confuse potential crackers? However, mose uses of MD5 involve the recipient knowing the algorithm used so that wouldn't work.

    5. Re:Why? by Zone-MR · · Score: 1

      A hash of a hash? You mean something like SHA1(MD5('data')) ?

      In that case, if MD5 is broken and 'data' can be constructed to give the same MD5 hash, the SHA1 hash provides no extra security. In fact calculating a hash of a hash reduces the security, as if ANY of the algorithms in the chain are broken, the final result can be manipulated easily.

    6. Re:Why? by ninjaz · · Score: 2, Informative
      Why not just use 2 different algorithms? Yes, it's possible. Or hell, use 3. Can some one tell me why not this isn't a standard practice? Even if one has a weakness, you still have the other to back it up
      I noticed that NetBSD's source-based package management system, pkgsrc, already does this using SHA1 and RMD160 (apparently RIPEMD-160 is the official name for the digest). Here's what it looks like in the archive fetching phase of a package installation:

      => Checksum SHA1 OK for unzip-5.52/unzip552.tar.gz.
      => Checksum RMD160 OK for unzip-5.52/unzip552.tar.gz.
      ===> Extracting for unzip-5.52nb2

      One might also imagine that colliding two different hash types at the same time would be much more difficult than only at a time, anyway.

    7. Re:Why? by jonabbey · · Score: 2, Insightful

      True, but you could use a hash function like SHA1('data').MD5('data'), where the . operator stands for string concatenation.

      The reason that this isn't generally done is that should not provide more security than a proper cryptographic hash algorithm that produces hashes as long as the two different hash algorithms concatenated together.

      If you want additional collision resistance, just generate a longer hash. I believe this is how people are advised to handle SHA-class algorithms right now.

    8. Re:Why? by einhverfr · · Score: 2, Insightful

      If you want additional collision resistance, just generate a longer hash. I believe this is how people are advised to handle SHA-class algorithms right now.

      Hmmm....

      I actually like the concatenation more. The reason is that the longer hash solution only provides protection provided that the hash algorythm is not broken. Once it is broken your entire system fails. In other words, the security doesn't have much of a concept of depth.

      If you use two independant hash algorythms that are different, then the chance that a given change will provide a collision on both hashes is quite remote. This means that one not only has to break both algorythms but has to break both *in the same way.* This means that you could quite possibly use two broken hash algorythms safely.

      --

      LedgerSMB: Open source Accounting/ERP
    9. Re:Why? by Kent+Recal · · Score: 1

      Does anyone have an idea how long it would take to find a collision that matches both md4 and md5 at the same time?
      I guess it must be possible because otherwise people would probably simply use a combination of (cheap) md4 and md5 instead of looking for more powerful hash functions. Just how long, anyone have a clue?

    10. Re:Why? by iLogiK · · Score: 1

      a valid point. i don't know why nobody has thought of this before.
      i'm not sure how this program works (didn't understande a thing in the source code....then again i'm not at all familiar with md5), but seeing how hard it is to find a collision for one algorith, finding a collision that works with 2 alorithms seems almost impossible

    11. Re:Why? by Phleg · · Score: 3, Informative

      Just like mixing medications can have very bad synergistic side effects, so should encryption or hashing technologies be mixed and matched.

      As an example, when DES was first known to be broken, the most intuitive solution would be to double-encrypt the plaintext. However, upon cryptographic analysis, this acutally fails to improve the complexity of an attack (and in some cases may simplify it). Thus, Triple DES.

      Be very wary of trying to combine "broken" algorithms in an attempt to gain security, especially if you have no real grounding in cryptanalysis. Vulnerabilities in each have a nasty tendency to either amplify or at least complement each other in highly unpredictible ways.

      Remember one of the basic tenets of cryptography: it's easy to create an algorithm that you can't break. But just because you can't think of a way to break it doesn't mean there's not a trivial way to do so.

      --
      No comment.
    12. Re:Why? by einhverfr · · Score: 2, Interesting

      As an example, when DES was first known to be broken, the most intuitive solution would be to double-encrypt the plaintext. However, upon cryptographic analysis, this acutally fails to improve the complexity of an attack (and in some cases may simplify it). Thus, Triple DES.

      Your point is well taken. However, in this case, I believe you are mistaken. Indeed 3DES is just an extended form of DES which uses 2 keys and three encryptions. But here you are speaking of access so the analogy is likely to be fatally flawed anyway.

      My initial response was to the idea that a hash of a hash would provide any security and indeed would reduce it as your example illustrates. If you MD5 a SHA1 hash, you have reduced your security because a failure in either hash algorythm renders the entire chain ineffective (weakest link principle).

      Now, with my proposal, one would include independant hashs which would be checked independantly. If either one fails, one assumes that the data has been tampered with. The issue is that it would be difficult to defeat both simultaneously for this specific type of check. Being able to do so on demand while editing the file in a meaningful way might well prove impossible.

      Mixing encryption systems is indeed the bread and butter of the way our systems usually work. For example almost any key exchange algorythm uses a combination of encryption algorythms to ensure that symmetric encryption keys can be exchanged in a secure manner among previously unknown parties. But the key is that you want to make it a requirement to break every link on the chain and not have the system fail when one link is compromised. And yes, double-encrypting with the same key is stupid :-)

      --

      LedgerSMB: Open source Accounting/ERP
    13. Re:Why? by skelly33 · · Score: 1

      I don't use MD5 to encrypt data, but I use it to generate a "fingerprint" for document signing in my web applications. I too believe that appending the fingerprint with an SHA-1 derived signature should extend the useful life of both algorithms for this purpose and intend to implement it in the next couple weeks.

      Using this technique I expect that no hacker will be able to forge a document that will match the new, combined fingerprint of both algorithms where they might have been able to for each algorithm individually.

    14. Re:Why? by jonabbey · · Score: 2, Insightful

      Now, with my proposal, one would include independant hashs which would be checked independantly. If either one fails, one assumes that the data has been tampered with. The issue is that it would be difficult to defeat both simultaneously for this specific type of check. Being able to do so on demand while editing the file in a meaningful way might well prove impossible.

      Yes, but only if you mean 'might well prove impossible' in the same way that it 'might well prove impossible' to break SHA-1 or MD5. There's nothing magical in the mathematics that makes a hash generated partially by SHA-1 and partially by MD5 harder to break than a hash generated by SHA-256 or the like, which generates a longer hash than SHA-1 or MD5 alone.

      Remember, as long as the domain of source files to be hashed includes all possible data files longer than the generated hash, there will be collisions in the function's range. This is true even in the SHA1('data').MD5('data') case. And as long as there are collisions, it's just a question of how difficult it is to find them.

      Yes, prefixing your MD5 hash with an SHA1 hash should make it much harder to find a collision in both algorithms simultaneously, but the very same difficulty could be achieved with a single hash algorithm which generates a longer hash. The magic is in the quality of the algorithm and the length of the output, not in the fact that two algorithms were put to use.

    15. Re:Why? by Lord+Ender · · Score: 2, Insightful

      Combining hashes is effectively just inventing a new hash algorithm with longer digests.

      As a mathematician, this "seems" like it would be intuitively much more difficult for someone to discover an attack against.

      As an engineer, this seems like it is obviously NOT a "good" solution.

      If H1() and H2() are both weak, the hybrid H12() is obviously stronger than either, but also fundamentally flawed.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    16. Re:Why? by einhverfr · · Score: 1

      I don't think you understand what I am saying.

      If you have, say, an SHA-256 or an SHA-512 hash and it turns out that a flaw is found in the algorythm that allows specially formulated additions to a message to generate a specific signature, then you will not be safe.

      The idea is that you need the following criteria to make a collision useful there needs to be a way to take arbitrary content and make it match an arbitrary signature.

      Now, if you take two hashes of the file using two different algorythms, then the chances of finding a collision that satisfies the above criteria that matches both independant algorythms becomes far more complex.

      For example, imagine that I have found a way to take an arbitrary-length SHA hash and using data equal to the length of the signature, create a correction that will make the signature match whatever I want it to be. If you are using SHA to look for tampering then it is rendered useless. If on the other hand I am also using MD5, crypt, or some other hashing algorythm, then the collision of one hash would likely not match on the other. The attack against one would likely fail on the other, and one would have a much more complex problem of creating an a colision for arbitrary content has just become *far* more difficult.

      This is not that difficult from the way 3DES works. 3DES corrected for the flaws in DES by using two keys (56-bit) and a more complex encryption procedure (basically encrypt thrice with two keys). Doubly encrypting doesn't help if you use the same key. But with 3DES you have essentially gone from a 56-bit key space to a 112-bit space. Not that 112 bits is anything to be proud of but this is just an example.

      The point is not that you are preventing collisions. The point is that you are protecting yourself against the possible future flaws found in an algorythm.

      --

      LedgerSMB: Open source Accounting/ERP
    17. Re:Why? by einhverfr · · Score: 1

      I don't use MD5 to encrypt data, but I use it to generate a "fingerprint" for document signing in my web applications.

      Just splitting hairs, but you know that MD5 is a one-way cryptographic hash, right? It encrypts your data so that nobody including you can read it :-)

      Now, so far I have seen *no* evidence that one could forge an MD5 hash such that it would not be immediately visible to a trained eye and yet still be useful. In other words, it is like forging a regular signature except that the forgery would be apparent even without having an original signature to compare it against (What are these little squares at the end of this document?)....

      --

      LedgerSMB: Open source Accounting/ERP
    18. Re:Why? by ArghBlarg · · Score: 1

      I believe a stronger method would be:

      Sig = SHA1(data | MD5(data) )

      I've seen many people suggest that simply concatenating the MD5 and SHA1 hashes would increase security, but that simply (roughly) doubles the effort required, as each hash can be broken in a separate pass; but the above requires that a collision must be found for *both* the SHA1 and MD5 algorithms for a given text and (dopplegänger + MD5 hash), simultaneously.

      *Nesting* the two hash algorithms would make things much harder.

      --
      ERROR 144 - REBOOT ?
    19. Re:Why? by m50d · · Score: 1

      No, they are very high. Almost a certainty in fact.

      --
      I am trolling
    20. Re:Why? by einhverfr · · Score: 1

      No, they are very high. Almost a certainty in fact.

      Well, in my mind usefullness goes down with the amount of padding one has to add.

      --

      LedgerSMB: Open source Accounting/ERP
    21. Re:Why? by skelly33 · · Score: 1

      The blindingly obvious approach to recognizing document forgeries you mentioned above requires an intelligent human to to recognize the distinction. But when you deal with automated systems where one machine generates a signed document and a second machine receives a document which it attempts to validate against the signature stored on the other machine... well this is where we get in trouble if the machines can't tell the difference.

      The example linked elsewhere in this topic pointing out two VERY different postscript files that have the same MD5sum is a testament to this problem, and is exactly what concerns me.

    22. Re:Why? by alderX · · Score: 1

      But I think you can't compare (i) the stacking of encryption algorithms into each other [1] / the iterative usage of encryption algorithms [2] with (ii) the approach of using multiple hash functions independently on the same input [3].

      The first two are based on the feeding the output of an algorithm back as input and this might bring side-effects.

      But given that [3] uses the algorithms indepdently from each other, I think you have a pretty good chance to lower the risk of collisions (especially if the hashing algorithms you are using are based on different families). Actually even adding algorithms which are weak on their own might nevertheless be beneficial in this combination scheme.

      JM2C

      [1] Standing for:
              ciphertext1 = algorithm2( algorithm1( plaintext1 ) )
      [2] Standing for:
              ciphertext1 = algorithm1( algorithm1( plaintext1 ) )
      [3] Standing for:
              hashkey1 = algorithm1( input1 )
              hashkey2 = algorithm2( input1 ) ...

    23. Re:Why? by jonabbey · · Score: 1

      I don't think you understand what I am saying.

      If you have, say, an SHA-256 or an SHA-512 hash and it turns out that a flaw is found in the algorythm that allows specially formulated additions to a message to generate a specific signature, then you will not be safe.

      Yes, I understand that.

      For example, imagine that I have found a way to take an arbitrary-length SHA hash and using data equal to the length of the signature, create a correction that will make the signature match whatever I want it to be. If you are using SHA to look for tampering then it is rendered useless. If on the other hand I am also using MD5, crypt, or some other hashing algorythm, then the collision of one hash would likely not match on the other. The attack against one would likely fail on the other, and one would have a much more complex problem of creating an a colision for arbitrary content has just become *far* more difficult.

      I understand that, of course. That's why I proposed concatenating the hashes, above.

      However, if you find a way to take an arbitrary-length SHA hash and find collisions at will, you have demonstrated that the SHA algorithm is deeply, fatally flawed.

      It so happens that MD5 and SHA are in the same class of algorithms, and the weaknesses that have been revealed in MD5 affect SHA as well. It's just that SHA hashes can be long enough that the weakness does not make finding collisions feasible. If your opponent knows that you're using hash concatenation, he has two simpler problems to solve.. a shorter SHA and a short MD5. The way these algorithms work, the effort involved may in fact be two considerably simpler challenges added together, rather than multiplied.

      Anyway, all I mean to say is that cryptographic algorithms are subtle, and while incorporating MD5 and SHA might get you a bit more strength, that doesn't make up for known algorithmic weaknesses in the hash algorithms.

    24. Re:Why? by einhverfr · · Score: 1

      The blindingly obvious approach to recognizing document forgeries you mentioned above requires an intelligent human to to recognize the distinction. But when you deal with automated systems where one machine generates a signed document and a second machine receives a document which it attempts to validate against the signature stored on the other machine... well this is where we get in trouble if the machines can't tell the difference.


      This is the problem though. There is a difference between collisions and meaningful collisions. Meaningful collisions are more difficult because you have to take into account the certain things such as context and intent.

      For example, if I want to alter a message you are sending from one machine to another I might be able to switch it with one with a different signature. In most cases, the document will be entirely ignored. Indeed the best one can do here is a sort of denial of service where you keep switching documents out so that the real documents never arrive. Altering the message to cause arbitrary content to be sent is far more difficult. I am not saying it is impossible, for I am sure that it is indeed possible. However, I think that people underestimate the difficulty in pulling something off at this time especially with certain types of content. For Postscript, the risk might be medium to high. For something like an X.509 cert, the thances are probably extremely low (and easy to detect).

      Again, word documents might be more risky than plain text email.

      --

      LedgerSMB: Open source Accounting/ERP
  11. Collisions on demand by Anonymous Coward · · Score: 0

    But will my insurance company cover the damage?

  12. Right now it's time to... by Tackhead · · Score: 2, Interesting
    > This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want."

    In other words, right now it's time to...
    ...LICK OUT THE KAMS, NOTHERGUCKERS!

    1. Re:Right now it's time to... by Anonymous Coward · · Score: 0

      (Well, at least there's one person who remembers the MC5. Or you're with the KLF helping Hagbard battle the Illuminati... Conspiracies, conspiracies!)

  13. bittorrent? by rayde · · Score: 4, Insightful

    doesn't bittorrent use md5 to verify the sections of files it has downloaded? will this facilitate poison seeds? or does BT use something more complex than md5?

    1. Re:bittorrent? by Wesley+Felter · · Score: 4, Informative

      BitTorrent uses SHA-1.

    2. Re:bittorrent? by Anonymous Coward · · Score: 0

      http://www.bittorrent.com/FAQ.html#corruptdl

      How do I know the download isn't corrupted?

      BitTorrent does cryptographic hashing (SHA1) of all data. When you see "Download succeeded!" you can be sure that BitTorrent has already verified the integrity of the data. The integrity and authenticity of a BitTorrent download is as good as the original request to the tracker. Checking the MD5/CRC32/other hash of a file downloaded via BitTorrent is redundant.

    3. Re:bittorrent? by JPyun · · Score: 1

      Hash functions in file sharing are not used for security. They are used to ensure the file didn't become corrupted in transfer. SHA-1 is perfectly fine for this.

    4. Re:bittorrent? by Mind+Booster+Noori · · Score: 1

      Parent talked about poisoned seeds, and SHA-1 isn't fine to prevent them...

    5. Re:bittorrent? by suitepotato · · Score: 1

      Although, eMule/etc. uses a custom routine based on MD4 so how long till that is poisoned regularly?

      --
      If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)
    6. Re:bittorrent? by Mnemia · · Score: 1

      Practically, it is, at the present. It just might not be for much longer.

    7. Re:bittorrent? by Breakfast+Pants · · Score: 2, Informative

      He was talking about poisoned seeds. And theoretically SHA-1 isn't fine for this and is vulnerable to the same flaws as MD5. It isn't practical now (and certainly not as practical as poisoning kazaa shares... where every other byte in a file is skipped(!) for performance when generating a hash).

      --

      --

      WHO ATE MY BREAKFAST PANTS?
  14. WTF this source is useless by RootsLINUX · · Score: 3, Interesting

    I guess someone thinks he's a little too cool to comment his code properly. Yeah, like "/* B2 */" tells me anything useful you insensitive clog!

    --
    Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
    1. Re:WTF this source is useless by Anonymous Coward · · Score: 0
      Here's an example of what he means:
      /* B1 */
      Q0[ 4] = (random() | 0xba040010) & ~(0x443b19ee | 0x00000601);
      Q0[ 4] |= (Q0[ 3] & 0x00000601);
      Q1[ 4] = Q0[ 4] - 0x7dffffe2;
       
      X0[19] = RR(Q0[ 4] - Q0[ 3], 22) - F(Q0[ 3], Q0[ 2], Q0[ 1]) - B0 - 0xc1bdceee;
      X1[19] = RR(Q1[ 4] - Q1[ 3], 22) - F(Q1[ 3], Q1[ 2], Q1[ 1]) - B1 - 0xc1bdceee;
      if(X0[19] != X1[19])
        continue;
      Apparently, 1 and 2-character identifier names are all the rage in China. And really, there's no need to bother with explaining the use of 0xba040010 or 0x443b19ee. It should really be self-evident.

      I mean seriously, what the hell is the significance of X0[19] and X1[19], what are they used for and what does it mean if they are or are not equal, and why continue the loop until they are equal? What does the call to macro 'F' do? How about the call to 'RR'?

      Oh wait, there it is:
      #define F(x, y, z) (z ^ (x & (y ^ z)))
      #define G(x, y, z) F(z, x, y)
      #define H(x, y, z) (x ^ y ^ z)
      #define I(x, y, z) (y ^ (x | ~z))
       
      #define RL(x, y) (((x) << (y)) | ((x) >> (32 - (y))))
      #define RR(x, y) (((x) >> (y)) | ((x) << (32 - (y))))
      It's so... obvious.. now..

      God, I agree, this stuff is completely unreadable. It is absolutely useless for any learning value at all, other than as a study on how not to write code that is going to be released.
    2. Re:WTF this source is useless by plalonde2 · · Score: 2, Insightful
      It's thoroughly documented: mind you you need to start with the reference given in the first comment. Variable names and comments relate to that paper.

      Not all code documentation lives in the source.

    3. Re:WTF this source is useless by Anonymous Coward · · Score: 1, Insightful

      Not all code documentation lives in the source.

      True, but if you aren't going to have useful documentation in the source why bother at all?

    4. Re:WTF this source is useless by Theatetus · · Score: 3, Informative

      If you RTFRFC you might notice that those are the variable and section names used in the algorithm specificaiton.

      --
      All's true that is mistrusted
    5. Re:WTF this source is useless by plalonde2 · · Score: 1

      I claim it is useful: it points you to the paper, and then labels the sections in ways that are consistent with the presentation in the paper. You can't expect to read all arbitrary code without some outside explanation of some of the tricky (or less naive) bits. I much prefer a clean external reference to a bunch of goop inline that in many cases makes no sense.

    6. Re:WTF this source is useless by ichigo+2.0 · · Score: 1

      ... insensitive clog!

    7. Re:WTF this source is useless by Theatetus · · Score: 1
      Apparently, 1 and 2-character identifier names are all the rage in China.

      Actually, it's more that they're all the rage in specifications for hashes. RTFRFC.

      What does the call to macro 'F' do?

      By convention, the permutations in a hash or cipher are given single-letter names beginning with F (for 'Function'). If you ever decide to implement a crypto suite you get used to names like F, G, H (permutations), or X0[19] (cell 19 of section 0 of the primary execution context).

      How about the call to 'RR'?

      It performes a Rotate Right on the word it's called on. (((x) >> (y)) | ((x) << (32 - (y)))) is a right circular shift of the word X by Y places. RL is a Rotate Left, or left circular shift.

      As a final caveat, that implementation of RR and RL is faulty for general values of X and Y but works within the constraints of the program.

      --
      All's true that is mistrusted
  15. The end of edonkey by Cone83 · · Score: 1

    I think this will be the end of edonkey. AFAIK edonkey still uses md4 hashes. If it really is possible to find an md4 collision in a few seconds, I'm sure the MI will soon create a script that randomly downloads files, and reshares a collision.

    1. Re:The end of edonkey by n0dalus · · Score: 4, Informative

      No. This only helps you find collisions in two randomly generated strings.
      It is still very difficult to produce a colliding file given a pre-existing file on the network.
      It should also be noted that edonkey splits a file into 9500KB chunks, and then into smaller chunks again, and hashes each one. It would be far more difficult to produce a chunk that causes collisions on all three levels.
      Anyway, I expect an eMule extension will come out soon to allow for sharing of SHA1 hashes between clients (if it doesn't already exist).

  16. Torrent Poisoning by dduardo · · Score: 0, Redundant

    Does this mean the RIAA/MPAA can poison torrents by generating files with the same hash?

    1. Re:Torrent Poisoning by Anonymous Coward · · Score: 0

      Well, no, at least not if someone doesn't give them the idea!

    2. Re:Torrent Poisoning by Mind+Booster+Noori · · Score: 1

      Maybe it's time for you to start using Anonymous p2p networks like GNUnet...

    3. Re:Torrent Poisoning by Anonymous Coward · · Score: 0
      Does this mean the RIAA/MPAA can poison torrents by generating files with the same hash?

      Could I buy some of that hash from you, Professor Jennings?
    4. Re:Torrent Poisoning by Anonymous Coward · · Score: 0

      Not at all.

      Bittorrent splits files up into pieces of about 64KB, then makes a hash of _each_ of those, as well as the entire file. They'd have to find collisions for at least one of the pieces and the whole file at the same time, and even then it's useless since things like MP3 have error recovery from corrupt data.

    5. Re:Torrent Poisoning by NFN_NLN · · Score: 1
      Bittorrent splits files up into pieces of about 64KB, then makes a hash of _each_ of those, as well as the entire file.

      Yeah so... that only ensures no one RUNS / USES a corrupt file at the END of the download. You can still poison a torrent.

      Example:

      User X starts downloading a torrent, then adds one ~64KB of corrupt data (as per example above) that still passes the MD5. Remember user X hasn't finished the total download so they don't MD5 the entire thing yet. However, other people inclusing user Y are still downloading these BAD pieces.

      After about an hour later on the XViD movie user Y finishes and does an MD5 on the ENTIRE file. Yup, it's poisoned. Can't trick me! I'll just download it again... oh no wait?! After N hours and a full MD5 hash I keep getting an invalid file... it's as if these BAD pieces were spread everywhere like POISON!!! Hence the term poisoning.

      Yeah, I know the complete file is shiat but it's too late and everyone keeps spreading the shiat!

    6. Re:Torrent Poisoning by Terrasque · · Score: 1

      Yes, yes yes...
      If you've actually read the article(s) (yeah, shocking idea, I know), you would have known that the attacker need to create BOTH sets of data that share the hash.

      So, the creator of the torrent could probably make some corrupt blocks to it... But you might see a tiny flaw in the logic there ;)

      --
      It's The Golden Rule: "He who has the gold makes the rules."
  17. How long till we see... by Anonymous Coward · · Score: 0

    How long till we see it integrated into cracking tools? Many Linux distros use MD5 for storing the passwords in the /etc/shadow file. How long will it take before they move on to SHA1 or Tiger or etc?

    1. Re:How long till we see... by Anonymous Coward · · Score: 0

      As far as I can see, it is pretty damn important for anything using MD5 for security purposes to update. Maybe it isn't so important for your CRAM-MD5 email authentication, but anything more important like on your corporate servers, etc, will need something better than MD5, and fast.

      Next year, CRAM-WHIRLPOOL authentication, hehe.

    2. Re:How long till we see... by Anonymous Coward · · Score: 0

      If they can get at your password file on your corporate servers, you have bigger problems than what hashing function is being used.

  18. Source?? by Anonymous Coward · · Score: 0

    I wouldn't exactly call this source code... this gives almost no explination as to why it works, I could have gotten about as much knowledge from decompiling a binary of this.

    1. Re:Source?? by yup2000 · · Score: 1

      yeah, it is a horrible example of source code. No comments whatsoever. It looks like the programmer deliberately took the comments out. Not very professional.

  19. am i missing something? by Intangion · · Score: 1

    this just means they can find more than one input that comes up with the same output it doesnt mean they can come up with an input to match a specified output (which can usually be protected anyway) so it really doesnt change much right? or am i missing something..

    1. Re:am i missing something? by Anonymous Coward · · Score: 0

      google the term "rainbow table"

  20. This is misleading - MD5 is still useful by hoggoth · · Score: 5, Insightful

    This new algorithm does not ruin the usefulness of MD5 hashes. The algorithm can generate two documents that have the same MD5 hash, an MD5 collision. But it can NOT generate an MD5 collision starting with an existing document. In practical terms, this means a file that has been signed with an MD5 hash is STILL secure. Nobody can replace the file with a different file that will have the same MD5 hash. However someone can prepare in advance two documents with the same MD5 hash and trick someone into believing one document is really the other. So if you trust the original source (a Linux distro for example) you can be confident you are downloading the original document.

    --
    - For the complete works of Shakespeare: cat /dev/random (may take some time)
    1. Re:This is misleading - MD5 is still useful by didit · · Score: 0

      So, this means that encrypted password are still safe too, right? (not to mention that everybody should be using shadow password by now and that root privileges are required to read /etc/shadow)

    2. Re:This is misleading - MD5 is still useful by MoogMan · · Score: 2, Insightful

      Nobody can replace the file with a different file that will have the same MD5 hash.

      Yet.

    3. Re:This is misleading - MD5 is still useful by miller60 · · Score: 1

      This is also important for SSL certificates, many of which use MD5. Existing certificates relying on MD5 are still secure, and new ones can be issued using different hashes. But this is one more motivator for NIST and the security community to decide on a way forward and start making it happen.

    4. Re:This is misleading - MD5 is still useful by swillden · · Score: 0

      This is also important for SSL certificates, many of which use MD5. Existing certificates relying on MD5 are still secure, and new ones can be issued using different hashes.

      Just to clarify: When you say "still secure", you mean "will be secure until the attack is hugely improved". x.509 certificates consist of a digital signature of a hash computed across a chunk of structured data and a public key. The collision generation method finds pairs of colliding plaintexts, but both plaintexts are effectively random. The odds of finding a collision whose plaintexts are both validly-structured x.509 descriptors and usable public keys are insignificant.

      To attack an x.509 certificate by attacking the hash that is signed, what you need to do is to find a plaintext that:

      1. Is a valid certificate structure, containing the appropriate paramaters and your identifying information (or something useful anyway) rather than the actual target's.
      2. Has a value in the public key portion that you can factor. Odds are good that you can factor a large random number.
      3. Hashes to the same value as the target's certificate and public key.

      Finding such a plaintext would be very, very hard even if you had a pre-image attack on MD5, which would allow you to generate plaintexts that hash to the known value. But this attack doesn't do that, so you effectively have to generate a long series of random hash values (each one taking at least a few minutes) and hope that you randomly run into one that matches some target certificate... then you have to hope that the plaintext in question is a value that looks like a cert.

      Cryptographers being the conservative sorts that they are, MD-5 is definitely deprecated due to this attack, but even a much better pre-image attack would be unlikely to call MD5-using certificates into serious question.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    5. Re:This is misleading - MD5 is still useful by Anonymous Coward · · Score: 0

      For now, but once wounded in this manner sometimes algorithms fall quickly to other attacks. Some don't. The fact that MD5 and SHA-1 have a weakness simply increases the likelyhood that they'll be broken completely by some other attack.

  21. Collisions do not mean the end of MD5 by afaik_ianal · · Score: 5, Insightful

    This more than anything should be the final stake in the heart of MD5, now that anyone can generate collisions whenever they want.

    No, no, no. This does not allow an attacker to generate any collision they like. They cannot find data that collides with a piece of data I provide them with. All they can do is provide me with 2 pieces of data that happen to collide.

    This means that an attacker can theoretically provide 2 different documents to people with the same hash, but they cannot easily produce a document that has the same hash as a document I have written.

    (Disclaimer: I haven't actually been able to RTFA (it's /.'d), but unless they have made an enormous breakthrough since this was last reported, this attack has very little implications for those of us who use MD5).

    1. Re:Collisions do not mean the end of MD5 by Anonymous Coward · · Score: 0

      Why don't mods ever check the posting date/time before marking a post as "Redundant"?

      Phht.

    2. Re:Collisions do not mean the end of MD5 by Anonymous Coward · · Score: 0

      If you say something poorly and someone else says it well, the better comment is the one readers ought to pay attention to, regardless of which one is older. And any comment admitting its author didn't RTFA should be expunged on principle--they can't have anything useful to add without even knowing what it says.

  22. "broken" does not mean broken by Edgewize · · Score: 5, Informative

    This program is an efficient way to generate two source blocks with the same resulting MD5. This program does NOT allow you to match an arbitrary MD5 hash. That may come some day, but unless I've missed a very important paper somewhere, it has not happened yet.

    This does not totally invalidate MD5 for verification. This attack still does not let you poison a torrent feed, etc, unless you are the author of the original source data and you engineered the data specifically to be vulnerable to this attack.

    1. Re:"broken" does not mean broken by petermgreen · · Score: 1

      btw torrent block hashes are sha1 not md5

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    2. Re:"broken" does not mean broken by Flunitrazepam · · Score: 1

      Does that mean we're not shaken to our roots anymore?

      Good, I found it rather uncomfortable

      --
      1) Your analysis is based on bad assumptions so your result is way off. 2) You're a sick bastard for fucking a horse.
  23. Will people stop saying this? by ivan256 · · Score: 2, Insightful

    This more than anything should be the final stake in the heart of MD5

    No, no it won't be. It won't be, because MD5 is useful for many things where the existance of an "easy" (in quotes because easy is relative) method of generating colisions is irrelavant.

    It won't even kill off the use of MD5 checksums as a signature for verifying authenticity, because if your data is smaller than the checksum there may not be a colision at all, and an exploit wouldn't matter.

    This is an important discovery, but it doesn't make MD5 useless any more than CRC32 is useless.

    1. Re:Will people stop saying this? by Phleg · · Score: 1

      It won't even kill off the use of MD5 checksums as a signature for verifying authenticity, because if your data is smaller than the checksum there may not be a colision at all, and an exploit wouldn't matter.
      Um, just because the file is smaller than the hash doesn't mean that there isn't a collision. The collision may be larger than the original file, but it's still a collision.

      Besides, what is the point on doing an MD5 for a file that is smaller than the hash itself? At that point, you can just do a diff. The entire reason for an MD5 for file verification is because it's a relatively small, easy to pass number so you can compare two small things, rather than two large things. Your example is completely nonsensical.

      --
      No comment.
    2. Re:Will people stop saying this? by ivan256 · · Score: 1

      The collision may be larger than the original file, but it's still a collision.

      Hardly matters if you're also verifying data length, or you have limited input length.

      Besides, what is the point on doing an MD5 for a file that is smaller than the hash itself?

      Millions of password files can answer this question for you.

    3. Re:Will people stop saying this? by Phleg · · Score: 1

      Millions of password files can answer this question for you.
      Irrelevant from the context of what you were saying. If I find a 491-bit value that collides with your eight-character password, your password is still cracked.
      --
      No comment.
    4. Re:Will people stop saying this? by m50d · · Score: 1

      In such situations, CRC32 is far simpler and just as effective. MD5 was designed as a cryptographic hash and isn't really good for anything else. Now we see it's no good as a cryptographic hash either. It's basically no good.

      --
      I am trolling
    5. Re:Will people stop saying this? by ivan256 · · Score: 1

      Really? Find me a version of PAM that allows 61 character input for passwords by default.

      I just checked Debian, and Suse defaults, and the maximums are 8 and 16 respectively. Not to mention that this exploit doesn't allow you to find a collision with an existing hash, but only to create two blocks of data with the same hash, *and* you don't have access to the hashed password.

      So, you probably can't find a 491-bit value that collides with my password in any reasonable amount of time, and even if you did it wouldn't do you any good because you couldn't enter it at the prompt.

    6. Re:Will people stop saying this? by Phleg · · Score: 1

      I just tested this on my Debian machine, and I was able to create a 92 character password on my first attempt, without having modified any PAM files. I'm sure higher than this is possible, but I didn't attempt it.

      Your point about not being able to find a value that collides with your password is also irrelevant from the original point you tried to make: "if your data is smaller than the checksum there may not be a colision at all", which is clearly bogus. As well, it's also likely to not hold true for much longer anyways. Since MD5 no longer has strong collision resistance, it's likely that others will soon find a way to generate collisions for existing hashes, thus taking away its weak collision resistance.

      --
      No comment.
  24. Got Salt? by Anonymous Coward · · Score: 2, Insightful

    OK, so clearly a scripted attack against MD5 is bad.

    But aren't most people using MD5 using salted (as opposed to unsalted) hashes? (for those unclear on the difference, a "salted" has basically uses a local seed as part of it's MD5 hash, in addition to the value to be encrypted)

    Doesn't seem likely that salted hashes can be easily broken by this technique, although clearly it's a concern that, should the salt value become known, all your passwords, etc, become breakable...

    1. Re:Got Salt? by Anonymous Coward · · Score: 0
      Salt in one way hashing makes no sense. Salting is possible in encryption because the data being encrypted is irrelevant if you know the key/method to decrypt. If you hash password+salt in order to verify that someone has logged in properly later on you need to hash password+salt again and the salt has to be identical each time. If you are using the same salt each time then you don't get any benefit from "salting" you are just padding the password with static data then hashing.

      Either way this isn't an issue, being able to generate two colliding byte streams is nifty and all but not practical for making MD5 dangerous to use for passwords/file authenticity etc..

    2. Re:Got Salt? by Anonymous Coward · · Score: 0

      isn't salt to protect against look up tables?

      if someone made a table of all 10 digit ascii combinations and thier hash, then they could do a reverse look up given the hash of almost any password. this would take a while to build, and use lots of space, but lookup could be very quick.

      if all the passwords on my system are stored with a salt 'bvaifva', then they would need a seperate lookup table for all the 10 digit ascii combinations.

      and on your system you use another salt, and they need a further lookup table.

    3. Re:Got Salt? by Anonymous Coward · · Score: 0

      Yes, salts do exactly that.

      Salting data before it is hashed, contrary to the grandparent post, serves a very important purpose. If (just to pick an idea at random) someone maintained a rainbow table lookup service, you could feed them a hash and they'd feed you a text password that matches that hash. You could then use that password to log into any system stupid enough to follow the GP's advice and not salt their password before hashing them.

      Some systems use the same salt for all of their passwords. This is better than no salt, but a custom-made rainbow table designed for that salt could still reverse-lookup every password on the system, once that salt is known. Hard computation-intensive work, but if the bad guys maintain a zombie army of a few thousand PCs, and your passwords protect valuable enough information, it's not out of the realm of possibility. It could take months or years, but once the rainbow table is generated your system is wide open (let's, for the moment, ignore password changing policies).

      So ideally you'd have a different (and totally random, minimum 64 bit) salt for every hash. The salt is stored right along with the hash, and protected just as jealously. Now if someone breaks into your system and gets your hashes and salts, they can do the same custom-distributed-rainbow-table-reverse-lookup attack as above, but the result of a successful customized rainbow table creation would be the ability to reverse-lookup one password. Then they'd have to do it again with the next salt, and so on.

      I wonder if anyone will read this.

  25. Not -that- bad by Parity · · Score: 2, Interesting

    The only attacks that these md5 collisions allow are denial-of-service/destruction-of-data attacks, they don't generally allow the compromise of protected data or access to systems or suchlike. The collision blocks that are generated are effectively random data. It has yet to be shown how to -craft- a collision block.

    If we could craft a collision block that contained a specified string at a specified position, that would be another issue altogether.

    The ability to find collision blocks easily does suggest that crafted collision blocks might be possible, but for now, you have as good a chance of getting a viable exploit out of /dev/random as out of a collision block.

    This doesn't mean we shouldn't look to other options for the newest releases of high-security software, but it doesn't mean that the md5 algorithm should be purged from our systems altogether either. It's still extremely valuable at detecting accidental corruption, and useful-with-caveats at detecting malicious corruption (45 minutes to discover a block of data that matches the sum is not really useful in either speed or resulting data for any kind of man in the middle attack, for example, so using md5 to validate network packets is safer than using it to validate disk files).

    Of course, the black hats may know more than we do about md5 weaknesses, but 'may know' is just as true of any other algorithm.

    --
    --Parity
    'Card carrying' member of the EFF.
    1. Re:Not -that- bad by Kjella · · Score: 1

      If we could craft a collision block that contained a specified string at a specified position, that would be another issue altogether.

      You can. You just can't make it based on an existing document, but you can create two files containing any specified content with the same hash. That means you have to make someone trust the first file and then abuse it using the second file. If you can maange to be subtle about it, well... For example, if you can create a source patch to an OSS project which makes one "sane" fix, and a few odd whitespace changes, adding spaces at the end of line or whatever. If they apply the patch as-is, the resulting binary could be one half of such a twin set. You could now distribute your twin as a legitimate version complete with MD5 and digital signature, while actually being a trojan. Convoluted but it could be done.

      Kjella

      --
      Live today, because you never know what tomorrow brings
  26. MD5 and verification by n0dalus · · Score: 4, Insightful

    Just because collisions can be generated doesn't mean that MD5 is dead.
    It might only take minutes to calculate two random strings with the same hash, but it would still take a very long time to calculate a second string that collides with a pre-existing string. So even though it is now cryptographically weak, it can still be used effectively to check the integrity of files.

    1. Re:MD5 and verification by flynt · · Score: 1

      And wouldn't that second string have to be 'meaningful' in the same context as the original, like code to be run?

    2. Re:MD5 and verification by MikeBabcock · · Score: 1

      This is lost on many people -- if you want to collision search a hash of Bob's message to Alice, you'll need to replace Bob's message with something equally meaningful.

      --
      - Michael T. Babcock (Yes, I blog)
  27. WHIRLPOOL by Anonymous Coward · · Score: 0

    Why not WHIRLPOOL (http://en.wikipedia.org/wiki/WHIRLPOOL), it is made by one of the makers of AES, Vincent Rijmen.

  28. MD5 is not an encryption algo by Junta · · Score: 1, Insightful

    It is a hash algo. It's used not to protect the content of anything, just to provide a method to validate content integrity, to show nothing accidental or intentional happened to change it.

    --
    XML is like violence. If it doesn't solve the problem, use more.
    1. Re:MD5 is not an encryption algo by san · · Score: 1

      It's also the default algorithm to hash passwords (i.e. if you type in your password, it gets hashed into an MD5 sum which is then compared to what the MD5-ed password should be, thereby avoiding plaintext password storage).

      Lucklily, most sane systems use salt so this algorithm won't work out of the box.

    2. Re:MD5 is not an encryption algo by swillden · · Score: 4, Informative

      t's also the default algorithm to hash passwords (i.e. if you type in your password, it gets hashed into an MD5 sum which is then compared to what the MD5-ed password should be, thereby avoiding plaintext password storage).

      Doesn't matter. This attack has no significant effect on password hashing, with or without salt.

      This attack allows you to find a pair of plaintexts that hash to the same value; you don't get to pick either the plaintexts or the hash value. It does not help you find a plaintext that hashes to a given value. To use this to attack an unsalted password hashing system you would need to first generate a collision, then convince the target of your attack to set one of those plaintexts to be his/her password, then you could log in using the other plaintext. But if you can convince the target to use a particular password, why not just use that to log in?

      This is not an insignificant cryptologic result, and people should move away from MD-5 as fast as practically possible (actually, people have been moving away from it for years due to some results against MD-4, which MD-5 is very similar to) but it doesn't really have any practical implications right now.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    3. Re:MD5 is not an encryption algo by san · · Score: 1

      Duh.. you're completely right. I should have RTFA :-)

    4. Re:MD5 is not an encryption algo by X · · Score: 1

      Actually, the completely appropriate description is that it's a *cryptopgrahic* hash algorithm. The key difference been that a cryptographic hash algorithm is difficult to invert (the "intentional" part of your statement) whereas other hash algorithms need not be (indeed many of the best algorithms for small hash tables are pretty easy to invert). They are critical components in most mechanisms for digital signing. Basically, with MD5 broken, you no longer have a secure way of verifying that the RPM's and .deb's installed on your system are the actual binaries and were distributed by the author of the packages. If you are using something like PGP with MD5 as the message digest, you no longer can trust that the messages you receive are actually from the sender baed on the cryptosystem .

      So yeah, this has huge implications in the world of digital security and cryptography.

      --
      sigs are a waste of space
    5. Re:MD5 is not an encryption algo by Glog · · Score: 1

      It is a hash algo. It's used not to protect the content of anything, just to provide a method to validate content integrity, to show nothing accidental or intentional happened to change it.

      You don't seem to understand... Having the MD5 hash of a piece of software and the ability to generate a collision for that hash will logically lead to more code which can take any dll/exe, inject malicous code along with whatever other random garbage characters are needed to produce the collision. Yes, the sizes will be different but even that can be fudged given the sophistication of today's hackers.

    6. Re:MD5 is not an encryption algo by lonb · · Score: 1

      Yes, that's true, however, it doesn't negate the reduction in MD5s value. For one: any system that uses one-way hashed password storage or key generation mechanism, as many sites use, is now some percentage easier to violate because multiple strings may resolve to same hash.

      --
      "Ain't I a stinka..." - Bugs
    7. Re:MD5 is not an encryption algo by swillden · · Score: 2, Insightful

      For one: any system that uses one-way hashed password storage or key generation mechanism, as many sites use, is now some percentage easier to violate because multiple strings may resolve to same hash.

      It's the case with all one-way hashes that multiple strings hash to the same value. Since an n-bit hash function can only produce 2^n distinct hash values, if there are more than 2^n possible input plaintexts, than multiple plaintexts hash to single values. In fact, since MD5 and similar can hash inputs of any length, on average there are an infinite number of plaintexts that hash to any given n-bit value.

      The existence of this attack doesn't change the existence or distribution of collsions... we always knew they existed. It just means that we can find some of them. Ideally, we'd like it to be practically impossible to find *any*, even though we know they exist.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    8. Re:MD5 is not an encryption algo by dotgain · · Score: 1
      That's not correct

      It was always the case that two values could collide to the same hash. You'd be a fool to think otherwise, there's only so many possibilities of hash output, but infinite possibilities of input. Therefore, for each hash value, there are an infinite number of plaintexts that will collide.

      Even at the design stage it was known there'd be collisions. They just thought they'd be near impossible to determine. Now it's not even difficult.

    9. Re:MD5 is not an encryption algo by dotgain · · Score: 2, Informative
      You don't seem to understand... Having the MD5 hash of a piece of software and the ability to generate a collision for that hash will logically lead to more code which can take any dll/exe, inject malicous code along with whatever other random garbage characters are needed to produce the collision. Yes, the sizes will be different but even that can be fudged given the sophistication of today's hackers. You don't seem to read the article. We don't yet have the ability to create a plaintext that will map to a predetermined hash value.

      While we might discover that "Donald Duck" has the same hash as "a", we don't get to choose either. We can't say "gimme a plaintext that hashes to this: 0x4faed...", all we can do so far is say "find two plaintexts that will give us the same hash"

      Not that this doesn't mean this is a very serious matter. My take on the situation is: "Oh. Shit."

    10. Re:MD5 is not an encryption algo by Haeleth · · Score: 4, Informative

      You don't seem to understand... Having the MD5 hash of a piece of software and the ability to generate a collision for that hash will--

      Stop right there, because it's quite clear that the one who doesn't understand is you. Nobody has the ability to generate a collision for a given MD5 hash. All we have is the ability to generate two bits of random junk that share the same hash. This makes some attacks possible, but it does not make it possible for you to distribute a malicious version of someone else's software that has the same MD5 hash as their version.

    11. Re:MD5 is not an encryption algo by Anonymous Coward · · Score: 0
      This makes some attacks possible, but it does not make it possible for you to distribute a malicious version of someone else's software that has the same MD5 hash as their version.

      Yes, but it does allow you to break software that you release. Consider:
      - A software company anonymously releases a warez torrent of their software.
      - The warez d00dz finish their download of the correct version and tell everybody that this torrent is 100% ok.
      - The company then starts seeding a second version that works incorectly but but that's not completely broken (ex. triggers random crash, installs trojan, virus, etc..)
      - New downloaders will have problems with the warez, call the d00dz that ok'd the software liars and be disillusioned with the sceen.

      This software company has thereby successfuly attacked the warez comunity. Is this not something we should protect our-selves from?

    12. Re:MD5 is not an encryption algo by Anonymous Coward · · Score: 0

      Hmmm...I don't know. It was quite impressive when they first revealed that they had broken this. They published two Postscript files (both of which where quite valid) and had some rather different wording in the message. It was something along the lines of being the difference of Bob telling Alice she is a wonderful boss in one and telling her to go fly a kite in the other. Both files had the same MD5 hash. Could be a bad day for someone.

    13. Re:MD5 is not an encryption algo by Al+Dimond · · Score: 1

      Um... I don't think you understood the point. Knowing the original data and the sum, this technique doesn't actually allow you to come up with another file with the same sum. It just finds random collisions.

      What is possible is the following: put one of the colliding pairs in the legit binary somewhere, and have two different versions of the binary, indistinguishable to MD5. Check that data when the program runs. If it matches exactly, run it. If it doesn't (it's from your poisoned version) install a rootkit and spyware on the machine.

      Now let's say we're using a P2P system that does seeding like BT but uses MD5. Correct me if I'm wrong, but I believe the sums would be taken against compressed versions of the binary. So you'd have to build your binary so that zipped it would contain the random data. I imagine that's a little harder. Also, it would only work on zip files. If someone tgz'ed it or rar'ed it you'd be stuck. And I'm not really in the warez community here, but shouldn't these big tracker sites have feedback? So users can see if a file is corrupted?

      I just don't see this causing confusion for very many users.

    14. Re:MD5 is not an encryption algo by ralmin · · Score: 1

      dotgain wrote: It was always the case that two values could collide to the same hash. You'd be a fool to think otherwise, there's only so many possibilities of hash output, but infinite possibilities of input. Therefore, for each hash value, there are an infinite number of plaintexts that will collide.

      In the situation where an unlimited length plaintext must hash to a limited length hash value, it is not necessarily the case that every single hash value has an infinite number of plaintexts that will collide. Certainly, there must be a number of hash values that do have an infinite number of corresponding plaintexts, but there may be some hash values that have a limited (or even zero) number of corresponding plaintexts.

    15. Re:MD5 is not an encryption algo by Anonymous Coward · · Score: 0

      "They published two Postscript files (both of which where quite valid) and had some rather different wording in the message"

      Nope. Both files contained both messages, and the MD5 hack was used to control an if-statement in the Postscript code. The only thing that demo showed was that if you're using hashes to sign critical stuff, the signing application must calculate the hash from some normalized version of whatever it is you think you're signing, rather than from the entire document container.

      (when you find yourself the emperor of the world, you can eliminate this risk by only signing stuff that you've generated yourself, and/or making sure you an application that always displays the entire document. text/plain is a good choice.)

    16. Re:MD5 is not an encryption algo by dotgain · · Score: 1

      Perhaps true, but that's not what matters. You were saying we now practically have a way of turning an MD5sum back into a cleartext password, and we don't.
      I'm not willing to try and generate every unique hash, because that's not the point.

    17. Re:MD5 is not an encryption algo by ralmin · · Score: 1

      dotgain wrote: Perhaps true, but that's not what matters. You were saying we now practically have a way of turning an MD5sum back into a cleartext password, and we don't. I'm not willing to try and generate every unique hash, because that's not the point.

      I was not saying we have a practical way of turning an MD5sum back into clear text. Obviously we don't, except by guessing passwords and comparing their hashes.

    18. Re:MD5 is not an encryption algo by dotgain · · Score: 1
      Whoops, I thought you were the person I replied to in the first place. You're right, you didn't say that at all.

      Sorry about that.

      I would have thought that each different MD5sum had an equal probability, but that's only based on a hunch. I have actually read the MD5 spec and implementation, but I'm not that at all good at statistics and all the formulae in those docs. So I wouldn't call myself a crypographer by any means, so I won't dispute that some checksums just won't be hit by anything, and you'll spend forever if you try to brute force it.

      I find it a little surprising, but I've been wrong and surprised before. Sometimes it really hurt.

      Cheers, Ben.

    19. Re:MD5 is not an encryption algo by Glog · · Score: 1

      Thanks for the explanation.

  29. Coral cache? by Viper+Daimao · · Score: 5, Funny

    (Coral cache links provided to prevent slashdotting)

    Im sorry, you must be new here.

    --
    "In the game of life, someone always has to lose. To me, if life were fair, that someone would always be Oklahoma." -DKR
  30. SHA-1??? by jd · · Score: 3, Insightful
    SHA-1 has known attacks, although none have (yet) proven to be useful for an exploit. The SHA-2 family (eg: SHA-256) are "unproven" and not part of the FIPS-180 standard (so cannot be used for US Government work), but I would regard them as being "probably safer" than SHA-1 for secure work.


    TIGER is good, as is Whirlpool. Whirlpool has the advantage that it uses AES as the basis, and AES is regarded as secure. It was also certified for secure work by NESSIE - a European group trying to do for the EU what NIST does for the US - which means that it's probably certified for use in secure environments in Europe.


    According to the Hashing Function Lounge, there are other hashing functions that have not been broken (eg: cellhash and fft-hash) but these are sufficiently obscure that a lack of a known exploit may be through lack of study and not through the presence of good security. It would make them good for beating skript-kiddies, as they won't have the skills to find exploits and those skilled enough at finding them aren't studying those algorithms much. (I don't like security through obscurity, but technically these aren't obscured as anyone CAN study the algorithm.)

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:SHA-1??? by poemofatic · · Score: 5, Informative

      Huh? The SHA-2 family have been standardized, approved by NIST, and recommended by the NSA as part of their suite B for some time now. They are *much* more proven than Whirlpool and required for government use, whereas Whirlpool is not allowed for government use. Look at the SHA-512, SHA-384, SHA-256 CMVP instructions and validation lists before you say that NIST has not approved these hashes.

      --

      When in doubt, have a man come through a door with a gun in his hand.

    2. Re:SHA-1??? by Kjella · · Score: 1

      Whirlpool has the advantage that it uses AES as the basis, and AES is regarded as secure.

      Yes, but why would that help? AES means that you can do (plaintext,key) -> cryptotext and (cryptotext,key) -> plaintext. If you have no key, (plaintext,null) -> hash means (hash,null) -> plaintext would be trivial. Basicly, I thought hashes had to operate on completely different principles than symmetric crypto. Please correct me if I'm wrong.

      --
      Live today, because you never know what tomorrow brings
    3. Re:SHA-1??? by Ed+Avis · · Score: 1

      Is there a simple library that provides these various hashing functions, so that when one is broken in future it'll be easy to swap it out and replace with another?

      As things stand now, going through all code to replace use of MD5 is quite a task.

      --
      -- Ed Avis ed@membled.com
    4. Re:SHA-1??? by Anpheus · · Score: 2, Informative

      You're half right, SHA-256, 384, 512 are part of FIPS 180-2. That is, they are approved for government use. FIPS 180-2 (PDF Warning!)

      Brief Caveat: SHA-1 is still more algorithmically secure, as finding a hash is not feasible on desktop computers. SHA-256 and higher length are in the same vein of logic that provides this security, but like any new algorithm, there is an insufficient amount of study to verify this. This applies to Whirlpool as well, AES might be sufficiently secure, but Whirlpool != AES.

    5. Re:SHA-1??? by ikegami · · Score: 1
      Whirlpool has the advantage that it uses AES as the basis, and AES is regarded as secure.

      Something everyone should know about cryptography is that "'A' is secure. 'B' is based on 'A', so 'B' must be secure." is a fallacy.

      Something else everyone should know about cryptography is that the performance of a crypto product at one task does not necessarily have any bearing on the performance of the same crypto product at another task. AES wasn't designed for hashing, so while it is regarded as secure for its task, that says nothing about its strength at hashing.

    6. Re:SHA-1??? by ChadN · · Score: 1

      You are wrong.

      For a very crude example, using your notation, try (plaintext,plaintext) -> cryptotext. Now, try to get plaintext back from cryptotext.

      Often, hashes are designed to be much faster than a symmetric encryption, so they don't directly use a symmetric cipher, but many of the same building blocks.

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    7. Re:SHA-1??? by jd · · Score: 1

      I'm usually half-right. Only, not everyone agrees on which half. Gives them something to argue over, which means I get lots of replies to read. :) I get your point, though, and do agree with you.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    8. Re:SHA-1??? by jafac · · Score: 2, Interesting

      For my application, SHA-1 incurred a HUGE performance penalty. (factor of 1000). Given that there are few other variables I am free to change in this system, which hash, among these others you mention, tends to be more lightweight?

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    9. Re:SHA-1??? by jd · · Score: 2, Interesting
      It depends a little on the implementation, but using mhash I get the following results when generating hashes of a gigabit file from DVD: (Results calculated by using gnu time, and using the user time, not the real time, result)


      • SHA-1 - 16 seconds
      • SHA-512 - 1 min, 17 seconds
      • Haval - 27 seconds
      • Whirlpool - 1 min, 14 seconds
      • Tiger - 18 seconds


      And then, when generating hashes of a 5 gigabyte file on my hard drive (there's been a lot of good stuff on Freshmeat, recently):


      • SHA-1 - 2 seconds
      • SHA-512 - 3 seconds
      • Haval - 5 seconds
      • Whirlpool - 5 seconds
      • Tiger - 3 seconds


      I'm going to guess, on the basis of these results, that you want to try Tiger as a first choice, Haval as a second choice, and that the mhash implementation will probably be OK for Tiger but if you opt for Haval you're probably better with a different hashing engine.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    10. Re:SHA-1??? by kasperd · · Score: 1

      For my application, SHA-1 incurred a HUGE performance penalty. (factor of 1000).

      Sounds a litle unlikely, SHA1 is not that slow. I did some timing on my computers, and SHA1 speed was in the range 8MB/s to 70MB/s (K6 and Athlon CPUs). Now if you could do something a thousand times faster without SHA1, it would have to be at least 8GB/s. Even my fastest computer would have a hard time doing memcpy at that speed.

      --

      Do you care about the security of your wireless mouse?
    11. Re:SHA-1??? by Splab · · Score: 1

      How the hell did you manage to read and hash 5 GB of data in 5 seconds?

    12. Re:SHA-1??? by jd · · Score: 1

      That's strictly CPU time, not I/O time.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    13. Re:SHA-1??? by jafac · · Score: 1

      Without talking too much about proprietary specifics, the application I'm talking about is MS Word validating code-signing on a macro template.

      Yes.

      Pity me.

      I'm starting to think that the only solution is to re-architect my product. Problem is - it was designed 10 years ago, and code signing wasn't in anyone's mind as a requirement at that time. Nobody even mentioned a need until about a year ago. (given that this product runs on a system that's not connected to a network that's accessible to the internet). My customer definately wont fund that. Nor will they find the performance hit acceptable.

      I slapped-on code-signing to see if it would work, and using Microsoft's Certificate Server as a CA, to generate a key to sign the Macro template, the Macro template now takes (on average) about 1000 times longer to open than it did before. My head hurts. I need a new job.

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    14. Re:SHA-1??? by kasperd · · Score: 1

      I slapped-on code-signing to see if it would work, and using Microsoft's Certificate Server as a CA, to generate a key to sign the Macro template, the Macro template now takes (on average) about 1000 times longer to open than it did before.

      So a lot of other stuff is going on, and you don't really know what part of it is spending the time. Signatures means asymmetric cryptography is needed, and that is a lot slower than the symmetric primitives. So as long as you are only signing small amounts of data the signature itself will require more time than the SHA1 hashing.

      --

      Do you care about the security of your wireless mouse?
  31. So you found a collision, big deal by goombah99 · · Score: 3, Interesting

    Maybe someone could explain why collisions are a serious problem for MD5. Or at least in what instances they are. I can see that in some cases, such as password hashing this could be a problem. But for many other uses I don't see what harm a collision has. Maybe I misunderstand but as I understand it MD5s are normally used in a checksum manner to sign or provide a fingerprint of a document. If you have an original document and compute it's MD5 then it can match some certified MD5 check sum. If someone were to generate a fake document they coul dnot design it to match the MD5 fingerprint. They could create some bit of gibberish that did match it but not a document that was useful as a forgery.

    I guess if one were trying to deliberately pedal gibberish, like say if you were the RIAA trying to destabilize a torrent net then that would be all you care about. But for more general issues I don't get it.

    Or is it now possible to generate a collision that also contains some intentional content like a binary program or source code or bank statement. That woul dbe be bad.

    It seems like even in the case of gibberish generation that some simple hacks could extend the useful life of this. For example, if you were to MD5 a whole document and then MD5 the concatenation of every other byte in the document, it woul dbe pretty hard to find two collisions that had that property simulateously. Sure I wont doubt there are better ways to hash something than adding hacks to MD5. I'm just saying that it seems like a simple hack might well be good enough to extend its useful lifespan for passwords and file shareing.

    But I invite you to explain to me why I'm wrong.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:So you found a collision, big deal by Krischi · · Score: 5, Informative

      See this: http://www.cits.rub.de/MD5Collisions/

      It demonstrates the generation of two postscript files with the same MD5 hash that nevertheless display completely differently.

    2. Re:So you found a collision, big deal by andyh1978 · · Score: 4, Informative
      Maybe someone could explain why collisions are a serious problem for MD5. Or at least in what instances they are. I can see that in some cases, such as password hashing this could be a problem.
      It's not a problem in password hashing. There is still no feasible way to compute one of the infinite plaintexts that would generate a given MD5 from just the MD5. Rainbow Tables are the main threat there, but they're defeated by salting (e.g. HMAC-MD5) as you have to regenerate the tables all over again (and find the salt in the first place). It doesn't hurt to go to a larger, more complex hash, but for this purpose, there's no additional worries. It's still "preimage resistant".
    3. Re:So you found a collision, big deal by iabervon · · Score: 3, Insightful

      This generates "weak collisions", which are where the attacker finds a pair of texts which hash to the same value, not "strong collisions", which are where the attacker finds a text that hashes to the same value as a text chosen by the user. So, someone could now set their password to some string, and then be able to type a different string to get in. (Except that neither are likely to be ascii text, making it a bit tough to type).

      The actual issues are for document signing; the attacker could give you one document to sign, and use the signature on a different document with the same hash. There are smaller issues in the case where code expects no two documents to have the same hash. Obviously, collisions must exist, but the code to handle the case is likely not to be well-tested, since test cases were previously impractical to find.

    4. Re:So you found a collision, big deal by Martin+Blank · · Score: 1

      Think of access control. Hashes are sometimes passed over cleartext (which has its own issues, and aren't the point here) because the hash has to be generated from the password before being sent, requiring the password be known. If you can come up with another password that hashes to the same thing, then you don't have to know the original password, but can gain the same access as another user.

      Making changes to MD5 is simply bandaging it, and may introduce new vulnerabilities.

      --
      You can never go home again... but I guess you can shop there.
    5. Re:So you found a collision, big deal by Beyond_GoodandEvil · · Score: 1

      This attack still requires trust, the example breaks down if Caesar generates the document himself. If Caesar generates the file then he knows the MD5 hash. The example works by having the two files: order which will hash to O' and letter which will hash to L'. The attack uses the fact that for some A and B the hash of O'+A' == L'+ B' where you stick the data that is A and B into the commented out parts of the Post script file. However, if Caesar compares the MD5 hash of his letter to the attacker he would see the O' != O'+A'.

      --
      I laughed at the weak who considered themselves good because they lacked claws.
    6. Re:So you found a collision, big deal by iLogiK · · Score: 1

      the problem is that if someone knows the MD5 of a password, they can use this code to generate another password with the same MD5. since passwords are usually stored hashed, an attacker wouldn't need to know the original password, only another password that would generate the same MD5

    7. Re:So you found a collision, big deal by Buzz_Litebeer · · Score: 1

      Can you provide an example of a file that has 2 legitimate pre-images that hash to the same function?

      IE,

      Some text here

      and

      Some text there

      That both equal the same hash?

      Its similar to the Alice problem described, but what he has posted seems like a theoretical attack without any concrete examples to prove that it can even happen.

      Though he does say he links to one such example, but is it practicle to do?

      --
      If you don't vote, you don't matter, so don't waste your time telling me your opinion
    8. Re:So you found a collision, big deal by andyh1978 · · Score: 2, Informative
      the problem is that if someone knows the MD5 of a password, they can use this code to generate another password with the same MD5. since passwords are usually stored hashed, an attacker wouldn't need to know the original password, only another password that would generate the same MD5

      How?

      The collision vulnerabilities do not allow this. They require both the MD5, and the original plaintext, to produce a modified plaintext that has the same MD5.

      Think about it - how do you know it's a collision at all, unless you have the original plaintext? A collision is two different plaintexts that produce the same MD5. You can't know if you have a different plaintext unless you have the original plaintext.

      If you had the original plaintext, that means you've got the original password, so collisions are entirely irrelevant. You've already got what you need to log in.

      There is still no way, other than the brute force enumeration which is made easier to look up through Rainbow Tables, to get from an MD5 to a plaintext that hashes to that MD5 value. The discovery of methods to produce collisions has not weakened MD5 any further - so far only the increase in computing power to produce Rainbow Tables has weakened this particular use. But trivial salting of the values makes Rainbow Tables useless once more.

      The use of MD5 as a method to checksum files has been blown out of the water, of course. That's the other use, which I'm not arguing about at all. You know both the plaintext and the MD5 there, because you've downloaded the file and the MD5 for the file which you trust can't be forged - which is no longer true.

    9. Re:So you found a collision, big deal by slashdotmsiriv · · Score: 1

      I maintain my position that MD5 collisions is a non-issue. The described attacks would only work when you are dealing with active document types and only for the particular attack scenarios. Because of the unfortunate properties of SHA-1, MD-5 like hash functions if you find collisions x1, x2 such that hash(x1) = hash(x2) you can have the two different string x1||S and x2||S hashing to the same value, hash(x1||S)=hash(x2||S). This is because these hash functionality is usually implemented as follows: start by hashing the first block (substring) of the string then take the output and hash it together with the second block, take the output and hash it with the 3rd block etc. If the first block of two different strings hashes to the same value while the rest of the string is identical you will get a collision. What these guys in Germany do is that they find a collision h(R1)=h(R2) (using the well known technique from China) enter it at the start of the ps document and then generate two different documents that look like this: A || "if A == R1 present S1 else present S2" || S1 || S2 . The first document has A=R1 and the second A=R2. Obviously these documents differ only by A and because A hashes to the same value they both hash to the same value. Caesar sees the presentation of the document that contains R1, which is S1, he likes it and signs its hash. Alice takes this signed hash and presents it as the signature of a document that contains R2 and effectively. But this document prints S2 and Caesar does not like S2. This is only usefull if Caesar is stupid enough to sign a document that looks like this. Of course, Caesar being aware that he runs an empire..., will for sure look at the source code of the documents he signs. Some people also referred to the problem that arises with signing binaries or source code. As somebody else mentioned in this forum, in these cases you trust the signer and the trusted signer (e.g M$) would never create a binary or source code file that looked like: R1 || "if A==R1 run good application else run virus" ||good application||virus. Even if the signer/coder was idiot enough to do it, you would still have to find R2 such that hash(R1)=hash(R2). In this case you should be able to find second pre-image of the MD5 hash, not a collision. Second pre-image means, that given an y = hash(x) you can find x'such that y=hash(x'). Nobody has even come close to devising a technique for that. If you could find second pre-images, you would be able to wreak havoc, for example by distributing fake Linux distros with the first block of the iso's being an MD5 second preimage of the valid distro. In conclusion, I will still use MD5 and SHA-1 until they come up with a practical attack. I think these guys have had enough publicity for nothing already.

  32. Lost by Anonymous Coward · · Score: 0

    great, will someone crack http://thedharmainitiative.org/ already?

  33. See pam_unix2 for blowfish in shadow files by Anonymous Coward · · Score: 1, Informative

    $2$blah instead of $1$blah MD5 http://www.thkukuk.de/pam/pam_unix2/. Quite useful.

  34. one case I can think of... by Anonymous Coward · · Score: 0

    Filesharing networks that use MD5 hashes to verify a file would be severely affected. Companies like Overmind that spam the networks could use the collision to generate junk files with matching hashes. Then when clients start downloading, they'll get pieces of the broken file instead of the real one, causing the whole download to be corrupt.

    1. Re:one case I can think of... by hoggoth · · Score: 1

      > Filesharing networks that use MD5 hashes to verify a file would be severely affected. Companies like Overmind that spam the networks could use the collision to generate junk files with matching hashes. Then when clients start downloading, they'll get pieces of the broken file instead of the real one, causing the whole download to be corrupt.

      Not at all.
      If a legitimate user puts a good file in the network, that file will have a unique MD5 hash. Overmind cannot create a file or pieces of files that match the MD5 hashes of the original file, poisoning it. They did not create the original file, so they cannot create another file that shares the same MD5 hash. If they create the original file, then they could create a second file that shares the same MD5 hash. But why the hell would they bother? If they created the original file, just make it junk and give it a really interesting name and put it on the network.
      Any jerk can put a bad file in the network with a filename that sounds really interesting. They always could, and they still can. That has nothing to do with MD5 hash collisions.

      --
      - For the complete works of Shakespeare: cat /dev/random (may take some time)
    2. Re:one case I can think of... by networkBoy · · Score: 1

      While not dealing with the current limitation of MD5, one could poision an MD5 protected P2P stream as follows:
      Create rainbow table for a large number of 64K blocks.
      Sign onto torrent
      oppourtunistically replace blocks with garbage if/when they match a hash you have a table entry for.

      It only takes one block per torrent to invalidate some of the downloads, others may require more blocks. Over time you will be growing your rainbow table and so can provide more poision.
      -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
  35. For windows users by Psionicist · · Score: 1

    If you get the error "error getting crypto context..." replace

    if(!CryptAcquireContext(&cryptHandle, NULL, NULL, PROV_RSA_FULL, CRYPT_NEWKEYSET ))

    with

    if(!CryptAcquireContext(&cryptHandle, NULL, NULL, PROV_RSA_FULL, 0 ))

    To actually run the program you have to convert your MD5 to four ints. Take your MD5, such as 098f6bcd4621d373cade4e832627b4f6

    Divide it in four pieces and convert them to dec.

    Hex
    =======
    098f6bcd
    4621d373
    cade4e83
    2627b4f6

    Dec
    =======
    160394189
    1176621939
    3403566723
    640136438

    Good luck.

    1. Re:For windows users by deischi · · Score: 1

      Alternatively if you get the error message and don't want to recompile, it probably helps if you run md5coll as Administrator. (At least that helped for me)

    2. Re:For windows users by Psionicist · · Score: 1

      Oh, silly me. You don't have to convert, just split the MD5 into four and append each part with 0x so the program understands you are writing in hex.

      For example: program 0x098f6bcd 0x4621d373 0xcade4e83 0x2627b4f6

    3. Re:For windows users by n0dalus · · Score: 1

      I think you have misunderstood the way this works.
      The arguments passed to this program are the initialization vectors for the algorithm, not a hash to generate collisions for.

  36. This is bad. by Boinger69 · · Score: 1

    This is bad, Now its only a matter of time before code like this is used to corrupt P2P networks whos primary file checking is based on md5 hashes.

    1. Re:This is bad. by JPyun · · Score: 1

      You can't make an arbitrary hash. P2P is fine.

  37. Next Version by winphreak · · Score: 1, Redundant

    So... When will MD6 come about? (yes, a weak version number pun, I know)

    --
    "I'm a well-wisher, in that I don't wish you any specific harm."
    1. Re:Next Version by handslikesnakes · · Score: 1

      Christ, it's not even a pun.

  38. quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 1, Interesting

    a very simple perfomance check i love to run on every computer i come across:

    put windows calculator in scientific mode (to keep things comparable, please use windose calculator, not some custom written C+ program....)

    type in 100,000

    hit the n! button

    ignore the warnings that it will take a long time, don't even bother clicking on "Continue", because the calculation is still going.

    and report how long it takes to complete a factorial of 100,000

    please report what CPU you have.

    P4 3.2Ghz and Athlon 3200+ both do it in about 80 seconds....

    1. Re:quick and dirty benchmark (factorial) by klep · · Score: 1

      kcalc (KDE calculator) on a P4 3.4Ghz does it in less than 20 seconds.

    2. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      Athlon 2000+

      121 seconds

      (I clicked continue :-p)

    3. Re:quick and dirty benchmark (factorial) by Rafikichi · · Score: 1

      AMD Athlon 2600+ 100000! = 2.8242294079603478742934215780245e+456573 ~ 90 secs

    4. Re:quick and dirty benchmark (factorial) by __aaxwdb6741 · · Score: 1

      Result:
      2,8242294079603478742934215780245e+456573

      time: 3:15
      or: 195 seconds.
      Average CPU usage already: 10%
      CPU: Intel Pentium IV, 2.4GHz @ 2.6Ghz (18x144MHz)

    5. Re:quick and dirty benchmark (factorial) by Dragon218 · · Score: 1

      Instantaneous.

      The result:
      10101100000110001011100110100101100000000000000000 00000000000000

      --

      "It's the little touches that make a future solid enough to be destroyed" --William S. Bourroughs
    6. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      Dual 2.4GHz Xeon (note Windows calc only used 1 processor)

      Windows Calc -> 105 sec

      Mathematica 5 -> 1.26 sec

    7. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      funny! click the little decimal button, not the binary button

    8. Re:quick and dirty benchmark (factorial) by SoloFlyer2 · · Score: 1

      Gcalctool

      1 Second

      The answer is "Error" :)

      --
      "I reject your reality, and substitute my own" - Adam Savage
    9. Re:quick and dirty benchmark (factorial) by Tongo · · Score: 1

      P4 2.8Ghz in about 118 seconds with a crapload of other stuff running in the background.

    10. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      530 seconds!!! (eight minutes and 50 seconds)

      Celeron 600MHZ, Coppermine core (on die cache)

      Windows ME

    11. Re:quick and dirty benchmark (factorial) by 80+85+83+83+89+33 · · Score: 1

      tried it on a Dual G5 Mac, with it's built in calculator, but it gives an error with a number that big!

      --
      i disable sigs
    12. Re:quick and dirty benchmark (factorial) by Fusen · · Score: 1

      50 seconds on a A64 3200+

    13. Re:quick and dirty benchmark (factorial) by tom8658 · · Score: 1

      Whats your setup? My 3400+ does it in 77sec.

    14. Re:quick and dirty benchmark (factorial) by Fusen · · Score: 1

      MSI K8N Neo A64 3200+ 2x 512 DDR3200 winxppro sp2

    15. Re:quick and dirty benchmark (factorial) by __aaijsn7246 · · Score: 1

      About 140 seconds on an AMD Athlon 2200+ (1.80 GHz, 1.25 GB ram)

    16. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      2.8242294079603478742934215780245e+456573

      246.5 seconds, using 1.6 Megs on a pentium 4 2.53Ghz w/ 512 RAM that currently has 1597:18:39 uptime (66.5 days) running Windows XP service pack 2, running a 128k audiostream that uses about 2% of my cpu but somehow typically knocks 10% off of my performance in cpu-bound tests.

    17. Re:quick and dirty benchmark (factorial) by jesushaces · · Score: 1

      37 seconds, P4 3GHz :) Cool idea.

    18. Re:quick and dirty benchmark (factorial) by tom8658 · · Score: 1

      hmmm... The only difference is that I'm running a 3400+ on an Asus K8T800 Deluxe, any idea why there's such a large discrepency?

    19. Re:quick and dirty benchmark (factorial) by Council · · Score: 1

      Your sig mentions "funniest slashdot comment ever" but just gives the number. I can't figure out how to find a comment without the story ID, though the comment numbers are unique. Help?

      --
      xkcd.com - a webcomic of mathematics, love, and language.
    20. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      See? I told you Gnome is slow! It takes one whole second to determine that 100k is too big to factorialize!

    21. Re:quick and dirty benchmark (factorial) by mixmasterjake · · Score: 1

      Took about 157 seconds on a Pentium M 1.2

      --
      TODO: come up with a clever sig
    22. Re:quick and dirty benchmark (factorial) by UncleFluffy · · Score: 1

      Don't have windows calculator (not in windows right now), but just as an FYI:

      Dual Opteron 244, 2G RAM, Maple 10 ... 0.76 sec

      --

      What would Lemmy do?

    23. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      37 seconds? that's twice as fast as it should be... were you using windows calculator, or another calculator program?

    24. Re:quick and dirty benchmark (factorial) by 80+85+83+83+89+33 · · Score: 1
      --
      i disable sigs
    25. Re:quick and dirty benchmark (factorial) by Anonymous Coward · · Score: 0

      119 seconds on a 1.6GHz Pentium M with 512 MB RAM, running WinXP SP2.

      I wonder how come it seems to take relatively long for the others. Besides calc, I have Firefox, Putty, Thunderbird, Notepad and Citrix ICA Client / Lotus Notes running.

    26. Re:quick and dirty benchmark (factorial) by JediTrainer · · Score: 1

      1 minute 58 seconds (I had the clock open) on my IBM T42 notebook, which is a 1.6Ghz Pentium M.

      --

      You can accomplish anything you set your mind to. The impossible just takes a little longer.
    27. Re:quick and dirty benchmark (factorial) by jesushaces · · Score: 1

      Well, have tried it another 10 times, (windows calc.exe) and i've been getting 2 results mostly: 3 times i got it in less than 40 seconds, and 7 times i got it in 75(+/-2) [weird, isn't it? i've no idea why, since i didn't start any other applications, and it isn't as if the faster times were together (they were trials # 4,8,10) On a sidenote, Maple 9 took 0.8 seconds (fully expanded)

  39. hash collision while file sizes still match? by Anonymous Coward · · Score: 0

    So, not having read the FA, does this tool enable me to find collisions for different files *that are the same size*? or can I use size to discriminate?

  40. MD5 is still useful. by Theovon · · Score: 1, Informative

    This program generates ARBITRARY collisions. Given a tarball of a Linux kernel, you can generate some other file with the same MD5 hash. But can you generate a collision that is also a valid tarball that unpacks cleanly and compiles? The chances of that are so remote that I don't see it happening any time soon.

    Here's the real trick. Take your kernel tarball X, and your hacked version Y. Using this collision finder, can you find some garbage Z such that Y appended with Z has the same hash as X? (Tar will, however, complain about extra stuff at the end of the tarball, but it would unpack and compile.) That's a MUCH harder problem than finding arbitrary collisions and would take a HELL of a lot longer to produce than 45 minutes on a PC.

    1. Re:MD5 is still useful. by xquark · · Score: 1

      Actually you are incorrect, their code clearly allows the user to set the internal IVs of the md5 before finding
      collisions, what one can do is md5 the tarball get the state of the IVs and then pass them onto the md5col thing
      along with the required hash. It may take longer than 45mins but it will happen before the next release of the
      linux kernel is out.

      This works if the implementation of the tar ignores the garbage you concatenate to the end of the tarball.
      The problem here is some implementations have a running CRC which is concatenate at the end, attempt to modify
      the crc so that it takes the garbage into account and tarball wont be able to extract the archive, don't place
      the CRC there and the utility will post a big ugly message on the screen of an error. Happen to be using tar in
      a chain of commands and the result is the whole chain will fail.

      Arash Partow

      --
      Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
    2. Re:MD5 is still useful. by Anonymous Coward · · Score: 0
      This program generates ARBITRARY collisions. Given a tarball of a Linux kernel, you can generate some other file with the same MD5 hash.
      You just proved how ignorant of cryptography you are with those two sentences. In your first sentence you restated that this is a collision attack, while in the second you imply it is a pre-image attack (which it is not).

      Stop spreading FUD and read some basic cryptography literature about hash function attack scenarios.
  41. SHA-1 isn't much safer, try SHA-256 or higher by Anonymous Coward · · Score: 0

    Any half-way intelligent cryptographer would have suggested SHA-1, TIGER or perhaps HAVAL since quite some time already.

    Actually collisions in SHA-1 were confirmed in February of this year, and refined in August. Any half-way decent cryptographer would be using SHA-256 or, better yet, SHA-384 or SHA-512. We've got the disk-size and bandwidth these days not to be worried about a few extra bytes. Bruce Schneier's initial article on this is instructive.

  42. Its not the end of the world... by Anonymous Coward · · Score: 0

    C'mon folks - yes there are plenty of uses for MD5 and yes MD5 is being used for a lot of applications now. However, it doesn't completely destroy its usefullness until we find a better replacement in the apps. Yes, 2 items can have the same hash - however it is unlikely to occur without trying to do it intentionally.

    Remember the hashes, keys, doors, traps and such to keep unwanted folks from touching your data are only a way of keeping honest people honest. If you have honest people around - there is nothing to worry about.

  43. Still not a preimage attack by ^BR · · Score: 1

    It's the same attack that has already ben spoken about, just now it is available for the masses.

    The scope of the attack is one can generate two files having the same MD5 sum, if he can have a random looking section in the middle of the file. i.e. possible in many binary formats but not possible in well formed ASCII text.

    What the attack doesn't do is given a MD5 hash being able to find a byte stream that hashes to the same value. So passwords stored as their MD5 sums are still safe, you can't attack the RADIUS protocol with it and constructs like HMAC-MD5 used in SSL and IPsec are still safe. What you cannot trust anymore is for example the mechanism used to check distfiles on some BSD where the port system check the MD5 of the freshly downloaded archive. Nothing proves it is the same one that the porter used for the port (OpenBSD has been safe for a few years checking not only MD5 but SHA1 and RIPEMD, dunno for the other BSDs). And certificate authorities that don't modify the CSR they are submitted also are vulnerable to people forging certificates. Any serious CA won't be caught doing that mistake again.

    One of the big lessons of these attacks on cryptographic hashes is : do not ever sign the hash of a document you didn't generate or at least modified (the document, not the hash).

  44. Weak code. by kg_o.O · · Score: 4, Funny

    This code is weak. I fired it up like 20 minutes ago and still haven't r00ted my box.

  45. Curious -- Salts... by Anonymous Coward · · Score: 0

    I know very little to absolutely nothing when it comes to encryption, so bear with me...

    BUT I have a few sites where back in the day the easiest way for me to provide cross server verification was to use MD5 (one was a M$ Server and the other was a Unix box...we had a tool written in ASP and there were a few compiled components that are pretty blackbox or I would have ported the whole damn thing to PHP a long time ago...) But at the time MD5 was the only hash alg that I could implement in both without having to install all sorts of libraries (and for some reason, the standard crypto librariest STILL won't install on my PHP server -- I've taken to piggybacking off of a local MySQL server to do this these days...kinda horrible to call a query just to return an encrypted value...but that MySQL install isn't used as I have a dedicated box these days).

    But back to the point, even then, I didn't feel MD5 was that secure...but I thought I had something that might work.

    In session variables, I have several bits of data -- for instance, the email address. I use that I add a 64bit string of gibberish that is randomly rotated in a table on my two servers as a salt (say for the moment, this is secure :-).

    To get to my question -- given that only part of the data is ever given out to the user, how hard is it use this sort of algorythm to figure out the current salt I'm using?

    I'm just wondering if this application will invalidate security protocol and if I will have to go back and revamp everything. Again, I know nothing, and I assume several folks that know nothing but enjoy talking out their ass will respond, so give me your best gibberish and I'll try to understand (I am, after all, a research scientist...I just know little about encryption and programming, but I can throw together crap just as well as any other monkey throw shit at the screen).

  46. This cannot be used to crack passwords by ^BR · · Score: 1

    It can be used to generate two byte streams having the same hash, for some applications a big enough problem to render MD5 unusable.

    1. Re:This cannot be used to crack passwords by Fallingcow · · Score: 1, Informative


      This cannot be used to crack passwords

      It can be used to generate two byte streams having the same hash[....]
      ... which cracks a password.

      The MD5 (or SHA, or whatever) password authentication, the hash is the only thing stored on the machine. When a user inputs a password, the input password is hashed using the same algorithm, and that's compared to the stored hash.

      So, a colision would create the same hash, and would thus authenticate just like the real password, assuming it didn't break some input size limit or something.

    2. Re:This cannot be used to crack passwords by cnettel · · Score: 2, Informative

      The program generates byte stream A and B, both having the same hash C. You can't input C and get an arbitrary B out (with this algorithm).

  47. could be fun to break apps by Anonymous Coward · · Score: 0

    rsync is one of many applications that makes use of md4 and md5. It uses md4 and md5 hashes as the final check to identify blocks don't need to be transmitted.

  48. We're doomed by ThereCanBeOnlyOne007 · · Score: 0, Redundant

    Dooooomed.

  49. Q and A by Sheepdot · · Score: 4, Informative
    For those of you who store passwords as hashes in your web apps, I've developed a little "Q & A" post here that explains this as best as possible.

    Question: Does this mean all my MD5 passwords for all my users can be cracked?
    Answer: The short answer is yes, they can be cracked. The long answer is no, not if you used a salt, and the attacker has to get those md5 hashes first. You are not safe you are storing your user's password field input directly to the database ala the php/sql method of:
    INSERT INTO users VALUES ('user','" . md5($password) . "');

    Question: How should I remedy this?
    Answer: Always use a salt or salts. For example in the case above you could use this php/sql method instead:
    INSERT INTO users VALUES ('user','" . sha1(md5($password . '¥1i9k') . 'a-thirty-five-ch4racter-l0ng-str1ng' . md5($password)) . "');

    Question: How/Why is this safer?
    Answer: Collisions are only direct input for the md5 function to get the same md5 hash. So in the above case, $password was directly taken from the user and made into a hash. Assuming an attacker got an SQL injection in and grabbed the database, they could run this collision creator on a hash and produce an input that gave that exact hash.

    But, this would be much more difficult with any code that introduced a salt. That is why the second code is better, it includes two salts that the attacker (through his SQL injection) is unable to account for.
    1. Re:Q and A by Sheepdot · · Score: 1

      I'm sorry, I'm noticing in other posts here that this does not actually create a valid input to create a collision based on an existing hash, so ignore the parent for now.

      This code just creates two inputs that will collide. Knowing this, I have to say that it looks like it might be a little while yet before MD5 is useless. But the days are certainly ticking down. Might not be a bad idea to start changing the way you salt or store your passwords now.

    2. Re:Q and A by Anonymous Coward · · Score: 0

      It's best to also use CHAP (challenge hash authentication protocol) and never send the same hash across a network. There are 2 weak traffic points in web apps: client to webserver and webserver to DB server (Assuming the webserver and DB server are different machines). Salting will cover webserver to DB server, CHAP will cover client to webserver.

    3. Re:Q and A by Terrasque · · Score: 1

      The mods that modded this insightsful should be modded either +1 funny, or -1 clueless..

      It's, as original poster said in reply, code for making collisions (two files having same hash. NOTE: That means attacker can create both datas, not that attacker can find one set of data and then make a collision on that), not (for example) crack md5'ed passwords.

      The salt advice would however help against rainbow table type attacks.

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    4. Re:Q and A by cortana · · Score: 1

      If possible, use the interface your database provides to your operating system's crypt(2), rather than doing it yourself. :)

    5. Re:Q and A by CodeRx · · Score: 5, Interesting
      sha1(md5($password . '¥1i9k') . 'a-thirty-five-ch4racter-l0ng-str1ng' . md5($password))

      This is a very bad password salting scheme and vulnerable to a dictionary attack. Once I have your database and salts, I can run a dictionary of common passwords through your scheme and crack any weak passwords.

      You can make things much harder by having your salt change for each password - include the username for example. Now I have to run my entire dictionary through the sha/md5 function for each user. By doing this, you make the attack O(m*n) instead of O(m) (where m = the number of words in my dictionary and n = the number of users).

      And as you mentioned in a follow up post, this code only generates documents with identical md5 sums, it does not generate a document with a given sum. So MD5 is broken for document signing and the like, but secure for password hashing for the time being.

    6. Re:Q and A by Anonymous Coward · · Score: 0

      whoa there!

      compute the hash within the PHP/Perl/etc code - DONT use the database functions to do it. What if Apache and your SQL db are on different machines? Your example would be transmitting a cleartext password across the network.

    7. Re:Q and A by Night4554 · · Score: 1

      Doesnt the concept of salts depend on the concept of security through obscurity? I don't know your salt, or your mix-the-username-and-password function - and that's why it's more secure?

    8. Re:Q and A by PhiRatE · · Score: 3, Insightful

      No, a salt can be public knowledge, a salt is there to prevent the use of a pre-hashed dictionary, it does not help in the event of a matcher.

      Way back when the hash was unix crypt(), everyone got a little nervous about the idea that someone with a lot of computing power to burn might just fill up a large hard-drive with a map of plaintext->hash for the entire oxford dictionary for example. If they did that, then all they'd need to do subsequently to figure out the plaintext for any of the values they did the crypt()'ing for would be to look through the rive for the crypt value that matched the crypt value in the password file.

      The idea of a salt is simple, it changes the hashed result in a computably repeatable fashion, but one that requires that the entire hash has to be performed with the salt in mind in order to get the correct end result.

      So using a completely public bit of knowledge is fine. What is preferable is that the salt be random yet repeatable. That's why you see unix passwd files have the salt prepended to the hash (in crypt, it use to be the first 2 characters for example), you generate it randomly once, then whenever you want to see if the plaintext you've been given is the same, you grab the salt, hash the plaintext with it, and then check to see if the result matches the original hash.

      In summary, a salt protects you from the problem of a pre-computed hash dictionary, nothing else.

      --
      You can't win a fight.
    9. Re:Q and A by Anonymous Coward · · Score: 0

      Not even that. Salt merely raises the cost of a hash dictionary (you have to store 2^N hashes of each password rather than one).

    10. Re:Q and A by Sheepdot · · Score: 1

      I didn't realize what the code was until after I had read other's comments. As you can see from the first reply to my own post, I acknowledge this after the fact. You needn't beg for a mod to -1 the comment.

    11. Re:Q and A by slashdotmsiriv · · Score: 2, Informative

      You got the whole deal wrong. The published attacks are not able to find a second preimage that is: for a certain y = h(x) find x' such that h(x')=h(x)=y.
      All this salt deal is completely redundant. Salts would make finding second preimages harder but it is already hard.
      Ppl need to realize that collision resistance is not the same as second preimage resistance. What the so called attack on MD5 does is finding collisions, random pairs that actually hash to the same value. The probability those pairs hash to hash values that you find in password records is negligible. It is almost as probable as trying random passwords until you find the one you are looking for.

      MD5 was designed to be both collision and second preimage resistant. These guys in China showed it is not collision resistant. That is a great theoretical result, but far from useful in practice. It remains second preimage resistant, so enough with this fake hype.

      Your passwords are completely safe if stored as hashes.

    12. Re:Q and A by Sheepdot · · Score: 1

      There's a really big problem with your argument that this is a "very bad" salt: You are assuming someone has the code. The majority of SQL injections do not provide the attacker with the code.

      Even given your own example is only slightly better in that it uses the username. Oh well, if they have the code, (as you are assuming) they can just use the username which they also pulled from the SQL injection to salt the attempt on the password hash. Salting with the $username only adds maybe minimal complication.

      Your mechanism does not make things "much harder" than the example I provided. Only different.

      Now, along these lines, one could do (using your suggestion with the $username) sha1(md5($username . $password . 'salt1') . 'salt2' . $GLOBALS['somesetvalue']) and be reasonable secure as long as they didn't switch hosting providers. If one wanted to, they could even throw in the php version or phpini settings. But all that hinges on things remaining the same forever, which isn't likely.

      Oh well, at least you didn't say that combining md5 and sha1 weakens the hash. There are some that have claimed such.

    13. Re:Q and A by Night4554 · · Score: 1

      I have to agree - if I know your salt value, running a brute-forced, dictionary attack takes the same amount of time as doing it if it weren't salted. Brute-forcing takes the same amount of time as if it were unsalted. The only thing (I see) a salt protecting you from is a rainbow table that wasn't computed *with* the salt. Which is a semi-valid protection assuming your password is of a large-enough keyspace and length to make rainbow tables prohibitly expensive. If I discover my sysadmins are using '432' as their salt, I can go compute rainbow tables using that salt and then come back in a week/month/year with tables to use.

    14. Re:Q and A by Anonymous Coward · · Score: 1, Informative
      Ouch! No, don't use your suggested hashing algo.

      Lets go over a couple of problems in what you pointed:
      - Getting code can be easy: In the past there have been several php source disclosure vulnerabilities. While there are no currently known ones, it doesn't mean there won't be again.
      - Incorrect configuration can reveal source: re-install Apache but forget to put the php module? Source! Also, if you have different virtual paths to the same source, with different executability configurations, you may very well have a disclosure problem without knowing it.
      => Conclusion: don't rely on a fixed, hard coded secret!

      Next, assuming the attacker had read access to both your DB and source (and $GLOBAL):
      sha1(md5($username . $password . 'salt1') . 'salt2' . $GLOBALS['somesetvalue'])
      This becomes:
      sha1(md5($username . $password . known) . known)
      For a dictionary attack, this becomes no more dificult then:
      hash($username . $password)
      Further more, md5 generates a 64 bit hashs, sha1 generates a 128 bit hash. It is quite likely that it would be possible to authoritatively revers the md5 value from the sha1, eliminating sha1 from the equation. hashing a 64 bit value into 128 bits is no more secure than hashing into 64 bits.

      That said, using a user-name as a salt has the added downfall that if two sites use the same method (and extra constant salts) both will hash to the same value. This means you can authoritatively say that a given user has the same password on both sites, without knowing the password.

      The best solution, as the parent poster pointed out before suggesting a username based solution, is to have a randomly assigned salt for each user.

    15. Re:Q and A by Anonymous Coward · · Score: 0

      For those of you who store passwords as hashes in your web apps, I've developed a little "Q & A" post here that explains this as best as possible.

      Question: Does this mean all my MD5 passwords for all my users can be cracked?
      Answer: The short answer is yes, they can be cracked. The long answer is no, not if you used a salt, and the attacker has to get those md5 hashes first.


      While that's good advice, it has absolutely nothing to do with the story you're commenting on. The reasons to salt the password have nothing to do with this new algorithm that allows an attacker to generate hash collisions.

    16. Re:Q and A by Terrasque · · Score: 1

      no, not on the comment, the mods :) The comment itself is okay, as it is good practice to salt anyway ;)

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    17. Re:Q and A by CodeRx · · Score: 1

      If I can do SQL injections there is a very good chance I can get your code. What if it is an inside job by someone with access to the code? In any case, you are trying to defend using a weak hash method by suggesting we depend on security through obscurity - bad idea.

      Adding the username makes it much harder. Do you not understand that O(m*n) is considerably harder than O(m)? If you have 1000 users, adding the username means I have to do 1000 TIMES the number of hash calculations to try to crack all of your users passwords. Making something 1000 times harder qualifies as "much harder" in my book.

      Using your hashing method I might be able to run a brute force attack on your user's passwords in a day - but if you had used a correct hashing method it would instead take me 1000 times longer or about THREE YEARS! And this is assuming you only have 1,000 users - have 10,000? Try 30 years. Convinced yet?

      You should be salting with a static salt plus a salt unique to each hash you make. Something like sha(password + sha(staticsalt+username)) would work fine as long as you don't let users change their usernames. This is standard crypto 101 type stuff.

      Combining MD5 and SHA is semi-retarded, but it doesn't make the hash weaker.

      (I am only being so blunt because you are giving very bad and dangerous advice in a public forum that could lead to disaster if someone follows it. I understand you are writing PHP and are probably new to programming - you are doing the right thing by even thinking about this stuff. If you are interested, I highly recommend Bruce Schneier's books on crypto, especially Applied Cryptography and Practical Cryptography)

    18. Re:Q and A by Sheepdot · · Score: 1

      Salts would make finding second preimages harder but it is already hard.

      It is important to note that finding two collisions is only one step of many on the way to total abandonment of a hashing system. Not only that, but for the sake of not being vulnerable to pre-generated MD5 rainbow tables, it is VERY important that you salt your user's passwords.

      Essentially, if you're not salting your web app user's passwords, you're taking a risk that you shouldn't have to.

    19. Re:Q and A by Sheepdot · · Score: 1

      Ugh. I cannot believe the level of competence on Slashdot.

      WHERE ARE YOU GOING TO STORE THIS RANDOMLY ASSIGNED SALT?

      You're not understanding that the purpose of include a salt as a variable in the source is to ELIMINATE the possibility that someone can crack user passwords based solely on data returned from an SQL injection attack.

      By using a "random" salt, you're going to have to store it *IN THE DATABASE*. There is actually a higher likelihood that someone will hack your database without using a hardcoded salt than with.

      Your arguments saying that "all someone needs is your source and the SQL database" is ridiculous when in your own case of randomly assigned salts, all they need is the SQL database!

      Also, there have been FAR MORE SQL injection attacks than source disclosure vulns. So many that source disclosure is so rare that many web app coders actually use hardcoded salts.

      It really bothers me when people who have no experience in web app security feel obliged to comment on it.

    20. Re:Q and A by Sheepdot · · Score: 1

      If I can do SQL injections there is a very good chance I can get your code.

      Wrong. There have been more SQL injection attacks than source disclosure vulnerabilities. Check Secunia if you don't believe me. I'm not commenting on this stuff because I have a bug up my ass, I'm commenting on it because I know it.

      What if it is an inside job by someone with access to the code?

      Then neither your method or my method will work. However with my method, they *HAVE* to get the source code. With your methods, they don't, they can do SOLELY an SQL injection attack.

      In any case, you are trying to defend using a weak hash method by suggesting we depend on security through obscurity - bad idea.

      No. I am willing to admit that adding the username to the hashing mechanism would be a great idea. Modern software, especially forum and CMS software like that which I work on, requires that users be able to change their names.

      So, you say, base it on the unique user ID that is autoincremented? Sure, that's fine. But supposing my source is open, they can just do an SQL injection and get the UID too. That's why nearly all modern forum and cms software include variables you are suggested to change, one of which is a salt. It's usually included along with the MySQL info.

      Not to knock your system, but it is far better to have a $salt variable that the user changes who is using your open source software when they install it, than to base hashes solely on information that is stored in an SQL databse.

    21. Re:Q and A by CodeRx · · Score: 1
      Wrong. There have been more SQL injection attacks than source disclosure vulnerabilities. Check Secunia if you don't believe me. I'm not commenting on this stuff because I have a bug up my ass, I'm commenting on it because I know it.

      Most PHP / CGI scripts on web servers are world readable. Most databases have a function to read in data from a text file. You do the math.

      The rest of your comment is nonsense. I specifically suggested a static salt + a salt that changes for each hash you generate. The static hash can be stored in your code if that is what you want to do. If you need to be able to change usernames, just add a password_hash column to your users table.

      And these aren't "my methods", this is standard crypto best practices.

    22. Re:Q and A by Sheepdot · · Score: 1

      Most PHP / CGI scripts on web servers are world readable. Most databases have a function to read in data from a text file. You do the math.

      So now an attacker has purchased an account (using a form of payment that allows them to be identified) on a server through the same hosting provider and just happens to be on the same server as I am. Not only that, but I'm stupid enough to make my scripts world readable? I'm sorry, "most" is a horribly inaccurate term here, and there's nothing left to say but "you are wrong." I'm starting to think you know far less about scripting and common security practices than your initial post would have indicated.

      Again, you're suggestion of relying on a random salt to hash with per user is horribly flawed. I made the exact argument before and I will repeat it here: ...nearly all modern forum and cms software include variables you are suggested to change, one of which is a salt.

      There is *nothing* wrong with the system I'm suggesting. What is wrong with your system is that it relies on a variable that has to be different for each user, and is thus stored in a database which only requires an SQL injection to obtain.

      I specifically suggested a static salt + a salt that changes for each hash you generate.

      I didn't see the static salt part. If you did, then your argument basically comes down to whether or not the "random salt" is any better. Since I'm arguing that anyone who can get the SQL password hash can get the random salt, it all relatively a moot point, IMHO.

      Again, nowhere did I see you even mention the static salt, all I heard you say was that my method was so horribly weak that it was obviously worth commenting on.

    23. Re:Q and A by CodeRx · · Score: 1

      All of my comments are flying right over your head. Please slow down, read and think a little bit before posting the first knee-jerk reaction that comes to mind.

      You are entirely missing the point of having a different salt for each password. If you use only a static salt I can run an entire dictionary of words through your hash function and then check the resulting hashes against the password hashes in your database. The number of hashes I must generate is equal to the number of words in my dictionary, or O(m).

      Now if you use a different salt for each password, I can't just hash the entire dictionary once and then check for matches. Instead I must hash the entire dictionary for each password (using it's unique salt) - this means I must generate O(m*n) hashes. I have already explained in a previous posting how O(m*n) is much more work than O(m).

      Using a unique salt for each hash doesn't improve the security of any one hash (which is what I think is throwing you off), but *drastically* improves the security of the collection of hashes you are generating by making it much harder to launch an effective brute-force attack against them.

      So now an attacker has purchased an account (using a form of payment that allows them to be identified) on a server through the same hosting provider and just happens to be on the same server as I am. Not only that, but I'm stupid enough to make my scripts world readable? I'm sorry, "most" is a horribly inaccurate term here, and there's nothing left to say but "you are wrong." I'm starting to think you know far less about scripting and common security practices than your initial post would have indicated.

      Our attacker doesn't need an account on the server. If the database server is running on the same server as the webserver (rather common), our attacker may be able to use the SQL injection attack to get the script source code. Or he may be able to use a buffer overflow in a SQL function to launch a shell with the same permissions as the database server. Or he may be able to exploit some other service or script on the server. There are many attack vectors to get the source of a file on a server. Stop trying to rely on security through obscurity.

      Sadly almost all PHP/CGI scripts out there are world readable. This is because the webserver typically runs as an unprivileged user ("nobody") and must be able to access these files. The easiest solution for users and admins is to make the files world readable, so that is just what they do. There are more secure ways to go about this - using suexec/phpsuexec, using groups, etc - but virtually no one does these things. Do you really disagree with this?

      Again, you're suggestion of relying on a random salt to hash with per user is horribly flawed.

      You give me far too much credit and make me smile with such statements (you might as well say "your suggestion that E=MC^2 is horribly flawed"). These are not my suggestions or methods. If you have discovered some unknown flaw in what I have described, there is an entire world of cryptologists who would love to hear about it. Unix operating systems have used these exact methods in constructing password hashes for many many years (see crypt(3)).

  50. fakecrc... by ^me^ · · Score: 0

    http://www.crc2003.250x.com/ although old, is also interesting.

    --
    No one ever says, 'I can't read that ASCII E-mail you sent me.'
    1. Re:fakecrc... by Jeld · · Score: 1
      No one ever says, 'I can't read that ASCII E-mail you sent me.'

      Of course they do, if their language doesn't use latin character set :)

      --

      Everybody Lies. But it doesn't matter since nobody listens.

  51. Thanks for the advice by Urusai · · Score: 1

    "Switch away from MD5/SHA1" to what?? SHA-256? Some AES-based hash? CRC-32?

    1. Re:Thanks for the advice by Articuno · · Score: 1

      With so many people using those, what about trying ROT13 (no one will think you are using that :-)

      --
      So Long and Thanks for All the Fish!
    2. Re:Thanks for the advice by rylwin · · Score: 1

      i prefer the always secure rot26 :)

    3. Re:Thanks for the advice by quigonn · · Score: 1
      --
      A monkey is doing the real work for me.
  52. Re:SHA1 (NO) by photon317 · · Score: 3, Informative


    MD5 is dead. SHA-1 is currently dying. They're both based on the same fundamental math, and the attacks on SHA1 are getting stronger and stronger. Now would be a really good time to get off of that entire family of hashes if you can (MD4, MD5, RIPEMD-* SHA-*, etc).

    The crypto world is in a little bit of a bind when it comes to strong hashes now. We simply don't have any left that both have a long proven track record of analysis and haven't been seriously compromised. Best bet, IMHO, is to go with a new-blood hash algorithm. They seem to be the answer we're looking for - but of course what they lack is more years of intensive study by the experts for flaws they themselves might harbor.

    So if you want to give them a whirl, I'd recommend looking at Tiger and Whirlpool:

    http://en.wikipedia.org/wiki/Tiger_(hash)
    http://en.wikipedia.org/wiki/Whirlpool_(algorithm)

    --
    11*43+456^2
  53. Doesn't Microsoft networking still use MD4? by Medievalist · · Score: 2, Interesting
    MD4 collisions can be found in a few seconds (but nobody uses that any more)
    I thought MD4-encrypted passwords were still flying around most of the world's MS-windows networks. I coulda sworn I had to explicitly turn them off on all my nets...
    1. Re:Doesn't Microsoft networking still use MD4? by Anonymous Coward · · Score: 1, Informative

      Yes, Active Directory enviroments still strongly encourage storing the password in MD4 hash format. Also, to take advantage of the 802.1x authentication client built into XP SP2 also requires the password to be stored in MD4 hash format.

      When you read "nobody" on Slashdot, it means that the egos of SiliconEntity and Zonk does not allow them to believe that there is an entire world that use something different than they do. If you replace "nobody" with "I" (or royal "we") then the comment makes more sense, such as "We the Zonk choose never to use MD4."

      The biggest problem with this "me" focused view of the world is that it misses where people indirectly use MD4. If you are an employee, have a bank account, use any utilities, belong to an HMO/PPO, etc. then you probably indirectly trust MD4 to authenticate who has access to your personal information.

      Thank goodness for disclosure laws like in CA which at least notify use after the fact how truely 0wn3d our personal data is.

  54. obfucated by Anonymous Coward · · Score: 0

    Are these entries for IOCCC or what?

  55. Not the death knell for MD5 (yet) by Dr.+Zowie · · Score: 1

    MD5 may be useless for producing undeniable electronic signatures (proof that Alice generated and distributed a particular document, in the face of Alice's denials) but it is still quite useful for fraud prevention (proof that Eve generated and distributed a particular document, despite her claims that it came from Alice).

    That is because the algorithm can produce pairs of documents that share a single hash, but that single hash is now known in advance.

    Thus, Alice can use the algorithm to produce a pair of documents that have the same (unknown-in-advance) hash -- but, once Alice has signed a document with MD5, Eve still can't easily produce a document that has the same (known-in-advance) hash as Alice's original screed.

  56. breaking torrents? by OrangeTide · · Score: 4, Insightful

    Ah! That's a very good point.

    now if you you were a software company you could put torrents out (I assume they use blocks of MD5sum), and then after the torrent becomes popular start randomly seeding people with blocks that hash correctly but are complete garbage (since you can't pick what exactly you hash). if you do it right you would have games out there that would still mostly run. but would crash, or have garbled game data, etc.

    I'm not sure if this is really all that useful. but this exploit certainly seems to make it easy to do.

    --
    “Common sense is not so common.” — Voltaire
    1. Re:breaking torrents? by dougmc · · Score: 3, Insightful
      That's what I was thinking -- this being used to break torrents and other p2p setups.

      Though to be fair, most games seem to come in the form of a compressed archive of some sort -- either a bunch of .rar files (for warez) or a .exe file with .cab files (for Windows installers) or something similar. In that case, the corruption would be detected at installation, though it wouldn't be easy to determine from the torrent exactly which blocks are corrupt.

      In short, MD5 being broken (and now the code being available) is very bad. I expect to see the anti-piracy vigilantes jumping on this very quickly to create code to totally break bit torrent and similar things -- it would come up and look like a seed, but would be spewing garbage that the other clients couldn't detect as garbage. Of course, such poisoning would also end up being used to corrupt completely legitimate torrents as well, just because people think it's fun.

      (The fix? Update things like bittorrent to use hash routines that haven't been cracked yet, or to use multiple hash routines on the same blocks.)

    2. Re:breaking torrents? by swillden · · Score: 2, Interesting

      now if you you were a software company you could put torrents out (I assume they use blocks of MD5sum), and then after the torrent becomes popular start randomly seeding people with blocks that hash correctly but are complete garbage (since you can't pick what exactly you hash)

      Yeah, they could do that. They could even use the same trick as the guys who showed the Postscript exploit and make the software do different things based on which of the collision pair is in the file. Get the right version and the software runs fine. Get the wrong version and the software formats your drive.

      In case it's not clear, the attack is to write some code like:

      byte a[] = { /* One collision plaintext block here */ };
      byte b[] = { /* Other collision plaintext block here */ };

      // ...

      if (memcmp(a, b, sizeof(a)) == 0)
      {
      // Do something useful
      }
      else
      {
      // Do something malicious
      }

      The legit version contains the same value for a and b, but the bogus version contains different values. Both programs hash to the same value, but they behave differently.

      This sort of attack can work wherever you can compare blocks and then follow different codepaths based on the result. It could probably work with the scripting language that DVD menuing systems are implemented in, too, so it could work for DVD ISOs. Audio and video streams would probably not be vulnerable.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    3. Re:breaking torrents? by sla291 · · Score: 1

      actually, bittorrent uses SHA1, so there's no need to worry for now.

    4. Re:breaking torrents? by joecr · · Score: 2, Informative

      One Minor problem is that BitTorrent actually uses SHA not MD5 for it's hashing. Check out the following link for more information. http://dessent.net/btfaq/#hash-check

    5. Re:breaking torrents? by Microlith · · Score: 1

      Bittorrent, at last check, used SHA-1.

      This will change too, eventually, but not as quickly if it were using MD5.

    6. Re:breaking torrents? by yahyamf · · Score: 1

      I don't think this will be much of a problem. The hash algorithm can be changed easily and most people are quick to update their P2P clients.

    7. Re:breaking torrents? by spitzak · · Score: 1

      I would think it would be easier to put out pre-broken software, perhaps it breaks after a certain date or when something happens on the internet. No need to find a hash collision, and this would work no matter what hashing is used.

    8. Re:breaking torrents? by kabloom · · Score: 1

      You could also join a torrent you didn't like and start sending out corrupted blocks to its users. Unfortunately, the RIAA would love that :(

    9. Re:breaking torrents? by m50d · · Score: 1
      (The fix? Update things like bittorrent to use hash routines that haven't been cracked yet, or to use multiple hash routines on the same blocks.)

      No, multiple hash routines *does not work*, as is said in every single hash related thread. Fortunately I'm pretty sure BT uses a newer hash already.

      --
      I am trolling
    10. Re:breaking torrents? by OrangeTide · · Score: 1

      Well the MD5 exploit does not allow you to take a hash and make up a piece of data for it. It just lets you generate multiple pieces of data that all hash to the same thing. So you would have to be the one who created the torrent to do this trick.

      --
      “Common sense is not so common.” — Voltaire
    11. Re:breaking torrents? by dougmc · · Score: 1
      No, multiple hash routines *does not work*, as is said in every single hash related thread.
      I must have missed it. I'll assume that you know what you're talking about here, but some more detail would be nice. MD5 has been broken (or at least collisions can be generated relatively easily now), MD4 is broken, and CRC32 is almost trivially broken (since it's a hash, but not an appropriate hash for cryptographic work.) In any event, if I have CRC32, MD4 and MD5 hashes for a given string, wouldn't it be massively more difficult to find another string with the same values for all three hashes, even though each hash has been `cracked' individually? (A citation to some sort of discussion would be useful, rather than just saying `no'.) (And if three hashes is too much, feel free to pick any two hashes from that list, though of course I'd be more impressed by breaking MD5+MD4 than MD5+CRC32, if not only because there's 96 more bits to worry about.)
      Fortunately I'm pretty sure BT uses a newer hash already.
      I was assuming that it used MD5, but apparantly it's SHA-1 (judging from the other responses to my post.) But isn't SHA-1 just a few steps further from the grave than MD5 (for lack of a better way of putting it?) That, and BT wasn't the only thing I was thinking of, though maybe none of the p2p systems use MD5.
    12. Re:breaking torrents? by m50d · · Score: 1
      I must have missed it.

      The first one this story seems to be http://it.slashdot.org/comments.pl?sid=168395&cid= 14040484.

      In any event, if I have CRC32, MD4 and MD5 hashes for a given string, wouldn't it be massively more difficult to find another string with the same values for all three hashes, even though each hash has been `cracked' individually? (A citation to some sort of discussion would be useful, rather than just saying `no'.)

      I used to provide detailled explanations for why it isn't more difficult but got fed up. I suppose it could be argued that it would be better not to post at all, but at least this way you know it isn't significantly more difficult and can look up more. Briefly, and approximately, most (read: every real-life one) attacks on hash functions both allow you to generate arbitrarily many plaintexts with a particular hash with only a little more effort than making two, and will also allow you to get your two things with the same hash from a (sufficiently large) list of plaintexts as easily as getting two things with the same hash given a "free" area (which is, again, how every real-life attack I've seen works). So say you were doing the thing with the postscript documents that was used to demonstrate an MD5 attack recently - they basically need two sections like 'var x = "xxxxxxxxxx"' and 'var x = "yyyyyyyyyyy"' with the hash of both being the same, and then because the same thing appended to both will give two things which also have the same hash (arguably a flaw, but necessary to be able to do nice things like hashing user input or stuff coming over the network) we can make the documents we want. Now, in their example they used the fast MD5 attack to only generate two such strings, but it's just as easy (almost, it's IIRC something like 1.5x as many cycles to generate 3, 1.75x as much to generate 4 and so on) to generate as many as you like, given enough bits. So if we're e.g. taking MD5 and MD4, they're IIRC both 64-bit hashes, so we start by saying we'll have a 128-bit xxxxxx string. Using our MD5 attack we generate (approximately) 2^64 of them with the same MD5. Now, I haven't studied the MD4 attack in as much detail as I'd like to, but I would be amazed if it could not be applied to find a collision among these 2^64 strings (on average, such a collision exists; to make it more certain, add a few bits, start with maybe 140-bit strings and generate 2^70 of them or so). If it isn't, we can do it the other way round - use the MD4 attack to generate 2^64 strings for which 'var x = "xxxxxxxx"' has the same MD4, and then apply the MD5 attack to find a collision among these - and that I know for certain would work.

      But isn't SHA-1 just a few steps further from the grave than MD5 (for lack of a better way of putting it?)

      Up to a point, but it's still stronger than MD5 ever was. Anything where the hash needs to be secure long-term needs to move away from SHA1 as quick as practical, and I wouldn't trust it against NSA etc. But neither of these really applies to P2P - unlike say a digitally signed contract where being able to generate one with a matching signature months or years later is security-breaking, being able to generate a fake file with the same hash even a day after it was downloaded does an attacker no good. So we can pretty much keep using SHA1 for bittorrent up to the moment there is a practical attack, though of course we need to make sure the infrastructure is in place to move to another hash immediately that happens. It's something to bear in mind, and it would certainly be an incredibly stupid idea to use SHA1 for any new protocol, but it's not an immediate problem.

      That, and BT wasn't the only thing I was thinking of, though maybe none of the p2p systems use MD5.

      I think we're lucky in that the timescales are such that any P2P system would not have used MD5 for situations that need a secure hash. MD5 has been known t

      --
      I am trolling
    13. Re:breaking torrents? by dougmc · · Score: 1
      almost, it's IIRC something like 1.5x as many cycles to generate 3, 1.75x as much to generate 4 and so on
      I don't really have enough information to figure out the progression here. How long would it take to generate 2^64 files with the same MD5. How about 2^128? I'm guessing it would be a huge amount of time.
      So if we're e.g. taking MD5 and MD4, they're IIRC both 64-bit hashes, so we start by saying we'll have a 128-bit xxxxxx string. Using our MD5 attack we generate (approximately) 2^64 of them with the same MD5.
      Ok, but they're 128 bit, not 64 bit. But even so, 2^64 is a huge figure. Even 2^32 for a CRC32 ought to make the MD5+CRC32 attack several orders of magnitude more difficult, though not 4 billion times more difficult if the MD5 attack really does let you generate additional collisions cheaply.

      In any event, I'll look for the `Joux Multicollision paper from 2004 CRYPTO' for more. Thanks for the pointers.

    14. Re:breaking torrents? by m50d · · Score: 1
      I don't really have enough information to figure out the progression here. How long would it take to generate 2^64 files with the same MD5. How about 2^128? I'm guessing it would be a huge amount of time.

      No, it would be just less than 2x as long. The actual numbers are possibly something different, but there's a finite limit, some n for which you don't take more than n times as long however many collisions you want to generate.

      --
      I am trolling
    15. Re:breaking torrents? by dougmc · · Score: 1
      I think we're lucky in that the timescales are such that any P2P system would not have used MD5 for situations that need a secure hash. MD5 has been known to be on shaky ground (it was in pretty much the same situation SHA1 is now) for IIRC over a decade now.
      I just realized something that does use MD5 files that would matter -- par2 files, usually used on Usenet. Fortunately any attack involving them won't be as devastating as one against p2p clients.
  57. Warning: dangerous typo in parent by Dr.+Zowie · · Score: 1

    Shit. I meant to type "...is not known in advance..." but instead typed "...is now known in advance...", completely reversing the meaning!

  58. You can spoof (almost) arbitrary documents! by ChaosDiscord · · Score: 4, Interesting
    Maybe I misunderstand but as I understand it MD5s are normally used in a checksum manner to sign or provide a fingerprint of a document. If you have an original document and compute it's MD5 then it can match some certified MD5 check sum. If someone were to generate a fake document they coul dnot design it to match the MD5 fingerprint. They could create some bit of gibberish that did match it but not a document that was useful as a forgery.

    Most document formats have lots of "dead space", parts you can pretty much modify at will without changing what the user actually sees. Comments in HTML or PostScript. Old junk data in Word documents. Executables can have just about anything you like added if you know your stuff. The MD5 attacks currently available only 128 "dead space" bytes to generate a collision. So far from being a gibberish document, one can generate almost any document you want. This page has a simple example with PostScript files. Both files have the same MD5 hash, but one is a relatively harmless letter of recommendation while the other is a grant of security clearance. Get your boss to sign your letter of recommendation digitally, swap in the security clearance file, and pass it on. This is a Big Deal and a Major Problem.

    1. Re:You can spoof (almost) arbitrary documents! by k98sven · · Score: 1

      The MD5 attacks currently available only 128 "dead space" bytes to generate a collision. So far from being a gibberish document, one can generate almost any document you want.

      Only if you can manipulate both documents, and tailor the original documents MD5 as well as that of your target document.

      I don't think that's what the grandparent poster was referring to at all.

    2. Re:You can spoof (almost) arbitrary documents! by prockcore · · Score: 1

      This page has a simple example with PostScript files. Both files have the same MD5 hash, but one is a relatively harmless letter of recommendation while the other is a grant of security clearance.

      Except that if you look at the actual postscript source of both files, you'll see they *both* contain both chunks of text. The only difference between the two ps files is a byte passed to an if-statement which determines which block to display. It takes 300 bytes of hash gibberish to counter the alteration of 1 byte.

    3. Re:You can spoof (almost) arbitrary documents! by Anonymous Coward · · Score: 0

      what the hell are you gibbering about?

  59. Easy cracking of linux user passwords by camcorder · · Score: 0, Redundant

    Actually this tool makes cracking linux user passwords using glibc 2.2. As this version of glibc using MD5 to encode in crypt function. It would only take matter of time to crack if an attacker get shadow file (or maybe passwd file on systems which shadow suite is not installed). There's no reason to use dictionary attack or something, just find collision text and use this text as login password.

    I really find this extremely scary as there're lots of p0wned machines around now it will be easy for attackers to hide themselves (ie. they won't need to keep open port to be uidzero after first intrusion) Actually it's better to patch crypt function to use other more secure algorithms.

    1. Re:Easy cracking of linux user passwords by Anonymous Coward · · Score: 1, Insightful

      Rubbish. This finds a collision between two "randomly" generated streams of data. Even if it could generate a collision for a known piece of data you would still need the original document - namely the password and salt, and if you already have the password then you don't need to find a collision.

      It won't magically produce a stream which hashes to a desired value - that's still the (infeasible) domain of brute force.

      Sure there are implications as mentioned with the two postscript documents (and other areas, I'd imagine), but unless you have a hand in producing the original document(s) it's not _such_ a biggie.

    2. Re:Easy cracking of linux user passwords by camcorder · · Score: 1

      Security is about being paranoid. Half of the precautions taken in most of secure sites are just for taking precaution. Why using an algorithm that has known issues instead of there're more robust ones. With ever increasing cpu powers there's no need to keep this algorithm since there're more secure ones around.

      Besides you can easily use the text as password if they both produce same hash. gettty will only check if hashes are equal not if your password and the random text is equal. Since their hashes are equal then they are same in front of gettty's eyes.

  60. It's chugging away... at something... by LithiumX · · Score: 1

    I downloaded the source, and took a quick look to make sure it wasn't really something intended to do something nasty.

    I then realized that, going over it, it was a little like a Neanderthal trying to figure out how a swiss army knife worked... and just compiled and ran the damned thing.

    I scanned over a good bit of the posts here, hoping for some info, and R'd a number of TFA's, but am not much closer to figuring out what I can actually do with this thing (or what collisions are for that matter). I'm even confused by it's parameters (you can pass it either four params or none at all. I'm not clear on what those params actually are for, or what the purpose in the default entries are).

    Does anyone have a reference to some useful info that a caveman could understand, or shed some light on how this actually threatens my stylishly antiquated md5 shadow file?

    We apologize in advance for any actual evolutionarily-challenged cave dwellers who may take offense at the above post.

    --
    Do not confuse "Freedom of Choice" with "Free Will".
  61. md5 verification is for basis crc style integrity by Splork · · Score: 1

    anyone who uses an MD5 published on a ftp server or website for that is being seriously misled. a secure hash would have to be signed using DSA or RSA in order for it be meaningful (md5 hasn't counted as a secure hash in eons for that purpose) verified by its public key signed by trustworthy-to-you roots of trust. the .pub files you see on linux kernel downloads for example are exactly this pk signed secure hash.

    the fact that you see people post md5s of things or do stupid things like include md5sum files in torrents (hello! bittorrent data is all identified and guaranteed by its SHA1s already) doesn't mean those people have a clue.

    the md5sum file of a download is only useful as a final integrity check to make sure no bits were messed up going over the wire or more likely by your own cheap ass non-ecc ram, overclocked cpu, unstable kernel or lousy hard drive.

  62. Editors: Change the Links by theelemur · · Score: 2, Informative

    The Coral Cache is extremely slow. Please point the links to the original site www.stachliu.com/collisions.html as the html is small/static and the machine it's sitting on has a decent (100mb) connection. Thanks.

  63. I think it's the opposite by Anonymous Coward · · Score: 0

    Now that the collisions are so easy to find, the RIAA will have a harder time (along with those bastards at BayTSP) proving that song downloaders have infringed. It's called "plausible deniability."

    1. Re:I think it's the opposite by fishbowl · · Score: 1


      >Now that the collisions are so easy to find, the RIAA will have a harder time (along
      > with those bastards at BayTSP) proving that song downloaders have infringed. It's
      >called "plausible deniability."

      So far, they have rarely if ever had to prove anything at all. The people they have sued have been essentially guilty, or else settled without calling the question in court, or else were boundary cases like grannies and infants.

      As for the whole "losses to file sharing" argument, I need to be convinced by a record (or movie) company declaring its losses to "piracy" on its annual tax return. That's my personal minimum standard for credibility of the claim.

      --
      -fb Everything not expressly forbidden is now mandatory.
  64. I may not have been clear enough... by ^BR · · Score: 3, Interesting

    Or you may have a brain tumor...

    The generaly accepted definition for "cracking a password" is, given the encrypted password, being able to generate a password the once entered in the authentication system will generate the same encrypted form.

    For the second time, this attack does not permit that. This attack permits build two byte streams that hashes to the same value. So in a password context, assuming the password system permits the entry permits more than 1024 bits of arbitrary data to be entered as password (I don't remind the details well, but I think this is the amount of data that you must be able to change between the two streams) one could generate to "passwords" (being that long they don't qualify for this name anymore IMHO) that would hash to the same value.

    How does that amount to an attack on MD5 passwords again? Never at any point the attack had been being able given a unknown to him MD5 hash to produce a stream that would hash to the same value.

    1. Re:I may not have been clear enough... by Fallingcow · · Score: 1

      Ok, I misread. Sorry.

      But you're really uptight.

      Chill, dude. :P

  65. w00 by Anonymous Coward · · Score: 0

    unsigned int m0[32] = {
    0x87472b83, 0x65d756a2, 0xfcfc845c, 0x13e3b751,
    0x5641f279, 0x412fbde5, 0xc30f8345, 0xe7407b20,
    0x06342d31, 0x625493ef, 0x7a949467, 0x55e92787,
    0x87f390f9, 0x70af1e1f, 0xda1d3b84, 0xf943cfa3,
    0x7e545b32, 0x1a120060, 0x3ba3a4d1, 0xfa5d29a0,
    0x172bd4ba, 0x1d799f03, 0x26a81918, 0xc0c1f8d3,
    0x74b4678d, 0xc744d004, 0x29464f42, 0x5de2c876,
    0x47dac8d9, 0x1717cc87, 0x3d20a7bf, 0xa91ae3ce,
    };

    unsigned int m1[32] = {
    0x87472b83, 0x65d756a2, 0xfcfc845c, 0x13e3b751,
    0xd641f279, 0x412fbde5, 0xc30f8345, 0xe7407b20,
    0x06342d31, 0x625493ef, 0x7a949467, 0x55e9a787,
    0x87f390f9, 0x70af1e1f, 0x5a1d3b84, 0xf943cfa3,
    0x7e545b32, 0x1a120060, 0x3ba3a4d1, 0xfa5d29a0,
    0x972bd4ba, 0x1d799f03, 0x26a81918, 0xc0c1f8d3,
    0x74b4678d, 0xc744d004, 0x29464f42, 0x5de24876,
    0x47dac8d9, 0x1717cc87, 0xbd20a7bf, 0xa91ae3ce,
    };

  66. Bad Idea by multipartmixed · · Score: 1

    Your suggestion implies that you OS uses MD5 for crypt(2) [or crypt(3C)]. It does not. It uses whatever password hashing function your OS uses. So, it could be 3DES!

    If you care about portability/repeatability, you'd be better off linking in OpenSSL.

    --

    Do daemons dream of electric sleep()?
    1. Re:Bad Idea by cortana · · Score: 1

      Oops, crypt(3) it is. Always get those two sections mixed up. Anyway, one would hope one's OS had a better crypt than just plain DES. The idea is to get your OS to do it properly rather than do some dodgy fixed string concatenated with the SHA1 of the MD5 of the user's password like the original post suggested...

    2. Re:Bad Idea by multipartmixed · · Score: 1

      Hope all you want, I'm 98% certain that Solaris 9's crypt function is still 3DES by default -- but you can theoretically add more crypt methods with security plug-ins.

      I'm also willing to bet that there are other mainstream OS's out there with weak password hashing, although IIRC Linux isn't [by default] one of them anymore.

      --

      Do daemons dream of electric sleep()?
  67. I believe SSL does just this by arevos · · Score: 1

    IIRC SSL XORs an MD5 and SHA1 hash together and uses the resulting combined hash.

  68. Re:It's chugging away... at something... by atomm1024 · · Score: 1
    or shed some light on how this actually threatens my stylishly antiquated md5 shadow file?

    It doesn't. These programs generate two files which hash to the same value. They do not generate new files mapping to an existing hash. (There'll probably eventually be a way to do that, but not yet.) Your shadow file is safe, pretty much.

    --
    Signature.
  69. SHA1 is not a good alternative in some cases by chongo · · Score: 3, Interesting
    SHA1 is not a good alternative in some cases. For details on the cryptographic hash problem, see my paper:
    SHA1 Cryptographic Hash Update
    My paper talks about the general problem at a high level. It gives a summary of common opinions expressed at the NIST Cryptographic Hash conference. Moreover, it gives developers some specific cryptographic hash recommendations.

    For the impatient, here is a summary for my recommendations for 2005-2006:

    • Avoid non-standard cryptographic hashes
    • Stop using MD5 now except for:
      • MD5 HMAC and MD5 hashed Passwords
      • Replace MD5 HMAC and MD5 hashed Passwords with SHA256 or SHA1 before end of 2007
    • Existing applications that use SHA1, where possible, should changed to use SHA256 before the end of 2008
      • For interoperability with older applications and hardware, these applications may have to also support SHA1
      • If you must support both SHA256 and SHA1, take care so that a "man in the middle" cannot inappropriately downgrade
    • Until a new Advanced Hash Standard (AHS) is adopted, new applications and hardware should be designed to use SHA256
      • For interoperability with older applications and hardware, these applications may have to also support SHA1
      • If you must support both SHA256 and SHA1, take care so that a "man in the middle" cannot inappropriately downgrade
    • All new applications and protocols must be designed to be algorithm agile
    • Existing applications and protocols should be modified to be algorithm agile by the end of 2008, if not sooner
    • SHA384 or SHA512 may be used in place of SHA256 in the above examples
      • Keep in mind that SHA384 and SHA512 are slower and larger than SHA256 or SHA1
    • Because it is possible that SHA1 will become unacceptably weak before 2008, and because SHA256 may become vulnerable to attack before Advanced Hash Standard (AHS) is adopted, a defense in depth approach must be taken

    See the paper for mode details.

    --
    chongo (was here) /\oo/\
    1. Re:SHA1 is not a good alternative in some cases by scovetta · · Score: 1

      Another good comment made at the NIST conference was that there really is no theoretical proof for the existence of a one-way function. Hash functions are black magic that just "happens to work" at this point. Also, there's no reason why factoring large numbers has to be so hard? (does there exist an O(n) or O(1) solution to factor an arbitrarily large number?)

      So I agree with you that algorithm agility will be the only way to keep ahead of the progress made by researchers (Xiaoyun Wang et al.), at least until some serious thought goes into AHS.

      By the way, the full conference proceeding are available here.

      --
      Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird. --Nietzsche
  70. Hypothetical???? by Anonymous Coward · · Score: 0

    "...which have been purely hypothetical until now."

    Because source code hasn't been handed out to the public does not mean that any of this is "purley hypothetical." Much mathematical documentation has been released to the public on the subject.

  71. Has anybody warned Microsoft? by Anonymous Coward · · Score: 0

    They are still authenticating their ISO images with CRC... Maybe they should upgrade to MD5... (SHA-256 maybe too advanced for then...)
    http://msdn.microsoft.com/vstudio/express/support/ install/

  72. Editors: For the love of god, change the links! by theelemur · · Score: 1

    That Coral Cache link has gone from being slow to not resolving (for me).

  73. still won't be easy to exploit by u19925 · · Score: 1

    even if you use this algorithm, it is likely to produce a bigger document. this fact can be used to in court to challenge. the only problem would be that someone has to find manually that the document has been forged. the second part could be that the forged document must not leave any signature of being forged (e.g. must not contain redundant data etc...). so yes, i would advise people not to use MD5 anymore, but I would still say it is okay to take some time and safely migrate to another algorithm rather than panic.

  74. Must control both docs by Anonymous Coward · · Score: 0

    MD5 is only vulnerable if the attacker controls the documents.

    Anybody can create two very different looking documents that have the same MD5 hash.

    On the other hand, if you create a document with an MD5 hash it's VERY difficult for somebody else to could come up with a different document (gibberish or otherwise) that has the same MD5 hash.

  75. MD5 collisions is a non-issue by slashdotmsiriv · · Score: 1

    IMHO MD5 collisions is a non-issue.

    The described attacks would only work when you are dealing with active document types and only for the particular attack scenarios.

    Because of the unfortunate properties of SHA-1, MD-5 like hash functionality implementations, if you find collisions x1, x2 such that hash(x1) = hash(x2) you can have the two different string x1||S and x2||S hashing to the same value, hash(x1||S)=hash(x2||S).


    This is because these hash functionality is usually implemented as follows: start by hashing the first block (substring) of the string then take the output and hash it together with the second block, take the output and hash it with the 3rd block etc. If the first block of two different strings hashes to the same value while the rest of the string is identical you will get a collision. What these guys in Germany do is that they find a collision h(R1)=h(R2) (using the well known technique from China) enter it at the start of the ps document and then generate two different documents that look like this:
    A || "if A == R1 present S1 else present S2" || S1 || S2 . The first document has A=R1 and the second A=R2. Obviously these documents differ only by A and because A hashes to the same value they both hash to the same value. Caesar sees the presentation of the document that contains R1, which is S1, he likes it and signs its hash. Alice takes this signed hash and presents it as the signature of a document that contains R2 and effectively. But this document prints S2 and Caesar does not like S2.

    This is only usefull if Caesar is stupid enough to sign a document that looks like this. Of course, Caesar being aware that he runs an empire..., will for sure look at the source code of the documents he signs.

    Some people also referred to the problem that arises with signing binaries or source code. As somebody else mentioned in this forum, in these cases you trust the signer and the trusted signer (e.g M$) would never create a binary or source code file that looked like:
    R1 || "if A==R1 run good application else run virus" ||good application||virus. Even if the signer/coder was idiot enough to do it, you would still have to find R2 such that hash(R1)=hash(R2). In this case you should be able to find second pre-image of the MD5 hash, not a collision. Second preimage means, that given an y = hash(x) you can find x'such that y=hash(x'). Nobody has even come close to devising a technique for that.

    If you could find second pre-images, you would be able to wreak havoc, for example by distributing fake Linux distros with the first block of the iso's being an MD5 second preimage of the valid distro.

    In conclusion, I will still use MD5 and SHA-1 until they come up with a practical attack. I think these guys have had enough publicity for nothing already.

    1. Re:MD5 collisions is a non-issue by Anonymous Coward · · Score: 0

      Actually, you are incorrect. You can generate collisions at any point in a file simply by starting with the partial MD5 sum up to the point in the file that you want to insert the block at. Simply pass the 4 32-bit words of the MD5 hash to the example code and it will generate two blocks that fit into the file at your chosen point and the final file will have the same hash. Whether or not this makes real attacks any easier is up for speculation, but it at least allows for an attacker to attack existing ISO images by submitting a minor change to a piece they own, for which they know the exact location it will fall in the ISO file. It wouldn't be that hard if the ISO build was scripted, the change came immediately after a release, and was a "critical bug fix" or something similar. Thankfully, open source already has a fully open model in which this is unlikely to succeed simply because someone will look at the patch and notice it's a goofy binary only patch. Hopefully.

      Regarding your second point about the start of applications: It wouldn't be hard to insert a block into the file header which would change the entry point into the program or some similar attack which would be difficult to detect. Just hide the actual attack in machine language as inline data in the code, and jump to it's address from the header. There are a lot of attack possibilities, but so far they do all require a breach of trust between trusted parties, and protect against third party attacks.

    2. Re:MD5 collisions is a non-issue by slashdotmsiriv · · Score: 2, Informative

      "You can generate collisions at any point in a file simply by starting with the partial MD5 sum up to the point in the file that you want to insert the block at.
      " That's incorrect. MD5 operates over 512-bit blocks and uses state generated from the hash of the previous blocks to generate the new state for the current block. The state is actually the 4 words of the 128 digest.
      Here is the alg from wikipedia:
      init h0
      init h1
      init h2
      init h3
      for each 512-bit chunk of message
      break chunk into sixteen 32-bit little-endian words w(i), 0 i 15 //Initialize hash value for this chunk: var int a := h0
      var int b := h1
      var int c := h2
      var int d := h3

      //Main loop:
      for i from 0 to 63
      if 0 i 15 then
      f := (b and c) or ((not b) and d)
      g := i
      else if 16 i 31
      f := (d and b) or ((not d) and c)
      g := (5×i + 1) mod 16
      else if 32 i 47
      f := b xor c xor d g := (3×i + 5) mod 16 else if 48 i 63
      f := c xor (b or (not d)) g := (7×i) mod 16

      temp := d
      d := c
      c := b
      b := ((a + f + k(i) + w(g)) leftrotate r(i)) + b
      a := temp
      //Add this chunk's hash to result so far: h0 := h0 + a
      h1 := h1 + b
      h2 := h2 + c
      h3 := h3 + d
      var int digest := h0 append h1 append h2 append h3 //(expressed as little-endian) What these guys have done is to find 512 bit blocks or larger that return the same hash given the same initial state h0, h1, h2, h3. As u can see above, if you find a collision for state i, ho, h1, h2, h3 the same collision will not work for the different state MD5 will have say after processing a number of blocks. It is computationally feasible to find many collisions for the initial state but to find collisions for all possible states and to have these collisions occuring in the original file is very hard. So, no you cannot simply search the file for blocks that are among the blocks for which u found a collision and then simply replace them with their collision. The MD5 for which you found the collision and the MD5 that is used over the file are essentially two different functions and will not hash to the same value. As for the rest of your comments: I hope you realize that you don't just give their program an MD5 hash and it outputs two values that hash to that output. MD5 has been shown not to be collision resistant but it is still preimage and second preimage resistant. I explicitly said that we assume the signer is trusted, you did not need to say anything about trusted parties. If the signer is not trusted the whole signed code scheme is meaningless.

  76. P2P Poisoning? by lilmouse · · Score: 1

    So, since the RIAA knows the SHA-1 hash and one plaintext "string" of (who's popular these days?) "The Attack of the WereChicken" (or was that cucumber?), does that mean better poisoning of P2P networks soon?

    --LWM

  77. w00 by Anonymous Coward · · Score: 0

    unsigned int m0[32] = {
    0x87472b83, 0x65d756a2, 0xfcfc845c, 0x13e3b751,
    0x5641f279, 0x412fbde5, 0xc30f8345, 0xe7407b20,
    0x06342d31, 0x625493ef, 0x7a949467, 0x55e92787,
    0x87f390f9, 0x70af1e1f, 0xda1d3b84, 0xf943cfa3,
    0x7e545b32, 0x1a120060, 0x3ba3a4d1, 0xfa5d29a0,
    0x172bd4ba, 0x1d799f03, 0x26a81918, 0xc0c1f8d3,
    0x74b4678d, 0xc744d004, 0x29464f42, 0x5de2c876,
    0x47dac8d9, 0x1717cc87, 0x3d20a7bf, 0xa91ae3ce,
    };

    unsigned int m1[32] = {
    0x87472b83, 0x65d756a2, 0xfcfc845c, 0x13e3b751,
    0xd641f279, 0x412fbde5, 0xc30f8345, 0xe7407b20,
    0x06342d31, 0x625493ef, 0x7a949467, 0x55e9a787,
    0x87f390f9, 0x70af1e1f, 0x5a1d3b84, 0xf943cfa3,
    0x7e545b32, 0x1a120060, 0x3ba3a4d1, 0xfa5d29a0,
    0x972bd4ba, 0x1d799f03, 0x26a81918, 0xc0c1f8d3,
    0x74b4678d, 0xc744d004, 0x29464f42, 0x5de24876,
    0x47dac8d9, 0x1717cc87, 0xbd20a7bf, 0xa91ae3ce,
    };

    real 49m56.628s
    user 44m52.036s
    sys 0m5.060s

  78. Regarding security through obscurity by slavemowgli · · Score: 1

    Actually...

    Regarding security through obscurity, I think your comment shows a rather common misconception, namely that every kind of obscurity that might be construed as improving security (rightfully or not) is automatically bad, and (as a corollary) that security through obscurity is automatically bad.

    I don't think that's true, though. Security through obscurity is not real security, but and you should never rely on it - never alone, of course, and also not in conjunction with other security mechanisms -, but that doesn't mean it can't be useful.

    It's doubtful that it would be in the case you mention, of course - obscurity, for hash functions, cryptographic algorithms etc., usually just means that they aren't well-studied (at least not as much as others) and might have weaknesses nobody has found yet, but in *general*, obscurity can provide an additional deterrent against attacks that can be useful, especially in real-world applications where not everything is black and white.

    Here's an example: a few years ago, I was running the SSH server on one of my computers on a non-standard port. From a theoretical point of view, that doesn't change anything: somebody doing a port scan would still find it easily, so if a weakness were found in SSH (the protocol) or OpenSSH (the implementation), my server would've been just as vulnerable as any other server out there. But still, if a script kid had decided to take a tool that scans for vulnerable SSH servers and roots them, mine probably would've escaped; and the constant password brute-forcing attempts from Asian IPs stopped as well. So while using a non-standard port was utilising obscurity, it did make me more secure, in real-life terms (even though the same wasn't true in terms of theoretical security).

    How does this apply here? Not at all, really, as I said above - you're better of using a well-tested hash function like SHA-512 (why restrict yourself to SHA-256? The extra bits will keep you safe for a little longer when SHA-256 is broken - and it *will* be broken, eventually) than an obscure hash function that noone ever heard about.

    But the automatic reaction that IT professionals seem to have that obscurity is *always* bad is rubbish. As long as you don't rely on it (that is, as long as your system would still be secure even if the obscurity were gone), it's not only not bad, but in fact can be a valuable tool.

    Just food for thought.

    --
    quidquid latine dictum sit altum videtur.
    1. Re:Regarding security through obscurity by jd · · Score: 1
      In some cases, I would completely agree with you. Where obscurity doubles the number of possibilities for a crypto or hash function that need to be examined, you increase the effective number of bits of that function by 1. Thus, in such cases you can use obscurity to increase the effective length of the cryptographic key.


      In other cases, it merely means that the code has been inadequately examined and (as such) cannot be considered totally trustable.


      The problem is in knowing which case applies, at any given time.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Regarding security through obscurity by jonadab · · Score: 1

      > Security through obscurity is not real security, but and you should never rely on it - never
      > alone, of course, and also not in conjunction with other security mechanisms -, but that doesn't
      > mean it can't be useful.

      Obscurity, by itself, does not constitute security. However, obscurity, in one form or another, used correctly, is a *vital* component of every (worthwhile) security system of which I am aware.

      In the case of hashing, it's not the hash algorithm that needs to be obscure. For some uses of hashing, it's the source text that needs to be kept obscure, which is the whole point of hashing it in the first place. (This is the case e.g. with password hashing.) Other uses of hashing have different characteristics.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  79. Hash functions necessarily have collisions by Metuchen · · Score: 1

    A hash function is essentially a very lossy compression algorithm. Because it's lossy, there is no way to recreate the original file from the hash; thus, it is trivial to prove that there are multiple original files matching the same MD5 hash. Doing this in a brute force fashion may be easy to implement, but it would definitely be time consuming (for the CPU, anyway). What's truely fascinating here is that you can approach this from from the more "elegant" direct method; thus allowing the potential "masquerading" of malicious information as being legitimite. This is great work, even if it adds no real value (since it effectively breaks a working standard).

    On a side note, I sure hope there aren't any standing copyrights on MD5; otherwise, this work will be quashed by the DMCA in short order, and we'll have to revert to printing this code in tiny print on t-shirts like we did DeCSS

    --
    # They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety. --Fran
  80. Maybe. by jd · · Score: 1
    At the moment, there are a million and one APIs for hash functions and many of the basic libraries (such as the shadow password kit, PAM and SSL) use their own internal code to provide these functions. In addition, you've a lot of kits out there providing hash functions by reference (each using their own method of referring to a function) and not all kits provide all functions. gcrypt and mhash seem to do the best at providing a decent selection, but I would not call them either comprehensive or truly transparent.


    I've been throwing around for a while the idea of doing the same for crypto references the same thing that OS groups have done with well-known port numbers - have a standard list of names/numbers that everyone can refer to, so that swapping between implementations is easier and multi-toolkit packages don't have to have lots of complicated mangling to figure out who refers to what how.


    I've also been throwing around the idea of getting popular toolkits (such as SSL, the shadow password suite, etc) to have the ability to use some popular crypto library as an alternative to their internal functions, precisely so that users can update to new algorithms quickly and easily. It would seem to make a lot more sense than the existing approach.


    Finally, I've been proposing a much more standardized internal API so that whenever NIST or NESSIE or someone else runs a crypto contest, the example functions can be directly uploaded into the crypto libraries so that people can experiment with them, so that we can see what happens when you REALLY turn a lot of eyes onto these algorithms.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  81. Not "some percentage easier to violate" by alienmole · · Score: 1
    Yes, that's true, however, it doesn't negate the reduction in MD5s value. For one: any system that uses one-way hashed password storage or key generation mechanism, as many sites use, is now some percentage easier to violate because multiple strings may resolve to same hash.

    That's not correct, in general. The fact that "multiple strings may resolve to the same hash" is a given, with any hashing system, because the number of bits in the original is larger than the number of bits in the hash, so there are more combinations of strings than there are of hashes, therefore "multiple strings may resolve to same hash".

    This exploit makes it possible to generate collisions, but as the OP points out, that doesn't help you in the password hashing case since you don't know the text that you're trying to generate a collision for, and in any case, it seems the algorithm doesn't let you pick your source text.

    If the current result somehow shows that MD5 suffers from more collisions than other hash algorithms, then that might make it "some percentage easier to violate" than was previously known to be the case. As it is, the weakness of MD5 for password hashes doesn't seem to have been demonstrated.

  82. MD5+LEN by trollable · · Score: 1

    I'm currently using the following ID for my files: md5(filecontent)."-".length so it gives something like 1a566e4f3293bc8739113457634128b5-cb451. So am I safe? What is the probability to have two files with the same MD5 and the same length?

    1. Re:MD5+LEN by Anonymous Coward · · Score: 0

      MD5 and all the other hashing computations (SHA-1, RIPEMD-160 etc) already include the total length of the data in the final step of the hash computation. So you're not really adding anything.

    2. Re:MD5+LEN by kasperd · · Score: 1

      What is the probability to have two files with the same MD5 and the same length?

      All the MD5 collisions which have been produced so far had the same length. It is much more difficult to produce a collisions with two strings of different length. Which means your extra check is essentially worthless as far as security is concerned. But of course knowing the length of a file would tell you if that mismatching MD5 sum was caused by an incomplete download. And of course if the length doesn't match, you wouldn't need to spend time computing the MD5 sum.

      --

      Do you care about the security of your wireless mouse?
    3. Re:MD5+LEN by trollable · · Score: 1

      Thank you for your answer. It is very informative and very usefull. I will have to use a better hasher. Yes the length is usefull by itself. What is the best patent-free hasher (with an opensource impl) ? SHA-1 ? Of course, it must be short enough (it must be fit in an URL).

    4. Re:MD5+LEN by kasperd · · Score: 1

      What is the best patent-free hasher (with an opensource impl) ?
      I think there are open source implementations of all good hash functions.

      SHA-1 ?
      Probably not the best choice, there seems to be a litle doubt about how secure SHA1 is. It may soon be broken. I think there is one called SHA-384, which may be a better choice for you.

      Of course, it must be short enough (it must be fit in an URL).
      How long can you accept the hash to be? If the length of the string is important, then hexadecimal encoding is not the best choice. If you use base64 you can reduce a 384 bit hash from 96 to 64 characters. If 64 is too much, you will need a hash of less than 384 bits. It is supposed to be secure to just truncate the string to what length you find acceptable. But of course the shorter the hash is, the faster a collision can be found by brute force. AFAIR SHA-384 is in fact just SHA-512 truncated to 384 bits. And since it seems the length of the hash is more important to you than the CPU time spent, then a hash where the internal state is larger than the final result may be the best choice.

      --

      Do you care about the security of your wireless mouse?
    5. Re:MD5+LEN by trollable · · Score: 1

      Thanks for the answer. I'm going to use SHA-512+LEN. AFAIK, an URL is limited to 1024 chars, so it is ok. I prefer to use hex encoding (instead of base64). Looks cleaner. The CPU time is important but much less than avoiding collisions.

      How long can you accept the hash to be?
      About 600 chars for the encoded form. The whole URL is:
      yak:///SHA512-LEN/filename.ext

  83. Not a big risk by typical · · Score: 1

    Uh.

    My understanding is that Tiger has not undergone the degree of review that MD5 and SHA1 have. I've never heard of HAVAL.

    My vague understanding is that what the people here have done is made slightly more convenient an *extremely* limited specific attack based on a collision that someone else discovered. Basically, in my understanding, if you start the file/bitstream/whatever that you are hashing with their particular special block, they have a tool that can generate a bitstream that collides with the first one. However, it's pretty unlikely that anyone is going to have a file that actually starts with this block, unless they're trying to demonstrate the collision.

    Now, this is interesting for cryptographers, because there's a few things that were safe to do before that aren't any more. The big one is you can't have a system in which Person A hands Person B a file (of unconstrained content), which Person B then MD5 hashes and signs the hash of and returns to Person A. However, things like:

    * Passwords hashed with MD5

    * P2P files addressed with an MD5 hash or hash tree

    * Linux isos ...Are all still safe.

    The Slashdot article seems to be mostly frightening because the submitter, "SiliconEntity", is misleading Slashdot readers in an attempt to have a more exciting story.

    --
    Any program relying on (nontrivial) preemptive multithreading will be buggy.
    1. Re:Not a big risk by MechaStreisand · · Score: 1

      Actually, I think the P2P files and Linux (or FreeBSD) isos do have a problem with MD5 now. In those cases, the plaintext is easily available, and with both the plaintext and the MD5 sum, an attacker can make a corrupt version of the plaintext with the same MD5. They can't add in arbitrary code or whatever, but corrupt files that MD5sum correctly are still pretty bad.

      --
      Disclaimer: IANAL. This post is, however, legal advice, and creates an attorney-client relationship.
  84. Breaking your very own torrents? :-/ by Crouty · · Score: 1
    I fail to see why this may be a relevant scenario.

    Why would someone first put up a torrent and then poison it with bad blocks?

    **AAs are interested in poisoning data they did not publish themselves. They cannot make the uploader use blocks they have collisions for.

    * shrug *

    --
    On se Internetz nobody noes your German.
    1. Re:Breaking your very own torrents? :-/ by kabloom · · Score: 1

      **AAs are interested in poisoning data they did not publish themselves. They cannot make the uploader use blocks they have collisions for.

      Why not? they did, after all, publish the music in the first place.

    2. Re:Breaking your very own torrents? :-/ by petermgreen · · Score: 1

      well if this type of attack was availible for sha-1 (which it isn't yet) a game vendor (the ??AAs have less flexibility because of all the shit that tends to happen to audio or video before its posted) could put one of a colliding pair in the official release and use the other to poisen torrents (though other files before the game exe in the torrent could cause problems with this technique by misaligning the blocks).

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  85. TCPv6? by tepples · · Score: 1

    If TCP were to be redesigned today, it would most probably have a 32-bit cyclic redundancy check specified as an error check instead of the current checksum.

    In the IPv6 stack, is TCPv6 identical to TCPv4?

  86. Project Growth? by Skudd · · Score: 1

    A friend of mine has been running http://www.md5lookup.com/ for quite some time, and has been working on generating the hashes as fast and efficiently as possible.

    !!!Conceptual Blathering Below!!!

    Projects like this are a major threat to the "security" of the internet, given the fact that MD5 is still widely used. However, I am of the belief that the concept can be turned around and used to optimize the transmission and storage of large files. Granted, there is a lot of legwork for the CPU required, I have had the idea for the last year or so that it would be possible to transmit a small hash and control string, and build a large file from it. Imagine downloading a Slackware ISO in under 5 seconds, and building it in n seconds!

    For one, it would solve the problem of network latency and load, because there would no longer be any massive chunks of data being transferred.

    Second, it would allow individuals with slower internet connections (dialup, slow DSL, etc.) access to content they never dreamed possible.

    Third, it would just flat out be cool. :)

    I realize that it would take a lot of computing power to rebuild a file from a simple hash, but which would you rather have?: A 4+ day wait to download a DVD image or a ~1 hour wait to build the file?

    1. Re:Project Growth? by slashdotmsiriv · · Score: 1

      What you are suggesting is efficient compression which is already deployed everywhere (MPEG is a form of compression). Hashes however have nothing to do with it. Hashes can be viewed as an extremely lossy compression schemes. In fact hash functions are designed to lose all information prior to hashing, since one of fundamental properties of hash functions is their one-wayness. Hash functions are supposed to be one way, you can never recover the original hashed file. Thus, you would never be able to recover the whole file, perhaps you would be able to retrieve some bits. Note that the so called attacks on MD5 find collisions, that is random pairs that hash to the same value. That is a far cry from obtaining the preimage of a hash, that is obtaining the file from its MD5.
      (in fact i now feel embarrassed that I actually spent 1 minute to argue about that, this should be obvious)

    2. Re:Project Growth? by Skudd · · Score: 1

      Well, first off, I respected your response until that last line. Excuse me for having a somewhat feasable idea. We're not all rocket scientists around here, you know.

      And secondly, I would like to point out that I wasn't referring to MD5 in particular, but the concept. I realise that there is such a thing as compression, and that it is quite effective. I am of the belief that there is a much higher level of compression attainable, though.

    3. Re:Project Growth? by Anonymous Coward · · Score: 0

      Yeah, just go check out lzip!

      They've implemented lossy compression so good, you can get your file down to 0 bytes! It's amazing! Scientific!

  87. Solution for p2p software by Anonymous Coward · · Score: 0

    So, now someone can make some garbage with the same md5 summ as a file, we want to dowload, he may destroy everything. Yeah, that can become a problem, but the solution is really easy.
    Our P2p Programms then just have to use multiple kinds of Hashing Algos. So maybe someone can make garbage with the same md5 summ like our file and someone may be able to make a file with the same hash summ that algorithm b produces. Same for hash algorithm c.

    But i think it`ll be ALMOST impossible to make one garbage File that makes an identical Hash summ like our file with md5,b and c!!!!

    And for verifying whether a file was transferred correctly from an official server, for that md5 will still be usable. Almost impossible, that we get randomely the same md5 summ.

  88. the solution for p2p software by Anonymous Coward · · Score: 0

    Our P2p Programms then just have to use multiple kinds of Hashing Algos. So maybe someone can make garbage with the same md5 summ like our file and someone may be able to make a file with the same hash summ that algorithm b produces. Same for hash algorithm c.

    But i think it`ll be ALMOST impossible to make one garbage File that makes an identical Hash summ like our file with md5,b and c!!!!

  89. MD5 and digital signatures by dido · · Score: 2, Informative

    The main problem with MD5 as it's used today comes when MD5 is used as a component of a digital signature scheme. Most digital signature schemes based on public key crypto work like this:

    1. Generate hash of document to be signed
    2. Encrypt hash of document using signer's private key (this is the signature)
    3. Send document along with signed hash to whoever cares

    To verify a digital signature, the following is performed:

    1. Recompute hash of signed document
    2. Decrypt signature using signer's public key, producing calculated hash
    3. Compare computed hash and signed hash--if they match, signature is authentic

    Now, it's easy to see why this spate of collision attacks on hashing algorithms is so deadly. If, given some signed document, you can produce another document that verifies to the same signature, well, I guess you're in a world of hurt. If these documents happen to be public key certificates, well, the whole PKI more or less collapses. And well, here's a bit of news: someone has done just that with X.509 certificates based on MD5.

    --
    Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
    1. Re:MD5 and digital signatures by swillden · · Score: 2, Interesting

      And well, here's a bit of news: someone has done just that with X.509 certificates based on MD5.

      Ouch. I'm not sure how I missed that one. The first page of their paper shows that I greatly underestimated the severity of the attack. What I hadn't realized is:

      Due to the ability of Wang's method to produce MD5 compression function collisions for any IV, and due to the iterative structure of MD5, we can append a collision to any block of data of our choice (provided that the bitlength is a multiple of the MD5 block length), while maintaining the collision property.

      My understanding was that the collision generation algorithm only created relatively small blocks, both effectively random, and that appending those blocks to different chunks of data would produce different hashes, even though the blocks themselves hash to the same value. But apparently the method can produce collisions even with arbitrary, attacker-specified IVs. That means effectively that you *can* craft a collision to any file, even a pre-existing file, as long as the file format has some "dead" space at an appropriate offset and the changes you want to make can precede this location.

      This is significantly worse than I understood it to be. It still doesn't affect password hashing, but it affects many other uses of hashing. The exploit that showed a pair of different postscript documents with different visible content but the same hash didn't require the property that Lenstra et al used to create the certs, which led me to my erroneous assumption.

      Thanks for pointing out this attack.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  90. MOD ME DOWN by swillden · · Score: 5, Informative

    The parent comment, which I wrote, was based on a severe misunderstanding of the extent of the capability of the attack. In particular, I didn't realize that the attack could find collisions even with arbitrary, attacker-specified IVs. What that means is that it is indeed possible to generate x.509 certificates containing different keys but the same MD5 hash (and therefore the same signature). In fact, it's been done.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  91. How to calculate SHA-2 under GNU/Linux by YA_Python_dev · · Score: 3, Informative

    A lot of people have the programs md5sum and sha1sum installed, but they often don't have equivalent programs for the SHA-2 family. To calculate those you can use the command:

    gpg --print-mds [files]

    or you can download a dedicated program: http://www.ouah.org/~ogay/sha2/.

    --
    There's a hidden treasure in Python 3.x: __prepare__()
  92. OS X by Anonymous Coward · · Score: 0

    Calculator.app on Tiger on my PB 1.5GHz is clearly superior to all these other results because:
    1) The answer returns immediately. None of this waiting around crap.
    2) The answer is "Infinity", which goes to show that OSX can handle much bigger numbers (Infinitely bigger!) than other OS's.

    No wonder AAPL is soaring.

  93. gnupg configuration help by kabloom · · Score: 2, Interesting

    Is it advised to switch GnuPG from SHA-1 for signatures to SHA-256 (like immediately)? How can I make this configuration change?

    1. Re:gnupg configuration help by chongo · · Score: 1
      Is it advised to switch GnuPG from SHA-1 for signatures to SHA-256

      I recommended that:

      • Existing applications that use SHA1, where possible, should changed to use SHA256 before the end of 2008

      So GnuPG should be modified to use SHA-256 before the end of 2008, in my opinion. However I also said:

        • For interoperability with older applications and hardware, these applications may have to also support SHA1

      Given the number of SHA1 PGP key signatures out there, I suspect that GnuPG will have to support SHA1 for a long time to come.

      Unless I'm mistaken, GnuPG and friends do not support SHA256 today. Therefore even if you were to somehow sign your new PGP key using the SHA256 hash, not many people would be able to process it. So what needs to happen is that the developers / maintainers of PGP / GnuPG code base need to extend the application and data formats to allow for SHA256. As I stated:

      • Existing applications and protocols should be modified to be algorithm agile by the end of 2008, if not sooner

      In the case of PGP / GnuPG and friends, the if not sooner is a must.

      --
      chongo (was here) /\oo/\
  94. 64-bit porting by mi · · Score: 1
    s/unsigned int/uint32_t/g

    Rnfl Xnezn!

    --
    In Soviet Washington the swamp drains you.
  95. Adi Shamir's Hash Function *IS* 'unbreakable'.... by iamcf13 · · Score: 3, Informative
    Let Ron 'RSA' Rivest tell you why....

    (from material at the Pure Crypto Project - http://senderek.de/pcp/ )

    Quote below from http://senderek.de/pcp/pcp-security.html


    Adi Shamir once proposed the following hash function:

              Let n = p*q be the product of two large primes, such that
              factoring n is believed to be infeasible.

              Let g be an element of maximum order in Z_n^* (i.e. an
              element of order lambda(n) = lcm(p-1,q-1)).

              Assume that n and g are fixed and public; p and q are secret.

              Let x be an input to be hashed, interpreted as a
              non-negative integer. (Of arbitrary length; this may be
              considerably larger than n.)

              Define hash(x) = g^x (mod n).

    Then this hash function is provably collision-resistant, since
    the ability to find a collision means that you have an x and
    an x' such that

              hash(x) = hash(x')

    which implies that

              x - x' = k * lambda(n)

    for some k. That is a collision implies that you can find a
    multiple of lambda(n). Being able to find a multiple of lambda(n)
    means that you can factor n.

    I would suggest this meets the specs of your query above.

                      Cheers,
                      Ron Rivest

    Ronald L. Rivest
    Room 324, 200 Technology Square, Cambridge MA 02139
    Tel 617-253-5880, Fax 617-258-9738, Email



    The nice thing about Adi Shamir's hash function is that it, as well as the RSA cryptosystem co-created with Rivest and Len Adleman is all based on simple modular exponentiation.

    Too bad the Feds consider arbitrary precision mathematics used for encryption purposes to be 'a munition' and 'a controlled export'.... :(

    Years ago, they raked Phil Zimmerman over the coals over his email cryptosystem PGP then (eventually) left him alone.

    Can't cryptosavvy individuals secure the details of their affairs with strong encryption WITHOUT being hassled by 'the Man'?...

    P.S. However, Rivest came up with a scheme that gives you 'confidientiality *without* encryption' through a scheme he calls Chaffing and Winnowing

    Enjoy! :)

  96. Exactly! by 0xB00F · · Score: 3, Informative

    Because that would require the hashed password and a preimage attack.

    See here.

    Summary for those who are lazy:

    • This is a collision attack. The attacker will be able to find two messages that will produce the same hash, but the attacker cannot choose what the hash will be. So this rules out attacks on hashed passwords.
    • Since a collision attack can find to messages that will produce the same hash, it is possible to use this to break message signing, such as DSA and RSA. where a hash of the message is generated first and then signed cryptographically.
    • Collision attacks cannot be used to tamper with existing SSL certificates. It can be used to craft a CSR which will allow you to receive a server certificate with a collision to one containing a wildcard for the domain name and an expiration date far in the future. This by far is one of the most dangerous exploits because most CA's will issue certificates without completely verifying the identity of the requester.
    • Because of the way MD5 is used in SSL 3.0/TLS, these attacks do not affect it.
    • Collision attacks do not affect MD5 and SHA-1 when they are used in an HMAC. So even though a hash function can be broken by either by collision attacks, they can still be used safely in HMAC.
    • Tampering a signed binary is only possible using a preimage attack.
    • All will have naturally collisions. How exploitable they are depends on how easy it is to find those collisions.

    In conclusion, the value of this attack/exploit is only relative to how the hash function is used in an application. Just because this exploit and source code for it exists, that does not that these hashing algorithms are completely useless.

    1. Re:Exactly! by Anonymous Coward · · Score: 0
      So this rules out attacks on hashed passwords.... How exploitable they are depends on how easy it is to find those collisions.
      Yes there are natural collisions, that's beside the point. How exploitable they are depends on how easy it is to find collisions? Hmm... maybe like ease this very story is about? Ya-think?

      Even generating valid inputs for SHA-512 is impossible (where impossible is defined as really-really hard), and now with an MD5 sitting around we've got an attack vector. Sure we can't choose our input but we've greatly minimised the number of hashes to try because we know the input for SHA-512 is one of the inputs for the MD5. Now we can generate valid MD5 inputs at a much greater pace and which, if the MD5 is available, greatly reduces the time to hack it.

      Also, hackers had databases of hash lookups before, and now it's more feasible to try more complex variations, so more complex pseudo-english variations and simple salts are now more feasible.

  97. Stake in the heart? I think no. by Vo0k · · Score: 1

    MD5 may be dead for security reasons, but it's still great for reliablity checksums where no malicious activity is provided.
    Yesterday a guy sends me an order, the project is in Corel 12. But he knows I have Corel 9 only, so 2 minutes later he sends another mail, "corel 9 replacement".
    But the new file won't open.
    md5sum file1.cdr file2.cdr
    "Excuse me sir, but you have sent the same ver.12 file again."

    It's still a great hashing/checksum algorithm and it will still work in all the situations where no malicious activity is involved.

    --
    Anagram("United States of America") == "Dine out, taste a Mac, fries"
    1. Re:Stake in the heart? I think no. by Anonymous Coward · · Score: 0

      Umm... so is /bin/diff

      No cryptography required.

  98. Please think straight people by tod_miller · · Score: 1

    There are three things we are talking about:

    1 ) Finding a collision is one piece of the puzzle. How to exploit it?

    Lets say you are hosting 'happy frappy firefox downloads!' and you are helping spread firefox, yet your version contains some nast13z malware.

    Is this a situation where a MD5 hash collision will help you? Probably not - simply changing the file, yet publishing the original MD5 next to it, and very few people will check. The file size will usually give you away, even if one BYTE changes.

    Someone mentioned empty space in files to help generate a collision.

    So we take two different files, and we try and get the first file to collide with the second file (THIS IS NOT WHAT THIS CODE DOES!).

    Also has this been benchmarked on larger files? File that may in fact be targetting (software installs?) Or is this not for that?

    3 ) The next is password hashes. My password for slashdot it fr4pp4l1fe. The hash is stored on slashdots server.

    Well done. Now, as long as the hashes are safe, then I am safe. If someone could easily find another password that hits that, then I am scr3w0r3d!

    Now usually anyone can lookup password hashes (or I have been able to do this in systems I have been in...) and regularly passwords in databases are reset by a dbadmin using a simple PLSQL like : set passwort = leetShizzle('monday').

    Collisions do pose a threat (how much of a threat?) for passwords. Knowing the hash you can find another input. Not the original.

    2 ) Poisoning torrents.

    You can poison torrents by writing a maliscious client that returns shitty data. However, with hashes, a node will verify the data you are passing with a hash function - after receiving it. If the has failed, it will assume a broken packet that wasn't caught with CRC or a bad client.

    Collisions allow certain files to be pre-loaded with poisoned content, yet look like the original. Because original files are large, the whole files cannot be passed around for nodes to check the fidelity, since they are fact being downloaded anyway (if that makes sense).

    Hashes breakable means different code can be sent. However, will a length check thwart this?

    Do clients hash every block of data?

    please type the word in this image: gruesome
    random letters - if you are visually impaired, please email us at pater@slashdot.org

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
  99. Can someone explain - what makes is gt brute force by tod_miller · · Score: 1

    Faster than brute force?

    What makes it so? Can someone explain in english? Looking at the code I couldn't really get a flavour of why this is faster than brute force.

    Thank you.

    Bungle.

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
  100. What the hell? by JediTrainer · · Score: 1

    although clearly it's a concern that, should the salt value become known, all your passwords, etc, become breakable...

    ...and this is why anybody who implements salt generates a unique random salt for *each* password, not one system-wide one.

    --

    You can accomplish anything you set your mind to. The impossible just takes a little longer.
  101. This is a MAC, nothing to do with what's at hand by ^BR · · Score: 1
    MAC as in Message Authentication Code, not Media Access Control.

    And a bad one at that. Better use MD5(secret input secret2) or even better use a properly defined one like HMAC if what you want is a MAC.

  102. Re:This is a MAC, nothing to do with what's at han by Anonymous Coward · · Score: 0

    There is an attack on this construction as well.

    Please people, stop proposing constructions if you aren't a cryptographer.

    -Matt

  103. MD5 tombstone by jose+parinas · · Score: 1

    I have collected many posts regarding this issue on my blog category on MD5. And posted my practical exploit of MD5 some time ago on slashdot. This new code is the tombstone for MD5, in fact, now we can write a better exploit of MD5, for example, someone can attack a mirror site of apache, that uses MD5 as check sum, and replace a distribution of the software with some time of malware.

  104. Hyper-Threading? by 80+85+83+83+89+33 · · Score: 1

    hey, does your P4 have Hyper-Threading? i've heard of it doing strange things; sometimes my friend's 3.0GHz with Hyper-Threading machine will play mp3s twice as fast! (can't remember what player he was using though.)

    --
    i disable sigs