Meaningful MD5 Collisions

Common sense by gtrubetskoy · 2005-06-10 07:40 · Score: 2, Insightful

Unless I'm missing something, all these guys are doing is using a format that can contain an infinite amount of extraneous information that has no effect on how it's ultimately rendered, so same thing can be done with a .doc, or an HTML file, and this isn't really a cryptographic discovery of hash function weakness of any kind, just common sense for most of us - the secure hash algorithms should be applied to the English (or whatever language) textual contents of the document, no the source code of it, such as PostScript used in the article, PDF, HTML or whatever. I guess the most important lesson here is that this technique can be applied to binaries pretty easily as well.

Re:Common sense by ergo98 · 2005-06-10 07:46 · Score: 2, Informative

and this isn't really a cryptographic discovery of hash function weakness of any kind

It's an example of the hash collision weaknesses recently documented, giving a practical example of how it could be used for malicious purposes.

Traditionally we haven't had to worry about nonsensical things like applying the hash only to easily verifiable English text, because the hash is supposed to practically protect against intentional searches for collisions.
Re:Common sense by 3770 · 2005-06-10 07:48 · Score: 1

They said that they have a method to make it easier to find collissions.

While you are right that you always will have collissions they say that they can find it in a reasonable time.

What is interesting though is that in their example they have control of both documents and they may use that to find a collision.

This means that they may have a harder time to find a collision for an arbitrary document. If that is the case i don't know. I'm just speculating.

--
The Internet is full. Go Away!!!
Re:Common sense by loose_cannon_gamer · 2005-06-10 08:09 · Score: 2, Insightful

Your common sense seems a little ridiculous. Are we saying that all documents have to be reduced to text before applying our encryption? What about nontextual documents, like, say, process flowcharts, spreadsheets, powerpoint slides, multimedia?

There are a lot of formats out there that allow additions of random undisplayed information, and so I presume that many of these formats are vulnerable to these attacks.

I wonder how long it will take before there are exploits out there that take advantage of these techniques... Of course, I also wonder how many there already are...

Oh well. The key concept behind security is, has been, and always will be trust. You should always ask yourself when you receive something from someone else how much you trust the source, and act accordingly.

--
In Soviet Russia, us are belong to all your base.
Re:Common sense by Anonymous Coward · 2005-06-10 08:19 · Score: 1, Insightful

Your common sense seems a little ridiculous. Are we saying that all documents have to be reduced to text before applying our encryption? What about nontextual documents, like, say, process flowcharts, spreadsheets, powerpoint slides, multimedia?
What's a little ridiculous is your lack of security knowledge. Yes, signatures MUST be applied to meaningful parts of the document files, and exactly how to do it right varies from format to format - simply running md5sum on a file is rediculously risky!
Re:Common sense by iabervon · 2005-06-10 08:30 · Score: 3, Interesting

Actually, the two documents are actually almost identical. The difference is only one block in the whole file, which essentially acts as a selector for which of the two sets of content is displayed. MD5 (like most hash functions) works on fixed-size blocks smaller than the average file. To hash a complete file, you hash the first block, feed that into the hash with the second block, feed that into the hash with the third block, and so forth. So they have two files, and the first blocks are the same, the second blocks are different but hash the same, and the rest of the files are the same. Of course, the second blocks are junk, but the postscript is expecting a block-sized arbitrary value at that point anyway, so it doesn't matter that there's junk there.

So they are actually using a format that can contain an exact quantity of extraneous information that doesn't get rendered but entirely changes what does get rendered.

The same thing could be done with PDF or doc, and executables, but not anything compressed (it won't decompress at all if a block is changed) and not HTML without javascript (there's no way to test which block of junk is included and show different results based on that).
Re:Common sense by Foolomon · 2005-06-10 08:32 · Score: 1

Oh well. The key concept behind security is, has been, and always will be trust. You should always ask yourself when you receive something from someone else how much you trust the source, and act accordingly.
This is even truer in light of the Israeli spyware incident. The spyware was installed via the Autoplay feature of Windows and was distributed on CDs sent to target companies that supposedly contained business proposals, if memory serves me correctly.

Trust is everything vis-a-vis computer security.
Re:Common sense by Anonymous Coward · 2005-06-10 08:56 · Score: 0

Isn't this obivous if you check the file sizes?

The cksum comand (which uses a 32 bit CRC) spits out the checksum and a file size. Why doesn't md5sum do the same thing?
Re:Common sense by jacksonj04 · 2005-06-10 09:20 · Score: 1

If you're really worried, hash twice with differing algorithms. Colliding MD5 with sensible data may be possible, colliding SHA-1 with sensible data may be possible, finding sensible data which collides with both is nay impossible.

--
How many people can read hex if only you and dead people can read hex?
Re:Common sense by m50d · 2005-06-10 09:27 · Score: 1

The point is that an attack which allows finding collisions is good enough, even if it's in very bad looking data. This wouldn't have worked against an algorithm with no attacks, just a partially-broken one like MD5.

--
I am trolling
Re:Common sense by Anonymous Coward · 2005-06-10 09:51 · Score: 3, Informative

Isn't this obivous if you check the file sizes?

The files are the same size.

The cksum comand (which uses a 32 bit CRC) spits out the checksum and a file size. Why doesn't md5sum do the same thing?

It does - The file size is used as part of the MD5 hash. The MD5 algorithm hashes the file, then appends the file size and hashes that too. If it didn't do this then you could create an MD5 collision just by appending zeros to the file.
Re:Common sense by ahdeoz · 2005-06-10 10:01 · Score: 1

really. All you need to do is throw a bytecount on the end of and md5sum and it's impregnable again.
Re:Common sense by dossen · 2005-06-10 22:40 · Score: 1

No, it's not impossible. It's at most as difficult as colliding a 288-bit (MD5 being 128-bit and SHA1 being 160-bit) hash-function, assuming that the combination of the two does not have any weaknesses. Do you have any references to analysis of the MD5+SHA1 function? Are you sure that the combination is significantly better than either of the two functions?
Re:Common sense by jacksonj04 · 2005-06-11 02:10 · Score: 1

I have no references, only common sense. Since they both use different hashing algorithms entirely, the hash for one piece of text will be entirely different in both algorithms. In order to collide your differing text with one algorithm you need to pad your file, the chances of finding padding which causes a collision in both algorithms is very, very slim. I'll do the maths when I wake up properly.

MD5(originaltext) = hash1
SHA1(originaltext) = hash2

You may be able to create something so that MD5(newtext) = hash1, but the chances are that SHA1(newtext) will not equal hash2. I wasn't talking about combining (eg hashing the hash, that won't help), but instead of creating and storing two seperate hashes for comparison.

--
How many people can read hex if only you and dead people can read hex?
Re:Common sense by dossen · 2005-06-11 10:00 · Score: 1

Well, the combination I was contemplating, was concatenating the hashes, as you are absolutely right that md5(sha1(x)) is not a good idea. But (md5(x)<<160)+sha1(x), which is exactly what you end up getting, if you calculate both hashes and store the results, is as I said 288 bits of hash. And while md5 and sha1 are different, they are not that different. I'm not enough of a cryptohead to discuss the internals, but one observation is that both use the same Merkle-Damgård structure, with 512-bit blocks, padded in the same way. This means that unless you define some non-trivial transformation for the input to one of the hashes, the attack of the article will work, as long as you can find a collision for concat(md5,sha1). Finding that collision will, at most, be as difficult as any other 288-bit hash, but it might be easier (I have no idea, but the collision attacks on either function may be combinable, or some other weakness may come out this way) than with a hash-function _designed_ to utilize 288 bits. And just to point out something, this attack is not padding the files in the end to fix the hash. The payload of the document comes after the hash-collision. So if just one collision is found in a hash-function, an endless series of postscript document-pairs can be created.
Re:Common sense by Anonymous Coward · 2005-06-11 13:30 · Score: 0

That's bad. You are constructing a 288 bit hash function knowing full well that there is a known efficient partial collision attack for the first 128 bits. Not good. I wouldn't quit my day job and plan on making it big as a hash function designer if I were you.
Re:Common sense by jacksonj04 · 2005-06-11 22:50 · Score: 1

I'm not constructing a 288 bit hash, I'm using two different (hopefully non-colliding) hashing methods on the same text. *I am not concatenating the hashes*

If it makes you feel better, replace MD5 and SHA-1 with two other hash methods, the principle is still the same.

Assume you want to verify a download - you publish both the MD5 and the SHA-1 of the file.

If somebody wants to create a malicious copy of the file, they can try collide with the MD5. Despite the fact that the MD5 of their new file will collide with the MD5 of the original, the SHA-1 of their new file is unlikely to collide with the SHA-1 of the original.

--
How many people can read hex if only you and dead people can read hex?
Re:Common sense by devilspgd · 2005-06-12 05:20 · Score: 1

Better yet, hash the readable text once and the entire document in the second hash.

This will completely avoid cases where extra data can be used to tweak the hash.

However, you'll still have to watch for "readable" but invisible text, white-on-white text, or black-on-white text which happens to be superimposed over a black image (which could help dodge a simple "ignore 'invisible' text" routine)

--
Give a man a fish, he'll eat for a day, but teach a man to phish...

Speaking of 128 bit collisions by Anonymous Coward · 2005-06-10 07:40 · Score: 0

What happens when databases using GUIDs finally become large enough that GUID collisions are everyday occurances? Will we have to switch to using 256 bit keys?

big oops by Anonymous Coward · 2005-06-10 07:42 · Score: 0

Does this mean that MD5/SHA1 should no longer be used? What about AES's hash's?

Re:big oops by tota · 2005-06-10 08:22 · Score: 2, Informative

sha-256 and later have not been found to be weak yet. It does not mean that they are unbreakable (or I should say: possible to derive collisions) but it is definitely better.

Another solution that should work quite well is to combine hashes: (md5+sha1) is definitely much stronger as you would need to find a collision that works for both algorithms at the same time.
Possible, but not likely.

--
TODO: 753) write sig.
Re:big oops by kabloom · 2005-06-10 09:23 · Score: 1

Be careful with MD5, don't worry about SHA1 yet.

This isn't so much a flaw with the hashes as it is a way of exploiting certain kinds of documents to take advantage of a hash collision.

Plain text messages, HTML documents, and the like don't appear to be exploitable with this type of attack. But executable files and postscript files can be exploited -- to a certain extent.

You can't exploit arbitrary files -- the file has to be specially constructed to allow for the exploit. The problem isn't with using MD5sums to verify the integrity of programs on your machine. The problem here is digitally signing untrusted data -- it only raises the bar for what you need to check before you sign data. In the article, what Caesar should have done is indicated that he'd type up the letter for Alice and send her a signed version by the end of the day. (Although I'm not sure how to handle the social niceties involved)

Wow...this is nerdy even for /. by William_Lee · 2005-06-10 07:42 · Score: 1

Anyone want to explain the real world applications of this to someone who is considering turning in his nerd credentials after being unable to get the gist of this from the write up...and please don't tell me to RTFA, this is after all /.!

Re:Wow...this is nerdy even for /. by k4_pacific · 2005-06-10 07:45 · Score: 2, Funny

Basically, they used a high-powered particle accelerator to create MD5 collisions between human-meaningful documents, thus forging the missing link between thermodynamic and informational entropy.

I hope that clears things up.

--
Unknown host pong.
Re:Wow...this is nerdy even for /. by William_Lee · 2005-06-10 07:46 · Score: 2, Informative

Thanks for clearing that up. Now it all makes perfect sense!
Re:Wow...this is nerdy even for /. by jjares · 2005-06-10 07:56 · Score: 4, Informative

Basically, when you do an md5 for a string, you transform an existing text with a variable length to a fixed length string. Now, imagine the variable text is 200bytes long, but the fixed string is 20 bytes long, you are obiously loosing information, and that there may be a combination of 200 bytes that produce the same 20 byte sequence, but the amount of combinations in 20 bytes (160 bits) make it highly unlikely that you will find a repeated sequence. What this investingators found is a way to replicate this sequences. The problem being that usually we check integrity with this md5 hashes, so teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one. This is specially nice for putting backdoors in source code downloaded from the net, as we often check it against an md5 hash.
Re:Wow...this is nerdy even for /. by Anonymous Coward · 2005-06-10 07:58 · Score: 0

The potential is that someone could post a "mirror" of your favorite software, say Win XP SP3 (thinking future), that has the same MD5 hash as the real thing but is in fact a massive virus (or trojan, or some other malware). Of course, for this example, that would be redundant.
Re:Wow...this is nerdy even for /. by Ann+Elk · 2005-06-10 08:13 · Score: 2, Insightful
- Download something critical, say, the current Linux kernel from kernel.org.
- Insert a trojan/backdoor/whatever.
- Manipulate the tar archive so the hashes match. This is the subject of TFA.
- Somehow upload the trojaned kernel back to kernel.org.
- Since the hashes for the original kernel and the trojaned kernel are identical, they both appear valid when checked against the signature.
- ????
- Profit!
Before someone starts bitching about "lack of trust" in open source, please replace kernel with security update and kernel.org with microsoft.com in the above text.
Re:Wow...this is nerdy even for /. by Anonymous Coward · 2005-06-10 08:18 · Score: 0

you are obiously loosing information

you are obviously unaware of the proper use of loose and lose
Re:Wow...this is nerdy even for /. by deanoaz · 2005-06-10 08:35 · Score: 1

The article states that people were able to manipulate two documents that contained displayed and undisplayed data (postscript) so that both produced the same MD5 hash string and different displayed text. They could only do this since they controlled the content of both documents and got to determine the hash value before it was made known.

That means scary scenarios in which someone downloads a document or piece of code, modifies it, makes it match the original (already published) hash, and redistrutes it are false. The article discusses no such capabilities being discovered.

It just means that if you accept someone else's complex data object and a hash to go with it, be aware that if they wanted to they could have simultaneously created another data object that matched the same hash.

"Today, if you are not confused, you are just not thinking clearly." - U. Peter

--
If 'the people' in Amendment 2 are 'the state' then Amendments 1, 2, 4, 9, and 10 benefit the state, not you.
Re:Wow...this is nerdy even for /. by SolusSD · 2005-06-10 08:42 · Score: 1

it isn't "nerdy" even for slashdot. yes, RTFA, it presents a very easy to understand example.
Re:Wow...this is nerdy even for /. by Beryllium+Sphere(tm) · 2005-06-10 09:01 · Score: 2, Insightful

>teoretically, someone could alter a text and produce a new one that seems (from the md5 hashes) identical to the first one

Not exactly. Not unless the attacker could choose the first text. The new attacks allow you to create two documents that collide, but don't (yet) allow you to take an arbitrary document and make another that collides with it.

That word "yet" is important. A lot of bright crypto people will now start working on "preimage" attacks and they've got some new tools to work with. Be afraid. In a calm constructive way of course.
Re:Wow...this is nerdy even for /. by Anonymous Coward · 2005-06-10 09:01 · Score: 0

1. get a PGP signed email from G W Bush; that is, Content + Signature(ContentHash)
2. sp00f an email so it hashes to the same ContentHash (and hence the same Signature can be reused); sample content: "I do goats, everyday; and, goddammit, it's my right to still have my balls dripping with goat fluids when I'm shagging Laura."
3. post it on slashdot and have real nerds - not the charlatans asking about hashes - verify that the email validates under the public key of G W Bush
4. ???
5. Profit
Re:Wow...this is nerdy even for /. by jjares · 2005-06-10 09:26 · Score: 1

In true slashdot tradition, I didn't RTFA :-)
Thanks for the clarification
Re:Wow...this is nerdy even for /. by Anonymous Coward · 2005-06-10 09:35 · Score: 0

Most CD images I'm downloading are including BOTH md5 and sha-1 sums now. I don't know of a single package manager or cd burning app that will read and verify both however.

Theoretically it's possible to pad a document to collide both hash algorithms. But really pretty unlikely.
Re:Wow...this is nerdy even for /. by gibson_81 · 2005-06-10 09:44 · Score: 1

That means scary scenarios in which someone downloads a document or piece of code, modifies it, makes it match the original (already published) hash, and redistrutes it are false. The article discusses no such capabilities being discovered.

OTOH, IIRC, that would be covered by the birthday paradox: It is a lot more likely that you find two random people have the same birthday than that you find a random person with the same birthday a a specific person.

So, the danger is not someone changing a document signed by someone else, but rather someone having two versions of a document: You sign doc A, since you agree, but then he shows you doc B, which you don't agree to at all, but since he made it so they have the same signature, you've essentially signed them both ...
Re:Wow...this is nerdy even for /. by rar · 2005-06-10 11:39 · Score: 1

Anyone want to explain the real world applications of this[?]

You leave us guessing at your background, but in case you are an "xbox hacker", here is an explanation for you :) :

Let say "ax" and "by" happens to be colliding in SHA1. (They are not, but just pretend)

Lets say that an "insider" working on an xbox game on purpose put this into the code:

if(!strcmp("ax" == "by")) {
startup linux or something
} else {
run computer game
}

Lets say Microsoft signed that game.

THEN the "evil linux hackers" could take this game and exchange "by" for "ax" inside the executable. The executable would *still* confirm to Microsofts signature; the SHA1 would not have changed.

Result: Microsoft signed executable starting linux.

But --- I guess this isn't really that different from an "insider" inserting some other backdoor... So, its not that useful as an attack I guess...

--
Open Materials Database
Re:Wow...this is nerdy even for /. by Jasin+Natael · 2005-06-10 13:19 · Score: 2, Interesting

It bears mentioning that md5 doesn't account for the length of the file. So if someone were to try installing a backdoor into a program, and had a sophisticated enough piece of software using this method, comments, metadata, or other information could be used to 'pad out' the file to make it seem like the original -- even with source code files. Especially in the case of executables, they could just insert random crap at the end of an executable file, and make the md5 hash (and possibly the size) come out identical to the original. Some of these have already been demonstrated.

While this collision will be a big deal for signing documents, it shouldn't have any effect on web security (Digest Authentication, for one, uses MD5 pretty extensively). The lesson: While MD5 is still reasonably difficult to collide, the time to find collisions (~5 hours) on a normal PC means that malicious uses are now practical.

I'm not entirely sure what the implications are -- would it be suitable to sign documents using multiple cryptographic functions (such as a signature containing SHA, MD5, and CRC32 hashes, along with the original filesize)? Maybe perform a simple, arbitrary transformation on the text content and use that to generate a seperate, complimentary MD5 or SHA hash?
Jasin Natael

--
True science means that when you re-evaluate the evidence, you re-evaluate your faith.
Re:Wow...this is nerdy even for /. by Fnord666 · 2005-06-10 13:38 · Score: 1

I think your example still falls under the first case, not the birthday paradox. You are still starting from a selected executable/person and looking for a match. The birthday paradox would come into play if you downloaded an entire library of executables and were just looking for any two that had a matching hash.
And the math behind the birthday paradox is almost as much fun to explain as the monty hall problem!

--
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
Re:Wow...this is nerdy even for /. by Anonymous Coward · 2005-06-10 13:44 · Score: 0
Nope. Preimages attacks haven't been discovered for md5 (please RTFA). The way it works is:
- Download kernel at kernel.org
- Make harmless changes to kernel and convince Linus this is the new authoritative kernel
- Make another kernel that emails any naked pictures of geek girlfriends (yes, they do exist) to a phisher email address.
- Give both kernels same md5 sum
- Hav harmless kernel uploaded
- Break in to kernel.org and replace kernel with evil kernel
- Profit by jacking off all night to pictures of random Linux user's naked girlfriends
Yes, I know. This is Slashdot. Clue is optional (but a lack of clue doesn't stop people from being opinionated twits)
Re:Wow...this is nerdy even for /. by pAnkRat · 2005-06-11 05:29 · Score: 0

So the whole thing cannot be done, if the boss opens ms-word, writes his name in the document his secretary gave him, safes the document, and signs _this_ document only.

So if you sign some document, be sure to have "produced" it yourself.

Is this correct?

--
we need an "-1 Plain wrong" moderation option!
Re:Wow...this is nerdy even for /. by jjares · 2005-06-11 05:37 · Score: 1

I know you are a troll, but I actually don't speak english natively. There are people who doesn't, you know?
Re:Wow...this is nerdy even for /. by dossen · 2005-06-11 11:20 · Score: 2, Informative

It is actually not entirely true, that md5 (and sha1) does not account for the size of the file. Both md5 and sha1 use the same Merkle-Damgård structure, where the same function is applied to a running "total", initially an initialization vector, and a fixed-size block of the input. For both hashes, the blocks are 512 bits long, and the last block is padded and ends with the length of the file, as a 64-bit integer. So unless one of your files is extremely long, size is taken into account. Doesn't mean that size-altering is impossible, but the linked attack for one does not alter the size.

These are important attacks.. by Ckwop · 2005-06-10 07:42 · Score: 5, Insightful

As an amateur cryptographer, I must say that labeling these attacks as 'practically irrelevant'
is at the very least misguided and at worst a shocking display of incompetence.

Stop the fixation with plain-text messages, most messages are not plain-text. Your average word document
contains loads of invisible data that doesn't get rendered. Pdf's contain "junk" data that doesn't get rendered either. Would
you notice a single bit difference in an MP3? Or a single pixel colour change in a jpeg? Hell, you can even do it in HTML <div style="visibility:hidden">Junk goes here</div>.
Mark my words, people will find in the next couple of months find two meaningful computer
documents that hash to the same value but are different byte-wise.

People undervalue these attacks because the attacker has to generate the collision before hand to use it.
To properly appreciate the power of these attacks consider the following senario.

Imagine we're agreeing a contract of employement and I'm your employer.
I give you the first word document that includes all the standard terms, however, I've also drafted
a Word document that contains a load of draconian clauses like banning you from working in any IT position five years
after leaving the company. By adding junk that doesn't render to both documents, I've managed to find to make the hash
of the two documents collide. Thinking I'm a nice employer, you sign the first document, which you do by signing the hash of
document. However, I now have your signature on BOTH documents. I now make sure the company IT system "forget" the first document
and I've successfully screwed you.

This is a human example, but there are other examples that apply in computer systems. The problem is that in many situations
the attacker can choose when you encrypt. Say you encrypt your e-mail conversation with your friend using S/MIME, many people click
"Reply" and the message body of the other persons method appears in the new message. Because of these attacks,
It's now no certainty that an attacker couldn't use this fact to construct collisions that an attacker could use.

As another security researcher said (paraphrased) It's like you're in building and you've just heard the fire alarm go off.
You can't see smoke but it's time to make your way calmly to the exit. That sums up the position with SHA-1 and MD5. Swap out the primitives
before you start seeing smoke.

It's not like we don't have alternatives anyway. Whirlpool uses the same wide-trail design principles has AES. It's slower than MD-5 or SHA-1 but it's much better designed. And beside, people would do well to realise you have to spend CPU cycles to get security.

Simon.

Re:These are important attacks.. by Reality+Master+101 · 2005-06-10 07:44 · Score: 1

However, I now have your signature on BOTH documents. I now make sure the company IT system "forget" the first document and I've successfully screwed you.
Er, only if you're stupid enough not to keep a copy of a document that you sign.

--
Sometimes it's best to just let stupid people be stupid.
Re:These are important attacks.. by LiquidCoooled · 2005-06-10 07:48 · Score: 1

Your right.
How exactly could they create junk documents that also match the expected filesize.

Add in that restriction and md5 could become a difficult problem again :)

--
liqbase :: faster than paper
Re:These are important attacks.. by compm375 · 2005-06-10 07:53 · Score: 1

You could also use stronger hashes, or more powerful hashes. All you really need to do is stay ahead of the technology. Just use a hash type that will take at least a certain number of years to crack.
Re:These are important attacks.. by Ckwop · 2005-06-10 07:54 · Score: 1

There are 2^128 hashes and for a document of one megabyte there are 2^1048576 documents. That means that there are roughly 2^1048576/2^128 = 2^1048448 collisions for documents of the same size.

Factor in the fact that it take 2^64 time to *brute-force* a collision in MD5 and the fact that we now have an attack that can find any collision in minutes on a laptop then I'd say it's pretty reasonable that someone can do it.

Simon.
Re:These are important attacks.. by swillden · 2005-06-10 07:56 · Score: 1
Your average word document contains loads of invisible data that doesn't get rendered. Pdf's contain "junk" data that doesn't get rendered either. Would you notice a single bit difference in an MP3? Or a single pixel colour change in a jpeg? Hell, you can even do it in HTML.

I think that's insufficient to be able to mount the attack.

To do what these guys did, you need a format that can:
- Contain "junk" data that is never displayed.
- Contain two different "messages" only one of which will be displayed.
- Be able to select which message to display based on the result of a calculation involving the junk.
MP3s don't meet these criteria, nor do JPEG images. HTML files do, if you allow for Javascript or the like. I think "plain" HTML is inadequate.

Say you encrypt your e-mail conversation with your friend using S/MIME, many people click "Reply" and the message body of the other persons method appears in the new message. Because of these attacks, It's now no certainty that an attacker couldn't use this fact to construct collisions that an attacker could use.

Given HTML e-mail, plus Javascript, yes. OTOH, interpreting Javascript in e-mails is a bad idea anyway.
--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:These are important attacks.. by Ckwop · 2005-06-10 08:02 · Score: 1

Well, you can have two MP3s that sound the same even though they're different and you can have two jpegs that look the same even though they're different.

There's plenty of scope for changing the files, we only need to select roughly 128-bits in each file to message about with to drive a collision.

I agree with your suggestion to use Javascript over HTML to "disguise" the change.

Simon.
Re:These are important attacks.. by ryturner · 2005-06-10 08:05 · Score: 1

So if it goes to court, how do you determine which of the two documents was the one signed by the employee?
Re:These are important attacks.. by Anonymous Coward · 2005-06-10 08:06 · Score: 0

That won't work either since they can accuse you of fabricating your copy of the document.
Re:These are important attacks.. by arkanes · 2005-06-10 08:07 · Score: 1

You need a document which is either exactly the same size (if there's size checks built into verification), or close enough in size that it won't raise suspicion. Incidently, this is a major weakness in crappy bloated formats like .doc and .pdf, where people don't blink twice about a 2 page text document being 4 megs in size.
Given that restriction, you then need to be able to generate a collision. And not just any collision, but a *specific* collision that has your malware or subtle contract alterations or whatever in it. Hashing is used as verification, so simple corruption is useless.

The flaws being found are not totally irrelevent, by any means, but they're theoretical mathematical attacks on hashing. The practical security of hashing as verification is not especially weakend.
Re:These are important attacks.. by pz · 2005-06-10 08:09 · Score: 2, Informative

I now make sure the company IT system "forget" the first document and I've successfully screwed you.

Not quite, beacuse of reasonable doubt and the fact that the hypothetical employee would have copies of the document.

However, Alice can get Bob to sign an innocuous recommendation letter that in the hidden version is a power of attorney for Bob's bank accounts. Alice can then take the fraudulent letter to Bob's bank and with apparent legality, take all of his money. (The difference being that a third party is involved who is naive to the document Bob signed and his intents.) This is effectively the scenario suggested in TFA, and because there is no alternate version of the document, Bob has no recourse (unlike the parent poster's suggestion).

To exploit this hole in MD5 (SHA-1, or whichever hashing function you like), you need to create two completely different documents, not two which are different versions of the same thing and can be compared by humans. That's the real hole: these documents need to be printed out (or rendered on a screen) and interpreted by humans who assume that the rendering process is trustworthy because it is complete and veracious. If humans natively read PostScript with the same ease we read English, for example, this kind of exploit wouldn't be possible.

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Re:These are important attacks.. by St.+Arbirix · 2005-06-10 08:11 · Score: 1

As an amateur cryptographer

I'm curious, do you get to take classes in that or is it graduate work? I'm currently in my undergrad at a school that doesn't have any cryptology courses. When I graduate in December I'll be heading off to the Navy to (hopefully) do crypto, but I wonder where civilian cryptology programs are.

--
Direct away from face when opening.
Re:These are important attacks.. by MBGMorden · 2005-06-10 08:20 · Score: 1

Simple solution would be to use multiple hashing algorithms. You'll have to bust your butt trying to create a meaningful document with the same MD5 hash as the original, but if we check with two alorithms, say MD5 & SHA-1, then it's going to be damn hard (I'd almost say impossible, but I'll not go that far without a mathematical proof) to get anything meaningful that produces the same MD5 hash AND the same SHA-1 hash as the original does.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:These are important attacks.. by Locke2005 · 2005-06-10 08:21 · Score: 1

people would do well to realise you have to spend CPU cycles to get security.
This should be obvious to anyone. Any reasonably fast and efficient hash or encryption algorithm can be brute forced given enough time on a sufficiently large parallel machine. This means 1) Any truly effective hash or encryption must be slow, even on the latest hardware, and 2) any algorithm is reasonably secure today won't be secure against a dedicated attack by somebody with sufficient resources 10 years from now. Given the constantly advancing state of the art, any usable algorithm will eventually be broken; nothing can stay encrypted forever.

--
I've abandoned my search for truth; now I'm just looking for some useful delusions.
Re:These are important attacks.. by cpeikert · 2005-06-10 08:22 · Score: 2, Informative

How exactly could they create junk documents that also match the expected filesize.

The collisions that have been found for MD5 are for pairs of documents that are the same size. The size constraint is not a problem.
Re:These are important attacks.. by deanoaz · 2005-06-10 08:23 · Score: 4, Informative

If two documents exist with the same hash, then they were both produced by the same source, since there is no practical, known way of finding a collision without having control of the content of both documents. Therefore, your signed copy of the original document proves that the employer created both versions.

--
If 'the people' in Amendment 2 are 'the state' then Amendments 1, 2, 4, 9, and 10 benefit the state, not you.
Re:These are important attacks.. by swillden · 2005-06-10 08:24 · Score: 1

Well, you can have two MP3s that sound the same even though they're different and you can have two jpegs that look the same even though they're different.

But you can't practically generate two MP3s or JPEGs that have *meaningful* differences, yet hash to the same value.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:These are important attacks.. by Idimmu+Xul · 2005-06-10 08:24 · Score: 1

MP3s and JPGs do allow tags to contain extra data that will not effect how the music sounds or what the image looks like at all.

That's pretty irrelevent though as the whole point of this hack is that you change the sound or the image anyway, then add the extra data to make it hash correctly to a specified hash

--
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
Re:These are important attacks.. by Pyrrus · 2005-06-10 08:31 · Score: 2, Insightful

If you are using the md5 sum of a digital document, I imagine you would keep a digital copy of the document as well. You'd then go to court with your signed physical document, md5 signed digital document, and the 2nd digital document with the same md5 sum and junk data that appears to be inserted to cause a collision. The employer would have a bit of explaining to do.
Re:These are important attacks.. by Basje · 2005-06-10 08:33 · Score: 4, Insightful

The one that makes the claim. In the above example you go to work for another employer. The malicous employer slaps you with the fraudulent contract _claiming_ you cannot do that.

Then you whip out the original document. Presto: both documents have the same signature. Now it's up to the party making the claim to prove which contract was signed. Impossible.

The reverse is also true: the programmer works on a super secret project. He signs a secrecy declaration. If he can apply the exact same signature to his lunch agreement, and walks out with the sourcecode, his employer is screwed, as per the example above.

Etc. An agreement is legally binding in any form. The only reason to have a signed copy of any agreement (digitally or paper) is to have proof of the agreement. If the signature is no longer proof, no agreement rises above verbal agreements, evidence wise.

--
the pun is mightier than the sword
Re:These are important attacks.. by ptr2void · 2005-06-10 08:38 · Score: 1

You don't even need JavaScript. A browser that implements some subset of CSS or even tables is enough. It's easy to hide content from a skimming reader, let alone the technically challenged.
Re:These are important attacks.. by Locke2005 · 2005-06-10 08:47 · Score: 1

An exception to the above has occurred to me. Encryption with a one-time pad is secure for as long as the pad is secure, so it can be both fast and unbreakable. A secure scheme for pad distribution is left as an excercise for the user, and this applies only to encryption, not message digests (hashes). It should be possible to eventually brute force a collision to any message digest.

--
I've abandoned my search for truth; now I'm just looking for some useful delusions.
Re:These are important attacks.. by Anonymous Coward · 2005-06-10 08:49 · Score: 0

But even if the IT system "forgets" about the first document, the new document has to have both sets of information in it (in raw form), even if only some of it shows up when you open the file through a browser/Word/Etc. So in other words, the source of the document basically has the evidence that the employer is attempting to screw you in the first place. If you look at their example ps files, you'll see that both messages are contained, and what is really different is which one it hides and which one it shows.
Re:These are important attacks.. by coekie · 2005-06-10 08:53 · Score: 1

No, I don't think they're important. Signing such a non-plaintext file by looking at it in a viewer only (and (a bank) accepting such a signed document) is the mistake. You don't even need MD5 collisions to exploit that.

Suppose you write a HTML file with some java-script embedded that displays one text if the date is before X, and another if it's after. you get someone to sign it before X, and then use it afterwards.
Similar, in general, different viewers interprete (broken) files differently. So if you know which viewer the signer and the bank use, you could probably write a postscript file that displays different text in those viewers.

No checksum is going to prevent that. Nor is displaying files differently than other viewers a security problem that should be fixed in every viewer.
So the only way to prevent this, is not signing "binary" (not fully plain text human parsable) files written by someone else.
Re:These are important attacks.. by hibiki_r · 2005-06-10 08:55 · Score: 1

A crafty employer would make the 'normal' document match the draconian one by adding crap to it, making the draconian document appear to be the original. Who'd have to explain things then?
Re:These are important attacks.. by swillden · 2005-06-10 09:01 · Score: 1

You don't even need JavaScript. A browser that implements some subset of CSS or even tables is enough. It's easy to hide content from a skimming reader, let alone the technically challenged.

You *might* be able to do it with CSS, but not tables. Hiding content isn't enough, you need to be able to dynamically select different content based on the result of a calculation on the hidden "random" junk. CSS allows you to use the functions in the XPath specification in some limited ways, so you might be able to finagle something. Plain tables aren't enough.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:These are important attacks.. by rmccann · 2005-06-10 09:04 · Score: 1

"But you can't practically generate two MP3s or JPEGs that have *meaningful* differences, yet hash to the same value."

Not yet, but how long ago did people say that you couldn't generate 2 documents that would have the same hash.
Re:These are important attacks.. by hritcu · 2005-06-10 09:13 · Score: 2, Insightful

I think that the most important type of attack involves binaries. For example I could surpass tripwire and enable an exploit that was never visible in my binary code. Almost none of you would be able to tell the difference (a couple of bits), not even tripwire (md5), between the exploitable and non-exploitable version of the program. But the change in behavior could be different so that the exploit is possible.

Steps:
1. Write a very useful tool
2. Compile it
3. Tamper the file to get two almost identical versions, one exploitable, the other "secure"
4. Release the source and the secure binary
5. Wait for the tool to achieve widespread adoption based on its "security"
6. Overwrite the secure version with the exploitable one on your server and log the IPs of the people downloading it
7. Wait a little more
8. Use the exploit and gain access to the exploitable machines
9. Once you are in charge overwrite the exploitable version of the tool with the "secure" one - tripwire should not detect this
10. Party

--
If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
Re:These are important attacks.. by fossa · 2005-06-10 09:15 · Score: 1

Think of "file size" as a type of hash. You could use the file size as a document hash. It would, of course, not be robust to attack, but it would be a convenient and fast way to answer the question "are these files the same?". Rsync can be told to use file size to answer this question. To get back to the point, any hashing algorithm takes a file of any size and creates a hash of (for the sake of argument) fixed size N. Whatever properties of the file were used to come up with the hash is irrelevant. The pigeon hole principle demands that there will be collisions because there are more "files of any size" than hashes of size N. A collision means that whatever properties of the file were used in the algorithm to create the hash are identical in the two colliding files, including file size if that property is included in the hash. Again, collisions are a given.

I once wondered if a [de]compression algorithm for, say, jpegs, could be made like this: given a hash, a file size, and the fact that the resulting file must be a valid jpeg, decompress this into the right picture. I think this is common mistake because it's difficult to comprehend the numbers involved. The requirement of being valid jpeg file is just adding another property of the file into the hash. The pigeon hole principle always wins, and all hashes squeeze A LOT of files into a drastically smaller space.

Saying things like "use MD5 and file size" or "use MD5 and SHA-1" don't really make much sense. Sure, one would hope that the techniques used to actually find useful collisions would be confounded by this... but remember there will always be collisions. A good hashing algorithm's only purpose is to ensure that these collisions can only be found through brute force and not through some shortcut that allows one to create forgeries. If it's easy to find MD5 collisions, then even if SHA-1 did not have weaknesses then finding such MD5 collisions would drastically reduce the search space to find dual MD5 SHA-1 collisions possibly bringing a brute force attack into the realm of possibility.
Re:These are important attacks.. by m50d · 2005-06-10 09:29 · Score: 1

Not really. Given a long enough hash the hash itself can be fast and if it works as intended it will not be breakable. MD5 is *supposed* to take 2^128 times as long to break as to make. If it was full strength, there would not be a problem, even though it's fast to create.

--
I am trolling
Re:These are important attacks.. by snorklewacker · 2005-06-10 09:40 · Score: 2, Insightful

> Who'd have to explain things then?

The "reasonable person" standard prevails. Remember, they have to explain the existence of that extra padding in the document. The jury, assuming it's composed of reasonable people ...

sorry, had to take a moment. my co-workers are no doubt wondering what the hysterical laughter was about ...

the jury is not likely to believe that you signed away your rights. Now if the employer's document specified a million-dollar signing bonus, then he pulled a switch on you to simply remove it, then the jury might be less inclined to believe you.

Doubtful this sort of thing could happen with employment. A government contract on the other hand ...

--
I am no longer wasting my time with slashdot
Re:These are important attacks.. by yellowstone · 2005-06-10 09:43 · Score: 1

I've managed to find to make the hash of the two documents collide.
OK. But it seems to me, this can only work once, since a critical part of the scheme requires
Thinking I'm a nice employer, you sign the first document
The first employee this happens to goes out, sets up a blog, and broadcasts his unhappiness all over the world. As a result, you become everyone's last choice for an employer. This is a good strategy only if your goals include keeping the level of talent working for you as low as possible, and running your business into bankruptcy as quickly as possible.

--
150 Opening BINARY mode data connection for slashdot.sig (129323052 bytes).
Re:These are important attacks.. by uncqual · 2005-06-10 09:48 · Score: 2, Interesting

If I understand your point, in this case, one can reasonably assume the employer made the contract (it's unusual for employees to craft employment documents).
However, more symmetric relationships (such as a merger of two companies or even an independent contractor providing IT services to a business) usually have both sides exchanging documents back and forth and they eventually end up with a version that requires no further revision - so it's not possible to figure out (with JUST the two "versions" of the agreement with the same hash) which side "produced" the final copy (and hence, was in a position to orchestrate the second bogus version).

Am I missing something about the argument?

--
Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Re:These are important attacks.. by Finuvir · 2005-06-10 09:53 · Score: 1

You'll have to bust your butt trying to create a meaningful document with the same MD5 hash as the original, but if we check with two alorithms, say MD5 & SHA-1, then it's going to be damn hard (I'd almost say impossible, but I'll not go that far without a mathematical proof) to get anything meaningful that produces the same MD5 hash AND the same SHA-1 hash as the original does.

Such collisions must exist. The potential document space is much larger than the MD5,SHA-1 space. There are more possible documents than possible combinations of hashes, therefore collisions must exist (by the pigeonhole principle). Granted, this doesn't prove that for any given combination of hashes there is at least one collision on that combination.

--
Why is anything anything?
Re:These are important attacks.. by ptr2void · 2005-06-10 10:35 · Score: 1

What about changing table row sizes and adjusting the random junk to compensate?
Re:These are important attacks.. by ars · 2005-06-10 10:50 · Score: 2, Informative

Every time this comes up people mention the same idea - use two hashes for security!

What people don't realize is that MD5 _IS_ two hashes already! That's how it works!

To make it worse the hashes you mentioned MD5 and SHA-1 are based on exactly the same algorithm, so using it twice doesn't help you much.

--
-Ariel
Re:These are important attacks.. by swillden · 2005-06-10 10:58 · Score: 1

Not yet, but how long ago did people say that you couldn't generate 2 documents that would have the same hash.

Irrelevant. Barring some further, deeper weakness in the hash functions, it's not possible to take the next step. Keep in mind that this new attack is an exploit of an old weakness, not a new weakness.

Of course, anything is possible. It's possible that next week someone will publish a pre-image attack. It's also possible that RSA will be broken and that someone will prove a 1-page proof of Fermat's Last Theorem using only 19th century mathematics.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:These are important attacks.. by tap · 2005-06-10 11:00 · Score: 1

The presence of the two documents would clearly be proof that foul play was involved. It's also known that the same person created both documents. All the employee needs to prove is that the company is the one who created the document he signed.
Re:These are important attacks.. by Anonymous Coward · 2005-06-10 11:03 · Score: 0

Or you could just skip parts 1-7 and write a trojan that will spread itself through mail or something.. ;)
Re:These are important attacks.. by swillden · 2005-06-10 11:11 · Score: 1

What about changing table row sizes and adjusting the random junk to compensate?

Nope, the MD5 collision attacks aren't good enough. To maintain the hash value, you can substitute one block of junk for another equal-sized block of junk -- and you have basically zero control over what's in those two blocks of junk. You can't change anything else. If you change row sizes, you've changed something else and the hash value of the document will be different.

There are certainly cleverer people than me, but I certainly don't see how it could be done with HTML tables.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:These are important attacks.. by hackstraw · 2005-06-10 12:08 · Score: 1

Therefore, your signed copy of the original document proves that the employer created both versions.

Too bad you and the court only have one copy.

Granted this may be enough for a cryptographer, mathematician, or some security guru to get upset, but I love those "privacy" agreements and whatnot that say "We will not do X, Y, or Z. But we reserve the right to change the agreed upon agreements at any time without your consent. Please sign below agreeing to my future unknown demands."

There will be collisions in any finite hash algorithm. An md5 has is what 16 bytes? How can all of the data in the universe be uniquely labeled with only 16 bytes of information?

Now, to be able to generate md5 collisions at will in realtime. That is a concern, and then we will either change or use multiple hashes ASAP.

Another thing to keep in mind, is that most people are pussies, and they really aren't out to do anything really bad. There are those that will do anything quasi-legal and dishonest to make a buck or whatever, but few people are out to actually do real harm. For those out to do real harm, knock them off. Its that simple. Ends the problem immediately and permanently.
Re:These are important attacks.. by Anonymous Coward · 2005-06-10 12:13 · Score: 0

...
8. Find some IPs not responding to your exploit.
9. Wonder if they've cought on to you.
10. Panic!
Re:These are important attacks.. by Fnord666 · 2005-06-10 13:07 · Score: 2, Informative
You, the employee, are going to argue that the company created both documents with matching hashes, then had you sign the more benign one. The company is going to argue that you agreed to the draconian version, then created the benign version at a later time that has a matching hash and are just trying to get out of the obligation. The problem is that:
1. Both versions are possible
2. Explaining any of this to people who haven't figured out how to get out of jury duty isn't going to happen. Ever seen the glazed looks on jury members when expert witnesses testify?
If I understand correctly, what this boils down to is that given an document and an MD5 hash, there is now a "reasonable" time based method of generating a second document with different content but a matching hash.

For a hash based signature, there will exist documents that have matching hashes. This is refered to as the pigeonhole principle. If you have 10 pigeonholes to stuff messages in and 11 messages, one of the pigeonholes will get a second message.

The linch pin of this process is the idea that it takes too much time to find or create a second document that has different content but the same hash value. In practice you want it to take so long that by the time a match is found, it no longer matters. What "no longer matter" means depends on the context.

When an adversary can create/find a match in a couple of hours rather than centuries, all bets are off unless the signature expires in seconds.

This really is an important result and has significant implications.
--
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
Re:These are important attacks.. by Fnord666 · 2005-06-10 13:16 · Score: 1

Looking again at the method used, it does rely on control of the original content in both documents and generating the second one afterwards is not the same. Sorry about that.

--
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
Re:These are important attacks.. by Dwonis · 2005-06-10 13:37 · Score: 1

How exactly could they create junk documents that also match the expected filesize.
Add in that restriction and md5 could become a difficult problem again :)

Um... that's what they *did*:

$ ls -l *.ps -rw-r----- 1 dwon dwon 2029 2005-06-10 21:36 letter_of_rec.ps -rw-r----- 1 dwon dwon 2029 2005-06-10 21:36 order.ps $ md5sum *.ps a25f7f0b29ee0b3968c860738533a4b9 letter_of_rec.ps a25f7f0b29ee0b3968c860738533a4b9 order.ps
Re:These are important attacks.. by stoborrobots · 2005-06-10 14:26 · Score: 1

What's even funnier about that, in this particular thread, is that the Summary includes the statement: "The same technique is ... applicable to ... SHA-1."

So it's not just that slashbots don't know that MD5 and SHA-1 are based on the same algorithm, but also that they can't read the summary which tells them that their favourite "solution" won't work!

And, to think I used to come here for the insight... Nowadays I come here purely for the entertainment value...

--
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco
Re:These are important attacks.. by Dwonis · 2005-06-10 15:00 · Score: 1

Because of the birthday "paradox", MD5 takes on average 2^64 operations to find a collision. You often have a similar problem using 128-bit keys.
Re:These are important attacks.. by the+way,+what're+you · 2005-06-10 16:53 · Score: 1

Mark my words, people will find in the next couple of months find meaningful ...
Busted!
md5(your post) == md5("fRi5t P0sT")

--
example.org - powered by Linux!
Re:These are important attacks.. by Anonymous Coward · 2005-06-10 19:05 · Score: 0

Thinking I'm a nice employer, you sign the first document, which you do by signing the hash of
document.

Um, no. You sign a document by encrypting a hash of the document. You do not sign a document by signing a hash of the document, as doing so would result in infinite recursion.
Re:These are important attacks.. by hritcu · 2005-06-10 19:49 · Score: 1

So the only way to prevent this, is not signing "binary" (not fully plain text human parsable) files written by someone else.

Last time I checked everybody was using md5 to hash binaries and text files alike, some binaries don't have a "human parsable" form you can sign on and (most important) binaries are NOT ALWAYS SIGNED BY HUMANS. How will a program tell the difference between a tampered and a non tampered file that it cannot understand anyway? Finally, you cannot tell exactly what an executable does unless you run it.

--
If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
Re:These are important attacks.. by hritcu · 2005-06-10 20:05 · Score: 1
Replace:
- "very useful tool" with "copyright infringing video game"
- "exploit" with "virus"
- web server (implied) with "peer2peer network"
Here you go; this atack should be more meaningful.
To quote Cringely: "Remember, you read it here first." :)
--
If you don't fail at least 90 percent of the time, you're not aiming high enough. (Alan Kay)
Re:These are important attacks.. by dustmite · 2005-06-11 04:03 · Score: 1

Who the hell signs important contracts like that though only electronically? NDAs, employment contracts, there is no way I'd ever sign anything but the paper versions of those. These days, faxed signatures are pretty much accepted, so even faxing actual signed print documents is better.
Re:These are important attacks.. by dustmite · 2005-06-11 04:09 · Score: 1

Hence the importance of compiling from source, rather than using packaged binaries.

Still, most people just packaged binaries.

Also, trojans can be quite cleverly and effectively hidden even in source for a long time before anyone discovers them.

But even so, compiling from source will always be the most reliable and secure method, compared to any other distribution method. It's like capitalism - not perfect, but better than any other system.
Re:These are important attacks.. by DickBreath · 2005-06-11 07:17 · Score: 2, Interesting
To do what these guys did, you need a format that can:
- Contain "junk" data that is never displayed.
- Contain two different "messages" only one of which will be displayed.
- Be able to select which message to display based on the result of a calculation involving the junk.
Microsoft Office and OpenOffice.org documents both can contain executable content which can execute when the document is opened, and which can alter the contents of the document.

I am not very familiar with Microsoft Office, but in OpenOffice.org, the default settings are to warn you when you open a document containing any macro code -- but the user can have turned off this feature.

I don't know about MS Office's binary format.

OpenOffice.org documents are simply Zip files of XML. (Yes, try renaming your OOo document's extention to ".zip" and unzipping it.) I know for a fact that I can take an OOo document written on Windows, move it to Linux, unzip it, and then re-zip it (using the "zip" command line tool) to get a smaller better compressed, but otherwise identical OOo document that opens in all versions of OOo. It may be possible to construct an OOo document that is a Zip, but where one or more zip file entries are completely UN-compressed, and therefore, where this technique could be used.
--

I'll see your senator, and I'll raise you two judges.
Re:These are important attacks.. by bani · 2005-06-12 15:03 · Score: 1

real IDS use multiple hashes (md5,sha1,ripemd), not just one.

good luck finding a collision that works across multiple hash algorithms.
Re:These are important attacks.. by grokster · 2005-06-13 21:26 · Score: 1

In South Africa, contracts don't even have to be signed to be binding - all that is required is a (verbal) agreement that a particular document is the binding version of the contract. So emails could be enforced as contracts even without digital signatures...
Re:These are important attacks.. by anthony_dipierro · 2005-06-19 07:47 · Score: 1

With a few exceptions, such as transferring the title of real estate, that's true just about everywhere. A signature is just evidence that you agreed to the contract.

That said, I don't think having a written contract without a verifiable signature is as bad as having a verbal contract. Sure, one of the problems with verbal contracts is it's difficult or impossible to prove that the parties agreed to them. But another problem is that memories are not very reliable. Yes, if someone is going to create fake documents and outright perjure themselves in court the judge (or jury) is going to have to decide which, if either, document is credible. But this is a lot easier when you have two documents in front of you. And there are plenty of court cases over verbal agreements where both sides legitimately believe they are telling the truth. Had there been a written contract in place, the issue might have never been litigated in the first place.
Re:These are important attacks.. by anthony_dipierro · 2005-06-19 08:12 · Score: 1

Swap out the primitives before you start seeing smoke.

I'd go further than that, though. Don't digitally sign a document (using a key you can't repudiate) unless you've created that document yourself.

If you must digitally sign a document which you haven't created yourself, do so without using any hash whatsoever.

I couldn't agree more... by huckda · 2005-06-10 07:42 · Score: 1

- but many so-called 'practitioners' turned them down as 'practically irrelevant'."

--
"Just Smile and Nod." --Huck

Coral Cache QUICK! Before it gets /.'ed by Spy+der+Mann · 2005-06-10 07:44 · Score: 1

http://www.cits.rub.de.nyud.net:8090/MD5Collisions /

no help by LiquidCoooled · 2005-06-10 07:45 · Score: 2, Insightful

Anything you hash together will ALWAYS result in collisions.

Extracting the formatting and code from the document will just make it EASIER to create a duplication.

"Hello World!" might match with "Hello World!!!!!! this is extra stuff"

at least leaving the exact formatting instructions in gives you a chance that the collision which leaves a compatible file is much more difficult than the hash of the simple raw text.

--
liqbase :: faster than paper

Re:no help by ergo98 · 2005-06-10 07:49 · Score: 1

Extracting the formatting and code from the document will just make it EASIER to create a duplication.

From a pure data perspective it makes it no easier or more difficult. However, I think his point was moreso that if one were to hash some text, it would be significantly more difficult to find a collision while maintaining plausible text - e.g. you can just add a comment block and add variable sizes of bytes.
Re:no help by linuxhansl · 2005-06-10 09:24 · Score: 1

While it is true that any hashing has collisions, the law of big numbers applies here.

Up to recently nobody has found a collision in 128 bit hashes like MD5. 128 allows for roughly 3x10^38 different hashes. SHA-256 allows for roughly 10^77 different hashes, which is close to number of atoms in the observable universe (estimated to be between 4×10^78 and 6×10^79).
So, unless you present the algorithm with a lot of different input you'll never find a collision.

It is probably safe to say that nobody will *ever* be able to find (or generate) *any* collision in SHA-256 and higher.
Re:no help by Loconut1389 · 2005-06-10 09:50 · Score: 1

never say "*ever*".. I'll speculate that quantum computing could produce such a computation faster than we'd like to think.
Re:no help by Loconut1389 · 2005-06-10 09:53 · Score: 1

rather, to clarify, once quantum computing evolves a while (years, decades?)
Re:no help by DuckofDeath87 · 2005-06-10 10:15 · Score: 1

Could we not then say, move to SHA-512, or SHA-2^100. Just make the numbers bigger. Until we get processors that can do anything instantly, we can always just make the numbers bigger and it all evens out.

A critical event... by jsight · 2005-06-10 07:45 · Score: 1

See Schneier's Blog for more thoughts on the subject. I am sure it will get fleshed out more as more details emerge.

--
Throw the bums out!

Security Through Obscurity by Trolling4Columbine · 2005-06-10 07:45 · Score: 0, Offtopic

"Encryption" is just another flawed method of concealing information behind obfuscated algorithms. History has proven time and again that such techniques are inevitably compromised, and therefore useless.

My suggestion: if you want your data to be protected that well, don't transfer it over electronic media.

--
Socialism: A feeling of discontent and resentment caused by a desire for the possessions or qualities of another.

Re:Security Through Obscurity by Anonymous Coward · 2005-06-10 08:04 · Score: 0

You're a very good troll. I hope someone mods you up by mistake :D
Re:Security Through Obscurity by Dachannien · 2005-06-10 08:05 · Score: 3, Informative

We're talking about cryptographic hashes here, not encryption. Encryption is meant to be a reversible process, and is therefore one-to-one. In other words, there's no concern over collisions with encryption.

With cryptographic hashes, you're throwing away nearly all of the data to obtain a hash (a number) which represents the larger data set in such a way that (hopefully) the hash will never turn up again in practical usage. The article here indicates that there are ways being devised to force two data sets to have a hash collision while keeping the practical parts of the data sets the same.

As for accusing encryption of being "security through obscurity", you're misusing that term. If knowing the encryption algorithm allowed you instant access to all data encrypted with that algorithm, then yes, the only security present would be dependent upon the secrecy of the algorithm itself. But that's not the case here. Encryption typically works by public key exchange, meaning that a key (a number) used to encrypt messages is shared with the encrypting partner, while the key to decrypt and recover the data is kept private (is never transmitted). Recovering the private key through brute force is not a compromise of the algorithm itself - given enough time, any private key can be recovered, regardless of the algorithm, but by increasing the key size arbitrarily, the time taken to find that key can also be increased arbitrarily.
Re:Security Through Obscurity by MyTwoCentsWorth · 2005-06-10 08:09 · Score: 2, Informative

Dear me,

Your ignorance is so appalling that I have a hard time deciding where to start.

First - this is about digital signatures, not about encryption - sharing the documents (by sending them over electronic media or otherwise) is needed.

Second - obfuscated algorithms ??? Open any book on encryption to learn that the strongest encryption algorhitms rely on completely documented algorhitms - security through obscurity is so 80's...

My suggestion - try to have something worth saying before making suggestions.

Happy Posting.
Re:Security Through Obscurity by loose_cannon_gamer · 2005-06-10 08:23 · Score: 3, Informative

I think that's a big shortsighted... I agree that if we let history take a crack at it, that any encryption put together by smart people will eventually be breakable by smart people.

However, most data that I deal with day-to-day is time relevant. Do I care if someone figures out my credit card number on an account I closed 5 years ago? Is it terrible if someone hacks an old email only to find out I was begging a professor for a passing grade in 1997?

Encryption is meant to hide things, and for many things, the need to hide is temporary. If the hidden thing stays hidden as long as it needs to stay hidden, there is nothing wrong with it.

Know the limitations on the technology you use, and know the parties with which you exchange information. Those two rules alone, if followed, will probably provide more than adequate real-world defense. Perfect? No. Good enough. Statistically, yes.

--
In Soviet Russia, us are belong to all your base.
Re:Security Through Obscurity by bturnip · 2005-06-10 08:25 · Score: 1

And if this was a time before electronic media, would the suggestion be "don't write it down on physical media"? ;)
Re:Security Through Obscurity by Anonymous Coward · 2005-06-10 09:10 · Score: 0

This is excellent advice, since noone has ever forged a paper document.
Re:Security Through Obscurity by snorklewacker · 2005-06-10 09:46 · Score: 1

Technically, encryption does rely on an "obscure algorithm". It relies on the fact that it takes a good long time to find that obscure algorithm unless the receiver possesses all or part of it.

Not that that's the point the GP was ignorantly trying to make, but in the end, it's all about obscurity :)

--
I am no longer wasting my time with slashdot

What are the alternatives? by CyricZ · 2005-06-10 07:45 · Score: 2, Insightful

If MD5 is found to be insecure, what are the alternatives we can use when signing our open-source packages? Is there any other alternative that is even approaching the widespread use of md5sum?

--
Cyric Zndovzny at your service.

Re:What are the alternatives? by specialbrad · 2005-06-10 07:52 · Score: 4, Informative

The signing of open-source packages are to prevent download corruption usually. If a download is corrupted, the data will be different, and hence the hash will be different. Most of these attacks are malicious in that you have to go great lengths to find a collision to use. If your connection corrupts the download in such a way to produce a collision, your modem obviously hates you.
Re:What are the alternatives? by rbarreira · 2005-06-10 08:03 · Score: 2, Informative

I may be wrong but I think that for that purpose, the use of MD5 is still quite secure. What those researchers did was make 2 files with the same MD5, they didn't choose the md5 value itself. In order to crack the schemes you're mentioning the md5 value is a given value for which you want to generate another file (many times with the additional restriction that the file sizes must match).

Read about collision attacks versus preimage attacks here.

Unless you're assuming that at least one of the people responsible for redistributing the software have bad intentions?

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Re:What are the alternatives? by tjw · 2005-06-10 08:12 · Score: 1

'gpg -ba THEFILE'

This generates a detatched signature named "THEFILE.asc".

The .asc file can used to verifiy the authenticity of THEFILE by the end user with the command:

'gpg --verify THEFILE.asc'

This method is already in pretty widespread use for open source software distribution. For example, Slackware has used for all official packages since 8.1.

--

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UB E-TEST-EMAIL*C.34X
Re:What are the alternatives? by m50d · 2005-06-10 08:31 · Score: 1

RIPEMD160 is the only hash from that "generation" still standing, so it's probably the simplest one to use. There are newer hashes that probably still work though, SHA256, along with the new Tiger and Whirlpool hashes.

--
I am trolling
Re:What are the alternatives? by Anonymous Coward · 2005-06-10 09:40 · Score: 0

Tiger is not that much younger; it's from 1996; only two years younger than SHA-1. It comes from an era when that many hash algoritms didn't come out (there is also PANAMA that was never formally specified with test vectors).

5-pass HAVAL also has never been broken (3-pass has been), but the general feeling is because no one has tried since HAVAL never became widely used.
Re:What are the alternatives? by Rikus · 2005-06-10 09:49 · Score: 1

md5 is also used for verifying that a package retrieved from a mirror is in fact authentic. If the main site can't handle all the download traffic, or something like ports/portage/pkgsrc wants to use its own mirror list, it is usually wise to check the file against a reliable hash from the official server. A problem like this could potentially allow mirror owners to create fake packages which pass the checks employed by package managers or humans, though I'm pretty sure pkgsrc uses both md5 and sha1 at the moment. Of course, the likelihood of a mirror owner wanting to do something evil isn't too high, but it's clearly something which has been given consideration.
Re:What are the alternatives? by Rikus · 2005-06-10 09:54 · Score: 1

No, I'm wrong. That's just not how it works.
This wouldn't apply.

Sorry.
Re:What are the alternatives? by MrDomino · 2005-06-10 10:09 · Score: 1

The signing of open-source packages are to prevent download corruption usually.

No, that is not what md5sums are for. Corrupted downloads can be prevented with a simple CRC check; MD5's goal as a cryptographic hash function is to verify that data received from an untrusted source, e.g., a mirror site for a software download or an unknown computer user (with a cryptographic key) is identical to data that a trusted source distributed in some way. Being able to generate meaningful collisions in a cryptographic checksum is a very, very bad thing; a malicious attacker can forge digitally signed documents and convince users that arbitrary binary downloads are in fact safe and trusted programs.
Re:What are the alternatives? by Korth · 2005-06-10 10:12 · Score: 1

D.J. Bernstein's Poly1305-AES seems like an interesting alternative, it's in the public domain.
Re:What are the alternatives? by Laplace · 2005-06-10 11:30 · Score: 1

Being able to generate meaningful collisions in a cryptographic checksum is a very, very bad thing;

Holy shit. Now how am I ever going to be able to sleep at night?

--
The middle mind speaks!
Re:What are the alternatives? by Anonymous Coward · 2005-06-10 17:42 · Score: 0

If corruption checking is all that's necessary, then CRC's would be just as if not more appropriate, as they are easier to calculate.

MD5's are used by projects such as Truecrypt and Gnupg, not to mention Freenet and Tor to ensure the authenticity of the downloaded package. This prevents someone from tampering with one of the mirrors. On the other hand, it is a lot harder to create such meaningful collisions for zip files, but not impossible.

Text of Letters by pete-classic · 2005-06-10 07:46 · Score: 3, Informative

For those who can't convientently view PostScript files, the text of the two letters:

Julius. Caesar
Via Appia 1
Rome, The Roman Empire
Alice Falbala fulfilled all the requirements of the Roman Empire
intern position. She was excellent at translating roman into her gaul
native language, learned very rapidly, and worked with considerable
independence and confidence.
Her basic work habits such as punctuality, interpersonal deportment,
communication skills, and completing assigned and self-determined
goals were all excellent.
I recommend Alice for challenging positions in which creativity,
reliability, and language skills are required.
I highly recommend hiring her. If you'd like to discuss her attributes
in more detail, please don't hesitate to contact me.
Sincerely,
Julius Caesar

Julius. Caesar
Via Appia 1
Rome, The Roman Empire
May, 22, 2005
Order:
Alice Falbala is given full access to all confidential and secret
information about GAUL.
Sincerely,
Julius Caesar

Re:Text of Letters by pete-classic · 2005-06-10 07:49 · Score: 1

Oops. Missed a couple of lines from the first letter:

May, 22, 2005
To Whom it May Concern:

So it should read:

Julius. Caesar
Via Appia 1
Rome, The Roman Empire
May, 22, 2005
To Whom it May Concern:
Alice Falbala fulfilled all the requirements of the Roman Empire
intern position. She was excellent at translating roman into her gaul
native language, learned very rapidly, and worked with considerable
independence and confidence.
Her basic work habits such as punctuality, interpersonal deportment,
communication skills, and completing assigned and self-determined
goals were all excellent.
I recommend Alice for challenging positions in which creativity,
reliability, and language skills are required.
I highly recommend hiring her. If you'd like to discuss her attributes
in more detail, please don't hesitate to contact me.
Sincerely,
Julius Caesar
Re:Text of Letters by spood · 2005-06-10 10:25 · Score: 1

To be fair, it should be noted that each of these texts was fully contained in both documents.

--
---- Just another spud server.

Explanation of the attack by swillden · 2005-06-10 07:46 · Score: 4, Informative

What these researchers did was not to improve the known attacks on MD5, but to demonstrate a clever way of turning the known attack, generally considered to be of theoretical interest only, into an attack that could potentially really be used.

The way they did it was to create a postscript document that actually contains two documents, one that the sender would be willing to sign and one that he presumably would not. The full text of both is contained in the file, but near the beginning of the file is a bit of code that compares two blocks of random-appearing bits, call them A and B. If A == B, the postscript interpreter will select the innocuous message and display that. If A != B, the interpreter will display the other message.

The researchers then generated a pair of blocks with the same MD5 hash. In one copy of the postscript file, they used one of these blocks as both A and B. In the other copy, they used one block as A and the other as B. Because every bit of both documents before and after the two blocks is identical, and because those blocks hash to the same value, the documents hash to the same value.

It's an interesting attack. It only applies to documents that are also programs, in some sense, but we use lots of document formats that fit that description.

A simple countermeasure that makes such an attack more difficult is to compress the documents before signing.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.

Re:Explanation of the attack by rbarreira · 2005-06-10 07:54 · Score: 1

Yes, that shows a drawback of the method - evidence of the attack is present on both files, you just have to hex edit them and look for their content...

Nevertheless, it's a good way to shut the mouths of the ones who say that hash function attacks are still theoretical...

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Re:Explanation of the attack by deinol · 2005-06-10 10:32 · Score: 1

If A != B, the interpreter will display the other message.

So when they go to court, and really examine the documents, they notice that the "real" document is embedded into the fake one. So while it may fool a person who isn't looking for it, it won't fool a real person, and they will easily be caught for their fraud.

Also, do people actually use "electronicly signed" documents for legaly binding contracts yet? I know I would be wary of doing so. Most of the world is still using the old fashioned paper trail, much easier to use in court.

Not that I'm opposed to the idea of a secure electronic system, but I know enough about cryptography to know that everything is broken given sufficient time.

--
Got Apathy?

It's been around long enough... by amalcon · 2005-06-10 07:50 · Score: 1

Maybe now, people will finally start migrating to a different hash algorithm for signatures. MD5 has been around long enough that people have known this sort of thing is possible for some time. It's only now that they finally sat down and did the proof-of-concept.

Of course, MD5 passwords are probably still safe, for now (between maximum password lengths and the fact that this attack will have a hard time actually doing that), but it's only a matter of time.

--
-Amalcon

Re:It's been around long enough... by Anonymous Coward · 2005-06-10 10:44 · Score: 0

It's not a matter of time, MD5 passwords will always be as safe as they are now.

Getting plaintext back from MD5 is not any "easier" now, cryptographically, than it ever has been. What's 'easier' is spoofing one block of bits as another using MD5.

As an aside, the "To confirm you're not a script" idea is getting ridiculous. I can barely read that text, you frigging numbskulls.

Re:huh? by Captain+BooBoo · 2005-06-10 07:52 · Score: 2, Funny

Are you kidding? ANY article with the word "hash" in it is going to grab the attention of even the weakest synapse. ôô

do you know how big 2^128 is? by frovingslosh · 2005-06-10 07:52 · Score: 4, Insightful

Collisions are NOT and accidential everyday occurance. What is being discussed here is a deliberate md5 match, created by making just the right changes to a document to intentionally get an md5 match.

2^128 is huge. It's larger by far than the number of all the files in all of the computers in the world. It larger than the number of stars in the universe. Chance collisions will not become an everyday occurance. No accidental collision has ever been found yet. Switching to larger keys will not change anything. Sure, they might make it slightly harder to make a deliberate collision (although I don't know for a fact that they make it harder at all, there were some reports of someone in Japan being able to create a collision by hand with only pencil and paper), but just wait 2 months and the computing power will catch up with that. It's not a matter of the size of the hash function.

--
I'm an American. I love this country and the freedoms that we used to have.

Re:do you know how big 2^128 is? by RealityMogul · 2005-06-10 07:55 · Score: 5, Funny

2^128 is huge. It's larger by far than the number of all the files in all of the computers in the world

Pfft, let me show you my porn collection.
Re:do you know how big 2^128 is? by smclean · 2005-06-10 09:22 · Score: 1

2^128 is huge. It's larger by far than the number of all the files in all of the computers in the world. It larger than the number of stars in the universe.
"I mean, you may think it's a long way down the road to the drug store, but that's just peanuts to 2^128"

--
"'Yrch!' said Legolas, falling into his own tongue."
Re:do you know how big 2^128 is? by QuestorTapes · 2005-06-10 10:23 · Score: 2, Insightful

> Collisions are NOT and accidential everyday occurance.

Probably.

Playing the Devil's advocate here. Collisions would be impacted, possibly -very- strongly, by exactly how the GUIDs are generated. Just because there are 2^128 bits to play with doesn't mean we are using them all. There may be significant gaps; a large number of values may never be generated by the algorithm used.

Combine with human errors in coding the algorithm and the odds of a collision may be much greater than expected.
Re:do you know how big 2^128 is? by Impy+the+Impiuos+Imp · 2005-06-10 13:28 · Score: 1

> Collisions would be impacted, possibly -very-
> strongly, by exactly how the GUIDs are
> generated. Just because there are 2^128 bits
> to play with doesn't mean we are using them
> all. There may be significant gaps; a large
> number of values may never be generated by the
> algorithm used.

Exactly. Has anyone done a statistical analysis of these to show they'll cover a significant portion of the range based upon typical input domain?

--
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.

Kids in Finland don't agree by StreetFire.net · 2005-06-10 07:52 · Score: 5, Insightful

Regarding being "practically irrelevant"

"every time [some software engineer] says, 'nobody will go to the trouble of doing that,' there's some kid in Finland who will go to the trouble."
Taken from Kevin' Mitnik's "The Art of intrusion"
http://www.amazon.com/exec/obidos/tg/sim-explorer/ explore-items/-/0764569597/0/101/1/none/purchase/r ef%3Dpd_sxp_r0/104-8074733-7395136

Re:Kids in Finland don't agree by Anonymous Coward · 2005-06-10 08:10 · Score: 1, Informative

Nice referrer link.
Re:Kids in Finland don't agree by Anonymous Coward · 2005-06-10 09:16 · Score: 0

Hey, he earned it.
Re:Kids in Finland don't agree by eobanb · 2005-06-10 18:26 · Score: 1

every time [some software engineer] says, 'nobody will go to the trouble of doing that,' there's some kid in Finland who will

Yeah, but no one will ever go through the hassle of writing a free clone of Unix. I mean, come on.

--
Take off every sig. For great justice.

Possible solution by orb_fan · 2005-06-10 07:55 · Score: 2, Insightful

Is the answer then to create a hash that is in fact the sum of two distinct hashing alogrithms? For example, use MD5 and SHA1. Since the alogrithms use different methods, the set of collisions from one would not overlap the set from the second (or rather the overlap would be vanishingly small). And if the overlap was larger than you wanted - just keep adding different hash alogrithms until you are satisfied.

I realise that the computations involved aren't cheap, but it becomes a trade-off between security and speed - the more sure, the more time it will take.

Re:Possible solution by eli867 · 2005-06-10 08:43 · Score: 1

Well, no, that's not the answer. For one thing, MD5 and SHA1 potentially have the same flaw (as stated in TFA).

Second, that's a bit like saying that my house is only protected by a screen door... so the answer is to buy a second screen door.
Re:Possible solution by m50d · 2005-06-10 08:53 · Score: 0

No, it isn't. That's a stupid idea for several reasons, all of which you see explained at least 4 times in every hashing algorithms thread. Mods, please stop giving the idea insightful. For a start, most modern hashes are far from distinct.

--
I am trolling
Re:Possible solution by orb_fan · 2005-06-10 08:56 · Score: 1

OK, maybe using MD5 and SHA1 was a bad example.

Actually, it's more like saying, "The holes in my screen door are too big, so I'll get a second screen door with equally big whole, just offset by half a hole." You end up with a better screen door.
Re:Possible solution by Anonymous Coward · 2005-06-10 08:58 · Score: 0

No, that's probably not the solution. You will most likely not find common collisions between the two, but you will most certainly find collisions in the combination which are not present in either. A good principle when designing hash-functions is 'keep it simple; one idea at a time'.
Re:Possible solution by orb_fan · 2005-06-10 08:59 · Score: 2, Informative

You are assuming that I read hashing alogrithm threads, which I don't. If there is a reason why this wouldn't work, then please enlighten me.
Re:Possible solution by dragondm · 2005-06-10 11:04 · Score: 1

Ahhh, yeh, for many coders not well versed in security, when they discover that they have a security problem with an algorithim, they think "Well, I know how to fix this! I'll add another algorithim!"

Now they have *two* security problems.

--
-- -- The Dragon De Monsyne
Re:Possible solution by m50d · 2005-06-11 09:24 · Score: 1

Well, the main reason is that there are few independent hashing algorithms. So if you can make a set of files that hash the same under one, it's more likely that they'll hash the same under another. For the extra bits you're using, it's far more effective to use a longer version of one of the hashes (e.g. sha-256 or 512). Schneier (sp?) explains it better than I do.

--
I am trolling

7 bits difference by lavalyn · 2005-06-10 07:55 · Score: 2, Informative

Compare the two files, and you see that there were only 7 bits that changed, all on the 5th most significant bit of the affected byte.

I guess it's obvious from looking at the ps file in text form, but if it's that easy to mangle postscript to display two different layers (or is it changing comments or pointers? I am not a binary postscript parser.) I still don't think it's time to throw md5 away yet. Six months.

What other hashing alternatives are there these days? SHA-1 apparently has the same kinds of weaknesses.

--
Doing the Right Thing should not be preempted by making a buck.

Re:7 bits difference by MarkByers · 2005-06-10 08:54 · Score: 1

SHA-1 apparently has the same kinds of weaknesses.

I don't that is exactly true.

From http://www.schneier.com/blog/archives/2005/02/sha1 _broken.html:

"collisions in the the full SHA-1 in 2**69 hash operations, much less than the brute-force attack of 2**80 operations based on the hash length."

2**69 is still a very large number, so you probably shouldn't be worried just yet. It isn't quite the same as the bit trick for MD5. If anyone knows of a bigger weakness in SHA-1, I would be interested to know.

--
I'll probably be modded down for this...
Re:7 bits difference by Anonymous Coward · 2005-06-10 09:14 · Score: 0

No worry. There are a few hash functions (PDF document) which have not been broken yet.

I like Tiger myself.
Re:7 bits difference by spood · 2005-06-10 10:43 · Score: 1

Any hash algorithm with computable collisions is vulnerable to this attack. This research is not so important from the perspective that an MD5 collision was found (because this has already been shown several times by other researchers), but that we have to be careful about how we use hash functions.

It should be obvious that we should not simply sign the hash of just any data that comes across our desk - we need to make sure we are absolutely convinced that it is safe to sign that hash because we have fully vetted the contents of the file. In this case, Caesar should have looked at the contents of the .ps file (which contains functional code) to see that it also contained an authorization for Alice to view secret documents. Then he should have fed her to the lions.

--
---- Just another spud server.

Perhaps its time for another layer of protection. by vonstauf · 2005-06-10 07:56 · Score: 3, Funny

We could just couple it with another widely used industry standard

--
" Yesterday upon the stair I met a man who wasn't there. He wasn't there again today. I wish that man would go away."

Re:huh? by rbarreira · 2005-06-10 07:57 · Score: 1

Care to explain why?

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F

Obvious solution by gr8_phk · 2005-06-10 07:57 · Score: 1

The obvious solution is to encrypt the file/document/OSSpackage with a private RSA key. The only way to view it then is to use the public key - which is published just like a MD5 sum.

There are 2 reasons people do this. 1) decryption takes longer than computing a checksum. and 2) this would force everyone to decrypt the file in order to use it. Only paranoid people and those who realize bits get corrupted bother to verify checksums now.

Reliable solutions are entirely available, people just need to start using them. BTW, I'm no expert but I think there are simpler ways than that suggested above.

Re:huh? by Compact+Dick · 2005-06-10 07:58 · Score: 1

I find it hard to believe that even Slashdot readers find this interesting.

This latest development should not only concern us nerds, but the rest of us as well. You see, this time they demonstrated that two messages can have the same hash, and neither of them need look like gibberish.

In other words, one might read "Bow before me and my enormous todger!" while the forgery could be "Warning: you need an electron microscope to view my penis". How can one tell which is real and which is fake?

--
Use ISO 8601 dates [YYYY-MM-DD]

Okay, I'm impressed. by ave19 · 2005-06-10 08:01 · Score: 4, Interesting

At first I thought: Postscript! Well, obviously. To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide. If you tried to hide random data in a text file, it would be obvious to the person signing it. You need some format to hide the random bits from the viwer.

I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...

Then I downloaded the files...

There's almost nothing to them! I can't read PS, so I'm not sure how many of that handful of bytes at the beginning might be tweakable... but it's a lot less than I expected.

Collisions must be very easy to find! I am now offically very worried about this.

--
...or maybe not.

Re:Okay, I'm impressed. by gtrubetskoy · 2005-06-10 08:13 · Score: 1

I bet the random parts are REALLY BIG! I mean, you'd probably need a lot of random data before you could find a collision...
You only need as much random data as fits in the size of the hash signature. I.e. for MD5, 128 bits is enough for at least 1 collision.
Re:Okay, I'm impressed. by Anonymous Coward · 2005-06-10 08:39 · Score: 0

Looks like the length of the random data in the file is 128 bytes x2 (for the comparison); and of course there's some overhead for the if/else and the fact that it contains copies of each document within itself. However... yeah, .ps, .doc, etc., often contain a lot of junk. And, note that the 'order' itself is comparatively small as well.
Re:Okay, I'm impressed. by yeremein · 2005-06-10 08:55 · Score: 3, Informative

To find a collision, you've probably got to hide a clump of randomness in the document, and then rotate that clump until the hashes collide.

That's actually not what they did. They generated two essentially random blobs of data that have the same hash. We'll call these X and Y. They then created a PostScript document containing BOTH messages, the one that Alice's boss would sign and the one he presumably would not sign. They inserted two copies of block X into one of these documents, and a one X and one Y into the other. The original document contained code that compared the two blocks, and if they were the same, caused one message to be rendered, or if they were different, caused the other message to be rendered. Thus both documents hash the same (since X hashes the same as Y), but you see different text when you view the files.

This sort of attack would only work on documents that can contain code of some sort. It would not work on text files.
Re:Okay, I'm impressed. by Basje · 2005-06-11 01:19 · Score: 1

Eh, no. Assuming a perfect distribution, for colisions to happen you need at least one bit more than the size of your hash.

In case of the 128bit MD5, again assuming that the distribution is perfect, you need at least 128+1 bits for 1 collision. With 128 bits there are no collisions.

--
the pun is mightier than the sword

So you're saying I shouldn't implement MD5 ... by malcomvetter · 2005-06-10 08:01 · Score: 3, Interesting

in my next big project?

In all seriousness, I believe Schneier's right. We need a competition for a new hash function.

Nah, let's just wait for 24 to drop the words "MD5" before we know it's really bad.

Re:So you're saying I shouldn't implement MD5 ... by Qzukk · 2005-06-10 08:14 · Score: 1

You should provide some mechanism by which a user can select N hash methods to be used, with perhaps a warning against chosing MD5 or SHA1 alone. Then rather than users having to worry about this or that hash/encryption being broken, they can just switch, with warnings when they use the old algorithms (maintained for backwards compatibility).

I suspect that in the near future, signing will be done using two different hashes to seriously restrict the collision space.

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re:So you're saying I shouldn't implement MD5 ... by Beryllium+Sphere(tm) · 2005-06-10 08:42 · Score: 1

What he said.

You could reasonably choose SHA256 or Whirlpool for some of those user-selectable hashes. SHA256 still has some safety margin against the new attacks and Whirlpool is a different kind of design.

Relative Resources: The Attackers' Advantage by G4from128k · 2005-06-10 08:02 · Score: 4, Interesting

The core problem is that the resources of the defender are inevitably overwhelmed by the resources of the attacker.

Relative expected values of gian vs. loss: The attacker thinks "I know I can gain a #BIG_NUM million dollars" and devotes their full effort to the attack. The defender thinks "I'm safe, there's a low probability, and I'm sure I'll catch the problem before it becomes real money, " and does not not devote effort to security becuase a Gartner report told him it was over-hyped. Thus, the attacker's perceived expected value is much higher than the defenders perceived expected loss and each invests accordingly.
Rising Complexity: As IT systems become more complex, they become less secure. Each new device, new networking protocol, new physical layer, new OS feature, new networked application provides new opportunities for the attacker and a dilution of security resources for defenders.
Time: The attacker has the advantage of time. New algorithms, new mathematical theories, new exploits, and faster processors all favor the attacker. What once was supposed to take the age of the universe to crack can be decrypted with a quickly declining number of networked (even zombied) PCs.
Curse of Compatibility: Because so much crypto and security is networking related, it is subject to implementation delays caused by the need to be compatible. Defenders continue to use old, vulnerable systems to maintain compatibility with key partners. Patches don't solve the problem because the patch itself can introduce incompatibilities that make defenders leary of applying the patch with a very real chance of causing problems to avoid a hypothetical security issue.

The bottom line is that the defender must protect all vulnerabilities while going about the day-to-day business of using the computer. In contrast, the attacker can devote full time to any weakness of their choice.

--
Two wrongs don't make a right, but three lefts do.

The junk would be obvious by PIPBoy3000 · 2005-06-10 08:02 · Score: 1

If things went to court, it should be fairly easy to obtain a copy of the forged electronic document and look for added junk.

A novice might miss it, but a trained eye could easily see the garbage.

What worries me more is executeables where you're depending on the hash to match. If the file sizes are approximately the same, it would be very easy to trick someone into running something that is actually something else.

Re:The junk would be obvious by MAdMaxOr · 2005-06-10 08:10 · Score: 1

If things went to court, it should be fairly easy to obtain a copy of the forged electronic document and look for added junk.

Lets imagine a more devious employer. What if the junk was on the legitimate document? The employer has you sign the one with junk, and then when it goes to court, the employer claims that YOU generated a hash collision to write yourself an easier/better contract.

It works both ways. The junk could be on either document, and both parties have motives to forge a document.
Re:The junk would be obvious by kallisti · 2005-06-10 08:23 · Score: 1

The junk could be on either document, and both parties have motives to forge a document.

Since this is an attack in which you create two documents, the party which created the document must be the would-be forger. I think your evil employer would have a hard time convincing the court that I wrote the document they had me sign.
Re:The junk would be obvious by Jaime2 · 2005-06-10 10:00 · Score: 1

You created it so that you could get of the nasty contract that you signed. What's so hard to believe about that?
Re:The junk would be obvious by Anonymous Coward · 2005-06-10 10:24 · Score: 0

You obviously didn't RTFA. Both documents need to be created at the same time.
Re:The junk would be obvious by judo_badger · 2005-06-10 11:25 · Score: 1

The scenario describes documents, but this can also be applied to binary executable files. A benign program and a malicious trojan can have the same MD5 signature ( but likely different file sizes ).
Re:The junk would be obvious by Jaime2 · 2005-06-10 12:15 · Score: 1

Read the article, looked at the documents. There is no need for both to be created at the same time. It would be impossible for a court to tell which of these ocurred:

1. Employer made two documents, one really nasty with no hidden garbage and one really nice with garbage that makes them hash the same. Employee signs nice document and Employer copies that signature to nasty document.

2. Employer made nasty document, and Employee signed it. Then, after the fact, Employee makes a nicer document and adds crap to it to make it hash to the same value as the nasty document. Employee then claims scenario #1 actually ocurred.

Timestamps can be forged and this process can happen at any time after the deal is signed. That's the whole problem here. It is supposed to be nearly impossible to do either of those 2 above with SHA-1 or MD5. If it turns out to be trivial (the article claims it only takes a few hours with standard hardware), then these signature algorithms are worthless. Fortunately, there are alternatives.
Re:The junk would be obvious by dossen · 2005-06-11 12:01 · Score: 1

How can the documents be created at different points in time? They consist of the following parts (if you check the postscript source, which is waht is hashed):

Postscript header
junk
postscript "if" using the junk as selector
first message
postscript "else"
second message
postscript "endif"

The only difference between the two documents is the junk part.

Creating a faked document after the fact is another kind of attack.

Re:First post by rbarreira · 2005-06-10 08:03 · Score: 1

Well, you're of course a troll, but for people who might not be very informed about this... That unlikely event that you talk about has just been purposely produced. And they could have used any different collision they wanted for this effect, since if you append the same content (ANY content will do) to two files which are an MD5 collision, you end up with... two files which are an MD5 collision.

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F

Consider it to be an example... by Otto · 2005-06-10 08:03 · Score: 3, Informative

If you don't know what a hash function is, then turn in your nerd credentials now.

Basically, they provided an example case where one of these recent methods to generate hash function collisions can be turned into a "real world" attack.

It's a very simple example case, but it demonstrates the point effectively. The point is that these recently discovered methods to generate collisions quickly are a real threat to any software using them as a method for digital signatures and such.

The real world application here is that it is possible, probably in several good ways, to generate a couple of different files that have the same hash and also have meaningful data in them. The attacks found that generate seemingly random data with the same hashes can be used in ways that will let them apply to non-random, purposefully designed data.

The example they use is where some secretary gets her boss to sign a document, and then uses his signature on another document which gives her access she shouldn't have. It's a way to forge a digital signature on a document by having them sign another one that you specially crafted.

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.

Re:Consider it to be an example... by spood · 2005-06-10 10:57 · Score: 1

The example they use is where some secretary gets her boss to sign a document, and then uses his signature on another document which gives her access she shouldn't have. It's a way to forge a digital signature on a document by having them sign another one that you specially crafted.

This is imprecise and can be interpreted in horribly incorrect ways.

The example they use involves a secretary giving a document to her boss in order that he will sign the hash of that document. Beforehand the secretary has prepared another nearly identical document with the same exact hash value that is nevertheless rendered differently by PostScript. She then presents this other document and the boss's signature of the hash of that document, which is identical, in order to gain security clearance that she doesn't have.

The difference is subtle, but important.

--
---- Just another spud server.
Re:Consider it to be an example... by Fnord666 · 2005-06-10 13:29 · Score: 1

Correct. Technically you could sign the document itself, but everyone involved might starve to death waiting for the operation to complete. It has been considered acceptable to generate a hash of the document, then digitally sign the hash, which is a lot smaller. This assumes that the hash is unique and that a different document with a matching hash cannot be found in a reasonable amount of time.
In reality, you are signing every document/program/binary/whatever that has that same hash.

--
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
Re:Consider it to be an example... by Anonymous Coward · 2005-06-10 17:51 · Score: 0

and then there might be malicious code

From a prior discussion... by erroneus · 2005-06-10 08:03 · Score: 3, Interesting

I forget where or when exactly, so please feel free to run a search if you care to... it was here on Slashdot though.

There was talk about someone being able to foil P2P networks by seeding bad stuff through random data formulated to fit the MD5/SHA1 code from legitimate files shared on those networks. The consensus was that it was BS and that even if it weren't BS there could be updates to make such attacks more difficult or impossible to perform.

Am I missing something or are these two stories relevant to each other?

Re:From a prior discussion... by Anonymous Coward · 2005-06-10 09:06 · Score: 0

Yeah sure. You start off by giving everyone a cool mp3 file, and make it known that this mp3 file has a certain md5 sum (there may be a click at the beginning of the song, but most people hardly notice it)

Then you reaplce that mp3 file with one of you farting and burping. People think they're getting the cool song, but they get to enjoy the sound of your farts instead. Again, there may be a click at the beginning of the song.

If you trust the person making the sums, things are OK. If not--well, use the Tiger hash instead.

I really wish Linux distros would stop trating md5 sums as if they were secure.
Re:From a prior discussion... by MS-06FZ · 2005-06-10 09:29 · Score: 0

Somewhat, but not really. As someone else mentioned, there were only 7 bits different between the two files.

The reason the attack worked in the story is because Alice created <b>both</b> documents. But what she actually created was two variations of a single document, with nearly the same contents (that is, each document contains both messages) but each of the two documents is set up to show just one of the two messages. She tricks Caesar into digitally signing one of the electronic documents, and then uses that digital signature on the other.

The reason that wouldn't work with the MP3's is because the people trying to distribute the false content didn't create the original MP3's. Alice was able to have her secret message incorporated into the hash key, and then hinge it (but preserve the key) to create two visually different documents.

People looking to use a similar trick on MP3 files could flip sets of bits, mangling the file but preserving the checksum through experimentation, perhaps, until the file was unreadable. That might be good enough to beat peer-vs-peer distribution.. but it'd be much harder to start with a pre-existing document and turn it into something <i>meaningful</i>.

--
---GEC
I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
Re:From a prior discussion... by cmburns69 · 2005-06-10 09:59 · Score: 1

The solution lies in having the P2P engine ask for MD5's of a specific, and random portion of the file.

That way, not only does the file have to generate the same hash, but any part of the file would have to generate the same hash as it's valid counterpart.

If this sort of collision can be reliably created, then may God have mercy on our souls!

--
Online Starcraft RPG? At
Dietary fiber is like asynchronous IO-- Non-blocking!
Re:From a prior discussion... by petermgreen · 2005-06-10 10:15 · Score: 1

not all that related at least not YET

p2p networks vary hugely in the quality of the hash used. i belive some use crc32 which is known broken.

others use things like md5/sha1 which until recently have been considered virtually unbreakable.

theese attacks are against sha1 and md5 but they do not give the capability to make a collision attack against a block of an existing torrent or similar they could only be used in a bait and switch manner if that.

so not hugely relavent for now possiblly very relevent if the breakage gets worse.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:From a prior discussion... by Anonymous Coward · 2005-06-10 10:51 · Score: 0

Yes they are relevant to each other. It is actually worse than what you just described. The article says it has been demonstrated that it is possible to make a file, wich has the same hash as another, but is containing a virus... This is a huge security problem as many linux distributions uses md5 hashes to confirm file integrety.
Re:From a prior discussion... by julesh · 2005-06-11 07:01 · Score: 1

I really wish Linux distros would stop trating md5 sums as if they were secure.

They are, as long as you can trust that they generated the MD5s with files that they produced the content for. This attack does not work unless you control both versions of the file you're trying to switch.

Works for certificates, too by cpeikert · 2005-06-10 08:04 · Score: 4, Interesting

Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

Here's a link to the paper: Lenstra et al.

Re:Works for certificates, too by swillden · 2005-06-10 11:33 · Score: 1

Lenstra and others came up with a way to generate syntactically-correct X509 certificates that collide under MD5.

That is a *much* more significant result than this Postscript document trickery.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:Works for certificates, too by cpeikert · 2005-06-11 02:41 · Score: 1

I agree wholeheartedly. That's why I'm surprised that Slashdot paid attention to the PS trick and not the result on certs. (Wait, am I surprised??)
Re:Works for certificates, too by swillden · 2005-06-11 03:09 · Score: 1

Wait, am I surprised?

I am, but only because I'm a hopelessly naive optimist.

BTW, I glanced at your home page and was amazed at how much our interests coincide. You're practically a younger, smarter, better-looking version of me! ;-)

Regarding your Jan news item -- have you looked into madman? I think its AutoDJ feature is better than IMMS. It's certainly much more configurable. AmaroK also has an interesting approach. AmaroK uses an on-line database to select songs from your collection that are "similar" to what you're currently playing. That allows you to effectively pick a genre and then continue listening to music from your collection that is roughly in the same mood/style. With no training whatsoever, it does a great job of picking my music out of a large collection that contains lots of music I don't like (my wife's music). Madman does a better job, but that takes some training to achieve.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:Works for certificates, too by cpeikert · 2005-06-11 03:20 · Score: 1

Regarding your Jan news item -- have you looked into madman?

I am now disillusioned with IMMS. It was great to start with, but quickly got stuck in some 'local optima' and never played anything outside my top-top-top favorites (which quickly became no-so-top). Also, it has a lot of annoying playback quirks. So thanks for those pointers to AutoDJ and Madman....!
Re:Works for certificates, too by julesh · 2005-06-11 06:58 · Score: 1

Unfortunately not. Their technique requires both generated certificates to have the same name details, but different keys.

The only practical use I can think of in generating certificates with a hash collision is if it enables me to persuade an authority to sign one certificate which I can then switch some details on and pretend to be somebody else who the certificate issuer wouldn't normally have signed for. This technique doesn't allow you to do this. In fact, it doesn't do anything that most certificate issuers wouldn't normally allow you to do (have two identical certificates except for the key) anyway, as long as you paid them enough.

It was only a matter of time by m50d · 2005-06-10 08:05 · Score: 1

I hope all those who said the previous MD5 attacks were nothing to worry about will take it back. A theoretical break or partial break is almost always followed by a practical break. The lesson to learn from this is not to do the same thing with SHA1 and, I would suggest, AES. The vulnerabilities might be impractical now but they won't stay that way. Move now, while you're still at least partially secure, rather than once there's a practical break.

--
I am trolling

Re:It was only a matter of time by Anonymous Coward · 2005-06-10 08:26 · Score: 0

Ok, admittedly the MD5 hash of the sum my knowledge of hash algorithms quite likely collides with the MD5 hash of /dev/null.

But isn't this just showing a theoretical weakness of any hash algorithm? Use two hashes, then. One of the normal file and one of an encrypted version. One might collide, but both won't. Think this is a stupid idea? Well, then kindly reread the first paragraph. ;)

Re:Completely OT by Anonymous Coward · 2005-06-10 08:07 · Score: 0

Is that the same device that was used against WTO protestors in the United States?

Go America! One day you will be as oppressive as China.

--- Slashdot can't count:

Slow Down Cowboy!

Slashdot requires you to wait 2 minutes between each successful posting of a comment to allow everyone a fair chance at posting a comment.

It's been 25 minutes since you last successfully posted a comment

Re:Speaking of 128 bit collisions, UUIDs and GUIDs by dmeranda · 2005-06-10 08:09 · Score: 1

The GUID (as called by Microsoft) has it's history in the UUID, which was first "invented" by OSF as part of it's big DCE specs (long before Microsoft adopted the format and gave it a spiffy new name).

The UUID and GUID look alike, but the UUID was constructed in a much different manner than GUIDs seem to be nowdays. GUIDs are basically 128-bits of random data (usually made by passing pseudo-random seeds through a hash). UUIDs on the other hand contained structured data too in addition to randomness, to try to prevent accidental reuse. For instance the rightmost 48 bits come from the Ethernet MAC address if available. Other bits contained date/time information, process/thread ids, etc.

There was an expired Draft RFC written in February 1998 which explained a lot of the technical details of GUIDs and UUIDs (surprisingly authored by Microsoft). You may still be able to find copies of it out on the net (the filename was "draft-leach-uuids-guids-01.txt")

Re:Explanation of the attack -- enforcement issues by Eclectic+Engineer · 2005-06-10 08:09 · Score: 2, Interesting

It is an interesting attack, and IANAL, but I'd be curious about the legal ramifications. If I slip a carbon (ah... the way-back machine) in a stack of papers and ask someone to sign the top one without thus informing them, I think my stealth probably invalidates the additional document(s).

You could argue that there's a noticeable difference between pen and carbon -- making the copy hard to enforce -- but I'd argue the digital version is even easier: at least in the PS example, both "copies" of the document need to be present to preserve the hash.

In normal (pen/paper) signature situations, I get a copy of what I signed. The same ought to apply to digital sigs, resulting in a simple legal challenge to the validity of the document.

[summary] by 823723423 · 2005-06-10 08:10 · Score: 1

[1]
[LdW] A. Lenstra, B. de Weger: On the possibility of constructing meaningful hash collisions for public keys
http://www.win.tue.nl/~bdeweger/CollidingCertifica tes/ddl-full.pdf

[2]
Using the length extension property present in MD5 and all other hash functions following the (almost omnipresent) Merkle-Damgaard design principle, we constructed a pair of documents to display either the letter or the order.

My New Hash Function by MobileMrX · 2005-06-10 08:10 · Score: 0

I am currently working on a hash function that returns the exact value of the hashed value. The algorithm works a little something like this:

MyHash(ValueToBeHashed) = ValueToBeHashed

I have done extensive testing on this algorithm and have not been able to produce any collisions yet... I'll keep you guys updated. (I also used this new function to hash this message for verification!)

--------

MobileMrX Hash:

I am currently working on a hash function that returns the exact value of the hashed value. The algorithm works a little something like this:

MyHash(ValueToBeHashed) = ValueToBeHashed

I have done extensive testing on this algorithm and have not been able to produce any collisions yet... I'll keep you guys updated. (I also used this new function to hash this message for verification!)

Stupid Forced Collision by ch0p · 2005-06-10 08:13 · Score: 0

Compile the following untested code twice, once with the name md5collision, and again with some other name. Check the hashes, and check the output. This is the same as the above postscript example.

#include <stdio.h> int main(int argc, char *argv) { if (argv[0] != "md5collision") { printf("This was a dumb, forced, collision that would never happen in real live.\n"); } else { printf("This is a legitimate file, with the same hash as that other file I compiled.\n"); } exit(0); }

Re:Stupid Forced Collision by figge · 2005-06-10 09:41 · Score: 1

% md5sum ./md5* d80ae934acc0231a0fcb99716346c8e6 ./md5collision d80ae934acc0231a0fcb99716346c8e6 ./md5other % ./md5collision This was a dumb, forced, collision that would never happen in real live. % ./md5other This was a dumb, forced, collision that would never happen in real live.
Same hash, same output. What's your point? ;-)

(Hint: strcmp(argv[0], "md5collision") != 0. You might also have some trouble with argv[0] being "/home/ch0p/md5collision", "./md5collision" etc, but hey, your point got across. I'm just nitpicking :-) )
Re:Stupid Forced Collision by ch0p · 2005-06-10 11:22 · Score: 0

Hey! It said untested for a reason! ;P

On the contrary... by Chris+Pimlott · 2005-06-10 08:14 · Score: 5, Insightful

This attack shows us all once again that there is that the procedures for using cryptography are as important as the mathematical theories and proofs on which cryptography is based. People like to believe that it's just the algorithm that's important, and once you have such an algorithm it's equally applicable to messages of all sorts and formats. As this shows, it's clearly not the case.

You may believe it's common sense, but to the average user, encrypting a simple letter like the memos used in the article expressed as a Word document is no different than encrypting a simple text email. Heck, many of these users probably have no idea that much of the plain-looking email they send and recieve is actually HTML, which is capable of hiding beneath its rendered surface all sorts of additional information.

When's the last time you saw an email program that read in a Word document, extracted just the plain text content, signed or encrypted it and then repackaged it into some new format in a cryptographically sound way that would automatically be reconstituted as a Word document on the other side? Most just have a handy "Sign" or "Encrypt" button that will happy accept .ps or .doc just as readily as a simple text file.

Re:On the contrary... by cpeikert · 2005-06-11 02:57 · Score: 1

This attack shows us all once again that there is that the procedures for using cryptography are as important as the mathematical theories and proofs on which cryptography is based.

I see what you are saying, but I think you're taking exactly the wrong lesson.

The lesson is that exceedingly strong, conservative notions of security are needed for cryptographic primitives, because you can't predict/constrain their use in any meaningful way. You shouldn't be happy with a hash function that just resists collisions on text documents, because it will fail in unpredictable ways, and those failures may be "amplifiable." Even if you think you'll be dealing only with text-only documents, written in English, less than 2KB in size, you still want a function that resists collisions on any binary string.

People like to believe that it's just the algorithm that's important, and once you have such an algorithm it's equally applicable to messages of all sorts and formats.

That should be the case for any well-designed algorithm. If the algorithm doesn't work for arbitrary data in any format, it is no good and should not be used.

Re:3n(ryp710n 15 0v3rr473d by Anonymous Coward · 2005-06-10 08:15 · Score: 0

In this case, I think the lameness filter has been a little lenient...

practically irrelevant... by Anonymous Coward · 2005-06-10 08:16 · Score: 0

MD5 seemed like a good way to check for duplicate jpg images, sometimes an image with the same data had a different name.
But equivelent MD5 hashes of different images made a mess of that scheme.
Oddly enough SHA had the same problem, though I didn't check to see if it was for the same set of images...
I thought it was maybe a problem with the python MD5 and SHA libs and abandonded the idea.
I wonder if the nature of jpg compression significanlty reduced the (mathmatical) image "space" and thus
raises the probability of hash equivelence for non identical images.
One should be able to see the effect with a many thousand jpg images.

Re:practically irrelevant... by julesh · 2005-06-11 09:06 · Score: 1

MD5 seemed like a good way to check for duplicate jpg images, sometimes an image with the same data had a different name.
But equivelent MD5 hashes of different images made a mess of that scheme.

If you have two different JPEG files that have the same MD5 hash but show a different image, I suspect security researchers would like to see them. Especially if they're the kind of JPEG images most slashdotters seem to have more of than the other kinds...

also a matter of algorithm strengh by tota · 2005-06-10 08:17 · Score: 1

As you rightly pointed out, collisions are a natural phenomenon linked to the limited space of the hash.
What is important -and the reason for moving away from md5 and sha1- is the fact that flaws in these algorithms can be used to derive collisions more easily than it should be possible to.
Problem is that it is extremely hard to prove that an algorithm is strong, it is much easier to disprove: just find an example. Science at its best!

--
TODO: 753) write sig.

From Schneier's Blog... by sam5550 · 2005-06-10 08:18 · Score: 1

Other attacks on MD5

The provided exploit documents can be edited! by rar · 2005-06-10 08:19 · Score: 3, Insightful

What I haven't seen mentioned yet, and people perhaps haven't realized, is that in providing these two postscript files, they have essentially provided you with an postscript signature exploit kit!

All you need to do is download the two postscript documents and do *exactly corresponding edits* in both of them, and you get two documents saying different things and still have the same md5sums!

I just tried exchanging Alice's name for my own, and surely it did work.

Now, if they released a pdf-file hack, I would be genuinely worried :)...

--
Open Materials Database

Re:The provided exploit documents can be edited! by Richard_J_N · 2005-06-10 08:29 · Score: 1

Not, however very useful. Because if you want to exploit this, you'd need messages which aren't about Gaul and Caesar!
Re:The provided exploit documents can be edited! by braindead · 2005-06-10 09:04 · Score: 1

All you need to do is download the two postscript documents and do *exactly corresponding edits* in both of them, and you get two documents saying different things and still have the same md5sums!

No. The whole point of hashing that it is extremely difficult to figure out what *exactly corresponding edits* are, so the provided exploit documents cannot be edited. This is why these two postscript files are noteworthy: they are different, yet they hash to the same thing. Since the two postscripts are different, changing the same letter in both will NOT result in the same hash.

You still have to use the attack from Wang and Yu [WY05], so having these postscript files makes no difference to an attacker. Unless he happens to deal with Alice and Caesar, of course.
Re:The provided exploit documents can be edited! by rjw57 · 2005-06-10 09:23 · Score: 1

Since the two postscripts are different, changing the same letter in both will NOT result in the same hash.

Actually the textual part of both is the same, they just found some PS magic to choose between two 'sub-documents' in the same PS. Indeed one can trivially show that modifying both files will result in a collision since both files contain the text of both messages.

OTOH, in the real world, one would be hard pressed to change the contents of an existing document.

--
Rich
Re:The provided exploit documents can be edited! by braindead · 2005-06-10 09:33 · Score: 1

I know it's bad form to reply to my own post, but after some more thinking it seems that you are right: we can edit the documents, and the edits don't even need to match. That's because the two documents are put together anyways in the attack.

I tried it. It works. That's pretty cool, and slightly scary too!

So, yes, they basically published an "exploit" as we'd say.
Re:The provided exploit documents can be edited! by m50d · 2005-06-10 09:39 · Score: 1

That was my first though. My second was "no, they can't be that dumb". Hopefully they've put the lengths and offsets in the exploit code, so it will only work if the two letters you make have exactly the same length as the ones in their example. Still a cause for concern - but then duplicating their attack is very much possible anyway, that's why it's an attack.

--
I am trolling
Re:The provided exploit documents can be edited! by digitalhermit · 2005-06-10 09:40 · Score: 1

I'm not so sure that your original message was all that off the mark.

The attack is really against Postscript from what I can see. It is the same document, just relying on differences in the document name (it appears) to generate the different pages. In fact, it's not so unusual for unix scripts to behave differently based on how it was called. Same binary, different behaviour.

The article premise is wrong. It is not a collision attack, just a Postscript attack. Adding a single byte will change the md5sum, so that's still safe.
Re:The provided exploit documents can be edited! by spood · 2005-06-10 10:35 · Score: 1

The reason why this works is that they are taking advantage of the length-extension property of the hash. This means that if any two documents have the same hash, adding the same bits to the end of the two documents will also have the same hash.

If you really wanted to spread the FUD, you could say that the researchers have published a univeral adapter for MD5 exploits simply by providing two distinct pieces of data with the same hash - a hash collision. With these two pieces of data, you could create ANY two files that could behave differently (executable files, HTML files with JavaScript, Word Documents with macros, etc...) depending on the value of these two colliding pieces of data. You would just need to make sure that both files were identical except for the two sections with the hash collisions. This, in fact, has already been done by previous researchers. The examples are referenced by the authors for EXE files and certificates.

--
---- Just another spud server.
Re:The provided exploit documents can be edited! by rar · 2005-06-10 10:40 · Score: 1

No, you and others who have replied to my post still don't seem to get it!

You open your favorite editor that don't destroy binary data (emacs is fine). You open the two postscript files. You will see how each contains a different "header", but after that *both* files include *both* messages(!)

Now, the headers are different and uses some postscript trickery to select which of the messages is going to be shown on screen, but yet "magically" manages to "preserve" the md5sum.

The rest of the two documents can be edited at will. You can for example change the word "Orders" to "Obligation to pay" if you wish. You just have to change this in *both* files. In file 'A' where it is not actually shown on screen, and in file 'B' where it is shown on screen.

Then check the md5sum's of the files. The md5sum will have *changed* relative to the old md5sum, but they will still be *exactly equal* between the two files.

It works. I have tested it. Please test it before you comment to say that it does not work.

--
Open Materials Database
Re:The provided exploit documents can be edited! by rar · 2005-06-10 10:55 · Score: 1

OTOH, in the real world, one would be hard pressed to change the contents of an existing document.

You make it sound like the attack isn't feasible. I belive that is far from the truth in the *actual* real world.

All you need is to pull off some lame excuse: "Oh, they needed it in this format -- I had to convert your letter of recomendation to postscript; can you please sign this version too?".

Combine 1) This attack, 2) A little social engineering, 3) Someone who does not know about the recent discoveries of md5 collisions; and you have all you need for a successful attack.

The biggest problem I see is that this attack is done in *postscript* => something that your average Windows user cannot read at all. *That* will make your boss suspicious to sign it. Now, if it was 'pdf' on the other hand...

--
Open Materials Database
Re:The provided exploit documents can be edited! by rar · 2005-06-10 11:08 · Score: 4, Informative

It is the same document, just relying on differences in the document name (it appears) to generate the different pages.

No, you have missed the point. Go back and rtfa again. The attack still works if you rename the documents to the same filename.

The difference lies in a generated "binary cookie" in the beginning of the postscript documents. This "cookie" makes the postscript intepreter either select to show document 'A' or 'B'. The "thing" with the cookies are that they are carefully selected to be md5-colliding. Result: both documents have the same md5sum.

You can change the rest of the documents freely if you make the same changes in both documents. The md5sum will change, but it will still be the same for both documents.

So. No. It is indeed a md5 collission attack.

--
Open Materials Database
Re:The provided exploit documents can be edited! by Richard_J_N · 2005-06-10 12:16 · Score: 1

Ah. I am impressed!

Not exactly useful for fraud... by argent · 2005-06-10 08:22 · Score: 4, Insightful

Clever, but it means the attack is not a general way to forge an MD5-signed document... you couldn't use this (for example) to seed a P2P network with malicious files that look like safe ones. It only works if you generate both documents, and it can only be used maliciously if it's never examined by an expert: the signer can't retain a copy of the signed document or obtain a copy through discovery.

Re:Not exactly useful for fraud... by Stauf · 2005-06-10 14:49 · Score: 1

...the attack is not a general way to forge an MD5-signed document...

True, but it has enough application to be a little worried. Imagine a company using a contractor to do some renovations, the company sends the contractor an electronic document (a .doc for arguments sake, widely accepted and contains a level of indeciferable junk) with a set of requirements, which the contractor digitally signs. A month later, when the work is finished and the contractor asks for payment - the company whips out a different document, with completely different requirements, that appears completely legitimate and matches the contractor's digital signature. The company claims the work wasn't done correctly, and uses it as an excuse to not pay.

This sort of scenario would make it very, very difficult for either side to prove their case. It is also a general enough case that it has significant real-world applications - it can effectively turn any digital contract into a "he-said, she-said" argument.
Re:Not exactly useful for fraud... by argent · 2005-06-10 15:01 · Score: 1

A month later, when the work is finished and the contractor asks for payment - the company whips out a different document, with completely different requirements, that appears completely legitimate and matches the contractor's digital signature. The company claims the work wasn't done correctly, and uses it as an excuse to not pay.

The problem with doing this is there's a huge risk: the very first time someone calls in an expert who can analyse the document and figure out what actually happened that would be totally compelling evidence of intent to defraud. Not only do you lose the civil suit, but you're the defendant in a criminal case as well. And any other contracts you've signed are in doubt as well.

That's the real reason that people are willing to accept such easily forged documents as signed faxes as binding contracts, because you really can't get away with forging them on a alrge enough scale to make it worthwhile.

Oh, I'm sure there's a few situations where this might be usable as part of a scam that's big enough to justify the risk, but I can't think of any where more conventional forms of deception aren't both easier and safer.

If you read the Slides link for the Eurocrypt 2005 by Calyth · 2005-06-10 08:27 · Score: 1

They did point out that they were exploiting the fact that if the MD5 hashes of X1 and X2 collide, then the MD5 hashes of X1+S (read: X1 concatenated with S) and X2+s also collide.
As much as you didn't like it, Lucks and Daum had just turned that into a possibly-practical hash function weakness. If they can do this with Word...

Civilian cryptology training by Anonymous Coward · 2005-06-10 08:29 · Score: 0

Here's a short article Bruce Schneier wrote on how to become a cryptographer (as a civilian): http://www.schneier.com/crypto-gram-9910.html

Re:huh? by egypt_jimbob · 2005-06-10 08:38 · Score: 1

obviously your "enourmous todger" is fake. It's easy to hide something tiny with something enourmous.

--
I am a leaf on the wind. Watch how I soar.

Re:huh? by pyrrhonist · 2005-06-10 08:40 · Score: 1

In other words, one might read "Bow before me and my enormous todger!" while the forgery could be "Warning: you need an electron microscope to view my penis". How can one tell which is real and which is fake?

I would have thought that with a name like yours, the answer would be obvious.

--
Show me on the doll where his noodly appendage touched you.

what'd i tell ya'all by XO · 2005-06-10 08:41 · Score: 1

last time we had this discussion , maybe 3-6 months ago.. and they were meaningless collissions.. i told everyone "give it another 6 months or so, and someone'll figure out how to spoof properly"

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/

Re:what'd i tell ya'all by dustmite · 2005-06-11 04:29 · Score: 1

Unless you had already scientifically proven it though (i.e. worked out the MD5 problems yourself), then your "prediction" amounts to nothing more than a lucky guess. You had no way of knowing, and you could just as easily have been wrong. Historical rates of finding crypto flaws have no prediction value whatsoever because each "event" is completely statistically independent. In other words, you may have been right, but you knew nothing. You were right by chance.

Weak + weak = weak by MarkByers · 2005-06-10 08:42 · Score: 1

The combination of two weak hash functions is just another weak hash function. You might be able to buy yourself a small amount of time, but it will be broken. If there is a large set of operations that can modify the file without changing the first hash, and another large set of operations which can modify it without changing the second hash, you can construct the intersection of these two sets to find the modifications which keep both hashes (and therefore also their sum) the same.

Strong hash functions are difficult to design, and require a lot of research and testing.

--
I'll probably be modded down for this...

Re:Weak + weak = weak by orb_fan · 2005-06-10 08:53 · Score: 1

Good point. But what I'm saying is that the intesection of two hash collisions is smaller than either set of collisions of the two hashes.

The worst case scenario is that the intersection is equal to the smaller of the two collision sets. But I would find is unbelievable that two different alogrithms generate collisions so that one is a subset of the other.

Re:Speaking of 128 bit collisions, UUIDs and GUIDs by Anonymous Coward · 2005-06-10 08:46 · Score: 0

I found a copy of "draft-leach-uuids-guids-01.txt" here.

Re: single pixel colour change in a jpeg by Anonymous Coward · 2005-06-10 08:52 · Score: 0

Or a single pixel colour change in a jpeg?

This is not possible to do, you know. Unless of course you store jpegs with zero compression, which sort of defeats the whole purpose of jpeg compression.

Size matters (even now) by Beryllium+Sphere(tm) · 2005-06-10 08:57 · Score: 2, Informative

>Switching to larger keys will not change anything.

Switching to longer hash sizes (not keys) will help against the current generation of SHA attacks. The effect of the Wang attacks is to knock (if memory serves) 16 bits off the work factor to find a collision. In other words, finding a collision in SHA-1 is now 64K times easier than brute force. Make brute force harder and the attack gets harder.

Some numbers: SHA-1 produces a 160-bit hash. If you're trying to find a collision as oposed to generating one the birthday paradox works in your favor and you only need on average 2^80 trials for brute force. Wang has lowered that number to 2^64, which is uncomfortably small by today's standards and reckless by tomorrow's. SHA-256 produces a 256-bit hash: you do the math.

The big worry is that cryptographic attacks tend to inspire research that finds better attacks. There's an unquantifiable but real risk that SHA-1 could collapse altogether.

The crypto geeks are speulating that these weaknesses are inherent in any hash algorithm that can be calculated incrementally. Right now the output after block N of input depends only on blocks 1..N of input. If the intermediate result for block N depended on blocks N+1..EOF then the current attacks wouldn't work. Neither would hashing an input stream. Such a change would destroy the ability to hash a file in parallel with reading it.

Re:Size matters (even now) by julesh · 2005-06-11 08:58 · Score: 1

Wang has lowered that number to 2^64, which is uncomfortably small by today's standards and reckless by tomorrow's

Do you know how long it would take to brute force 2^64 trials? Or how much memory you would need to store the results?

Do the maths. It comes in at thousands of years (even if you're very generous with the estimate of how many hashes per second you can calculate), and millions of terabytes (even using the most storage-efficient solution, which is clearly not going to be very fast at all).

The crypto geeks are speulating that these weaknesses are inherent in any hash algorithm that can be calculated incrementally

I haven't been staying up to date with this, but I'm pretty sure that these particular attacks are based on differential cryptanalysis. My understanding is that this technique is only applicable to algorithms that have particular flaws (e.g. particular inputs can be found that only cause a small number of bits to change in intermediate stages, and these changes can be arrange to cancel each other out). Some new designs of hash algorithm claim to be able to prevent this kind of analysis from working (e.g. Tiger). Unless I've missed somebody pointing out why this isn't the case, then anyone speculating like that is jumping the gun somewhat.

Misuse of MD5? by skelly33 · 2005-06-10 09:04 · Score: 1

I never thought MD5 was supposed to be used to authenticate the content of a file, only as a quick check for simple bit errors. I'm not buying any of this discussion about coming up with a new hashing algorithm. MD5 works great when it is used the way it is supposed to be. The trouble with coming up with something better is that you are trying to represent a large body of data with a smaller body of data. The ONLY situation in which your smaller body of data will absolutely be uniquely coupled with the original large body of data is in the case of lossless data compression such as LZW. Any other form of lossy data compression or tokenization will run the risk of duplicates matching the same fingerprint. Someone who comes up with a magical fingerprint that can be tied to one and only one large body of data will have discovered the Holy Grail of data compression.

Re:Misuse of MD5? by Dr.+Sp0ng · 2005-06-10 09:14 · Score: 1

Someone who comes up with a magical fingerprint that can be tied to one and only one large body of data will have discovered the Holy Grail of data compression.

Well, that's impossible - but that's not even the point. The entire point of hashing algorithms is that they are one way, so you're not (supposed to be) able to retrieve the original.

But anyway, their 2-way equivalent is encryption, not compression. The idea of strong hashing algorithms (like MD5 was thought to be before it was broken) is that it's computationally infeasible to find a collision, so yes, one of its uses is to authenticate data. For a quick check for bit errors, that's what weak hashing mechanisms such as CRC are for.
Re:Misuse of MD5? by m50d · 2005-06-10 09:41 · Score: 1

Nope. It was meant to be a cryptographic hash. A cryptographic hash means you have to not be able to find a collision faster than brute force would. MD5 was thought to suit that, but now fails it, quite severely actually

--
I am trolling
Re:Misuse of MD5? by Misanthropy · 2005-06-10 09:43 · Score: 1

The problem is that what was once computationally infeasible becomes much less so as processing power increases.
1024 bit public-key encryption is computationally infeasible to break now, but in 20-30 years it may be easily done.

This mainly applies to brute force attacks. If someone discovers a flaw in the algorithm then 30 years of hardware advances might not be necessary.
Re:Misuse of MD5? by Anonymous Coward · 2005-06-10 11:07 · Score: 0

Did you ever wonder how cryptographic signatures worked?

That's right, you encrypt the MD5 hash of the message with your private key, and to verify it people decrypt the MD5 hash with your public key and compare it to the hash of the message. What were you saying about MD5 not being used for authenticating files?

Same with SSL certificates. The CA encrypts the MD5 hash of your public key with their private key to sign it.

The whole point is not that you're mapping uniquely from a large domain to a small domain, that is impossible. What is important is that you're mapping from a large domain to a small domain, where an inverse mapping is very difficult to find. That's called pre-image resistance. Finding any two items in the large domain that map to one item in the small domain is called a collision, and md5 and sha-1 were thought to be collision resistant for about O(2^64) operations, when instead it is only about O(2^30) operations, give or take a couple orders of magnitude.
Re:Misuse of MD5? by thelittlestbuddy · 2005-06-10 12:18 · Score: 2, Insightful

I hope this post was a troll.
But if it wasn't let's just clear a few things up. MD5 is a cryptographic hash function, and is intended for use in things like digital signatures, message authentication codes, and many, many other applications. The attack in the paper demonstrates that MD5 is not suitable for use with digital signatures.

By the pigeon hole principle, your "magical fingerprint" can never exist for compressing arbitrary files.
Re:Misuse of MD5? by Dr.+Sp0ng · 2005-06-11 03:47 · Score: 1

The problem is that what was once computationally infeasible becomes much less so as processing power increases. 1024 bit public-key encryption is computationally infeasible to break now, but in 20-30 years it may be easily done.

Yeah, but increasing the key size increases the load much more dramatically on the cracking side than on the encrypting side. A small increase in key size doesn't dramatically increase the amount of time it takes to encrypt something, but it does do so for the time it takes to crack a key.

Just use multiple hash methods by matthewmok · 2005-06-10 09:07 · Score: 1

Tripwire does this. They use CRC32 and MD5. You could use MD5 and SHA and CRC32 if you wanted. It is probably not possible to get a pair of files that will collide in all three of those hashes.

Re:Just use multiple hash methods by skelly33 · 2005-06-10 09:20 · Score: 2, Informative

"It is probably not possible to get a pair of files that will collide in all three of those hashes."

Not impossible, just less probable. Combining algorithms effectively increases the size of your fingerprint. The probability of a duplicate is a function of original data size and its proportion to the fingerprint data size. The closer those two data sizes are, the lower the probability of duplicates.

It may seem difficult to pull off now, but eventually someone will make a nice, clever script to do it and then the script-kiddies will have a field day.

That won't solve it by gotr00t · 2005-06-10 09:14 · Score: 3, Insightful

Think of it this way: You can hash _any_ file of _any_ size using either MD5 or SHA1, and these algorithms will hive you an alphanumeric hash, which has a limited number of permutations. Thare are an infinate number of unique files (assuming adequate storage space), yet there are finite unique hashes.

Therefore, no matter how many algorithms you sum up using your described method, the number of collisions is still infinite in amount. It is not the algorithms that are flawed, rather, it is the fundamential concept of hashing that allows collisions to happen.

I would assume that the way to reduce the number of collisions is by increasing the length of the hash itself so as to increase the number of unique hashes.

Re:That won't solve it by orb_fan · 2005-06-10 12:50 · Score: 1

*DING*

The light-bulb has just gone on!

Thanks

I guess what I was saying was in effect to increase the size of the hash by taking the two hashes and concatenating them - you might as well take a single algorithm that generates a longer hash.

Thinking about it - hashes, by their every nature will always have collisions. The only safe way is to use encryption instead, but that kind of defeats the purpose.
Re:That won't solve it by Derleth · 2005-06-10 14:26 · Score: 1
Thinking about it - hashes, by their every nature will always have collisions. The only safe way is to use encryption instead, but that kind of defeats the purpose.

There are two distinct uses for hashing functions:
1. Proving that the file hasn't been maliciously tampered with.
2. Proving that the file hasn't been damaged by a dirty network or cosmic rays impacting a disk drive/RAM module/flash drive/etc.
Only the first is significantly impacted by collisions. If you are betting the company (or country) on the unforgability of your computer files, you better have a more secure system than any hash algorithm. A good hash algorithm is a good tool, but it isn't a complete protocol. And don't let anyone fool you: Designing a good security protocol is hard. Very few companies really want the hassle. That is why the kids from Finland usually have it so easy when they take the time to crack things like CSS.

If you're simply defending against shoddy networks, SHA-1 might even be overkill: Statistically, the odds are heavily in your favor even with something as 'insecure' as MD5. These news reports won't really impact people who simply have to maintain file integrity against random lossage.

The one thing that can't be cracked is an educated, skeptical mind. Use yours and if you don't have one, stay the hell away from my computers.
--
How can you use my intestines as a gift? -Actual Hong Kong subtitle.

I know the solution! by davidwr · 2005-06-10 09:20 · Score: 1

Apply digital signatures to X not H(X). So what if sizeof(signature)=toobigtohandle, it's better than validityof(signature)=uncertain.

Actually, for smallish documents where the equivalent of a notary signature is required, this may be the way to go.

Sigh.

In the meantime, expect courts to allow people to repudiate supposed signatures when the validity is in doubt, the same way banks allow individuals to swear under oath that a signature on a stolen or altered check is fake.

In the meantime, I'll sign my employment contracts the old fashioned way and keep a copy in my files.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:I know the solution! by Anonymous Coward · 2005-06-10 18:52 · Score: 0

Actually, long messages are signed a different way:

Generate a random symmetric key; then sign the symmetric key, then take a MAC over the long message using the symmetric signature key.

In other news, you don't blindly sign H(X), it first performs a Fiestel-like round, i.e., PSS (Bellare / Rogaway)

This has a solution by yfarjoun · 2005-06-10 09:21 · Score: 2, Informative

This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".

See http://en.wikipedia.org/wiki/Birthday_paradox for details.

Basically if you have enough documents you will be able to find two with the same hash. Not to be mistaken with the difficulty of finding a document with the same hash as a GIVEN document. This cannot be avoided regardless of the hash.

To foil a birthday attack, NEVER agree to sign a given document. One must ALWAYS add some random junk to the document (clear text or hidden). This should be implemented in sofetware that helps in signing, but if the option isn't given, you can add a clear text "random" string.

Yossi

Re:This has a solution by thelittlestbuddy · 2005-06-10 12:23 · Score: 1

This is another example of the implementation of the birthday "paradox" to hashes as the so-called "birthday attack".
No it isn't.
This attack is much more efficient than a birthday attack. The MD5 attack used in this paper required around 2^37 computations of MD5, whereas a birthday attack would require around 2^64 computations in addition to some severe space requirements.
Re:This has a solution by julesh · 2005-06-11 07:36 · Score: 1

One must ALWAYS add some random junk to the document (clear text or hidden). This should be implemented in sofetware that helps in signing, but if the option isn't given, you can add a clear text "random" string.

Doesn't help. MD5 (and most others) are block based hashes -- if they find a single pair of blocks that hash to the same value one of those two can be inserted into a document at any block boundary and the hash comes out the same, regardless of what is in the remainder of the document. You can add as much random text as you like, as long as you don't disturb their collision block they can just duplicate your changes to the alternative document and it still hashes the same as the one you signed.
Re:This has a solution by yfarjoun · 2005-06-12 10:44 · Score: 1

Hmmm, That's crap. I had no idea. Thanks. Y
Re:This has a solution by NOPteron · 2005-06-16 15:36 · Score: 1

Another solution is to sign with 2 signature-types simultaneously, and require that they both check-out. . .

IF the MD5 signature is bogus, because it was a constructed/rigged document, will the alter-document ALSO be an SHA1 collision? AND a GPG collision?

( and what was that rule about unrolling-the-loop? Going from 1 to 2 is the hugest increase in effectiveness?

Seems to be an underlying principle that loop-unrolling is only an /instance/ of, since using 2 different hashes significantly increases security, so long as the really-are different, now just differently-named variants of the same algorithm. . . )

--
IPTables enhancement Fail2Ban bans cracker-login's

not exactly 'practically irrelevant' by Gollum2001 · 2005-06-10 09:23 · Score: 1

but almost. MD5 it's obsolete now and SHA-1 have been phased out. While things like this demonstrate some "probable but yet to be seen" attacks and in theory are good to see the weak spot of this kind of hash functions, the fact is that by now every serious cryto soft must be using high order sha, like SHA-256 or SHA-512 or stronger has functions like Tiger or Whirpool. There is no real threat. Some people said here in /. that Schneier wants a new competion, well, that's good, but not really necessary as SHA was designed with this kind of problems in mind (every hash funcion has collisions, you can't avoid that) that's why there are high order SHA funtions.

--
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former" - Albert Einstein.

Re:not exactly 'practically irrelevant' by julesh · 2005-06-11 08:47 · Score: 1

MD5 it's obsolete now and SHA-1 have been phased out

Perhaps in the field you use hashing in, however for many applications these two are the de-facto standards.

the fact is that by now every serious cryto soft must be using high order sha, like SHA-256 or SHA-512 or stronger has functions like Tiger or Whirpool.

This attack is not related to the size of the hash, but rather to an error in the algorithm that allows you to compute two blocks with identical hashes easily. This will be harder with SHA-256 and so, but probably not as much harder as you think. I don't know the design of SHA-256, but unless it altered significantly from SHA-1 I'd expect to see an attack against it along these lines in the next 2-3 years. A new design *is* required to fix this problem.

Tiger might be that design; it was certainly designed to be resistant to the family of attacks currently being used on MD5 and SHA1. I don't know whirlpool, though.

A new competition would be good, as it would help people choose between SHA-256 et al, Tiger and any other competition available.
Re:not exactly 'practically irrelevant' by Gollum2001 · 2005-06-11 10:51 · Score: 1

This attack is not related to the size of the hash, but rather to an error in the algorithm that allows you to compute two blocks with identical hashes easily.

I did not say anything about hash size. The article says that _first_ one has to find random collisions for MD5 and in the case of high order SHA finding random collisions is, right now, almost impossible. With MD5 you can find a collision in a few hours with a normal PC. So it's not that it just produces a bigger hash, it has nothing to do with that.

I don't know the design of SHA-256, but unless it altered significantly from SHA-1 I'd expect to see an attack against it along these lines in the next 2-3 years

SHA is a family of functions, all of them different (different constants and functions) but based in MD4 structure (except SHA-224 and SHA-384, obtained from truncating SHA-256 and SHA-512) so what applies to SHA-1 it's probably not true for the rest of the family. Even if they share the same 'structure' the problem is still 'to find collisions'. Finding collisions in SHA-1 and SHA-256 requieres a very different approach.

A new design *is* required to fix this problem.

No, it's not. The problem now is that it's easy to find collisions in MD5 because it's now a weak algorithm with only 64 rounds. SHA-1 has 80 rounds and still no one has found a real method to find collisions in SHA-1, only partial attacks with a 40 round SHA-1. The article says thay 'once you've found collisions...'.

(Tiger) was certainly designed to be resistant to the family of attacks currently being used on MD5 and SHA1

Tiger is based in another 'structure' so MD5 and SHA-1 attacks doesn't even apply to Tiger.

A new competition would be good

I still think that it's not a good idea. We have a lot of hash funcions (most of them are unbroken), some of them are 'standards', changing the whole thing would only be a complete mess.

--
"Only two things are infinite, the universe and human stupidity, and I'm not sure about the former" - Albert Einstein.

Re:Explanation of the attack -- enforcement issues by m50d · 2005-06-10 09:32 · Score: 1

If you're signing hashes of documents without seeing the document itself you're a huge fool. And if you're seeing the document (presumably on your own machine, since security-conscious people don't go leaving their private keys around everywhere) you can make a digital copy of it, and keep it, easily.

--
I am trolling

Why cryptographic hashsums work by linuxhansl · 2005-06-10 09:33 · Score: 1

Every hashsum necessarily has collisions. There are two reasons why cryptographic hashes work anyway: 1. The value distribution (a small change in the input leads to a completely different hashsum). 2. The large set of possible hashes. MD5 is 128 bits. That allows roughly for 3x10^38 values. That is huge number. SHA-256 is 256 bits which allows for roughly 10^77 hashes. That is only within orders of magnitude of the estimated number of atoms in the observable universe (between 4×10^78 and 6×10^79). So I think it is safe to say that nobody will ever see or be able to generate a collision with SHA-256 or higher (unless there is a flaw in the algorithm).

A secure scheme for pad distribution by davidwr · 2005-06-10 09:41 · Score: 1

Well, not so much a secure scheme as making pad distribution a lot easier.

Suppose I'm a spy and I want to encrypt 4GB of data to send to my HQ. I need a 4GB pad.

We arranged ahead of time for me to memorize a list of movies and byte-offsets. My first pad is the DVD for that movie, starting at the prearranged byte-offset.

I rent the movie at the local video store, watch the movie, then burn a DVD with the encrypted data and find some way to smuggle it back to HQ. If it's compromised, no big loss assuming they don't trace it back to me.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:A secure scheme for pad distribution by Dwonis · 2005-06-10 14:04 · Score: 1

Think about it some more.
That's not a one-time pad, even if there were only one copy of the DVD. The pad must be *random*. That is, it must be impossible (both practically and theoretically) to construct a test that would determine whether or not any given pad is valid. Individual DVD movies do not have this property. Even if the DVDs themselves *were* totally random, everyone has access to them, so assuming that there are 100 million different movies to choose from, your scheme has *at most* 27 bits of computiational security, and probably substantially less.

Of course, if you know that your attacker isn't going to have access to all the DVD movies in the world, then your calculated risk is lowered (your security remains the same). However, the reason why people talk about the one-time pad in the first place is that it's immune to *any* attacker. Your scheme is interesting, but it's not comparable to a one-time pad.

It's good to see you're thinking about it, though. If you keep at it, you'll improve.

Attack on Bittorrent? by Anonymous Coward · 2005-06-10 10:01 · Score: 0

What if Microsoft was to create a hash collision in their Longhorn CD image? Then when someone makes a torrent of the released version, they flood the network with the bogus version instead.

It's the protocol that's broken in this example. by hypnagogue · 2005-06-10 10:03 · Score: 1

The problem in this case isn't the hash weakness as much as the protocol weakness. If Ceasar, instead of just hashing the content that was handed to him, hashed the content + a random string he generated and then provided the random string and the signed hash output, then this attack would be impossible.

HTTP digest authentication does just this with the nonce and cnonce strings. Sure, MD5 may be weak, but digest authentication that uses MD5 isn't. So this criticism of implementors not taking this seriously is largely misplaced: most people only use MD5 in ways that aren't impacted by the attack.

--
Liberty you never use is liberty you lose.

Re:It's the protocol that's broken in this example by hypnagogue · 2005-06-10 10:26 · Score: 1

At the risk of being modded insightful by proving myself wrong, it should be noted that PKCS#7 and, by inclusion, S/MIME is subject to this sort of attack. PKCS#7 signing with "unauthenticatedAttributes" hashes the document directly.

--
Liberty you never use is liberty you lose.

Risk is small by DarkOx · 2005-06-10 10:28 · Score: 1

The only case I can think these techniques might be usable is if you wanted to change a decimal point to a comma or add a zero or something like that on a number. Changeing an entire text word in the document would be practically impossible. This might be somewhat useful for some forgeries but not many..

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html

MD5 passwords are probably still safe... by xxxJonBoyxxx · 2005-06-10 10:41 · Score: 1

Ahem. Rainbow tables?

Re:MD5 passwords are probably still safe... by Anonymous Coward · 2005-06-10 11:54 · Score: 0

Ahem. Salt?

(Idiot)

ROT13 = funny? by xxxJonBoyxxx · 2005-06-10 10:43 · Score: 1

Why is ROT13 mod'ed funny every time a crypto discussion comes up? (After the 10th time...it's not funny anymore.)

Re:ROT13 = funny? by vonstauf · 2005-06-10 11:41 · Score: 1

To be honest...I'll always find ROT13 funny for the mere reason that when I look under the hood of most software that "encrypts the password so the hax0r doesn't hax0rize it" its good ole ROT13. Its the whoopie cushion of crypto.

--
" Yesterday upon the stair I met a man who wasn't there. He wasn't there again today. I wish that man would go away."

The real problem is signing binaries, not md5 by coekie · 2005-06-10 11:17 · Score: 1

Signing such a non-plaintext file by looking at it in a viewer only (and (a bank) accepting such a signed document) is the mistake. You don't even need MD5 collisions to exploit that.

Suppose you write a HTML file with some java-script embedded that displays one text if the date is before X, and another if it's after. you get someone to sign it before X, and then use it afterwards.

Similar, in general, different viewers interprete (broken) files differently. So if you know which viewer the signer and the bank use, you could probably write a postscript file that displays different text in those viewers.

No checksum or cryptographic mechanism is going to prevent that. Nor is displaying files differently than other viewers a security problem that should be fixed in every viewer.

So the only way to prevent this, is not signing "binary" files (not fully plain text human parsable) written by someone else.

semi-custom hashes... by griasr · 2005-06-10 11:25 · Score: 1

since the first news about hash-weaknesses i turned to use "multi-hashes". meaning, hashing the whole file first then just hash some more fixed defined small parts of it with a "weak hash" like md5. then i pulled out a "hash of all hashes" which wont be too easy to reproduce or fake.

Ammo for the MPAA to attack bittorrent? by Benm78 · 2005-06-10 11:38 · Score: 1

I must say it is a remarkable piece of work, creating two different documents that make sense after rendering. In case of these documents, some evidence of tampering might be easy to find upon closer examination of the files, but its still a very important thing.

This also makes me wonder on how widespread the consequences may be. Using MD5 and similar digests is common practice, and was until recently considered quite safe.

As the bittorrent protocol uses extensive hash checking, could this new discovery mean that MPAA and the likes will be able to insert corrupt data to torrent networks? I can imagine this would be a very real 'application' of this research, if it allows one to create blocks of data with hashes and sizes as desired.

Even worse, in torrent networks, these poisoned blocks will probably be re-distributed by all peers - or am I missing something?

Re:Speaking of 128 bit collisions, UUIDs and GUIDs by J'raxis · 2005-06-10 12:17 · Score: 1

There are (almost) fully random UUIDs, also. There's one UUID format in which a couple bits denote "the rest of this UUID is random" basically. Other types of UUIDs use the MAC address and a timestamp, or sometimes a randomized MAC section (in which case a bit in that section will be set which is never set in a real MAC address).

--
Liberty in your lifetime

Digital Forensics by Anonymous Coward · 2005-06-10 12:34 · Score: 1, Interesting

This kind of information will wreak havoc with digital forensic departments. Right now we use various checksums, the most common being MD5, to verify the integrity and accuracy of a drive image/duplicate. Even if the chances of an accidental collision are so remote, the fact that people are researching and reporting on the fact that it can be done means proving this is a duplicate in court will be harder. Best case scenario, jury is shown reports like this, you then have reasonable doubt in the jury that the drive may NOT be an exact copy. Worst case scenario, the defence claims your lab doctored the results, and then they "prove" it with these algorithms and reports.

Possible solution: don't rely on... by Mr2001 · 2005-06-10 12:45 · Score: 1

...a single hash algorithm. Calculate multiple hashes of the same document: MD5, SHA-1, CRC-32, and whatever else you can think of. Then sign all of them at once.

Now the attacker doesn't just have to generate a new document with the same MD5 sum, he has to generate a new document that has the same sum under all those hashes simultaneously. If someone manages to figure that out within the next 5 years, I won't just eat my hat, I'll switch on the Infinite Improbability Drive and turn into a hat.

--
Visual IRC: Fast. Powerful. Free.

Re:Possible solution: don't rely on... by Anonymous Coward · 2005-06-11 04:58 · Score: 0

Well, quantum computing might allow for this, also if the weaknesses for both MD5 and SHA-1 turn out to be practical (there is not enough documents out there on the math of it). Then a nice parralel cluster could do it.
Considering the density of FPGA's or even better CPLD's you can get a huge amount of processing power in a small space.
The weakness in SHA-1 seems to allow for an array of FPGA's no larger then 1U to calculate a padding for a given document to result into a given hash in under 2 days

Re:Speaking of 128 bit collisions, UUIDs and GUIDs by hairyfeet · 2005-06-10 13:19 · Score: 1

i have always wondered why they wouldn't use someone's browsing history as well as the MAC address as part of the "random" data.there is lots of data there to play with like time,sitename,site I.P address,etc. Of course i've never really been "crypto guy", so there might be a reason,I just don't know about.

--
ACs don't waste your time replying, your posts are never seen by me.

I believe Caesar used ROT3 by MacDork · 2005-06-10 13:29 · Score: 1

http://en.wikipedia.org/wiki/Caesar_cipher

Re:It's the protocol that's broken in this example by stoborrobots · 2005-06-10 14:10 · Score: 1

Hmmm.... when you say "content + a random string" are you talking about an appended string, like a nonce?

Because this attack is invulnerable to that - MD5 has a lovely property that if A and B collide, then appending the same random string to both A and B results in another pair of collisions. So it is not enough to append a random string to the document before signing.

--
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco

well... by davidwr · 2005-06-10 14:52 · Score: 1

substitute "the last bit of each byte" or "padbyte=rand(DVD-byte)" where rand() is a random-# generator and it's random enough.

The keys to this scheme are:
1) your adversary doesn't know DVDs are the basis of the one-time pad. With one-time pads, security through obscurity IS security.
2) your adversary doesn't know the algorithm to turn the DVD into a pad. It could be something as simple as a bit-offset like I suggested, or as complex as you want it to be, as long as it's easy to impliment. Heck, it can even be 2 DVDs xor'd together.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:well... by Dwonis · 2005-06-10 16:42 · Score: 2, Insightful
substitute "the last bit of each byte" or "padbyte=rand(DVD-byte)" where rand() is a random-# generator and it's random enough.
[Disclaimer: I'm not an expert at cryptography, but I like to think I understand it better than most non-mathematicians outside the field. It would be really nice if a crypto expert could clarify this, but I don't expect that to happen on Slashdot.]

You are correct that your scheme would add some security, but not nearly as much as our intuition might lead us to believe.

Let's say you are going to transmit an n-bit message. Even if you don't transmit any information, an attacker knows that there are 2^n possible messages that you could transmit. If we assume that your message is compressed as much as possible, then before you transmit the message, all 2^n messages are equally likely (from an attacker's point of view).

So, you have n bits of data. We'll call this message P (the plaintext). Now, let's say you generate another n-bit random message, called K (the key). Finally, you xor P and K together to produce C (the ciphertext), which is also an n-bit message.

The theory behind the one-time pad says that if and only if there are at least 2^n equally-likely possibilities for K, then someone who only knows C cannot learn anything about P.

We can express this a different way. Let's say you have an invertible function, C = f(P), and:
- P is 2^n bits long.
- C is 2^n bits long.
- There are at least 2^n equally-likely functions for the function f
Note that the function f is just a generic representation of the one-time pad algorithm and the key K, so similarly, we cam say that an attacker who knows nothing about the function f cannot learn anything about P from only C.

And that's the problem: every time you transmit a message (P) that isn't *completely* random, you give the attacker a little information about f, unless you completely change f every time you transmit a new message. This why you can never re-use a key in the one-time-pad system.

So let's say you have a key, K, that has fewer than 2^n equally-likely possibilities. Then, there are fewer than 2^n possible functions f. If there are still 2^n possible values for P, then an attacker can learn some information about P from C. So, if you don't want that to happen, you need to have 2^n possible functions for f.

So, you have 2^n equally-likely functions for f, and you need to use a different one for every message. In order to let the recipient know which function to use each time you transmit a message, you have to transmit at least n bits of information to the recipient. I think you can see where this is going...

If you were going to write an algorithm to implement the function f, the optimally-compressed description of the algorithm would have to be at least n bits long, and would need to be replaced for every new message that you send. It doesn't matter if f is an algorithm based on a DVD library, or a really complicated program. In order for the one-time pad to work (an attacker learns nothing about P from C), you need make sure that there are at least an additional n bits of information that the attacker knows nothing about.

So in your example, you'd still need to send a new 4GB (optimally-compressed) version of rand() for every 4GB message you send.

Nice try, though. Keep it up!

Nonces And Blind Signatures by Anonymous Coward · 2005-06-10 15:06 · Score: 2, Informative

I can't believe that nobody, including Bruce Schneier, has pointed out the well understood principle that you *never* blindly sign something provided to you by someone else.

Secure schemes involving one party signing material provided by another party should always involve a signing nonce. This is a random bitstring appended to the material to be signed. In the example from the article, if Caesar had added a nonce before signing the "good" document, then Alice would not have been able to produce a "bad" document with a matching hash.

However, as someone pointed out, this kind of attack is possible without using hash collisions. You simply have to convince your victim to sign something that is code-like (such as the postscript files used in the example, or an HTML document with javascript, or a Word doc with Visual Basic macros, or an executable) without making an expert examination of the complete contents of the code. The code-like "document" can be written to present one set of behavior under one set of circumstances and a completely different behavior under other circumstances. If the signer is not competent to examine the program and be fully aware of what it does, then the signature can have no meaning.

The attack demonstrated in the article is *still* of no practical consequence in my opinion. It doesn't mean that signed code can't be trusted; trusted code is always signed by it's creator, meaning that if the signature is valid *and* you trust the creator, then you can trust the code. It is still impossible for someone who didn't create the signed code to create a different piece of code that has the same hash, so you don't have to worry about an attacker replacing signed trusted code with signed malicious code. As always, you still have to worry about misplaced trust in the code's creator.

What the attack in the article *should* be trying to make clear is that a signature is only as trustworthy as the signer's ability and desire to evaluate what is being signed. A third party auditor that signs code after reviewing it doesn't ensure that the code is trustworthy, it only ensures that the auditor *claims* the code is trustworthy. What the implications are for the actual trustworthiness of the code depends on the trust you place in the intentions and abilities of the auditor.

Re:Nonces And Blind Signatures by julesh · 2005-06-11 08:50 · Score: 1

Secure schemes involving one party signing material provided by another party should always involve a signing nonce. This is a random bitstring appended to the material to be signed. In the example from the article, if Caesar had added a nonce before signing the "good" document, then Alice would not have been able to produce a "bad" document with a matching hash.

That doesn't work for this attack, because it produces a single document that contains both texts, and differences in just one hashing block of the file trigger a change in what is shown. Any changes made in one copy can simply be copied over to the other and it will continue to work fine.

Re:Speaking of 128 bit collisions, UUIDs and GUIDs by J'raxis · 2005-06-10 16:29 · Score: 1

The MAC isn't used as a seed to some random function, it's used as-is, since it's a unique number you already possess. This means you can generate universally unique IDs without having to register for a special number, like you do if you want an SMB OID. Basically, you need two values to make a guaranteed-unique identifier -- a time and a location: the first half of a UUID is the time it was created (with a 100ns resolution) and the MAC is the where.

The random type of UUIDs, which aren't guaranteed unique, but are merely probabilisticly unique -- and which don't use MACs at all -- are meant to be fallbacks if a MAC is unavailable on your system, or if one has privacy concerns and doesn't wish to broadcast a UUID's location of generation.

--
Liberty in your lifetime

Is this employer/employee fraud useful? by njyoder · 2005-06-10 20:06 · Score: 0

I'd be interested to see how this attack is actually useful in the real world. Even given that they're using a code-base format like postscript, under what strange scenario is an organization going to rely on important documents being digitally signed by people other than the original creator? Any kind of contract or important document is going to be hand signed with at least one witness.

This attack is easily revealed too. A simple examination of the postscript documents shows that it contains two versions--the neferious and non-nefarious versions. Not just that, but the person signing the document will most likely have a copy of the non-nefarious version stored, so if someone tries to pull some fraud, they just whip out the version _they_ received and show that the hashes match up.

The only other use I've seen usggested is for generating two certificates (or any equivalent) with different public keys. But what would be the use in that since it would have to be the same person generating both certificates? Congratulations Mr. Hacker, you can generate two certificates with different keys that both appear to come from Mr. Hacker.

Even without technical knowledge of all this, could you really get away with any of this? If a contract or official "command" document of some kind was forged, people would find out the instant someone tried to take advantage of what they have done that something fishy was going on.

Really, how dense do you have to be to not notice that suddenly a low level employee is going into high access areas despite no one being informed about it beyond a single electronic document? How dense do you have to be to a) not notice a big change in a contract and b) not keep a personal copy of what you signed? Frankly, it would be easier to just swap *printed* forms of contracts than it would be to do this electronci forgery since no one uses electronic contracts like that anyway.

Re:Is this employer/employee fraud useful? by julesh · 2005-06-11 06:23 · Score: 1

I'd be interested to see how this attack is actually useful in the real world. Even given that they're using a code-base format like postscript, under what strange scenario is an organization going to rely on important documents being digitally signed by people other than the original creator? Any kind of contract or important document is going to be hand signed with at least one witness.

The only situations I'm aware of where it is common for documents to be signed by somebody other than the creator are:

* X.509 certificates -- in this case the documents are very small, contain little redundancy and it is unlikely that such an attack could be made to work.
* Date stamping services that prove a document existed at a particular time -- but this attack isn't applicable here because you have to produce both documents at the same time.
* Peer-to-peer networks that use reviews on web sites to help users differentiate good files from bad ones. This last category *might* be vulnerable.

I honestly don't think this kind of attack is really useful, despite what anyone might say. It cannot be applied to contracts, because in order for the contract to be useful it would have to be possible to produce the original digital file that was signed in court. Anyone reasonably competent examining this file could easily see that the text of both documents is in there -- this alone would be enough to throw doubt on the contract, and whoever had produced the contract would almost certainly lose the case.

The only other use I've seen usggested is for generating two certificates (or any equivalent) with different public keys. But what would be the use in that since it would have to be the same person generating both certificates? Congratulations Mr. Hacker, you can generate two certificates with different keys that both appear to come from Mr. Hacker.

If the attack could be made to work (which it can't in its current form) then you could use it to persuade the issuer to sign a certificate they wouldn't usually sign -- e.g., you own example1.com, while somebody else you want to attack owns example2.com; if you can produce a certificate request that looks like it is for example1.com but has the same hash code as a valid certificate request for example2.com, then you could submit the one for example1.com, take the signature returned to you and apply it to example2.com.

But the attack doesn't work like that -- it relies on being able to use effectively random changes to a small and otherwise meaningless chunk of the file in order to switch the meaning of the document between two extremes. This requires the format of the document to have some very advanced capabilities (in the case of the attack presented, they're using the general programming capabilities of postscript to choose between two different documents), which certificate signing requests don't.

I mention that P2P networks might be vulnerable; here's how an attack on one of those would work:

You have a piece of software that would be popular on a P2P network (say the latest version of a popular free software project). You want to include a trojan horse program in it.

The P2P network has a review site where trusted people evaluate software available on it and check whether there are any trojans, etc, in the package. They do this by closely monitoring the behaviour of the programs in a set of tests that you know little about. They're good at spotting trojans; you're pretty sure that if you released your software with the trojan in it, they'd spot it and would recommend users didn't download the file, destroying the effectiveness of your plan. If however they were to evaluate your file and not spot the trojan, thousands of people would be infected...

You produce a program that, during startup, checks one data block against another (in an identical fashion to the postscript document described above) and starts your trojan thread only if they differ/match/whatever. You release the non-
Re:Is this employer/employee fraud useful? by njyoder · 2005-06-11 12:56 · Score: 0

Well as you said, this wouldn't work well on compressed files. Assuming that distributing a raw .exe is no big deal (you could say it was compressed within the exe), I think these people who examine the files before voting on them would pick up on it quickly.

The way I see it there are two types of "attack files":

1. The random blob is meaningful data, such as a public key in a certificate.
2. The format allows for code to be used (like with postscript).

In the case of programs, only #2 really applies. It is possible to insert junk data in an executable easily, there are many sections that you can just fill with meanginless garbage, which vary depending on the platform.

The real issue is that, like with the postscript file, you'd need both versions of the executable in the same file. People examining the executable could look through an assembly dump of the file and see that somewhere in the beginning there is code that decides, based off some random junk data, to decide to execute one part or another.

This can be easily thwarted with the heurestics in virus checkers. Just like with viruses, people will try reusing code, the code that decides which section to execute based off the random junk data. Likewise, people will use similar means to store junk data.

Now it's true, someone could get clever, encrypt the startup code (make it polymorphic) and do tricks to modify it so it appears to be new, BUT the same also applies to new viruses/trojans and their variants. So really, you're back to where you started from--forced to write new trojans/viruses (or rather, new startup code using similar techniques) to get around the anti-virus software.

A question, then ... by dustmite · 2005-06-11 04:25 · Score: 1

Which part of the above is MD5-specific? It sounds like the above would be possible regardless of the hash algorithm used? Or is it that they've found a method to find X and Y that both hash to the same value quickly enough with MD5. From your description it sounds like X and Y can be very small though, in which case it shouldn't take too long to find X and Y "brute-force" for any hash algorithm?

Re:A question, then ... by julesh · 2005-06-11 06:42 · Score: 1

From your description it sounds like X and Y can be very small though, in which case it shouldn't take too long to find X and Y "brute-force" for any hash algorithm?

MD5 (and most similar hashes) work on blocks of data at once; if you can produce a single block of data that hashes the same, it doesn't matter what's in the file before or after it, it'll always have the same effect on the resulting hash. I think the blocks are 64 bytes for MD5, so that's the size of your X and Y.

Still, to brute force it, you'd need to perform on average 2^64 hashes (because there are 2^128 possible outcomes for the 2^512 possible inputs), each one using approximately 3-500 arithmetic unit cycles* and producing a *very* large lookup table to determine if you'd found a hit**.

So, no, brute forcing it isn't feasible, even for very small blocks.

* or roughly 200 processor cycles on a modern intel processor, so you're probably looking at about 2^40 processor seconds or so, or about 2^15 processor years, or much longer than I'd be willing to wait)

** to do the lookups you'd need at least 2^64 * 72 bytes of RAM to store each hashed result and the input that generated it, or about 300 million terabytes. It would be best to store the information in a hash table large enough to store every possible result, at which point you'd need 2^128 * 64 bytes of RAM, which is a *ludicrous* volume! In reality, somewhere between the two would suffice nicely, perhaps just 600 or 900 million terabytes.
Re:A question, then ... by dustmite · 2005-06-11 08:44 · Score: 1

OK, thx, I think I see then. So the 'MD5 weakness' comes into it then because the Wang and Yu paper gives them a way, they say, to calculate the two different blocks with the same hash relatively quickly i.e. in a few hours .. according to the article: "Based on [WY05], we implemented an attack to find random collisions for the MD5 compression function. It took just a few hours on a customary PC."

Pigeon-hole principle == Dirichlet's Box Principle by Anonymous Coward · 2005-06-11 04:29 · Score: 0

Tell me how this applies?

-- Ender, Duke_of_URL

Re:It's the protocol that's broken in this example by julesh · 2005-06-11 09:02 · Score: 1

Prepending a string might work, though. Just a thought.

perhaps I'm not clear by davidwr · 2005-06-11 16:21 · Score: 1

In my scenario, over the life of my spying, I have a very limited amount of data to send. Say it's 2KB per week for 10 years, that's about a MB.

There are thousands of commercial DVDs and tens of thousands of commercial CDs available in both my country and his.

For each CD or DVD, assume we use the key=gzip(/dev/dvdrom)+offset, using only bit #n out of each byte, and discard the rest, and wrapping around one time (and one time only) to get the full length of the gzip'd file.

In effect, I have probably 5% of the bytes on any CD or DVD as part of the pad, and the effective pad length is 5% times the combined size of all CDs and DVDs whose names I can commit to memory.

To make matters worse, my adversary doesn't know I'm using DVDs and CDs as a data source.

Now, of course, if my adversary can figure out that I'm using commercial DVDs as the basis of my data, then I've lost a huge amount of security, but not nearly as much as if he somehow intercepted an actual one-time pad - in that case it would be "game over, man."

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:It's the protocol that's broken in this example by herberts · 2005-06-11 20:28 · Score: 1

Seems to me that A and B have to end at MD5 block boundaries otherwise md5(A+Z) != md5(B+Z).

Re:It's the protocol that's broken in this example by stoborrobots · 2005-06-12 05:02 · Score: 1

Yeah, you're probably right - but if you're already engineering A and B so that the sums collide, it shouldn't be too difficult to ensure that they also end at a block boundary.

--
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco

Re:First post by xintegerx · 2005-06-17 00:41 · Score: 1

If I was a troll I wouldn't have 5 mod points every week. You have to use your brain when reading what I say. I know the kinds of stuff I could post that would give me +5 insightful, or +5 funny. However, I choose to post my true feelings. When some douche mods it offtopic, it doesn't make me a troll.

--

Cover your eyes and click this link!

Re:First post by rbarreira · 2005-06-17 02:57 · Score: 1

Maybe you're not a troll, but your post was a troll. Did you even read the article before posting it? If you had, I don't think you would say what you said...

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F

312 comments