SHA-1 Collisions for Meaningful Messages
mrogers writes "Following on the heels of last year's collision search attack against SHA-1, researchers at the Crypto 2006 conference have announced a new attack that allows the attacker to choose part of the colliding messages. "Using the new method, it is possible, for example, to produce two HTML documents with a long nonsense part after the closing </html> tag, which, despite slight differences in the HTML part, thanks to the adapted appendage have the same hash value." A similar attack against MD5 was announced last year."
Or at least for the next 3 years.
Using the new method, it is possible, for example, to produce two HTML documents with a long nonsense part
To achieve this, the method uses material pulled from myspace.com.
Push Button, Receive Bacon
I would have thought this not such a big issue for software developers who aren't incompetent.
This is bad news for the git.
a4055725b7794277bd9da3a7904c011cb24edbec
It's a hash algorithm... no big deal ? Just like it was proven possible to break MD5 for binary executables and others them (both) won't be dropped for, say, storing passwords on a database.
One thing is that cryptographic hash functions should be easier to make secure than ciphers. At leaste that is what many cryptogtaphers thought. The other is that up to now you could rely on SHA-1 to be collision resistant, no matter what. The argument that you have a large part of the message being "garbage" does not give any real security. Many, many applications can still be attacked, and they need not even be broken for that.
While expected since last year, selecting and using crypto-hashes just got a lot more difficult and error prone.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
So the original document that is trying to be forged/modified has to convienently have a "long nonsense part" after the tag. How real-world relevant is that?
The second reason to keep cool is just as important, if not even more so: hackers will have to execute a pre-image attack to manipulate, for instance, a contract that has been digitally signed. In other words, hackers will have to find a second, manipulated contract with the same hash value as the real contract. In principle, the number of operations needed is thus far greater (2160). Indeed, as far as we know all attacks to date have only concerned collisions, and Wang et al. does not change that. There are no known methods to reduce significantly the number of operations needed for pre-image attacks.
Don't you think you're flying off the meter here a bit... Just because a collision was found means truly little. So a garbage laced HTML page was created after the actual HTML closing tag... 1) No one will see what comes after that unless you like viewing the source of a webpage as opposed to an actual page. 2) You should read up on birthday paradoxes. If someone created two similar messages, it would take years for them to figure out how to compute a hash to match. Now in the field of sending out something so so so secure, what makes you think that even if a someone did re-computate a hash to match, that message would be worth anything years down the line. Someone would have to be able to accomplish a collision, re-computate the hash in their new message and send it all within minutes for it to truly be a threat.
Let's look at this scenario... A massive kernel update is made to say Linux... The information is hashed, posted, and everyone is now going to update their Linux boxes... Unless someone is so quick fast to intercept along this path, most are safe unless they choose to verify the hash years down the line (which by then would be worthless). So unless someone can exploit this within minutes (no more than I would guesstimate 36 hours), I see little reason to get all bent out of shape over this...
Infiltrated dot Net
For all those failing "to see a problem" or guessing about the implications: this actually is quite severe, and it doesn't just concern "broken" applications.
Heise has some fine backround in their Know-How article about cracked hashes.
Common web browsers (I just tried Opera, FF, and Lynx) will happily display everything after the closing tag. You would have to put it inside <!-- --> comment delimiters, but then it doesn't matter whether it is before or after the closing tag. Unless the attack doesn't work if the --> has to come at the end, but then you can just omit the closing tag. Only an XHTML-compliant browser would complain. From cursory scanning TFA it is not clear to me what the reason is for mentioning the closing </html.
Avantslash: low-bandwidth mobile slashdot.
Provide the following 3 pieces of data:
1) Message/file length
2) SHA1 hash
3) MD5 checksum or some other hash/checksum that's calculated way differently from SHA1.
Providing the length means that the person trying to change the data needs to keep it the same length which makes it more difficult.
Using 2 different hashing/checksumming methods means they have to be able to match both of them in order to be able to switch the data.
The more restrictrictsion we toss on the data, the harder it is to manipulate. I do think that using more than 2 or 3 hashing/checksumming methods would be overkill however.
It seems to me that all the attacks on both of these hash systems (SHA-1 and MD5) involve different message sizes which is easily fixed by using both hash and message size as a verification. I honestly don't come close to understanding the math involved in these hashes, but I get the idea that the complexity involved in creating a hash using the exact same amount of data is far higher.
Maybe factoring in message size as part of the hash is the solution? I don't know. Again, the math is way above me. But you could certainly do it by adding on some extra bits to the hash itself. A bit costly in size, but seems like maybe it might be worthwhile.
I mean this find is awesome... but to get a matching SHA-1, how big does the html get with the jibberish. I mean it just has to create the same hash, but i dont think it would be a couple of lines.... then again the first post about myspace content is prolly on to something, i guess there is a little truth in all jokes.
Well, a variation of SHA-1, now called SHA-0 was broken similarily more that a year ago (before MD5 was broken). It was just a matter of time they figured out how to break full SHA-1. In the mean time, people who knew about this switched to SHA-256 which is still known to be good.
If it's no on fire, it's a hardware problem.
NO SHA-1 COLLISIONS HAVE EVER BEEN FOUND!
Ahem.
Sorry, my caps lock key got stuck there.
No SHA-1 collisions have ever been found. The first person or group to find one will achieve considerable fame. I say this as an attendee of both last week's Crypto conference and the immediately following hash function workshop.
The work factor estimated for a SHA-1 collision is something over 2^60 cycles. That would put it on par with the biggest calculations that have ever been done (publicly anyway). So far nobody has put together a sufficient effort to achieve a collision.
At the hash function workshop, cryptographer Antoine Joux published a set of recommendations for how such a hash collision effort should be mounted, in order to minimize the damaged from a published collision. The main goal is to make it difficult to take a published collision and use it to create harmful effects in various ways. Hopefully Joux's guidelines will be followed if and when a SHA-1 collision finding project gets started.
If the MD5 of the two different strings that had the same SHA-1 value are different then there's no real reason to panic. For an added level of security you could also calculate the byte length of the data.
Software will just need to be updated to calculate two hashes. Good luck finding two sets of data that are different yet have the same length, the same SHA-1 hash and the same MD5 hash.
Work Safe Porn
I can never quite figure out why the MD5 and SHA and all these other algorithms don't include the original message size as part of their hash. This would eliminate all attack vectors that stem from adding or deleting information from a file. But I guess that's too simple a solution for these mathematicians.
SHA-1 and MD5 are becomming non-safe to use. So, we go to SHA-256 or a completely different algorithm?
SHA-1 was proved to have insecurities years ago. Because of that SHA-2 ("SHA-256", "SHA-384", and "SHA-512") was released back in 2001 as a better version of SHA-1. SHA-2 was tested and no insecurities were found (yet). What's more, SHA-2 is now the official US standard.
Complaining that SHA-1 is insecure is like complaining that Windows 98 is insecure.
Oblig Wikipedia link: http://en.wikipedia.org/wiki/SHA_hash_functions
Why not just create an algorithm with more rounds? I'm sure it would have more benefits than just slapping two algorithms together. In fact MD5 and SHA came from the same algorithm anyway (MD4), so I don't see much of a benefit... You may even need less processing time to calculate the hash than just doing both.
Thi smight be good is the number that was output was two seperate numbers concatenated together. Otherwise it's still just one number they have to match when they are modifying the hash protected data.
It sounds like this scheme would alter the size of the original text. Don't most schemes compare the hash and the size of the text?
This is broken in O(K 2^(n/2)) time, see Joux's paper "Multicollisions in iterated hash functions".
It looks like validation of document structure significantly complicates the problem of creating a hash collision (the technique mentioned relied on padding the document with junk at the end, which, IIRC, is not in line with modern XHTML specs). Of course it's best to use a secure hash to begin with, but this does add yet another layer of resistence.
Not true -- XHTML-compliant browsers will only treat it as XHTML if it's sent with a "Content-Type: application/xhtml+xml" header. This is very nice for keeping your page clean -- view it in Firefox with that header, and it'll parse it as XML, and complain whenever you have a problem with your XML, saving you a trip to the even more pedantic W3C validator. Unfortunately, sending that header will have browsers that don't understand XHTML attempt to download the page, rather than displaying it as HTML. Even worse, browsers aren't consistent in the "Accept:" header they send, which is supposed to help with this -- Firefox prefers XHTML and says so, for instance, but while Safari is perfectly capable of receiving XHTML, it just says "Accept: */*", which says it doesn't prefer any type of document -- I may as well send it a PDF (which it would try to download). Even browsers which indicate a preference have to have "*/*" somewhere in there if they want to be able to download files.
So, there's really no good, standard way of detecting whether to send XHTML as XHTML or HTML. I've tried implementing something on my website, but I doubt that it will really work properly until I break down and attempt to detect specific versions of browsers.
Don't thank God, thank a doctor!
Actually, hashes are difficult to secure for general communications purposes without putting a cap on the size of the transmission. In information-content terms, a collision proof hash is equivalent to a lossless compression algorithm.
A hash will either contain all of the non-redundant information in the original content, or some of the information gets lost during the hash. Non-redundant information being defined in an information-theory sense that a given bit is completely random/unpredictable based on the content of preceding bits.
In order for a hash to be completely collision proof, it has to contain all of the non-redundant information contained in the original file. Otherwise information in the orignal message is lost in the hash. And if information is lost from the original message, that creates a possibility of constructing a message that differs only in the information that is removed by the hash. Only if the original message is reconstructible from the hash (plus possible information contained in the hash algorithm itself) will it be collision-proof. You've either got the information-content, or you don't. And if you don't have the content, you can't validate it.
We are the 198 proof..
When messages are signed they are already compressed and the
file size is used as part of the final message digest.
If your application doesn't do this then you should use another
applications.
Using the above mentioned processes none of these attacks are
viable, both in the short and long terms. (supposedly)
When receiving the following message:
Jon D0e owe$ Jane Do3 $1000`000 d011ar5.
or
Jon Doe owes Jane Doe $1000000 do11ars.
$4#^$%^&*5#$^()^%ER$%^RT#$3
Would you not ask the sender to resend
their original message?
Arash Partow
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
If you can manage to replace a hashed file from a web site with content of your choosing, couldn't you just as well change the hash it's checked against?
Even if the hash and the file came from seperate sources, odds are a user went to 'xyzzysoft.com' from which the link to both the hash and the file are provided; so if xyzzy's page were hacked, they are screwed anyway.
To use a car analogy, to steal a purse in a car, a theif dosn't pick the lock, he smashes the window.
Unless you have a seperate, secure channel for transmitting the hashes, end-to-end, and compare them manually. (such as faxing the hash, then downloading the file) it's silly to use the logic of: File Hash equals Website Hash therefore File is safe, File is the one I wanted, File is really from Xyzzy. Not that hashes are useless, far from it they are excellent to verify against transmission/storage corruption, viruses, and unsophisticated attacks.
There have been hacks such as replacing the system random number generator to cheat in an online game, which would wreak utter havok with any crypto depending on it; a complier virus that adds itself to compliers compiled with an infected compiler, which could silently modify the output of a hash comparing program if a particular sequence was detected. And here's an idea I just thought up, a hacked file decompression library that modifies executables as they are being extracted, after you carefully manually checked the hash of the compressed package you recieved. (if you used a user-priviledge extraction utility for something like an updated kernel mode driver)
I think the key point is this:
No SHA1 collisions have ever been published
whether or not they have been found is a different matter entirely.
Nessie is using encription now? Beware in case you go to Loch Ness....
LedgerSMB: Open source Accounting/ERP
That doesn't always work. For example, two rounds of DES encryption has weaknesses that neither one nor three round suffer from. The only way to count on an encryption or hash algorithm is to have it tested by the general public for many years. Simply increasing a key size or number of rounds is often a crapshoot and you won't know whether you were successful for a long time.
The better alternative is to move to an existing well tested algoritm while the security researchers are testing the future replacements. If there's one place you don't want to be on the bleeding edge, it's with security algorithms.
It's a hash function, not an encryption.
So how can SHA-2 be secure? It has no proven collisions? I can prove it has collisions. It accepts more than 512 bits of input and only procues 512 bits of output, it must collide.
There have not been shown to be collisions for meaningful messages in SHA-1 yet. And the "break" for it requires on the order of 2^63 operations.
SHA-1 is a long way from dangerous to use at this point.
http://lkml.org/lkml/2005/8/20/95
Ehm, this text was written for the predating 2^63 way of finding collissions and should be considered rather out of date, even though the current attack is only for 64 rounds of SHA-1. Oh, and supplying the reference would also be a good idea. Kudo's for reading the document and following the link though.
http://www.heise-security.co.uk/articles/75686
If they really did that as default, you would have a /etc/shadow filled with data about the exact length of each password. Great thing, that ;)
) . By the time everyone has switched, the old algorithms are completely useless, anyway (Hi, 3DES!)
Some Software does include size optionally, other does compute two hashes.
The basic problem though, is that MDn and SHA-n are in the same family and these algorithms have been fundamentally broken for some time now. Instead of slathering on more of old security (Hi, 3DES!), get a new, better and above all different approach like http://en.wikipedia.org/wiki/Whirlpool_(algorithm
..good sites will sign the hash with a private key. In this case, having the ability to change the hash, too does not mean it would be desirable for the attacker.
The attack is not about the garbage characters - it's about what came first. The garbage characters ARE the attack themself, they don't need to carry a virus or something to be malicious... Point being, let's say you sign a document saying you're going to give me 15% of revenue. I change it to 20% and start adding random characters at the end until I get a document that maintains the same hash, meaning I can "prove" that the second document is the one you signed...
I am disrespectful to dirt! Can you see that I am serious?!
"Someone could modify the original tarball, for example, introducing a trojan horse, and append some other not useful data to it so the sum matches."
No, they can't. The attack can not take an existing message and modify it in such a way that the modified version has the same hash. These attacks can only generate a prefix and suffix chunk of seemingly random data that you can sandwich your choice of payload between and get the same hash. In other words, you have to carefuly create BOTH messages to get them to hash the same, and it's fairly obvious if someone has done so.
Your existing signed messages are still perfectly safe.
I think you're confusing the number of rounds (16) in the internal DES Feistal cipher algorithm with the number of times DES may be applied as a whole with (usually different) keys to assure a longer effective key length. Triple-DES (DES3), for instance, is often applied in encrypt-decrypt-encrypt (EDE) mode using either two or three different 56-bit keys for an effective key size of 112 or 168 bits, respectively.
I'm not confusing them, I'm just citing an example where "doing it more" ends up making the system less secure. More rounds, more bits, more iterations, it doesn't really matter. What matters is that any lay person's quick fix is likely to cause as much harm as good. It takes a lot of cryptanalysis and a lot of time to test the quality of an encryption algorithm.
My favorite example is when script kiddies pull out some wacky scheme like "Take a message, type it on an image in photoshop, save as JPEG, reverse the bytes in the file, zip it, and xor with 'HELLO WORLD'." They don't realize that a determined skilled attacker can break that much more easily than if it were a proven cipher. If you don't have a lot of experience in cryptography, you shouldn't even try to "improve" an algorithm.
Remember...... ALL encryption algorithms are 100% resistant to every attack the designers thought of, except brute force. If it wasn't resistant to their test attacks, they would have changed it. Therefore, only someone else can verify the strength of an algorithm. It takes many "someone elses" before you can be confident an algorithm is actually good.
I agree, generally. Of course, not all multiple iterations of DES (e.g., DES3-EDE-168) result in a weaker cipher than the original. I was rather seeking to clarify the term 'round' which, especially in relation to Feistal ciphers, has a particular meaning that is generally accepted by cryptographers.
..."
;-)
"ALL [well-designed] encryption algorithms are 100% resistant
That is to say, some badly constructed algorithms may not even be resistant to some attacks anticipated by the designer.
No, that doesn't characterize hash function security at all.
First of all, all hash functions of the same size are pretty much equally vulnerable to a brute force attack - that's practically the defining feature of a brute force attack. It's resistance to shortcut attacks that matters.
Secondly, there are loads of properties people want hash functions to have - not only preimage resistance but second preimage resistance, collision resistance, correlation-freedom, multiplication-freedom and more. In fact no-one has successfully formalized all the properties we want our hash functions to have: the formalism that comes closest to capturing it, the Random Oracle Model, is impossible for any instantiable hash function to achieve.
Xenu loves you!