Apache Subversion Fails SHA-1 Collision Test, Exploit Moves Into The Wild (arstechnica.com)
WebKit's bug-tracker now includes a comment from Friday noting "the bots all are red" on their git-svn mirror site, reporting an error message about a checksum mismatch for shattered-2.pdf. "In some cases, due to the corruption, further commits are blocked," reports the official "Shattered" web site. Slashdot reader Artem Tashkinov explains its significance:
A WebKit developer who tried to upload "bad" PDF files generated from the first successful SHA-1 attack broke WebKit's SVN repository because Subversion uses SHA-1 hash to differentiate commits. The reason to upload the files was to create a test for checking cache poisoning in WebKit.
Another news story is that based on the theoretical incomplete description of the SHA-1 collision attack published by Google just two days ago, people have managed to recreate the attack in practice and now you can download a Python script which can create a new PDF file with the same SHA-1 hashsum using your input PDF. The attack is also implemented as a website which can prepare two PDF files with different JPEG images which will result in the same hash sum.
Linus isn't afraid, why should you be?
why are ppl using this shit any 1 with any intelligence should be fired they should be fired
WebKit is apparently hosted on an SVN repository.
It's now time to retire SVN... everywhere... permanently.
Anons need not reply. Questions end with a question mark.
Then either turn in your nerd badge, or get a paracetamol and start educating yourself. Whatever you do: stop whining.
This entire summary makes my heart sing: no Trump, no clickbait, but crypto, a broken algorithm, and funny side effects. Oh, and exploits.
We need to worry about important things... like Judge Wapner is dead.
Here's what it means: One major building block of modern cryptography is the "hash function": a function with the property that two inputs with very small differences will, in general, give radically different outputs. Ideally, a hash function also makes it hard to find "collisions", which are two inputs that have the same output. Hash schemes are used for a variety of purposes, including determining whether a file is what it claims to be (by checking that the file has the correct hash value).
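To make the "small difference in, radically different output" point concrete, here is a minimal Python sketch using the standard `hashlib` module. The two messages and the helper `differing_bits` are made up for illustration; nothing here comes from the Shattered paper.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two hypothetical messages that differ in a single character.
msg_a = b"The rent is $3000 per month"
msg_b = b"The rent is $3001 per month"

digest_a = hashlib.sha1(msg_a).digest()
digest_b = hashlib.sha1(msg_b).digest()

print(digest_a.hex())
print(digest_b.hex())
print(differing_bits(digest_a, digest_b), "of", len(digest_a) * 8, "bits differ")
```

For a well-behaved hash you would expect roughly half of the 160 output bits to flip, even though only one input character changed.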
Every few years, an existing hash system gets broken and needs to be replaced. MD5 is an example of this; it was very popular and then got replaced.
One of the major hash schemes currently in use is SHA-1. However, a few days ago, a group from Google described an attack that allowed them to easily find collisions in SHA-1 ("easy" here is comparative; the amount of computational resources needed was still pretty high). The group released evidence that they could do so but didn't describe how they did it in detail. They gave an example of two files with a SHA-1 collision and also described some of the theory behind their attack. What TFS is talking about is how, based on this, others have since managed to duplicate the attack and some have made even more efficient variants of it; so effectively this attack is now in the wild.
1: Find the hackers.
2: Send in the drones.
Someone checked in PDFs that demonstrate the first engineered SHA-1 collision and this broke SVN. PDFs in question took 6500+ cpu years + 110 GPU years to generate. "In the wild" is a bit panicky & excessive.
What does this actually mean in terms of the integrity of repos and other things that rely on SHA-1? Does it merely break repos, or does it open up injection attack vectors? How important is secure hashing in the guts of repos? What precisely is being secured? SHA-1 has already been deprecated for SSL certs, so you shouldn't be using certs with SHA-1 signatures anymore. Myself, I'll keep an eye on how this develops and start thinking about using SHA-2, but I won't be replacing git or existing usage of SHA-1 for password hashing anytime soon.
Computing power is plenty, memory even more so. Why not use a very simple hash to detect "might be the same", but then do full comparison, instead of relying on the hash? Cryptographic hash or not - collisions can always happen. Even at low probability, murphy always wins.
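The "cheap hash first, full comparison second" idea could look roughly like the sketch below. `DedupStore` is a hypothetical toy class, not how SVN or git actually store data, and CRC-32 from the standard `zlib` module is just one possible choice of simple checksum.

```python
import zlib
from collections import defaultdict


class DedupStore:
    """Toy content store: a cheap checksum picks a bucket, a byte-for-byte
    comparison decides whether two items are really the same."""

    def __init__(self, checksum=zlib.crc32):
        self._checksum = checksum
        self._buckets = defaultdict(list)

    def add(self, data: bytes) -> bool:
        """Store data. Return True if it was new, False if already present."""
        bucket = self._buckets[self._checksum(data)]
        for existing in bucket:
            if existing == data:  # full comparison, never trust the hash alone
                return False
        bucket.append(data)  # same checksum, different bytes: keep both
        return True

    def __len__(self) -> int:
        return sum(len(bucket) for bucket in self._buckets.values())


store = DedupStore()
print(store.add(b"first file"))   # new content
print(store.add(b"first file"))   # exact duplicate
print(store.add(b"second file"))  # new content
```

The trade-off is that the store must keep (or be able to re-read) the original bytes for every comparison, which is exactly the cost a content-addressed system tries to avoid by trusting the hash.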
Yesterday Linus said we should ignore this. Today, Apache no longer runs and it is one of the foundations of the Internet. Way to go, Linus.
I am trying to read their paper on the sha1 collisions over here: https://shattered.io/static/sh... and there's some unusual equation stuff.
mi = (mi3 mi8 mi14 mi16)1
Can anyone explain that to me in english?
Ah dam. My unicode got munged by the slashdot anti garbage filter. Should have hit preview first!
Anyway the symbol I was referencing is a circular arrow pointing in a clockwise direction that looks like the images on this page: https://en.wikipedia.org/wiki/... . I've never seen that in a paper. What does it mean when it's in an exponent?
It is a bitwise rotation. The direction and number specify if it is a right or left rotation and then how many bits to rotate.
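For anyone who prefers code to notation, here is a small Python sketch. It assumes the formula being asked about is the standard SHA-1 message expansion, with the munged symbols being XOR between the four terms and a left rotation by 1 bit in the exponent; the function names are made up.

```python
MASK32 = 0xFFFFFFFF  # SHA-1 works on 32-bit words


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by n bits; bits shifted out re-enter on the right."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def expand_message(words):
    """Extend the 16 words of one message block to the 80 words SHA-1 uses."""
    if len(words) != 16:
        raise ValueError("expected exactly 16 32-bit words")
    m = list(words)
    for i in range(16, 80):
        # m_i = (m_{i-3} XOR m_{i-8} XOR m_{i-14} XOR m_{i-16}) rotated left by 1
        m.append(rotl32(m[i - 3] ^ m[i - 8] ^ m[i - 14] ^ m[i - 16], 1))
    return m


print(hex(rotl32(0x80000000, 1)))  # the top bit wraps around to the bottom
print(len(expand_message(list(range(16)))))
```

Unlike a plain shift, a rotation loses no bits: whatever falls off one end comes back in at the other.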
A cryptographic hash function has the properties you mention, plus the fact that it must not be easily reversible and uniformly distribute results over its entire output space.
The latter is a property that most common checksums do not guarantee.
Thus, when you need a hash function to give a number to use as a handy "nickname" for a collection of data (e.g. for a hash look-up table, or for a content-addressable system like git to create addresses for a given content and thus give a serial number to a commit, or, apparently, in SVN to give a simple number to designate commits), it might be a good choice to pick a cryptographic hash like SHA-1, because it guarantees you this additional property, which a vanilla checksum could lack.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
The sensible thing for VCSs is to have a list of hashes, in order of preference (e.g. sha1, sha256, ...). Each time a commit is made where the hash has already been seen, the VCS has a file to compare to. If the files are equal, no problem. If they are not equal, raise an alarm to the operator and try the second hash on the list against both files (there will be only two, naturally). Each file in the VCS has at least one hash, probably more. In the event of a collision, use additional hashing functions until we can be sure whether the files differ. These collisions are rare, so allowing more computation when they turn up is not an issue.
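A rough Python sketch of that escalation scheme follows. The function name `check_incoming`, the return labels, and the default order (SHA-1, then SHA-256) are assumptions for illustration; no real VCS is claimed to work this way.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# Hypothetical preference list: cheapest or most established hash first.
DEFAULT_HASHES = (sha1, sha256)


def check_incoming(stored: bytes, incoming: bytes, hashes=DEFAULT_HASHES) -> str:
    """Classify an incoming file against the stored one."""
    primary = hashes[0]
    if primary(stored) != primary(incoming):
        return "distinct"  # hash not seen before, nothing to do
    if stored == incoming:
        return "identical"  # same hash, same bytes: no problem
    # Same primary hash but different bytes: alarm, then escalate down the list.
    for fallback in hashes[1:]:
        if fallback(stored) != fallback(incoming):
            return "collision-resolved"
    return "collision-unresolved"


print(check_incoming(b"hello", b"hello"))
print(check_incoming(b"hello", b"world"))
```

Passing the hash functions in as a parameter keeps the sketch testable: a deliberately weak first hash can be swapped in to exercise the collision branches without needing a real SHA-1 collision.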
Does the Git usage of SHA-1 *really* cause silent problems? I'm not sure how Git works internally but I was under the impression that it hashes whole objects, like individual source files at least.
The individual objects inside git aren't files.
The individual objects are commits (i.e. the content of a patch file, plus some extra information such as pointers to the past commits to which this patch applies).
To make things easier, a handy number designates this commit - this is currently generated by SHA-1.
(Git is a content-addressable platform. You don't access object by name, you access them depending on their content. But instead of using the whole content to access them, you use addresses generated by SHA-1 to access the various blocks.
So to say which are the parent commits to which the patch in a commit applies, you just mention them by using the SHA-1 sum of the content of these commits).
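As a small illustration of content addressing, this Python sketch computes the ID git assigns to a blob (a file's content): the SHA-1 of a short header followed by the bytes themselves. A blob is used here only because it is the simplest object type; commit objects are hashed the same way with a `commit` header. The function name `git_blob_id` is made up.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    # Git hashes "<type> <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known empty blob
print(git_blob_id(b"hello world\n"))
```

If git is installed, the result can be cross-checked with `git hash-object <file>`. Two different contents with the same ID would be indistinguishable to anything that looks objects up by that ID alone.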
A theoretical attack would be:
- try to generate 2 commits.
one adds a clean piece of code. the other adds a backdoored piece of code.
but both commits hash to the same SHA-1 so they would be considered as "the same content" by git.
Then try to force your target to re-download the whole repo from scratch from your backdoored history (otherwise git will simply ignore the commits with sha-1 sum that it already has - it thinks that it has the same content already).
In practice it's currently not doable.
The only thing that Google managed to generate is a pair of block series. Each series contains completely random junk. Both series end up generating the exact same SHA-1 sum even though the random junk is different.
- That is exploitable in a PDF (or any other binary format that supports scripting; you could even do it in an EXE): using the embedded scripting, present two different contents depending on which random junk is present.
- That is not exploitable in a sourcecode commit : you would need a believable explanation for why the random junk is present in the patched source code.
AND you would need a piece of code which reacts differently (normal vs. backdoor) depending on which random junk is present - to be able to pull that unnoticed would require "Underhanded C Contest"-level of ingenuity.
That's it, you only have blocks of random garbage.
Google currently can't produce colliding hashes from arbitrary pieces of data ("Hey Google: here's legit script A, and that's malicious script B. Add a small nonce at the end so they both end up having the same SHA-1 sum") ("Actually don't add a nonce, that would be too conspicuous, try to tweak the punctuation in the comments instead").
Also as you mention, further edits will be problematic :
if I edit script A and submit a patch, this patch will be valid, but will completely fail on top of script B.
A hash function takes an arbitrary string of bits and outputs a string of bits of a fixed length.
A CRC is an example of a hash function, and a long CRC would probably be good enough for Git or most repositories.
A cryptographic hash function is additionally expected to have three properties:
First pre-image resistance - this is a test of the one-wayness of the function. Given a hash value, it is difficult to find a pre-image that hashes to that value: given y, a string of bits of the hash output length, finding X such that h(X) = y is hard. MD5 and SHA-1 are still resilient against first pre-image attacks.
Second pre-image resistance - given a message X, finding a different message Y such that h(X) = h(Y) is difficult. MD5 and SHA-1 are still resilient against second pre-image attacks.
Collision resistance - it is hard to find two messages X and Y such that h(X) = h(Y). Note that the attacker here is free to choose both X and Y. Both MD5 and SHA-1 are no longer collision resistant.
So far however the two messages X and Y have to be nearly identical. They have to start and end the same way and the blocks that are changed actually have to be changed and tested together to make sure the hash function internal state changes only in a specific way. I can't create a document that says the rent will be $3000 per month and another that says it will be $30000. (I might create one that says it is $3149.21 and the other $53210.63 per month, like in the PDF example they played with a colour field). Also because of the way the internal state of the hash function changes we now have a way of detecting if someone is feeding a "funny" stream of bits into our hash function and detect this attack with a very low probability of a false positive.
I have no faith in cryptography solutions these days, because my impression is that the industry and the government(s) just don't seem to care about providing people with security and privacy.
It is as if the world's governments want to have a shitty internet to wage war on, and to spy on people.
https://github.com/nneonneo/sha1collider/blob/master/collide.py
In today's world of large botnets and distributed computing 6500+ cpu years + 110 GPU years is not a particularly daunting number.
He speculated as he ashed his blunt and stared bleary-eyed into a gumbo of Wikipedia tabs, a particle collider for only the least plausible threads of conspiratorial thinking, so that he might drop yet another dingleberry-dollop of wordshitting on slashdot DOT org.
It seems obvious to me that a short hash string could be identical for two different long original texts. Even if that happened, the hash is NOT the original message, so a collision could occur. It doesn't mean that the two original texts are the same.
Am I right?
Yes. A hash is nothing more than a function mapping data of arbitrary size to an output of fixed, smaller size, so by definition there are always two inputs which yield the same hash. What makes crypto hashes secure is that finding such inputs is normally very, very hard to do, as is the reverse direction: given a hash, generating an input that produces it.
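The "collisions must exist" part is easy to see once the output is made small enough. This Python sketch truncates SHA-1 to 2 bytes (a made-up `tiny_hash`, purely for illustration) and hashes the strings "0", "1", "2", ... until two of them share an output. With only 65,536 possible outputs, a repeat is guaranteed within 65,537 inputs.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, nbytes: int = 2) -> bytes:
    """A deliberately weak hash: the first nbytes of the SHA-1 digest."""
    return hashlib.sha1(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Hash '0', '1', '2', ... until two inputs share a tiny_hash value."""
    seen = {}
    for i in count():
        msg = str(i).encode("ascii")
        h = tiny_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg  # two different inputs, same output
        seen[h] = msg


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

The same brute-force search against the full 160-bit output is what is infeasible; the Shattered attack matters because it finds a collision far faster than this kind of blind search would.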
Aww, that's ugly:
because Subversion uses SHA-1 hash to differentiate commits.
Know why?
Because SVN does not use SHA-1 to "differentiate" commits.