Linus Torvalds On Git's Use Of SHA-1: 'The Sky Isn't Falling' (zdnet.com)
Google's researchers specifically cited Git when they announced a new SHA-1 attack vector, according to ZDNet. "The researchers highlight that Linus Torvalds' code version-control system Git 'strongly relies on SHA-1' for checking the integrity of file objects and commits. 'It is essentially possible to create two Git repositories with the same head commit hash and different contents, say, a benign source code and a backdoored one,' they note." Saturday morning, Linus responded:
First off - the sky isn't falling. There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier" for a content-addressable system like git. Secondly, the nature of this particular SHA1 attack means that it's actually pretty easy to mitigate against, and there's already been two sets of patches posted for that mitigation. And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories...
The reason for using a cryptographic hash in a project like git is because it pretty much guarantees that there is no accidental clashes, and it's also a really really good error detection thing. Think of it like "parity on steroids": it's not able to correct for errors, but it's really really good at detecting corrupt data... if you use git for source control like in the kernel, the stuff you really care about is source code, which is very much a transparent medium. If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice... It's not silently switching your data under from you... And finally, the "yes, git will eventually transition away from SHA1". There's a plan, it doesn't look all that nasty, and you don't even have to convert your repository. There's a lot of details to this, and it will take time, but because of the issues above, it's not like this is a critical "it has to happen now thing".
In addition, ZDNet reports, "Torvalds said on a mailing list yesterday that he's not concerned since 'Git doesn't actually just hash the data, it does prepend a type/length field to it', making it harder to attack than a PDF... Do we want to migrate to another hash? Yes. Is it game over for SHA-1 like people want to say? Probably not."
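To make the "type/length field" concrete, here is a minimal Python sketch, assuming Git's loose-object format for a blob in a SHA-1 repository: the header `blob <size>\0` is prepended before hashing, so the object ID is not a plain SHA-1 of the file. The function names are illustrative, not part of Git.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object ID Git would assign to `data` stored as a blob (SHA-1 repositories)."""
    # Git hashes "<type> <decimal length>\0" followed by the raw content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def plain_sha1(data: bytes) -> str:
    """Ordinary SHA-1 of the bytes, with no Git header."""
    return hashlib.sha1(data).hexdigest()

if __name__ == "__main__":
    content = b"hello\n"
    print(git_blob_id(content))  # the ID `git hash-object` reports for this content
    print(plain_sha1(content))   # a different digest: the header changes the input
```

In a SHA-1 repository, `echo hello | git hash-object --stdin` should print the same ID as `git_blob_id(b"hello\n")`.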
Time for Torvalds to drop the attitude and fix this
Hitler killed the Jews. In **mass**.
CAPTCHA: Steve Bannon
LINUS RESIGN NOW
Cue astroturf invective from aging Microsoft lifers, bitter that they backed the wrong horse.
3... 2... 1...
When all you have is a hammer, every problem starts to look like a thumb.
Both happened in 2005. And SHA-2 was published 4 years earlier. So yes, the sky is not falling, and git can be made secure, but it also wasn't really wise to use SHA-1 when git was implemented, first.
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
We're at SHA-7 so we won't have to change for a while yet.
It was never intended for security. If you want assurance of source-code origin, you use signed commits, as the kernel does.
Your comment might have been funny if at least there was some algorithm called "SHA-7", but there isn't.
So yes, the sky is not falling, and git can be made secure, but it also wasn't really wise to use SHA-1 when git was implemented, first.
Why not? As the summary quotes Torvalds, this is simply used to guard against corruption. Using something stronger would have given people the wrong idea about what protections it offered. Sort of like how people who don't understand how HTTPS really works think "as long as I see the lock icon, I am OK and my transaction is safe". Even if it hadn't given people the wrong idea, it would have added computational overhead for no reason.
If your repository needs to be cryptographically verifiable, then you use gpg to sign your commits with your 8192-bit RSA key, or your EC key (choose the bit length you like), and don't forget to use SHA-256 or SHA-512 for the signature hash. Now you have Git's nicely sophisticated yet still quite performant "parity on steroids", and you also have all the cryptographic goodness that comes with properly signing commits. You don't depend upon Git's parity implementation for that.
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
If you replaced SHA-2 with SHA-3 for signature hashes on SSL/TLS certificates and for GPG signing, then that's great. If you think that SHA-3 somehow magically makes everything more secure for verifying that data have not been modified in transit (e.g., an installer gets corrupted while being downloaded) because you replaced all the SHA-2 hashes with SHA-3 hashes on the installer download page, which is served over insecure HTTP, then I suspect you may not fully understand what threats you are trying to protect against.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
This SHA goes up to SHA-11.
Ours goes to SHA-11.
I just SHAT so I don't need to change my pants yet
At Paddy's, we use SHA-Dynasty.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
By the time you could do something like trash it with crafted content, you could screw things over in less difficult ways...
On the other hand, gpg still uses SHA-1 for key fingerprints, per the standard, which seems like a much bigger risk. You can use other, more secure hashes for digests, but fingerprint IDs are SHA-1, which was deemed inadequate for key fingerprints in X.509...
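For reference, a minimal sketch of how an OpenPGP version 4 fingerprint is computed per RFC 4880 (section 12.2): SHA-1 over the octet 0x99, a two-octet big-endian length, and the public-key packet body; the key ID is the low-order 64 bits of the fingerprint. The input in the example is placeholder bytes, not a real key, and the helper names are hypothetical.

```python
import hashlib

def v4_fingerprint(public_key_body: bytes) -> str:
    """SHA-1 fingerprint of a V4 OpenPGP public-key packet body (RFC 4880, 12.2)."""
    if len(public_key_body) > 0xFFFF:
        raise ValueError("packet body too long for a two-octet length")
    prefix = b"\x99" + len(public_key_body).to_bytes(2, "big")
    return hashlib.sha1(prefix + public_key_body).hexdigest().upper()

def key_id(fingerprint: str) -> str:
    """The V4 key ID is the low-order 64 bits (last 16 hex digits) of the fingerprint."""
    return fingerprint[-16:]

if __name__ == "__main__":
    # Placeholder bytes standing in for a real packet body (version 4 octet first).
    fake_body = b"\x04" + b"\x00" * 20
    fpr = v4_fingerprint(fake_body)
    print(fpr, key_id(fpr))
```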
XML is like violence. If it doesn't solve the problem, use more.
If you read Linus' whole statement, you will also find the part where he writes "yes, in git we also end up using the SHA1 when we use 'real' cryptography for signing the resulting trees, so the hash does end up being part of a certain chain of trust. So we do take advantage of some of the actual security features of a good cryptographic hash, and so breaking SHA1 does have real downsides for us."
Regarding our use of SHA-3: we use cryptographic hash sums as keys to cached data items that not everyone is permitted to request. Thus we need to make sure that the cache keys cannot be "guessed" (e.g., from knowing a valid cache key for a similar data item).
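Whether SHA-3 helps here depends on what goes into the hash: a plain hash of predictable input can itself be predicted. One standard way to make such cache keys unguessable is an HMAC under a server-side secret. The sketch below swaps in that technique using Python's `hmac` module with SHA3-256; `SECRET` and `cache_key` are hypothetical names, not the poster's system.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice load it from configuration, never hard-code it.
SECRET = secrets.token_bytes(32)

def cache_key(item_id: str, secret: bytes = SECRET) -> str:
    """Unguessable cache key: HMAC-SHA3-256 of the item identifier under a secret key."""
    return hmac.new(secret, item_id.encode("utf-8"), hashlib.sha3_256).hexdigest()

if __name__ == "__main__":
    # Similar identifiers produce unrelated keys, and without SECRET nobody can compute them.
    print(cache_key("report-2017-02-24"))
    print(cache_key("report-2017-02-25"))
```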
Why didn't git use SHA-256 from the beginning? Git was released 4 years after SHA-256, when SHA-1's weakness was already being felt.
Linus Torvalds doesn't seem to be the kind of guy who sees very far ahead...
Your comment might have been funny if at least there was some algorithm called "SHA-7", but there isn't.
At least Mozilla isn't in charge of updating this, otherwise we'd get SHA-1.0.1, 1.0.2, 1.0.3, 1.1.0 ... bumped every six weeks.
It must have been something you assimilated. . . .
Care to substantiate that incredible claim?
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
The attack vector is straight out of OP's fuzzy behind.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
Well, the "practical" attack described here required:
This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.
So, Step 1: Get a super-computer ... or rent a fuck-tonne of capacity at Amazon EC2 ...
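The quoted figures are easier to read as a power of two. A quick Python check (treating a year as 365.25 days, an assumption of this sketch) also gives the per-device hash rates they imply:

```python
# Sanity-check the published SHAttered figures quoted above.
TOTAL_SHA1 = 9_223_372_036_854_775_808
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

assert TOTAL_SHA1 == 2**63  # the quoted count is exactly 2^63

cpu_rate = TOTAL_SHA1 / (6_500 * SECONDS_PER_YEAR)  # implied SHA-1 calls/s on one CPU
gpu_rate = TOTAL_SHA1 / (110 * SECONDS_PER_YEAR)    # implied SHA-1 calls/s on one GPU

print(f"2^63 = {TOTAL_SHA1:,}")
print(f"implied CPU rate: {cpu_rate:.2e} hashes/s")
print(f"implied GPU rate: {gpu_rate:.2e} hashes/s")
print(f"GPU/CPU speed ratio: {gpu_rate / cpu_rate:.0f}x")
```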
Even that wouldn't suffice, as your "corrupted copy" probably wouldn't compile.
I think we've pushed this "anyone can grow up to be president" thing too far.
This would be much harder than the PDF example by Google (which is quite impressive, though). You'd need to generate a zlib-compressed commit with a specific hash which references the correct parent commit's own hash, has consistent naming and dates, and yields somehow valid code.
This was said before, but again: the point of having SHA1 hashes in git is not security but to ensure reasonable uniqueness of objects (commit, trees, blobs or tags). SHA1 is (was?) a rather strong crypto hash so you do get some of its benefits in that regard though.
This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.
So, Step 1: Get a super-computer ... or rent a fuck-tonne of capacity at Amazon EC2 ...
Yeah, and don't forget that the (millions of) different hashed programs with the malevolent code will also need to replace the good copy. And compile. And install.
Oy, such a lot of effort. And yup, there is a non-zero chance of this happening to me. Don't think I'll worry much though.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
The attack vector is straight out of OP's fuzzy behind.
It's possible, but it's the most unlikely way you'd ever want to infect a computer. I'm more worried about getting hit by a meteor. Which is to say, not at all.
Like how Heartbleed was immediately noticed and didn't sit there for two years.
This is one of those things Open Source proponents keep saying that isn't actually true, because people aren't really auditing the code.
There are medications for autism, Rainman, or you could just shoot yourself.
If you think that SHA-3 somehow magically makes everything more secure for verifying that data have not been modified in transit (e.g., an installer gets corrupted while being downloaded) because you replaced all the SHA-2 hashes with SHA-3 hashes on the installer download page, which is served over insecure HTTP, then I suspect you may not fully understand what threats you are trying to protect against.
The point is that if you're trying to use a hash instead of a checksum, it'll actually work as advertised. If you only care about random bit flips CRC32 will work very well and be much faster than MD5 or SHA-1. If you're doing major overkill you might not care that a hash doesn't function as a hash because you don't actually need a hash but that's no reason to use a bad hash. You should either use a good hash or use a lesser solution that doesn't pretend to make promises it can't hold.
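To illustrate the point about random bit flips: CRC-32 is guaranteed to detect any single-bit error, which a short sketch with Python's `zlib.crc32` can check exhaustively for a small buffer. This says nothing about deliberate tampering, which is exactly the distinction the comment draws; the helper names are hypothetical.

```python
import zlib

def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of `data` with one bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

def crc_catches_all_single_bit_flips(data: bytes) -> bool:
    """True if every possible single-bit corruption changes the CRC-32."""
    original = zlib.crc32(data)
    return all(zlib.crc32(flip_bit(data, i)) != original for i in range(len(data) * 8))

if __name__ == "__main__":
    payload = b"int main(void) { return 0; }\n"
    print(crc_catches_all_single_bit_flips(payload))
```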
Live today, because you never know what tomorrow brings
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
This is a misunderstanding of the purpose of SHA-3. It was not designed to be a "successor" to SHA-2, but an alternative. There is no evidence that SHA-2 is insecure, or even that SHA-3 is more secure than SHA-2. It was simply picked because, at its core, the design is fundamentally different from SHA-2 so it is unlikely that both will be broken at the same time. There is no reason to move from SHA-2 to SHA-3 at this point.
Linus Torvalds = Fucktard.
Such troll.
Very flamebait.
Wow.
It seems every other day another large website gets completely owned by hackers. When the philosophy is "To hell with security!", people shouldn't adopt that OS for their critical use-cases...
There is no evidence that SHA-2 is insecure, or even that SHA-3 is more secure than SHA-2.
I'm not entirely sure, but I think you just said that ffkom ( 3519199 )'s company is practicing security-through-obscurity and/or cargo-cult security?
p.s. Why wasn't SHA-3 called SHA-2b?
Probably because it was the result of a 6 year long NIST competition to design a hash function that was different from SHA-2. Would have been a bit glib to just name it SHA-2b.
Or SHA-not-2b
Why is Linus a bitch?
There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier"
is really a non sequitur. It's also a truism. Of course there is a difference. If all you cared about was a "content identifier", you'd use CRC. But the reality is that you really want a secure content identifier (one that does not open a vector of attack on your system through spoofing of the identifier via a simple calculation). Without it, you have a system in which it is trivial to create a haystack in which any one particular piece of content becomes a needle to hide. All you need is to modify as many pieces of content as possible to collide with the one you want to be difficult to find.
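To make the contrast with CRC concrete: a 32-bit checksum offers no resistance to deliberate collisions. A naive birthday search over made-up strings (the `note-N` naming is arbitrary) typically finds two inputs with the same CRC-32 after roughly 2^16 to 2^17 attempts, a fraction of a second in Python, whereas the SHA-1 collision above took about 2^63 hash computations.

```python
import zlib
from typing import Tuple

def find_crc32_collision() -> Tuple[bytes, bytes]:
    """Birthday search: hash distinct strings until two share a CRC-32 value."""
    seen = {}  # CRC-32 value -> first input that produced it
    i = 0
    # Terminates by pigeonhole (at most 2^32 + 1 inputs); in practice far sooner.
    while True:
        candidate = b"note-%d" % i
        checksum = zlib.crc32(candidate)
        if checksum in seen:
            return seen[checksum], candidate
        seen[checksum] = candidate
        i += 1

if __name__ == "__main__":
    a, b = find_crc32_collision()
    print(a, b, hex(zlib.crc32(a)))
```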
The real answer he should have given is that any content which incorporates its md5 becomes unassailable because there is no known vector of attack to produce simultaneous md5 and sha1 collisions.
Any guest worker system is indistinguishable from indentured servitude.
Linus has a smelly vagina
and yields somehow valid code.
Comments. You have infinite tries to get it right.
CLI paste? paste.pr0.tips!
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
In my country, we kicked out the Shas and migrated to Ayatollahs, who have a unique uniqueness guarantee!
Not really. You can't just put random binary content in a comment.
That mailing list is full of people trying to 'fix' their use of sha-1 in a way that allows them to *continue* using it.
That's handwavy bullshit. Sha1 is broken. Regardless of if you can fix it or not, you just move off of it and avoid the problem altogether, and you look better for it.
ALL new repos should initialize under sha-3 by default.
The 'git' tool itself should remain backwards compatible with the old sha-1 hash.
No one really cares what you do with sha-1 to 'fix' it or blend it with sha-3 or whatever the hell else with it. That's your call.
But you must initialize all new repos with sha-3.
With that, the migration you want will happen on its own.
and yields somehow valid code.
Comments. You have infinite tries to get it right.
But not infinite time, unless you're immortal - in which case you should have better things to do.
hahaha nice joke. Thanks for the laugh, made my day.
Our codebase is written in hand-optimized SHA-1 assembly language so we don't want to go to the effort of porting it to the SHA-3. There's some guy down the hall who says we could write the kernel in C instead, but nobody listens to him.
-- Ed Avis ed@membled.com
As a hash function, SHA-1 was perfectly adequate for how Git works.
All Git uses SHA-1 for internally is to hash the contents of a file to turn it into a unique number. SHA-1 is a nice fast algorithm to do that, and 160 bits offers plenty of space to uniquely identify stuff. It works so well that everything else, commits and such, is hashed too, so a Git repository is merely a collection of hashed objects. The hash at the top we call "head", which contains the SHA-1 hash of a commit object. That commit object points to a few other objects: the commit before it (the old head) and the SHA-1 hash of the tree object. The tree object contains a list of SHA-1 hashes representing the files (and subdirectory trees) in the source tree. It lists every entry, not just the changed files, but unchanged entries simply reuse their existing hashes.
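A rough sketch of that chain, assuming Git's documented object formats: every object is hashed as `<type> <length>\0<body>`, a tree entry is `<mode> <name>\0` followed by the 20-byte binary ID, and a commit is text naming its tree, optional parent, author and committer. The file name, identity line and timestamp are made up for illustration, and directory-ordering rules are ignored.

```python
import hashlib
from typing import List, Optional, Tuple

def object_id(obj_type: str, body: bytes) -> str:
    """Hash `<type> <len>\\0<body>`, the way Git names every object."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries sorted by name (directories compare as
    if suffixed with '/', ignored in this sketch)."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return out

def commit_body(tree_id: str, parent: Optional[str], ident: str, message: str) -> bytes:
    """Minimal commit text: tree, optional parent, author, committer, blank line, message."""
    lines = [f"tree {tree_id}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return "\n".join(lines).encode()

if __name__ == "__main__":
    blob = object_id("blob", b"hello\n")
    tree = object_id("tree", tree_body([("100644", "hello.txt", blob)]))
    # Hypothetical identity and timestamp; a real commit uses your configured name/email.
    ident = "A U Thor <author@example.com> 1487980800 +0000"
    head = object_id("commit", commit_body(tree, None, ident, "Initial commit\n"))
    print(blob, tree, head, sep="\n")
```

Changing a single byte of `hello.txt` changes the blob ID, which changes the tree body and so the tree ID, which changes the commit ID: that is why the head hash vouches for the whole history beneath it.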
What happens when there's a collision? Interestingly enough, not much. If you're trying to check in a file that collides, chances are git won't let you, because a file already in the repo has the same hash. If you force the matter (you can chop your history down so a conflict isn't immediately apparent), then remote repos that pull from you or that you push to will simply ignore the conflicting file, as they will just assume it references the file already in the repo. (You can check out an old version and check it back in, and guess what? The hashes will be identical! You often do this if you revert.)
Now, perhaps Git could be made to handle the issue a bit more gracefully if you do happen to check in a file that differs but hashes the same, but in reality it's a rare occurrence. Even Linux itself, which has a huge history, hasn't experienced the issue.
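The "keep what is already there" behaviour described above can be modelled with a toy content-addressed store. To force a collision without 2^63 SHA-1 computations, this sketch deliberately truncates the ID to a couple of hex digits; `ToyObjectStore` and `id_len` are hypothetical, not Git's API.

```python
import hashlib
from typing import Dict

class ToyObjectStore:
    """Content-addressed store that never overwrites an existing ID (first object wins)."""

    def __init__(self, id_len: int = 40) -> None:
        self.id_len = id_len  # hex digits kept from SHA-1; tiny values force collisions
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        oid = hashlib.sha1(data).hexdigest()[: self.id_len]
        # As described above: an ID that is already present is assumed to be the same
        # object, so the existing content is kept and the new content is silently dropped.
        self.objects.setdefault(oid, data)
        return oid

    def get(self, oid: str) -> bytes:
        return self.objects[oid]

if __name__ == "__main__":
    store = ToyObjectStore(id_len=2)  # only 256 possible IDs, so 257 distinct puts must collide
    for i in range(257):
        data = b"file version %d" % i
        oid = store.put(data)
        if store.get(oid) != data:
            print(f"collision on {oid}: kept {store.get(oid)!r}, ignored {data!r}")
            break
```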
If you want fun, see WebKit, because SVN uses SHA-1 internally and someone corrupted the master repo checking in a test case consisting of two files with the same hash (the test case was to test for SHA-1 collisions in WebKit caching code). Ironically, that repo is offline at the moment.
SHADY-Nasty? I'm not sure that sounds very trustworthy.
Ezekiel 23:20
That really annoys me no end. There is some gradual improvement in a specific attack, expected by everybody who has a clue and not seen as anything dramatic by the same people. And immediately a horde of people with no understanding of crypto swoop in and declare the sky to be falling and all uses of this thing to be invalid. This is really just utterly pathetic.
Example: I have to constantly defend the use of SHA1 for password hashing. (Sure, something like pbkdf2 or Argon2 should come later if the password may be low-entropy and gets stored. That is not always the case.) The thing is that password hashing has the purpose of preventing the hash from being turned back into a password. Collision attacks have no impact on that at all. For a collision attack you would need to know the password, and then you could find a second one with the same hash (or rather, with the two-sided and much easier variant, you can find two passwords that map to the same hash). Now, these nitwits completely overlook that when hashes are used in signatures, you always already have what gets signed, which is completely different from the password situation. Still they claim "SHA1 is broken!". No, it is not. It is broken for some specific, _different_ application.
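For the case where a low-entropy password does get stored, the comment points to PBKDF2. A minimal sketch with Python's standard `hashlib.pbkdf2_hmac`; the iteration count, salt length and helper names are illustrative assumptions, not recommendations from the thread.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumption: tune to your hardware and latency budget

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a 32-byte PBKDF2-HMAC-SHA256 key from the password and a random salt."""
    salt = salt if salt is not None else os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the derived key and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)

if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("Tr0ub4dor&3", salt, stored))
```

An attacker holding `stored` still has to guess passwords one expensive derivation at a time; a collision attack on the underlying hash does not shortcut that, which is the comment's point.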
Why so many non-experts think they can voice a qualified opinion about a very hard mathematical topic is beyond me.
What Linus says here is exactly right, and it is a statement by an expert. All those criticizing him are basically people who can put on a band-aid telling a brain surgeon how to do his work. They just do not get it at all.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Indeed. But seeing that would require some actual understanding of the issue at hand, instead of a simple-minded "newer must be better".
There might also be implementation specific attack vectors.
For example the zlib part might silently ignore arbitrary binary data padded to the end of it.
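To see what the comment is worried about: Python's `zlib.decompressobj` stops at the end of the deflate stream and exposes any trailing bytes as `unused_data` rather than failing, and since the object ID is computed over the uncompressed content, such padding would not affect it. A careful reader could reject that input explicitly; `strict_inflate` is a hypothetical helper, not Git's code.

```python
import zlib

def strict_inflate(blob: bytes) -> bytes:
    """Inflate a zlib stream, refusing truncated input or bytes after the stream end."""
    d = zlib.decompressobj()
    data = d.decompress(blob)
    if not d.eof:
        raise ValueError("truncated zlib stream")
    if d.unused_data:
        raise ValueError(f"{len(d.unused_data)} unexpected bytes after zlib stream")
    return data

if __name__ == "__main__":
    stored = zlib.compress(b"blob 6\x00hello\n")
    padded = stored + b"\x00arbitrary trailing garbage"
    d = zlib.decompressobj()
    print(d.decompress(padded), d.unused_data)  # lenient: the padding is just set aside
    try:
        strict_inflate(padded)
    except ValueError as err:
        print("rejected:", err)
```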
Torvalds said on a mailing list yesterday that he's not concerned since 'Git doesn't actually just hash the data, it does prepend a type/length field to it', making it harder to attack than a PDF
I didn't get that one. PDF has a preamble too, right? Which the researchers were able to reproduce just fine in both "shattered" PDFs.
Yes, you can. At least the compilers I know let you get away with it (you obviously have to strip possible comment terminators from the data, but that goes without saying). As an example, I've just appended 1M worth of data from /dev/urandom (after sed(1)ing "*/" away) at the end of a hello world program. It compiled fine.
But the "random binary data" is a straw man anyway, because why would you even have to use random binary data? It's not like you don't have infinite tries with random printable ASCII.
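A minimal illustration of the "infinite tries" point, using Python source instead of C so it runs anywhere: appending a different printable comment yields a program that behaves identically but hashes differently each time. The program text, the comment alphabet and the helper names are arbitrary choices for this sketch.

```python
import hashlib
import random
import string

PROGRAM = "def greet():\n    return 'hello, world'\n"
ALPHABET = string.ascii_letters + string.digits + string.punctuation  # no newlines

def variant(seed: int, length: int = 32) -> str:
    """Same program plus a trailing one-line comment of random printable characters."""
    rng = random.Random(seed)
    noise = "".join(rng.choice(ALPHABET) for _ in range(length))
    return PROGRAM + f"# {seed} {noise}\n"

def behaviour(source: str) -> str:
    """Execute the source in a fresh namespace and call greet()."""
    namespace = {}
    exec(compile(source, "<variant>", "exec"), namespace)
    return namespace["greet"]()

if __name__ == "__main__":
    for seed in range(3):
        src = variant(seed)
        print(hashlib.sha1(src.encode()).hexdigest()[:12], behaviour(src))
```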
But as of SHA1 being "broken", this is now considered possible in reasonable time. Currently it requires substantial computing power. Soon, it won't. Or might not anyway.
I still get a bunch of back talk from people who insist we still use MD5, something that has been broken for almost a decade. Sometimes we have to, like when dealing with Microsoft servers. They still have to use MD5 because Microsoft doesn't care about security. They even recommend turning off FIPS because they say there are better algorithms and such... that they DON'T EVEN OFFER! So you're using stuff that isn't even FIPS. Often that means export-grade crypto, or encryption you can break in real time.
I don't understand: why not just say "OK" and fix it right now?
Ok, so now you've modified a comment.
What's the actual payoff for your effort?
If you only care about random bit flips CRC32 will work very well and be much faster than MD5 or SHA-1.
Well, not exactly.
- MD5 and SHA-1 have fast hardware implementations on some CPUs. CRC32 won't necessarily be a huge performance gain.
SHA-1 is used for a bit more than a simple glorified checksum in Git.
It is also used to give a handy number by which you designate commits, etc.
(i.e.: to compute a hash - e.g.: as would also be used in a hash look-up table).
That requires good output uniformity.
In other words, you'd need a hashing function that "spreads" its output across the whole output domain.
(To give an oversimplified example: if, due to a poor design, all patches ended up having hashes that begin with the hex digit "9", that would be a poor hashing function for these needs. If you used it in a hash lookup table, one part of the table would be overfilled, while the rest would still be empty.)
Cryptographic hashing functions offer these guarantees among lots of others. CRC32 doesn't, and several of the other checksums that were quickly designed for speed have also been detected not to offer these.
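The bucket-filling example can be simulated: pick one of 16 buckets from the first hex digit of each key's hash. SHA-1 spreads similar keys across all buckets, while a deliberately poor "hash" (here, hypothetically, the first character of the key) dumps them all into one. The key names are made up.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable

def bucket_counts(keys: Iterable[str], bucket_of: Callable[[str], int]) -> Counter:
    """Count how many keys land in each bucket."""
    return Counter(bucket_of(k) for k in keys)

def sha1_bucket(key: str) -> int:
    """Bucket 0-15 from the first hex digit of the SHA-1 digest."""
    return int(hashlib.sha1(key.encode()).hexdigest()[0], 16)

def poor_bucket(key: str) -> int:
    """A bad 'hash': the first character of the key, folded into 0-15."""
    return ord(key[0]) % 16

if __name__ == "__main__":
    keys = [f"patch-{n:04d}" for n in range(1000)]
    print("sha1:", sorted(bucket_counts(keys, sha1_bucket).items()))
    print("poor:", sorted(bucket_counts(keys, poor_bucket).items()))
```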
In this situation, a programmer can choose between two paths:
- Some coders would try to design their own new hashing function that offers both good speed and the important properties (e.g., that's exactly what LZ4's Yann Collet did when he created xxHash64. It's not a cryptographic hash, but at least it offers all the properties he needed).
- Others would instead jump to a quick'n'dirty solution and go for the major overkill: take a cryptographic hash (e.g., that's what Linus Torvalds did. He's a lazy git. He knows that a cryptographic hash would provide all the properties he needs. SHA-1 was a popular one back then, and even had some hardware implementations. So he picked it and didn't think much about it. It offers all the properties Linus needed for git. It also offered much more, but Linus didn't give a fuck about that. It doesn't offer security (anymore, especially since Google's proof of concept), but that's something Linus doesn't care about and didn't even bother to check (as mentioned, SHA-1 was already suspected back then, and serious cryptographic usage relied on SHA-2 instead), relying instead on signed repositories if security is needed).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
The compressed data isn't used to generate the blob hash; the SHA-1 hash is generated over "blob", a space, the file size, a NUL byte and the file data, like this:
"blob " + <size in bytes> + "\0" + <file data>
It appears the amount of computation is the same as for the attack; it just needs to account for the extra data at the front.
FYI: Below are the hashes for the offending files:
ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 shattered-1.pdf
b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 shattered-2.pdf
"Modified" a comment? I think you're not following.
Personally I use SHA-5, cause, I mean why use 3 when you could use 5. Only idiots use 3.
We use ROT-13 so our data is 10 orders of magnitude safer than yours!
We even use it twice to double the safety!
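For anyone who missed the joke: ROT-13 is its own inverse, so "using it twice" returns the plaintext. Python ships it as a text codec; `rot13` below is just a thin illustrative wrapper.

```python
import codecs

def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")

if __name__ == "__main__":
    secret = "Attack at dawn"
    once = rot13(secret)
    twice = rot13(once)
    print(once)             # Nggnpx ng qnja
    print(twice == secret)  # True: double ROT-13 is no encryption at all
```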
It looks like the first NIST-validated SHA-256 open source implementations didn't start appearing until 2005, so using SHA-1 makes some sense.
http://csrc.nist.gov/groups/ST...