Linus Torvalds On Git's Use Of SHA-1: 'The Sky Isn't Falling' (zdnet.com)
Google's researchers specifically cited Git when they announced a new SHA-1 attack vector, according to ZDNet. "The researchers highlight that Linus Torvalds' code version-control system Git 'strongly relies on SHA-1' for checking the integrity of file objects and commits. 'It is essentially possible to create two Git repositories with the same head commit hash and different contents, say, a benign source code and a backdoored one,' they note." Saturday morning, Linus responded:
First off - the sky isn't falling. There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier" for a content-addressable system like git. Secondly, the nature of this particular SHA1 attack means that it's actually pretty easy to mitigate against, and there's already been two sets of patches posted for that mitigation. And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories...
The reason for using a cryptographic hash in a project like git is because it pretty much guarantees that there is no accidental clashes, and it's also a really really good error detection thing. Think of it like "parity on steroids": it's not able to correct for errors, but it's really really good at detecting corrupt data... if you use git for source control like in the kernel, the stuff you really care about is source code, which is very much a transparent medium. If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice... It's not silently switching your data under from you... And finally, the "yes, git will eventually transition away from SHA1". There's a plan, it doesn't look all that nasty, and you don't even have to convert your repository. There's a lot of details to this, and it will take time, but because of the issues above, it's not like this is a critical "it has to happen now thing".
In addition, ZDNet reports, "Torvalds said on a mailing list yesterday that he's not concerned since 'Git doesn't actually just hash the data, it does prepend a type/length field to it', making it harder to attack than a PDF... Do we want to migrate to another hash? Yes. Is it game over for SHA-1 like people want to say? Probably not."
Pretty sure I take his opinion over yours. All he did was help create the software that the majority of the entire internet runs on.
It's spelt "en masse".
It's spelled "in synagogue"
Time for Torvalds to drop the attitude and fix this
As far as I can tell, this is a non-cryptographic use of hashing. I've used MD5 in plenty of places just to get a fast (hardware-accelerated) unique ID for a chunk of data, or as a checksum. No security purpose at all.
Socialism: a lie told by totalitarians and believed by fools.
Both happened in 2005. And SHA-2 was published 4 years earlier. So yes, the sky is not falling, and git can be made secure, but it also wasn't really wise to use SHA-1 when git was implemented, first.
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
We're at SHA-7 so we won't have to change for a while yet.
As far as I can tell, this is a non-cryptographic use of hashing.
It's Non-Cryptographic until you start using GIT for alternative use cases which it was not designed for.
Code your developers write and store in Git should be trusted data, in the sense you are not trying to attack the system.
And code you accept from third parties should be reviewed by humans first to check that it is non-malicious.
Since the SHA-1 collision attack can be detected, it seems it would also be a simple patch to Git to check, before adding or updating a file in its repository, whether the file contains a SHA-1 attack, and if so, spit out an error instead of saving the commit.
I don't think it's fair to call this an appeal to authority. If the poster had stated some reason why Torvalds is wrong, and the reply had just been "He's Linus, so he's right", that would qualify. But he didn't; dismissing some lame, anonymous opinion in favor of an authority worth trusting is something else, and perfectly appropriate here.
No. He's absolutely right - there's no reason to panic and rush, especially when they've already been working on switching to a new hash.
I never knew he created FreeBSD. ( the real workhorse of the internet )
You think hashes are unique. You're insane.
Was never intended for security. If you want insurance of source code origin you use signed commits as the kernel does.
Your comment might have been funny if at least there was some algorithm called "SHA-7", but there isn't.
So yes, the sky is not falling, and git can be made secure, but it also wasn't really wise to use SHA-1 when git was implemented, first.
Why not? As the summary quotes Torvalds, this is simply used to guard against corruption. Using something stronger would have given people the wrong idea about what protections it offered. Sort of like how people who don't understand how HTTPS really works think "as long as I see the lock icon, I am OK and my transaction is safe". Even if it hadn't given people the wrong idea, it would have added computational overhead for no reason.
If your repository needs to be cryptographically verifiable, then you use gpg to sign your commits with your 8192-bit RSA key, or your EC key (choose the bit length you like), and don't forget to use SHA-256 or SHA-512 for the signature hash. Now you have Git's nicely sophisticated yet still quite performant "parity on steroids" and you also have all the cryptographic goodness that comes with properly signing commits. You don't depend on Git's parity implementation for that.
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
If you replaced SHA-2 with SHA-3 for signature hashes on SSL/TLS certificates and for GPG signing, then that's great. If you think that SHA-3 somehow magically makes everything more secure for verifying that data has not been modified in transit (e.g., installer gets corrupted while being downloaded) because you replaced all the SHA-2 hashes with SHA-3 hashes on the installer download page which is served over insecure HTTP, then I suspect you may not fully understand what threats you are trying to protect against.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
This SHA goes up to SHA-11.
Ours goes to SHA-11.
I just SHAT so I don't need to change my pants yet
At Paddy's, we use SHA-Dynasty.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
By the time you could do something like trash it with crafted content, you could screw things over in less difficult ways...
On the other hand, gpg still uses SHA1 for key fingerprints per the standard, which seems like that would be a much bigger risk. You can use other more secure hashes for digests, but fingerprint ids are SHA1, which was deemed inadequate for key fingerprints in X509...
XML is like violence. If it doesn't solve the problem, use more.
You don't understand how hashing works then. MD5 as a checksum is completely reasonable - and fast.
Well, it is not the software's use non-cryptographic code for cryptographic applications.
Anyway, git's been planning on a new commit hash for a while.
Much more unique than checksums.
Why would Astrosmurf care about SHA-1?
If you read Linus' whole statement, you will also find the part where he writes "yes, in git we also end up using the SHA1 when we use "real" cryptography for signing the resulting trees, so the hash does end up being part of a certain chain of trust. So we do take advantage of some of the actual security features of a good cryptographic hash, and so breaking SHA1 does have real downsides for us."
Regarding our use of SHA-3: We use cryptographic hash sums as keys to cached data items that not everyone is permitted to request. Thus we need to make sure that the cache keys cannot be "guessed" (like from knowing a valid cache key for a similar data item).
It's spelled "spelt".
That's what the Pope told Galileo, in reference to Aristotle.
"...the software's fault if you use..."
Anyway, want strong security? Use signed commits.
Why didn't git use SHA-256 from the beginning? Git was released four years after SHA-256, and SHA-1's weakness was already being felt.
Linus Torvalds doesn't seem to be the kind of guy who sees far ahead...
Oy vey, it's spilled schmaltz!
Your comment might have been funny if at least there was some algorithm called "SHA-7", but there isn't.
At least Mozilla isn't in charge of updating this, otherwise we'd get SHA-1.0.1, 1.0.2, 1.0.3, 1.1.0 ... bumped every six weeks.
It must have been something you assimilated. . . .
Care to substantiate that incredible claim?
When all you have is a hammer, every problem starts to look like a thumb.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
The attack vector is straight out of OP's fuzzy behind.
When all you have is a hammer, every problem starts to look like a thumb.
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
Well, the "practical" attack, described here required:
This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.
So, Step 1: Get a super-computer ... or rent a fuck-tonne of capacity at Amazon EC2 ...
It must have been something you assimilated. . . .
You think hashes are unique. You're insane.
A 128-bit hash (from a common library or hardware implementation) is more likely to be unique than any code you write in the attempt to create a unique ID. Why? Because the risk of an accidental 128-bit hash collision within a pool of, say 1 billion items, is lower than the risk of a bug in your code.
If you doubt it, ask yourself this: what's the occurrence of bugs, per million lines of code, in high quality software? I bet there's a better than 1:1 million chance of a bug in any code you write to generate a unique ID. But even if you're the greatest coder ever, in all of time and space, I bet the risk of a bug is higher than 1:1 trillion.
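The accidental-collision claim above can be checked with the standard birthday bound. The following is only a rough sketch: it assumes hash outputs behave like uniformly random values, it says nothing about deliberately constructed collisions, and the function name is made up for illustration.

```python
from fractions import Fraction


def collision_probability_upper_bound(n_items: int, hash_bits: int) -> Fraction:
    """Birthday upper bound: P(any collision) <= C(n, 2) / 2**bits."""
    pairs = n_items * (n_items - 1) // 2  # number of distinct item pairs
    return Fraction(pairs, 2 ** hash_bits)


# One billion items, as in the comment above.
p128 = collision_probability_upper_bound(10**9, 128)  # MD5-sized hash
p160 = collision_probability_upper_bound(10**9, 160)  # SHA-1-sized hash

print(f"128-bit hash, 1e9 items: at most {float(p128):.2e}")
print(f"160-bit hash, 1e9 items: at most {float(p160):.2e}")
```

Under those assumptions the 128-bit bound comes out on the order of 10^-21, far below the 1-in-a-trillion figure used for comparison above.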
Socialism: a lie told by totalitarians and believed by fools.
Why would Astrosmurf care about SHA-1?
Astrosmurf cares about any post about Linus and otherwise probably has trouble tying his (her/its) shoes, let alone knowing anything about higher math or security.
When all you have is a hammer, every problem starts to look like a thumb.
His arguments as to why it isn't serious are reasonable. If you want your arguments to be considered, you need to make some.
I think we've pushed this "anyone can grow up to be president" thing too far.
/.ers must fight back against the "pernicious anti-intellectualism" and distrust of experts that is demonstrated by the parent poster.
Even that wouldn't suffice, as your "corrupted copy" probably wouldn't compile.
I think we've pushed this "anyone can grow up to be president" thing too far.
"Appeal to authority" is not a logical fallacy when the authority is actually an expert in the field. You'd have to be a moron not to value the opinion of a highly accomplished programmer over an AC troll on slashdot.
Citation?
/.ers must fight back against the "pernicious anti-intellectualism" and distrust of experts that is demonstrated by the parent poster.
Don't trust, verify. You're a pseudo intellectual fool who believes expert opinions don't require proof because you're lazy and stupid and proof is hard.
This would be much harder than the PDF example by Google (which is quite impressive though). You'd need to generate a zlib compressed commit with a specific hash which references the correct parent commit's own hash, has consistent naming and dates and yields somehow valid code.
This was said before, but again: the point of having SHA1 hashes in git is not security but to ensure reasonable uniqueness of objects (commit, trees, blobs or tags). SHA1 is (was?) a rather strong crypto hash so you do get some of its benefits in that regard though.
This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.
So, Step 1: Get a super-computer ... or rent a fuck-tonne of capacity at Amazon EC2 ...
Yeah, and don't forget that the (millions) of different hashed programs with the malevolent code will also need to replace the good copy. And compile. And install.
Oy, such a lot of effort. And yup, there is a non-zero chance of this happening to me. Don't think I'll worry much though.
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
160 bits numbnutz! This is why anyone here that doesn't already know sha1 is no better than a trump whitehouse is literally insane!
https://regmedia.co.uk/2015/07...
Linus really has no sense of security. He'll use whatever is expedient over what's wise. It's a shame really.
How about describing the attack vector?
The attack vector is straight out of OP's fuzzy behind.
It's like it's possible, but in the most unlikely way you'd ever want to infect a computer. I'm more worried about getting hit by a meteor. Which is to say, not at all.
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
As Torvalds explained, there is a difference between using a hash as a checksum (like Git) and using a hash for security signing. Security signing requires the hash to be collision-resistant, but collisions can be tolerated in checksums because they are rare, especially since there are ways to mitigate the risk.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Appeal to authority makes a weak argument, but does not invalidate it. You have yet to provide even a weak argument to the contrary.
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
Nothing about your statement makes *any* sense. "Appeal to authority" in itself is not an argument, weak or strong.
Like how Heartbleed was immediately noticed and didn't sit there for two years.
This is one of those things Open Source proponents keep saying that isn't actually true, because people aren't really auditing the code.
There are medications for autism, Rainman, or you could just shoot yourself.
If you think that SHA-3 somehow magically makes everything more secure for verifying that data has not been modified in transit (e.g., installer gets corrupted while being downloaded) because you replaced all the SHA-2 hashes with SHA-3 hashes on the installer download page which is served over insecure HTTP, then I suspect you may not fully understand what threats you are trying to protect against.
The point is that if you're trying to use a hash instead of a checksum, it'll actually work as advertised. If you only care about random bit flips CRC32 will work very well and be much faster than MD5 or SHA-1. If you're doing major overkill you might not care that a hash doesn't function as a hash because you don't actually need a hash but that's no reason to use a bad hash. You should either use a good hash or use a lesser solution that doesn't pretend to make promises it can't hold.
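To make the checksum-versus-hash distinction concrete, here is a small sketch using Python's standard `zlib.crc32` and `hashlib.sha1`. The sample data and variable names are invented for illustration. It only shows that both catch a random single-bit flip; it does not measure speed.

```python
import hashlib
import zlib

original = b"int main(void) { return 0; }\n"

# Simulate accidental corruption: flip one bit in one byte.
corrupted = bytearray(original)
corrupted[4] ^= 0x01
corrupted = bytes(corrupted)

# A 32-bit CRC is enough to notice this kind of random damage...
crc_detects = zlib.crc32(original) != zlib.crc32(corrupted)
# ...and so is a 160-bit cryptographic hash.
sha1_detects = hashlib.sha1(original).digest() != hashlib.sha1(corrupted).digest()

print(crc_detects, sha1_detects)  # True True
```

The difference only shows up against someone choosing the data on purpose: a 32-bit CRC offers no resistance to deliberately matching checksums, which is the promise a cryptographic hash is supposed to add.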
Live today, because you never know what tomorrow brings
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
This is a misunderstanding of the purpose of SHA-3. It was not designed to be a "successor" to SHA-2, but an alternative. There is no evidence that SHA-2 is insecure, or even that SHA-3 is more secure than SHA-2. It was simply picked because, at its core, the design is fundamentally different from SHA-2 so it is unlikely that both will be broken at the same time. There is no reason to move from SHA-2 to SHA-3 at this point.
Yeah but Aristotle was a fucking idiot.
We stand on the shoulders of giants. Those giants are standing on Aristotle's foot.
Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
Such troll.
Very flamebait.
Wow.
It seems every other day another large website gets completely owned by hackers. When the philosophy is "To hell with security!", people shouldn't adopt that OS for their critical use-cases...
There is no evidence that SHA-2 is insecure, or even that SHA-3 is more secure than SHA-2.
I'm not entirely sure, but I think you just said that ffkom ( 3519199 )'s company is practicing security-through-obscurity and/or cargo-cult security?
p.s. Why wasn't SHA-3 called SHA-2b?
If you bothered to read TFS this is not about cryptography at all.
Probably because it was the result of a 6 year long NIST competition to design a hash function that was different from SHA-2. Would have been a bit glib to just name it SHA-2b.
Pretty sure I take his opinion over yours. All he did was help create the software that the majority of the entire internet runs on.
Creating an OS doesn't make one a cryptography expert.
You ignorant little shitbag, you still don't have a fucking clue; this is not a cryptography issue, that's why Git doesn't need to fix it.
Once again we have another ignorant little piss-ant who thinks their ignorance is just as good as education.
IIUC, in this particular context, it's not really a security issue.
Il n'y a pas de Planet B.
Linus Torvalds = Fucktard.
Aww, did Linus tell you to go fuck yourself because of the crap code you tried to get into the kernel?
You stated above that appeal to authority is a logical fallacy. A logical fallacy is a classification of reasoning typically structured as an argument.
I don't see what is so hard to understand.
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
He wrote a small portion of a now popular kernel
He also wrote Git, which is the topic being discussed.
You are a fool to take his word on little more than faith on anything resembling a security issue.
1. This is not a security issue.
2. Linus explained his reasoning very clearly, so nothing is being taken on "faith".
I'm pretty sure that Astrosmurf is just an APK sockpuppet.
See: Internet, all of it.
Or SHA-not-2b
FUCK LlNUS; FUCK LlNUX; FUCK GlT; AND FUCK YOU!
Your cheque from Microsoft is in the mail but it will not be the full amount that was agreed on since you have taken the subtlety out of your quote by shouting.
There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier"
is really a non sequitur. It's also a truism. Of course, there is a difference. If all you cared about was a "content identifier", you'd use CRC. But the reality is that you really want a secure content identifier (the one which does not provide a vector of attack on your system through spoofing of identifier through a simple calculation). Without it, you have a system in which it is trivial to create a haystack in which any one particular piece of content becomes a need to hide. All you need is to modify as many pieces of content as possible to collide with the one you want to be difficult to find.
The real answer he should have given is that any content which incorporates its md5 becomes unassailable because there is no known vector of attack to produce simultaneous md5 and sha1 collisions.
Any guest worker system is indistinguishable from indentured servitude.
Linus has a smelly vagina
How do you know it wasn't Linus posting as AC?
Any guest worker system is indistinguishable from indentured servitude.
You're not helping your point any. Hashes (like MD5, which is 128 bits) are plenty unique.
Socialism: a lie told by totalitarians and believed by fools.
I never knew he created FreeBSD. ( the real workhorse of the internet )
You mean the dark horse, don't you? I heard there are a few FreeBSD web servers still running. Nothing against FreeBSD, mind you, some of Linux's core devs were raised on daemon milk.
When all you have is a hammer, every problem starts to look like a thumb.
and yields somehow valid code.
Comments. You have infinite tries to get it right.
CLI paste? paste.pr0.tips!
Torvalds is not a cryptographer.
BTW: At the company I work for, we already replaced SHA-2 with SHA-3 for security reasons. Better safe than sorry.
In my country, we kicked out the Shas and migrated to Ayatollahs, who have a unique uniqueness guarantee!
Not really. You can't just put random binary content on a comment.
Very true. But he's our fucktard.
As far as I can tell, this is a non-cryptographic use of hashing.
Git uses sha1 hashes to identify everything.
A (possibly signed) tag references a commit by hash
A commit references a tree by hash
A tree references a list of files and subtrees by hash
If a commit you fetch references hashes you already have the files for in your local git tree they will not be re-fetched, the existing ones will simply be used.
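The chain of references listed above can be sketched in a few lines. This is a simplified model, not Git's implementation: the commit body here omits the author, committer and parent lines that real commits carry, every file gets mode `100644`, and the helper names are hypothetical.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over a "<type> <length>\\0" header followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: dict[str, bytes]) -> str:
    """A tree lists each entry's name together with the raw hash of its blob."""
    entries = b""
    for name in sorted(files):
        blob = object_id("blob", files[name])
        entries += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    return object_id("tree", entries)


def commit_id(files: dict[str, bytes], message: str) -> str:
    """A (simplified) commit names its tree by hash."""
    body = f"tree {tree_id(files)}\n\n{message}\n".encode()
    return object_id("commit", body)


clean = commit_id({"main.c": b"int main(void) { return 0; }\n"}, "initial")
tampered = commit_id({"main.c": b"int main(void) { return 1; }\n"}, "initial")
print(clean != tampered)  # True
```

Changing one byte in one file changes the blob hash, which changes the tree hash, which changes the commit hash. That is why a single trusted commit or tag hash vouches for everything beneath it, as long as the hash function resists collisions.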
The whole point of git is to be distributed, so it should be safe to fetch commits from untrusted sources, inspect them and throw them away without worrying that they will change the meaning of commits you later fetch from trusted sources. It should be safe to download commits over an insecure connection and then verify the commit hash (either by a signed tag or by checking out of band) to ensure that the commit hasn't been tampered with.
The latter part of linus's mail is quite a well-reasoned argument as to why the current attack on SHA1 isn't too big a deal for source code repositories.
If a "distinct chosen prefix" collision attack shows up then the risk gets much higher. For MD5 it took about 2 years to go from a basic collision attack to a distinct chosen prefix one.
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
Cue astroturf invective from aging Microsoft lifers, bitter that they backed the wrong horse.
3... 2... 1...
Cue astroturf invective from aging Microsoft lifers, bitter that they're stuck at the back of the horse.
There, FTFY.
Shut the fuck up. You're one of the worst Linux zealots on Slashdot. People don't like zealots, no matter where they come from. That's why people give you shit. You've earned it!
Having a super fun time with your Steve Ballmer blow-up doll?
"Appeal to authority" is a logical fallacy. But pointing to a logical fallacy and using that as the sole reason to say that an argument is wrong is itself a fallacy: the "fallacy fallacy".
My own opinion here is that Linus is right. It's close to midnight, so I don't feel like backing that up with anything
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
We've got dissociative identity disorder (multiple personality disorder) and we all hate your politics AND guts. So there!
(Well, one of us thinks you're kinda handsome, but they're just crazy.)
Well, it is not the software's use non-cryptographic code for cryptographic applications.
I'm referring to use cases specifically like SparkleShare, By the way. Git-based Dropbox-like file synchronization tool.
It's arguably a major bug in Git if the Git software keeps track of an object solely by hash, and lazily assumes that the hash uniquely identifies a specific version of the file, and that assumption turns out to be false, and data corruption or tampering can be caused as a result.
HOWEVER....... Arguably Git does not carry a promise that the software is Durable and Safe outside of its intended use cases. Sometimes it is valid for software to carry disclaimers, such as: Please check that your object is not malicious before checking it in.
That actually begins to become unreasonable when Git is being used by larger and larger development teams, however..... the chance that someone wants to try something funny greatly increases at some point, and the chance of a mistake occurring increases with larger volumes of source code being managed, updated and committed on a daily basis.
Linus is NOT an authority in cryptography or cryptanalysis, he IS an authority in the coding of a Unix kernel in C, and C in general, and thereby programming in general.
And as with all such authorities, he does some things (architecture, say) in ways that others highly disagree with.
That is normal.
Yet he is still no crypto authority.
And no one should be using SHA-1, whether or not you're 'vulnerable' to it,
it's broken, move the fuck on from it even just for the sake of being broken
and putting the fucker in its grave where it belongs.
u dont know what youre talking about.
cryptographic means 'with cryptographic strength' which has a certain meaning in the field.
it does NOT mean you are doing anything 'security' with it.
your use of md5 is *broken* because it has been proven to be trivially vulnerable to collisions.
therefore you have NO guarantee that what you're 'ID'ing or 'checksum'ming will be unique.
even theoretically you know any hash function will collide for any input exceeding its width.
however md5 is a complete joke.
That mailing list is full of people trying to 'fix' their use of sha-1 in a way that allows them to *continue* using it.
That's handwavy bullshit. Sha1 is broken. Regardless of if you can fix it or not, you just move off of it and avoid the problem altogether, and you look better for it.
ALL new repos should initialize under sha-3 by default.
The 'git' tool itself should remain backwards compatible with the old sha-1 hash.
No one really cares what you do with sha-1 to 'fix' it or blend it with sha-3 or whatever the hell else with it. That's your call.
But you must initialize all new repos with sha-3.
With that, the migration you want will happen on its own.
and yields somehow valid code.
Comments. You have infinite tries to get it right.
But not infinite time, unless you're immortal - in which case you should have better things to do.
It must have been something you assimilated. . . .
hahaha nice joke. Thanks for the laugh, made my day.
Well, at least the president backs him on this!
Sent from my ASR33 using ASCII
If you had read the background to this, you would know that:
a) The SHA-1 crack took the equivalent of 33,000 years of computing.
b) This is about Git not Linux
c) Git does not use SHA1 for cryptography, but as a more powerful "checksum", which it is.
d) There is a plan to migrate which will take less than 33,000 years.
So, in short, your comments are not worth a steaming pile of dog poo, and you should crawl back into your hole and stay there.
--
You have the right to remain stupid.
git does not use SHA-1 for cryptography and even if it did Torvalds is more than qualified enough to speak intelligently about it. Lots of people are, including myself, and what he said is perfectly valid.
git has been around a long time; it's not a bug to have used a hash that was widely regarded as reasonable. Anyway what's worse, doing that or using it anyway despite believing it's buggy ;)
1. This is not a security issue.
Says Linus. As the saying goes, anyone can design a security system(*) that they themselves cannot break, but that doesn't mean others cannot. Linus is no exception to that rule.
All the mitigating factors he mentions are good points, and he's right that the sky isn't falling. But still, using SHA-1 in 2005 was a lazy choice, and it would have been nice to have seen a move to SHA-2 sometime in the following decade.
(*) I know people are saying that the SHA-1 checksum is not a security device, but that's a bit naive: It's a central part of a protocol for exchanging data over the internet. Whatever it was intended to be, it is in reality a central part of git security.
Our codebase is written in hand-optimized SHA-1 assembly language so we don't want to go to the effort of porting it to the SHA-3. There's some guy down the hall who says we could write the kernel in C instead, but nobody listens to him.
-- Ed Avis ed@membled.com
As a hash function, SHA-1 was perfectly adequate for how Git works.
All Git uses SHA-1 for internally is to hash the contents of a file to turn it into a unique number. SHA-1 is a nice fast algorithm to do that, and 160 bits offers plenty of space to uniquely identify stuff. It's so good that all the other things, like commits and such, are hashed too, and then a Git repository is merely a collection of hashes. At the top is what we call "head", which contains the SHA-1 hash representing a commit object (it's the SHA-1 hash of said object, actually). That commit object points to a few other objects: the commit before it (the old head) and the SHA1 hash of the tree object. The tree object contains a list of SHA1 hashes that represent the files and subtrees in the source tree. That list is the whole snapshot, not just the changed files; unchanged files simply keep the same hash as before.
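As a rough illustration of the file-hashing step, and of the type/length prefix Torvalds mentions in the summary, a blob ID can be reproduced with nothing but `hashlib`. The function name is hypothetical; the two hashes in the comments are the well-known results of `git hash-object` for the same inputs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash "blob <length>\\0" + content, the way Git names a file's contents."""
    header = f"blob {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# The plain SHA-1 of the same bytes is a different value, so a collision
# between two raw files does not automatically collide as Git blobs.
print(hashlib.sha1(b"hello world\n").hexdigest() != git_blob_id(b"hello world\n"))
```

Because the length is part of the hashed input, identical content always maps to the same ID, which is what makes checking an old version back in a no-op.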
What happens when there's a collision? Interestingly enough, not much. If you're trying to check in a file that collides, chances are git won't let you because a file already in the repo has the same hash. If you force the matter (you can chop your history down so a conflict isn't immediately apparent), then remote repos that pull from you or that you push to will simply ignore the conflicting file, as they will just assume it references the file already in the repo (you can check out an old version and check it back in - guess what? The hashes will be identical! You often do this if you revert).
Now, perhaps Git could be made to handle the issue a bit more gracefully if you do happen to check in a file that differs but hashes the same, but in reality it's a rare occurrence. Even Linux itself, which has a huge history, hasn't experienced the issue.
If you want fun, see WebKit, because SVN uses SHA-1 internally and someone corrupted the master repo checking in a test case consisting of two files with the same hash (the test case was to test for SHA-1 collisions in WebKit caching code). Ironically, that repo is offline at the moment.
Wow. I would appeal to authority to correct you but I will instead simply provide you with a link where you can read for yourself why you have this COMPLETELY BACKWARDS.
And in case you don't feel the need to click the link:
Appeal to authority is a logical fallacy when the person is NOT an expert in the field.
Appeal to authority may be a logical fallacy but it's not applicable in this case since the logical fallacy relies on the authority not actually being an expert in the field under discussion.
Trust me, I'm an English major*
*An appeal to authority because:
a) I'm not an English major
b) Even if I were, right now I'm just thegarbz and you have no ability to check my credentials as an expert on the topic.
I know you can retrieve a file from git by a salted hash.
What do you think will happen if I add a new file with the same salted hash?
What happens to file 1 and to the new file, and what if the collision is on a revision?
SHADY-Nasty? I'm not sure that sounds very trustworthy.
Ezekiel 23:20
That really annoys me no end. There is some gradual improvement in a specific attack, expected by everybody who has a clue and not seen as anything dramatic by the same people. And immediately a horde of people with no understanding of crypto swoop in and declare the sky to be falling and all uses of this thing to be invalid. This is really just utterly pathetic.
Example: I have to constantly defend the use of SHA1 for password hashing. (Sure, something like PBKDF2 or Argon2 should come later if the password may be low-entropy and gets stored. That is not always the case.) The thing is that password hashing has the purpose of preventing the hash from being turned back into a password. Collision attacks have no impact on that at all. For a collision attack you would need to know the password and then you could find a second one with the same hash (or rather, with the much easier two-sided variant, you can find two passwords that map to the same hash). Now, these nitwits completely overlook that when hashes are used in signatures, you always already have what gets signed, which is completely different from the password situation. Still they claim "SHA1 is broken!". No, it is not. It is broken for some specific _different_ application.
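For anyone wondering what the "PBKDF2 on top of SHA1" option mentioned above looks like, here is a rough sketch using Python's standard `hashlib.pbkdf2_hmac`. The function names, the 16-byte salt and the iteration count are my own illustrative assumptions, not a recommendation from the poster.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only, an assumption


def hash_password(password, salt=None):
    """Derive a storable hash from a password with PBKDF2-HMAC-SHA1."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password, salt, expected):
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

An attacker holding `stored` needs a preimage to recover a working password; being able to construct two inputs that collide with each other does not provide one.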
Why so many non-experts think they can voice a qualified opinion about a very hard mathematical topic is beyond me.
What Linus says here is exactly right and it is a statement by an expert. All those criticizing him are basically people that can put on a band-aid telling a brain-surgeon how to do his work. They just do not get it at all.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Indeed. But seeing that would require some actual understanding of the issue at hand, instead of a simple-minded "newer must be better".
It's arguably a major Bug in Git if the Git software keeps track of an object Solely by Hash, and lazily assumes that the Hash uniquely identifies a specific version of the file, And that assumption turns out to be false, and data corruption or tampering can be caused as a result.
I disagree, this is not a bug. It is perfectly reasonable to use a crypto hash to uniquely identify objects within an SCM, given that one of the properties of such hashes is that they provide uniformly distributed IDs over a very large space. Statistically, the chance of running into a SHA1 collision under normal git usage is so low as to be practically zero; you have a (much, much) better chance of experiencing repo corruption due to cosmic rays hitting your HDD or memory.
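To put a rough number on "practically zero", here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes hash outputs behave like uniformly random values and that nobody is deliberately constructing collisions; the function name and the object counts are made up for illustration.

```python
import math


def collision_probability(num_objects, hash_bits):
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * 2**bits))."""
    pairs = num_objects * (num_objects - 1) / 2
    # expm1 keeps precision when the probability is astronomically small.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


# A hypothetical repository with a billion objects, 160-bit ids.
print(collision_probability(10**9, 160))

# For contrast: 100,000 objects squeezed into a 32-bit id space.
print(collision_probability(100_000, 32))
```

Under those assumptions the 160-bit case comes out far below 1e-30, while the 32-bit case is already more likely than not to contain a collision. None of this says anything about an attacker, which is the separate question the rest of the thread argues about.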
Anyway, git's failure mode is not horrible either in that scenario: http://marc.info/?l=git&m=1156...
There might also be implementation specific attack vectors.
For example the zlib part might silently ignore arbitrary binary data padded to the end of it.
Torvalds said on a mailing list yesterday that he's not concerned since 'Git doesn't actually just hash the data, it does prepend a type/length field to it', making it harder to attack than a PDF
I didn't get that one. PDF has a preamble too, right? Which the researchers were able to reproduce just fine in both "shattered" PDFs.
Yep. More to the point is that people tend to fall for the fallacy fallacy
Just because an argument contains a fallacy doesn't mean that the conclusion is wrong.
Ideally an argument should be able to stand on its own, but for anything remotely complex that argument would probably require hours to make. (Or years to people with no prior knowledge in the field.)
Ain't nobody got time for that.
Yes you can. At least the compilers I know let you get away with it (you obviously have to strip possible comment terminators from the data, but that goes without saying). As an example, I've just appended 1M worth of data from /dev/urandom (after sed(1)ing "*/" away) at the end of a hello world program. Compiled fine.
But the "random binary data" is a straw man anyway, because why would you even have to use random binary data? It's not like you don't have infinite tries with random printable ASCII.
CLI paste? paste.pr0.tips!
But now that SHA1 is "broken", this is considered possible in reasonable time. Currently it requires substantial computing power. Soon, it won't. Or might not, anyway.
Signed using SHA1?
Signed with whichever hash GnuPG supports. That includes the entire SHA family up to SHA-3.
Also, when a commit is signed it becomes bundled with it, which means its associated SHA-1 hash will change.
It's arguably a major Bug in Git if the Git software keeps track of an object Solely by Hash, and lazily assumes that the Hash uniquely identifies a specific version of the file,
A hash of 128 bits or more is a more reliable unique ID than anything custom you could code up. Safe vs malicious attackers is different, and as others have pointed out, sign your commits. But as just a way to get a reliably unique ID for a document (or set thereof)? It's a very solid approach.
Socialism: a lie told by totalitarians and believed by fools.
u dont
I stopped reading there, sorry.
My own opinion here is that Linus is right. It's close to midnight, so I don't feel like backing that up with anything
What LT says about adding type and length is correct. The collision attack on SHA-1 won't work as a result. Transition to a better hash is appropriate.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
I still get a bunch of back talk from people who insist we still use MD5, something that has been broken for almost a decade. Sometimes we have to, like when dealing with Microsoft servers. They still have to use MD5 because Microsoft doesn't care about security. They even recommend turning off FIPS because they say there are better algorithms and such... that they DON'T EVEN OFFER! So you're using stuff that isn't even FIPS. Often that's export-grade encryption, or something you can break in real time.
Don't understand, why not just say -OK and fix it right now?
Statistically the chance of running into a SHA1 collision under ....
Just because a choice of technology seemed a "reasonable" choice does not mean exposure of a failure case is not a bug.
The "HASHing" algorithms and Database formats used inside the program are internal technical design choices of the software and do not Affect what the Correct behavior of the software is.
If a program has only a statistical chance of working correctly when you expect it to [less than 100%] in the worst-case real-world scenarios (Intentional attacks), then by definition the software is buggy.
For example, if a Caching HTTP proxy server used hashes internally to identify documents, then Serving the colliding document with the same hash when a URL pointing to a different document is requested is Still definitely a bug.....
Similarly, using a Database format which cannot correctly and uniquely identify certain things, And then serving the wrong data when later queried is a Bug, and just the same sort of bug.
Check the link on my parent post again.
Ok, so now you've modified a comment.
What's the actual payoff for your effort?
A hash of 128 bits or more is a more reliable unique ID than anything custom you could code up.
No.... A hash of 128 bits can be either Reliable or Unreliable, depending on the hashing algorithm.
In the case of SHA1; It is now known to be Unreliable.
and as others have pointed out, sign your commits.
Signing your commits does not actually solve the problem. The colliding commit will just not appear to have a signature on it.... the SHA hash will still be the same, and at best you may notice later if you review your history tree to find an unsigned commit, long after the damage has been done.
If you only care about random bit flips CRC32 will work very well and be much faster than MD5 or SHA-1.
Well, not exactly.
- MD5 and SHA-1 have fast hardware implementation on some CPUs. CRC32 won't necessarily be a huge performance gain.
- SHA-1 is used for a bit more than a glorified checksum in Git.
It is also used to give a handy number by which you designate commits, etc.
(i.e. to compute a hash, as would also be used in a hash look-up table).
That requires good output uniformity.
In other words, you need a hashing function that "spreads" its output across the whole output domain.
(To give an over-simplified example: if, due to a poor design, all patches ended up having hashes that begin with the hex digit "9", that would be a poor hashing function for these needs. If you used it in a hash lookup table, one part of the table would be overfilled, while the others would still be empty.)
Cryptographic hashing functions offer these guarantees among lots of others. CRC32 doesn't, and several of the other checksums that were quickly designed for speed have also been detected not to offer these.
In this situation, a programmer can choose between two paths:
- Some coders would try to design their own new hash which offers both good speed and the important properties (e.g.: that's exactly what LZ4's Yann Collet did when he created xxHash64. It's not a cryptographic hash, but at least it offers all the properties that he needed)
- Others would instead jump to a quick'n'dirty solution, and go for the major overkill: take a cryptographic hash (e.g.: that's what Linus Torvalds did. He's a lazy git. He knows that a cryptographic hash would provide all the properties he needs. SHA-1 is one that was popular back then, and even had some hardware implementations. So he picked it and didn't think much about it. It offers all the properties Linus needed for git. It also offered much more, but Linus didn't give a fuck about that. It doesn't offer security anymore (especially since the Google proof of concept), but that's something Linus doesn't care about and didn't even bother to check (as mentioned, SHA-1 was already suspect back then, and serious cryptographic usage relied on SHA-2 instead), relying instead on signed repositories if security is needed).
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Says Linus. As the saying goes, anyone can design a security system(*) that they themselves cannot break, but that doesn't mean others cannot. Linus is no exception to that rule.
You're missing the point: Linus said the SHA-1 hash was never intended for security, but simply to detect unintentional data corruption. If you want cryptographic hashes, Git supports those too, but the default is not one of those.
Better the Tuxedo than the Cheeto.
The compressed data isn't used to generate the blob hash; the SHA1 hash is generated over "blob", the length, a NUL byte and the file data, like this:
"blob " + <length> + "\0" + <file data>
It appears the amount of computation is the same as the attack, it just needs to account for the extra data at the front of the data.
FYI: Below are the hashes for the offending files:
ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 shattered-1.pdf
b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 shattered-2.pdf
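If you want to check how the type/length prefix discussed in this thread changes the hash, here is a small sketch of how a Git blob id is computed: SHA-1 over the header `blob <length>\0` followed by the raw file bytes. The function name `git_blob_id` is mine; the result should match what `git hash-object` prints for the same bytes.

```python
import hashlib


def git_blob_id(data):
    """SHA-1 over 'blob <length>' + NUL + the raw file bytes."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The well-known id of the empty blob:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Not the same as a plain SHA-1 of the same bytes:
print(hashlib.sha1(b"").hexdigest())
```

Because the header goes in front of the content, two files whose plain SHA-1 digests collide do not automatically collide as Git blobs. As noted above, though, an attacker can account for the header when computing the collision in the first place.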
"Modified" a comment? I think you're not following.
You live in a scary world of imaginary threats, my friend.
SHA-1... Git real.
That's 33,000 CPU years.
Didn't miss it, discussed in the last paragraph.
It is when said authority is wrong on the issue at hand.
Personally I use SHA-5, cause, I mean why use 3 when you could use 5. Only idiots use 3.
If there is one thing the Wikipedia article makes perfectly clear, it is that there is considerable disagreement over the nature and limits of the "appeal to authority" logical fallacy. However, in the first and only "notable example" given in the article to illustrate "appeal to authority", the authority figure in question was an expert in the field, which leads me to seriously doubt your assertion that "[a]ppeal to authority is a logical fallacy when the person is NOT an expert in the field."
Quoting from the article, with emphasis added:
In the Western rationalistic tradition and in early modern philosophy, appealing to authority was generally considered a logical fallacy.
More recently, logic textbooks have shifted to a less blanket approach to these arguments, now often referring to the fallacy as the "Argument from Unqualified Authority" or the "Argument from Unreliable Authority".
However, these are still not the only recognized forms of appeal to authority. For example, a 2012 guidebook on philosophical logic describes appeals to authority not merely as arguments from unqualified or unreliable authority, but as arguments from authority in general. In addition to appeals lacking evidence of the authority's reliability, the book states that arguments from authority are fallacious if there is a lack of "good evidence" that the authorities appealed to possess "adequate justification for their views."
So on the whole, it would not be unreasonable to consider an argument of the form "X is true because Y said so, and Y is a recognized authority in the field of X" an example of the "appeal to authority" logical fallacy. That is not to say that X is thus untrue, or that Y's opinion should be disregarded—but there is a vast difference between the valued opinion of a qualified authority figure and a sound logical argument.
You would be perfectly correct to say that this is not an example of the "Argument from Unqualified Authority" fallacy described in the newer textbooks, but that fallacy is much more limited in scope than "appeal to authority".
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
Also, when a commit is signed it becomes bundled with it, which means its associated SHA-1 hash will change.
Yes, but the commit only includes the SHA-1 hash of the tree object, which in turn refers to other trees and files by their SHA-1 hashes. Given the possibility of SHA-1 collisions, the commit signature guarantees that you get the right commit, but not necessarily the right file contents. Of course, for this attack to work someone would have to get their obviously artificial collision-prone file included in the signed commit in the first place, so that they could later substitute the malicious version. This is not a practical means of attack for source code repositories where commits are subject to even cursory peer review. There might be some justification for extra precautions when it comes to opaque binary files, such as firmware, which could be as simple as including the SHA-2 of the binary file as part of the commit and verifying it during the build.
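The "include the SHA-2 of the binary and verify it during the build" precaution suggested above could look something like this sketch. The function names and the idea of keeping the expected digest as a hex string are my assumptions; how it would be wired into a real build is left open.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_binary(path, expected_hex):
    """Return True only if the file on disk matches the recorded SHA-256."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A build step would call `verify_binary("firmware.bin", recorded_digest)` (both names hypothetical) and refuse to continue on `False`. The recorded digest lives in a reviewed text file inside the signed commit, so swapping the blob for a SHA-1 twin would be caught by the SHA-256 mismatch.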
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
We use ROT-13 so our data is 10 orders of magnitude safer than yours!
We even use it twice to double the safety!
It looks like the first NIST-validated SHA-256 open source implementations didn't start appearing until 2005, so using SHA-1 makes some sense.
http://csrc.nist.gov/groups/ST...
Claiming an argument is a logical fallacy by appeal to authority does not *make* an argument, it (possibly) refutes one. You can not *make* an argument with it any more than an eraser can make a sentence.
On the other hand, invalidating a poor attempt to refute a real argument is in itself supporting the original argument. Thus I was making the argument. In the end the result of the last 3 posts is clearly that Linus has a point that should be considered.
One thing everyone can agree on, at least, is that you have contributed absolutely ZERO to the argument either way.
I don't see what is so hard to understand...
Ok, good idea, I did. Commonly accepted stats are approx 65-80% of servers now run Linux, 20-35% run Windows, and FreeBSD comes in at a magical rounding error of 1-2%.
Workhorse, indeed.
You know what, I think we are basically in agreement.
Stupid /. filter showed your response as being to mine when it was in fact a reply to the AC after mine. I am assuming your comment "you have yet to provide even a weak argument to the contrary." was not actually directed at my post, and if not, I apologize.
No worries. I'm still not impressed with /. since the days of beta.
All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.