Slashdot Mirror


Apache Subversion Fails SHA-1 Collision Test, Exploit Moves Into The Wild (arstechnica.com)

WebKit's bug-tracker now includes a comment from Friday noting "the bots all are red" on their git-svn mirror site, reporting an error message about a checksum mismatch for shattered-2.pdf. "In some cases, due to the corruption, further commits are blocked," reports the official "Shattered" web site. Slashdot reader Artem Tashkinov explains its significance: A WebKit developer who tried to upload "bad" PDF files generated from the first successful SHA-1 attack broke WebKit's SVN repository because Subversion uses SHA-1 hash to differentiate commits. The reason to upload the files was to create a test for checking cache poisoning in WebKit.

In related news, working from the incomplete, theoretical description of the SHA-1 collision attack that Google published just two days ago, people have managed to recreate the attack in practice. You can now download a Python script that takes your input PDF and creates a new PDF file with the same SHA-1 hash sum. The attack is also implemented as a website that prepares two PDF files with different JPEG images and the same hash sum.

167 comments

  1. It's fine, Linus said so by Anonymous Coward · · Score: 0

    Linus isn't afraid, why should you be?

    1. Re:It's fine, Linus said so by K.+S.+Kyosuke · · Score: 0

      Does the Git usage of SHA-1 *really* cause silent problems? I'm not sure how Git works internally but I was under the impression that it hashes whole objects, like individual source files at least. I imagine you'd have to come up not just with an edit that generates a collision for a file, but with an edit that generates a collision for a file *and all its possible further edits*, otherwise something visibly breaks. The latter sounds much more problematic.

      --
      Ezekiel 23:20
    2. Re:It's fine, Linus said so by Lisandro · · Score: 1

      Not really: http://marc.info/?l=git&m=1156... .

      Git hashes objects (commits, trees, blobs, tags) instead of individual tags. If you somehow managed to create, say, a commit with the same SHA1 as another one already existing in a repository, pushes of it would simply be ignored.
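For readers wondering what "hashes objects" means in practice, here is a minimal sketch of how a blob's object ID is derived. Git prepends a short header (`blob <size>` followed by a NUL byte) to the content before applying SHA-1, so the ID is not the bare SHA-1 of the file. The function name `git_blob_id` is made up for this illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes,
    # not the bare file contents.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    data = b""
    print("plain sha1:", hashlib.sha1(data).hexdigest())
    print("git blob  :", git_blob_id(data))
```

One consequence is that two files whose raw SHA-1 digests collide need not end up with the same blob ID, because the header changes the hashed input.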

    3. Re:It's fine, Linus said so by Lisandro · · Score: 1

      ...instead of individual files...

    4. Re:It's fine, Linus said so by Anonymous Coward · · Score: 1

      Git and SVN work very, very differently under the hood. The fact that they rely on the same hash algorithm is irrelevant as they use it in very different ways.

  2. sha1 by Anonymous Coward · · Score: 0

    why are ppl using this shit any 1 with any intelligence should be fired they should be fired

    1. Re:sha1 by jellomizer · · Score: 1

      Because it was good once. Better than MD5. Changing it can break a lot of compatibility. So they don't change it.
      If you are keeping software running over a long time, you need to balance compatibility, security and maintainable design. Otherwise such projects will take decades to develop and be out of date on release.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  3. In other news by Anonymous Coward · · Score: 1

    WebKit is apparently on an SVN repository.

    1. Re:In other news by Anonymous Coward · · Score: 0

      Count the WTFs:

      SVN crashes badly with files that have the same sha1 hash

      The WebKit team are still using SVN

      Their solution was to zip the offending files so that they can continue using SVN

    2. Re: In other news by Cesare+Ferrari · · Score: 3, Insightful

      Actually, svn is just about the perfect source control system if you want something quick and dirty that you can understand. git (I presume that is what you would propose as an alternative) adds no features to many small software development teams. Fortunately the svn->git migration path is well trodden.

      If you had mentioned cvs, or rcs, then i'd agree ;-)

    3. Re: In other news by Anonymous Coward · · Score: 0

      If you had mentioned cvs, or rcs, then i'd agree ;-)

      OpenBSD, arguably the most secure operating system in existence, uses CVS.

      http://cvsweb.openbsd.org

    4. Re: In other news by Anonymous Coward · · Score: 0

      That's "sanitation engineer"

      Please remember to be PC so no one gets offended.

    5. Re: In other news by Anonymous Coward · · Score: 0

      Get off your high horse...

    6. Re: In other news by Anonymous Coward · · Score: 0

      No, incompetence is believing that new shiny is the only solution. There is nothing wrong with SVN unless you use a highly distributed development environment.

    7. Re: In other news by Anonymous Coward · · Score: 0

      If we're comparing the best work done by developers who don't use git to whatever you've accomplished, I know which one I'd bet on.

    8. Re: In other news by Puff_Of_Hot_Air · · Score: 4, Informative

      The problem with git, and I don't see it as a major problem, is not that it's hard to get up and running, but rather that you can quite easily get into the kind of trouble that needs expert knowledge to get out of. If you don't happen to have a git expert handy, well, you are going to have a very bad day. In this, git shares the same problem as most very powerful tools, for example C. So I agree with the original poster: if you need the facilities of git, then you'd be an idiot to not use it, but if you don't, then other tools are better for simpler use cases. Much like using something like python makes a huge amount of sense for certain applications, and zero sense for writing kernels.

    9. Re: In other news by Zero__Kelvin · · Score: 0

      There is nothing wrong with a tricycle for commuting to work either, but given that they are both free I'll take the Benz instead. Thanks!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    10. Re: In other news by Zero__Kelvin · · Score: 0

      I have been using git for almost ten years. I keep hearing about all these possible "hairy edge cases" but I haven't seen one yet. How fucking hard is it? Clone a repo, change a file, stage it for commit, and then commit it. How fucking stupid are these morons that they can't do that?

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    11. Re: In other news by Anonymous Coward · · Score: 1

      Completely agree - if you are just starting to use git, and you don't have a git rock star handy (guy next cubicle over that noisily snorts Cheetos all morning), and you don't have access to Stack Overflow, just quit now.

      There's a reason this SA answer has been upvoted 13,182 times.

    12. Re: In other news by Anonymous Coward · · Score: 0

      Actually, svn is just about the perfect source control system if you want something quick and dirty that you can understand. git (I presume that is what you would propose as an alternative) adds no features to many small software development teams. Fortunately the svn->git migration path is well trodden.

      If you had mentioned cvs, or rcs, then i'd agree ;-)

      Seriously? How dumb do you have to be to not get:
      git init
      git add
      git commit
      git push

      It's nearly identical to svn. Once you get into a more complex project with multiple branches, how do you not understand the difference between an svn branch and a git branch? Or merging?

    13. Re: In other news by Lisandro · · Score: 1

      I've been using git on and off for a while and, honestly, it is the most developer-friendly SCM out there. Which kind of problems do you refer to?

      My main pet peeve with git is that it really doesn't work well with big repositories, large number of users, or binary files. Other than that it is a joy to work with.

    14. Re: In other news by Lisandro · · Score: 1

      Seriously? How hard is it to type git revert?

    15. Re:In other news by arglebargle_xiv · · Score: 1

      And that's the problem, that SVN has crap handling of colliding values. They use their own homebrew NoSQL store, FSFS, which doesn't handle things like duplicates in any way because it's NoSQL and web scale and stuff, like MongoDB. So the message here is "don't build your app around a crap NoSQL database store", not "SHA-1 will kill you".
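To make "handling of colliding values" concrete, here is a minimal sketch of a deduplicating store that compares the actual bytes whenever a key is already present, and refuses the write if they differ. This is not Subversion's or FSFS's code. The class `DedupStore`, the exception `HashCollision`, and the deliberately weak `weak_digest` (used only to force a collision cheaply) are all hypothetical.

```python
import hashlib
from typing import Callable, Dict


class HashCollision(Exception):
    """Two different byte strings mapped to the same key."""


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def weak_digest(data: bytes) -> str:
    # Deliberately weak stand-in (one hex digit of SHA-1) so that a
    # collision is easy to trigger in a demonstration.
    return sha1_hex(data)[:1]


class DedupStore:
    """Toy content-addressed store that verifies content on every key hit."""

    def __init__(self, digest: Callable[[bytes], str] = sha1_hex) -> None:
        self._digest = digest
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self._digest(data)
        existing = self._blobs.get(key)
        if existing is None:
            self._blobs[key] = data  # first time this key is seen
        elif existing != data:
            # Same key, different bytes: refuse instead of silently sharing.
            raise HashCollision(key)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


if __name__ == "__main__":
    store = DedupStore(weak_digest)
    try:
        for i in range(17):  # 17 inputs, 16 possible keys: one must collide
            store.put(str(i).encode())
    except HashCollision as exc:
        print("collision detected on key", exc)
```

The byte comparison costs extra I/O on every duplicate, which is the trade-off a real store would have to weigh.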

      Anyway, I've gotta get back to my job shovelling pig shit, and administering anal suppositories to sick horses.

    16. Re: In other news by Puff_Of_Hot_Air · · Score: 1

      A quick google of "problems with Git" will quickly reveal the various challenges that git brings to the table, for example git push --force. More generally, any team using git needs to decide on a workflow and carefully adhere to it. How do we manage merge workflows? To rebase or not to rebase? etc. With traditional source control, this is significantly easier.

      I'm not anti-git, far from it. I introduced Git into the company I work for and love it; and it is absolutely the best source control system for distributed teams that exists. If you have distributed teams, it's an absolute must. But if you can't see that it's more complex, then you obviously haven't had the wonderful experience of having to field complaining from 30+ developers and having to fix the amazingly inventive ways in which they have managed to screw things up.

    17. Re: In other news by brantondaveperson · · Score: 4, Informative

      I love arguing about git.

      SVN has several huge advantages over git. It's far simpler. It doesn't have a thing called 'rebase', which rewrites your commits and occasionally messes them up. Its revision numbers are actually in order, which means you always know which revision came first, given two of them, something that's impossible with git's hashes (YES - I know why the hashes are used... but the reality of 99% of software development is that the repository is centralised, so git's solving an almost non-existent problem here). SVN supports real cherry-picking, and actually records in the repo that you took code from somewhere, as opposed to git's cut-n-paste approach.

      SVN has branches, git has pointers into a tree. Thus in git, it is impossible after the fact to determine to which branch a change was committed, just in which branches it now currently resides. Branches don't really exist in git at all, they aren't versioned (who created a branch, and when?), and if you accidentally delete them you tend to lose the commits against them. Tags in git are even worse. Added to which is the fact that both are implemented in the filesystem as regular files, which means you're at the mercy of your filesystem's ideas regarding case and permitted characters, and good luck if someone tries to check it out into a filesystem with different ideas. Nice design decision there Linus, guess you were having an off day? And the line-endings stuff?... Oh. My. God.
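The "pointers into a tree" point can be illustrated with a toy model. A commit records its parents but not the branch it was made on, so the only question that can be answered afterwards is which branch tips can reach it (what `git branch --contains` reports). The commit IDs and branch names below are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical toy history: commit id -> list of parent ids.
PARENTS: Dict[str, List[str]] = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],        # made while working on "feature"
    "d4": ["b2"],        # made while working on "master"
    "e5": ["d4", "c3"],  # merge of feature into master
}

# A branch is only a name pointing at one commit.
BRANCHES: Dict[str, str] = {"master": "e5", "feature": "c3"}


def reachable(tip: str) -> Set[str]:
    """All commits reachable from tip by following parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def branches_containing(commit: str) -> List[str]:
    """Names of branches whose tip can reach the given commit."""
    return sorted(n for n, tip in BRANCHES.items() if commit in reachable(tip))


if __name__ == "__main__":
    print(branches_containing("c3"))
    print(branches_containing("d4"))
```

After the merge, "c3" is contained in both branches, and nothing in the data says it started life on "feature".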

    18. Re: In other news by brantondaveperson · · Score: 1

      Git breaks. And when it does, good luck. Sometimes it breaks in the middle of a pull, and leaves the opposite of the repo's changes staged for you to accidentally commit. It can crap out when you're stashing, with the same effect. The bloody thing is a usability nightmare. And those hashes? Seriously... why couldn't we have just had a revision number? We're all committing to the same server.

    19. Re: In other news by Lisandro · · Score: 1

      But I don't quite get it yet. git push --force is not supposed to be a straightforward, or even common operation, as it can destroy history. And selecting/enforcing a branching scheme is a problem you'll run into with every other SCM in existence.

      I'll admit that git gives you enough tools to shoot yourself in both feet if you're willing to, but it also provides very straightforward, easy to use commands for everyday operations. Anyone proficient in SVN can pick up git in 20 minutes.

    20. Re: In other news by Anonymous Coward · · Score: 1

      Tags in git are even worse. Added to which is the fact that both are implemented in the filesystem as regular files, which means you're at the mercy of your filesystem's ideas regarding case and permitted characters, and good luck if someone tries to check it out into a filesystem with different ideas. Nice design decision there Linus, guess you were having an off day?

      You don't get that complaint.

      It is well known that Linus created git to manage the linux kernel - a work which is done entirely on linux filesystems where the rules always are the same: case sensitive, and any character except '/' and '\0' works in file names.

      If you use git for something else - well, the fact that you are even able to is just a side effect. It is a unix scm; if you use git in another environment then surely everybody uses names that fit the filesystem used there? If you use it in a mixed environment, you get to set the rules for names. If your workers break the rules, they break the repository, their problem. If you don't like that, create a wrapper that does whatever syntax checking you find useful. It is open source, you can fix stuff yourself (or complain and get your "money back"). Or hire someone to fix it for you - it'll still cost less than any commercial alternative.
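As an example of the kind of wrapper check suggested above, here is a minimal sketch that flags branch or tag names which differ only by letter case and could therefore clash on a case-insensitive filesystem. The function name `case_collisions` is made up, and using `str.casefold()` as a stand-in for a filesystem's case-insensitivity rules is an assumption, since real filesystems differ in the details.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def case_collisions(ref_names: Iterable[str]) -> List[List[str]]:
    """Group names that become identical once letter case is ignored."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in ref_names:
        # casefold() approximates a case-insensitive comparison.
        groups[name.casefold()].add(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


if __name__ == "__main__":
    refs = ["master", "feature/login", "Feature/Login", "release-1.0"]
    for clash in case_collisions(refs):
        print("would clash on a case-insensitive filesystem:", clash)
```

A check like this could run in a hook or in CI before names reach the shared repository.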

    21. Re: In other news by Anonymous Coward · · Score: 0

      Tags in git are even worse. Added to which is the fact that both are implemented in the filesystem as regular files, which means you're at the mercy of your filesystem's ideas regarding case and permitted characters, and good luck if someone tries to check it out into a filesystem with different ideas. Nice design decision there Linus, guess you were having an off day? And the line-endings stuff?... Oh. My. God.

      Linus Torvalds designed both Linux and Git according to what HE thinks makes sense and how HE thinks things should work, with his basic philosophy of "this is how I think it should be done and if you disagree, fuck off and use something else".

      That's OK for something that is only used by yourself or a few people, but once it starts getting widespread usage, and the person in charge is an arrogant asshat with an ego the size of Jupiter, you start to run into problems.

    22. Re: In other news by Anonymous Coward · · Score: 0

      Except that Linus is no longer in charge of git and hasn't been for more than a decade. The person maintaining git last I heard was Junio Hamano.

    23. Re: In other news by Anonymous Coward · · Score: 0

      Because you are the only one who uses it, in your pet project without a remote repo.

    24. Re: In other news by multi+io · · Score: 2

      SVN has several huge advantages over git. It's far simpler.

      Explain in one or two sentences what a "tree conflict" is and how to resolve it.

    25. Re: In other news by Anonymous Coward · · Score: 2, Interesting

      I take it you've never seen a team that for some reason merged current files on a pull to effectively revert commits pushed to the remote with their few intended changes, over and over and over, across several people, for days, before noticing that there was a problem. You will bang your head on a desk shouting "why didn't they notice there was a problem..."

      I can't believe though that question was upvoted 13,182 times. According to this SA "most upvoted" query, it's currently the second most upvoted question they have. There are a *lot* of questions on SA. Sorry, but there is something wrong when something that should be so obvious is a problem that is so commonly faced by so many people when trying to use a "common" and "most fit for the purpose" tool.

      Unbelievably, the third most upvoted question is also related to git. Five of the results in that query are related to git. That's 25% of the top 20 upvoted questions on SA. No questions in the top 20 are related to subversion. I believe that is valid circumstantial evidence that git is difficult to use and understand, especially the concepts related to it. That doesn't mean it's bad, but if you're working with a team that doesn't have much SCM experience (thankfully that's becoming less of an issue quickly), and you don't have time to commit to everyone on the team being able to pick up the nuances of git for a project, SVN is a great way to go as it's extremely simple, mostly due to significantly less feature coverage. Though, $deity help you if you aren't sitting next door to the server.

    26. Re: In other news by Anonymous Coward · · Score: 0

      Yes. Everyone who uses git is absolutely small time. For really big projects it could never work! ( And yes, you are a complete fucking moron )

    27. Re: In other news by Anonymous Coward · · Score: 1

      Clone a repo, change a file, stage it for commit, and then commit it. How fucking stupid are these morons that they can't do that?

      If that is all you are doing with git, then you might as well stick with svn.

    28. Re: In other news by Anonymous Coward · · Score: 1

      Seriously... why couldn't we have just had a revision number? We're all committing to the same server.

      No, we're not. That is kind of the point of git.

    29. Re: In other news by Anonymous Coward · · Score: 0

      What idiot modded this down. A call to arms ... Fight the moron mods ... Upvote the parent!

    30. Re: In other news by Anonymous Coward · · Score: 0

      That is a fucking ridiculously absurd claim. All I'm doing with git is using it as a SCM, albeit one that works, unlike SVN.

    31. Re: In other news by Anonymous Coward · · Score: 1

      git init
      git add
      git commit
      git push

      More like:

      $ git init
      $ git add
      $ git commit
      $ git push
      error: no remote configured
      $ git remote add origin ...
      $ git push
      error: failed to push some refs to...
      To prevent you from losing history, non-fast-forward updates were rejected
      $ git pull
      You asked me to pull without telling me which branch you
      want to merge with, and 'branch.master.merge' in
      your configuration file does not tell me, either.
      $ git pull master
      already up to date
      $ git push master
      error: failed to push some refs to...
      To prevent you from losing history, non-fast-forward updates were rejected
      $ git pull master
      already up to date
      $ git push
      error: failed to push some refs to...
      To prevent you from losing history, non-fast-forward updates were rejected
      $ #GOD DAMNIT, WHAT THE FUCK
      $ git pull origin master
      already up to date
      $ git push
      error: failed to push some refs to...
      To prevent you from losing history, non-fast-forward updates were rejected
      $ git pull --rebase
      You asked me to pull without telling me which branch you
      want to merge with, and 'branch.master.merge' in
      your configuration file does not tell me, either.
      $ #YOU PIECE OF SHIT
      $ git pull --rebase origin master
      $ git push origin master
      3a8ce4...a91f4e master -> master

      git is a god damn mess, the svn workflow has fewer steps and is more logical. The only benefit git offers over svn is meaningful diffing of binary files, and I'm going to hazard a guess that most developers will never use that capability.

    32. Re: In other news by Anonymous Coward · · Score: 0

      Maybe if you actually discuss those reasons instead of making the same empty posts a dozen times over, you won't look like some clueless fanboy. One or two quality posts goes a lot further, unless you just miss the days of contentless arguing about VI vs. emacs.

    33. Re: In other news by Anonymous Coward · · Score: 0
    34. Re: In other news by jopsen · · Score: 1

      SVN has several huge advantages over git. It's far simpler.

      Explain in one or two sentences what a "tree conflict" is and how to resolve it.

      Create new svn checkout and copy/paste over your changes.

      That was "easy", hehe, just kidding I would use git over svn any day :)

    35. Re:In other news by Anonymous Coward · · Score: 0

      Webkit doesn't use the piece of shit git repository.

      Fixed for you.

    36. Re: In other news by Anonymous Coward · · Score: 0

      Or I can ignore the rambling of a well known moron who designed the shittiest OS and shittiest source control system.

    37. Re: In other news by Anonymous Coward · · Score: 0

      That is a fucking ridiculously absurd claim. All I'm doing with git is using it as a SCM, albeit one that's a complete piece of shit, unlike SVN.

      Fixed that for you.

    38. Re: In other news by brantondaveperson · · Score: 1

      As I'm sure you must know perfectly well, it's a conflict caused by a change to a file conflicting at the tree level, so that one person modified the file, and another either deleted or moved it. Git gives a conflict in the first case - naturally - and may give a conflict in the second if its magic content-tracking algorithm fails (which it does, especially in non-trivial cases). You resolve it in the usual way, you inspect both sides, and figure out what to do.

      I know I'm on a losing wicket hating git like I do, but it's just so much less usable. I'd give a lot to go back to revision numbers, and really knowing what goes into branches. Git does checkout (sorry... clone) a lot faster though, so there's that. Can't help but think that could be fixed.

    39. Re: In other news by brantondaveperson · · Score: 1

      as it can destroy history

      Which should, really, be impossible for a client to a source control server.

    40. Re: In other news by complete+loony · · Score: 3, Funny

      You want a revision number? Simple;
      $ git rev-list HEAD | wc -l

      Assuming everyone is on the same branch of course....

      (Obligatory XKCD)

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    41. Re: In other news by Lisandro · · Score: 1

      It doesn't - unless you allow it, of course. You can destroy history on your local repository all you want but the upstream one will reject it unless specifically permitted.

    42. Re: In other news by Antique+Geekmeister · · Score: 1

      I've worked professionally with most source control systems for decades. I'm afraid to say that the only remaining features which Subversion does better than git are the ability to check out only one directory of an upstream repository, rather than needing to check out the entire repository, and the inability to delete content from the upstream repository.

      The ability to delete content is, from experience, a vital component, because developers can and will accidentally pollute the central repository with undesired content. This content ranges from bulky binary files and core dumps to security-sensitive content which they should not have submitted.

      And Subversion's centralized control comes at a real price. It makes forking and doing an independent set of work, with local commits, effectively impossible.

    43. Re: In other news by Xylantiel · · Score: 1

      My opinion is that git and svn have largely different purposes. The centralized/distributed one is the most obvious. But also git is a revision history manager, whereas svn is mostly a revision history. And I am convinced that simplicity is in the eye of the beholder.

    44. Re: In other news by nasch · · Score: 1

      Mercurial seems so much nicer than what I read about git. I'm glad my company decided to go that way.

    45. Re: In other news by brantondaveperson · · Score: 1

      81747

      Thanks complete_loony, I'll get that integrated into our workflow.

    46. Re: In other news by Anonymous Coward · · Score: 0

      Can also skip the pipe and have git do the counting:

      $ git rev-list --count HEAD
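The caveat about everyone being on the same branch can be shown with a toy history. The count that `git rev-list --count` reports is the number of commits reachable from a starting point, so two different commits on diverging lines of work can end up with the same number. The commit IDs below are invented, and `revision_count` is a hypothetical helper.

```python
from typing import Dict, List


def revision_count(parents: Dict[str, List[str]], tip: str) -> int:
    """Count the commits reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return len(seen)


# Hypothetical history: two developers each add one commit on top of "a1",
# and "m4" later merges both.
HISTORY: Dict[str, List[str]] = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["a1"],
    "m4": ["b2", "c3"],
}

if __name__ == "__main__":
    for tip in ("b2", "c3", "m4"):
        print(tip, revision_count(HISTORY, tip))
```

Here "b2" and "c3" get the same count, so the number only works as a revision number along a single line of history.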

    47. Re: In other news by phantomfive · · Score: 1

      Another way of looking at it: if you have a team with a single, centralized repository, why waste everyone's disk space by distributing to everyone?

      I like Git, and prefer its pleasant UI, but I can see there are definitely reasons people would use SVN. (I can even think of reasons a team would use Visual Source Safe, although that's more of a stretch).

      --
      "First they came for the slanderers and i said nothing."
    48. Re: In other news by Anonymous Coward · · Score: 0

      Lol. Thanks for this.

    49. Re:In other news by Anonymous Coward · · Score: 0

      You know, they do have the old backend of Berkeley DB as an alternative for the new FSFS.

      It's mostly unmaintained though.

    50. Re: In other news by Anonymous Coward · · Score: 0

      Its revision numbers are actually in order, which means you always know which revision came first, given two of them, something that's impossible with git's hashes

      And this is one of the reasons why I chose to install/learn Bazaar over Git.

    51. Re: In other news by Anonymous Coward · · Score: 2, Informative

      > I'd give a lot to go back to revision numbers...

      Why? All that matters is the logical ordering of commits, not the chronological ordering. I DGAF about when someone wrote a bit of code. All that I want is for each commit to more or less work, and for the history to be easily bisectable to aid in bug hunting.

      > Git gives a conflict in the first case - naturally - and may give a conflict in the second if its magic content-tracking algorithm fails...

      As an SVN -> Git convert, I've had git just Do The Right Thing(TM) when committing changes to a moved file more times than I can count. I lost _weeks_ of my life to doing that shit manually with SVN. I get that content tracking doesn't work 100% of the time, but it works infinity% of the time more often than it does in SVN.

      > Branches don't really exist in git at all...

      what?

      > they aren't versioned (who created a branch, and when?),

      wot?!

      If you look at this https://github.com/inaka/shotgun/compare/dave.151.update.deps.to.2.0.0.pre It's obvious that euenlopez@gmail.com created that branch on June 02, 2016. This one-liner makes it doubly clear that this isn't github trickery:

      git log `git log master..dave.151.update.deps.to.2.0.0.pre --oneline | tail -n 1 | awk '{ print $1 }'` | head -n 3

      (The git log invocation inside the backticks gives you the commits that dave.151... has that master doesn't, putting the "oldest" at the bottom. The tail/awk pipe scrapes out the commit ID of the "oldest" commit. That commit ID is fed to the outer git log, and the first three lines of that log are scraped off by head.)

      > ...and if you accidentally delete them you tend to lose the commits against them.

      a) If you delete a branch, you don't want the commits in it.

      b) But if you _did_ want those commits, git reflog saves you from your mistakes

      c) man git-reflog , for $DEITY's sake

      Don't get lost in the terminology of the man page. Just make a git repo, perform some changes to it (some small, some large) and play with git reflog.

      Every change you make to a local repo is stored locally, for 90 days -by default-. git reflog lets you sort through that and restore the pointers that you carelessly blew away.
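The idea behind recovering a deleted branch can be sketched with a toy model. Deleting a branch removes only the name, and a local journal of pointer moves still remembers where it last pointed. This is a loose analogy, not git's actual reflog format or expiry logic. The class `ToyRefs` and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ToyRefs:
    """Branch names plus an append-only journal of every pointer change."""

    def __init__(self) -> None:
        self.branches: Dict[str, str] = {}
        self.journal: List[Tuple[str, str, str]] = []

    def point(self, name: str, commit: str) -> None:
        """Create the branch or move it to another commit."""
        self.branches[name] = commit
        self.journal.append(("update", name, commit))

    def delete(self, name: str) -> None:
        """Remove the name; the commit it pointed at is not touched."""
        commit = self.branches.pop(name)
        self.journal.append(("delete", name, commit))

    def last_known(self, name: str) -> Optional[str]:
        """Most recent commit the journal recorded for this name, if any."""
        for _action, ref, commit in reversed(self.journal):
            if ref == name:
                return commit
        return None


if __name__ == "__main__":
    refs = ToyRefs()
    refs.point("topic", "c3")
    refs.point("topic", "d4")
    refs.delete("topic")
    # Recovery: look up the last known commit and point the name at it again.
    refs.point("topic", refs.last_known("topic"))
    print(refs.branches)
```

In real git the equivalent recovery is to find the commit ID in the reflog output and then run `git branch <name> <commit-id>`.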

    52. Re: In other news by Anonymous Coward · · Score: 0

      And don't forget about the great PTC Source...

    53. Re: In other news by Anonymous Coward · · Score: 0

      Holy shit ... You really are one dumb motherfucker.

    54. Re: In other news by Anonymous Coward · · Score: 0

      I chose GIT for my personal projects, because SVN (or CVS) would require hiring someone to set up a SVN (or CVS) server, or spending a lot of time learning to become that someone myself.

      GIT needs no server, and creating a repository on a system that has never used GIT requires installing GIT and then typing:

      git init

      That's all the knowledge required. After that you can start committing, which of course does take a bit of knowledge, but no more than any other version control system.

    55. Re: In other news by Anonymous Coward · · Score: 0

      It doesn't have a thing called 'rebase', which rewrites your commits and occasionally messes them up.

      If you need "a thing called rebase", git wins that one.
      If you don't need rebase, it doesn't matter. Git has about a million features I never use, and couldn't care less about.

      if you accidentally delete them you tend to lose the commits against them.

      No. If you accidentally delete them and then run "git gc", you lose those commits. I actually have a git repository with zero branches (it doesn't even have a "master" branch). Of course I can't commit to it (that's what branches are for), but I can checkout anything that was committed upstream.

      As long as you haven't run git gc, you should be able to recreate a deleted branch without data loss. As long as you can figure out the id of the last commit.

    56. Re: In other news by Anonymous Coward · · Score: 0

      Likewise. And I say this as someone who used to use git.

    57. Re: In other news by Anonymous Coward · · Score: 0

      If you look at this https://github.com/inaka/shotgun/compare/dave.151.update.deps.to.2.0.0.pre It's obvious that euenlopez@gmail.com created that branch on June 02, 2016.

      Actually, it's not. That's the commit. A branch can be created on any commit, and no new commits are needed to create a branch. I can create a branch from Linux-2.4.30 today, and not commit anything on it until December.

      The only record of me doing so will be the owner and modification date of the file. The modification date changes when I commit something on that branch, and the owner is local (as is a branch).

      Now, I don't see the problem, unless he is trying to use git as a centralized scm (it's not), a branch on my repository has been created by - me - and when doesn't really matter, what was committed matters.

    58. Re: In other news by Anonymous Coward · · Score: 0

      Oh, so typing "git reset" is what you need an expert for.

      Seriously... why couldn't we have just had a revision number? We're all committing to the same server.

      So, you are not using the tool for its intended purpose (in git, all commits are to the local repository), and you blame the tool for not being made for your specific (mis-) use.

    59. Re: In other news by Varcain · · Score: 1

      SVN has several huge advantages over git.

      Ok, let's see these advantages.

      It's far simpler.

      I get it, you don't like all this changing history, rebasing, amending, reflog stuff. But for the most basic git operations you need to know as many commands as for SVN. And with git you get the benefit of all the magic you can do on top of that. It's almost like saying that you prefer DOS to anything else because it's simpler - fewer commands, no pesky multitasking and you can do with it everything YOU need to.

      It doesn't have a thing called 'rebase', which rewrites your commits and occasionally messes them up.

      You conveniently ignore the fact that rebase is used almost exclusively in local branches and never on git upstream/production mirrors. The main assumption made by git is that your local history doesn't matter (why should it?). The only thing that matters is the history everyone else is using/depending on. To keep this history clean you use this evil rebase thing to apply your patches to the production/upstream branch without polluting the git log with unnecessary noise (I will mention later why this matters for the upstream repository even though rebase is not used there). This way the git user checks that patches apply cleanly and, if not, fixes the conflicts. Basically, rewriting commits is very useful for your local work, and when git "messes them up" it's actually you messing them up because you made a mistake during rebase. I use rebase a lot to squash commits from my local work, to edit commit messages or make any changes to the patch set if something was found by testers. With SVN you don't have this luxury at all. I don't see how this is any advantage for SVN.

      Its revision numbers are actually in order, which means you always know which revision came first, given two of them, something that's impossible with git's hashes

      I use git in my work, I also had to use SVN in a few projects. I *never* encountered any actual situation where git hashes instead of revision numbers were a problem. Any git user knows how to use the git hash properly (i.e. extract meaningful data from it). There is also a command in one of the replies to you here which can convert git hashes to revision numbers, but I never encountered a situation where this was actually needed by anyone, anywhere. I think the sheer inconvenience and all the drawbacks of SVN are not worth the pretty little nice looking revision numbers in the revision log.

      but the reality of 99% of software development is that the repository is centralised, so git's solving an almost non-existent problem here)

      It's actually solving (by accident) a very existing problem where you have to contact SVN server every time you want to see the commit log, create patch from commit or see what the commit actually did to files. Ever worked in a project where SVN was on some slow-ass customer server? I did, and it would be so much less painful with git.

      SVN supports real cherry-picking, and actually records in the repo that you took code from somewhere, as opposed to git's cut-n-paste approach.

      And how is that a problem? Why is relying on some magic revision control tool metadata to store such information any better? If you use the -x flag for your cherry-pick you basically have all the information you need to find the original commit. Run git commit --amend and you can add any additional information you like so it's even more clear. Hell, you can do that later to any commit in your huge patch set using git rebase -i.

      SVN has branches, git has pointers into a tree. Thus in git, it is impossible after the fact to determine to which branch a change was committed, just in which branches it now currently resides.

      You silently ignore the fact that with SVN you have to do this whole tree copy on the svn server to create a branch, which again is a huge PITA if

    60. Re: In other news by brantondaveperson · · Score: 1

      See? I told you arguing about git was fun.

      Fact: SVN stores more information about what's going on with your source code than git, and it never loses anything, even if you ask it really nicely. And it never magically changes files just because it decided that your line endings need to be just-so.

      You silently ignore the fact that with SVN you have to do this whole tree copy on the svn server to create a branch,

      I silently ignore that, because it's not true. SVN marks the point at which the copy was made, who made it, and when. It doesn't actually copy anything, because that would be silly. You can do the copy on the server, and switch your checkout to your new branch, just like you can with git. Except that it's all centralised (like 99% of real software development - the Linux kernel is actually an edge case), so your branch is safely on the server right away.

      Anyway, I don't expect to change anyone's mind - I just find it a bit of a shame that everyone has jumped on this tool, despite its extreme shortcomings and rampant complexity. I mean, preferring git's SHA-1 hashes to revision numbers is just kinda bonkers. They're not ordered. You need access to the repo to know if a particular commit is in a particular build, because the hashes mean nothing by themselves. Aargh. Etc.

    61. Re: In other news by brantondaveperson · · Score: 1

      when git "messes them up" it's actually you messing them up because you made a mistake during rebase

      No it's not. It's git crashing, and borking my local copy, thanks very much. And yes I have the latest version.

    62. Re:In other news by Anonymous Coward · · Score: 0

      I count: 3

    63. Re: In other news by x_t0ken_407 · · Score: 1

      Apparently (I haven't read the source code), the authors agree with you:

      The name "git" was given by Linus Torvalds when he wrote the very
      first version. He described the tool as "the stupid content tracker"
      and the name as (depending on your mood):

        - random three-letter combination that is pronounceable, and not
            actually used by any common UNIX command. The fact that it is a
            mispronunciation of "get" may or may not be relevant.
        - stupid. contemptible and despicable. simple. Take your pick from the
            dictionary of slang.
        - "global information tracker": you're in a good mood, and it actually
            works for you. Angels sing, and a light suddenly fills the room.
        - "g*dd*mn idiotic truckload of sh*t": when it breaks

    64. Re: In other news by 31eq · · Score: 1

      If you need rebase in Subversion, there is a Python script that can do it https://bitbucket.org/x31eq/om...

    65. Re: In other news by Anonymous Coward · · Score: 1

      I've been using Git for over 5 years on a dozen different platforms and never once had Git crash. Buy a new computer.

    66. Re: In other news by Anonymous Coward · · Score: 0

      Literally billions of people agree that you are a moron.

    67. Re: In other news by david_thornley · · Score: 1

      Linus Torvalds designed both Linux and Git according to what HE thinks makes sense and how HE thinks things should work, with his basic philosophy of "this is how I think it should be done and if you disagree, fuck off and use something else".

      And this would differ from any other open source/proprietary/free/closed source software in what way?

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    68. Re: In other news by complete+loony · · Score: 1

      In SVN a commit is final. This encourages developers to leave unfinished work in their work folder without creating a commit until they are "done". So you need a separate backup process for your work folder for any changes that take time to complete. Plus you often end up with a monolithic commit with a bunch of changes. Then how do you review those changes before pushing upstream?

      git rebase gives you a solution to this problem. Whenever I think I've made progress towards solving a problem I can create a commit. If I discover that one of those changes isn't right, I create a new commit with the fixup. Then when I'm "done" with the change, I can rebase in order to produce a series of patches that someone else can more easily review. At any time, if I encounter a bug fix that I want to push upstream, I can rebase my entire branch first to push the bug fix to the bottom, then push that commit without needing to create a new local branch.

      At any time I can use git to push my incomplete work to a private server or my own work branch on a team server. Both for backup purposes and for collaboration.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    69. Re: In other news by Anonymous Coward · · Score: 0

      > Actually, it's not. That's the commit. A branch can be created on any commit, and no new commits are needed to create a branch. I can create a branch from Linux-2.4.30 today, and not commit anything on it until December.

      A branch with no commits is indistinguishable from a tag. So such a "branch" is a tag.

      For branches that have at least one commit, if you go back one more commit, you can see the commit from which the branch was forked:

      Change

      git log `git log master..dave.151.update.deps.to.2.0.0.pre --oneline | tail -n 1 | awk '{ print $1 }'` | head -n 3
      to
      git log `git log master..dave.151.update.deps.to.2.0.0.pre --oneline | tail -n 1 | awk '{ print $1 }'`~1 | head -n 3

      and you get the commit from which a branch was forked. git show-branch is also informative.

      Like you, I'm having trouble figuring out why anyone would _actually_ care about _who_ created a branch pointer. Like you, AFAICT, the only thing that one cares about is commit authorship and commit contents. If one is concerned about repo access control, one can use things like gitolite or gerrit.

    70. Re: In other news by Anonymous Coward · · Score: 0

      Oh, here is a brainwashed git fanboy who represents the whole community of git users. Don't stop, please. Let normies see how typical git users participate in discussions.

    71. Re: In other news by Anonymous Coward · · Score: 0

      > And it never magically changes files just because it decided that your line endings need to be just-so.

      A) git doesn't do this unless you insert a commit hook to do it.

      B) http://stackoverflow.com/questions/11587806/is-there-anyway-to-get-tortoisesvn-to-leave-eol-line-endings-as-is (tl;dr: SVN does _exactly_ what you claim it doesn't.)

      Item B in that list was the bane of my goddamn existence and the source of a _really_ hard to track down bug when my division switched from VSS to SVN. (SVN "helpfully" converted the newlines on a file it "intelligently" determined was plaintext. This broke our software in subtle and _very_ hard to pin down ways.)

      > You need access to the repo to know if a particular commit is in a particular build, because the hashes mean nothing by themselves.

      The same's true with SVN. Thing is, with git, the repo is a drive seek away, rather than several network round-trips distant.

      > I mean, preferring git's SHA-1 hashes to revision numbers is just kinda bonkers. They're not ordered.

      SVN version numbers are repo-unique, not branch-unique. Activity elsewhere in a repo will cause discontinuities in the revision numbers in your current branch. You've always gotta check the history to know what commits are on a branch (even master). It's just a fact of life, regardless of whether you use SVN or git.

      And if we're talking quirks, don't get me started about how SVN lets one treat SVN's tags just like regular branches, because "tags" are just folders that have no special meaning to SVN. _That_ one was a pain to train some of the folks in our division on.

      And I'm with the AC that chalked up your repo corruption report to user error (and ignorance about things such as git rebase). I've been using git for six years now. I was using SVN for about just as long. I've _never_ had git break a repo. In contrast, I could grab a couple of friends and still not have enough appendages to count the number of times I've had SVN break either a WC or (a few times(!)) the server-side repo data.

      Busted-ass WCs (and the very occasional server-side repo corruption) were _really_ tough to explain to the division. Aint nobody what likes being told "You gotta sit for ten/fifteen minutes and redownload $SEVERAL GB, oh and by the way your last couple hours of work is lost, too.".

    72. Re: In other news by Anonymous Coward · · Score: 0

      So fun to argue about.

      You have at least one other person that agrees with you. SVN is simpler. Mercurial is simpler. git is powerful and hard to use. I use git and I don't think I'd switch back to SVN because I'm used to git. I paid the price to understand the implementation details. I never had to do that to be successful using SVN. There are things I can do with git that were hard or awkward with SVN. Mostly around collaboration and multi-tasking. I find it hilarious that most people use git in a way that exactly resembles the way you'd use SVN and they don't even know it. They get no benefit from the distributed nature of git.

    73. Re: In other news by Anonymous Coward · · Score: 0

      git revert is the wrong answer. git revert makes ANOTHER commit to undo the changes of the first commit. Why the hell do you want to preserve mistakes in the history? The accepted answer is more correct than git revert. This is git.

    74. Re: In other news by Lisandro · · Score: 1

      git revert is the wrong answer. git revert makes ANOTHER commit to undo the changes of the first commit. Why the hell do you want to preserve mistakes in the history? The accepted answer is more correct than git revert. This is git.

      The reversal SHOULD be a new commit. Undoing commits on a distributed SCMs means you're effectively deleting history.

    75. Re: In other news by Anonymous Coward · · Score: 0

      Took me about 5 minutes to understand git, 3 days to learn most of the commands that I need, and another week to get used to diffs and conflicts. git is just a Merkle tree. Don't think about git as a bunch of commands that do magic, think about what the tree looks like, what transformations and alterations you want to do to it, then figure out what commands it takes to make your desired changes.

      But really, 5 minutes. By the second day I was rebasing like a pro. The only thing that is an ongoing goal of improvement is clean commits and deciding which branching strategy to use.
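
      One rough way to picture the "Merkle tree" point: every object is named by the hash of its content, and a commit's content includes the names of its tree and its parent. The sketch below is a toy model only. The helpers `object_id`, `make_tree` and `make_commit` are hypothetical, and the serialization is made up for illustration rather than being git's real object format.

```python
import hashlib
from typing import Dict, Optional


def object_id(kind: str, payload: bytes) -> str:
    """Name an object by the SHA-1 of its kind plus payload (toy format)."""
    return hashlib.sha1(kind.encode() + b"\n" + payload).hexdigest()


def make_tree(files: Dict[str, bytes]) -> str:
    """Hash a listing of (file name, blob id) pairs, sorted by name."""
    lines = [f"{name} {object_id('blob', data)}" for name, data in sorted(files.items())]
    return object_id("tree", "\n".join(lines).encode())


def make_commit(tree_id: str, parent_id: Optional[str], message: str) -> str:
    """Hash the tree id, the parent commit id (if any) and the message."""
    body = f"tree {tree_id}\nparent {parent_id or ''}\n\n{message}"
    return object_id("commit", body.encode())


base = make_commit(make_tree({"a.txt": b"hello\n"}), None, "initial")
edit = make_commit(make_tree({"a.txt": b"hello, world\n"}), base, "edit a.txt")

# The same edit replayed on a different parent gets a different id,
# which is why a rebase produces new commits instead of moving old ones.
other_base = make_commit(make_tree({"a.txt": b"something else\n"}), None, "initial")
rebased = make_commit(make_tree({"a.txt": b"hello, world\n"}), other_base, "edit a.txt")

print(base, edit, rebased, sep="\n")
```

      In this model a change anywhere below a commit (a file, its tree, or any ancestor) changes every id above it, which is the property the tree-shaped mental model relies on.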

    76. Re: In other news by Anonymous Coward · · Score: 0

      Disable "force push" on master and tagged branches. You'll sleep much better. Force push will also break signed commits/tags. I make it a rule to only force push personal branches. Of course rules can be broken in extreme or very very low risk situations.

  4. FINALLY! by Gravis+Zero · · Score: 0

    It's now time to retire SVN... everywhere... permanently.

    --
    Anons need not reply. Questions end with a question mark.
    1. Re:FINALLY! by Anonymous Coward · · Score: 1

      If you don't like it, don't use it. Personally, I love it.

    2. Re:FINALLY! by espenskaufel · · Score: 3, Funny

      I do not understand why many developers feel so strongly about version control systems. I wonder if carpenters feel the same way about hammers or if developers are just way too opinionated...

    3. Re:FINALLY! by Anonymous Coward · · Score: 0

      Doing exactly what you want in an intuitive way is a basic function of any software. Alternatively if hammers and nails came in all kinds of wacky configurations, I'm sure people would have very strong favorites also.

    4. Re:FINALLY! by Lisandro · · Score: 2

      Pretty sure they do.

    5. Re:FINALLY! by espenskaufel · · Score: 1

      Since when is "Doing exactly what you want in an intuitive way" a basic function of any software? I thought that was the holy grail of software. I have still not used one source control system that I found hard to use, and in my experience git repos get messed up more often than others (might be because they are the most common). Some devs seem to have problems understanding remotes and rebase.

    6. Re:FINALLY! by espenskaufel · · Score: 1

      You are probably right :)

    7. Re:FINALLY! by thegarbz · · Score: 1

      I wonder if carpenters feel the same way about hammers

      Hahahah tip of the iceberg. I saw two carpenters on my house arguing about who had better screwdrivers. Yes people most definitely do.

    8. Re:FINALLY! by Anonymous Coward · · Score: 0

      Don't know about carpenters - but people feel strongly about so many things. Which brand is the "best car"? The best motorcycle? The best TV? The best smartphone? The best text editor? The best OS? The only true programming language? The best football team? Best scripting language?

      Surely, people who use version control systems a lot will have a one true system that is so much better that any alternative is just sad. All systems have the basics like 'commit', when you're a more advanced user jumping through hoops to use a different system, you run into dreadful cases where you just can't do what you want. And then 20+ hours to do what a couple of commands achieve in the other system. This sort of thing is usually avoidable by planning ahead - but then you're forced to work in an inefficient way all the time.

    9. Re:FINALLY! by Anonymous Coward · · Score: 0

      Apparently you haven't done much construction.

    10. Re:FINALLY! by phantomfive · · Score: 1

      I wonder if carpenters feel the same way about hammers or if developers are just way too opinionated...

      Yeah, typical carpenter hammer arguments:

      *) Hammer weight (usually 16-24oz for house framing)
      *) Handle type (wood? Fiberglass? (fiberglass hammers suck tbh))
      *) Is the face of the hammer smooth or textured?

      --
      "First they came for the slanderers and i said nothing."
  5. Re:Ug by Anonymous Coward · · Score: 0, Troll

    Then either turn in your nerd badge, or get a paracetamol and start educating yourself. Whatever you do: stop whining.

    This entire summary makes my heart sing: no Trump, no clickbait, but crypto, a broken algorithm, and funny side effects. Oh, and exploits.

  6. Who fucking cares by Anonymous Coward · · Score: 0

    We need to worry about important things... like Judge Wapner is dead.

  7. Here's what it means by JoshuaZ · · Score: 4, Informative

    Here's what it means: One major aspect of modern cryptography is "hash functions" - a hash function is a function which essentially has the property that in general two inputs with very small differences will give radically different outputs. Ideally, a hash function will also make it hard to detect "collisions", which are two inputs which have the same output. In general, hash schemes are used for a variety of different purposes, including determining if a file is what it claims to be (by checking that the file has the correct hash value).

    Every few years, an existing hash system gets broken and needs to be replaced. MD5 is an example of this; it was very popular and then got replaced.

    One of the major currently used hash schemes is SHA-1. However, a few days ago, a group from Google described an attack that allowed them to easily find collisions in SHA-1 (easy here is comparative - the amount of computational resources needed was still pretty high). The group released evidence that they could do so but didn't describe how they did so in detail. They gave an example of two files with a SHA-1 collision and they also described some of the theory behind their attack. What TFS is talking about is how, based on this, others have since managed to duplicate the attack and some have made even more efficient variants of it; so effectively this attack is now in the wild.
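
    The "small differences give radically different outputs" property can be seen directly with Python's standard `hashlib`. This is only an illustrative sketch; the two sample strings and the helper `bit_difference` are made up for the example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha1(b"The quick brown fox").digest()
d2 = hashlib.sha1(b"The quick brown fix").digest()  # one character changed

print(d1.hex())
print(d2.hex())
print(bit_difference(d1, d2), "of", len(d1) * 8, "bits differ")
```

    For a well-behaved hash you would expect roughly half of the 160 output bits to flip for any small change to the input.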

    1. Re:Here's what it means by Lisandro · · Score: 5, Informative

      FWIW, you're correct, but "hash function" encompasses much more than that. Technically, a CRC is, by definition, a hash function. So is bit parity.

      A cryptographic hash function has the properties you mention, plus the fact that it must not be easily reversible and uniformly distribute results over its entire output space.
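
      To make the distinction concrete, here is a small sketch that puts a parity bit, a CRC and SHA-1 side by side using only the Python standard library. The `parity` helper is a hypothetical name written for this example.

```python
import hashlib
import zlib


def parity(data: bytes) -> int:
    """A 1-bit 'hash': the XOR of every bit in the input."""
    acc = 0
    for byte in data:
        acc ^= byte
    return bin(acc).count("1") % 2


a, b = b"ab", b"ba"

# Parity ignores byte order entirely, so collisions are trivial to produce.
print(parity(a), parity(b))

# CRC-32 catches accidental corruption but was never meant to resist an attacker.
print(hex(zlib.crc32(a)), hex(zlib.crc32(b)))

# SHA-1 was designed so that collisions should be infeasible to find.
print(hashlib.sha1(a).hexdigest())
print(hashlib.sha1(b).hexdigest())
```

      All three map arbitrary input to a fixed-size value; only the last was designed with the cryptographic properties in mind.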

    2. Re:Here's what it means by Anne+Thwacks · · Score: 0

      OTOH a craptographic hash function is visibly lacking those features.

      This would appear to be the issue at stake (or maybe steak YMMV).

      --
      Sent from my ASR33 using ASCII
    3. Re:Here's what it means by Aighearach · · Score: 1

      Or, it means more generally that updates are bad, and true security will only come from removal of code thrash. We have to figure out what features we actually want, and implement them, and then stop changing those features.

      As long as everything is thrashing, everything is vulnerable. Protections will be temporary and new bugs will be introduced even into the protections because those too are always experiencing code thrash.

    4. Re:Here's what it means by complete+loony · · Score: 5, Informative

      Google produced two pdf's that differ in some binary data near the beginning of the file. The SHA-1 hash routine processes data one block at a time, updating its internal state. There are two consecutive blocks that differ between the pdf's. The first pair of blocks produce an internal state where half of the bytes are the same. The second pair of blocks then produce an identical state. The remainder of the pdf files is the same.

      So you can use these two pdf prefixes and append whatever data you want to them to produce your own pair of files. Pdf includes a programming language for rendering content. Within this language you can inspect the earlier bytes of the file to detect which version of the file you are rendering, and make some visual changes. So while there are only a few bytes that are different, you can make two pdfs that display different content.

      Nobody has invested the time to produce a new hash collision, but someone has already automated the production of duplicate pdf's based on this work.
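
      The reason an arbitrary common tail can be appended is that SHA-1 is computed incrementally: once two hashers are in the same internal state, feeding them the same remaining bytes keeps them in step. The sketch below does not contain the actual colliding blocks. It uses `hashlib`'s `copy()` to stand in for "two inputs that reached the same state", and the prefix and suffix bytes are placeholders.

```python
import hashlib

prefix = b"A" * 128            # stand-in for the blocks processed so far
suffix = b"rest of the file"   # identical tail appended to both files

state = hashlib.sha1(prefix)   # hasher after consuming the prefix
clone = state.copy()           # a second hasher in the identical state

state.update(suffix)
clone.update(suffix)

one_shot = hashlib.sha1(prefix + suffix).hexdigest()
print(state.hexdigest() == clone.hexdigest() == one_shot)
```

      With the published pair of prefixes, the two hashers would reach the same state from different bytes, and everything appended afterwards would hash identically in both files.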

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    5. Re:Here's what it means by 140Mandak262Jamuna · · Score: 1

      So as a first measure, if source control software add a "salt" at the top of pdf files being checked in, and strip it out when being checked out, this attack would not work. In fact a simple countermeasure could be to salt all files with a prefix block and a suffix block for the purpose of calculating SHA-1.

      --
      sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
    6. Re:Here's what it means by Anonymous Coward · · Score: 0

      Thanks for the proper summary. I was wondering who in their right mind would put up a webpage to allow users to use multiple CPU-years of processing time just for the gimmick of having two different PDFs with the same SHA-1 hash.

    7. Re:Here's what it means by brantondaveperson · · Score: 1

      There was a hash-collision proof-of-concept thing some years back that purported to have generated two HTML files that hashed to the same value, but displayed different content. In reality, the files were identical, they just contained javascript that displayed different things depending on the URL that the file was served from. A simple trick, but one that could easily defeat hashing strategies applied to things that contain actual executable code, rather than just - say - an image.

    8. Re:Here's what it means by Anonymous Coward · · Score: 1

      It means the exact opposite of this.

      An old(er) cryptographic hash is now unsafe to use as assumptions that the developers of subversion (and other software) made about it are now invalid, so you MUST update to a newer version to avoid unforeseen issues due to the ability to generate multiple inputs that hash to the same value.

    9. Re:Here's what it means by complete+loony · · Score: 5, Interesting

      This is why git is not vulnerable in this specific instance. In git all objects are prepended with their type, in this case "blob". Of course if you had $100k (-ish) to burn, you could repeat this attack on a file that does start with "blob" to break git.

      However you don't need to do this. This attack depends on reaching an intermediate state with specific properties in order to massively reduce the search space. Any attempt to hash a file that reaches one of these states can be detected and rejected. If you swap to using https://github.com/cr-marcstevens/sha1collisiondetection for all SHA-1 calculations, every instance of this attack can be detected and rejected.

      Also I mis-spoke slightly and spotted my error after checking the paper again. The first pair of blocks have half of the same bytes, but produce an internal state with only 6 bytes of differences. The second pair of blocks, again only differ in half of their bytes, and exactly cancel out those 6 bytes of differences. See Table One on page 3 for the actual byte values.
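
      For anyone curious what "prepended with their type" looks like in practice, here is a minimal sketch of how a git blob id is derived: the header is the word "blob", a space, the content length in decimal and a NUL byte, followed by the content. The function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 of the content prefixed with git's blob header."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


data = b"hello\n"
print(hashlib.sha1(data).hexdigest())  # plain SHA-1 of the bytes
print(git_blob_id(data))               # id git would give the same bytes as a blob
```

      Because the header shifts every later byte and feeds the length into the hash, two files whose plain SHA-1 digests collide would not be expected to collide as git blobs unless the collision was built with that header in mind.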

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    10. Re:Here's what it means by nasch · · Score: 1

      Wouldn't it be better to switch to SHA-512 or something?

    11. Re:Here's what it means by JoshuaZ · · Score: 1

      Yeah, that's a valid point. I also was a bit sloppy at other points; I wrote about detecting collisions when I should have said generating collisions. Thanks for expanding in a helpful fashion.

    12. Re:Here's what it means by Lisandro · · Score: 1

      Np. Sorry if I came across as pedantic - I thought the distinction was important because from reading other threads people don't really seem to understand what SHA1 is supposed to and not to do.

    13. Re: Here's what it means by Anonymous Coward · · Score: 0

      Switch to SHA256; as I understand it, there's not a lot of support for the higher variations yet.

    14. Re: Here's what it means by Anonymous Coward · · Score: 0

      SHA-2

      Problem solved.
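
      In application code, moving to a SHA-2 variant is often a one-word change if the algorithm name is a parameter. A minimal sketch with Python's `hashlib` follows; `stream_digest` is a hypothetical helper and the in-memory stream stands in for a real file. It says nothing about how hard the switch is inside Subversion or git themselves.

```python
import hashlib
import io


def stream_digest(stream, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


payload = b"some file contents\n"
for name in ("sha1", "sha256", "sha512"):
    digest = stream_digest(io.BytesIO(payload), name)
    print(name, hashlib.new(name).digest_size * 8, "bits", digest)
```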

  8. problem solved CAPTCHA: override by Anonymous Coward · · Score: 0

    1: Find the hackers.
    2: Send in the drones.

  9. "In the wild" - slight exaggeration by geekpowa · · Score: 2

    Someone checked in PDFs that demonstrate the first engineered SHA-1 collision and this broke SVN. PDFs in question took 6500+ cpu years + 110 GPU years to generate. "In the wild" is a bit panicky & excessive.

    What does this actually mean in terms of integrity of repos and other things that rely on SHA-1? Does it merely break repos or does it facilitate injection attack vectors - how important is secure hashing in the guts of repos? What precisely is being secured? SHA-1 has been deprecated for SSL certs already so you shouldn't be using certs with SHA1 sigs anymore. Myself, I'll keep an eye on how this develops and start thinking about using SHA-2, but won't be replacing git or existing usage of SHA1 for password hashing anytime soon.

    1. Re:"In the wild" - slight exaggeration by gravewax · · Score: 1

      In today's world of large botnets and distributed computing 6500+ cpu years + 110 GPU years is not a particularly daunting number.

    2. Re:"In the wild" - slight exaggeration by Anonymous Coward · · Score: 0

      RTFA:

      "...now you can download a Python script which can create a new PDF file with the same SHA-1 hashsum using your input PDF"

      Soon there will be bunches of different files with duplicate SHA-1 hashes flying around the 'net.

      That could get ugly.

      AC

    3. Re:"In the wild" - slight exaggeration by geekpowa · · Score: 1

      Umm, that is an uncited claim in the summary. Nothing of the sort is stated in any of the links. The summary links to a paper that provides more details of the attack. Very heavy and technical, though a few initial takeaways from it are that implementations only take a few days to run on the gear they have, so it does seem safe to assume that SHA-1 collisions are pretty much pwned.

    4. Re:"In the wild" - slight exaggeration by Anonymous Coward · · Score: 0

      It's *trivial* to generate similar pairs of PDF files by just appending identical data to original pair of PDF released by Google.
      No CPU-years are required for that - but such an attack is not very useful in practice (since you can only append *identical* data).

      But it is sufficient to exploit Denial-of-Service vulnerabilities such as this one.

    5. Re:"In the wild" - slight exaggeration by serviscope_minor · · Score: 2

      "In the wild" is a bit panicky & excessive.

      No, it's really not. This demonstrates that SHA-1 is not only weak, but broken. One golden rule about security is that it never improves over time. It means that collisions are now possible, and are within reach of moderate sized organisations. Google can clearly manage, governments certainly can and any criminal organisation with a large enough botnet can manage too. This isn't just finding random data either: it's a practical attack whereby two valid PDFs both hash to the same value.

      The security will get worse over time, just like it did for MD-5. With MD-5 it took less than 3 years for someone to go from creating two valid documents with the same hash (both PDF and PS support arbitrary data embedded for various purposes which makes them relatively easy targets) to a completely broken cryptographic certificate which broke the chain of trust entirely. Not only did it happen, but it took a scant 11 hours on a 30 node cluster, meaning practical attacks were in range of a single, not well funded individual, only 3 years after the first collision was found. With SHA-0, it took about a year and a half to go from the first collision to fast collisions.

      It's hard enough migrating things and old systems tend to hang around for years or even decades, so you should be planning your migration right now.

      That is not to say that SHA-1 is unsuitable for content identification with non malicious inputs, it's fine for that, but so is MD-5.

      --
      SJW n. One who posts facts.
    6. Re:"In the wild" - slight exaggeration by Lisandro · · Score: 1

      This. It is safe to say SHA-1 is effectively broken at this point and existing users should start migrating to better alternatives.

      But let's not panic either. The world is not crumbling down to pieces anytime soon.

    7. Re:"In the wild" - slight exaggeration by geekpowa · · Score: 1

      "not only weak, but broken" seems premature. The attack here involves manipulating two obtuse file formats to yield altered files with a shared hash, different to original unaltered hashes. Definitely weakened and yeah you are probably right this is the final toll for SHA-1 and from here things are likely to get worse quickly. I'll be mindful of this when I think about the various places where I use SHA-1 and start thinking about switching in other things. But I am failing to see how this right now translates into a practical vector for the various places where I encounter SHA-1. A more serious vector would be the capacity to create any desired hash with something significantly more efficient than a brute force compute, i.e. can anyone easily yield output the same as this without knowing the input?

      echo -n 'mysecretpw+somesalt'|sha1sum
      3cbb35f831b4e9241dd986f66c16e465e2db2a3a -

    8. Re:"In the wild" - slight exaggeration by Aighearach · · Score: 1

      Right, it is still just like Linus said about the git sha-1, not really a big deal because it isn't even the security layer.

      If developers with write access to your repo are malicious, you have much worse problems. This is not a serious threat, it is just an edge case that the future will prevent.

      The real lesson IMO is, if you do roll your own security, use a library for the password hashing. And if the algorithm ends up having been the wrong one, you'll just update the library. If it is on the network, use ssh or similar. Trust is bad, but that doesn't mean trusting yourself. It means to minimize the need for trust whenever possible. If you absolutely have to trust something, trust the normal generic Best Practice. Being able to look that up in the manual with all the noisy info glut might be non-trivial, though.
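
      As one possible reading of "use a library for the password hashing", here is a sketch built on PBKDF2 from Python's standard library. The choice of PBKDF2-HMAC-SHA256, the 16-byte salt, the iteration count and the helper names are all assumptions made for illustration, not a recommendation taken from the thread.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 200_000) -> Tuple[bytes, bytes]:
    """Derive a key from the password with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 200_000) -> bool:
    """Recompute the key and compare it in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected)


salt, stored = hash_password("mysecretpw")
print(verify_password("mysecretpw", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

      If the underlying algorithm ever needs replacing, only the inside of `hash_password` changes, which is the point being made above.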

    9. Re:"In the wild" - slight exaggeration by swillden · · Score: 2

      Umm, that is an uncited claim in the summary. Nothing of the sort is stated in any of the links. The summary links to a paper that provides more details of the attack. Very heavy and technical, though a few initial takeaways from it are that implementations only take a few days to run on the gear they have, so it does seem safe to assume that SHA-1 collisions are pretty much pwned.

      The Python script in question doesn't find new SHA-1 collisions. It takes two input PDFs and produces two output PDFs that hash to the same value. It uses some quirks of how PDFs work, plus that original SHAttered collision generated by the Google researchers. Finding another collision is a lot of work. Using a known collision to generate PDFs with the same hash value is not.

      https://github.com/nneonneo/sha1collider

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    10. Re:"In the wild" - slight exaggeration by guruevi · · Score: 2

      If someone checked in, that means they have permission to do so. It's not like Git just blindly accepts commits with the same hash but different contents. We know it's possible to create a collision, even with SHA256: as long as you're making a hash, you're mapping an infinite set of bit strings onto a finite set of bit strings, so there will always be a second input that collides with the first, regardless of the hash function you use.

      The fact that it's "easier" for a certain definition of "easy" doesn't mean the thing is broken, it just means people should be more careful when accepting particular hashes (eg. if you're using a cloned repo of whatever software you want to use) but even then, a bit-by-bit comparison can easily weed them out.

      As far as mainstream repos go: a) you would notice someone suddenly inserting a very oddly shaped document into your repos, b) that person would require permission to do so, and c) you should never automate a repo to pull in and compile something into production. Not sure if that's what happened here; the summary is very unclear as to what actually happened besides someone intentionally pushing a broken thing and it breaking other things.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    11. Re:"In the wild" - slight exaggeration by geekpowa · · Score: 1

      Thanks for the link!

    12. Re:"In the wild" - slight exaggeration by guruevi · · Score: 2

      Say it with me: Hashing is not Encryption. Hashing is not Encryption. Hashing is not Encryption.

      Very high level:
      Hashing is the irreversible mapping of a set of bits onto a (usually smaller) set of bits in order to obfuscate the original set of bits (one-way)
      Encryption is the mapping of a set of bits onto another equally sized set of bits where the mapping is reversible through some process (two-way)

      Hashing can be done with salts so that using rainbow tables is harder or impossible, but there will always be another set of bits that maps onto the same hash. It's good enough for hiding a password or for reducing the complexity of finding matches. If you were writing a file system you could use it to do things like de-duplication, but when two hashes match you should ideally still do a bit-by-bit check.
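
      A minimal sketch of that de-duplication idea, in Python. The class name `DedupStore` and its layout are made up for illustration and not taken from any real file system: the hash only nominates candidates, and a full byte comparison decides whether two inputs really are the same.

```python
import hashlib


class DedupStore:
    """Toy content store: the hash nominates candidates, the bytes decide."""

    def __init__(self):
        # hex digest -> list of distinct byte strings sharing that digest
        self._buckets = {}

    def put(self, data: bytes) -> tuple[str, int]:
        digest = hashlib.sha1(data).hexdigest()
        bucket = self._buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == data:       # full byte-for-byte comparison
                return digest, index   # true duplicate, store nothing new
        bucket.append(data)            # new content, or a genuine collision
        return digest, len(bucket) - 1


store = DedupStore()
first = store.put(b"hello")
second = store.put(b"hello")
third = store.put(b"world")
print(first == second, first == third)
```

      With this layout a colliding pair such as the two shattered PDFs would land in the same bucket as two separate entries instead of one silently replacing the other.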

      Calculating a collision with actual useful content (say, if I want to insert a "return 0" on a particular line somewhere in the Linux kernel) is still as hard, if not impossible, to do as before without also inserting a load of weird, binary comments. We just know that these collisions can now be calculated faster, but it's not like adding an arbitrary string will break the calculation and produce a predefined hash.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    13. Re:"In the wild" - slight exaggeration by Anonymous Coward · · Score: 0

      And we don't even need to do daunting calculations. A particular hash collision has been demonstrated. If you want to break somebody's svn, you just check in the published file and some time later its counterpart.

    14. Re:"In the wild" - slight exaggeration by Anonymous Coward · · Score: 0

      In today's world of large botnets and distributed computing 6500+ cpu years + 110 GPU years is not a particularly daunting number.

      Well, that's still 160 days of Folding@home (100 petaFLOPS, with 80k CPUs, and 27k GPUs), or 2.6 years of the entire BOINC project (17 petaFLOPS, with 670k computers).

      Taking into account these are probably mostly higher-end computers, owned by more advanced users, instead of entry-level supermarket computers owned by beginners, which constitute the largest part of botnets. Higher-end computers will overall be harder to penetrate, and their owners will be more likely to notice if it is working at full power, when they are not using it... (well, even beginners will notice if their low-end computer starts working at full-power, even just through fan noise... and while they are less likely to clean the infection, they definitely will switch-off their computer when not in use, if it makes too much noise... it's very different from just sending spam and participating in DDoSes...).

      Still, though, unless they've been very lucky to find a collision so soon, it can be assumed multiple governments (and thus their 'friends') can find collisions pretty easily if they wanted to, in a matter of a few weeks, or even less, if it's very important to them... (that's if they didn't break it more before, if not from the start...).

      High-value targets requiring a single collision to attack will also soon be at risk from some non-government criminals...

    15. Re:"In the wild" - slight exaggeration by 0ptix · · Score: 2

      To be fair, any pair of distinct inputs to SHA1 that hash to the same value is a new collision. In general, being given one collision for a hash function doesn't make it automatically easy to find another. It's only because SHA1 is an iterated hash function (Merkle-Damgård) that this becomes true. (Admittedly, almost all practical cryptographic hash functions are iterated constructions.)

      If SHA1(x0) = SHA1(x1) then for any z SHA1(x0||z) = SHA1(x1||z) (strictly, this needs x0 and x1 to have the same length and to end on a block boundary, so that the internal states match). I'm guessing the collision generated by the Google-CWI team is on a pair x0 and x1 where xb is the beginning of a pdf document that basically encodes "of the next two sections in this pdf file display section b". Given that, it's easy to extend them to any colliding pdf documents one wants.
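
      The extension property can be seen on a toy iterated hash. This is only a sketch under loud assumptions: `compress` is a made-up 16-bit compression function (truncated SHA-256) so that a collision can be brute-forced in a moment, and `toy_hash` merely mimics the Merkle-Damgård structure with length padding. It is not SHA-1 and says nothing about how the real collision was found.

```python
import hashlib
import struct

BLOCK = 16          # toy block size in bytes
IV = b"\x00\x00"    # toy 16-bit initial state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16-bit state in, 16-bit state out."""
    return hashlib.sha256(state + block).digest()[:2]


def toy_hash(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with SHA-1 style length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    padded += struct.pack(">Q", len(msg) * 8)
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def find_block_collision():
    """Birthday search for two distinct blocks that reach the same state."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1


x0, x1 = find_block_collision()
suffix = b"any shared suffix, e.g. the rest of a PDF"
print(x0 != x1, toy_hash(x0) == toy_hash(x1))
print(toy_hash(x0 + suffix) == toy_hash(x1 + suffix))
```

      Once the two prefixes leave the hash in the same internal state, everything appended afterwards is processed identically, which is why one expensive collision can be reused for many file pairs.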

    16. Re:"In the wild" - slight exaggeration by serviscope_minor · · Score: 1

      "not only weak, but broken" seems premature. The attack here involves manipulating two obtuse file formats to yield altered files with a shared hash, different to original unaltered hashes.

      It took less than 3 years for MD5 to go from "first collision" to "can fake certificate trust chains".

      But I am failing to see how this right now translates into a practical vector for the various places where I encounter SHA-1.

      But don't forget that the open literature discovered an as-yet-unknown attack against MD5 in an internet worm, one almost certainly written by a government organisation. In other words, the state of the art may well be a couple of years ahead of what's public.

      --
      SJW n. One who posts facts.
    17. Re:"In the wild" - slight exaggeration by nasch · · Score: 1

      Was somebody confusing hashing with encryption?

    18. Re:"In the wild" - slight exaggeration by tlhIngan · · Score: 1

      If developers with write access to your repo are malicious, you have much worse problems. This is not a serious threat, it is just an edge case that the future will prevent.

      What if they aren't malicious? I mean, WebKit SVN is down not because a developer wanted to try it, but because they were submitting a test case. A test case meant to verify that WebKit's caching algorithms aren't vulnerable to a SHA-1 collision.

      And in checking in this test case, he inadvertently broke the entire repository. It's completely possible he wasn't aware how SVN works internally that such a test case could break the repo as well.

      Right now there's a worry that the master repository is irreparable: that because of this checkin you cannot fix it in place, and the only way to recover is to restore it from a backup.

    19. Re:"In the wild" - slight exaggeration by Anonymous Coward · · Score: 0

      Even simpler. The collision is two "random" blocks of data, and the stuff that comes AFTER encodes "if this is block A, display this, but if it's block B, display that".

      As long as the PDF header before the blocks is the same, the rest of the PDF (the actual content) can be anything you like.

    20. Re:"In the wild" - slight exaggeration by Bob+the+Super+Hamste · · Score: 1

      It means that collisions are now possible, and are within reach of moderate sized organizations.

      This is the key. 6500 CPU years and 110 GPU years of computational power is not that difficult to achieve. When this news broke last week a few of us at my work had a discussion about it, and while those numbers sound impressive we then realized that at work we have access to probably 2x that processing power in our building.

      --
      Time to offend someone
    21. Re:"In the wild" - slight exaggeration by Aighearach · · Score: 1

      None of that has meaning or value.

      This doesn't crash anything, and a test case meant to do some shit that it doesn't do well doesn't cause a problem other than for that test case. There is no bad thing happening in your story, just somebody has some shitty code.

      Then you wave your hands and say, "he inadvertently broke the entire repository."

      There is no worry that repositories would, or even might, or even could, become irreparable. That's just making shit up wildly. The speculation in the stories was going into a much more detailed scenario that does involve a malicious actor. Misunderstanding the danger doesn't cause it to change.

  10. Is it sane to rely on hashes alone? by Anonymous Coward · · Score: 0

    Computing power is plenty, memory even more so. Why not use a very simple hash to detect "might be the same", but then do full comparison, instead of relying on the hash? Cryptographic hash or not - collisions can always happen. Even at low probability, murphy always wins.

  11. Real world consequences by Anonymous Coward · · Score: 0

    Yesterday Linus said we should ignore this. Today, Apache no longer runs and it is one of the foundations of the Internet. Way to go, Linus.

    1. Re:Real world consequences by Anonymous Coward · · Score: 0

      Linus was talking specifically about git, you moron. Of course there are consequences when a SHA-family hash gets broken - but you knew that already.

    2. Re: Real world consequences by Anonymous Coward · · Score: 0

      Would have been funnier if you were British and said "Linus was talking about git, you git."

    3. Re:Real world consequences by DonaId+Trump · · Score: 0

      Today, Apache no longer runs

      You must get your news from Breitbart. Excellent choice, believe me!

    4. Re: Real world consequences by Anonymous Coward · · Score: 0

      It's OK, Mr. President.... Apache is running better than ever now that Barron fixed the SHA-1.

  12. Reading the paper. What is in an exponent?? by TheNarrator · · Score: 1

    I am trying to read their paper on the sha1 collisions over here: https://shattered.io/static/sh... and there's some unusual equation stuff.

    mi = (mi3 mi8 mi14 mi16)1

    Can anyone explain that to me in english?

  13. Re:Reading the paper. What is in an exponent?? by TheNarrator · · Score: 1

    Ah dam. My unicode got munged by the slashdot anti garbage filter. Should have hit preview first!

    Anyway the symbol I was referencing is a circular arrow pointing in a clockwise direction that looks like the images on this page: https://en.wikipedia.org/wiki/... . I've never seen that in a paper. What does it mean when it's in an exponent?

  14. Re:Reading the paper. What is in an exponent?? by cryptizard · · Score: 3, Informative

    It is a bitwise rotation. The direction and number specify if it is a right or left rotation and then how many bits to rotate.
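
    For what it's worth, here is a small Python sketch of a 32-bit rotation and of the SHA-1 message expansion where it shows up (each new word is the XOR of the words 3, 8, 14 and 16 positions back, rotated left by one bit). The function names are mine, not the paper's.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits; bits falling off wrap around."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x: int, n: int) -> int:
    """Rotate right by n bits, expressed as a left rotation."""
    return rotl32(x, (32 - n) % 32)


def expand_message(block_words: list[int]) -> list[int]:
    """SHA-1 message schedule: 16 input words expanded to 80."""
    w = list(block_words)
    for i in range(16, 80):
        w.append(rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    return w


words = [0] * 16
words[13] = 1
schedule = expand_message(words)
print(hex(rotl32(0x80000000, 1)), len(schedule), schedule[16])
```

    Unlike a plain shift, a rotation loses no bits: the top bit of 0x80000000 reappears at the bottom.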

  15. SHA-1 in git and co by DrYak · · Score: 1

    A cryptographic hash function has the properties you mention, plus the fact that it must not be easily reversible and uniformly distribute results over its entire output space.

    The latter is a property which is not guaranteed by most common checksums.
    Thus, when you need a hash function to give a number to use as a handy "nickname" for a collection of data (e.g.: for a hash look-up table. Or for a content-addressable store like git to create said addresses for a given content - and thus to give a serial number to a commit. Or apparently also used in SVN to give a simple number to designate commits), it might be a good choice to pick a cryptographic hash like SHA-1 because it guarantees you this additional property, which a vanilla checksum could lack.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  16. Check for identity given equal hash by Anonymous Coward · · Score: 0

    The sensible thing for VCSs is to have a list of hashes, in order of preference (e.g. sha1, sha256, ...). Each time a commit is made where the hash has already been seen, the VCS has a file to compare to. If equal, no problem. If not equal, raise an alarm to the operator, and try the second hash on the list against both files (there will be only two, naturally). Each file in the VCS has at least one hash, probably more. In the event of a collision, use additional hashing functions until we can be sure whether the files differ. These collisions are rare, so allowing more computation when they turn up is not an issue.
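
    A rough Python sketch of that scheme. The `HASH_PREFERENCE` ordering, the `store_object` helper and the dict-based store are all hypothetical, and the collision is simulated by planting different bytes under a key, since I don't have a colliding pair to hand.

```python
import hashlib

HASH_PREFERENCE = ("sha1", "sha256", "sha512")  # hypothetical ordering


def store_object(store: dict, data: bytes, alarms: list) -> str:
    """Store data under the first hash in the list that does not collide."""
    for name in HASH_PREFERENCE:
        key = f"{name}:{hashlib.new(name, data).hexdigest()}"
        existing = store.get(key)
        if existing is None:
            store[key] = data      # hash not seen before: store it
            return key
        if existing == data:
            return key             # same hash, same bytes: no problem
        alarms.append(key)         # same hash, different bytes: alarm
    raise RuntimeError("collision under every configured hash")


store, alarms = {}, []
key = store_object(store, b"report v1", alarms)

# Simulate a SHA-1 collision: plant different bytes under the key
# that b"report v2" would normally receive.
planted = "sha1:" + hashlib.sha1(b"report v2").hexdigest()
store[planted] = b"colliding junk"
fallback = store_object(store, b"report v2", alarms)
print(key.split(":")[0], fallback.split(":")[0], len(alarms))
```

    One wrinkle this glosses over: anything that refers to objects by hash then has to know which hash a given object ended up under.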

  17. GIT by DrYak · · Score: 3, Interesting

    Does the Git usage of SHA-1 *really* cause silent problems? I'm not sure how Git works internally but I was under the impression that it hashes whole objects, like individual source files at least.

    The individual objects inside git aren't files.
    The individual objects are commits (i.e.: the content of a patchfile, and a few pieces of information like pointers to the past commits to which this patch applies).
    To make things easier, a handy number designates this commit - this is currently generated by SHA-1.

    (Git is a content-addressable platform. You don't access objects by name, you access them depending on their content. But instead of using the whole content to access them, you use addresses generated by SHA-1 to access the various blocks.
    So to say which are the parent commits to which the patch in a commit applies, you just mention them by using the SHA-1 sum of the content of these commits).
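
    To make "addresses generated by SHA-1" concrete, here is a short Python sketch. As far as I know Git computes an object's ID as the SHA-1 of a small header (type, a space, the body length, a NUL byte) followed by the body, and it does this for file contents ("blob" objects) as well as for commits. The helper name `git_object_id` is mine.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <length>' + NUL + body, as a hex string."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


empty_blob = git_object_id("blob", b"")
print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

    That value is the well-known ID of the empty blob, which `git hash-object` reports for an empty file. Because of the header, a pair of files that collide under plain SHA-1 does not automatically collide as Git blobs.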

    A theoretical attack would be:
    - try to generate 2 commits.
    one adds a clean piece of code. the other adds a backdoored piece of code.
    but both commits hash to the same SHA-1 so they would be considered as "the same content" by git.
    Then try to force your target to re-download the whole repo from scratch from your backdoored history (otherwise git will simply ignore the commits with sha-1 sum that it already has - it thinks that it has the same content already).

    In practice it's currently not doable.
    The only thing that google managed to generate is a pair of block series. Each series contain completely random junk. Both series end-up generating the exact same shasum even if the random junk is different.
    - That is exploitable in a PDF (or any other binary format that supports scripting. You could even do it in an EXE) : using the embedded scripting, present 2 different contents depending on which random junk is present.
    - That is not exploitable in a sourcecode commit : you would need a believable explanation for why the random junk is present in the patched source code.
    AND you would need a piece of code which reacts differently (normal vs. backdoor) depending on which random junk is present - to be able to pull that unnoticed would require "Underhanded C Contest"-level of ingenuity.

    That's it, you only have blocks of random garbage.
    Google currently can't produce colliding hashes from arbitrary pieces of data ("Hey google: here is legit script A, and that's malicious script B. Add a small nonce at the end so they both end up having the same sha-1sum") ("Actually don't add a nonce, that would be too conspicuous, try to tweak the punctuation in the comments instead")

    Also as you mention, further edits will be problematic :
    if I edit script A and submit a patch, this patch will be valid, but will completely fail on top of script B.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  18. Hash Functions 101 by FeelGood314 · · Score: 4, Informative

    A hash function takes an arbitrary string of bits and outputs a string of bits of a fixed length.
    A CRC is an example of a hash function and a long CRC would probably be good enough for GIT or most repositories.
    First Pre-image resistance - this is a test of the one-wayness of the function. Given a hash value, it is difficult to find a pre-image that hashes to that value. Given y, a string of bits as long as the hash output, finding X such that h(X) = y is hard. MD-5 and SHA-1 are still resilient against first pre-image attacks.
    Second Pre-image resistance - given a message X, finding a Y such that h(X)=h(Y) is difficult. MD-5 and SHA-1 are still resilient against second pre-image attacks.
    Collision resistant - It is hard to find two messages X and Y such that h(X) = h(Y). Note the attacker here is free to choose both X and Y. Both MD-5 and SHA-1 are no longer collision resistant.

    So far, however, the two messages X and Y have to be nearly identical. They have to start and end the same way, and the blocks that are changed actually have to be changed and tested together to make sure the hash function's internal state changes only in a specific way. I can't create a document that says the rent will be $3000 per month and another that says it will be $30000. (I might create one that says it is $3149.21 and the other $53210.63 per month, like in the PDF example where they played with a colour field.) Also, because of the way the internal state of the hash function changes, we now have a way of detecting if someone is feeding a "funny" stream of bits into our hash function, and can detect this attack with a very low probability of a false positive.
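
    To illustrate why collision resistance falls first, here is a Python sketch on a deliberately tiny hash. `h16` is a made-up 16-bit truncation of SHA-256 so that both searches finish instantly; it shows only the generic brute-force gap (roughly 2^(n/2) tries when the attacker picks both messages versus roughly 2^n when one is fixed), not the actual SHA-1 attack.

```python
import hashlib
from itertools import count


def h16(data: bytes) -> bytes:
    """SHA-256 truncated to 16 bits, so both searches finish instantly."""
    return hashlib.sha256(data).digest()[:2]


def collision_search():
    """Attacker picks both messages: stop at the first repeated output."""
    seen = {}
    for tries in count(1):
        msg = str(tries).encode()
        digest = h16(msg)
        if digest in seen:
            return seen[digest], msg, tries
        seen[digest] = msg


def second_preimage_search(target_msg: bytes):
    """Attacker must match one fixed message: expect about 2**16 tries."""
    target = h16(target_msg)
    for tries in count(1):
        msg = str(tries).encode()
        if msg != target_msg and h16(msg) == target:
            return msg, tries


a, b, collision_tries = collision_search()
c, preimage_tries = second_preimage_search(b"rent is $3000")
print(collision_tries, preimage_tries)
```

    On a 16-bit output the collision typically turns up after a few hundred tries while the second pre-image takes tens of thousands. Scale the same gap up to 160 bits and you get the generic 2^80 versus 2^160 picture, which the SHAttered work then undercuts on the collision side only.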

    1. Re:Hash Functions 101 by FeelGood314 · · Score: 2

      This still can be weaponized. Even if I only have two bit streams that start the same and then only differ in a block that I couldn't control I can still create malicious executables. Once I have the two streams that collide as long as the bits I add to both streams are identical the hashes will remain identical. I then have code after the differing block(s) that checks a value of a field in the differing blocks and behaves differently based on this value. I now have a good executable that is well behaved that I can submit to be signed by Microsoft or some other trusted company and a bad piece of software that has the same hash value. I take the valid signature from the good software and append it to the bad software and the signature remains valid.

    2. Re:Hash Functions 101 by Anonymous Coward · · Score: 0

      You might be able to do that with the rent - see https://alf.nu/SHA1

    3. Re:Hash Functions 101 by Anonymous Coward · · Score: 0

      Or you could use this for arbitrarily sized PDF files with multiple pages:
      https://github.com/nneonneo/sha1collider

    4. Re:Hash Functions 101 by Anonymous Coward · · Score: 0

      > I can't create a document that says the rent will be $3000 per month and another that says it will be $30000

      PDF allows you to do the equivalent of

      "The rent is $" + (if $flag then 3000 else 30000) + " per month".

      So you can effectively create two entirely different documents which differ by a single bit.

      Analysing the document structure would reveal the trick, but if you were trying to use it "for real", there are ways to make it far less obvious, e.g. a "compression" or "encryption" scheme where the documents' compressed/encrypted representations differ by a single bit.

      That would take less effort than finding your own collision (the researchers behind this one have already created a tool to detect files which rely upon this specific collision).

      If you were on the receiving end of such a bait-and-switch fraud, I suspect that you'd have trouble resolving it without spending upwards of $100k on lawyers and digital forensics.

  19. I have no faith in cryptography, because.. by Anonymous Coward · · Score: 0

    I have no faith in cryptography solutions these days, because my impression is that the industry and the government(s) just don't seem to care about providing people with security and privacy.

    It is as if the world's governments want to have a shitty internet to wage war on, and to spy on people.

    1. Re:I have no faith in cryptography, because.. by Bob+the+Super+Hamste · · Score: 1

      Considering that several years ago everyone was told to move away from SHA1, as it wasn't considered secure given the then-theoretical attacks, this shouldn't come as a surprise. NIST has been very open about the process as of late with the AES process and more recently the SHA3 process. Even though no known issues exist with the SHA2 suite of hashes, they were proactive in going forward with the SHA3 process because SHA2 is mathematically similar to SHA1, so it may be possible to have related attacks against the various SHA2 hashes. I would question anything that is just dropped wholesale from the government like the whole botched EC crypto, and then there were the long-standing questions about the DES S-boxes, which it turned out were strengthened against differential attacks, though no explanation was given as to why at the time. Even now the full set of parameters used for them hasn't been provided.

      --
      Time to offend someone
  20. the python script mentioned by Anonymous Coward · · Score: 0

    https://github.com/nneonneo/sha1collider/blob/master/collide.py

  21. True Tales of Slashdotters: Gravewax by Anonymous Coward · · Score: 0

    In today's world of large botnets and distributed computing 6500+ cpu years + 110 GPU years is not a particularly daunting number.

    He speculated as he ashed his blunt and stared bleary-eyed into a gumbo of Wikipedia tabs, a particle collider for only the least plausible threads of conspiratorial thinking, so that he might drop yet another dingleberry-dollop of wordshitting on slashdot DOT org.

  22. Re:about hash functions by Anonymous Coward · · Score: 0

    It seems obvious to me that a small string sequence could be identical for two different long original texts. Even if that happens, the hash is NOT the original message, and a collision could happen. It doesn't mean that the two original texts are the same.
    Am I right?

  23. Re:about hash functions by Lisandro · · Score: 1

    It seems obvious to me that a small string sequence could be identical for two different long original texts. Even if that happens, the hash is NOT the original message, and a collision could happen. It doesn't mean that the two original texts are the same.
    Am I right?

    Yes. A hash is nothing more than a function mapping data of arbitrary size to an output of fixed, smaller size so by definition you can always construct two inputs which yield the same hash. What makes crypto hashes secure is that this is normally very, very hard to do - that is, given a hash generate an input from it.

  24. Stop posting bullshit by Anonymous Coward · · Score: 0

    Aww, that's ugly:
    because Subversion uses SHA-1 hash to differentiate commits.

    Know why?

    Because SVN does not use SHA-1 to "differentiate" commits.