Slashdot Mirror


Making Sense of Revision-Control Systems

ChelleChelle writes "During the past half-decade there has been an explosion of creativity in revision-control software, complicating the task of determining which tool to use to track and manage the complexity of a project as it evolves. Today, leaders of teams are faced with a bewildering array of choices ranging from Subversion to the more popular Git and Mercurial. It is important to keep in mind that whether distributed or centralized, all revision-control systems come with a complicated set of trade-offs. Each tool emphasizes a distinct approach to working and collaboration, which in turn influences how the team works. This article outlines how to go about finding the best match between tool and team."

13 of 268 comments

  1. Errata by kabloom · · Score: 5, Informative

    Because Subversion offers working out of a shared branch as the path of least resistance, developers tend to do so blindly without understanding the risk they face. In fact, the risks are even subtler: suppose that Alice's changes do not textually conflict with Bob's; she will not be forced to check out Bob's changes before she commits, so she can commit her changes to the server unimpeded, resulting in a new tree state that no human has ever seen or tested.

    This statement is incorrect. Subversion requires you to update your working copy before committing whenever you have modified a file that has changed in the repository.
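
    A toy sketch of that rule in Python, assuming the check is made per file as stated above. This is not Subversion's implementation: ToyRepository and its fields are made-up names, and the model only tracks which revision last changed each path.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Hypothetical stand-in for the server: path -> revision that last changed it."""
    head: int = 0
    last_changed: dict = field(default_factory=dict)

    def commit(self, base_revisions: dict, modified: set) -> int:
        """Accept the commit only if no modified file is out of date."""
        stale = sorted(
            path for path in modified
            if self.last_changed.get(path, 0) > base_revisions.get(path, 0)
        )
        if stale:
            raise RuntimeError(f"out of date, update first: {stale}")
        self.head += 1
        for path in modified:
            self.last_changed[path] = self.head
        return self.head


repo = ToyRepository()
# Alice and Bob both checked out at revision 0 (assumed starting point).
alice_base = {"a.c": 0, "b.c": 0}
bob_base = {"a.c": 0, "b.c": 0}

bob_rev = repo.commit(bob_base, {"b.c"})        # Bob commits b.c first
alice_rev = repo.commit(alice_base, {"a.c"})    # different file: accepted
try:
    repo.commit(alice_base, {"b.c"})            # file Bob already changed: rejected
    rejected = False
except RuntimeError:
    rejected = True
```

    In this model the forced update only triggers when Alice touches a file Bob has already changed; a commit to a different file still goes through without an update.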

  2. No they don't. by SanityInAnarchy · · Score: 4, Informative

    Each tool emphasizes a distinct approach to working and collaboration, which in turn influences how the team works.

    Ok, yes, some tools do. For example, subversion supports trivial branching, but sucks at merging, so it encourages people to work on a common "trunk" branch. It also only supports a central server, so it "encourages" developing with a central server.

    Git, on the other hand, "encourages" people to not put multi-gigabyte files in version control.

    However, Git can be used to talk to an SVN repository. It can also talk to a central repository, or work purely via ssh between workstations, or with something like Gitjour, in a truly distributed fashion. Github is a strange and wonderful mutation of the two.

    Perhaps, by making branches and merges so awesomely fast, Git "encourages" lots of little local branches, and keeping a neat patch history. But to sum it up:

    SVN can handle large binary files and Windows better than Git, and is better integrated into IDEs.

    Git is better at everything else, ever. Seriously -- 99% of projects that are hosted on SVN would make more sense on Git.

    --
    Don't thank God, thank a doctor!
    1. Re:No they don't. by SanityInAnarchy · · Score: 4, Informative

      It has changed, somewhat -- but mostly, I think there's just better documentation.

      But, for example...

      Looked like you had to deal with bizarre syntax and long hex numbers for the simplest things

      That is pretty fundamental to the design -- it's a SHA1 hash. It's also not incredibly difficult -- cut and paste. When your SVN revisions hit four and five digits, they don't really have much more meaning than that hash, do they?

      Generally, you learn to use relative terms, instead -- for example, HEAD^ to refer to the revision just behind HEAD.

      mercurial was much more straightforward

      I thought so, too...

      I think I tried mercurial, and then bzr, and eventually settled on Git for three reasons:

      1. It's obscenely fast
      2. Everyone's doing it, which has a network effect (github)
      3. I can hold its data model comfortably in my head.

      I should clarify that last part... Maybe some things are cryptic, and I'm sure I don't know all of the possible commands I could run -- but at a very basic level, I know exactly what's going on, just like I did in SVN.

      Just for fun, here's the data model in a paragraph: There are commits. Each commit has a parent commit that it includes, except for merges, which have two parents. A branch is just a pointer to a commit.

      That's it.

      And knowing that, everything else starts to make sense... but it's more than I want to get into in a Slashdot post.
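
      For the curious, that paragraph can be sketched in a few lines of Python. This is a toy model, not git's real object format: the Commit class, the way the id is hashed, and the example history are all assumptions made for illustration. A root commit has no parent at all, which the paragraph above glosses over.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # () for a root commit, one parent normally, two for a merge

    @property
    def commit_id(self) -> str:
        # Toy identifier: SHA-1 over the message and the parent ids.
        payload = self.message + "".join(p.commit_id for p in self.parents)
        return hashlib.sha1(payload.encode()).hexdigest()


def first_parent(commit: Commit) -> Commit:
    """Roughly what HEAD^ means: step back to the first parent."""
    return commit.parents[0]


root = Commit("initial import")
fix = Commit("fix typo", (root,))
feature = Commit("add feature", (root,))
merge = Commit("merge feature", (fix, feature))

# A branch is just a name pointing at a commit.
branches = {"master": merge, "feature": feature}
head = branches["master"]
```

      Here first_parent(head) is the "fix typo" commit, and creating a branch is nothing more than adding another entry to the branches dictionary.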

      --
      Don't thank God, thank a doctor!
    2. Re:No they don't. by PeterBrett · · Score: 2, Informative

      Well, except that SVN revision numbers are in order. Could you tell at a glance which of two binaries with the git SHA1 hash in the filename was newer? What about with an svn revision number?

      You may wish to investigate the git describe command. For example:

      [peter@harrington git (master)]$ git describe
      v1.6.3.2-225-gb836490

      The output contains the latest annotated tag, the number of commits since that tag, and the first 7 hex digits of the current commit hash prefixed by a "g". All the information you need to quickly or precisely identify a revision.
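
      If you want to compare builds programmatically, a string in that format is easy to take apart. A minimal sketch in Python, assuming the tag-count-ghash layout described above; parse_describe and Described are hypothetical names, not part of git.

```python
import re
from typing import NamedTuple


class Described(NamedTuple):
    tag: str
    commits_since_tag: int
    abbreviated_hash: str


# <tag>-<count>-g<hex>; the tag itself may contain dashes, so anchor on the tail.
_DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Described:
    text = text.strip()
    match = _DESCRIBE.match(text)
    if match is None:
        # Assumed fallback: no suffix means the commit is exactly on the tag.
        return Described(text, 0, "")
    return Described(match["tag"], int(match["count"]), match["hash"])


result = parse_describe("v1.6.3.2-225-gb836490")
```

      For two builds described from the same tag, the one with the larger commits_since_tag is usually the newer one, though that only holds along a single line of history.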

      Documentation is here.

  3. Re:Git and Mercurial? by Vanders · · Score: 3, Informative

    All you have to do is set up an extra server and say "Hey, this is the central server now".

    Yeah. I know. In fact I did just that at my last job when we implemented Mercurial. The problem is training developers to push their local changesets to the central repository and stopping developers from pulling from someone else instead of the central repository. There was at least one incident a week where a conflict arose because developers did things like that, which led to divergent codebases that required significant effort from one of the developers to merge and fix conflicts. I have no doubt these problems could have been fixed given time, but it was an uphill battle.

  4. Re:Git and Mercurial? by Wonko+the+Sane · · Score: 2, Informative

    Even if you are the only coder, a distributed system is still better since you're going to have your version, and the version on the server, and you want to be able to play about with your local version before pushing to the server. That's the sort of thing that git/mercurial are excellent at.

    I'm not even a coder but I already love git for getting the kernel sources. For years I have been downloading incremental patches from kernel.org but now I can update my tree with a single command. Just this weekend I installed git and cloned the kernel tree. Then I added the nouveau tree as a remote so I can try out KMS for my nvidia video cards. When I want to update my tree I can use just three commands:

    git remote update
    git merge origin/master
    git merge nouveau/master

    What's there not to like?

  5. Re:Git and Mercurial? by Antique+Geekmeister · · Score: 1, Informative

    Please do. For many corporate purposes Subversion is popular, but its truly awful security models (storing passwords silently in your local $HOME/.subversion/auth directory by default, unencrypted, and refusal to publish workable configurations for purely anonymous access), coupled with its designers' absolute refusal to support deleting contents from the repository (even if they're accidentally stored DVD images or copyrighted code), lead to a very harsh conflict between the idea of "source control deletes nothing, ever" and the idea of "throwing useless things away makes cleaner code".

    I've come to profoundly hate Subversion for just these reasons, although I do administer it locally for certain projects.

  6. No mention of ClearCase? by gillbates · · Score: 3, Informative

    What I find interesting is there's no mention of ClearCase. Maybe the author is unaware of it, or considers it obsolete? Then again, the author didn't seem that experienced with the debacles into which one can get with revision control SW. The example he posits is the least of the problems which can crop up.

    I've used both ClearCase and CVS. First, CVS:

    1. I instinctively save files. And this is a bad thing to do with CVS; when I do a commit, my otherwise unchanged file can overwrite another engineer's more recent changes because I happened to save the file at a later date than him. The interesting thing is that this is not immediately apparent to either of us until we check out a fresh copy of the repository and he notices his changes are gone. And then I'm listed as the last modifier, and he comes to me...
    2. You can't (or shouldn't) copy one directory to another within a source tree. Nor should you do it between repositories. CVS will commit your changes to the copied directory back to the original repository, unless you delete all of the CVS folders. This little quirk cost a few of my colleagues a few hours of debugging to figure out why their changes kept disappearing...
    3. CVS does not (or did not when I used it) enforce strict version control protocol. I can commit an entire repository back to mainline even if I have outdated files. Even if others have made more recent updates. I didn't know this was happening for a good few months of use...

    Now for ClearCase

    1. ClearCase can manage extraordinarily large codebases spread across several geographical locations.
    2. It can be integrated with version control and bug tracking databases.
    3. It allows two or more developers to work on the same file at the same time, with the last one to commit having to perform a manual merge *only when there are conflicts*. Most of the time, it gets the merges right.
    4. With proper tagging procedures, I can always reproduce the last build bit-exact. No matter how badly an engineer subsequently mangles the codebase, I can always build from the last tag. My impending release can't be sabotaged by another developer committing code-breaking-but-it-compiles-on-my-machine-oh-silly-me-I-forgot-the-headers kind of changes.
    5. It does have problems with cache-coherency. Modifying files on machines other than the build machine may end up with stale files being linked...
    6. It has dynamic views, which don't require a full copy of the source tree on the local machine. There are some big advantages to this, among them being not having to worry so much about the theft of a developer's laptop, and using the server's storage pool for building, rather than the local hard disk. From a developer perspective, it is nice not to have to wait an hour or so for the repository download should I need to make a change to an older codebase. I can work on multiple versions of the same code base at the same time, without having to maintain a separate local copy of the entire tree for each of them.
    7. Managing ClearCase is an administrative position. Yes, it is exceedingly complex.
    8. Suppose I merge several bug fixes for a build. And later, one of those fixes needs to be backed out (didn't fix the problem, conflicts with other SW, etc...). I can do that with ClearCase rather easily, without having to reconstruct all of the interim versions between the two.
    9. I can apply the same bugfix to two different branches of a source tree without checking out and modifying both branches. That is, I can check the changes into one branch, and merge them into another branch (or just pick them up) without having to checkout the repository from the other branch.

    Now, granted, a lot of FOSS products are not trying to be SEI level 5*. They don't have to demonstrate a repeatable process. They often don't incorporate bug fixes into older releases, or maintain several concurrent branches of the same codebase. It is also important to show which

    --
    The society for a thought-free internet welcomes you.
    1. Re:No mention of ClearCase? by Anonymous Coward · · Score: 1, Informative

      I've used Clearcase, SVN, Git, and some CVS.

      Note: I only have experience with raw Clearcase and not the UCM workflow.

      Git is hands down the best. Clearcase has given me the most headaches out of any RCS I've had to touch.

      "ClearCase can manage extraordinarily large codebases spread across several geographical locations."
      Yes, it can manage huge code bases and allows checkouts of subtrees, which Git doesn't allow (you can kind of hack around it with submodules, but it's not the same). However, it is ridiculously slow, which makes large code bases pretty much impractical to have. Almost every operation *requires* some network operation; the worst offense is that it needs to check with the licensing server. If you ever have the licensing server go down (impossible, I know) you pretty much shot off the balls of all your engineers until you get it back up.
      Running clearcase update on any large project takes an inordinate amount of time as almost every single file has to be looked over (as opposed to git, where it maintains near instantaneous speeds on even large code bases).

      "With proper tagging procedures, I can always reproduce the last build bit-exact. No matter how badly an engineer subsequently mangles the codebase, I can always build from the last tag. My impending release can't be sabotaged by another developer committing code-breaking-but-it-compiles-on-my-machine-oh-silly-me-I-forgot-the-headers kind of changes."
      You don't tag every commit though do you? Because what I have discovered is that some serious mangling can occur if you rename a file and a new file is created with the old file's name. I've found that this happens even if I am in a "private" branch.

      Raw Clearcase advocates the idea of checking out files (locking them from other devs) before working on them. This introduces a new level of RCS hassle as you end up bringing in some admin to help unlock files that some guy in the next cubicle forgot to uncheckout before taking the week off.
      Oh yes, if you checkout files and then your computer dies..... better have your admin on speed dial.

      "It has dynamic views, which don't require a full copy of the source tree on the local machine. There are some big advantages to this, among them being not having to worry so much about the theft of a developer's laptop, and using the server's storage pool for building, rather than the local hard disk. From a developer perspective, it is nice not to have to wait an hour or so for the repository download should I need to make a change to an older codebase. I can work on multiple versions of the same code base at the same time, without having to maintain a separate local copy of the entire tree for each of them."
      I would like to mention that dynamic views also have a pretty bad disadvantage IMO. Since they are autoupdating, you can end up with some pretty nasty thrashing as you end up never having a specific version to develop against.

      Ever try using Clearcase from the CLI? The standard tool you use to interface with Clearcase is `cleartool`; most people alias this to `ct` because no one wants to type it all out. However, nothing is going to be able to save you from 'ct ci -c "comment" `ct lsco -r -me -s`' (that is the short version). Yes, this does the equivalent of 'git ci -a -m "comment"'. Or not quite, because you may have "hijacked" files (edited files without checking them out first). In that case, you have to find all your hijacked files through some other obscure `ct find` command. Of course you could look at the documentation, the wonderful world of `ct man` where they reimplement `man` in tried and true IBM (ugly) fashion.

      configspecs (how you specify which branch you are, what parts of codebase you have checked out) are also highly obtuse, creating your branches is a non-trivial operation, and raw clearcase versions by file, not changeset (bad).

      I could go on.

    2. Re:No mention of ClearCase? by 7+digits · · Score: 2, Informative

      Nope. Moving directories within a checked out repository and committing their content will commit them back from where they were checked out.

      His "save overwrite stuff" issue is probably due to him loading a file in his editor, updating the underlying version, and saving the file. If he uses a shitty editor, he may overwrite the changes. If he blindly commits his changes, he may have manually reverted the file. I've seen this happen with careless developers. I don't consider this a deficiency of CVS, as he actively overwrote the file. I could do that with any version control system.

      His last point is wrong, though. Maybe he had some script that did some forced commit.

  7. Re:Git and Mercurial? by orzetto · · Score: 4, Informative

    [Subversion's] designers' absolute refusal to support deleting contents from the repository [is bad]

    That is one great feature of Subversion: absolutely no way to screw up stuff that was committed. Revision control is about keeping track of stuff, any model that allows a user to remove information from a repository is a disaster quietly waiting to happen; sorry you did not understand that.

    If you absolutely need to remove something from a SVN repository, you can do that with svndumpfilter, meaning you have to ask the repository's administrator. That's a good safeguard against accidental deletions.

    "throwing useless things away makes cleaner code"

    For "cleaner code" you just need svn delete.

    --
    Victims of 9/11: <3000. Traffic in the US: >30,000/y
  8. Re:Git and Mercurial? by locofungus · · Score: 2, Informative

    There are only a few use cases where single user distributed and centralized revision control systems differ.

    1. You can carry a local repository on your laptop and commit. You do not then need to sync to the master repository before continuing work there. (Typically in a distributed system there is one repository that is given the status of master - this avoids issues where two teams might be syncing amongst themselves but both are blissfully unaware that there is any other work happening in the same area of code.)

    2. You can work simultaneously on two separate checkouts and commit them without having to "promote" one of them to a branch.

    IMO any RCS that doesn't allow you to commit your tested and working snapshot whenever you want is fundamentally broken. Distributed systems must support this by definition[1] and any non distributed system that supports this can trivially be made distributed.

    [1] Some distributed systems require you to merge when you synchronize changes. IMO merging should be separate from synchronizing

    Tim.

    --
    God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
  9. Re:TortoiseSVN by Lally+Singh · · Score: 2, Informative

    Ugh, I can't stand Tortoise. It just *kills* the speed of my file-open/save dialogs. In exchange for a few labels (and not in visual studio! just the explorer) and right-click commands (hint: a menu and some dialog boxes do not constitute a GUI) I literally go get coffee when the dialog box is loading my checked-out repo.

    psvn.el for Emacs, however, is an absolute dream. I see my repo (or subfolder thereof) as one dired-like list. diff, checkin/update, etc. are live and just update my buffer.

    --
    Care about electronic freedom? Consider donating to the EFF!