Linus on GIT and SCM
An anonymous reader sends us to a blog posting (with the YouTube video embedded) about Linus Torvalds' talk at Google a few weeks back. Linus talked about developing GIT, the source control system used by the Linux kernel developers, and exhibited his characteristic strong opinions on subjects around SCM, by which he means "Source Code Management." SCM is a subject that coders are either passionate about or bored by. Linus appears to be in the former camp. Here is his take on Subversion: "Subversion has been the most pointless project ever started... Subversion used to say, 'CVS done right.' With that slogan there is nowhere you can go. There is no way to do CVS right."
The ultimate reason why Linus dislikes SVN, CVS, etc. is that it is centralized. Everyone checks out source from a central server and commits their changes to the same centralized area. This has problems: your workspace is not versioned. By this I mean, you cannot track local changes to your workspace without committing them to the central server.
A common pattern in development is to try one approach, test it, tweak it, and possibly try another approach if the first did not work out, perhaps reverting to a prior approach. With decentralized version control, you can commit your changes to a local repository and work from there. All the locally changes you make are versioned, and be committed, checked out, examined all without contacting a central repository. This is ideal, because you often want to try various options to find the one that works best, before pushing your changes to the rest of the world. In centralized version control, you can use a branch for this purpose, but often branches in these systems are difficult to either create, merge, or maintain, so they are rarely used. The end result is that with centralized version control, developers version their workspace in their head. DVCS systems remove the mental burden.
Fortunately, FOSS developers are realizing the usefulness of DVCS and major projects are converting to some form of DVCS. Mozilla is switching to Mercurial. The Pidgin project, which just released 2.0.1, is using Monotone. (Linus favorably mentioned both of these distributed version control systems in his Git talk, as they are both are distributed).
Once you accept that DVCS is better than the centralized model (which may not be true for some situations), only a few (but growing number of) version control systems are viable. This is currently a hot area in open source development, with software such as GNU Arch, Monotone, Mercurial, Git, Darcs, Bazaar, and more paving the way. Many open source DVCS's are still in development and not ready for general usage. I can't speak for Mercurial, but Monotone doesn't have the greatest performance, instead preferring integrity over speed. This led Linus to write git, since speed is very crucial for a large project like the Linux kernel.
Whatever the actual program (git, Mercuial, or Monotone), more and more open source developers are realizing the advantages that distributed version control can offer. I encourage all developers that haven't used any DVCS to try it -- once you do, you won't go back.
Tired of free ipod spam sigs? Opt ou
When you're starting out, just remember "git commit -a" and you'll be fine. Also check out "git reflog" to see the linear history of your repo. The pulling/pushing stuff can get a lot more complex but it's damn powerful. If you can figure out Arch (yeesh) you can figure out git!
SLASHDOT SEZ: you have too few characters per line. Okay, slashdot, here's part of the man page for git-rebase:
If is specified, git-rebase will perform an automatic git checkout before doing anything else. Otherwise it remains on the current branch. All changes made by commits in the current branch but that are not in are saved to a temporary area. This is the same set of commits that would be shown by git log
Most distributed version control systems exhibit this phenomena, because by "checking out" you are actually doing two operations: pulling the latest changes from someone else, and updating your workspace. For example, in Monotone you would type (I imagine git operates similarly):
The first command retrieves revisions from the server, and the second updates your workspace with those new changes. To "commit" a change, in a distributed version control system you first 1) commit the change to your local repository and then 2) push it to someone else:
It is often useful to keep these operations separate. For example, you can commit without pushing. Make a bunch of changes, commit each one separately, and only push once you're satisfied with the result. Other developers can still see each change you made individually, but only after you've pushed, so they won't be stuck with an incomplete in-progress version of the tree.
Similarly, by being able to update without pulling, you can revert to any revision you would like without contacting the network. Likewise, since commit does not require network access, it is no extra effort to work offline. Once an Internet connection is available, you can synchronize your repositories, but in the meantime you can make any change you want - even with no network connection.
The main disadvantage of a decentralized version control system is that it requires workflow changes to get the most out of it. If you are only familiar with centralized version control systems, it will take some time getting used to. But I'm glad to say, an increasing number of projects are making the change to distributed version control, among them, Mozilla and Pidgin. They are not using Git (but Mercurial and Monotone, respectively) but they're all distributed. Git is being used by the Beryl project, among others. Subversion has momentum in FOSS because it is familiar for those used to centralized version control (everyone knows CVS), and SourceForge provides free SVN hosting. Once a free open source hosting site provides hosting for a distributed version control system, I expect more low-resource open source projects to use it.
Tired of free ipod spam sigs? Opt ou
The advantage is that MergePrivileges can be fine-grained: there can be many answers for "merge into what?" There's a -mm tree, a -stable tree, a -linus tree, a -rt tree, and a lot of vendor and distro trees. Each of these has a different maintainer, and can have a different idea of what is acceptable. And only the maintainer can merge things into their tree, and they can decide based on a variety of features of the things they're considering. For example, Linus only merges from a few people directly: maintainers of various subsystems. And he doesn't even trust them completely; if the SD/MMC maintainer has a change which changes x86 architecture code in the tree Linus is asked to merge, he'll notice and ask what's up with that. And if there are changes that look too intrusive for the current point in the development cycle, he'll put it off until the next cycle, and ask for a tree with just fixes. And -linus isn't special, except that almost everybody trusts him implicitly and merges his stuff into their trees (the main exception being -stable, which is why a new 2.6.20.x kernel isn't derived from 2.6.21; and vendor and distro kernels are generally based on -stable of some sort, and only get new stuff from Linus when they go to a new series). Also, maintainers of subsystems know the people who work in their areas, and can apply the same sorts of rules: the guy from Intel who works on their network drivers can get e100 changes into the the -netdev tree, because the maintainer knows they know what they're doing for e100 changes. And Linus sees that the e100 changes are coming in through -netdev, and the network maintainer knows what policy to apply to the drivers around there, so they're fine, even if Linus has no clue who should be allowed to do what in e100.
It's not that the politics go away. It's that the policy is no longer a binary "yes or no" decision, so the technical arrangement mirrors the social arrangement. This doesn't work with CommitAccess because people wouldn't commit the same change everywhere they should, and they couldn't be restricted to only making changes they're trusted to make (there are people who are trusted to correct spelling in comments in any file in the tree, and Linus can look through the total changes they send and verify that they only change spelling in comments).