Linus on GIT and SCM
An anonymous reader sends us to a blog posting (with the YouTube video embedded) about Linus Torvalds' talk at Google a few weeks back. Linus talked about developing GIT, the source control system used by the Linux kernel developers, and displayed his characteristically strong opinions on SCM, by which he means "Source Code Management." SCM is a subject that coders are either passionate about or bored by. Linus appears to be in the former camp. Here is his take on Subversion: "Subversion has been the most pointless project ever started... Subversion used to say, 'CVS done right.' With that slogan there is nowhere you can go. There is no way to do CVS right."
Anybody have a good tutorial? (Not the crappy one that comes with it.)
I'm not an SCM rube, either. I've competently used tla (Arch), darcs, and of course CVS, but git just seems too hard to use. Damn fast, though.
... And that is that CVS/SVN are centralized, while GIT is distributed, like GNU Arch.
There are appropriate uses for both of these, and in kernel development I think it makes sense to have distributed development. However, smaller projects often really *need* a very specific direction. (For example, I doubt Wesnoth would have gotten where it is today if there were so many branches, with people all making their own art.)
Linus is enough of a famed leader that he's going to be listened to, and thus kind of pulls the community around him as a central source of development. That's not necessarily going to happen everywhere.
You might want to check out TortoiseSVN if you're using svn on windows. It makes version control really easy, and you don't even have to touch the command line.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
The ultimate reason why Linus dislikes SVN, CVS, etc. is that they are centralized. Everyone checks out source from a central server and commits their changes to the same centralized area. This has a problem: your workspace is not versioned. By this I mean that you cannot track local changes to your workspace without committing them to the central server.
A common pattern in development is to try one approach, test it, tweak it, and possibly try another approach if the first did not work out, perhaps reverting to a prior approach. With decentralized version control, you can commit your changes to a local repository and work from there. All the local changes you make are versioned, and can be committed, checked out, and examined without contacting a central repository. This is ideal, because you often want to try various options to find the one that works best before pushing your changes to the rest of the world. In centralized version control, you can use a branch for this purpose, but branches in these systems are often difficult to create, merge, or maintain, so they are rarely used. The end result is that with centralized version control, developers version their workspace in their heads. DVCS systems remove that mental burden.
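To make "versioning your workspace locally" concrete, here is a minimal Python sketch. The LocalRepo class is hypothetical: it keeps each commit as a full snapshot in a plain list, whereas real DVCSs store a graph of revisions with compact storage. It only illustrates that trying, committing, and reverting approaches never needs a server.

```python
from dataclasses import dataclass, field


@dataclass
class LocalRepo:
    """Hypothetical toy model of a local repository: each commit stores a full
    snapshot of the workspace on the developer's own machine."""
    history: list = field(default_factory=list)  # entries are (message, snapshot)

    def commit(self, workspace: dict, message: str) -> int:
        # Store a copy so later edits to the workspace don't alter history.
        self.history.append((message, dict(workspace)))
        return len(self.history) - 1  # revision number

    def checkout(self, rev: int) -> dict:
        # Restore any earlier snapshot; no server is contacted.
        return dict(self.history[rev][1])

    def log(self) -> list:
        return [message for message, _ in self.history]


repo = LocalRepo()
ws = {"sort.py": "bubble sort"}
base = repo.commit(ws, "baseline")
ws["sort.py"] = "quicksort attempt"
attempt = repo.commit(ws, "try quicksort")
ws = repo.checkout(base)  # the first approach didn't work out: go back
ws["sort.py"] = "merge sort attempt"
repo.commit(ws, "try merge sort")
print(repo.log())  # ['baseline', 'try quicksort', 'try merge sort']
```

The abandoned quicksort attempt is still recorded and can be checked out again later, which is exactly what is lost when the only history lives on a central server.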
Fortunately, FOSS developers are realizing the usefulness of DVCS, and major projects are converting to some form of it. Mozilla is switching to Mercurial. The Pidgin project, which just released 2.0.1, is using Monotone. (Linus mentioned both of these systems favorably in his Git talk, since both are distributed.)
Once you accept that DVCS is better than the centralized model (which may not be true in some situations), only a few (but a growing number of) version control systems are viable. This is currently a hot area in open source development, with software such as GNU Arch, Monotone, Mercurial, Git, Darcs, Bazaar, and more paving the way. Many open source DVCSes are still in development and not ready for general use. I can't speak for Mercurial, but Monotone doesn't have the greatest performance, as it favors integrity over speed. This led Linus to write git, since speed is crucial for a large project like the Linux kernel.
Whatever the actual program (git, Mercurial, or Monotone), more and more open source developers are realizing the advantages that distributed version control can offer. I encourage all developers who haven't used any DVCS to try it: once you do, you won't go back.
Tired of free ipod spam sigs? Opt ou
I use SVN on Windows, Mac OS X, Linux (Ubuntu, Debian, Fedora) as well as NetBSD. TortoiseSVN works great on Windows, especially for the point-and-click style users who need to use SCM. SvnX works great on Mac OS X. Altium PCB designer works great with the svn command line tools and shows graphical diffs of our circuit boards. But for some reason, TortoiseSVN and svn.exe are unable to access a GIT repository.
In addition, git works well for simple projects but not so well for projects that have many different related subprojects which share code.
For instance, our SVN repository holds everything needed for an entire product, including embedded Linux with BusyBox, initrd and custom software and libraries, as well as DSP source code for two different add-on cards, the GUI for Mac, Windows, and Linux, the docutils XML file for the various manuals, and manufacturing and test code.
I'd love to use git once it attains the required maturity level so that I can do what I need with it.
--jeffk++
ipv6 is my vpn
You hit the nail on the head. Distributed version control often comes with superior merging, making the process less painful and encouraging it to occur frequently. Monotone employs a 3-way merge, Codeville has an innovative merging algorithm, and some may even support 5-way merging ("left's immediate ancestor, left, merged, right, right's immediate ancestor") in the future.
In my experience, nearly all merges occur automatically and cleanly. Only if two developers modified conflicting areas of the source code do you have to merge manually, and even then, only one person has to do it. It is much better to have merging operate automatically and transparently when possible than to have two people manually coordinate each and every one of their changes beforehand.
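Why most merges are automatic can be seen in a toy 3-way merge. The merge3 helper below is a hypothetical Python sketch, not Monotone's or Codeville's algorithm, and it assumes all three versions have the same number of lines so that line i can be compared directly (real tools first align the versions with a diff). A line needs a human only when both sides changed it differently from the common ancestor.

```python
def merge3(base, left, right):
    """Toy line-by-line 3-way merge. Assumption: all three versions have the
    same number of lines (lines were edited in place, never inserted or
    deleted). Returns (merged_lines, indices_of_conflicting_lines)."""
    if not (len(base) == len(left) == len(right)):
        raise ValueError("this sketch only handles equal-length inputs")
    merged, conflicts = [], []
    for i, (b_line, l_line, r_line) in enumerate(zip(base, left, right)):
        if l_line == r_line:    # both sides agree (same edit, or no edit)
            merged.append(l_line)
        elif l_line == b_line:  # only the right side changed this line
            merged.append(r_line)
        elif r_line == b_line:  # only the left side changed this line
            merged.append(l_line)
        else:                   # both changed it differently: a human decides
            merged.append(f"<<< {l_line} | {b_line} | {r_line} >>>")
            conflicts.append(i)
    return merged, conflicts


print(merge3(["alpha", "beta", "gamma"],
             ["ALPHA", "beta", "gamma"],
             ["alpha", "beta", "GAMMA"]))  # (['ALPHA', 'beta', 'GAMMA'], [])
print(merge3(["x"], ["y"], ["z"]))        # (['<<< y | x | z >>>'], [0])
```

The conflict-marker format here is made up for the example; the point is that the common ancestor is what lets the merge tell "only one side changed this" apart from a genuine conflict.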
Most distributed version control systems exhibit this phenomenon, because by "checking out" you are actually doing two operations: pulling the latest changes from someone else, and updating your workspace. For example, in Monotone you would type (I imagine git operates similarly):
The first command retrieves revisions from the server, and the second updates your workspace with those new changes. To "commit" a change, in a distributed version control system you first 1) commit the change to your local repository and then 2) push it to someone else:
It is often useful to keep these operations separate. For example, you can commit without pushing. Make a bunch of changes, commit each one separately, and only push once you're satisfied with the result. Other developers can still see each change you made individually, but only after you've pushed, so they won't be stuck with an incomplete in-progress version of the tree.
Similarly, by being able to update without pulling, you can revert to any revision you would like without contacting the network. Likewise, since commit does not require network access, it is no extra effort to work offline. Once an Internet connection is available, you can synchronize your repositories, but in the meantime you can make any change you want - even with no network connection.
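The pull/update and commit/push split can be modeled in a few lines of Python. Everything here (Repo, Workspace, pull_from, push_to, and the 12-character SHA-1 revision ids) is hypothetical; it only loosely mimics the way real DVCSs name revisions by hash, and it does not reproduce Monotone's or git's commands or storage.

```python
import hashlib


class Repo:
    """Hypothetical toy repository: revision id -> (parent id, content)."""

    def __init__(self):
        self.revs = {}

    def pull_from(self, other: "Repo") -> int:
        # Copy revisions we don't have yet. No workspace is touched.
        new = {rid: rev for rid, rev in other.revs.items() if rid not in self.revs}
        self.revs.update(new)
        return len(new)

    def push_to(self, other: "Repo") -> int:
        # Publishing is simply a pull performed in the other direction.
        return other.pull_from(self)


class Workspace:
    """Hypothetical working copy attached to a local Repo."""

    def __init__(self, repo: Repo):
        self.repo, self.rev, self.content = repo, None, ""

    def commit(self, content: str) -> str:
        # Local operation: the id is a hash of parent + content, so separate
        # repositories never need a central counter to name revisions.
        rid = hashlib.sha1(f"{self.rev}:{content}".encode()).hexdigest()[:12]
        self.repo.revs[rid] = (self.rev, content)
        self.rev, self.content = rid, content
        return rid

    def update(self, rid: str) -> None:
        # Move the workspace to any revision already in the local repo (offline).
        self.content = self.repo.revs[rid][1]
        self.rev = rid


server = Repo()
alice, bob = Workspace(Repo()), Workspace(Repo())
r1 = alice.commit("draft")
r2 = alice.commit("finished")       # two local commits; nothing published yet
print(len(server.revs))             # 0
print(alice.repo.push_to(server))   # 2
print(bob.repo.pull_from(server))   # 2, but bob's workspace is still empty
bob.update(r2)                      # now bob's workspace holds "finished"
bob.update(r1)                      # revert to the draft without the network
print(bob.content)                  # draft
```

Keeping the repository and the workspace as separate objects is what makes the four operations independent: pulling never disturbs your files, and committing never needs the server.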
The main disadvantage of a decentralized version control system is that it requires workflow changes to get the most out of it. If you are only familiar with centralized version control systems, it will take some time to get used to. But I'm glad to say that an increasing number of projects are making the change to distributed version control, among them Mozilla and Pidgin. They are using Mercurial and Monotone, respectively, rather than Git, but both are distributed. Git is being used by the Beryl project, among others. Subversion has momentum in FOSS because it is familiar to those used to centralized version control (everyone knows CVS), and SourceForge provides free SVN hosting. Once a free open source hosting site provides hosting for a distributed version control system, I expect more low-resource open source projects to use it.
So don't do it
Wow! I bet you have never worked on anything other than hobby
projects.
Most projects I have worked on cannot do without branching &
branching big & I am not talking about branches created for
individual devs.
What do you do if you have to make patches on an earlier release(s)?
What do you do if your project team has 50 devs working on
5 different modules? If one guy makes a buggy submit,
will it break everyone else? Typically each team does weekly
sanity tests & then propagates the changes to the main branch.
Yeah - and I agree with Linus - CVS is rubbish.
I have used CVS, ClearCase & Source Depot. Source Depot
is an internal Microsoft source control system: Microsoft
licensed Perforce & developed on it. I used to work at
MS a long time back & Source Depot was the best source control
system I have ever used.
CVS lacks too many features.
1) Atomic checkins/submits
I am trying to submit changes in 5 files as a single bugfix.
A submit/checkin should either succeed for all 5 or fail for all 5.
CVS doesn't do this. The end result is that I may end up submitting
a change in the header without submitting the corresponding change in the
implementation file.
2) Changelists
After checking in multiple files together, I should at any point in time
be able to find out all the changes that were checked in at the same time.
CVS has no way of doing this - Submitting 5 files together is the same as
submitting 5 files separately as far as CVS is concerned.
3) More Changelist features for non-submitted changes
Let us say I am working on 3 different bugfixes. Source Depot allows me
to group my changes into different changelists even before I
submit them. That is, I can create changelists A, B & C.
In changelist A I have files a.c & a1.c changed, in changelist
B I have b.c & b1.c changed, & so on. If I decide I am done with
all the changes in changelist A, I can submit it very easily,
or undo all the changes in changelist B.
4) Merges
Merges between branches are a breeze with Source Depot. With CVS it's
a pain. Source Depot stores a lot of information about merges that have
already happened, which is invaluable. In CVS, merges between branches
are little more than changes manually copied from one branch to
another.
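Items 1 to 3 (atomic submits, changelists, and pending changelists) can be illustrated with a rough Python sketch, assuming a hypothetical in-memory Depot class; it is not Source Depot's or Perforce's actual design. The submit checks every file before writing any of them, so one stale file rejects the whole changelist. A real server would also need locking or transactions to stay atomic under concurrent clients and crashes.

```python
from dataclasses import dataclass, field


class SubmitError(Exception):
    pass


@dataclass
class Depot:
    """Hypothetical server: path -> (revision, content), plus submitted changelists."""
    files: dict = field(default_factory=dict)
    changelists: dict = field(default_factory=dict)  # number -> {path: content}

    def revision(self, path: str) -> int:
        return self.files.get(path, (0, None))[0]

    def submit(self, changes: dict, based_on: dict) -> int:
        # Phase 1: check every file before writing anything. `based_on` maps
        # each path to the revision the client started editing from.
        stale = [p for p in changes if based_on.get(p, 0) != self.revision(p)]
        if stale:
            raise SubmitError(f"out of date: {stale}; nothing was submitted")
        # Phase 2: all checks passed, so apply every file under one number.
        number = len(self.changelists) + 1
        for path, content in changes.items():
            self.files[path] = (self.revision(path) + 1, content)
        self.changelists[number] = dict(changes)
        return number


depot = Depot()
# Pending (not yet submitted) changelists kept on the client side.
pending = {"A": {"a.c": "fix", "a1.c": "fix"}, "B": {"b.c": "wip", "b1.c": "wip"}}
cl = depot.submit(pending.pop("A"), based_on={})  # A goes in as one unit
del pending["B"]                                  # abandon all of B at once
print(cl, depot.changelists[cl])  # 1 {'a.c': 'fix', 'a1.c': 'fix'}

try:  # a.c is now at revision 1, so this stale submit is rejected whole
    depot.submit({"a.c": "other fix", "a.h": "new decl"}, based_on={"a.c": 0})
except SubmitError as err:
    print(err)
print("a.h" in depot.files)  # False: the header was not half-submitted
```

Because each submit is stored under a single changelist number, "which files went in together?" is just a dictionary lookup.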
I can do a lot of stuff which I can't do with CVS
- I can very trivially merge Bugfix 1111 (comprising 5 files
checked in as changelist XXXX) from a branch to another branch or
the main trunk.
- Because Source Depot stores information about merges, I can very easily do
periodic single-command merges between a branch & the trunk - Source Depot
will not try to merge in changes that were already merged the last
time I did a merge.
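The merge-tracking point can be illustrated with a hypothetical integrate function in Python. The set merged plays the role of the stored merge history, so repeated single-command merges only pick up changelists not yet integrated, and the optional only argument allows cherry-picking a single bugfix. This sketches the idea only; it is not how Perforce actually records integrations.

```python
def integrate(source: dict, target: list, merged: set, only=None) -> list:
    """Hypothetical merge tracking.

    source: changelist number -> change on the source branch
    target: changes applied to the target branch, in order
    merged: changelist numbers already integrated (the stored merge history)
    only:   optional set of numbers to cherry-pick, e.g. a single bugfix
    Returns the changelist numbers applied by this call.
    """
    todo = sorted(n for n in source
                  if n not in merged and (only is None or n in only))
    for n in todo:
        target.append(source[n])
        merged.add(n)  # remember it so the next integrate skips it
    return todo


branch = {101: "bugfix 1111", 102: "feature X"}
trunk, merged = [], set()
print(integrate(branch, trunk, merged, only={101}))  # [101]: just the bugfix
branch[103] = "bugfix 1112"
print(integrate(branch, trunk, merged))              # [102, 103]
print(integrate(branch, trunk, merged))              # []: nothing re-merged
```

Without the merged set, the second call would re-apply bugfix 1111, which is roughly the manual bookkeeping CVS leaves to the developer.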
I could go on & on, but the point is that Source Depot makes
a developer's life so much easier. I could work around all these
things in CVS (i.e. do it in multiple steps), but the ease is something
worth paying for, I think. If Microsoft ever released Source Depot
as a commercial product it would be great, but I don't suppose their
license with Perforce would allow it.
The advantage is that MergePrivileges can be fine-grained: there can be many answers for "merge into what?" There's a -mm tree, a -stable tree, a -linus tree, a -rt tree, and a lot of vendor and distro trees. Each of these has a different maintainer, and can have a different idea of what is acceptable. And only the maintainer can merge things into their tree, and they can decide based on a variety of features of the things they're considering. For example, Linus only merges from a few people directly: maintainers of various subsystems. And he doesn't even trust them completely; if the SD/MMC maintainer has a change which changes x86 architecture code in the tree Linus is asked to merge, he'll notice and ask what's up with that. And if there are changes that look too intrusive for the current point in the development cycle, he'll put them off until the next cycle, and ask for a tree with just fixes. And -linus isn't special, except that almost everybody trusts him implicitly and merges his stuff into their trees (the main exception being -stable, which is why a new 2.6.20.x kernel isn't derived from 2.6.21; and vendor and distro kernels are generally based on -stable of some sort, and only get new stuff from Linus when they go to a new series). Also, maintainers of subsystems know the people who work in their areas, and can apply the same sorts of rules: the guy from Intel who works on their network drivers can get e100 changes into the -netdev tree, because the maintainer knows they know what they're doing for e100 changes. And Linus sees that the e100 changes are coming in through -netdev, and the network maintainer knows what policy to apply to the drivers around there, so they're fine, even if Linus has no clue who should be allowed to do what in e100.
It's not that the politics go away. It's that the policy is no longer a binary "yes or no" decision, so the technical arrangement mirrors the social arrangement. This doesn't work with CommitAccess because people wouldn't commit the same change everywhere they should, and they couldn't be restricted to only making changes they're trusted to make (there are people who are trusted to correct spelling in comments in any file in the tree, and Linus can look through the total changes they send and verify that they only change spelling in comments).
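The "only change what you're trusted to change" check can also be sketched in Python. The TRUST table, sender names, and paths below are hypothetical, and real maintainers judge far more than file paths; the sketch only flags paths outside a sender's area, like the SD/MMC maintainer's x86 change described above. Note that in fnmatchcase a * also matches /, so drivers/mmc/* covers subdirectories too.

```python
from fnmatch import fnmatchcase

# Hypothetical trust table: which areas of the tree each sender is trusted with.
TRUST = {
    "sdmmc-maintainer": ["drivers/mmc/*"],
    "netdev-maintainer": ["drivers/net/*", "net/*"],
}


def review(sender: str, changed_paths: list) -> list:
    """Return the changed paths that fall outside the sender's trusted area,
    i.e. the ones a maintainer would stop and ask about before merging."""
    allowed = TRUST.get(sender, [])
    return [p for p in changed_paths
            if not any(fnmatchcase(p, pattern) for pattern in allowed)]


request = ["drivers/mmc/core/core.c", "arch/x86/kernel/setup.c"]
print(review("sdmmc-maintainer", request))  # ['arch/x86/kernel/setup.c']
```

Each maintainer could keep a different table for their own tree, which mirrors the point that there is no single yes-or-no commit decision.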
http://www.youtube.com/watch?v=4XpnKHJAok8
This is the video from the article. You can either watch it in the tiny embedded window, or go to YouTube and click the button to watch it full-screen.
Look, posters: if you're going to point to a video that's hosted on YouTube (or another video hosting site), just link to that site. Don't link to some random web page that has the video embedded in it.