Linus on GIT and SCM
An anonymous reader sends us to a blog posting (with the YouTube video embedded) about Linus Torvalds' talk at Google a few weeks back. Linus talked about developing GIT, the source control system used by the Linux kernel developers, and exhibited his characteristic strong opinions on subjects around SCM, by which he means "Source Code Management." SCM is a subject that coders are either passionate about or bored by. Linus appears to be in the former camp. Here is his take on Subversion: "Subversion has been the most pointless project ever started... Subversion used to say, 'CVS done right.' With that slogan there is nowhere you can go. There is no way to do CVS right."
Well Linus didn't have anything bad to say about MS Source Safe. . .
;-)
[ducking] Sorry, I couldn't resist the urge.
I hope you're working for one of my company's competitors, if you are so eager to hamstring your developers and limit their productivity! Having to wait for someone else to finish a major piece of development before I can fix a bug in an unrelated section of a file they happen to be modifying... yeah, that's the way to turbocharge your development process.
CVS and Subversion are open source projects, Linus should fix them.
anybody have a good tutorial? (not the crappy one which comes with it)
I'm not an SCM rube either. I've competently used tla (arch), darcs, and of course CVS, but git just seems too hard to use. Damn fast, though.
He is only human. Just because he is the head of a huge software project doesn't make him infallible.
Just look at the whole 'RMS vs Linus' thing.
His opinions should carry some weight, especially since he should know more than anyone what the limitations of SCM software are when it comes to larger projects like the Linux kernel. But a lot of SCM comes down to the way a project is managed, the preferences of the people involved, and how they deal with their project. I doubt there is a blanket solution... a 'one SCM package to rule them all' so to speak.
Especially in the software industry you can always find someone just as good as yourself that strongly holds opinions that are the polar opposite of yours.
We ALL know that the people who use CVS and SVN are version control Nazis!
I've used CVS, SVN, and GIT in serious projects and I can say I far prefer SVN to GIT, and GIT to CVS. GIT was incredibly confusing to use. It may just have been that the repository was poorly administered, but I never knew if I was synched with everyone else's checkouts and the command names made no sense. It's been over a year so I don't remember the details of GIT, but I remember having to do a lot of things "twice". Need to do a checkout? Two commands. Need to commit? Two commands. It was a bitch to use and I am glad I'm done with it. SVN, on the other hand, I felt very comfortable with from the start and most important of all, I trusted SVN to do what I wanted it to and to keep me from screwing up. In a year of using it, it has never lost my trust.
I'm not trying to say SVN is better than GIT. The best repository depends on the type of project and type of development. But defaming SVN in favor of GIT is not, I believe, a valid statement. Especially when (I'm pretty certain) many, many more projects use SVN rather than choosing to use GIT.
Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
No one said that if you're famous and contributed something incredible to the world (such as Linux) you can't speak out of your ass most of the time, just because you enjoy how everybody listens and tries to decipher whether they should care about it, or just laugh and pass by.
I use SVN in a medium-sized team and see SVN used extensively in all kinds of projects around the globe with great success. I personally love the workflow of SVN.
The only thing that they need to work on is merging of branches, and incidentally I've talked to the developers: they're quite aware of this flaw of SVN and are working on it. We'll see new versions that can track changes in each branch and even attempt automated merges with good success.
I know a guy who has the same personality as Linus. The guy is very smart, and he is single-handedly coding an application which is very popular in its area (won't mention it since that's internal stuff). He keeps bitching all the time: about customer feature requests, about random products and how sucky they are, how people can't see that. And he could also change his opinion overnight for no apparent reason and go to the other extreme. But he's a friggin' programming genius and what he does is great, even though it takes a lot of effort to deal with him.
Well, probably those two go together: being an amazing creator, and being an amazing ass with huge ego. Who knows.
... And that is that CVS/SVN are centralized, while GIT is distributed, like GNU Arch.
There are appropriate uses for both of these, and in kernel development I think it makes sense to have distributed development. However, smaller projects often really *need* a very specific direction (for example, I would think Wesnoth would not have gotten where it is today if there had been so many branches where people were all making their own art).
Linus is enough of a famed leader that he's going to be listened to, and thus kind of pulls the community around him as a central source of development. That's not necessarily going to happen everywhere.
http://mediagoblin.org/
You missed the point of the thread; to discuss git, not to be one.
My favorite, of course, is Mercurial. My main draw is that I had been interested in distributed SCMs for years, but had never found one that made any sense to me whatsoever. I was on the hunt again and stumbled on Mercurial, and I've been hooked ever since.
Of the various distributed SCMs, Mercurial is the easiest to use one I've found. And it's pretty fast, though not quite as fast as git (though I have some ideas on how to fix that). And since it's written in Python with only a very small C component it runs on many platforms.
I took a look at git a while ago and was completely underwhelmed. The UI was so bad it was useless, and it didn't "seem" to do anything that Darcs didn't do. (I used to love Darcs because of the automatic patch dependency computations).
Now that all the "next generation" SCM tools have matured somewhat, I took a look at all of them again. I had to stop using Darcs because of the "patch of death" problem, which basically is this: after using Darcs on a project with long-lived parallel branches, the repository may eventually enter a wedged state you can't get out of, due to exponentially complex patch dependencies. Oops.
At this point I had an idea of what an SCM should do, how it should work, what the "mental model" should be. I want to create changesets, add them to branches, combine multiple branches (and keep track of renames and so forth between branches), re-order changesets, collapse multiple changesets into one, discard old branches, etc.
Of course, CVS and close cousin Subversion are SO UTTERLY USELESS I didn't even consider them. Seriously, Subversion is like gold-plated shit. Looks nice but it's still shit. Reading people say stuff like "Subversion is awesome" makes me wince. How can something that doesn't have "real" branches, and doesn't have tags OF ANY KIND, be useful for anything? How do you keep track of multiple merges between branches? Answer: you don't. Or you keep track of revision numbers using svnmerge and pray it all works. Even the Subversion docs sort of hand-wave this away. I.e., they hand-wave away one of the FUNDAMENTAL ASPECTS of source code management: branching and merging. It's like hearing people talk about OO databases. They mean well but they just don't comprehend the generality of the underlying problem.
That's why I was so excited about Darcs: the author "gets it". Unfortunately the implementation is flawed.
I checked out a few more (Mercurial, bzr) but finally settled on git because it let me do all the things I needed to do, and it did them FAST. Once I figured out the underlying model I was pretty impressed. Git can be viewed at many levels: very low-level plumbing, or UI-level, or in between. The UI and documentation are still pretty shitty, but thankfully they are working on improving them and are moving away from the idea of having interchangeable UIs. Just focus on improving "core git".
One great thing about git is that so much of it is just files in the .git dir and shell scripts that combine very simple low-level functions. For instance, you can create a branch just by saving the SHA1 ID of the tip into a file in .git. You can branch off any point in the history this way, including branches you've deleted in the past (git keeps all the old commit objects by default, even ones that aren't pointed to by any branch or tag. This is a very simple and understandable model, like reference-counting in a way).
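To make the "a branch is just a file holding a SHA1" idea concrete, here is a minimal Python sketch. It is a toy, not git's real on-disk format: the flat `objects` directory, the commit text layout, and the helper names (`store_commit`, `create_branch`, `read_branch`, `delete_branch`) are all assumptions made up for illustration. Only the `refs/heads/<name>` idea mirrors what git actually does.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def store_commit(repo: Path, message: str, parent: Optional[str]) -> str:
    """Write a toy commit object named after the SHA-1 of its content."""
    body = f"parent {parent or 'none'}\n\n{message}\n".encode()
    sha = hashlib.sha1(body).hexdigest()
    objects = repo / "objects"
    objects.mkdir(parents=True, exist_ok=True)
    (objects / sha).write_bytes(body)
    return sha


def create_branch(repo: Path, name: str, sha: str) -> None:
    """Creating a branch is only writing a commit ID into a small file."""
    ref = repo / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(sha + "\n")


def read_branch(repo: Path, name: str) -> str:
    """Return the commit ID a branch currently points at."""
    return (repo / "refs" / "heads" / name).read_text().strip()


def delete_branch(repo: Path, name: str) -> None:
    """Deleting a branch removes the pointer, not the commit objects."""
    (repo / "refs" / "heads" / name).unlink()


# Hypothetical throwaway repository in a temp directory.
repo = Path(tempfile.mkdtemp()) / "toy.git"
first = store_commit(repo, "add parser", None)
second = store_commit(repo, "use parser", first)

create_branch(repo, "master", second)
create_branch(repo, "experiment", first)  # branch off an older commit
delete_branch(repo, "experiment")
create_branch(repo, "experiment", first)  # the commit object was never removed
```

Because the commit objects stay put when a ref file is deleted, bringing a branch back is the same one-line write as creating it in the first place.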
The other great thing about git is how easy it is to sling changes around and reorder them and combine them. For instance let's say you add a file to your project as commit "A". Then you add some code that uses this file as commit "B". Then you fix a bug in the file as commit "C". So you have A-B-C. Now you'd like to combine A and C into a single patch A', and put B on top of it, like this: A'-B. In git, this is super-easy. I can think of two ways to do it off the top of my head.
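Here is a rough Python model of the A-B-C to A'-B rewrite, just to show why it is safe. It is not how git implements anything: commits are modelled as whole-file replacements, and `Commit`, `checkout`, `independent`, and `fold` are hypothetical names invented for this sketch. The key assumption is that C can be moved back past B only because the two touch different files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Commit:
    name: str
    changes: Dict[str, str]  # path -> full new content of each touched file


def checkout(history: List[Commit]) -> Dict[str, str]:
    """Replay the commits in order and return the resulting file tree."""
    tree: Dict[str, str] = {}
    for commit in history:
        tree.update(commit.changes)
    return tree


def independent(a: Commit, b: Commit) -> bool:
    """Two commits can swap places if they touch no file in common."""
    return not (a.changes.keys() & b.changes.keys())


def fold(history: List[Commit], target: int, source: int, new_name: str) -> List[Commit]:
    """Fold history[source] into the earlier history[target]."""
    if not 0 <= target < source < len(history):
        raise ValueError("target must come before source")
    moved = history[source]
    # The later commit has to hop over everything in between.
    for between in history[target + 1:source]:
        if not independent(between, moved):
            raise ValueError(f"{moved.name} cannot be moved past {between.name}")
    merged = Commit(new_name, {**history[target].changes, **moved.changes})
    return history[:target] + [merged] + history[target + 1:source] + history[source + 1:]


a = Commit("A", {"util.py": "def half(n): return n / 0\n"})               # adds the file (with a bug)
b = Commit("B", {"main.py": "from util import half\nprint(half(4))\n"})   # uses the file
c = Commit("C", {"util.py": "def half(n): return n / 2\n"})               # fixes the bug

original = [a, b, c]
rewritten = fold(original, target=0, source=2, new_name="A'")
```

The rewritten history has two commits, A' and B, and checks out to exactly the same tree as the original three. In git itself this kind of reshuffling is typically done with an interactive rebase, which works on hunks rather than whole files.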
I was checking into a CVS project the other day (for a client) and wanted to do this. Then I realized, you can't move things around in CVS like this *twitch*. So nowadays I do everything in git and only after the changes are beautiful and self-contained and well-commented do I check them into CVS one at a time.
Okay, so the point is, check out git (or honestly? Check out ANYTHING that isn't CVS or svn). Even if you think Linus is an asshole (which he is) or you don't like the git UI (it's not that bad now), check it out anyway.
And if you don't use SCM at all? You suck. Start learning. It's a best practice that you can't live without, once you start.
The thing is, you've got the wrong solution to the problem. Rather than not allowing branches, you need to control when and how often they're made, and how long they're allowed to survive. You're fixing a policy problem with technology, which never works well. If the branches are kept under control, you don't have the last-second merge problem. Merges should be happening constantly throughout the process so everyone stays in sync. If someone isn't committing their work at least once a day, that's when they get a stern talking to from the lead developer. Because if a developer needs to coordinate with another developer to change one line of code, then you've wasted two people's time instead of one.
You might want to check out TortoiseSVN if you're using svn on windows. It makes version control really easy, and you don't even have to touch the command line.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
The ultimate reason why Linus dislikes SVN, CVS, etc. is that they are centralized. Everyone checks out source from a central server and commits their changes to the same centralized area. This has problems: your workspace is not versioned. By this I mean, you cannot track local changes to your workspace without committing them to the central server.
A common pattern in development is to try one approach, test it, tweak it, and possibly try another approach if the first did not work out, perhaps reverting to a prior approach. With decentralized version control, you can commit your changes to a local repository and work from there. All the local changes you make are versioned, and can be committed, checked out, and examined, all without contacting a central repository. This is ideal, because you often want to try various options to find the one that works best, before pushing your changes to the rest of the world. In centralized version control, you can use a branch for this purpose, but often branches in these systems are difficult to create, merge, or maintain, so they are rarely used. The end result is that with centralized version control, developers version their workspace in their head. DVCS systems remove the mental burden.
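The "try, commit locally, back out, publish later" loop can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `Repo` class, its snapshot-per-commit storage, and the `push` rule (the remote's history must be a prefix of the local one) are invented for illustration and are not how any real DVCS stores or transfers data.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> file content


class Repo:
    """A toy repository: an ordered list of (message, snapshot) pairs."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, Snapshot]] = []

    def commit(self, workspace: Snapshot, message: str) -> int:
        """Record a copy of the workspace and return its revision number."""
        self.history.append((message, dict(workspace)))
        return len(self.history) - 1

    def checkout(self, rev: int) -> Snapshot:
        """Return a fresh copy of the workspace as of a given revision."""
        return dict(self.history[rev][1])

    def push(self, remote: "Repo") -> int:
        """Send commits the remote lacks (assumes its history is a prefix of ours)."""
        missing = self.history[len(remote.history):]
        remote.history.extend(missing)
        return len(missing)


central = Repo()  # stands in for the shared server
local = Repo()    # lives on the developer's own disk

work = {"sort.py": "bubble sort"}
rev_bubble = local.commit(work, "first approach")

work["sort.py"] = "quick sort"
rev_quick = local.commit(work, "second approach")

work["sort.py"] = "quick sort, tweaked"  # an experiment that did not pan out
work = local.checkout(rev_quick)         # back out without touching any server

published_before = len(central.history)  # nothing has been shared so far
sent = local.push(central)               # publish only once the work is ready
```

Every step before the final `push` happens against the local repository, which is the whole point: the experiments are versioned on disk instead of in the developer's head.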
Fortunately, FOSS developers are realizing the usefulness of DVCS and major projects are converting to some form of DVCS. Mozilla is switching to Mercurial. The Pidgin project, which just released 2.0.1, is using Monotone. (Linus favorably mentioned both of these version control systems in his Git talk, as they are both distributed.)
Once you accept that DVCS is better than the centralized model (which may not be true for some situations), only a few (but growing number of) version control systems are viable. This is currently a hot area in open source development, with software such as GNU Arch, Monotone, Mercurial, Git, Darcs, Bazaar, and more paving the way. Many open source DVCS's are still in development and not ready for general usage. I can't speak for Mercurial, but Monotone doesn't have the greatest performance, instead preferring integrity over speed. This led Linus to write git, since speed is very crucial for a large project like the Linux kernel.
Whatever the actual program (git, Mercurial, or Monotone), more and more open source developers are realizing the advantages that distributed version control can offer. I encourage all developers that haven't used any DVCS to try it -- once you do, you won't go back.
Tired of free ipod spam sigs? Opt ou
Linus talks about his distributed model, how everyone has a branch, and how this avoids politics associated with who gets commit access. He claims (and I admit I've seen this happen in some) that many projects have quite the internal politicking on who has CVS commit access. But then he claims that Git's special sauce eliminates these internal politics. Ok, I was intrigued, so I listened on.
Essentially, he explains, the secret with Git is that everyone has commit access on their own branch - they do whatever they want. He says that the way it works is that someone does something cool with their own branch, then they start hollering to say "Hey, I have a good branch, merge mine" and it will get merged. Politics over.
Ok, so now I'm scratching my head. How is this a fundamentally different paradigm? In CVS, basically anyone can check out the whole tree and make any changes they like. They can then say, see, my changes are good, and ask for them to get committed or ask for commit access themselves. In Git, this commit access bottleneck is just moved from the commit stage to the merge stage. You make your changes, commit them to your separate and unique branch, and then ask someone with merge access to merge it, or to give you the ability to merge it into mainstream. How exactly does this eliminate the politics? You are still going to have some people with "the power" and some people without. In any project where you have people who are going to fight about who gets commit access, you'll just have a fight about who has the ability to merge into mainstream.
So, ok, distributed is nice (though for some projects central may be preferred) but I don't see how this magic system bypasses politics. In fact, I can potentially see more internal politics over this method. I can see factions gathering to support this or that branch, arguing about which is better, fighting about which one gets merged in. I can see the potential for branches going longer between merges, and more changes happening at once, making it harder to track problems. I don't claim these scenarios are more likely, but I do claim that this changing from a commit access to a merge access paradigm is just renaming the problem.
If you have a project that has thousands of developers all over the world like Linux does, an SCM system that is focused on merging makes a lot of sense. Unfortunately, there is a tendency for some people to overdo merging on small projects when they don't really need to. If the application is designed in a modular fashion and developers are assigned specific modules, then merging is rarely needed. Of course, many control freaks don't like this approach because it makes it harder for them to "correct" other developers' code.
I use SVN on windows, mac os x, linux (ubuntu, debian, fedora) as well as netbsd. TortoiseSVN works great on windows, especially for the point and click style users who need to use SCM. SvnX works great on Mac OS X. Altium PCB designer works great with the svn command line tools and shows graphical diffs of our circuit boards. But for some reason, TortoiseSVN and svn.exe are unable to access a GIT repository.
In addition, git works well for simple projects but not so well for projects that have many different related subprojects which share code.
For instance, our SVN repository holds everything needed for an entire product, including embedded linux with busybox, initrd and custom software and libraries - as well as DSP source code for two different add on cards, the GUI for mac, windows, and linux, the docutils xml file for the various manuals, and manufacturing and test code.
I'd love to use git once it attains the required maturity level so that I can do what I need with it.
--jeffk++
ipv6 is my vpn
You hit the nail on the head. Distributed version control often comes with superior merging, making the process less painful and encouraging it to occur frequently. Monotone employs a 3-way merge, Codeville has an innovative merging algorithm, and some may even support 5-way merging ("left's immediate ancestor, left, merged, right, right's immediate ancestor") in the future.
In my experience, nearly all merges occur automatically and cleanly. Only if two developers modified code in conflicting areas of the source code do you have to merge manually--and even then, only one person has to do it. It is much better to have merging operate automatically and transparently when possible, than to have to have two people manually coordinate each and every one of their changes beforehand.
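For anyone who has not seen the rule behind a 3-way merge, here is a deliberately simplified Python sketch. It assumes both sides only edited lines in place (no insertions or deletions), so the three versions can be compared position by position. Real merge tools first align the files with a diff, and this is not Monotone's or Codeville's algorithm; `three_way_merge` is a name made up for this example.

```python
from typing import List, Tuple


def three_way_merge(base: List[str], left: List[str], right: List[str]) -> Tuple[List[str], List[int]]:
    """Merge two edited copies of base; return (merged lines, conflicting line indexes)."""
    if not (len(base) == len(left) == len(right)):
        raise ValueError("this toy only handles in-place line edits")
    merged: List[str] = []
    conflicts: List[int] = []
    for index, (b, l, r) in enumerate(zip(base, left, right)):
        if l == r:        # both sides agree (changed identically or not at all)
            merged.append(l)
        elif l == b:      # only the right side changed this line
            merged.append(r)
        elif r == b:      # only the left side changed this line
            merged.append(l)
        else:             # both changed it differently: a human has to decide
            conflicts.append(index)
            merged.append(f"<<<<<<< {l} ======= {r} >>>>>>>")
    return merged, conflicts


base = ["int width = 80;", "int height = 24;", "int depth = 8;"]
left = ["int width = 132;", "int height = 24;", "int depth = 8;"]   # developer 1
right = ["int width = 80;", "int height = 24;", "int depth = 32;"]  # developer 2

clean, clean_conflicts = three_way_merge(base, left, right)

clash = ["int width = 100;", "int height = 24;", "int depth = 8;"]  # also edits line 0
_, clash_conflicts = three_way_merge(base, left, clash)
```

The first merge goes through with no conflicts because the two developers touched different lines. The second reports a conflict on line 0 only, which is the one spot where someone has to merge by hand.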
I wrote about Linus's talk a few weeks ago:
http://kylecordes.com/2007/05/17/linux-git-distributed/
Looking back at that, and at your comment, some things come to mind:
* the tool Linus is pushing greatly facilitates the idea of frequent, easy merges, and Linus mentions that a tool with great, fast merges helps you merge early and often.
* on the other hand, your comment is about "you need to control when and how often [branches] are made...", while a big point of distributed SC tools is the opposite of that control: these tools make the power of the tool fully available to all users. A "main" repository may (and probably should) have permissions/hooks set to enforce some policy about what happens to what branches. Individual users can always create local quasi-branches by simply not checking things in; with a tool like git they can create real (local) branches too, which can then be promoted to official status (i.e. on a blessed central repository) if needed.
So don't do it
Wow! I bet you have never worked on anything other than hobby
projects.
Most projects I have worked on cannot do without branching &
branching big & I am not talking about branches created for
individual devs.
What do you do if you have to make patches on earlier release(s)?
What do you do if your project team has 50 devs working on
5 different modules inside? If one guy makes a buggy submit,
will it break everyone else? Typically each team does weekly
sanity tests & then propagates the changes to the main.
Yeah - and I agree with Linus - CVS is rubbish.
Have used CVS, Clearcase & Source Depot. Source Depot
is a Microsoft internal Source Control system. Microsoft
licensed Perforce & developed on it. I used to work with
MS long back & Source Depot was the best Source Control
System I have ever used.
CVS lacks too many features.
1) Atomic checkins/submits
I am trying to submit changes in 5 files as a single bugfix.
A submit/checkin should either succeed for all 5 or fail for all 5.
CVS doesn't do this. The end result is that I may end up submitting
a change in the header without submitting a corresponding change in the
implementation file.
2) Changelists
After checking in multiple files together, at any point in time, I should
be able to find out all the changes that were checked in at the same time.
CVS has no way of doing this - Submitting 5 files together is the same as
submitting 5 files separately as far as CVS is concerned.
3) More Changelist features for non-submitted changes
Let us say I am working on 3 different bugfixes. Source Depot allows me
to group together my changes in different changelists even before I
submit the changes. That is I can create changelist A B & C.
In changelist A - I have files a.c & a1.c changed, in changelist
B, I have b.c & b1.c changed & so on. So I decide I am done with
all the changes required in the subset A, I can submit it very easily
or undo all changes in changelist B.
4) Merges
Merges between branches are a breeze with Source Depot. With CVS it's
a pain. Source Depot stores a lot of information about merges which have
already happened, which is invaluable. In CVS, merges between branches
are very little more than changes manually copied from one branch to
another.
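Points 1 and 2 above (all-or-nothing submits, and being able to look up what went in together) can be sketched in Python like this. It is a toy model under stated assumptions, not Source Depot's or Perforce's actual design: the `Depot` class, its per-file revision counter, and the `submit`/`describe` names are hypothetical.

```python
from typing import Dict, List, Tuple


class Depot:
    """Toy server that accepts a group of file edits as one unit."""

    def __init__(self) -> None:
        self.files: Dict[str, Tuple[int, str]] = {}        # path -> (revision, content)
        self.changelists: List[Tuple[str, List[str]]] = []  # (description, paths)

    def submit(self, description: str, edits: Dict[str, Tuple[int, str]]) -> int:
        """edits maps path -> (revision the edit was based on, new content)."""
        # Phase 1: check every file first; nothing has been modified yet.
        for path, (based_on, _) in edits.items():
            current_revision = self.files.get(path, (0, ""))[0]
            if based_on != current_revision:
                raise RuntimeError(f"{path} is out of date; nothing was submitted")
        # Phase 2: every check passed, so apply all edits together.
        for path, (based_on, content) in edits.items():
            self.files[path] = (based_on + 1, content)
        self.changelists.append((description, sorted(edits)))
        return len(self.changelists)  # the new changelist number

    def describe(self, number: int) -> Tuple[str, List[str]]:
        """Look up which files were submitted together."""
        return self.changelists[number - 1]


depot = Depot()
cl_one = depot.submit("add parser", {
    "parser.h": (0, "v1 header"),
    "parser.c": (0, "v1 implementation"),
})

try:
    # The header edit is fine, but the .c edit is based on a stale revision.
    depot.submit("bugfix 1111", {
        "parser.h": (1, "v2 header"),
        "parser.c": (0, "stale edit"),
    })
    rejected = False
except RuntimeError:
    rejected = True
```

The second submit is rejected as a whole, so the header never gets ahead of the implementation file, and `depot.describe(cl_one)` still reports both files from the first submit as one unit.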
I can do a lot of stuff which I can't do with CVS
- I can very trivially merge Bugfix 1111 (comprising 5 files
checked into changelist XXXX) from a branch to another branch or
the main trunk.
- Because Source Depot stores information about merges, I can do periodic
single command merges very easily between a branch & the trunk - Source Depot
will not try to merge in changes which have already been merged the last
time I did a merge.
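The merge bookkeeping described in those two bullets boils down to remembering which changelists a branch has already received. A minimal Python sketch, assuming a made-up `Branch` class and `integrate` function (nothing here is Source Depot's real data model, and the actual merging of file contents is left out):

```python
from typing import Dict, List, Optional, Set, Tuple

Changelist = Tuple[int, str]  # (changelist number, description)


class Branch:
    def __init__(self, name: str) -> None:
        self.name = name
        self.changelists: List[Changelist] = []
        # source branch name -> changelist numbers already merged from it
        self.merged_from: Dict[str, Set[int]] = {}


def integrate(source: Branch, target: Branch, only: Optional[int] = None) -> List[int]:
    """Merge unmerged changelists from source into target; return their numbers."""
    already = target.merged_from.setdefault(source.name, set())
    pending = [
        cl for cl in source.changelists
        if cl[0] not in already and (only is None or cl[0] == only)
    ]
    for number, description in pending:
        target.changelists.append((number, description))  # content merge omitted in this toy
        already.add(number)
    return [number for number, _ in pending]


trunk = Branch("main")
feature = Branch("feature")
feature.changelists += [(1110, "refactor"), (1111, "bugfix 1111"), (1112, "new option")]

picked = integrate(feature, trunk, only=1111)  # merge just the one bugfix
periodic = integrate(feature, trunk)           # later: the routine single-command merge
repeat = integrate(feature, trunk)             # running it again finds nothing new
```

Here `picked` is `[1111]`, `periodic` is `[1110, 1112]` (the bugfix is skipped because it was already merged), and `repeat` is empty. Without that recorded set, every periodic merge would have to rediscover by hand what had already been carried over, which is roughly the CVS experience described above.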
I could go on & on, but the point is that Source Depot makes
a developer's life so much easier. I could work around all these
things in CVS (i.e. do it in multiple steps) but the ease is something
worth paying for I think. If Microsoft ever released Source Depot
as a commercial product, it would be great, but I don't suppose their
license with Perforce would allow it.
The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
Why? It doesn't have to be. At least if you use something that isn't horribly broken.
Yes, they will. Because this is a monumentally stupid idea. Because the entire *purpose* of revision control systems (note: "CVS" stands for "Concurrent Versions System") is to make it possible for developers to work on things at the same time. The idea is that you get more benefit from the concurrency than you get difficulties from merging.
Rules like "merge early, merge often", perhaps? Fixes the problem, and *doesn't* cripple development horribly like your idea would.
Distributed version control the way git does it (conceptually, not necessarily the implementation) is the best idea in SCM since concurrent development and optimistic merge conflict resolution on check-in.
Notice how, even years after better ideas superseded the lock-modify-unlock paradigm, many tools and shops still use exclusive-lock SCM.
It could be quite a while before you see anything like the way git does SCM in use in the majority of programming shops.
Monotone's inode prints (to which, incidentally, Linus was a major contributor) can speed up some things, but the initial pull of a large repository is still unacceptably slow. The Pidgin developers have worked around this performance bottleneck by supplying bzip2'd Monotone databases via http, which the developer can then sync with the latest repository on pidgin.im to obtain an up-to-date database with the latest changes. Partial pulls should partially fix this problem in a future release of Monotone, or so I hear.
For what it's worth, I use Monotone daily and find the performance acceptable. For the record, Linus used Monotone at a particularly bad time in its development cycle, when it was very slow and the main designer was on vacation. Nonetheless, the Monotone developers emphasize correctness and integrity over speed, and Mercurial and Git were direct responses to the performance of Monotone. Still, the performance of Monotone is always improving.
Richard Dawkins spent a good deal of time in his book, "The Blind Watchmaker" talking about what the gradualist and the punctuationist view of Darwinism is. His gripe was that the latter was sold as a whole new theory, opposing the old gradualist view. Dawkins was rightly pissed about this, because the latter is merely an improved version of the former. I feel the same about the Centralized vs. Distributed topic. The distributed system is basically a centralized system where EVERY COPY HAS FULL REVISION HISTORY.
There is still a central or main copy, otherwise you'd be herding a lot of slowly diverging forks! Most projects want to produce a release eventually and there is a main copy of sourcecode which the release is produced from.
Imo, the reason Linus dislikes SVN and CVS and pretty much everything else is speed, plus the fact that most SCMs work at the level of individual commits rather than merging different copies of a repository, and do not make it easy to route development around the central copy.
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
The thing is, Linux is actually a pretty small project. Much larger projects would include FreeBSD, which uses CVS not only for the kernel but for every line of source of the entire OS. Now, Linus is a smart guy, but I don't know why he thinks CVS (and SVN by extension) won't work for large projects. It clearly can. It may not be suitable for the way he wants to run his project, but that's a different issue.
Dewey, what part of this looks like authorities should be involved?
Yeah and luckily the whole "haves versus have nots" on who gets CVS commit access rights has never, ever, been a problem in *BSD or XFree86. Right?
Seriously, centralized version control fails for large open source projects for political reasons, not technical ones. That's really Linus' main point, although his lack of tact in presentation is going to cause many people to miss that insight. With a changeset-based distributed version control system, you only have to trust patches and code, not people. The whole concept of "the chosen few who get commit access" goes away, and problems like the XFree86/X.org fork or the EGCS/GCC semi-fork disappear.
I was at the talk and I have to say he lost a HUGE amount of respect from me (and other people in the room whose job has to do with source control).
The way git works as a decentralized solution with a chain of trust is simply not useable for really large, multiple projects with interdependencies. And it's even worse when you need to control access to certain portions of the code.
I see Git as a pyramid scheme with Linus sitting on top. I can't begin to imagine the job of the poor release engineer in a big corp who would need to merge the changes of sub-engineers, and the chain of trust involved to reach the top! What I see is that everyone would code and test on out-of-sync code, a bit like Vista's development was.
Git is a solution that is fine-tuned to Linus's specific needs, but it's ages away from a solution that's flexible enough for most of the industry's needs.
I'm a big fan of subversion, and while I'll admit it's far from perfect it's way better than cvs could ever be. It does the job well most of the time, and SVK is filling some of the holes.
http://www.youtube.com/watch?v=4XpnKHJAok8
This is the video from the article. You can either watch it in the tiny embedded window, or you can go to youtube and click the button to watch it full-screen.
Look, posters: if you're going to point to a video that's hosted on YouTube (or another video hosting site), just link to that site. Don't link to some random web page that has the video embedded in it.