Linus on GIT and SCM
An anonymous reader sends us to a blog posting (with the YouTube video embedded) about Linus Torvalds' talk at Google a few weeks back. Linus talked about developing GIT, the source control system used by the Linux kernel developers, and exhibited his characteristic strong opinions on subjects around SCM, by which he means "Source Code Management." SCM is a subject that coders are either passionate about or bored by. Linus appears to be in the former camp. Here is his take on Subversion: "Subversion has been the most pointless project ever started... Subversion used to say, 'CVS done right.' With that slogan there is nowhere you can go. There is no way to do CVS right."
git is a piece of shit, pure and simple.
Of course CVS sucks. And yes, Subversion does suck. GIT beats them hands down.
Linus has it right (as usual). No surprise here.
So don't do it.
It's always done late in a development cycle, in the rush to get the project out the door.
So don't branch, and DON'T allow concurrent checkout of any code - FORCE the DEVELOPERS who need to work on the same code to COORDINATE their work EARLY in the development cycle. Of course they'll bitch.
But so what. Developers always bitch when you make them do things right.
And you make the programmers coordinate their work properly and early by not branching, and not allowing concurrent checkout of the same file. In Subversion, you lock the file, and if anyone breaks the lock they get a very public visit from the lead developer asking why.
It works, too. You won't have any merge surprises two weeks from ship date.
If your technical leadership has the spine to show prima donna twits who won't follow development rules the door. Of the entire company.
There have got to be better acronyms out there to choose from. Whoever came up with this was a worthless git.
Well Linus didn't have anything bad to say about MS Source Safe. . .
;-)
[ducking] Sorry, I couldn't resist the urge.
CVS and Subversion are open source projects, Linus should fix them.
anybody have a good tutorial? (not the crappy one which comes with it)
I'm not an SCM rube either. I've competently used tla (arch), darcs, and of course CVS. but git just seems too hard to use. damn fast though.
We ALL know that the people who use CVS and SVN are version control Nazis!
I've used CVS, SVN, and GIT in serious projects and I can say I far prefer SVN to GIT, and GIT to CVS. GIT was incredibly confusing to use, and it may just have been that the repository was poorly administered, but I never knew if I was synced with everyone else's checkouts and the command names made no sense. It's been over a year so I don't remember the details of GIT, but I remember having to do a lot of things "twice". Need to do a checkout? Two commands. Need to commit? Two commands. It was a bitch to use and I am glad I'm done with it. SVN, on the other hand, I felt very comfortable with from the start and most important of all, I trusted SVN to do what I wanted it to and to keep me from screwing up. In a year of using it, it has failed to lose my trust.
I'm not trying to say SVN is better than GIT. The best repository depends on the type of project and type of development. But defaming SVN in favor of GIT is not, I believe, a valid statement. Especially when (I'm pretty certain) many, many more projects use SVN rather than choosing to use GIT.
Hero of Allacrost, a FOSS RPG for *NIX/*BSD/OS X/Win
No one said that if you're famous and contributed something incredible to the world (such as Linux) you can't speak out of your ass most of the time, just because you enjoy how everybody listens and tries to decipher if they should care about it, or just laugh and pass by.
I use SVN in a medium-sized team and see SVN used extensively in all kinds of projects around the globe with great success. I personally love the workflow of SVN.
The only thing that they need to work on is merging of branches, and incidentally I've talked to the developers; they're quite aware of this flaw of SVN and are working on it. We'll see new versions that can track changes in each branch and even attempt automated merges with good success.
I know a guy who has the same personality as Linus. The guy is very smart, he is single-handedly coding an application which is very popular in its area (won't mention it since that's internal stuff). He keeps bitching all the time: about customer feature requests, about random products and how sucky they are, how people can't see that. And he could also change his opinion overnight for no apparent reason and go to the other extreme. But he's a friggin' programming genius and what he does is great, even though it takes a lot of effort to deal with him.
Well, probably those two go together: being an amazing creator, and being an amazing ass with huge ego. Who knows.
Yep. $1.2 million. Paid for by getting billed out at $400 an hour. I could probably afford more house, but I like my vacations - a train ride with my family in first class sleepers to Glacier Park, maybe take the kids to Discovery Cove for a week or so. The one I'm planning now is going to be about 3-4 weeks bouncing around the South Pacific, NZ, and Oz.
Yeah, I don't know what I'm talking about.
I'm the guy that gets hired to fix your being nine months late and millions over budget.
Because if two or more developers who need to work on the same code aren't smart enough to figure out how to do it without stomping on each other's work, they're overpaid.
You are smart enough to know you need to fix that bug, and you do have the initiative to work with the developer who already has the code checked out, right? Or are you too fucking lazy and stupid to do that? Because if you work for me, that will be one of the things I expect from you.
So by not allowing a last-second merge, I FORCE you to solve your concurrency issues yourself, EARLY in the development cycle, instead of having you drop in a merge at the last second and say "It worked fine in my branch, not my problem!"
You sound like one of those who needs to be shown the door.
... And that is that CVS/SVN are centralized, while GIT is distributed, like GNU Arch.
There are appropriate uses for both of these, and in kernel development I think it makes sense to have distributed development. However, smaller projects often really *need* a very specific direction (for example, Wesnoth, I would think, would not have gotten where it is today if there were so many branches where people were all making their own art).
Linus is enough of a famed leader that he's going to be listened to, and thus kind of pulls the community around him as a central source of development. That's not necessarily going to happen everywhere.
http://mediagoblin.org/
Cvs is already done right. These would-be improvements are pointless.
GOOG stock split next week - go, go, google
My favorite, of course, is Mercurial. My main draw is that I had been interested in distributed SCMs for years, but had never found one that made any sense to me whatsoever. I was on the hunt again and stumbled on Mercurial, and I've been hooked ever since.
Of the various distributed SCMs, Mercurial is the easiest to use one I've found. And it's pretty fast, though not quite as fast as git (though I have some ideas on how to fix that). And since it's written in Python with only a very small C component it runs on many platforms.
I took a look at git a while ago and was completely underwhelmed. The UI was so bad it was useless, and it didn't "seem" to do anything that Darcs didn't do. (I used to love Darcs because of the automatic patch dependency computations).
So much of git is just files in the .git dir and shell scripts that combine very simple low-level functions. For instance, you can create a branch just by saving the SHA1 ID of the tip into a file in .git. You can branch off any point in the history this way, including branches you've deleted in the past (git keeps all the old commit objects by default, even ones that aren't pointed to by any branch or tag; this is a very simple and understandable model, like reference-counting in a way).
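The branch-as-a-file idea above can be sketched roughly like this in Python. It assumes the loose-ref layout `.git/refs/heads/<name>`; the helper name `create_branch` and the commit ID are made up for illustration. It only writes the ref file into a throwaway directory. It does not run git or check that the commit exists.

```python
import os
import re
import tempfile

def create_branch(repo_dir, name, sha1):
    """Create a branch by writing a commit ID into .git/refs/heads/<name>.

    Hypothetical helper: it does not call git or verify the commit exists.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", sha1):
        raise ValueError("expected a 40-character hex SHA1")
    ref_path = os.path.join(repo_dir, ".git", "refs", "heads", name)
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(sha1 + "\n")
    return ref_path

# Demo against a throwaway directory standing in for a repository.
with tempfile.TemporaryDirectory() as repo:
    tip = "0123456789abcdef0123456789abcdef01234567"  # made-up commit ID
    path = create_branch(repo, "experiment", tip)
    with open(path) as f:
        stored = f.read().strip()
```

In a real repository you would normally let git's own commands manage refs; the point here is only that a branch is a small text file holding a commit ID.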
Now that all the "next generation" SCM tools have matured somewhat, I took a look at all of them again. I had to stop using Darcs because of the "patch of death" problem, which basically is this: after using Darcs on a project with long-lived parallel branches, the repository may eventually enter a wedged state you can't get out of, due to exponentially complex patch dependencies. Oops.
At this point I had an idea of what an SCM should do, how it should work, what the "mental model" should be. I want to create changesets, add them to branches, combine multiple branches (and keep track of renames and so forth between branches), re-order changesets, collapse multiple changesets into one, discard old branches, etc.
Of course, CVS and close cousin Subversion are SO UTTERLY USELESS I didn't even consider them. Seriously, Subversion is like gold-plated shit. Looks nice but it's still shit. Reading people say stuff like "Subversion is awesome" makes me wince. How can something that doesn't have "real" branches, and doesn't have tags OF ANY KIND, be useful for anything? How do you keep track of multiple merges between branches? Answer: you don't. Or you keep track of revision numbers using svnmerge and pray it all works. Even the Subversion docs sort of hand-wave this away. I.e., they hand-wave away one of the FUNDAMENTAL ASPECTS of source code management: branching and merging. It's like hearing people talk about OO databases. They mean well but they just don't comprehend the generality of the underlying problem.
That's why I was so excited about Darcs: the author "gets it". Unfortunately the implementation is flawed.
I checked out a few more (Mercurial, bzr) but finally settled on git because it let me do all the things I needed to do, and it did them FAST. Once I figured out the underlying model I was pretty impressed. Git can be viewed at many levels: very low-level plumbing, or UI-level, or in between. The UI and documentation are still pretty shitty, but thankfully they are working on improving them and are moving away from the idea of having interchangeable UIs. Just focus on improving "core git".
One great thing about git is that so much of it is just files in the .git dir.
The other great thing about git is how easy it is to sling changes around and reorder them and combine them. For instance let's say you add a file to your project as commit "A". Then you add some code that uses this file as commit "B". Then you fix a bug in the file as commit "C". So you have A-B-C. Now you'd like to combine A and C into a single patch A', and put B on top of it, like this: A'-B. In git, this is super-easy. I can think of two ways to do it off the top of my head.
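Here is a toy model of that A-B-C to A'-B rewrite, as a rough sketch in Python. Changesets are modelled as plain functions on a dict of files. The file names and function names are invented for illustration, and this is not how git stores or rewrites commits. It works here because B and C touch different files, so moving C next to A changes nothing in the final tree.

```python
from functools import reduce

# A changeset is modelled as a function from one tree (path -> contents)
# to a new tree. All names below are hypothetical.
def commit_a(tree):
    """A: add the new file (with a bug in it)."""
    return {**tree, "util.py": "def double(x): return x + x + 1\n"}

def commit_b(tree):
    """B: add code that uses the file."""
    return {**tree, "main.py": "from util import double\nprint(double(2))\n"}

def commit_c(tree):
    """C: fix the bug in the file added by A."""
    return {**tree, "util.py": "def double(x): return x + x\n"}

def squash(*commits):
    """Combine several changesets into one that applies them in order."""
    def combined(tree):
        return reduce(lambda t, c: c(t), commits, tree)
    return combined

def replay(history, tree=None):
    """Apply a list of changesets, oldest first, to a starting tree."""
    return reduce(lambda t, c: c(t), history, tree or {})

original = [commit_a, commit_b, commit_c]   # A-B-C
a_prime = squash(commit_a, commit_c)        # A' = A followed by C
rewritten = [a_prime, commit_b]             # A'-B

# Both histories should end with the same files on disk.
same_tree = replay(original) == replay(rewritten)
```

In git itself, an interactive rebase is one way to do this kind of reordering and squashing.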
I was checking into a CVS project the other day (for a client) and wanted to do this. Then I realized, you can't move things around in CVS like this *twitch*. So nowadays I do everything in git and only after the changes are beautiful and self-contained and well-commented do I check them into CVS one at a time.
Okay so the point is, check out git (or honestly? Check out ANYTHING that isn't CVS or svn). Even if you think Linus is an asshole (which he is) or you don't like the git UI (it's not that bad now), check it out anyway.
And if you don't use SCM at all? You suck. Start learning. It's a best practice that you can't live without, once you start.
The ultimate reason why Linus dislikes SVN, CVS, etc. is that it is centralized. Everyone checks out source from a central server and commits their changes to the same centralized area. This has problems: your workspace is not versioned. By this I mean, you cannot track local changes to your workspace without committing them to the central server.
A common pattern in development is to try one approach, test it, tweak it, and possibly try another approach if the first did not work out, perhaps reverting to a prior approach. With decentralized version control, you can commit your changes to a local repository and work from there. All the local changes you make are versioned, and can be committed, checked out, and examined, all without contacting a central repository. This is ideal, because you often want to try various options to find the one that works best, before pushing your changes to the rest of the world. In centralized version control, you can use a branch for this purpose, but often branches in these systems are difficult to either create, merge, or maintain, so they are rarely used. The end result is that with centralized version control, developers version their workspace in their head. DVCS systems remove the mental burden.
Fortunately, FOSS developers are realizing the usefulness of DVCS and major projects are converting to some form of DVCS. Mozilla is switching to Mercurial. The Pidgin project, which just released 2.0.1, is using Monotone. (Linus favorably mentioned both of these version control systems in his Git talk, as both are distributed).
Once you accept that DVCS is better than the centralized model (which may not be true for some situations), only a few (but growing number of) version control systems are viable. This is currently a hot area in open source development, with software such as GNU Arch, Monotone, Mercurial, Git, Darcs, Bazaar, and more paving the way. Many open source DVCS's are still in development and not ready for general usage. I can't speak for Mercurial, but Monotone doesn't have the greatest performance, instead preferring integrity over speed. This led Linus to write git, since speed is very crucial for a large project like the Linux kernel.
Whatever the actual program (git, Mercurial, or Monotone), more and more open source developers are realizing the advantages that distributed version control can offer. I encourage all developers that haven't used any DVCS to try it -- once you do, you won't go back.
Tired of free ipod spam sigs? Opt ou
Linus talks about his distributed model, how everyone has a branch, and how this avoids politics associated with who gets commit access. He claims (and I admit I've seen this happen in some) that many projects have quite the internal politicking on who has CVS commit access. But then he claims that Git's special sauce eliminates these internal politics. Ok, I was intrigued, so I listened on.
Essentially, he explains, the secret with Git is that everyone has commit access on their own branch - they do whatever they want. He says that the way it works is that someone does something cool with their own branch, then they start hollering to say "Hey, I have a good branch, merge mine" and it will get merged. Politics over.
Ok, so now I'm scratching my head. How is this a fundamentally different paradigm? In CVS, basically anyone can check out the whole tree and make any changes they like. They can then say, see, my changes are good and ask for them to get committed or ask for commit access themselves. In Git, this commit access bottleneck is just moved from the commit stage to the merge stage. You make your changes, commit them to your separate and unique branch, and then ask someone with access to merge it, or to give you the ability to merge it into mainstream. How exactly does this eliminate the politics? You are still going to have some people with "the power" and some people without. In any project where you have people who are going to fight about who gets commit access, you'll just have a fight about who has the ability to merge into mainstream.
So, ok, distributed is nice (though for some projects central may be preferred) but I don't see how this magic system bypasses politics. In fact, I can potentially see more internal politics over this method. I can see factions gathering to support this or that branch, arguing about which is better, fighting about which one gets merged in. I can see the potential for branches going longer between merges, and more changes happening at once, making it harder to track problems. I don't claim these scenarios are more likely, but I do claim that this changing from a commit access to a merge access paradigm is just renaming the problem.
If you have a project that has thousands of developers all over the world like Linux does, an SCM system that is focused on merging makes a lot of sense. Unfortunately, there is a tendency for some people to overdo merging on small projects when they don't really need to. If the application is designed in a modular fashion and developers are assigned specific modules, then merging is rarely needed. Of course, many control freaks don't like this approach because it makes it harder for them to "correct" other developers' code.
You are the biggest ass I've seen on slashdot. And I've seen a lot of asses on slashdot. But bragging about how much money you make and the vacations you take, as if somehow that means your opinion is correct....wow.
Smart to post as a coward tho, gotta give you that.
So what you are saying is that RCS was done right and everything done since is wrong...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Mercurial seems to have quite a bit of traction these days, more than any of the other d-sccm tools out there from what i can tell (inside sources tell me sun has mandated mercurial, so there should be quite good tool integration soon). Mercurial is also very fast, and has a good ui, and a really nice http interface too (rss no less).
svn is indeed gold plated shit
cvs is indeed past it
"Cvs is already done right. These would-be improvements are pointless."
Isn't it about time to have another Vim vs Emacs flame war?
You may be saying this in the context of large FOSS projects, but for most projects, not allowing all the team members to commit changes seems like a really bad idea. If you don't trust them, why are they on your team?
Complaining about the occasional inefficiencies of file locking while forcing some developers to waste time waiting for permission to commit, seems really ironic to me.
Or is Linus turning into a replica of RMS, only without the manners?
Support SETI@home
There's one trick to getting performance from monotone, which is to flip a switch on your workspace to make it use timestamps (like SVN does) instead of always re-hashing every file to see if it's different. For small projects, the rehash is best since it is certain. With timestamps on unix, if you make changes within 1 sec, for example copying a different version right after an update (which can happen btw), then version control will not check in your changes and they can be lost.
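A rough Python sketch of the trade-off described above: change detection by timestamp and size versus by content hash. This is not monotone's or SVN's actual code; the function names are made up, and the same-second edit is simulated by forcing the modification time back with `os.utime`.

```python
import hashlib
import os
import tempfile

def file_sha1(path):
    """Hash the full file contents (slow but certain)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def snapshot(path):
    """Record what the tool remembers about a file after an update."""
    st = os.stat(path)
    return {"mtime": int(st.st_mtime), "size": st.st_size, "sha1": file_sha1(path)}

def changed_by_timestamp(path, snap):
    """Cheap check: whole-second mtime and size only."""
    st = os.stat(path)
    return (int(st.st_mtime), st.st_size) != (snap["mtime"], snap["size"])

def changed_by_hash(path, snap):
    """Expensive check: re-hash the contents."""
    return file_sha1(path) != snap["sha1"]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "a.txt")
    with open(path, "w") as f:
        f.write("version 1\n")
    os.utime(path, (1000000000, 1000000000))   # fixed, arbitrary mtime
    snap = snapshot(path)

    with open(path, "w") as f:
        f.write("version 2\n")                 # same length, new contents
    os.utime(path, (1000000000, 1000000000))   # simulate an edit in the same second

    missed = not changed_by_timestamp(path, snap)  # timestamp check sees nothing
    caught = changed_by_hash(path, snap)           # hash check sees the change
```

The timestamp check touches no file contents, which is where the speed comes from; the price is the corner case shown here.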
Once you enable timestamps with monotone pretty much all operations are faster than subversion. Even reverting can be faster in practice because the server typically has the files in ram vs your workstation which has to seek all over the place to make copies. Depending on your setup of course.
Monotone is not slow anymore, and it keeps a much tidier and smaller repository. So small that in just a little more space than SVN's spare copies of all HEAD files for the past revision you can have all revisions on your workstation. Why anybody would use subversion is beyond me... Linus is right on this one.
Every developer has their own repository, which they can commit as branches into a repository of repositories :P
The perfect sig is a lot like silence, only louder
You're kidding, right?
$ svn diff A.txt
will compare your local version of A.txt to the last committed version of A.txt in the repository.
Surely you don't have to do commit to do the diff.
Distributed version control the way git does it (conceptually, not necessarily the implementation) is the best idea in SCM since concurrent development and optimistic merge conflict resolution on check-in.
Notice how, even years after better ideas superseded the lock-modify-unlock paradigm, many tools and shops still use exclusive-lock SCM.
It could be quite a while before you see anything like the way git does SCM in use in the majority of programming shops.
We used VSS for a long time but switched to SVN after reading numerous accounts that VSS would eventually croak, and because we needed multiple developers working on the same source. After all the developers (~16 very active users, 20+ projects) were over the learning curve of SVN, there's not one that would go back to VSS. I can honestly say there is nothing I know of that we need that SVN doesn't do for us. We use TortoiseSVN for the Windows guys, and the server-side guys (Linux/Unix) use command-line SVN. We have no need to branch local copies. We very rarely have to manually resolve conflicts. I fail to see what GIT would do for us.
Perhaps SVN sucks for kernel-guys. But for what we do, SVN fits the bill perfectly... Central repository, easy to get up-to-date, easy to commit, easy to update, easy to review changes, easy to review history....
SVN for us is the right tool for the job.
Bill watered down Steve's vodka so much that Steve won't drink it. Sure he's considered an ass, but he got free booze.
Steve then ordered an Apple-tini, knowing Bill is allergic. Sure it's a "gay drink", but it's classy and metro.
Linus brought in his own home-made booze. It's true he can't go to every bar out there but hey, he got free and good booze without being an ass.
Honestly I don't know what this has to do with ANYTHING posted here, but it sounded really funny in my head and I didn't want to forget it (plus I gots lots of extra karma to go around).
Ginga no Rekshiya Mata Each page.
So, how do you take that diff, revert half of it back to the server's version, begin coding in a completely new direction, realize you were right the first time, go back to the original diff you took, then pull in half the stuff you did while doing the wrong thing, finish coding and push the commit back to the server?
You can't because subversion has no client side version control.
Exactly. With a centralized version control system (PVCS, which is not coincidentally listed as the riskiest bet on the Forrester Source Code Management comparison) I've used in the past at a large company, everyone ended up making several different local copies of the code with various changes, in order to revert if necessary. I was dumbfounded - isn't that what version control is for, to keep track of changes?
There seems to be one thing which SVN does that Mercurial does not do, which is checkout of partial repositories. Correct me if I'm wrong, but with Mercurial (and maybe GIT too) it's all or nothing.
On the other hand with SVN you can checkout just a directory (and everything underneath it) and work with that directory, update, commit and so on, without any consideration of the rest of the repository. That makes it a better tool if you want to have just one repository with many mostly independent directories, rather than many repositories.
One thing I do not like about SVN is all the metadata it keeps under the .svn
subdirectory. It increases the total size of the checkout by about 3 times.
SVN unlike HG has a .svn directory in every checked-out directory. HG has one
only at the top level.
That doesn't have to do with centralized/distributed. ClearCase is super-centralized, but you do your experiments in branches, as many as you like.
Monotone's inode prints (which, incidentally, Linus was a major contributor of) can speed up some things, but the initial pull of a large repository is still unacceptably slow. The Pidgin developers have worked around this performance bottleneck by supplying bzip2'd Monotone databases via http, which the developer can then sync with the latest repository on pidgin.im to obtain an up-to-date database with the latest changes. Partial pulls should partially fix this problem in a future release of Monotone, or so I hear.
For what it's worth, I use Monotone daily and find the performance acceptable. For the record, Linus used Monotone at a particularly bad time in its development cycle, when it was very slow and the main designer was on vacation. Nonetheless, the Monotone developers emphasize correctness and integrity over speed, and Mercurial and Git were direct responses to the performance of Monotone. Still, the performance of Monotone is always improving.
I use Perforce and people have their own branches off the development branch. It is their sandbox and they can share code with other developers if needed.
One thing that always scares me about distributed SCMs is their lack of ACLs. I have repositories that have strict permissions because they store things besides development related files. I even have directories within branches that are not available to developers, only to release engineers. ACLs always seem to be an afterthought to these free systems; they are not ready for commercial environments.
>By this I mean, you cannot track local changes to your workspace without committing them to the central server.
This is actually a good thing really, be more open about your changes.
You should look at the work I have been doing for GCC: http://gcc.gnu.org/wiki/PointerPlus .
If I did this work in private, it would most likely not be accepted as it is a big change to the compiler. Plus this allows for and kinda forces collaboration on projects that are not part of the main trunk. Both are good things. If people do work in private in GCC, the development community looks down on that development as that means they will dump and run. If the work is in public, then we know people are working on that project and developer resources are not lost.
Thanks,
Andrew Pinski
For large projects with lots of developers who work via the Internet your suggestions just don't make good sense.
If I start a project and twenty people eventually join in and they check out various parts of the project but don't check them back in for long periods of time, following your suggestions would be horrid.
Linus is right. You want distributed repositories not a centralized one.
The race isn't always to the swift... but that's the way to bet!
atomic checkins?
'cvs mv'?
'cvs cp'?
And that's without even exerting 3 brain cells.
"Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent
Git has some great features. Speed, that the whole repository with revision history is mirrored, that it's consistent cryptographically, etc.
There is one part that I don't get and it's the decentralized part. Yeah, it is a big bonus that potentially any copy can take over should something happen to the main one or that developers can create branches and share code with each other without relying on a central server, but the part that bugs me is that according to Linus the right model is when there is a maintainer like him that awaits emails either sending him patches or giving him git repository addresses and telling him to pull. For most projects this is simply an unbelievably stupid idea, waiting for a person to judge your patches one by one. Most open source software on a small to medium level doesn't work this way.
Also, there is the fundamental misunderstanding that decentralized means that there is no central server/primary copy. This is patently false even in the case of Linux. Linus' tree is the central server. For 95% of the people THAT tree is the linux kernel. For 4.9999% it is the 'real' linux kernel. For the remaining 0.00001% or less, well those are the forks.
It's quite simple. There is a decentralized environment, but there exists a main/most influential copy. If you diverge too much from that main copy, that's a fork.
So I was saying that for small to medium scale projects, pretending that there is no centralized server, just people's repositories, is stupid. For large scale projects it can work, like it does for Linux, but then you have a dedicated core team that is necessary in judging what goes in and what stays out. It doesn't matter if you call it person X's tree or commit rights to the central repository. That is the same thing. The terminology Linus uses is annoying because it lies. Not Linus, but the terminology.
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
GIT and SCM? You mean Geita, Tanzania, and Scammon Bay, Alaska, USA?
What sound do people on rollercoasters make? Hint: it's not Xbox 360.
If people spent as much time coding as they did arguing over pointless little details about source control systems (guess what kids, THEY'RE ALL FINE, SO LONG AS YOU UNDERSTAND THE TOOL) we'd have perfect software by now.
Richard Dawkins spent a good deal of time in his book, "The Blind Watchmaker", talking about what the gradualist and the punctuationist views of Darwinism are. His gripe was that the latter was sold as a whole new theory, opposing the old gradualist view. Dawkins was rightly pissed about this, because the latter is merely an improved version of the former. I feel the same about the Centralized vs. Distributed topic. The distributed system is basically a centralized system where EVERY COPY HAS FULL REVISION HISTORY.
There is still a central or main copy, otherwise you'd be herding a lot of slowly diverging forks! Most projects want to produce a release eventually and there is a main copy of sourcecode which the release is produced from.
Imo, the reason Linus dislikes SVN and CVS and pretty much everything else is because of speed, because most SCMs lack the ability to work with merging different copies of repositories and work on a commit level instead, and do not allow for easy development routing around the central copy.
You cannot write anything in C using just its about 5 commands/statements.
You need to use some C system (or other) libraries and these are the compatibility problem of C.
It is about those POSIX, SystemV, ANSI, BSD, GNU etc. standards conflicting with each other, with incomplete specifications etc. Also you face architectural problems of different word size, endianness etc. (this remains true even for high level languages regarding binary network protocols).
Depends what you're doing in C.
Threads? Doing anything with the OS? Uh oh.
Hello World in C runs on more platforms than Hello World in Python, but Python abstracts a lot of less trivial stuff so it works cross platform without rewriting.
I haven't worked with GIT so far, but i did watch the talk from Linus and some stuff he mentioned sounded like very familiar problems. Mainly when he said that even patch is better than SVN. And that's where i got the feeling that those tools solve two different problems:
SVN is better than doing normal backups for sourcecodes.
GIT seems better than working with patch.
Amusingly though, both Git and Mercurial were "inspired" by Monotone, but were created as separate projects because the developers wanted to go in different directions
Yes, yes, yes, yes... all these lies again about how good SVN or CVS are...
Yes, they are pretty good... only when you can't afford something better!!
Clearcase is much, much better, but it is not for programmer wannabes...
you *MUST* understand what you're doing... yes, it is like saying Visual
Basic is better than C/C++ or Java just because it is simple... Come on!! It
is crap, just like CVS and SVN are crap.
Clearcase is much, much better than cheap-free-version-control-software.
SurroundSCM is better, and systems like PlasticSCM or Accurev even better...
The downside? You HAVE TO pay...
The same old story... free is good, when you pay you get better.
Personally, I just let the development tools manage my local workspace. They generally do a better job anyway, since they know what you are doing to it. Eclipse is my favorite tool because of this - it has local file history (with versions you can inspect, compare, and revert to) as well as undo history for all of the refactorings you have been doing (Java only, so far ...).
This paired with SVN or CVS solves all my problems with local workspace revisioning.
Darcs is arguably easier to use, although it doesn't scale to large projects the way Mercurial can. In particular, Mercurial requires commits at odd times (pull + merge) and doesn't support the same level of cherry-picking that darcs can support. Anything you could imagine wanting to do is likely possible in darcs; the main problem this raises is that supporting that rich model makes efficient implementation difficult or potentially impossible. However, for small projects <100k LOC, this is not really an issue, so darcs is hard to beat.
So Use SVK, which uses the base libraries of Subversion (the atomic, versioning filesystem ones which are heavily tested and work very well) and uses them to build a distributed SCM.
http://en.wikipedia.org/wiki/SVK
I am NaN
Subversion can be quite useful for a project's "authoritative" repository. Especially if that project used to successfully use CVS, as a great many small projects do/did - and some larger ones, like GNOME and KDE, too. Subversion is also quite convenient for publishing sources, though it's less than ideal for any contributors without commit access trying to work from anonsvn.
svn is supported by a number of IDE plugins & GUIs, which a surprisingly large number of people use and come to rely on. I'm not one of them, but many of the folks I work with use various svn guis.
git-svn looks very interesting, as it should provide a way to add distributed scm capabilities on top of svn, where you're working with projects that use svn. It'd be useful even just for the ability to take partial local history and keep local modifications under revision control. I wonder if there's anything similar for Mercurial...
What bothers me most about svn is the insufficient integrity guarantee on the repository. That, however, can be fixed, and I hope it's going to be addressed with an `fsfs2' format. Frankly, not everyone *needs* distributed SCM, and many are quite fine with a good centralized system.
It would be interesting to know more details of what you were trying to do, to see if there is some non-ACL way of mapping it to distributed VCS functionality. From a distributed VCS background, I would probably do something like the following:
(1) Split off non-development files into a separate repository, with different permissions to that tree
(2) Give release engineers their own tree which developers cannot push to. If a release engineer needs a fix from a developer, he can pull it.
In fact, a whole lot of nice things fall into place when you make pull the fundamental operation rather than push. The general workflow is for a developer to finish an implementation, checking in as necessary, and then notify an upstream or more central developer that the patch is complete. The upstream developer reviews the patch and pulls it if it is correct (works, doesn't violate policy, etc). In this flow, code review is emphasized, and at no point is any developer trusted with "push" rights until you get to the final central integrator (usually a release/QC person).
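The pull-based flow described above can be sketched with a toy model in Python. Everything here is hypothetical (the `Repo` class, the patch names, and the `passes_review` stand-in for a human reviewer); it is not how any real DVCS stores history, only an illustration of who moves changes where.

```python
class Repo:
    """Toy repository: an ordered list of patch names. Not a real VCS."""

    def __init__(self, name):
        self.name = name
        self.patches = []

    def commit(self, patch):
        self.patches.append(patch)

    def pull(self, other, review):
        """Take patches from `other` that we lack and that pass `review`."""
        accepted = []
        for patch in other.patches:
            if patch not in self.patches and review(patch):
                self.patches.append(patch)
                accepted.append(patch)
        return accepted


def passes_review(patch):
    # Stand-in for a human code review; here anything tagged "wip" fails.
    return not patch.startswith("wip")


developer = Repo("developer")
integrator = Repo("integrator")
release = Repo("release")

developer.commit("fix-null-check")
developer.commit("wip-experiment")

# Nobody pushes. Each upstream repository pulls what it has reviewed.
integrator.pull(developer, passes_review)
release.pull(integrator, passes_review)
```

The developer never needs write access to the integrator's or release engineer's tree; the only decision point is the reviewer's pull.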
I would like to watch the video, but it seems impossible without flash. Anybody got a link to download the video file?
Wouldn't it be safe to say that more platforms support C than Python?
You have no idea what you are talking about, do you? Any platform that supports C also supports Python, unless you count really tiny ones that do not have enough memory.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Amen to that; saying that an SCM is a "software configuration manager", when you are using it to manage source code (and not software configuration), has always struck me as incredibly silly~
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
http://www.youtube.com/watch?v=4XpnKHJAok8
This is the video from the article. You can either watch it in the tiny embedded window, or you can go to youtube and click the button to watch it full-screen.
Look, posters: if you're going to point to a video that's hosted on YouTube (or another video hosting site), just link to that site. Don't link to some random web page that has the video embedded in it.
--
On medium to large projects -- 10 to 100 developers maybe, on 1 to 5 sites, with a significant amount of metaconfiguration which is itself versioned -- it is simply impossible to forbid concurrent editing of code.
It's also extremely hard to avoid branching. To say 'there is no branching' is to say 'nobody has any changes that need to be in version control, and yet should not be forced on all developers / all releases'. It's only possible to say this about extremely small projects -- small enough to have only one stream of development going at once.
I'm stating the obvious here but it's worth repeating because some people do have a lot of trouble understanding that proper SCM (as opposed to the parent post's conception of SCM) is necessary.
Whence? Hence. Whither? Thither.
Linus has enough credentials and competence to give his opinion some serious weight. And even if he does run roughshod on SVN and CVS quite a bit, he is right in a lot of points.
I've been using SCM on a regular basis with Subversion for about half a year now, and more than once I have thought: 'This can't be the way SCM is done right'. The only thing I know of that SVN does better than CVS is pure entire-project version numbers. CVS seemed like a kiddie bike with training wheels that wanted to be as complicated as Kornshell. SVN did away with that but still has downsides that we needn't put up with for any reason. SCM metadata stored in each project directory is one of those little things that bug the hell out of me, for instance. And when Linus says that his brain isn't damaged into thinking CVS is an OK way of doing things because he only used it when forced to - I believe him. I wasn't aware that his project Git had come so far so fast until two weeks ago, but when he says that any SCM that he can do better in two weeks of coding on his own isn't worthwhile - and SVN is one of those - that really has me thinking. I'm not so much into software development with large groups that I could exactly tell how bad SVN is, but I'm sure Linus can.
To use an analogy: CVS was a Ford Model T, SVN is an improved Ford Model T, but I think we should start looking for a current BMW or something. It might be a good idea to do that *before* we move a larger audience to using SCM.
We suffer more in our imagination than in reality. - Seneca
Ignoring Linus' heinous unprofessional attitude, massive ego, and completely insulting comments, there's a lesson to be learned here: you and your team need to decide whether you want centralized or decentralized version control. There are advantages and disadvantages to both methodologies. Anybody who gets up on a stage and tells you that "all centralized systems are garbage, decentralized is the one true way" isn't giving you the full picture. (And likewise, anyone who says the opposite is equally off their rocker!) 80% of software development takes place within corporations, and there's a reason centralized SCM has worked so well in that environment. Decentralized systems might be great for certain open source communities, but it's not what most organizations want or need. If you'd like another viewpoint on why centralized might sometimes be better than decentralized (even in open source projects), take a look at this essay I wrote a while back.
I'm one of the original designers/developers of Subversion, and even we (in the svn developer community) are well aware of both sides of the coin. We're seriously considering adding decentralized features to svn 2.0. We've also added true merge-tracking magic to the imminent svn 1.5 release (so svn is no longer "hand waving" merges, they'll be just as simple as in decentralized systems.)
If you truly believe that distributed SCM is the Only Way of working in all situations, then I suggest you try to push these systems on corporate teams, and see how they fare. Distributed systems have a model that's much more complex for the average joe-user to understand, and as a result most existing distributed systems have extremely complicated UI's. If they're complex enough to confuse open source nerds, think about the rest of the world's programmers...
Keep an open mind about this stuff. No matter what Linus says, there's no magic SCM bullet.
What you're citing as advantages for decentralized version control are not the result of decentralization.
Cheap branches? Subversion has cheap branches.
Better merging? This is a result of algorithms and has nothing to do with whether the system is centralized or not.
If you're on a fast net with the server you can commit as often as you like. If you can branch/merge easily it's no problem.
If you want to cite advantages for decentralized version control it might be more like:
If you have to talk to a server over slow links, decentralized is much better
Personally I made 3 false-moves to svn from cvs before managing to actually move to svn for real.
Some months ago my svn repository died and thankfully I had backups.
Having said all that, I always felt SVN was a move in the wrong direction for the right reasons. Considering though that I use svn for only my own work (with only me coding in it), git kinda seems pointless for me (though in development with more than 2 people involved I can certainly see where it would kick butt over svn). For me, I was thinking of getting rid of svn and putting my repos onto an ext3cow fs!
I find it a little odd actually that people can't see why per-coder branching with a merge model is a big win-win. To me this is like a big "WOW, we can do QA in a logical manner finally" (keep in mind, a merge can be another branch if I understand it correctly). Of course, I haven't really used git myself other than a quick play to see what it's like.
On a side note, 2 companies I've worked with have black-listed SVN cause of its ability to be configured in an authentication mode involving plain-text. It didn't matter that no one there had planned on using that particular functionality, just that it even existed.
I think you meant whine.
Since you seem to be an informed proponent of git, maybe you can answer the one question I wish someone would have asked Linus.
Say you have a team of as few as six programmers coming in for work on Monday. They've all made changes to their codebase (over the weekend, they're dedicated). How do they all manage to get each other's changes, and begin working with a completely up to date version? Do they pick someone to act as the centralized repository for that day?
The SVN devs have known from the beginning that the lack of merge tracking constitutes, to use their verbiage, a "headache". For 5 long years, SVN's "Best Practices" solution was to track merges manually in the commit log messages. This "Best Practice" could be best described as, to use the technical terminology, "really fucking error-prone."
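For anyone wondering what "merge tracking" buys you over notes in log messages, here is a minimal Python sketch of the bookkeeping. It is an illustration only, not Subversion's actual design: the `Branch` class and the revision numbers are hypothetical, and a "merge" here just records revision numbers instead of applying diffs.

```python
class Branch:
    """Toy branch: remembers which source revisions were already merged in."""

    def __init__(self, name):
        self.name = name
        self.applied = []      # changes present on this branch, in order
        self.merged_from = {}  # source branch name -> set of revisions

    def merge(self, source_name, source_revisions):
        """Apply only revisions not merged before; record them afterwards."""
        already = self.merged_from.setdefault(source_name, set())
        new = [r for r in source_revisions if r not in already]
        self.applied.extend(new)
        already.update(new)
        return new


trunk = Branch("trunk")
first = trunk.merge("feature", [101, 102])

# The feature branch gains r105; merging the whole range again must not
# re-apply r101 and r102.
second = trunk.merge("feature", [101, 102, 105])
```

Without the `merged_from` record, the person doing the second merge has to dig the already-merged range out of old commit messages by hand, which is where the errors creep in.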
Look, I like SVN, I use SVN, I hope they get merge tracking (and 'svn obliterate', as long as I'm creating my Christmas List) ASAP. My only point here is that the great-grandparent's claim, that "It's trivial to branch and merge in SVN", is a heaping, stinking load of crap, to use the technical term. You know it, I know it, CollabNet knows it, everybody knows it.
I'm guessing that even sqlrob (173498) knows it.
"Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent
Linus suffers from a common misconception: if something doesn't work the way he wants it to, he assumes that it's no good. And if he adds a feature that he finds useful, he can't understand why other people might object.
Fortunately, Linus's opinions on version control systems don't matter: there are lots of version control systems to choose from, and users just choose what works for them. I bet that's a lot more Subversion than git.
It is not as easy as what you recommended. We have 21 active codelines. Major interdependencies. I have been using Perforce for almost 8 years (ClearCase before that) and constantly get bombarded by developers wanting to use the latest cool SCM. A migration to another SCM would be a major undertaking. ACLs are a showstopper. A hook via an external program does not cut it.
The idea of distributed (and out-of-band) development intrigues me. Git may indeed be a great idea for Linux development. For any significant commercial environment, I do not see how it can work. I have looked at others (darcs, merc, bitkeeper). What I am waiting for is a hybrid solution that offers central repositories with distributed sandboxes that can be synced back to the server.
CVS popularized concurrent versioning and many other ideas that we are taking for granted, and there have been distributed versions of it, too. Linus may think that git does everything different from CVS, but git owes a lot of its functionality to CVS.
As for Subversion, it does support distributed development via svk, and I suspect that's going to get integrated into Subversion.
You have no idea what you are talking about, do you? Any platform that supports C also supports Python, unless you count really tiny ones that do not have enough memory. So therefore, what you're saying is that more platforms support C than Python.
This solution seems chaotic to me. Now, instead of needing to pull all the changes from one central repository, I need to pull changes from the machines of all my co-workers individually? Wouldn't this system make it difficult to guarantee that each developer was integrating the work of the others? Also, it doesn't seem very scalable. What if I have 20 co-workers?
He's right about CVS, and more or less about SVN. Except for one thing: Subversion works. Not only in the technical sense, but in the sense that you can work with it, you can easily explain it to new developers, there is integration into lots of IDEs, code editors and other tools and the list goes on and on. (last, but not least: Trac!).
I used to be passionate about arch, for example. I'm fairly sure I would've been about GIT had it existed back then. But then I learned that to get real work done in the real world, the theoretical basis of your version control system matters little. If the system doesn't work for my developers - who like many projects are doing this for their fun and in their spare time - then it doesn't work, period. If I can't explain it to the boss at work, it won't get installed.
And that's why Subversion is everywhere and arch is, where exactly?
Now Linus is a man with his feet on the earth, so GIT may have a different fate. Wake me when Eclipse and Textmate have built-in GIT support and at least half of my potential developers know it.
Assorted stuff I do sometimes: Lemuria.org
Subversion 1.5 will have automated merge tracking. The merge tracking feature is already available on the trunk, and will be released in (probably) a couple of months. This link has more information:
http://subversion.tigris.org/merge-tracking/index.html
With the addition of the merge tracking feature Subversion will be almost at parity (feature wise) with commercial products like P4.
Git may be a great SCM tool for some situations, but for commercial development which doesn't require a distributed architecture, IMO, SVN is preferable to Git.
Git needs to be supported on more platforms and have a better user interface to be accepted as widely as Subversion.
I must mention TortoiseSVN at this point, because until Git has a user interface like TortoiseSVN, it'll never be accepted by Windows devs. I'm not asking for trouble here (in other words, I'm not trolling Linux users), it's just that I looked at lots of alternatives for our (mainly Windows and UNIX-based) shop and nothing else came close. All of our devs use Windows, even if just to host a terminal program, and the ease of training folks on the use of TortoiseSVN was a big reason for our switch to Subversion.
http://tortoisesvn.tigris.org/
FWIW, I'm a UNIX/OS X/Linux guy myself.
This sig kills fascists.
The distributed system is basically a centralized system where EVERY COPY HAS FULL REVISION HISTORY.
No. The fundamental feature of a distributed VCS is that you can ALWAYS commit your current state and get back to it.
Once you have this feature, any centralized VCS can trivially be converted into a DVCS because on every commit you _DO_NOT_CARE_ what anybody else might have done to the repository.
In fact, the problem that DVCS have to solve is that they do NOT have a full revision history, because if you commit to the trunk in your repo and I commit to the trunk in my repo we are both completely oblivious to the other commit until we merge (my statements above immediately follow from this).
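A toy Python model of the situation just described: two clones each commit to "trunk" without asking anyone, neither sees the other's commit, and the histories only come together at merge time. The `Repo` class and the short hash ids are assumptions made for this sketch and do not reflect how git or any other DVCS actually stores commits.

```python
import hashlib


class Repo:
    """Toy DVCS repository: commit id -> tuple of parent ids."""

    def __init__(self, commits=None, head=None):
        self.commits = dict(commits or {})
        self.head = head

    def commit(self, message, extra_parents=()):
        parents = tuple(p for p in (self.head, *extra_parents) if p)
        # Content-derived id, so two repos never need to coordinate on names.
        cid = hashlib.sha1(repr((message, parents)).encode()).hexdigest()[:8]
        self.commits[cid] = parents
        self.head = cid
        return cid

    def clone(self):
        return Repo(self.commits, self.head)

    def merge(self, other):
        """Import the other side's commits, then record a two-parent commit."""
        self.commits.update(other.commits)
        return self.commit("merge", extra_parents=(other.head,))


origin = Repo()
base = origin.commit("initial import")
yours, mine = origin.clone(), origin.clone()

your_commit = yours.commit("your change to trunk")
my_commit = mine.commit("my change to trunk")

# Each local commit succeeded without asking anyone, and neither side
# has seen the other's commit yet.
unaware = my_commit not in yours.commits and your_commit not in mine.commits

merge_commit = mine.merge(yours)
```

After the merge, `mine` holds both lines of history joined by one commit with two parents; `yours` stays oblivious until it pulls that merge in turn.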
Tim.
God said, "div D = rho, div B = 0, curl E = -@B/@t, curl H = J + @D/@t," and there was light.
This isn't always possible. The company I worked for produced software that ran on medical instruments, unable to access the network for security reasons. You couldn't use centralized version control because you couldn't access a central server.
Tired of free ipod spam sigs? Opt ou
That's a valid concern, but the flipside is that DVCS allows you to commit early, and commit often. I often make small changes in my code, trying out different things, adding a function here, and it is not crucial for other developers to see these small changes immediately. However, they do see the changes -- every one of them, committed individually as I made them -- when I push to the server once I'm done working on a certain feature for the time being (at least once a day).
Tired of free ipod spam sigs? Opt ou
He went into this at the talk (and for a bit afterwards, when the cameras were off): To Linus, it's not that CVS and SVN don't "work the way he wants them to", they're fundamentally flawed in their designs. He's all about the distributed model vs. the centralized repository. In fact, his tech talk was more about the design rationale behind git than it was about git itself. He simply thinks that the repository model is the absolutely wrong way to go about SCM.
He liked bitkeeper, and that whole fiasco caused him to look for options. He found none, and decided to implement his own. While he was doing that, he thought he'd throw in a few new ideas that he liked.
So it's not just about them not working right, or nobody liking a certain feature. To him, there simply wasn't anything out there that met his needs, so he wrote something himself. Kinda like, well, Linux.
Anyway, even though I'll not likely ever use git, it was cool to see him. He's certainly got some opinions...
-B
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
Yeah, what git calls subprojects is a very recent feature, and I haven't played with it yet. But unless you set it up by hand, it's all or nothing. But git's packs are very compact. Two years' worth of linux kernel history is 189448 K. A single checked-out version, after compiling to object files, is 357492 K.
And given that all-or-nothing, you can do everything locally without needing to access a central server. And incremental updates are fast.
(It turns out to be non-trivial to do incremental updates of incomplete downloads. Being able to say "I already have this version, and everything before it" is very concise and powerful.)
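Why "I already have this version, and everything before it" is so concise can be shown with a small Python sketch. This is not git's actual transfer protocol; the commit graph and the function names are hypothetical. The only assumption carried over from the comment is that a client holding a commit also holds all of its ancestors.

```python
def reachable(commits, head):
    """All commit ids reachable from `head` by following parent links."""
    seen, stack = set(), [head]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


def missing_commits(commits, server_head, client_head):
    """Commits the server must send, given only the client's head id.

    Assumes the client holds complete history, so naming one commit
    implies it also has every ancestor of that commit.
    """
    return reachable(commits, server_head) - reachable(commits, client_head)


# Hypothetical history on the server: commit id -> list of parent ids.
history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "e": ["c", "d"],  # merge commit
}

to_send = missing_commits(history, server_head="e", client_head="c")
```

With a partial download the client could not summarize its state with one id; it would have to list every object it holds, which is the non-trivial case mentioned above.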
that is Linus' dick?
I guess my question was more along the lines of "what sort of policies do you need to enforce with ACLs". Such as "a developer can only check-in under a certain subdirectory", or a certain file, or whether you need to be able to prevent developers from even reading certain subdirectories... I ask because I participate in the mailing list for an actively developed DSCM; If there's a single feature that is keeping it from being accepted in the corporate environment, the developers would like to know. From an implementation standpoint, write restrictions are relatively easy for a DSCM, while read restrictions would require more work. Several DSCMs are working on the concept of nested repositories / super-repositories as well, with some of the same usage cases as goals that ACLs might otherwise be used for. The more feedback on whether that could be useful, the better.
Oh well.
I watched the video and was baffled by Linus' attitude.
The guy is bright on technology but what he calls strong opinions I -and almost any business person- should call them shortsighted opinions that do not appreciate other people's way of doing things.
Saying stuff like "Those of you that like SVN would probably want to leave the room. Because it's crap too." is very stupid. I for one like SVN. It's good enough for my purposes and it sure beats quite a few commercial tools I've seen. But I sure want to hear Linus' thoughts on the topic. Mainly because different ideas keep my mind fresh.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
He said: C supports more platforms than python. You said: You have no idea what you are talking about; C supports more platforms than python!
--
WHO ATE MY BREAKFAST PANTS?
Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
.. and now that I've watched the entire thing, start to finish, and written a near-complete transcript of it, Linus gets tonnes of things wrong. Oh Linus.. why did you have to begin spewing such diatribes about stuff you visibly know nothing about? Talking about Git is one thing, but then to claim that other products don't do the stuff git does? Sigh. I mean really.. narrowing down changesets to a single directory? Pulling a report of what's changed between two dates or two revisions, narrowed to a single directory or set of directories? Everyone can do that! And you went on for like.. two minutes about how tremendously powerful that ability was, claiming that you "guarantee" that no other system can do that.
Sooo disappointed at how little research you've done.
I used to think that he was always joking around when he called people idiots whenever they disagreed with him... I used to think it was all a bunch of tongue in cheek bravado (partially because that's the sort of sense of humor I have). However, I was talking to one guy who got chewed out by him before, and it turns out he's usually pretty serious.
I suspect that this is the sort of thing you see mostly in open source projects, where some people, like Linus, are their own bosses and egos get big. I haven't been in the industry that long, but most of the developers I've met are pretty polite and measured.
That said, Linus is pretty bright, and when he's bitching someone out he usually manages to put together a pretty compelling argument of why they're wrong. I usually agree with him more often than not, even if I would explain it without calling anyone an idiot.
According to Nathaniel Smith Linus only provided the test cases to help them find a speed bug:
http://www.mail-archive.com/monotone-devel@nongnu.org/msg08143.html
- Peder
As Linus said in the talk, when people say they want "cheap branching" what they really want is also "cheap merging" ... SVN has the former, which is worthless on its own. If "fixing merging" in SVN was "just a matter of fixing the algorithms" why have the SVN/CVS developers failed to do it in the last 10/5 years? The reason is that how difficult the algorithms are depends on your storage model ... and the CVS/SVN storage models are broken. Also, even at GigE speeds, talking over the network is significantly slower than talking to the HDD.
Also I'd like to live in your world where you always have fiber type speeds to your central repo. ... and neither the network nor the machine itself ever goes down. Don't forget about when I've just taken a plane flight to X, which is 1,000s of miles from where I normally am. But however much I'd like to, I don't live in that world ... and I find it hard to believe that I'll live in that world in my lifetime.
I've also had to "contribute" to more than one "project"[1] using a CVS/SVN repo. where I didn't have commit privs. ... I find it hard to believe anyone who has lived through this pain could argue that it's a good idea, or helps anyone. So you must also be extremely lucky in that regard. You apparently live a blessed life.
[1] Project here isn't codeword for OSS, this is exactly as painful in CVS/SVN/clearcase/perforce/etc. inside a company.
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
>> I'm one of the original designers/developers of Subversion, and even we (in the svn developer community) are well aware of both sides of the coin. We're seriously considering adding decentralized features to svn 2.0.
Sounds to me like what this community is proposing is evolutionary changes in version tracking and SCM, away from the prima donna model of one guy at his terminal behind the wall.
This work won't be accomplished in a new version of git, but over several lifecycles and updates to the core routines that define what version update routines are and how they operate in a distributed environment.
>> We've also added true merge-tracking magic to the imminent svn 1.5 release (so svn is no longer "hand waving" merges, they'll be just as simple as in decentralized systems.)
"Magic"? You must understand sussman, there is nothing simple about merges within and without the use of decentralized systems. What I just said is the reason they're decentral and are operationally redundant in manner of time and place. The reason distributed connections should be able to handle co-dependent nodes in decentralized ways is to beat the central office model. The reason we want to beat the central office model is whenever someone takes repository A offline or makes line edits under the old framework of subversion and CVS, everyone else loses their work.
My day job is a clearcase administrator, so I have seen more than my fair share of merges. I'd love to get rid of them, but this isn't going to do it. Not even close.
- doug
I found the information I needed.
Nothing that Linus did in the development of linux was new or groundbreaking. Who had the first ever public, anonymous CVS server? Oh yeah, that would be openbsd. Linux may not be as closed as some projects, but it certainly didn't lead the way in open development models.
We just moved to perforce. From alien brain.
You want to test a code management system? Try working at a games company. We're not just talking programmers (or code) here. We're talking versioning artistic assets, the works. Many people who use the system are not programmers (programmers are a small part of any video game .. tons of modelers, animators, texture artists, designers, illustrators.)
Combine that with the fact that you have to produce stable builds 60 days after you start the project, one a month, for a few years .. compiled for multiple hardware (ps2/gc/wii/xbox .. any combination, depending on the game) and to me, its a crazy test on code management software.
Alien brain was maddening as hell when it came to anything more than checkouts, checkins. Perforce has a really nice command line interface, and a decent (but not as good as alien brain) gui.
The thing I will miss from Alien brain is that you could create a custom layout locally of the gui, much like Visual Studio (although clunkier.) The thing I'm looking forward to in perforce is atomic change sets and from what I understand so far from my limited use of it, decidedly better branching. I bet dollars to donuts that our artists like Alien Brain more, but p4 seems far more capable of the tools you need to keep the build stable.
"Old man yells at systemd"
... and the non-networked medical instruments had a functioning development suite on them? Bzzzt. You develop on a networked PC and load the software/firmware to the instruments via whatever means. If the medical instruments have a PC controlling them, you throw a goddamned nic card in your development box.
Or are technicians supposed to say "whoops, I can fix that glitch!" as they're examining someone, drop out of the exam, fire up the IDE, fix the bug, re-start the program and complete examining the patient?
Yeah, didn't think so.
Yes, in some cases we (not me, I was just a contractor) loaded development tools, including an full IDE, onto the medical devices for development. The devices that go out to the customer do not have these tools loaded, of course.
Tired of free ipod spam sigs? Opt ou
If you have to talk to a server over slow links, decentralized is much better
I have recently used SVN and Microsoft TFS, and in this respect SVN is a clear winner since it keeps version history locally, and you only need to connect when getting or committing updates. I've done work on a laptop on a train with no network at all. SVN didn't bat an eyelid. TFS, by contrast, throws a hissy fit if it can't get to the version server.
My Karma: ran over your Dogma
StrawberryFrog
Of course, in theory, there's no meaningful difference between theory and practice, but in practice, there is.
Well sure it seems chaotic and different because you're not used to the idea. Assuming you're working with just 1-5 other people, it's a fairly simple cognitive load. Heck, you'd probably even script it, so it'd just happen.
I submit to you, the reader, that the subversion method is pretty chaotic too. Because of the "Thunderdome" style of launching all patches into trunk without regard for if the build works, it can be really unclear if your checkout works, has passing tests, or any other thing. All you can do is hope the logs are accurate. And this assumes you have the patience to wait for SVN to tell you these things, given how slow it can be.
To me, that's one of the worst case scenarios. Because responsibilities are often delegated in this kind of situation, you seldom have any idea about the code that's being worked on "over there." So if it breaks in a reasonably-sized project, you're somewhat screwed.
There is almost exactly the same amount of integration work. The difference is that you can defer it, or foist it off on other people. These other people may find a merge that baffles you to be utterly trivial.
Actually, it works fine. What you do naturally with such a large group is that you begin to delegate. You say, "It's Alice's job to get everyone's patches for this component, and she, Bob and Carlyle will be working on that part." Their patches feed up through her, and then you pull from Alice, confident that she's dealing with that part of the software.
Obviously git scales to the large-delegated-group solution, that's where it's being used to greatest effect (i.e., the linux kernel).
Slashdot. It's Not For Common Sense
- distributed (hundreds or even thousands of developers working at the same time, multiple dev teams working on different modules etc)
- reliable and secure (I'm releasing versions quite often and those releases have to work)
- fast (I have better things to do than wait)
what alternatives do I have?