Designing a New Version Control System?
tekvov asks: "When Linus Torvalds decided to use BitKeeper as the version control system for Linux there seemed to be a lot of controversy and many challenges to create a better system than CVS. My question is exactly what would this 'better system' look like? How is the subversion project, Tigris, doing at creating a new version control system? Basically, does the Open Source Community need new tools in this aspect of development? And if so, how should these new tools look?"
"My question is exactly what would this 'better system' look like?"
You said it: BitKeeper. It's there and it's very good. Don't people have anything better to do than nag about other people simply charging for their own work?
If you want to give away your work, please do (I'm happy to use it), but you are not BitMover's (the company's) mom and have no business telling them what to do.
CVS may be the best open source version control tool right now, but it still suffers from a lot of shortcomings.
.. doesn't handle big binary files in a satisfactory way
.. but I'm really waiting for subversion to get mature and usable for production..
Generally speaking, things like commit emails need specific wrappers added (see http://cvsreport.sourceforge.net for instance), and CVS doesn't scale well to big projects.
It's quite usable
In fact, you exhibit a common misconception. If you want version control, CVS does the job. But what you seem to be looking for, and what many of the alternatives proposed in the replies provide, is configuration management.
Now, what would an ideal system be? I don't think one size fits all. You need very quick local network access (bye bye CC), and you need to support infrequently, loosely connected Internet developers. But not at the same time. So I don't think there is one unique answer to your question.
My largest problem with most of these open source revision control systems is the lack of Windows-based servers... I know, I know... but unfortunately most of the development I do is for the company I work for, and I just don't have a choice in these things. I have to develop for Windows here, and I have to use Windows systems; NO Linux, BSD, etc. allowed. However, I can't stand most of the closed source systems, and I would love to be able to use one of the open source systems at work. Before you get too far down that road of thought, Cygnus (or VMware, etc.) is not the right fix here either; the server team does not allow that sort of software on the servers.
Trying to maintain a huge source database with hundreds of developers is basically impossible if there isn't a well established team structure with source managers. When there are fewer than 6 developers working on the same project, it is fairly smooth. With 12 or more developers, it gets exponentially harder to figure out what's going on, unless there are very few check-ins or changes and the source is in maintenance mode.
In the active development phase, there may be dozens to hundreds of check-ins and changes per day, which may cause an unknown number of effects. It doesn't matter which development style you use, because in the end it comes down to whether or not the product is divided into small manageable chunks. Distributed development is a management art form and very few managers know how to do it effectively. I would put forth the idea that the tool for source management is really only 20% of the equation of distributed development/source management. Trying to address the problem by focusing on the symptoms isn't a solution to the real cause of the illness.
and surprisingly difficult to use for simple little things (deleting files, etc.), while making it nearly impossible for a normal person (read: non-Rational expert) to recover a file that had been deleted but is needed once again. The problem with ClearCase is that it's not only really expensive, but it almost requires an admin devoted full-time to ClearCase if even a single project is using it. And its gobs of functionality are great until you realize that 90% of projects don't need some of that stuff (especially integration with all the other Rational tools, which doesn't seem to work nearly as well as anyone would like), and it ends up making what they do need more complicated than it should be.
Creating a branch is very much like copying all the source to another directory (e.g. you have all your source in the mysoft directory, which is your trunk; when you branch, you copy that to a mysoft2 directory and now you have a branch. Best of all, every new branch takes up a minuscule amount of disk space, storing only the files you actually change). Perforce then supplies you with powerful integration tools to let you sync changes across branches.
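To make the "copy the directory, but only pay for what you change" idea concrete, here is a minimal sketch in Python. It is not how Perforce actually stores data; the Depot class and its method names are made up for illustration, and it assumes file contents can be kept once in a shared store keyed by a hash, so a branch only needs its own table of path-to-hash pointers.

```python
import hashlib


class Depot:
    """Toy depot (hypothetical): file contents are stored once and shared."""

    def __init__(self):
        self.blobs = {}                  # content hash -> file content
        self.branches = {"mysoft": {}}   # branch name -> {path: content hash}

    def write(self, branch, path, content):
        # Store the content once, then point the branch's path at it.
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        self.blobs.setdefault(digest, content)
        self.branches[branch][path] = digest

    def branch(self, source, name):
        # Branching copies only the path -> hash table, never the contents.
        self.branches[name] = dict(self.branches[source])

    def read(self, branch, path):
        return self.blobs[self.branches[branch][path]]
```

With two files on mysoft, calling depot.branch("mysoft", "mysoft2") adds no new stored content. Writing a changed util.c on mysoft2 adds exactly one, and mysoft still reads its original version.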
It has some flaws, like no version control on client, branch and label specs, so if somebody messes up the definition of a branch, you can't step back to the last version, but otherwise it's an excellent source code management (or whatever the right term is) system.
If anyone's curious about P4, they can read the manual.
I'll add a couple things:
Here's a good question that you raised - is a three-way graphical merge really the best way to do a complex merge? I've done a lot of them and it mostly works, but at times it still seems like a sub-optimal solution. Does anyone else have a better system for complex merges that they do?
CVS does handle locks. It is based on RCS, which uses locks. The option wasn't removed, just made hard to find. Look at the admin command. Locks are required when you have concurrent development on binary files, which can't be merged.
Using locks on text files is normally counter-productive.
C 113
...
http://www.cvshome.org/docs/manual/cvs_16.html#SE
-l[rev]
Lock the revision with number rev. If a branch is given, lock the latest revision on that branch. If rev is omitted, lock the latest revision on the default branch. There can be no space between `-l' and its argument.
-L
Set locking to strict. Strict locking means that the owner of an RCS file is not exempt from locking for checkin. For use with CVS, strict locking must be set; see the discussion under the `-l' option above.
The first two are of particular importance.
By processing the 'commits' as a transaction it guarantees that only one person is committing at a time.
With both 'Transactions' and 'change list management' commits can be rolled back in reverse order to revert the system to any previous state should a major commit (with many files) go wrong.
I think good basic operations would be 'Catch up' (merge changes into the local workspace), after which you can run tests and check that everything is OK, and then the atomic 'Commit'. The system would also need to check that no other transaction has been processed between the 'Catch Up' and the 'Commit'; if one has, the developer should be forced to catch up again.
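The two operations described above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the Repository class, StaleWorkspace exception and method names are invented here, the whole tree is kept in memory, and merging local edits during catch-up is left out. The point is the check that nothing was committed between catch-up and commit, plus rollback in reverse order.

```python
import threading


class StaleWorkspace(Exception):
    """Raised when someone else committed after the last catch-up."""


class Repository:
    """Toy repository (hypothetical): each commit is one transaction."""

    def __init__(self):
        self.history = [{}]              # history[n] is the tree after commit n
        self._lock = threading.Lock()    # only one commit at a time

    @property
    def head(self):
        return len(self.history) - 1

    def catch_up(self):
        """Return the current revision number and a copy of its tree."""
        return self.head, dict(self.history[-1])

    def commit(self, base_revision, changes):
        """Apply all changes as one new revision, or none of them."""
        with self._lock:
            if base_revision != self.head:
                raise StaleWorkspace("catch up again before committing")
            new_tree = dict(self.history[-1])
            new_tree.update(changes)
            self.history.append(new_tree)
            return self.head

    def roll_back(self):
        """Undo the most recent commit and return the new head revision."""
        with self._lock:
            if self.head == 0:
                raise ValueError("nothing to roll back")
            self.history.pop()
            return self.head
```

If two developers catch up at the same revision, the first commit succeeds and the second raises StaleWorkspace until that developer catches up again. Calling roll_back repeatedly walks the repository back one commit at a time.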
Why do most version control clients have 6 million options but don't contain just these simple 2 operations you want? Probably mainly for historic reasons I guess.
I think the version control system on VisualAge for Java MicroEdition had a system like this called 'Team Streams'.
8. Needs smarter add functionality. I don't like writing stuff like 'find ./src/ -name "*.java" | xargs -n 100 cvs add' just to bring in my new source code.
That's really the same thing as
2. Updates don't always work as expected. They won't grab new directories and a few other quirky things.
If CVS understood the simple concept of a sub-directory (I don't even want versioning!), then most of these problems would go away.
10. I can't think of a tenth thing.
How about the ability to rename a file or directory without having to piss about with cvs remove, then a cvs add, which not only kills your version history, but is also a royal pain in the ass?
As a Java developer, there is one big thing missing from CVS: code refactoring support.
For example:
Let's say you are working on a large project with 20 or so developers, and you create a little utility class for the area you are working in. You check in the code to your module (or package) and use it. A few of your buddies are running into some problems that your utility can solve, so they end up using the class. A few months later, a large amount of code uses your little utility, and the leads want to move your class to the global utility package. Tools exist that can quickly move the class and change all of the references that use that class. But checking in this change is a nightmare.
The thing is in Java this type of operation is common, and is good for the project (keeps the code clean). But until a version control system has proper code refactoring support it will always be hard to do.
Honestly this sounds a lot like many of the features that perforce has (which I use at work too).
Atomic commits -- if perforce can't process all the files in your changelist, it won't submit any of them. This happens if one of the files in your list is out of date with the server version (your revision number is lower than the one on the server, which means you have to resolve the merge) or if you've done something that perforce doesn't like with a file. You can't force it either.
changelist and access control - perforce sets up "clients" which map its depot onto your local computer. You can create as many changelists as you need and, as you check out files, add them to various changelists; submit one changelist while you have others open; submit some files from a directory and keep others checked out.
web viewer/graphical diff - there's a web viewer and the windows version has a diff program.
it does labelling, and it's supported on EVERYTHING (including Mac OS pre-X via the Macintosh Programmer's Workshop command line, and OS X via the command line)
You shouldn't have to be a specialist to use a version control system.
My IDE can
-- rename a variable or class, and have the changes propagate through every file in the project
find + sed
-- Flag most syntax errors or mismatched parameters I produce while I write them
I saw Visual C++ do this. I could type faster than the machine would let me. Even worse, it prompted me for mouse clicks on the fly. Basically, auto-checking is a kludge and gets in the way. Syntax checking is what good eyes and good compilers are for.
-- press a key and have every use of the variable the cursor is on highlighted in purple.
find + egrep
And dozens of other things.
And dozens of other general tools in /usr/bin.
Sure, you can do them all by hand, but much slower and more error prone.
Not necessarily slower nor more error prone, and general tools, such as grep, sed, and awk, can be used in generating reports about source code that are extremely useful in gaining understanding about where to go next. For example, in a matter of minutes, I was able to pick out every function call that relied on a certain 3rd party API and send that list to the vendor for a support request.
In short, hard-wired GUIs inhibit the system rather than help, and the extra bugs introduced by the raw complexity of a GUI-based system can be haunting. Also, text-based tools are programming-language independent and provide seamless reuse across projects.
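As a rough illustration of the kind of report described above (every call into a third-party API, with file and line number), here is a small Python sketch that does what a find + egrep pipeline would do. The function name find_api_calls, the "*.c" file pattern and any API prefix passed to it are assumptions for the example, not anything from the original post, and a regular expression will of course miss calls made through macros or function pointers.

```python
import re
from pathlib import Path


def find_api_calls(root, prefix, pattern="*.c"):
    """Yield (file, line number, function name) for each call whose
    name starts with `prefix`, in every file under `root` matching `pattern`."""
    # A name starting with the prefix, optional whitespace, then "(".
    call = re.compile(r"\b(" + re.escape(prefix) + r"\w*)\s*\(")
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            for match in call.finditer(line):
                yield str(path), number, match.group(1)
```

Looping over find_api_calls("src", "vnd_") and printing each tuple would give a list that can be pasted straight into a support request, where "src" and "vnd_" stand in for the real source directory and vendor prefix.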
I'll help you burn some karma =)
VSS is the best SCM tool I've used as well (and I've tried them all), at least from a feature POV. The problem I've always had with it is the fact that it's not a client/server type application. The engine actually sits on your desktop and everything is done over the network directly to the file system on the repository server. To be fair, given this type of design it's actually amazing that it doesn't have more problems, but it just doesn't work well, especially for larger teams. Security is laughable and the automation services suck rocks.
The thing is, VSS is a hack on top of a port of a very old tool. The inside party line at MS is that they need a good C/S source control tool for .Net, but they don't know where to start; at one point they were talking about hacking VSS yet again. I think they need to use the basic VSS algorithms (the merge/diff are great) but completely rewrite the engine from the ground up.
I sure hope they come up with something soon.
However, the last time I looked at subversion, it had some security shortcomings too. One which looked simple to overcome was that, since it used ftp, it sent passwords in the clear over the Internet to change data. This is a crazy thing to do, but is easily fixed (e.g., by using sftp instead of ftp).
More serious is the notion in subversion that all developers are totally trustworthy. It appears that any developer could modify the files on the code server and make it appear that someone ELSE made a given change. Now clearly developers with passwords have to be trusted to some extent, and certainly SOMEBODY has to be trustworthy (e.g., the server administrator or the person who validates keys), but this kind of total trust of ALL developers isn't warranted in many cases. Even if you expect others to find a security flaw, you'd like some mechanism to backtrack to who made the changes. I didn't do a serious security analysis to see if subversion countered this, though, and I haven't followed it since. I'd be curious if others have examined the issue more closely.
- David A. Wheeler (see my Secure Programming HOWTO)