Tom Lord's Decentralized Revision Control System

Posted by timothy on Tuesday February 5, 2002 @10:14AM from the interesting-approach dept.

Bruce Perens writes: "He'll have to change its name, but Tom Lord's arch revision control system is revolutionary. Where CVS is a cathedral, 'arch' is a bazaar, with the ability for branches to live on separate servers from the main trunk of the project's development. Thus, you can create a branch without the authority, or even the cooperation, of the managers of the main tree. A global name-space makes all revision archives worldwide appear as if they are the same repository. Using this system, most of what we do using 'patch' today would go away -- we'd just choose, or merge, branches. Much of the synchronization problem we have with patches is handled by tools that eliminate and/or manage conflicts -- they solve some of the thorny graph topology issues around patch management. Arch also poses its own answer to the 'Linus Doesn't Scale' problem. This is well worth checking out." If you're asking "What about subversion?", well, so is Tom.
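The "global name-space" idea can be shown through revision naming. A fully qualified arch name puts the archive (server) name first and then category, branch, version and revision. Names from different servers therefore never collide, and any archive can refer to any other. The sketch below is a minimal parser. It assumes the naming convention later used by GNU arch (`archive/category--branch--version--revision`), and the syntax of the 2002 release may differ. `parse_revision` and the example archive names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical parser for arch-style fully qualified revision names:
#   archive/category--branch--version--revision
# Modeled on the convention later used by GNU arch; not taken from arch itself.
_NAME = re.compile(
    r"^(?P<archive>[^/]+)/"
    r"(?P<category>[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)--"
    r"(?P<branch>[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)--"
    r"(?P<version>\d+(?:\.\d+)*)--"
    r"(?P<revision>(?:base|patch|version|versionfix)-\d+)$"
)


@dataclass(frozen=True)
class Revision:
    archive: str
    category: str
    branch: str
    version: str
    revision: str

    def branch_name(self) -> str:
        """The branch this revision belongs to, qualified by its archive."""
        return f"{self.archive}/{self.category}--{self.branch}--{self.version}"


def parse_revision(name: str) -> Revision:
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a fully qualified revision name: {name!r}")
    return Revision(**m.groupdict())


if __name__ == "__main__":
    # Hypothetical names: a mainline archive and a branch hosted on someone else's server.
    main = parse_revision("jane@example.com--2002/hello--devo--1.0--patch-3")
    fork = parse_revision("bob@example.org--2002/hello--bob-fixes--1.0--base-0")
    print(main.branch_name())  # jane@example.com--2002/hello--devo--1.0
    print(fork.archive)        # bob@example.org--2002
```

Because the archive name is part of every identifier, Bob's branch can live on his own server with no coordination with Jane's.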

20 of 291 comments

Re:Question by Anonymous Coward · 2002-02-05 10:39 · Score: 0, Informative

Mr. Cowboy,
The only other system in "widespread" use is GNU RCS. GNU RCS is different in that it's actually used by respectable organizations, not just amateurish Cheap Software projects. This is because it is similar to older, commercial systems which aren't available on modern systems. Unfortunately, development of RCS has fallen by the wayside too, and CVS is almost guaranteed to be the only Cheap Software system available in five years.
But, honestly, if you have a Real Job at a Real Company, the cost of RCS is usually insignificant. And for personal projects and Cheap Software shit, CVS is more than adequate.
If you're like most Slashdot readers -- i.e. living in their parents' basements and "hacking" Perl while living on food stamps -- and thus forced to use CVS, I recommend Fogel's book. It's actually available online, but I recommend buying it, because it is unique in two aspects: first, it's the only Coriolis book I've seen that isn't total BULLSHIT; second, it's some of the best-written Cheap Software documentation you'll ever read. If you're poor or stupid enough to use Cheap Software RCS, this book is essential. Fogel is really too good for Coriolis... I'd like to see this book reprinted by New Riders.
HTH, you delicious newbie fag!
-- The_Messenger
Re:From his faq by The Man · 2002-02-05 10:55 · Score: 2, Informative

Anyone know a good system for incorporating source control with a database? Oracle and Postgres would do.
Well, it's certainly not a GOOD source control system, but I know for a fact that StarTeam uses a database backend. I'm pretty sure Rational ClearCase does also, and I'm told it sucks a good deal less. Anyway, there are a lot of problems with StarTeam: one is its strong preference for running on microshaft platforms, another its limited database support (Access, SQL Server, and Oracle only - gimme a break!), and another its outrageous cost (tens of $k for a small team plus massive server hardware). So, yeah, it's been done, but I'd much rather use even CVS than StarTeam. ClearCase, well, I'd love the chance to see it, but I never will at this cheapass company.
Re:sounds like ClearCASE by Anonymous Coward · 2002-02-05 10:55 · Score: 2, Informative

Except of course that ClearCASE costs money
Actually, I wrote an open-source implementation here (with a few additions: mounting the repository as a filesystem, and a couple of other things as I note them). Actually, I didn't really "write" it, just cleaned it up a little (besides these additions). The original open-source "implementation" is just the output of a program that turns my reverse-engineered bytecode into pretty object code. Then I gave it names and stuff.
NOTE: You can only do this with COPYRIGHTED but UNPATENTED software. You can't circumvent a patent by reimplementing it with different control structures and variable names. You CAN do so with a copyright. If the binary is totally different (based on objectification), then so is the content. (This is the "clean room" reimplementation you sometimes hear about.)
Re:And others by Polo · 2002-02-05 11:14 · Score: 3, Informative

Here is a comparison to cvs
Re:Question by T-Punkt · 2002-02-05 11:25 · Score: 2, Informative

Are you a troll or just uninformed?

CVS is built upon RCS, they use the same fileformat to store revisions. Actually you can see CVS as "RCS + network support". Using RCS instead of CVS doesn't buy you anything. Since you mentioned the GNU RCS homepage, this is from the GNU CVS homepage:

"While CVS stores individual file history in the same format as RCS, it offers the following significant advantages over RCS:
[...]"
(Read the rest on http://www.gnu.org/software/cvs/ )

So saying RCS is for "Real Jobs" at "Real Companies" and CVS is just for "amateurish Cheap Software projects" just makes you look pretty dumb IMHO.
Subversion or Arch or both? by kfogel · 2002-02-05 11:31 · Score: 5, Informative

I hope both systems (Arch and Subversion) get some widespread use. Like a lot of Subversion developers, I'm genuinely curious to see a) how well the Arch model works in practice, and b) how well Arch's implementation of that model works out. If it turns out to be winning, then that'll be a big step forward for collaborative projects & free software. Arch sounds a lot like Bitkeeper only without the license problems, and I've talked to some happy Bitkeeper users before (a small sample, so it's hard to know whether we're dealing with a Shift To Better Paradigm or just good software).
Subversion was deliberately designed to address CVS's shortcomings, not to break new ground. Our philosophy was essentially conservative: CVS basically works, but has some bugs and maintainability problems. Let's keep the model and fix the problems. Result: Subversion.
The ideal situation is a world where both models have good, free implementations. Then we'll all very quickly find out which model works better. :-)
-Karl

--
http://www.red-bean.com/kfogel
1. Re:Subversion or Arch or both? by qbalus · 2002-02-05 11:52 · Score: 3, Informative
  
  I've been keeping an eye on Subversion, as its goals are noteworthy. Fundamentally, the Bitkeeper (and now 'arch') model is very powerful. I used Sun's Teamware (Bitkeeper is an enhanced Teamware) in organizations with over 100 developers and remote development, and it required almost zero administrative overhead. The core of Sun's Teamware, Bitkeeper, HP's old KCS, Sun's Smerge/Smoosh, and 'arch' is simply the branch/merge capabilities. Once this problem is solved, the rest of the services can be built around it. This is where most SCM systems fall flat on their face... They lock you into a centralized server model, a clumsy user interface, cumbersome terminology, policies that don't meet the consumers' needs, etc...
  
  I view 'arch' as having a great model with a very simple implementation. Because of the simplicity, 'arch' developers will be able to respond very quickly with bug fixes and new functionality, and others can build around 'arch' to support their own policies and process flows.
  
  Regards,
  Kramer
Check out Meta-CVS. by Kaz Kylheku · 2002-02-05 11:33 · Score: 4, Informative

Adds renaming over top of CVS and some other niceties. Can be used to create patches that contain versioning changes. With Meta-CVS, people can restructure directories in conflicting ways, and then resolve conflicts when they merge the structure.

http://users.footprints.net/~kaz/mcvs.html

This doesn't add anything else; no atomic commits or distributed operation over multiple repositories, etc.

Of course, you can use branches to track foreign code streams, as you can with CVS. The nice thing is that you can rename things on your own branch and keep up with an unrenamed source of patches. Or if the other people are using Meta-CVS, they can give you patches that include restructuring.

Meta-CVS is currently about 1600 physical lines of Common Lisp (with some CLISP extensions and bindings to glibc2) scattered in twenty or so files. A lot is done with little!
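One way to get renaming, and mergeable restructuring, on top of a system that only versions file contents works like this. Each file is stored under a stable, meaningless ID, and a separately versioned map records the path each ID currently has. A rename then only edits the map, so history stays with the file. Merging two restructurings becomes a three-way merge of the maps. The sketch below assumes that design, which is roughly the approach Meta-CVS takes. It does not reproduce Meta-CVS's actual file formats or ID scheme, and `FileMap`, `rename` and `merge_maps` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical model: stable file ID -> current path. Content history is keyed by ID.
FileMap = Dict[str, str]


def rename(m: FileMap, old_path: str, new_path: str) -> FileMap:
    """Return a new map with old_path renamed; the file ID (and its history) is unchanged."""
    out = dict(m)
    for fid, path in m.items():
        if path == old_path:
            out[fid] = new_path
            return out
    raise KeyError(old_path)


def merge_maps(base: FileMap, ours: FileMap, theirs: FileMap) -> Tuple[FileMap, List[str]]:
    """Three-way merge of two restructurings; returns the merged map and conflicting IDs."""
    merged: FileMap = {}
    conflicts: List[str] = []
    for fid in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(fid), ours.get(fid), theirs.get(fid)
        if o == t:
            result = o             # both sides agree (or neither touched it)
        elif o == b:
            result = t             # only "theirs" moved or removed this file
        elif t == b:
            result = o             # only "ours" moved or removed this file
        else:
            conflicts.append(fid)  # both sides moved the same file differently
            result = o             # keep ours until a human resolves it
        if result is not None:     # None means the file was removed
            merged[fid] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"F-001": "src/main.c", "F-002": "src/util.c"}
    ours = rename(base, "src/main.c", "app/main.c")
    theirs = rename(base, "src/util.c", "lib/util.c")
    print(merge_maps(base, ours, theirs))
    # ({'F-001': 'app/main.c', 'F-002': 'lib/util.c'}, [])
```

Non-overlapping renames merge cleanly. Only a file moved to two different places on two branches is flagged for resolution.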
Re:is arch CVS compatible (like subversion)? by e40 · 2002-02-05 11:50 · Score: 2, Informative

According to this, there is a plan for CVS repository conversion.
Cool.
CVS is self contained by A nonymous Coward · 2002-02-05 11:54 · Score: 3, Informative

CVS hasn't invoked rcs or diff or anything for ages.

--
Infuriate left and right
Neither funny nor accurate by William Tanksley · 2002-02-05 12:08 · Score: 5, Informative

I'm surprised this one got modded up. The poster clearly knows nothing about the topic; it's just an ignorant flame.

In case anyone's wondering, arch supports and uses write permissions; however, it also allows you to start your OWN server, and people can hook up to it in parallel with the main server, and get all the branches which appear on either.

You can commit all the crashy code you want on your own server, but it won't affect anyone who isn't using your server.

The genius is that your server is hooked up to the original server, live, and you can track the changes they make, merging when and where you like. If the project manager for the original server feels like it (and if you let him), he can track the changes on your server as well. If someone else has started their own branch server, you can merge directly with them as well.
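The bookkeeping behind "track the changes they make, merging when and where you like" reduces to set arithmetic if each tree records which changesets it already contains. The work left to merge from any other server is then what that server has and you lack. Changesets you both picked up from a common source are not merged twice. This is a minimal sketch. It assumes every changeset has a globally unique name, and it is not arch's actual merge algorithm. `missing_from`, `record_merge` and the changeset names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical model: every changeset has a globally unique name, and a working
# tree remembers the set of changesets already applied to it (its "log").


def missing_from(local_log: Set[str], archives: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each archive, the changesets it has (in its own order) that the local tree lacks."""
    return {
        name: [cs for cs in history if cs not in local_log]
        for name, history in archives.items()
    }


def record_merge(local_log: Set[str], changesets: List[str]) -> Set[str]:
    """Stand-in for applying changesets: only records that they are now present."""
    return local_log | set(changesets)


if __name__ == "__main__":
    archives = {
        "main": ["main/patch-1", "main/patch-2", "main/patch-3"],
        # Someone else's branch server, which has already merged main/patch-1.
        "carol": ["main/patch-1", "carol/patch-1"],
    }
    mine = {"main/patch-1", "me/patch-1"}
    todo = missing_from(mine, archives)
    print(todo)  # {'main': ['main/patch-2', 'main/patch-3'], 'carol': ['carol/patch-1']}
    # Merge directly with carol's branch now; pick up the rest of main later.
    mine = record_merge(mine, todo["carol"])
    print(missing_from(mine, archives)["carol"])  # []
```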

VERY clever.

Although I don't dig the Subversion trashing; Subversion is also very cool for its own purposes. I'm glad Tom took the time to underline the differences, but I'm unhappy that the result is so slanted. It didn't need to be: both arch and Subversion stand on their own as superb projects, and there's even another one coming out of IBM "sometime" which has its own merits.

-Billy
Re:Confusion about version numbers. by wls · 2002-02-05 12:22 · Score: 3, Informative

Excellent point; poorly worded on my part. In general, your statement ought to be true about all version number schemes inside a repository. (Cederqvist, section 4.3)

Labels are our friends. Though, I've actually heard people using phrases like "We're modifying the 1.19.3.4.7.x branch today." That doesn't convey a lot of meaning.

However, based on real world practices, people tend to use revision numbers as version numbers. They shouldn't. And there is a difference between the two. Your point illustrates that well; thank you for raising it. I'd like to think Sun didn't tweak their internal revision numbers to mirror product version numbers.

Where I was going was: if numbers are going to be used to convey repository structure, the scheme should be hack-free. If revision numbers are going to be used to convey information, the user should have control over what gets used. In CVS's case, the reserved use (which I personally like) comes from [Cederqvist section 13]. PVCS is pretty darn good about giving the right level of control to those who want to twiddle numbers directly.

I'm simply saying it's up to the person running the repository to decide -- ideally they should have a clue of what works well.
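For readers who haven't run into CVS's numbering, its structure can be decoded mechanically:

- A trunk revision has two components (1.19).
- A branch is named by an odd number of components (1.19.4).
- A revision on a branch has an even number of components, more than two (1.19.4.2).
- In tag lists, CVS records a branch with a "magic" 0 in the second-to-last position (1.19.0.4 for branch 1.19.4).

The sketch below follows that description from the Cederqvist manual. `classify` is a hypothetical helper, and it does not check every rule CVS itself enforces.

```python
from typing import Tuple


def classify(rev: str) -> Tuple[str, str]:
    """Classify a CVS-style revision string; returns (kind, detail)."""
    parts = [int(p) for p in rev.split(".")]  # raises ValueError on non-numeric parts
    if len(parts) < 2:
        raise ValueError(f"not a CVS revision number: {rev!r}")
    if len(parts) == 2:
        return ("trunk revision", rev)
    if len(parts) % 2 == 1:
        # Odd count: a branch, rooted at the revision named by all but the last part.
        root = ".".join(map(str, parts[:-1]))
        return ("branch", f"branch {parts[-1]} rooted at {root}")
    if parts[-2] == 0:
        # Magic branch number: drop the 0 to recover the real branch number.
        real = ".".join(map(str, parts[:-2] + [parts[-1]]))
        return ("magic branch tag", real)
    branch = ".".join(map(str, parts[:-1]))
    return ("branch revision", f"revision {parts[-1]} on branch {branch}")


if __name__ == "__main__":
    for r in ["1.19", "1.19.4", "1.19.4.2", "1.19.0.4", "1.19.3.4.7"]:
        print(r, classify(r))
```

The long example from the comment, 1.19.3.4.7, decodes as "branch 7 rooted at 1.19.3.4". That is a branch off a branch, which illustrates why such numbers convey so little to a human.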
Re:gasp--a mess of shell scripts by btonkes · 2002-02-05 12:25 · Score: 2, Informative

From the cvs info pages:
CVS started out as a bunch of shell scripts written by Dick Grune, posted to the newsgroup `comp.sources.unix' in the volume 6 release of December, 1986. While no actual code from these shell scripts is present in the current version of CVS much of the CVS conflict resolution algorithms come from them.
A "mess of shell scripts" can be very useful for a proof-of-concept.
Re:From his faq by Graspee_Leemoor · 2002-02-05 12:41 · Score: 3, Informative

You'd think that using an RDBMS would give you lots of control over your source tree, but think again. Any decent revision control system works incrementally, i.e. you are storing deltas, not whole copies of every file.
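Here is what "storing deltas" means concretely. Rather than saving every revision in full, a system keeps one full text plus, for every other revision, only the line edits needed to rebuild it. RCS, for example, keeps the newest trunk revision in full and older trunk revisions as reverse deltas. The sketch below uses Python's `difflib` to compute line-based reverse deltas. It illustrates the idea only and is not RCS's `,v` file format. `History`, `make_delta` and `apply_delta` are hypothetical names.

```python
import difflib
from typing import List, Tuple

# A delta is a list of (start, end, replacement_lines) edits against the newer text.
Delta = List[Tuple[int, int, List[str]]]


def make_delta(newer: List[str], older: List[str]) -> Delta:
    """Edits that turn `newer` back into `older` (a reverse delta)."""
    sm = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    return [(i1, i2, older[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(text: List[str], delta: Delta) -> List[str]:
    out = list(text)
    for start, end, repl in reversed(delta):  # back to front keeps indices valid
        out[start:end] = repl
    return out


class History:
    """Keeps the newest revision in full and every older one as a reverse delta."""

    def __init__(self, first: List[str]):
        self.head = list(first)
        self.deltas: List[Delta] = []  # deltas[k] turns revision k+1 into revision k

    def commit(self, new: List[str]) -> None:
        self.deltas.append(make_delta(new, self.head))
        self.head = list(new)

    def checkout(self, rev: int) -> List[str]:
        """Rebuild revision `rev` (0-based) by walking deltas back from the head."""
        text = self.head
        for delta in reversed(self.deltas[rev:]):
            text = apply_delta(text, delta)
        return list(text)


if __name__ == "__main__":
    h = History(["a", "b", "c"])
    h.commit(["a", "B", "c", "d"])
    h.commit(["a", "c", "d"])
    print(h.checkout(0))  # ['a', 'b', 'c']
```

Keeping the head in full makes the common case (checking out the latest revision) cheap. Older revisions cost one delta application per step back.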

The indices (stuff this "indexes" crap) would be really bad and slow on all your tables.

Also RDBMSs suck at representing hierarchies, which source trees naturally are. In fact, I dare say the only reason that RDBMSs are so widespread and accepted today is that originally it was much faster to do this rather than use an OO, hierarchical way of doing things.
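For context on the hierarchy complaint, the usual way to put a directory tree into a relational table is an adjacency list: one row per node, pointing at its parent. Listing a whole subtree then needs a recursive query. SQL:1999 standardized this as `WITH RECURSIVE`, but many databases of the day lacked it or offered it only through proprietary syntax, so applications often issued one query per directory level. The sketch uses SQLite (which supports recursive CTEs) through Python's `sqlite3`. The `nodes` table and its contents are hypothetical.

```python
import sqlite3
from typing import List


def subtree(conn: sqlite3.Connection, root_id: int) -> List[str]:
    """Return the path of every node under root_id, using a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(id, path) AS (
            SELECT id, name FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, walk.path || '/' || n.name
            FROM nodes AS n JOIN walk ON n.parent_id = walk.id
        )
        SELECT path FROM walk ORDER BY path
        """,
        (root_id,),
    )
    return [path for (path,) in rows]


def example_tree() -> sqlite3.Connection:
    """A tiny hypothetical source tree stored as an adjacency list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(1, None, "src"), (2, 1, "lib"), (3, 2, "util.c"), (4, 1, "main.c"), (5, None, "docs")],
    )
    return conn


if __name__ == "__main__":
    print(subtree(example_tree(), 1))
    # ['src', 'src/lib', 'src/lib/util.c', 'src/main.c']
```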

The way you store things has to be written specifically so that it fits in with the way projects work and evolve.

Forgive the lameness. Haven't had my 2nd coffee of the day yet...

graspee
Re:Some SCM Observations by Anonymous Coward · 2002-02-05 12:54 · Score: 1, Informative

I've yet to see someone produce a readable guide about version control abstracted at a high level bringing all the terminology together. (Incidentally, I'm about to release one; email me for a draft.)

Some good work in clearing up CM terminology has been done in "Streamed Lines: Branching Patterns for Parallel Software Development".
Re:Seems like a big step backwards... by Fweeky · 2002-02-05 13:04 · Score: 3, Informative

Well, it's not *entirely* in sh:

Totals grouped by language (dominant language first):
ansic: 61064 (66.48%)
sh: 27853 (30.32%)
lisp: 1868 (2.03%)
awk: 1044 (1.14%)
sed: 24 (0.03%)

(If you want more detail, run sloccount over it yourself)
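Each percentage is that language's share of the total physical source lines, which come to 91,853 here. The quick check below assumes the per-language counts above are exact. `shares` is a hypothetical helper, not part of sloccount.

```python
# Per-language physical SLOC counts copied from the sloccount output above.
COUNTS = {"ansic": 61064, "sh": 27853, "lisp": 1868, "awk": 1044, "sed": 24}


def shares(counts: dict) -> dict:
    """Each language's percentage of the total, rounded to two places as in the output."""
    total = sum(counts.values())
    return {lang: round(100 * n / total, 2) for lang, n in counts.items()}


if __name__ == "__main__":
    print(sum(COUNTS.values()))  # 91853
    print(shares(COUNTS))
```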

Anyway, it could be worse; it could be written in Perl ;)
Java package name prefacing by smcv · 2002-02-05 13:40 · Score: 2, Informative

The canonical package name for your widgets would be nz.net.neural.(anything you like here)

If you own multiple domains (subdomains, or not), you pick one or more to use. The most sensible strategy would be to pick the one you were most likely to keep. Whether it corresponds to a real web page, or server, or whatever really doesn't matter - all that matters is that you control the neural.net.nz domain, and you don't use the same package name for different things as anyone else at that domain.

You do use directories for package name components - the class file for nz.net.neural.widgets.Widget (the convention is for class names to have initial caps) should go in nz/net/neural/widgets/Widget.class (replace / with your OS's directory separator if you don't use Unix). You often don't see this because classes are in .jar files, which have their own internal directory structure (they're slightly modified zip files).

The domain has to be written backwards to put the most significant part first (otherwise neural.net and neural.net.nz would have overlapping namespaces, even though they might be owned by different people).
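Both conventions described above are mechanical: reverse the domain to get the package prefix, then map package components to directories. `package_prefix` and `class_file_path` are hypothetical helpers. The sketch always uses `/` as the separator, and it ignores the extra rules for domain components that are not valid Java identifiers (such as hyphens or leading digits).

```python
from pathlib import PurePosixPath


def package_prefix(domain: str) -> str:
    """neural.net.nz -> nz.net.neural (most significant part first)."""
    return ".".join(reversed(domain.lower().split(".")))


def class_file_path(qualified_class: str) -> str:
    """nz.net.neural.widgets.Widget -> nz/net/neural/widgets/Widget.class."""
    *package, simple_name = qualified_class.split(".")
    return str(PurePosixPath(*package, simple_name + ".class"))


if __name__ == "__main__":
    prefix = package_prefix("neural.net.nz")
    print(prefix)                                        # nz.net.neural
    print(class_file_path(prefix + ".widgets.Widget"))   # nz/net/neural/widgets/Widget.class
    # Reversal keeps neural.net and neural.net.nz apart: net.neural vs nz.net.neural.
    print(package_prefix("neural.net"))                  # net.neural
```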
Re:Uggghhh.... [OT] by cduffy · 2002-02-05 14:03 · Score: 2, Informative

What it would do is force the downstream forks to stay sync'd with Linus's version, and thus make merging between them easier. Yes, the code still needs to be reviewed -- but that's not the only task involved in maintaining a tree.
Re:Seems like a big step backwards... by gstein · 2002-02-05 14:22 · Score: 2, Informative

It may be interesting to note that you can do an "svn commit" to check in a change to a .html file and have it immediately appear on your web site. In fact, SVN uses a URL to specify the repository to check out. That URL can be your website. For example:
$ svn checkout http://mysite.example.com/ -d site
$ jed site/index.html
$ svn commit -m "more tweaks" site
Your tweaks are immediately published.

(of course, it sounds like you want a staging server in there, and some kind of workflow, but that can be done and is an exercise for the reader... :-)
Re:From his faq by owenomalley · 2002-02-05 17:43 · Score: 2, Informative

Actually, I'm team lead on a CM system where all of the metadata is in Sybase. We use Sybase replication to keep multiple servers at different sites in sync with each other. (Sybase has a nice replication model that stores changes in a stable queue until the remote server is available again.) Anyways, using a real database means that our tool scales to insane levels (we see peaks on one project of 20,000 file versions/day). We also get the ability to do live backups, etc. It is also very nice being able to write ad hoc queries against the database in SQL (e.g., in the last month, show me how many file versions were generated at each site on each day).

While we keep all of the metadata in Sybase, we store the actual bits in the filesystem.
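Once the metadata is relational, the ad hoc query mentioned above ("in the last month, show me how many file versions were generated at each site on each day") is a single GROUP BY. The poster's Sybase schema is not described, so the `file_versions` table, site names and dates below are hypothetical. SQLite stands in for Sybase, and date functions differ between databases.

```python
import sqlite3
from typing import List, Tuple


def versions_per_site_per_day(conn: sqlite3.Connection, since: str) -> List[Tuple[str, str, int]]:
    """Count file versions per (site, day) created on or after `since` (YYYY-MM-DD)."""
    return conn.execute(
        """
        SELECT site, date(created_at) AS version_date, COUNT(*) AS versions
        FROM file_versions
        WHERE created_at >= ?
        GROUP BY site, version_date
        ORDER BY version_date, site
        """,
        (since,),
    ).fetchall()


def example_db() -> sqlite3.Connection:
    """Hypothetical metadata table: one row per checked-in file version."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_versions (id INTEGER PRIMARY KEY, path TEXT, site TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO file_versions (path, site, created_at) VALUES (?, ?, ?)",
        [
            ("src/a.c", "austin", "2002-01-20 09:15:00"),
            ("src/b.c", "austin", "2002-01-20 16:40:00"),
            ("src/a.c", "bangalore", "2002-01-20 11:05:00"),
            ("doc/x.txt", "austin", "2002-01-21 08:00:00"),
            ("src/old.c", "austin", "2001-11-30 12:00:00"),  # outside the window
        ],
    )
    return conn


if __name__ == "__main__":
    for row in versions_per_site_per_day(example_db(), "2002-01-05"):
        print(row)
```

The file contents themselves stay out of the database, as the poster describes. Only the small, query-friendly metadata rows go in.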