Tom Lord's Decentralized Revision Control System
Bruce Perens writes: "He'll have to change its name, but Tom Lord's arch revision control system is revolutionary. Where CVS is a cathedral, 'arch' is a bazaar, with the ability for branches to live on separate servers from the main trunk of the project's development. Thus, you can create a branch without the authority, or even the cooperation, of the managers of the main tree. A global name-space makes all revision archives worldwide appear as if they are the same repository. Using this system, most of what we do using 'patch' today would go away -- we'd just choose, or merge, branches. Much of the synchronization problem we have with patches is handled by tools that eliminate and/or manage conflicts -- they solve some of the thorny graph topology issues around patch management. Arch also poses its own answer to the 'Linus Doesn't Scale' problem. This is well worth checking out." If you're asking "What about subversion?", well, so is Tom.
In his FAQ he states it works on any system that's POSIX compliant.
/me high-fives Tom
I guess I'm wondering why arch uses FTP as its network protocol. The FAQ says that it should be workable behind firewalls since the data is all transferred in passive mode, but this still seems like a huge step backwards.
So, what am I missing? I only got to read a little bit of the site before it got DDOS'd by slashdot.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Not only 'what about Subversion' but also 'what about CVS, what about Aegis'. If you include non-free systems then what about Perforce or Bitkeeper.
:-(.
This is getting worse than journalling filesystems
-- Ed Avis ed@membled.com
A global name-space makes all revision archives worldwide appear as if they are the same repository
I don't know whether to laugh or cry...
A more distributed source control system could obviously circumvent problems like these, but with this caveat: the code that different groups work on would need to be sufficiently black boxed that most changes wouldn't require changes in other projects. It's just good programming style, but I know that this wasn't the case at ACME, and given my experiences with Corporate America I doubt it's true in most places. Maybe I'm just being pessimistic...
Anyway, it sounds like a good idea if it's used right.
That sounds like hype. In the real world, selecting the aspects of software we want to compile from on remote sites would have serious implications. The first being security. The second being quality. Linus may not scale, but he has good judgement. That's the fundamental problem.
The ability to do distributed development, manage multiple (possibly hostile or private) branches at once, good merge and diff tools, etc. sounds sort of like ClearCASE. Except of course that ClearCASE costs money, and doesn't have the global namespace thing going on. Rational had better be careful or their customers are going to move over to arch (especially since their Unix GUIs have sucked more and more with each successive release).
Bravo to the author on this tool - it sounds like a great advance of the state of the art if it works like he says.
Other than CVS and arch, are there any other GPL'd revision control systems available, and how would you rate them?
Muchas Gracias, Señor Edward Snowden !
Coherent code is for morons. Especially if it works without crashing and corrupting all your data.
Call me a dummy but I assumed he meant the possibility of corrupting a distributed global namespace. I presume this features some form of strong authentication system (couldn't reach the site) but it could be pretty hairy if you were doing a make world out of this using any "unofficial" patch sources, but we all audit all the code we run don't we!
Never underestimate the dark side of the Source
If your post is 1 character long, then yes.
ClearCASE has been doing this for many years now.
Nothing new. not revolutionary...
i know. I'm just an AC, but i am right.
ACID (Atomicity, Consistency, Isolation and Durability) is only something that has been implemented and tested well on high-read RDBMSs such as Oracle.
When you think about that, why is it that no one is using a DB backend for source control? Wouldn't that just get rid of so many ambiguities? For one, we wouldn't have to deal with all the nonsense and create a million wheels, when a nice pair of Rolls-Royces resides with a good RDBMS.
People need to think outside their brains, and in regard to source control, I feel we need to make more packages that interface well with a good RDBMS rather than create our own RD functionality in 40ks. What's the use?
Anyone know a good system for incorporating source control with a database? Oracle and Postgres would do.
We tried - briefly - VSS in a project involving approximately 15 developers in the same building. It was slow and awful.
CVS may not integrate so prettily into VC++, but it does work! We found switching over to CVS to be relatively painless: the only problem was that sometimes a file would be edited using Notepad or something, that shouldn't have been, which introduced ^M characters that confused CVS.
Extrapolating from our experiences, the reason why VSS worked so poorly for your company might be more due to the quality of VSS rather than the degree of distribution of your developers.
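The ^M problem mentioned above is a line-ending mismatch: the editor saved the file with CRLF line endings. A minimal sketch of a pre-commit cleanup, assuming hypothetical helpers `normalize_newlines` and `has_carriage_returns` that are not part of CVS itself:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def has_carriage_returns(data: bytes) -> bool:
    """Report whether the content contains any ^M (carriage return) bytes."""
    return b"\r" in data
```

Running something like this over text files before check-in keeps the stray ^M characters out of the repository. It should not be applied to binary files.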
This looks really cool, if only for the fact that it finally has a sane way to rename files. It's annoying renaming, deleting, removing, and adding with CVS.
A deep unwavering belief is a sure sign you're missing something...
I've been struggling with CVS for a while now, and while it does the job I've always been thinking "There's got to be something out there with recursive add built in."
Now here comes slashdot with an actual useful story about source control and some of the options and development outside of CVS.
The only thing to find out now is if the discussion will be of any use, obviously I'm not helping...
It sounds like it has a lot of nice features, but then you realize the whole thing is written in sh? One of the nice things about CVS is that the client-server nature allows someone to use pretty much any operating system as a client. Subversion takes this to the next step, by making all connections use the client-server model.
Forcing everyone to use sh is a major hassle. I know that it would work with any "reasonably POSIX" OS, but then developers can't get arch accessibility built into their favorite tools, like NetBeans or whatever.
Creating local branches is pretty cool, though.
Mike
How about polyfork? Sounds like a great way to give equal weighting to every trivial disagreement over design.
Proud member of the Weirdo-American community.
From the article, it looks good.
But let me say that I've sometimes been in the position of having to merge branches. In my first hacking job, I had to take code that had been written by 2 crazy Polish programmers, and merge 37 non-working branches into one branch that worked. It was *not* fun, and I enjoyed a well-deserved beer when it was done.
IMO, a distributed system of archive management that doesn't make ongoing reference to a central tree is a sure recipe for chaos, and poses the risk of making software harder to install/use for the non-skilled, and creating a lot of work in merging disparate branches for the skilled.
You want package xxyzz? OK - go to Jim's store in San Diego. It's easy to set up. Oh, I forgot to tell you, you've gotta get some bits from Lucy's store in Manchester, and Frieda's fixed a few bugs too - get her fixes from Bonn. And don't forget Peter's enhancements - his store is at the Adelaide University site. What? it doesn't compile? What kind of idiot are you? Just hack it till it does compile, then put it together in your own tree!
-- In the beginning was the WORD, and the WORD was UNSIGNED, and the main(){} was without form and void...
It is an important feature of subversion that it will be CVS compatible. I manage a 10+ year old/1+GB CVS repository. CVS has a lot of faults, but I can't throw that version history away. It's too valuable. subversion gives me hope that I'll get something more usable than CVS (we'll see, won't we!) without much pain.
I'm really hoping the subversion developers succeed.
Having said that, I'm all for arch succeeding too. Perhaps it will be better for new projects. Who knows.
first Sourceforge kills Sourcecast(sourcexchange)... Now subversion gets beaten to the punch.
Bet Karl Fogel's peeing his pants right now.
Poor execution Behlendorf... sell the company... get back to what you do best -> apache.
Will RIAA attack me if I put mp3 files in the source tree?
(with the permission of the author, performer, their music studio, and Aunt Tillie of course)
This seems like it's worse than CVS. Functionally, I'm quite happy with CVS. The main complaint I have about it is that it isn't self-contained but invokes rcs and other shell commands in mysterious ways. "arch" seems to make things worse, not better in that regard. What I would like to see is something mostly like CVS, but something that is implemented as a clean, self-contained library with a single command line executable (with subcommands) and a built-in HTTP-based server. Until that comes along, I think I'll just stick with CVS.
Well, flowerpot, now I'm wondering whether arch uses the ftp programs, or just the ftp protocol. That is, do you need an ftp client or server installed for arch to work? From what I've seen it wouldn't be too hard to do the protocol yourself.
I still can't get to the site, so oh well.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Subversion was deliberately designed to address CVS's shortcomings, not to break new ground. Our philosophy was essentially conservative: CVS basically works, but has some bugs and maintainability problems. Let's keep the model and fix the problems. Result: Subversion.
The ideal situation is a world where both models have good, free implementations. Then we'll all very quickly find out which model works better. :-)
-Karl
http://www.red-bean.com/kfogel
Adds renaming over top of CVS and some other niceties. Can be used to create patches that contain versioning changes. With Meta-CVS, people can restructure directories in conflicting ways, and then resolve conflicts when they merge the structure.
http://users.footprints.net/~kaz/mcvs.html
This doesn't add anything else; no atomic commits or distributed operation over multiple repositories, etc.
Of course, you can use branches to track foreign code streams, as you can with CVS. The nice thing is that you can rename things on your own branch and keep up with an unrenamed source of patches. Or if the other people are using Meta-CVS, they can give you patches that include restructuring.
Meta-CVS is currently about 1600 physical lines of Common Lisp (with some CLISP extensions and bindings to glibc2) scattered in twenty or so files. A lot is done with little!
So here it goes...
What is your favorite revision system and why?
What is the URL?
Is it open source or proprietary?
Having just gone thru the regexps arch docs it looks like Tom has done an excellent job.
The concepts Tom discusses are right on target. It appears there is plenty of flexibility to implement policy based on project requirements.
At first glance (have not installed yet) 'arch' has many of the same concepts as does BitKeeper and Sun's Teamware.
I'd like to see software like this dual licensed, i.e. Qt/BitKeeper style licenses.
Regards,
Kramer
ACID (Atomicity, Consistency, Isolation and Durability) is only something that has been implemented and tested well on high-read RDBMSs such as Oracle.
Oh, come on. ACID isn't that hard to do. Lots of systems implement ACID. Why do you imagine that only Oracle, etc., can do it?
- jon
Ganymede, a GPL'ed metadirectory for UNIX
Let's say that I don't have write access to the Linux kernel tree. So I go grab a copy and make a branch on my machine and fix it. So then I post to the kernel mailing list saying that I've fixed this bug. Linus gets all excited and wants to merge my branch in, but he can't because I am offline. So he forgets, and nothing happens.
Now you could say that I could upload it to the central server, but I don't have write access to that. I wouldn't imagine that they would give me (a non-kernel developer, trust me, I'd break something) access to the tree.
I guess I just don't get how useful this will be.
If you read the paper that compares arch with subversion you'll see near the end that arch is written in about 10K lines of shell, sed and awk!
That must be awful (awkful)! Just think about trying to maintain that beast!
It would seem a lot easier to write something like this in a language like Perl or Ruby (I actually prefer Ruby these days, but that's another subject....) and it would be more easily maintained.
Conceptually, I kind of like arch - subversion does seem to have a lot of dependencies - a DB, a web server. Maybe it would be a good project to reimplement arch in something like Ruby?
I've done SCM for a number of years, professionally evaluated version control products, and helped edit an Anti-Pattern book on the subject. It seems, at least to me, that the majority of version control systems out there have the bases covered when it comes to check-in, check-out, branching, and labeling. The standard features, if you will.
However, most of the times I've seen companies change version control systems, it was for completely different reasons. Here are a few that come to mind:
- A version control system must be fast. I worked at one company where we tried to use Visual SourceSafe over a WAN; it took HOURS to share code. A good VCS should transmit the minimal amount of data.
- A version control system must provide security. All too often management uses the SCM repository as kind of a shared directory (BAD, BAD, BAD) -- and people who have no need to see or modify the code, do... implicitly.
- A version control system should provide extensive auditing and notification capabilities that can be discretely turned on and off. Allow logging the positive, the negative, and letting people know when particular operations happen to a set of files. In one case we attempted to get PVCS to automate scripts on a change to send mail to the PM. Checking in a directory flooded inboxes, since it could audit collections of code.
- There MUST be a recovery mechanism. Ever try to recover a lost SourceSafe password? Yikes. (Gaining re-entry is possible: back stuff up, change your password, do a diff. Copy pattern into the admin record with hex editor. Login as admin with new password. Change admin password. ...this worked at least twice for me.)
- Again, there MUST be a recovery mechanism. I love RCS, SCCS, and PVCS for their file-related mechanisms. Why? I've had SCM systems go down hard when the database got munged. Yes, you can recover from a backup, but a lot of work gets lost. With an open file format, you can at least hand fix localized problems.
- That said, good version control systems should allow you to check in collections of files as atomic units, move files and directories, and operate on projects as a whole. Anytime I have to twiddle with a repository, thereby breaking past history, something is seriously wrong with the VCS model.
- Good systems must have an IMPORT / EXPORT capability that PRESERVES HISTORY. The less I feel locked into a solution, the more likely I'll be to try it out. Porting between systems is usually painful.
- SCM systems must conform to how the CM manager wants to run things, not the other way around. Let's face it, users can and will make mistakes, and that's okay. Mistakes should be fixable. I'll never use StarTeam because it was too easy for users to accidentally check in branches that couldn't be removed. Tech support argued that version control should reflect the history of the product, where I maintained (and still do) that it should reflect the intended history. If I want to include user errors, that should be my policy, not the tool's. My users should be able to reflect upon the project history and know why things changed. Period. You don't use a hack to undo a mistake.
- Branching notation should be clear and to the point. CVS has its magic numbers, StarTeam has god-awful views. Let me choose the numbering scheme; don't play games with odd/even numbering. Version numbers should not be overloaded to carry additional meta-information by the product.
- A good SCM tool should remember tag history. Suppose I accidentally move or delete a tag, now I want to put it back. Suppose I want to see where it's been. This case is rare, but anyone who's had a user twiddle with the wrong tags feels this pain as sharp and deep.
- More ADMINISTRATIVE control. My big beef with CVS is when I have to twiddle with the repository structures and permissions directly to accomplish what I want done. No. No. No. There should be a tool (that audits changes) for standard operations.
- An admin should have the ability to define, enforce, and audit user permissions that should be applied cross dimensionally against repository, commands, and elements within the repository.
- Data should be stored in a manner that can be parsed by custom tools. It allows me to write extensions and automation.
- Nothing should be possible in a GUI that is not possible from the command line. The inverse holds true as well. Everything should be automation friendly. Early versions of PVCS pissed me off for this reason. As a SCM manager, I've used both, and I'll take a command line over a GUI any day. My novice users want a GUI, my advanced ones usually revert back to command lines (and integrate it with their editors).
- There must be readable 2 and 3 way diffs.
- A good SCM tool will be able to produce reports, or at least make it possible to export information that can produce reports.
- A good SCM tool should know how to handle binary files efficiently, rather than just storing the whole copy.
- A good SCM system should not put a limitation on comments.
- A good version control system should not try to "do it all" (CCC/Harvest) and do none of it well. When GUI's pop up off screen, or you have to artificially create packages for simple files, something's wrong. Which leads into...
SCM systems should operate the way the users of that system do.
There is a BIG difference between how commercial houses run things versus OpenSource projects.
Commercial groups usually have a smaller set of developers who are known in advance, and commonly use the locking model. OpenSource projects tend to use concurrency a lot more, and operate by applying diffs. (Yes, I know, exceptions are out there.)
Thus, some tools that feel more natural in some environments get quickly rejected in others. I've yet to see someone produce a readable guide about version control abstracted at a high level bringing all the terminology together. (Incidentally, I'm about to release one; email me for a draft.)
The overall problem tends to be that people look on the side of the box for features, rather than asking if the features are even applicable to what they're doing.
Worse yet, proper SCM often gets sidestepped in commercial world. Ask: Do you want branching? You get, is it a feature?...yes! Now ask: Do you know when it's appropriate to branch, how to do the branch efficiently, how to graft branches back to the root, or how to physically do it... and you find out this is where a lot of bad CM happens. It isn't fun to inherit a screwed up repository.
The most common downfall of SCM, as I've seen in the commercial world, is failure by those running it (quite often over-tasked infrastructure people) to understand the product being built with the tool, failure by team leads to communicate repository structure, failure by management as they use the SCM tool as a substitute for communication, and failure by developers who don't know how to use the tool or when to use the appropriate features.
CVS hasn't invoked rcs or diff or anything for ages.
Infuriate left and right
This is incorrect. The CVS numbers are internal. If you care about them at all, you are doing something wrong. Your baselines and branches are identified by tags. If you understand how the CVS numbers work, they are actually quite logical; there are reasons why they work the way they do. It's not playing ``games''.
Version numbers *are* meta-information, so it's meaningless to talk about them being overloaded with metainformation. They are not intended to correspond to your product release numbers, which are usually the fabrications of a marketing department anyway, like e.g. Solaris 7 being the followup to 2.6. Do you think the Sun guys bumped up their version control system to use the number 7? ;)
Perforce incorporates a very fast database back-end to manage metadata on the server side. It's based loosely (many improvements since) on the Berkeley DB format.
It also includes some very nice database journalling and checkpointing features for robustness.
It's free for normal two-user, two-client use even. Only drawback: no open source.
I'm surprised this one got modded up. The poster clearly knows nothing about the topic; it's just an ignorant flame.
In case anyone's wondering, arch supports and uses write permissions; however, it also allows you to start your OWN server, and people can hook up to it in parallel with the main server, and get all the branches which appear on either.
You can commit all the crashy code you want on your own server, but it won't affect anyone who isn't using your server.
The genius is that your server is hooked up to the original server, live, and you can track the changes they make, merging when and where you like. If the project manager for the original server feels like it (and if you let him), he can track the changes on your server as well. If someone else has started their own branch server, you can merge directly with them as well.
VERY clever.
Although I don't dig the Subversion trashing; Subversion is also very cool for its own purposes. I'm glad Tom took the time to underline the differences, but I'm unhappy that the result is so slanted. It didn't need to be: both arch and Subversion stand on their own as superb projects, and there's even another one coming out of IBM "sometime" which has its own merits.
-Billy
How about polyfork?
... as one can spoon changes back into whichever tree one is following, knife out other changes, and fork the system themself if they wish.
Silverware would be a better name
Seriously, this wouldn't give equal weighting to every trivial disagreement any more than free source code does anyway. Whether the control system is subversion, cvs, arch, or plain old text files, we as individuals choose which fork we want to follow. Indeed, currently the mechanism in use is ftp (or alternatively http/rsync), i.e. do you ftp linux-2.4.17.tar.gz, linux-2.4.17-ac3.tar.gz, or linux-2.4.17-myfork.tar.gz. Your decision is based on your trust of Linus, Alan Cox, or myself (probably nil). Using arch wouldn't change this, it would merely give you more flexibility in choosing bits of the Linus kernel, bits of the AC kernel, etc. in creating your own, personal fork that reflects your values and interests, and if others like your choices, they can benefit as well. If they ignore your choices, then who cares? You still benefit in having been able to make and prosper from your choices yourself.
How on earth could that be a bad thing?
That having been said, my wishlist would be support of gnupg signatures and authentication and scp instead of ftp. As to it being written in a shell scripting language, so what. If you really want to run a client or (god forbid) a server under Windows, there is nothing preventing you from writing a compatible client or server in the programming language of your choice (although the mockery one would receive for having used Visual Basic would probably detract some from the feeling of accomplishment, but I digress).
The Future of Human Evolution: Autonomy
Problems with the Arch vs Subversion comparison faq listed in the article.
:)
:) No worries. :)
1. Allowing the "smarts" to reside within the clients means you are stuck with however a client is configured--how do you handle modification times if the client's clock is skewed, or messed up hard-drives? Administration of such a distributed system would be a heck of a job and not one I'd enjoy doing when you get up into the hundreds of clients.
2. SCM based on a server-client system can be unerringly fast--blindingly so. Even if communication is entirely over TCP channels, an SCM system can be built to be a speed demon.
Some interesting ideas: I like the idea of being able to mirror the files to a backup server--but there are still some pretty annoying replication problems that would need to be worked out. Does development stop while the backup is brought as up to date as possible? Do developers still have to check in their files once more if those were ruined since the last backup or mirroring cycle?
I really think these projects need to stop playing catch-up to the larger SCM systems and start leading the field with advanced and stable functionality.
Too bad I shouldn't build one.
Most people who release cool free software these days have their own domain name. This will be simply another reason to get a good one. Even if you haven't got a domain name you could use sourceforge, so for example, my toy programming language would be net.sourceforge.stalk. I can definitely live with that.
graspee
RCS to CVS to arch, same story, a decade later. However, arch is far more competitively priced. ;-)
"To those who are overly cautious, everything is impossible. "
I know you probably know, but that was a different Death Star- v2.0 if you like. The reason it looked the same is because the (evil) Ewpire said they were not going to release any new battlestations, just concentrate on the security and stability of the old design.
graspee
I'm sure that down the road it would be a very slick thing to use the rsync protocols for data transfer between sites, as implemented in rsync and Unison. That would provide all sorts of ooey-gooey encrypted, compressed goodness to help network connections be used more efficiently.
The file transfer protocol isn't nearly as important as how it deals with versioning, logging, and the like, to be sure...
If you're not part of the solution, you're part of the precipitate.
I am getting soooo tired of this notion:
Arch also poses its own answer to the 'Linus Doesn't Scale' problem.
Look people, the "Linus doesn't scale" issue is NOT something that can be solved by replacing the use of 'patch'. Putting the Linux kernel on CVS (or Arch or whatever) would just allow people to commit stupid changes.
The reason Linus doesn't scale is not because he doesn't have enough time to run 'patch'. It's because changes to the kernel MUST be approved.
The standard that SCO, Solaris, and *BSD can probably all conform to.
With its POSIX subsystem.
With its POSIX subsystem
With BSD underneath.
I suppose Tandem may not be emulating a flavor of Unix, but who's got one of those at home? PalmOS isn't a Unix-like system, but it's getting pretty long in the tooth, and isn't a tremendously viable platform for arch anyways.
It's not outrageous to suggest that Unix has effectively "won" the mind-share war.
If you're not part of the solution, you're part of the precipitate.
Until those scripts are rewritten in C, I think I'd rather use Subversion.
So if I develop Java widgets, I should preface them net.neural.widget?
Or do I give them nz.net.neural.widget?
And what about when I get projects from subdomains going? games.net.neural.widget? or net.neural.games.widget?
Wouldn't it be easier to just use the domain name in the order it works online?
And is it better to use directories than subdomains ?
- Kaos games and encryption systems developer
Uh, no, decentralization is a capitalist idea. Centralization ("planned economy") is a communist idea. Loser.
What if I had released my projects a few years ago when I was at CIT?
I have nz.ac.cit.ee.amcp.lpg.widget then when I drop out I have to transfer to nz.net.paradise.amcp.widget
then my corporate version becomes nz.co.netlogic.esee.widget which is stolen and I lose control of that.
Then you get 2 competing projects called com.e-see.widget and nz.ac.vuw.sci.math.amcp.lpg.widget for a few months then I drop out of uni when I'm sick and go back to paradise then, now I run it from my domain name nz.net.neural.lpg.widget
Wouldn't it be easier to use a URI for the widget name?
- Kaos games and encryption systems developer
You have a good point. I also cringe upon hearing these overused terms 'cathedral' and 'bazaar'.
I wonder.
The canonical package name for your widgets would be nz.net.neural.(anything you like here)
If you own multiple domains (subdomains, or not), you pick one or more to use. The most sensible strategy would be to pick the one you were most likely to keep. Whether it corresponds to a real web page, or server, or whatever really doesn't matter - all that matters is that you control the neural.net.nz domain, and you don't use the same package name for different things as anyone else at that domain.
You do use directories for package name components - the class file for nz.net.neural.widgets.Widget (the convention is for class names to have initial caps) should go in nz/net/neural/widgets/Widget.class (replace / with your OS's directory separator if you don't use Unix). You often don't see this because classes are in .jar files, which have their own internal directory structure (they're slightly modified zip files).
The domain has to be written backwards to put the most significant part first (otherwise neural.net and neural.net.nz would have overlapping namespaces, even though they might be owned by different people).
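To make the naming rule concrete, here is a small sketch of the mapping described above. The helper names `package_prefix` and `class_file_path` are made up for illustration; they are not part of any Java tooling.

```python
from pathlib import PurePosixPath


def package_prefix(domain: str) -> str:
    """Reverse a DNS name so the most significant label comes first."""
    return ".".join(reversed(domain.lower().split(".")))


def class_file_path(package: str, class_name: str) -> str:
    """Map a package and class name to the relative path of its .class file."""
    return str(PurePosixPath(*package.split("."), class_name + ".class"))
```

With this, `package_prefix("neural.net.nz")` gives `nz.net.neural` and `package_prefix("neural.net")` gives `net.neural`, so neither name is a prefix of the other once reversed. `class_file_path("nz.net.neural.widgets", "Widget")` gives the Unix-style path `nz/net/neural/widgets/Widget.class`.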
If so, you've noticed that when you choose to merge data from branch (A) into branch (B) [no, it *doesn't* happen automatically unless you want it to!], then you have *control* over what parts of A go into B. You may have noticed that you can ask for the differences between A and B, and go through them by hand, and accept only specific parts -- just as someone doing patching does.
No revision control system tries to replace good maintainership -- rather, their job is to make it easier.
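The "accept only specific parts" idea can be sketched in a few lines. This is an illustration only, not how arch or any other tool actually merges: it is a plain two-way comparison built on Python's `difflib`, and the function names `list_hunks` and `apply_selected` are invented for the example.

```python
from difflib import SequenceMatcher


def list_hunks(target, source):
    """Return the non-equal opcodes describing how source differs from target."""
    matcher = SequenceMatcher(a=target, b=source, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def apply_selected(target, source, accepted):
    """Rebuild target, taking source's lines only for hunks whose index is accepted."""
    matcher = SequenceMatcher(a=target, b=source, autojunk=False)
    result = []
    hunk_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(target[i1:i2])
            continue
        if hunk_index in accepted:
            result.extend(source[j1:j2])  # take the change from the other branch
        else:
            result.extend(target[i1:i2])  # keep our own lines untouched
        hunk_index += 1
    return result
```

Here `target` plays the role of branch B and `source` the role of branch A, both as lists of lines. The maintainer reviews `list_hunks(b, a)` and passes the indexes of the hunks worth taking to `apply_selected`; an empty set leaves B unchanged.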
The data manager implements optimized database services based on the Berkeley DB database package, customized for multi-user support. It maintains a meta-database describing the status and history of versioned files in the depot and transactions against the depot. The librarian is a highly efficient file archiver that stores repository files on disk local to the server. It writes text file versions in an RCS-compatible, reverse-delta format; binary file versions are stored in a standard compressed format.
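For anyone unsure what "reverse-delta" means in that description: the newest revision is kept in full, and each older revision is stored only as the changes needed to rebuild it from the next newer one. A toy sketch follows, assuming an invented delta encoding (copy ranges plus literal lines) that is neither the real RCS format nor Perforce's.

```python
from difflib import SequenceMatcher


def make_delta(newer, older):
    """Encode older as copy ranges taken from newer plus literal lines."""
    delta = []
    matcher = SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # replace or insert: the older lines must be stored literally
            delta.append(("data", older[j1:j2]))
        # delete: lines exist only in newer, so there is nothing to record
    return delta


def apply_delta(newer, delta):
    """Rebuild the older revision from the newer one and a delta."""
    out = []
    for step in delta:
        if step[0] == "copy":
            out.extend(newer[step[1]:step[2]])
        else:
            out.extend(step[1])
    return out


class ReverseDeltaStore:
    """Keeps the head revision in full and older revisions as reverse deltas."""

    def __init__(self):
        self.head = None
        self.deltas = []  # deltas[k] rebuilds revision k+1 from revision k+2

    def commit(self, lines):
        if self.head is not None:
            self.deltas.append(make_delta(lines, self.head))
        self.head = list(lines)

    def checkout(self, revision):
        if self.head is None or not 1 <= revision <= len(self.deltas) + 1:
            raise ValueError("no such revision")
        text = self.head
        # walk backwards from the head, one delta per older revision
        for k in range(len(self.deltas) - 1, revision - 2, -1):
            text = apply_delta(text, self.deltas[k])
        return list(text)
```

The trade-off is that checking out the latest revision, the common case, costs nothing extra, while old revisions get slower to rebuild the further back they are.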
Sourceforge has a shortened domain name too. So that would make it just net.sf.stalk.
Which (arch or subversion) manages conflicts best? And how do they differ from cvs? It looks like subversion versions per commit rather than per file. Could anyone shed some light on this?
i thought you needed some sort of atomic test/exchange method to ensure consistency in such situations?
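One way to get an atomic test-and-set on a plain filesystem, without a database, is to lean on directory rename. The sketch below is an assumption about how such a scheme could work, not a description of arch's actual implementation; the `patch-N` layout and the `try_commit` name are invented for the example.

```python
import os
import shutil
import tempfile


def try_commit(archive_dir, revision, payload):
    """Claim a revision slot atomically; return False if it is already taken."""
    # Stage the commit in a scratch directory on the same filesystem.
    staging = tempfile.mkdtemp(prefix="staging-", dir=archive_dir)
    with open(os.path.join(staging, "changes"), "wb") as handle:
        handle.write(payload)
    slot = os.path.join(archive_dir, f"patch-{revision}")
    try:
        # Renaming onto an existing non-empty directory fails, so only one
        # committer can ever end up owning a given revision slot.
        os.rename(staging, slot)
    except OSError:
        shutil.rmtree(staging)
        return False
    return True
```

Two clients racing to commit the same revision number both stage their changes, but only the first rename succeeds. The loser gets `False` back and has to update and retry against the next number. Readers never see a half-written revision because the slot only appears once it is complete.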
Everyone,
I'm about to set up a source repository at my place of employment for a new project that we are working on. I was set to use CVS, as we have in the past, until I read this article. Arch seems pretty spiffy, and would be fun to try out. My only concern is that some of our developers use non-POSIX (i.e. Windows XP) systems for development. CVS is great because there are clients available for all OSes, and integrated into many IDEs. Are there any cross-platform Arch clients?
If not, one must think that perhaps this design be better implemented in Python || Perl || Ruby || Java instead of awk/sed/sh.
Thanks,
Andrew Murray
I'd also like to say, up front, to the Anonymous poster who asked:
Anyone know a good system of incorporating source control with a database? Oracle and Postgres would do.
Subversion does. The backend it currently uses is Berkeley DB, but the backend is pluggable. After version 1.0 comes out, expect to see a backend for one of the SQL databases pop up.
Now, on to Tom's comparison to Subversion. Caveat: I am not a Subversion guru. I lurk in the developer mailing list, and I use Subversion myself. Therefore, I may make mistakes about details, but I'm fairly certain I won't provide completely bogus information. I got some reviews on this post from the Subversion dev list, including some comments from Tom, but any mistakes in here are my own, and they're copyrighted mistakes, dammit.
I'm not going to quote whole sections; just enough for context.
- Smart Servers vs. Smart Clients. Subversion
clients are also smart, although perhaps not as smart
as Arch. Diffs travel in both directions,
so a minimum of network traffic is used. Many Subversion
operations (status, diffs against the last revision, etc)
are purely client-side operations.
- Trees in a Database vs. Trees in a File System
This is misleading. You *can* get stuff out of the Subversion
database with the standard BDB tools, so Subversion
isn't required. Also, because Subversion is based
on WebDAV, access to the database through a web
server is a freebie; also, Subversion is very Windows
friendly, from many points of view, which should help its
adoption in a corporate setting.
Subversion only stores the differences between two versions
of a file or directory, which is space efficient. The advantage
to being able to access a filesystem-based repository of diffs
is arguable.
- Centralized Control vs. Open Source Best Practices
In practical application, there is no advantage to the ARCH system
over Subversion. Subversion allows per-file/directory sourcing,
so you could create a project that includes sources from any number
of different repositories. (This code is not currently working
in Subversion.)
These are simple mistakes. There is also one statement that is wrong: "arch is better able to recover from server disasters." The argument was that, because arch is a dumb FS, it is easily mirrored. The implication is that databases aren't easily mirrored. BDB is just as easily mirrored, and most other databases are easily replicated.
Other comments pointed out were:
- Subversion does not require Apache. It works over a local
filesystem just fine. If you want network access, you need
Apache.
- Subversion has all of the strengths of Apache. You therefore
get Apache access control (well defined and understood), SSL,
client and server certificates, and interoperability with other
WebDAV clients, among other things.
- With Subversion, you have both client side and server side hooks,
as well as smart diffs.
- Arch has both revision libraries and repositories. The comparison
document doesn't differentiate between them. In some cases, the
comparisons made aren't meaningful. Revision libraries, for example
"... also have to be created and maintained by the user.
So comparing them to accessing past revisions through normal means in
subversion is not a fair, or even really meaningful, comparison." (Daniel Berlin).
- When comparing Arch's repositories to Subversion's there is no
speed advantage. Arch's storage is either diffy (storing only differences),
in which case it is not easily browsed and is no faster (at best) than
Subversion; or the storage isn't diffy, in which case it isn't efficiently
stored (imagine multiple copies of each file for each revision).
- Subversion's choice of BDB as a backend was not accidental. Some of
the tools Subversion got from using BDB are: hot
backup and replication, all kinds of existing tools that know
about BDB databases (e.g. Python or Perl bindings), a body of
"community" knowledge, etc. (Greg Stein).
I've left out vaporware features, such as the future SQL backend of Subversion 2.0.
Short answer: no.
Long answer: no.
After I hacked the Makefiles and corrected a couple of .c files (dirent incompatibility, added my own isspace() macro), I finally got the thing to build successfully. But the real problem is that arch (unlike Cygwin) treats the line endings \n and \r\n differently - as such nothing works at all. This is a fundamental assumption in arch. I don't see a simple workaround.
Please correct me if I am wrong.
And without Larry. If you have never had personal dealings with Larry, consider yourself lucky. Instant slashdot poll: how many people here have not been threatened with a lawsuit by him? He'll probably bring patent action against Tom Lord.
AC
A nice feature I would like to have in revision
control systems is that it can (optionally)
compress the directory recursively for you and
send it compressed, this
will minimize download time as well as relieve
developers from the daily compression of source
directories.
as to ftp, isn't ftp an acknowledged, rfc'd
protocol? while cvs style of retrieval isn't?
Someone provides a machine on the Internet that contains an up-to-date branch, you can get access fairly easily because it is not official and is far from critical. You merge your changes with the branch and get some peer review by the other guys using the machine. If everyone's happy, you set a tag and notify Linus & co. They review your changes and merge them with the main repository if they like them.
This could lead to a new phenomenon: Linux-clans. Groups of programmers that share a branch and review & test each other's work. This could make life a lot easier for the maintainers of various pieces of Linux. Programmers could even mature to a server with a higher status. A very good programmer gets access to an 'elite'-branch, which is heavily monitored by the maintainers. Those on an 'apprentice'-branch would have to get their baseline approved by an elite-programmer before a maintainer is notified.
This sounds extremely useful to me, far better than mailing your rough changes. In my scenario, Linus & co do scale. They don't have to scour the mailing list for patches, but can depend on requests from the elites. Every request has been reviewed by a good programmer and is thus far more likely to be useful and integrated easily.
The Drowned and the Saved - Primo Levi
side note: you can have the CVS server convert such things automatically
Katie. It's great.
Yikes! I can't win! Okay, what I meant to say was, "let's call them Generic Company That Doesn't Really Exist Because The Real Company is Very Aggressive About Filing Lawsuits Against People Who Post Things About Them On the Internet, Inc."
People need to think outside their brains, and in regard to source control, I feel we need to make more packages that interface well with a good RDBMS rather than create our own RD functionality in 40ks. What's the use?
Anyone know a good system of incorporating source control with a database? Oracle and Postgres would do.
Katie (available here) uses postgresql to store all its metadata. Using a real database has certainly helped a lot in terms of ease of implementation (as you said, not reinventing the wheel).
I'm not saying it's a "good" (as in "usable") system yet, but it is definitely getting there.
There are several problems that stem from the behavior you're asking about. Let's explore them.
The primary purpose of a version control system is to... store revision history! (surprise.) The secondary purpose is to provide chronological associations between files. (for instance, this documentation goes with that version of the software) What happens is the non-initiated miss the purpose of the tool and view it as a central dumping grounds for unrelated "stuff."
If you're just trying to share temporary artifacts, stick them in a public folder on your network, publish them on the intranet, don't pollute a project's repository with unrelated and transient materials just because you've got write access.
Basically, if you can't tell what you have in a repository (and know why it's there), you've got a problem on your hands.
I hesitate to reveal how many times I've had to clean up repositories that had superfluous files that didn't belong to the project or the repository in question, had versions spread across separate files, had no commented revision history or other identifying features, or where it was just being used as a backup.
Let's face it - commercial version control systems are expensive. I once worked for a place where everyone had to get a license for the client just to get non-project related materials. The side effect was that this particular tool didn't have decent security, and it allowed everyone to access the source code by virtue of having rights to the repository. I've yet to get a decent explanation why the HR department needs access to source code.
Adding insult to injury, non-programmers usually don't "get it" when it comes to version control. As a result, things would mysteriously disappear. Turns out a non-developer was being 'helpful' by cleaning up stuff that didn't look "important." Thank god for backups.
The version control system is supposed to help with change control, not make you a victim of your officemates' good intentions.
Speaking of polluting repositories, or in this case, excessive growth, a lot of version control systems are not that efficient when it comes to storing binary data... such as a Word document. We'd often have people check in change after change after change to such documents, and the repository would get enormous. Doing standard checkouts on the project would take friggin' forever.
Ironically, developer documentation (which was written in LaTeX, HTML, and plain ASCII text files) worked just fine. Wonder why... perhaps the developers knew how the tool operated and took advantage of that fact? I think so. Sure, they stored graphics in it, either as PostScript or multiple JPEG/GIF/PNGs - but those changed so little, it wasn't noticed.
So, just as we've established a version control system shouldn't be used as a file junkyard, we should also take note that there *are* specific tools for managing intra-documentation changes. It takes relatively little effort to use similar tags across two different toolsets, and thus keep everything in synch.
Finally, and perhaps the worst offense, is management thinking that version control is a communication tool. I'm not making this up: I've been in meetings where management put a memo in version control and lashed out at the staff for not reading it. I like to say that version control is like the national archives: you can get anything you want, if you know it's there. It is NOT the Borg Collective; it isn't a broadcast medium, and it isn't a substitute for communication - no matter when you checked it in.
Yes, CVS can be "abused" in the manner you're describing, but only for a short time. Eventually you'll run into technical limitations, strange CM policies that appear to have nothing to do with CM, and you'll be scratching your head asking "what the hell am I looking at?"
Hope this helped!
Indeed, if you do a uname -a on a Solaris 7 box, it'll tell you that it's running version 5.7--since the internal version number scheme remained unchanged even after the big marketing rename of what would have been SunOS 5.x to Solaris 2.x many years ago.
I wonder what's going to happen with it once Solaris 10 hits the streets, though...
From what I have heard, ClearCase is a centralized fileserver demanding some serious hardware. Very nice iff you get the hardware and manpower to run the server, a piece of junk otherwise.
In contrast, arch seems to be very light and decentralized. Probably a bit more demanding on the end developers, but more flexible, and much less dependent on (and demanding of) a central repository.