Ease Into Subversion From CVS
comforteagle writes "While you have a nice leisurely Sunday afternoon/evening you might want to read this fine article on easing into Subversion from CVS. Written by versioning admin Mike Mason, it talks about the philosophy and design behind Subversion (now 1.0), how it improves upon CVS, and how to get started using it."
We're switching. CVS is crufty, buggy, and slow. That alone is reason enough to switch, but atomic commits and faster and more transparent branching will be, in the long run, a more fundamental win.
'jfb
To spur "enterprise Linux," Big Bang, the distributed two-phase commit.
Many projects follow the "make branch, fix bug in branch, test branch and then merge" cycle, which makes a lot of sense.
"The slave who knows his master's will and does not get ready...will be beaten with many blows." Luke 12:47-48
Do developers out there voice the need to store binaries? I can imagine this being needed for web developers and such, but I think programmers can just build their binaries from CVS.
Yes, developers definitely need to store binaries. I worked on a project a while back where the boot block code was a finished binary. Because CVS was used to house the project, a horrible kludge involving UUENCODE had to be used to store the binary commits. Sometimes the binary is created by a totally different tool that the main build machine doesn't have. In the case I speak of, the binary was built with an expensive licensed assembler for an Analog Devices DSP chip, and was included as part of the 'build' because it was dynamically 'injected' into the DSP processor from the native processor, which happened to be an 80196.
There are always cases where a binary needs to be committed. Think about bitmaps and other resources. It doesn't make sense to 'generate them from source' every time a build is done.
Given all this, it's my understanding that with newer versions of CVS binaries can be committed safely. Is this even an instance where 'Subversion' is needed?
---
Ok, I saw some questions about why people should switch from CVS to Subversion. The article does a nice job of covering what features Subversion adds, but people still seem to wonder why these are important.
Atomic Commits:
As stated in the article, if something goes wrong in the middle of a CVS commit (e.g. network goes down) it can leave the commit only partially complete. This can be a problem if changes in multiple files are dependent upon each other. Say I add a function to an API, then call it in another file. If the call gets committed and the API change doesn't, now the code in CVS won't compile. With atomic commits, if the connection were dropped the commit would simply roll back. Then when my network came back up I could try to commit again, but the repository would never be left in a state where it didn't compile.
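To make the difference concrete, here is a minimal Python sketch of the idea. It is a toy model, not how CVS or Subversion is actually implemented: the ToyRepo class, its method names, the fail_after parameter, and the simulated NetworkDown error are all made up for illustration.

```python
class NetworkDown(Exception):
    """Simulated connection failure in the middle of a commit."""


class ToyRepo:
    """Hypothetical in-memory repository; not Subversion's real design."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.revision = 0

    def commit_per_file(self, changes, fail_after=None):
        """Per-file commit: each file is written on its own, so a failure
        partway through leaves the earlier files already committed."""
        for i, (path, text) in enumerate(changes.items()):
            if fail_after is not None and i == fail_after:
                raise NetworkDown(path)
            self.files[path] = text

    def commit_atomic(self, changes, fail_after=None):
        """All-or-nothing commit: stage into a copy, publish only on success."""
        staged = dict(self.files)
        for i, (path, text) in enumerate(changes.items()):
            if fail_after is not None and i == fail_after:
                raise NetworkDown(path)   # the staged copy is simply discarded
            staged[path] = text
        self.files = staged
        self.revision += 1


# The call site and the API change depend on each other.
changes = {"caller.c": "new_api();", "api.h": "void new_api(void);"}

cvs_like = ToyRepo()
try:
    cvs_like.commit_per_file(changes, fail_after=1)
except NetworkDown:
    pass
# caller.c went in, api.h did not: the tree no longer compiles.

svn_like = ToyRepo()
try:
    svn_like.commit_atomic(changes, fail_after=1)
except NetworkDown:
    pass
# Nothing was published; the repository is unchanged and the commit can be retried.
```

After the failed attempts, cvs_like.files holds only caller.c, while svn_like.files is still empty and its revision number has not moved.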
Constant Time Tagging/Branching:
In Subversion, tagging and branching are fundamentally the same: both are executed as a "copy" command. I'm not sure what the execution time is for these operations in CVS, though I believe it's linear in the size of the repository. In Subversion this is an O(1) operation. One of the posts commented that tagging is an infrequent operation. That may be true, but why not let it be fast anyway? And no matter how often you tag, constant-time branching is nice. I can at any time quickly create my own branch of a project to work from. Working in my own branch means that I can keep very granular track of my changes by committing frequently, without worrying about breaking something else. Once I'm satisfied with my changes I can merge my branch with the main code.
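A rough way to picture why one approach scales with the number of files and the other does not is the toy Python model below. CVS records a tag inside each file's own history, while a Subversion copy (svn copy) records a new entry that refers back to an existing tree. The data structures and function names here are invented for illustration and are far simpler than either tool's real storage.

```python
def cvs_style_tag(file_histories, tag):
    """Record the tag in every file's history: work grows with the file count."""
    touched = 0
    for history in file_histories.values():
        history["tags"][tag] = history["head"]
        touched += 1
    return touched


def svn_style_copy(tree, src, dst, revision):
    """Record one new entry that points at an existing tree."""
    tree[dst] = {"copied_from": src, "at_revision": revision}
    return 1  # one entry written, however many files live under src


# A hypothetical project with 10,000 files.
files = {f"src/file{i}.c": {"head": "1.1", "tags": {}} for i in range(10000)}
touched = cvs_style_tag(files, "snapshot")

tree = {"trunk": {"files": list(files)}}
written = svn_style_copy(tree, "trunk", "tags/snapshot", revision=42)
```

In this model the per-file tag touches all 10,000 histories, while the copy writes a single entry. A branch is the same operation with a different destination, such as branches/my-work.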
Storing Binaries:
"Binaries" does not necessarily mean compiled code. There are plenty of things that can benefit from this. Anywhere you use graphics: web programming, GUI programming, or say game or other 3D programming where you want to store your models. Or, you can store documentation in the repository: PDFs, Word docs, spreadsheets, etc.
Finally, the barrier to switching isn't all that high. The command line program has quite similar syntax, so switching is pretty easy, and the other interfaces such as the web viewer, TortoiseCVS, and IDE integrations generally have counterparts for Subversion.
Well, that's all I can think of for now. I'm actually going to try to get my company to switch over to Subversion from the commercial software they were using when we start on our new product. We're using a Java applet to interface with the repository now, and it's not very nice. CVS would work, since the main thing I want is integration with Eclipse and IntelliJ IDEA, but there are plugins to support this with Subversion as well. However, Subversion has nice features CVS doesn't, so I don't see any reason to use CVS over Subversion.
Once a week, a snapshot release is made. That means a tag is added. This operation takes, on average, 40 minutes, because the GCC source tree is large.
Every time someone makes a branch, they create a tag just before branching (for use later on, with diffs and merging). 40 minutes to tag, another 40 minutes to branch.
All because these are, stupidly, O(n) operations instead of O(1). We'd like to move to Subversion, but can't, until they get annotate ('svn blame') fully working, because GCC developers spend a lot of time doing "revision-control archaeology".
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Someone else already mentioned the ability for live backups with Subversion. Another benefit of the database is built-in journaling support. Berkeley DB logs any changes before making them, so if your system crashes or something, the DB will be restored to a stable point. This is MORE reliable than what CVS offers, even with a journaling filesystem. Also I'm pretty sure that if you REALLY need to hack the DB, there are utilities that will let you do this. However, most of the scenarios that CVS admins needed to hack the ,v files for are no longer a problem in Subversion.
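The "log the change first, then apply it" idea can be sketched in a few lines of Python. This is only a toy write-ahead log to show the principle; it is not Berkeley DB's actual mechanism or file format, and the ToyJournaledStore class and journal.log file name are hypothetical.

```python
import json
import os
import tempfile


class ToyJournaledStore:
    """Hypothetical key-value store with a write-ahead log (not Berkeley DB)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self.recover()

    def recover(self):
        """Rebuild state by replaying every complete record in the log."""
        self.data = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash: ignore it
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        """Write the change to the log and sync it, then apply it in memory."""
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = ToyJournaledStore(log_path)
store.put("rev", 1)
store.put("rev", 2)

# Simulate a crash that left a half-written record at the end of the log.
with open(log_path, "a") as log:
    log.write('{"key": "rev", "val')

# On restart, replaying the log restores the last fully logged state.
recovered = ToyJournaledStore(log_path)
```

Here recovered.data ends up at the last complete record, which is the "stable point" the comment describes.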
Subversion Good Points:
- Finger feel is very similar to CVS
- Flexible directory layout & tagging
- Extremely stable development.
Subversion Bad Points:
- Database & log files take up a LOT of space.
- Quite hard to share repositories
- No way to mark your branches (if you accidentally check out the directory containing your branches, you just got 50 gigs of 99.9% identical files...)
- No distributed development
- Pretty weak merging
Arch Good Points:
- Extremely good distributed development
- Super easy to share repositories
- Pretty strong merging.
- Very stable development
Arch Bad Points:
- Forces you to give your projects weird names ("my-project--branch-1--1.1").
- Forces each branch into a different top-level directory in your archive ("my-project--branch-2--1.1").
- Doesn't feel anything like CVS.
- Pretty slow (but they're working on it).
- Somewhat difficult to resolve merge conflicts
I wish I could love Arch because distributed development absolutely rules. I could tolerate its bizarre command set, but I simply won't accept arbitrary (and ugly) constraints on what I name my projects and branches.
Verdict: I'm still using CVS. Subversion is very close to pleasing me enough to switch... I'll probably ditch CVS some time this year.
Do developers out there voice the need to store binaries?
There are definitely reasons for storing binary (non-text) files in a version control system:
WWTTD?
If you want to start svnserve as a Windows service, search for srvany.exe; it lets you run a regular Win32 exe as a service.
Yep, Subversion comes with a conversion script, cvs2svn, which is under very active development right now. It's not quite so wonderful at converting CVS repositories with complicated branches, so you'll want to double-check the conversion, but lots of people are reporting success converting huge multi-gig repositories over to Subversion.
Look here...
--
All extremists should be taken out and shot.
...I am also wary of database-based products which are tied to one particular database...
Subversion has a utility that might assuage your fears: the svnadmin dump command can do a (full or incremental) dump of your repository such that you can completely recreate its history. If you use this command for backup, you will be assured that you don't lose any data.
As a bonus, the dump file is human readable, so there should be no fear of losing data to an inscrutable binary file.
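For anyone who wants to script this, a small Python wrapper around svnadmin dump might look like the sketch below. The -r and --incremental options belong to svnadmin dump itself; the helper names, the /var/svn/repo path, and the revision numbers are placeholders made up for the example.

```python
import subprocess


def dump_command(repo_path, lower=None, upper=None, incremental=False):
    """Build the argument list for `svnadmin dump`."""
    cmd = ["svnadmin", "dump", repo_path]
    if lower is not None:
        rev = str(lower) if upper is None else f"{lower}:{upper}"
        cmd += ["-r", rev]
    if incremental:
        cmd.append("--incremental")
    return cmd


def run_dump(cmd, out_path):
    """Run the dump and save its stdout stream (requires svnadmin installed)."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Full dump of the whole history (hypothetical repository path).
full = dump_command("/var/svn/repo")

# Incremental dump covering only revisions 101 through 120.
nightly = dump_command("/var/svn/repo", lower=101, upper=120, incremental=True)
```

A backup job could call run_dump(full, "repo-full.dump") once and then run_dump(nightly, ...) for each later revision range. Restoring goes through svnadmin load, which reads the dump stream back in.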