Pragmatic Version Control Using Subversion

Posted by timothy on Thursday February 10, 2005 @10:20AM from the subvert-your-own-intentions dept.

Dean Wilson writes "When it comes to software development the Pragmatic Programmers are widely recognised as masters of their trade, but with the release of their award-winning Starter Kit Series they've begun to gain a reputation for writing, editing and finding book authors that are as talented as they are. Pragmatic Version Control Using Subversion by Mike Mason is an excellent example. The book itself is an introduction to using Subversion (focusing on the command-line tools), but while it clearly covers all the essentials (basic commands, tagging, branching, etc.), it also delves into some of the related but often overlooked areas of version control. When it comes to version control systems, CVS has long been the workhorse of the Open Source and Free Software movements -- but with the release of Subversion, it's time to put the old nag to rest; and this book tells you what you need to do it." Read on for the rest of Wilson's review.

Pragmatic Version Control Using Subversion
author: Mike Mason
pages: 224
publisher: The Pragmatic Programmers
rating: 8
reviewer: Dean Wilson
ISBN: 0974514063
summary: An excellent guide to version control with Subversion for developers and sysadmins

Chapters on repository layouts, integrating third party code (into your source tree and products) and conflict resolution all help raise this book from just being a single application tutorial into a best practices guide that you'll come back to long after you've gained confidence with Subversion itself.

Pragmatic Version Control Using Subversion is very similar to Pragmatic Version Control Using CVS, but this is in no way a criticism! The previous book was the best introduction to CVS that I've read, and this related volume manages to retain the winning formula while adding useful sections, such as CVS hints, to help people migrating across.

While the book has broad appeal, the ideal audience is those developers who know they should be doing version control but have heard it's too complex, have been burnt by previous mistakes, or just don't know where to start. Seasoned developers will also find this book useful, but in different ways: as an easy-to-scan reference, as something to hand down to less experienced colleagues, or just for quickly bringing themselves up to speed when moving from CVS to Subversion.

Considering the book's slim size (or quick download, if you purchase the PDF version) it packs in surprisingly wide coverage of the important topics. The first two chapters provide an overview and sell the benefits of using a version control system. They cover what should and shouldn't be under version control, and clearly explain the terminology required to understand both the technology in general and the book's later chapters.

Chapters 3, 4, and 5 get you working from your own Subversion repository and introduce the essential commands. They show how to create, add and import your projects in a clear, easy-to-understand way. Once you have some files to work with, they take you through a well-paced tour of the simple operations: checking out, committing and accessing the files in different ways.

Following these, Chapter 6, "Common Subversion Commands," shows some of the more complex but essential tasks you'll want to perform in Subversion: setting properties, looking at changes and their associated history, and handling merge conflicts. These are all presented in short sections that provide enough information to be useful on a day-to-day basis while not leaving beginners bogged down in the minutiae.

Jumping ahead slightly, we leave the part of the book that everybody using Subversion should read and move onto the more powerful, and complex, functionality such as "Using Tags and Branches" (Chapter 8) and the more abstract topics of "Organising Your Repository" (Chapter 7) and dealing with "Third Party Code" (Chapter 10).

Chapter 8 stands alone in the second half of the book due to its coverage of a very technical subject; chapters 7, 9 and 10 are more abstract. Tagging and branching are among the more notorious areas of version control, but this book -- much like the CVS book before it -- not only explains when and how to use both tags and branches, but also provides enough guidance to allow the reader to 'smell' when something's wrong and adding them would make it worse.

Chapters 7, 9 and 10 logically combine to cover the issues surrounding setting up your own project, including the project's structure, the integration of third party code, external projects, and binary libraries such as NUnit or Java mock libraries. Considering the amount of maintenance coding (as opposed to new projects) that happens in the world, these chapters might not be immediately useful to a fair chunk of the readership. I don't think they should be removed, though -- better to leave them in and show best practices and experience-driven common sense than remove them and let people make the same mistakes over and over again.

It's worth noting that the appendices are a lot more useful than the filler material typically found lurking at the back of a book -- they cover a couple of topics that don't fit elsewhere and help round out both the book's coverage and appeal.

Appendix A is more relevant to system administrators than developers. It shows how to install Subversion on the server. It then gives a brief introduction to configuring, serving (using either the native svnserve, svn over SSH or via Apache) and adding basic security to your repositories. It finishes off with a short, but useful, digression into backing up your hard work.

This appendix provides a valuable, quick guide to getting a Subversion install in place. It's a good starting point for anyone who needs to actually run and maintain a Subversion server.

The remaining appendices vary in usefulness. Appendix B is a concise introduction to migrating a CVS repository to Subversion; this is something you either need desperately or won't care about. Most of Appendix C shows how to perform common tasks using the TortoiseSVN extension for Windows Explorer; this won't appeal to the Unix/Linux crowd but might help sway Windows developers away from the hell that is Visual SourceSafe.

In short, whether you're new to version control in general or just Subversion itself, this book is highly recommended. Clear, concise and crammed full of useful, important and dare I say, pragmatic, advice and information. An excellent book in its own right and a worthy addition to the Starter Kit Series.

Dean Wilson is a System Administrator at Outcome Technologies. His personal site is unixdaemon.net. You can purchase Pragmatic Version Control Using Subversion from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page

19 of 235 comments

sounds interesting... by n4ru70+f4n · 2005-02-10 10:35 · Score: 2, Insightful

The book sounds very interesting, but the way it is described makes it seem like it only goes through the basics. What if you want more in-depth reading on tagging and other simple necessities that you can't go without knowing well?
Re:Anyone considering switching to SVN... by cduffy · 2005-02-10 10:46 · Score: 2, Insightful

By the way, the GCC team is starting to make experiments with svn, and it looks like they might switch in 2 or 3 months.

That's something of a disappointment -- I'd hoped to see them on Arch. Given the magnitude of their project and the number of 3rd parties interested in maintaining their own branches, I'd think that distributed revision control support would be as valuable a feature for them as it's proved to be for Linus.
Re:Benefits of Subversion's revisioning system? by Alioth · 2005-02-10 11:01 · Score: 2, Insightful

It depends on your goals. I prefer SVN's revisioning system (having used both CVS and CMVC, which use the same revisioning). The nice thing about it is that if you extract revision n, you'll know all the files you extract from the repo are as the repo looked when revision n was made. This is non-trivial with CVS or CMVC's revisioning system unless you know the version number of all the files in the entire repo and extract those version numbers.

It's still easy to tell when individual files were changed using something like the 'svn log' command.

--
Oolite: Elite-like game. For Mac, Linux and Windows
Other justification... by SuperKendall · 2005-02-10 11:02 · Score: 3, Insightful

The way I like to put it to point out why ClearCase and others of its ilk are such a beast to work with compared to CVS, is that broken merges are fixed BY the people that cause them, BEFORE they screw up the repository.

With the "normal" source control systems that use the reserve/checkin style, a programmer may work on several files - perhaps they even work on them unreserved to be nice to others (as is becoming policy here).

You still have the issue of "The Merge". That is, the programmer doing the development is not getting changes made to those files while he is working, and others are not seeing his work.

So when it comes time to check in all the files, a programmer checks most of them in - but then possibly runs into issues with merges in the last few files. Lots of commercial merge tools seem poorly designed to help the average programmer deal with issues; sometimes they simply automatically hose the merge without the programmer even knowing.

Using a CVS system, the programmer is able to keep in sync all through development by constantly updating files. That means that issues caused by him will also be resolved by him on the fly, instead of someone else discovering a merge went wrong later. It makes the day of checkin no longer something to fear, since you are already synchronized and can be reasonably sure the system works BEFORE you do the checkin, instead of checking after and possibly having a broken build.
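
The update-before-checkin cycle described above can be sketched roughly as follows. This is a toy Python model, not how CVS or Subversion is implemented: the Repo and WorkingCopy classes, the OutOfDate and Conflict exceptions, and the idea of a file as a dict of named entries are all assumptions made for the illustration.

```python
class OutOfDate(Exception):
    """Raised when a commit is attempted against a stale base revision."""


class Conflict(Exception):
    """Raised when both sides changed the same entry in different ways."""


class Repo:
    def __init__(self, content):
        self.head = 1
        self.history = {1: dict(content)}

    def commit(self, base_rev, content):
        # Refuse the checkin unless the committer has seen the latest revision.
        if base_rev != self.head:
            raise OutOfDate(f"base r{base_rev}, head r{self.head}: update first")
        self.head += 1
        self.history[self.head] = dict(content)
        return self.head


class WorkingCopy:
    def __init__(self, repo):
        self.repo = repo
        self.base_rev = repo.head
        self.content = dict(repo.history[repo.head])

    def update(self):
        """Three-way merge of local edits with whatever landed since base_rev."""
        base = self.repo.history[self.base_rev]
        theirs = self.repo.history[self.repo.head]
        merged = {}
        for key in base.keys() | theirs.keys() | self.content.keys():
            b, t, m = base.get(key), theirs.get(key), self.content.get(key)
            if m == b:                  # untouched locally: take their version
                merged[key] = t
            elif t == b or t == m:      # untouched remotely, or same change
                merged[key] = m
            else:                       # both changed it differently
                raise Conflict(key)
        self.content = {k: v for k, v in merged.items() if v is not None}
        self.base_rev = self.repo.head

    def commit(self):
        self.base_rev = self.repo.commit(self.base_rev, self.content)


repo = Repo({"parse": "v1", "render": "v1"})
alice, bob = WorkingCopy(repo), WorkingCopy(repo)

alice.content["parse"] = "v2"
alice.commit()                          # lands as r2

bob.content["render"] = "v2"
try:
    bob.commit()                        # rejected: bob is still based on r1
except OutOfDate:
    bob.update()                        # bob merges alice's change locally...
    bob.commit()                        # ...and only then checks in
```

The point of the sketch is where the merge happens: in bob's own working copy, before anything reaches the shared repository.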

Sure that checkin might cause someone else issues, but they will tend to be isolated to a developer and not affect the whole team at once.

So basically CVS style operation encourages programmers to keep in sync with what other people are doing - I honestly believe that if it did not work this way and people were forced to deal with reserved files, the whole OpenSource movement would be a fraction of its current size and success.

Yes I know ClearCase can kind of do something like that, but not very well and I have seen clear case totally bungle automatic merges before.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley
1. Re:Other justification... by jrumney · 2005-02-10 23:22 · Score: 2, Insightful
  
  and I have seen clear case totally bungle automatic merges before.
  I have seen *designers* totally bungle merges before.
  The biggest problem with Clearcase is that developers (or designers if you prefer) do not want to take a 3 day course to learn how to use a tool that is peripheral to their core job. So they end up using a very small part of it (badly), so small (and so badly) that they might as well be using RCS.
Re:Benefits of Subversion's revisioning system? by avalys · 2005-02-10 11:10 · Score: 4, Insightful

It's much better to have repository-wide revision numbers, because it means that a revision number identifies the state of an entire project at a specific point in time, not just the state of a specific file.

This doesn't make it any harder to track changes to individual files. When you run "svn log" on a file or directory, you only see the log messages/revisions listed where that file or directory was changed.

It's really quite an elegant system.
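
To make the idea concrete, here is a minimal Python sketch of repository-wide revision numbers with a per-path log. The Repo class and its commit, snapshot and log methods are hypothetical names for this illustration and are not Subversion's API; deletions and directories are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model: every commit gets one repository-wide revision number."""
    # revisions[i] holds (changed paths, log message) for revision i + 1
    revisions: list = field(default_factory=list)

    def commit(self, changes: dict, message: str) -> int:
        self.revisions.append((dict(changes), message))
        return len(self.revisions)

    def snapshot(self, rev: int) -> dict:
        """State of the whole tree as of revision `rev`."""
        tree = {}
        for changes, _ in self.revisions[:rev]:
            tree.update(changes)
        return tree

    def log(self, path: str) -> list:
        """Only the revisions in which `path` actually changed."""
        return [
            (number, message)
            for number, (changes, message) in enumerate(self.revisions, 1)
            if path in changes
        ]


repo = Repo()
repo.commit({"foo.c": "v1", "foo.h": "v1"}, "initial import")
repo.commit({"foo.c": "v2"}, "fix bug in foo.c")
repo.commit({"foo.h": "v2"}, "add prototype")

tree_at_r2 = repo.snapshot(2)    # whole project as it looked at revision 2
foo_h_log = repo.log("foo.h")    # revisions 1 and 3 only
```

One number names the whole tree, while the log for a single file still skips the revisions that did not touch it.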

--
This space intentionally left blank.
Re:Is it better than Perforce? by mgm · 2005-02-10 11:13 · Score: 3, Insightful

When I was initially getting involved with Subversion I found that most of its features reminded me a lot of Perforce. The repository-wide revision numbers, database backend, and general "feel" of Subversion are very similar to Perforce.

I think Perforce is better than Subversion if you're doing a lot of branching -- the merge point tracking that Perforce performs is really well implemented and saves you from a lot of manual tracking. Overall though, if you're looking for a free alternative to Perforce I'd highly recommend Subversion.
Re:yawn by LourensV · 2005-02-10 11:28 · Score: 5, Insightful

Well, assuming that you're declaring computer science superior over the pragmatic approach, I've started wondering about that recently.

Let me start off by saying that I'm firmly in the scientific camp, intending to start a PhD in CS this (northern) summer. It seems that an awful lot of popular things in IT are despised by computer scientists. Linux, as a monolithic kernel, is a famous example, as is C++, and I recently saw something about Perl being evil as well.

Now, these scientists have good reasons to call these things ugly, but people still use them. That means that either people are stupid, or the computer scientists are missing something. I think that it's mostly the latter.

In my software engineering course I was taught that the first and foremost thing you do in a project is gather requirements. It seems to me that computer scientists need to get out and ask people who work in IT what they actually expect from their kernels, languages, and development systems. Then they can try and create theories of how it all works or should work to fulfill those wishes, and use those theories to improve those real-world systems.

The alternative, sitting in your ivory tower inventing things that you think are pretty and everyone else thinks are useless, doesn't seem to be working too well.
Re:Benefits of Subversion's revisioning system? by mgm · 2005-02-10 11:33 · Score: 2, Insightful

It sounds like you're assigning meaning to the individual revision numbers on files. So you know that create_db.sql version 1.14 is special somehow and people understand these numbers mean something.

That's fine, and if it's working for you you don't need to change it, but personally I find it hard to remember that in version 1.15 I added a new table and that branched version 1.2.4.15 corresponds to the current production code.

In Subversion you'd use symbolically named tags, which are copies of directory trees, in order to remember that kind of thing. So you might have a bunch of files in your repository corresponding to your released software, in directory
/myproject/tags/release-1.0/sql/...
In your example a single revision number is useful because a change to SQL code usually involves a change to other code. For example, if you rename a column you'll need to change application code that accesses that column. If you commit all these changes in one go, logically grouping them together, it makes things a lot clearer when reviewing changes later on. Once you have changes grouped together as a unit you can move them around, apply them to other branches, or even back them out if they don't work.
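
As a rough sketch of why grouping matters, the Python below treats a change set as one unit that can be applied or backed out as a whole, and a tag as a named copy of the tree. The apply and invert helpers, the file contents, and the dict-based tree are assumptions for the example only; Subversion itself does this with commits, merges and copies rather than anything shaped like this code.

```python
def apply(tree, changeset):
    """Return a new tree with every (old, new) edit in the change set applied."""
    result = dict(tree)
    for path, (old, new) in changeset.items():
        if result.get(path) != old:
            raise ValueError(f"{path} does not match the expected content")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


def invert(changeset):
    """The change set that undoes `changeset`."""
    return {path: (new, old) for path, (old, new) in changeset.items()}


trunk = {
    "sql/create_db.sql": "name TEXT",
    "app/user.py": "row['name']",
}

# A tag is just a named copy of the tree at some point in time.
tags = {"release-1.0": dict(trunk)}

# Renaming a column touches the schema and the code that reads it;
# both edits travel together as one change set.
rename_column = {
    "sql/create_db.sql": ("name TEXT", "full_name TEXT"),
    "app/user.py": ("row['name']", "row['full_name']"),
}

after = apply(trunk, rename_column)
backed_out = apply(after, invert(rename_column))   # undo the whole unit
```

Because the schema change and the application change are one unit, backing it out cannot leave the two halves out of step.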

Grouping changes like this together probably also implies some stuff like having database people sit with programmers and pair on their changes, but I'll avoid going off the deep-end and evangelising XP too much... ;-)
Re:No it is not by Sax+Maniac · 2005-02-10 12:28 · Score: 2, Insightful

It's been a very long time. I seem to remember writing a short script that invoked findmerge and clearmerge to do it all. Definitely not as simple as CVS.

As for the screwing up on whitespace, no idea. But CVS does sometimes, too. :-)

And no, I don't prefer clearcase... that thing was like a tank without an engine: you'd just have a bunch of people inside turning the wheels.

--
I can explanate how to administrate your network. You must configurate and segmentate it, so it can computate.
Re:Is it better than Perforce? by chiph · 2005-02-10 12:53 · Score: 2, Insightful

If you've got the money and the need for a SCM tool like Perforce, you should also be looking at Borland Star Team (comes with defect tracking utility) and Clearcase.

If costs like $1000 per seat scare you off, and/or you don't have a need for planet-wide team development, then maybe Subversion would be suitable.

Chip H.
Re:Subversion? by Anonymous Coward · 2005-02-10 13:17 · Score: 1, Insightful

"Touch my code and I will beat you silly with a rolled copy of the original spec docs."

You get spec docs!!! You're so lucky! I have to resort to threats like:
"Touch my code, and I will beat you silly with vague, constantly changing notions of what the code may or may not look like in the near to mid term future".

It doesn't work nearly as well.

--
AC
Re:yawn by thpr · 2005-02-10 13:24 · Score: 3, Insightful

Interesting, but I think that's an unfair bashing of the computer scientists. Let me state that my background is both computer architecture (the hardware side) and business.
It's interesting to note that while many things are stated by computer scientists as evil, they are also practical. There are reasons why these languages are 'efficient' in a business sense: Linux is a widely used and widely known free *nix kernel. C++ leverages the existing C knowledge base; Perl is wonderful when you're trying to analyze ad-hoc log files. Personally, I haven't learned Python or Ruby, because the amount of time I spend hacking perl (maybe 10-15 hours a year) isn't enough to ever justify learning a programming language to replace it. [note I'm not a programmer; thus don't rely on programming knowledge to hold a job; thus my self-education time is better spent elsewhere; otherwise this might be a useful broadening of my horizons]
What you have to realize is that something won't be (rapidly) adopted unless it is significantly better than a previous product (or has a near-zero learning curve). For example, one of the members of the IEEE that sits on some of the Ethernet standards committees mentioned to me that one of the reasons Ethernet jumps by magnitudes of 10X is to ensure the next generation provides significant benefit over the previous one (of course, there are other reasons, too).
The point is, you can find a significant number of languages in use, but few that have been displaced. Those that were widely adopted (Fortran, Cobol, C, Perl, Java) all have specialties (scientific, business, fast/portable, efficient scripting, the web) that significantly differentiate them from previous languages. The dynamics of Python and Ruby is actually an interesting case study of where Perl is being very slowly knocked out of some of its domains by a similar, less "ugly" language (or such is my perception [both Perl being knocked out and the others being less 'ugly']). However, that transition is, I would argue, a bit glacial.
Also, to say that computer science (and studying algorithms) has very little effect is not all that accurate. Developments in rendering 2D and 3D systems, graph theory, and countless other areas are the result of those individuals. Even my understanding of how to organize code to ensure my OO code is a directed acyclic graph is a result of those "CS" folks and having that knowledge filter down. It doesn't necessarily affect the languages, but it changes how we use them. More recently, we have seen discussion of aspects, though (personally) I still haven't figured those out [I actually think it's because I work in problem domains where aspects are not efficient].
In addition, continually pointing out the weaknesses of the existing systems helps those that will design the next set of systems avoid the same mistakes. Even though that advocate may appear to be in an 'ivory tower' and ignoring what is going on in the 'real world', if a few architects listen and learn (and apply to the next generation of systems), then the computer scientist has served a purpose.
One other note. In respect to: "It seems to me that computer scientists need to get out and ask people who work in IT what they actually expect from their kernels, languages, and development systems.", try it sometime. Many times, people have NO IDEA what they are looking for. That's the whole basis of prototyping - to get something in front of the person and get feedback. Many other times, you will get completely contradictory answers. Detail-oriented people will want a revision control system, code tags, and everything javadoc'ed. Others will be hacking out code that hopefully self-documents, but only has a comment to identify the license and copyright of the code. The challenge when dealing with people is that you have wildly different learning mechanisms, personality types (e.g. Myers-Briggs), brain operation (e.g. Hermann), and Communication Styles (e.g. DiSC). These differences lead to d
Re:Conflicts and Merging vs Locking by Jerf · 2005-02-10 13:26 · Score: 2, Insightful

Every time when I convert a PVCS user to Subversion they are scared because of the edit/conflict/merge idea.... I have a hard time conveying the benefit of the CVS/subversion way.

The fundamental argument is, "If it's so horrifying to be without [X], why doesn't the doom and gloom actually happen to people who try living without it?", followed by pointing out the large number of people living without it.

This same argument can be applied to a lot of dogma that we've accreted over the past few decades... static typing, the "need" for various excessively top-down methodologies, the need to compile all the way down to machine code or suffer Performance Doom. (The last one is one we are quite a ways into shaking ourselves free of; I mention it so more people can see the "other side" of a thing like this.)

Of course, this only applies to doom-and-gloom-based arguments. I bold that because there can be benefits in some situations for some of this class of issue, but the benefit won't be the avoidance of doom and gloom. You can make positive arguments about the benefits that, say, static typing might bring under some situations and this counterargument doesn't apply to that point.

The best thing about this though isn't so much using it to convince others, it's using it to convince yourself and advance your understanding. If there are a lot of people who are doing something you just know is wrong and certain to lead them to doom, it is still worth a try, especially if these people seem otherwise smart. You may still be right (PHP is still a disaster of a language, for instance), but you may prove wrong and learn something. It's happened to me several times. (Yes, the conclusion that I tried PHP is correct, and it's one of the few times I can think of that following this algorithm led me to reinforce my preconceptions on the issue, rather than discard or modify them.)

To get back to the topic at hand, abundant community experience indicates that source control needs to be easy, or nobody will bother with it. The price of rare conflict resolution (which is admittedly a cost, though there is some benefit in the explicit existence of a conflict that wouldn't necessarily be revealed in a lock-based system) is more than repaid by the way people actually use the source control system, instead of ignoring it as much as possible and sometimes actively fighting it. (Every time you fight or work around the system, you're not preventing problems, you're creating them.)
Re:Conflicts and Merging vs Locking by motomike · 2005-02-10 13:29 · Score: 2, Insightful

Here's the killer argument: file locking gives a false sense of security. Imagine: Developer A has foo.h checked out, while developer B has foo.c checked out. Developer A makes changes in foo.h that break foo.c. Notice that developers A and B both had locks on their respective files, and a conflict still arose. So locking is as bad a solution as concurrent versioning, or worse. At least with concurrent versioning, developer A can change foo.h and foo.c at the same time, then merge his changes with Developer B's foo.c. In fact, subversion is remarkably adept at handling those merges ... but that's only relevant after you've won them over from their file locking tools.
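
The foo.h / foo.c scenario can be played out in a few lines of Python. This is only an illustrative model: the LockManager class, the frob function, and representing a header as a map from function name to argument count are all invented for the sketch.

```python
class LockManager:
    """Toy per-file lock table, as in a reserve/checkin style tool."""

    def __init__(self):
        self.owners = {}

    def lock(self, path, user):
        # Grant the lock if the file is free or already held by this user.
        holder = self.owners.setdefault(path, user)
        return holder == user


locks = LockManager()
a_has_header = locks.lock("foo.h", "A")    # granted
b_has_source = locks.lock("foo.c", "B")    # granted: different file
b_blocked = not locks.lock("foo.h", "B")   # B cannot touch foo.h meanwhile

# foo.h declares functions and their argument counts; foo.c calls them.
header = {"frob": 1}
calls_in_source = [("frob", 1)]

# Developer A, holding a perfectly valid lock, changes the signature.
header["frob"] = 2


def broken_calls(header, calls):
    """Calls whose argument count no longer matches the declaration."""
    return [name for name, argc in calls if header.get(name) != argc]


breakage = broken_calls(header, calls_in_source)
```

Every lock request behaved exactly as designed, and foo.c is broken anyway; the lock table has no way to know that the two files depend on each other.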
Re:Is it better than Perforce? by yerfatma · 2005-02-10 13:30 · Score: 2, Insightful

Oh happy Christ, Borland must release two tools under the same name, because you and I can't be using the same StarTeam. All you need to know is that StarTeam assigns files to the following status codes: "Current", "Out of Date", "Not in View" and "Unknown". How the hell does it not know? We spent like $50,000 on it (by "we," I thankfully don't mean me) and that's about the only thing making it difficult to convert us over to svn. I'm hoping to use Trac as a Trojan Horse here.
Re:yawn by downbad · 2005-02-10 14:12 · Score: 3, Insightful

Most CS topics are essentially mathematics, which is as hard a science as you can get.
Other CS topics are dependent on physical devices with complex emergent behavior (computer systems and their constituent parts) and thus don't admit axiomatic proof; but are still "hard" in that repeatable controlled experiments can be conducted to generate solid empirical evidence.
However, many parts of "Computer Science" are dependent on economics and human behavior. It is at this junction that emotions are thrown into the mix and religion ensues.
"Hard" sciences do not have dependencies on soft sciences; the results of physics research do not depend on psychology or economics. Given this, I consider CS to be a soft science.
More information on side-streams... by SuperKendall · 2005-02-10 14:33 · Score: 2, Insightful

No, it can do exactly that, and very well. If it is set up correctly, basically designers can create their own side-stream, and do repeated merges into it to keep up to date.

So where can you find more information on ClearCase side streams and bringing in external changes? I have ever been frustrated by a lack of good ClearCase documentation, and whoever set up our ClearCase systems at work sure does not know about this.

However, I still find setting up a sidestream (private branch? Google had nothing on ClearCase sidestreams) a lot of bother when under CVS I can simply edit files and merge in all changes as they come. It literally is the least amount of work possible for the best workflow, and if the sidestream thing works like private branches then I fear the version tree would be a nightmare from daily merges into a private branch over time... I'll reserve judgement though until I get more details and try it out. If all you do is create a private branch once then I guess the overhead is not too bad... as long as the merging works well.

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Re:Right, but ... by the+eric+conspiracy · 2005-02-10 15:02 · Score: 3, Insightful

One, a vague sense of security, that if a file gets corrupted, I can at least make an attempt at manual repair.

The problem is that sense of security is very misplaced. CVS doesn't do any integrity checking. So you can easily have corruption problems and not know it until it is way too late. And if you add a binary that you haven't configured CVS for, well, he's dead, Jim. A scrambled text file isn't going to be any more recoverable than a scrambled binary.
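
For what "integrity checking" means in practice, here is a minimal Python sketch, assuming a store that keeps a SHA-256 digest next to each piece of content. The store and verify helpers and the record layout are hypothetical; this illustrates the general technique and is not a description of any particular tool's repository format.

```python
import hashlib


def store(content: bytes) -> dict:
    """Keep a digest alongside the data at write time."""
    return {"data": content, "sha256": hashlib.sha256(content).hexdigest()}


def verify(record: dict) -> bool:
    """Recompute the digest at read time and compare it with the stored one."""
    return hashlib.sha256(record["data"]).hexdigest() == record["sha256"]


record = store(b"int main(void) { return 0; }\n")
ok_before = verify(record)

# Simulate silent corruption on disk: a single byte changes.
record["data"] = record["data"].replace(b"0", b"1")
ok_after = verify(record)
```

Without the stored digest, the corrupted file is still perfectly readable text, which is exactly why being able to open it in an editor is not much of a safety net.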

You might not like binary formats. But I don't see how you can avoid them if you are really going to handle binary data well. Otherwise you are ducking the issue.

As far as using grep on a repository, yeah, I have done that too. It's ok if you have small projects. But for larger projects that is not a useful benefit. My current employer has an 11GB SourceUnsafe Repo that has to be a disaster in the making. And of course a $0 budget to move it to something else.

Subversion isn't the cure-all either. It's got some bad design in it that has got me holding back from recommending it. What I want is stuff like keyword expansion in Unicode, merge tracking, etc.

But at least it isn't a one man project like darcs. That would never fly for any sane corporation.