Performance Tuning Subversion
BlueVoodoo writes "Subversion is one of the few version control systems that can store binary files using a delta algorithm. In this article, senior developer David Bell explains why Subversion's performance suffers when handling binaries and suggests several ways to work around the problem."
I know it can handle binaries, but I cannot think why I would want to. Can anyone help?
Subversion fails to follow symbolic links that point to code that other projects share for the sake of a minority that still develops using Windows (which doesn't have real symbolic links).
CVS (http://en.wikipedia.org/wiki/Concurrent_Versions_System) has proven itself to be superior and far more intuitive.
You have code that many projects share, like multi-platform-compatibility-layers? Just use symbolic links and CVS will follow them.
In SVN you have to create a repository for these shared source files and write config files by hand to make it include these files in your repository.
I can hardly see SVN reaching the flexibility CVS has. They support Windows (which doesn't have symbolic links) and give up usability.
Except for this difference, SVN and CVS are the same. There are marginal differences in features, but these affect no real-world use. So if you want a version control system where you don't need to write config files by hand, you choose CVS. If you want the latest hype, you choose SVN.
There wasn't really a need for SVN.
for me performance is (currently) the least of my problems with subversion.. more that you lose changes without any warning whatsoever during merging .. http://subversion.tigris.org/servlets/ReadMsg?listName=users&msgNo=65992 .. and no one seems to be too bothered..
(don't get me wrong, i love subversion .. and i use it for my open source projects.. but currently CVS is way better.. just because of the tools and a few fewer unnecessary annoyances)
Find me at http://herbert.poul.at
In short: Use git-svn
Long version: The speedup of a small factor described in the article is blown away by the several orders of magnitude you get by using git. Then there are all the other goodies, like real branches and merges, git-bisect, and visualization with gitk. Subversion is just for people who are forced to use it, or those not exploring all their options these days.
why use subversion only as import/export? That's the complaint here right? (the slow import/export speeds?) I thought the point in using revision control is to checkout then do commit/update commands???
It is still the wave of the future. I've worked in it extensively, and it is still the best version control system I've ever used. Because of its other strengths, it is continuing to expand its user base and gain popularity. You can tell this because Microsoft is now actively attempting to copy Subversion's concepts and ways of doing things. Ever used Team Foundation Server? It is just like Subversion, only buggier (and without a good way to roll back a changeset... you have to download and install Team Foundation Power Tools to do it). I'm a new employee at my company (which uses Microsoft technology), and yet I've been explaining how the TFS system works to seasoned .NET architecture veterans. The reason I can do this? I worked extensively with Subversion, read the Subversion book a few times (the O'Reilly book maintained by the Subversion team), and worked on a project for my previous company that basically had the goal of making versions of the TFS wizards for Subversion on the Eclipse platform. It only took me about one day of using TFS to be able to predict how it would respond, what its quirks would be, etc., because its technical underpinnings are just like Subversion's. So even with performance issues, if even Microsoft is abandoning its years of efforts on SourceSafe and jumping all over this, you can know that its strengths still make it worth adopting over the other alternatives. After all, if Microsoft was going to dump SourceSafe, it had its pick of other systems to copy, as well as the option of trying to make something new. What did it pick? Subversion.
Beware of bugs in the above code; I have only proved it correct, not tried it.
I've been using Subversion for 2 years on game-related projects. Most of our assets are binary (Photoshop files, images, 3D models, etc.), plus all the text-based code. I love Subversion. Best thing out there that doesn't cost $800/seat.
What I don't like about this article is that it implies I should have to restructure my development environment to deal with a flaw in my version control. The binary issue is huge with Subversion, but most of the people working on Subversion don't use binary storage as much as game projects do. Subversion should have an option to store the head as a full file, not a delta, and this problem would be solved. True, it would slow down the commit time, but commits happen a lot less than updates (at least for us). Also the re-delta-ing of the head-1 revision could happen on the server in the background, keeping commits fast.
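The trade-off in that suggestion can be sketched with a toy model. This is an illustration only, not Subversion's actual repository format: the delta encoding, the class names (ForwardDeltaStore, ReverseDeltaStore) and the sample data are all hypothetical. One store keeps the first revision whole and chains deltas forward; the other keeps the head whole and chains deltas backward, so reading the head needs no delta work at all.

```python
from difflib import SequenceMatcher


def make_delta(source: bytes, target: bytes):
    """Encode target as 'copy a source range' and 'insert literal bytes' ops."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those source bytes are simply never copied
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target from the source plus a delta made by make_delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class ForwardDeltaStore:
    """First revision stored whole; each later one is a delta from its predecessor."""

    def __init__(self, first: bytes):
        self.base = first
        self.deltas = []

    def read_head(self):
        """Return (head contents, number of deltas that had to be applied)."""
        data, applied = self.base, 0
        for delta in self.deltas:
            data = apply_delta(data, delta)
            applied += 1
        return data, applied

    def commit(self, new: bytes):
        head, _ = self.read_head()
        self.deltas.append(make_delta(head, new))


class ReverseDeltaStore:
    """Head stored whole; older revisions are reached by applying deltas backward."""

    def __init__(self, first: bytes):
        self.head = first
        self.back_deltas = []  # back_deltas[-1] turns the head into head-1

    def read_head(self):
        return self.head, 0

    def commit(self, new: bytes):
        # The extra commit-time work: re-delta the old head against the new one.
        self.back_deltas.append(make_delta(new, self.head))
        self.head = new

    def read(self, steps_back: int) -> bytes:
        data = self.head
        start = len(self.back_deltas) - steps_back
        for delta in reversed(self.back_deltas[start:]):
            data = apply_delta(data, delta)
        return data


# Hypothetical sample asset with four revisions.
revisions = [b"vertex data " * 20 + bytes([65 + i]) * 5 for i in range(4)]
forward = ForwardDeltaStore(revisions[0])
reverse = ReverseDeltaStore(revisions[0])
for rev in revisions[1:]:
    forward.commit(rev)
    reverse.commit(rev)

print("forward store, deltas applied to read head:", forward.read_head()[1])
print("reverse store, deltas applied to read head:", reverse.read_head()[1])
```

In this model the reverse store pays one extra make_delta call per commit, which is the commit-time cost the comment accepts in exchange for cheap updates.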
Okay, I know this is completely off-topic but I'd really like to get some responses or some discussion going on what makes version control suck.
I mean, is it just me or is revision control software incredibly difficult to use? To put this into context, I've developed software that builds websites with an integrated shopping cart, dozens of business features, email integration, domain name integration, over 100,000 sites built with it, (blah blah blah) but I find revision control HARD.
It feels to me like there is a fundamentally easier way to do revision control. But, I haven't found it yet or know if it exists.
I guess for people coming from CVS, Subversion is easier. But with subversion, I just found it disgusting (and hard to manage) how it left all these invisible files all over my system and if I copied a directory, for example, there would be two copies linked to the same place in the repository. Also, some actions that I do directly to the files are very difficult to reconcile with the repository.
Since then, I've switched our development team to Perforce (which I like much better), but we still spend too much time on version control issues. With the number, speed of rollouts and need for easy accessibility to certain types of rollbacks (but not others), we are unusual. In fact, we ended up using a layout that hasn't been documented before but works well for us. That said, I still find version control hard.
Am I alone? Are there better solutions (open source or paid?) that you've found? I'd like to hear.
Sunny
If you actually care about your code and making proper releases, use Vesta. Transparent version control that even tracks changes between proper check-ins (real "sub" versions). Built-in build system that beats the pants off of Make. It even has dependency tracking to the point that you not only keep your code under version control, but the entire build system. That's right. You can actually go back and build release 21 with the tools used to build release 21. It's sort of like ClearCase but without all the headache. Did I mention it's open source?
The first time I used Vesta, it was a life-changing experience. It's nice to see something that isn't a rehash of the 1960s.
.. that the article is glaringly absent *actual check-in times.* Or, where *actual check-in times* are available, the details of whether it's the same file as in previous tests is glaringly absent. This leaves open the question as to whether the data set they were working on was identical or whether it was different between the various tests.
Questions that remain:
1. Does the algorithm simply "plainly store" previously-compressed files, and is this the reason why that is the most time-efficient?
2. What exactly was the data for the *actual check-in* times? (What took 28m? What took 13m?)
3. Given that speedier/efficient check-in requires a large tarball format, how are artists supposed to incorporate this into their standard workflow? (Sure, there's a script for check-in, but the article is absent any details about actually using or checking-out the files thus stored except to say it's an unresolved problem regarding browsing files so stored.)
The amount of CPU required for binary diff calculation is pretty significant. For an artistic team that generates large volumes of binary data (much of it in the form of mpeg streams, large lossy-compressed jpeg files, and so forth) it would be interesting to find out what kind of gains a binary diff would provide, if any.
Document storage would also be an interesting and fairer test. Isn't .ODF typically stored in compressed form? If not, then small changes wouldn't necessarily affect the entirety of the file (as it would in a gzip file if the change were at the beginning) and SVN might be able to store the data very efficiently. Uncompressed PDF would certainly benefit.
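The point about compressed formats can be checked with a rough experiment. The sketch below is only an illustration under stated assumptions: the "asset" is made-up numeric text, zlib stands in for whatever compression a real file format uses, and changed_fraction is a hypothetical position-by-position comparison, not the delta algorithm Subversion uses. It edits one byte near the start of the data and measures how much of the file differs before and after compression.

```python
import random
import zlib


def changed_fraction(old: bytes, new: bytes) -> float:
    """Fraction of byte positions that differ; a length mismatch counts as differing."""
    differing = sum(x != y for x, y in zip(old, new))
    differing += abs(len(old) - len(new))
    return differing / max(len(old), len(new))


# Hypothetical stand-in for an uncompressed asset: deterministic numeric text.
rng = random.Random(0)
original = "\n".join(
    f"vertex {i} {rng.random():.6f} {rng.random():.6f}" for i in range(2000)
).encode("ascii")

# Edit a single byte near the start, keeping the length unchanged.
patched = bytearray(original)
patched[10] = ord("9") if patched[10] != ord("9") else ord("8")
edited = bytes(patched)

raw_change = changed_fraction(original, edited)
packed_change = changed_fraction(zlib.compress(original), zlib.compress(edited))

print(f"uncompressed: {raw_change:.4%} of bytes differ")
print(f"compressed:   {packed_change:.4%} of bytes differ")
```

Running it on real project files (for example a saved document next to its unzipped contents) would be a fairer version of the test asked for above; the numbers from this synthetic data only show the direction of the effect.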