Slashdot Mirror


Subversion as Automatic Software Upgrade Service?

angel'o'sphere asks: "I'm working on a contract where the customer wants an automated, Internet-based check-for-updates, update and install system. So far we've considered a Subversion-based solution. The numbers are: a typical upgrade is about 10MB in size. Usually it's about 30 to 50 new files (which have an average size of about 200kB) and 2 database files (which can be anywhere from 500MB to 2GB) that change regularly. Upgrades are released about every 3 months, and this will probably become more frequent as the system matures. The big files are the problem, as we estimate about 100-300 changes in every file. The total user base is currently 2000 users, creeping up to probably 5000 over the next year, and might finally end up at some 30,000 users. Any suggestions from the crowd about setting up a meaningful test environment? How about calculating the estimated throughput of our server farm? Does anyone know of projects that have tried something similar using an RCS or a configuration management system?" "We want to support as many concurrent users as possible (bandwidth is not an issue). We use an Apache front end as a load balancer and as many Subversion servers as necessary on the backend. My largest worry, from my calculations, is disk access on the Subversion server. We could not run meaningful tests, because a typical PC kills itself if you try to run more than 4 or 5 parallel Subversion clients doing an upgrade (due to insanely high disk IO, and high seek times)."
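
As a starting point for the throughput question, here is a rough back-of-envelope sketch in Python. It only uses the 10MB upgrade size and the user counts quoted above; the 24-hour rollout window is purely an assumption, and the result sizes raw transfer volume only, not the disk seeks the poster is worried about.

```python
# Back-of-envelope transfer estimate using the figures from the question.
# The 24-hour rollout window is an assumption, not something the poster stated.

UPGRADE_BYTES = 10 * 1000**2        # "a typical upgrade is about 10MB"
ROLLOUT_WINDOW_S = 24 * 60 * 60     # assumed: every client updates within a day


def total_transfer_bytes(users: int, upgrade_bytes: int = UPGRADE_BYTES) -> int:
    """Bytes the server farm must send if every user fetches one upgrade."""
    return users * upgrade_bytes


def average_rate_mbit(users: int, window_s: int = ROLLOUT_WINDOW_S) -> float:
    """Average outbound rate in Mbit/s if downloads are spread evenly over the window."""
    return total_transfer_bytes(users) * 8 / window_s / 1e6


for users in (2000, 5000, 30000):
    gb = total_transfer_bytes(users) / 1e9
    print(f"{users:>6} users: {gb:.0f} GB per release, "
          f"{average_rate_mbit(users):.1f} Mbit/s averaged over the window")
```

Real clients will not spread themselves evenly, so the peak rate is likely to be several times the average; the window and any peak factor would have to come from measurement.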

19 of 41 comments

  1. rsync by ¡ · · Score: 4, Insightful

    Why not use rsync instead of Subversion? Subversion wasn't really designed for this, whereas rsync is used for mirroring and syncing large repositories all over the place all the time.

    1. Re:rsync by commanderfoxtrot · · Score: 3, Informative

      Subversion uses binary diffs in a similar way to rsync. The original poster pointed out that bandwidth was not an issue; therefore any bandwidth advantages rsync gives (and yes, there are plenty) are meaningless.

      Subversion gives excellent control (tags anyone?) of binary installations. We use it at for things way beyond the usual source code storage.

      I have also found disk IO is the main killer. I would suggest looking into caching. The Subversion client sends straightforward HTTP commands to the server. I have a custom PostgreSQL backend which does some caching; in his place, I would have a Squid set up to cache some basic data fetches. Obviously, you need to be careful not to cache old data, but that's not hard.

      So yes, Subversion is excellent for this, and with a little thought, the heavy disk IO can be reduced. Cache, cache, cache.
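
One way to picture the "don't cache old data" point is to key the cache on the revision number. This is a sketch only, not how Squid or the PostgreSQL backend mentioned above works; the class and the stand-in backend are hypothetical. The idea it relies on is that content at a fixed revision never changes, so such entries cannot go stale.

```python
# Sketch of a read-through cache keyed by (revision, path). Content at a fixed
# revision never changes, so these entries cannot go stale; only the question
# "what is the latest revision?" must always go to the backend.
# All names here are hypothetical.

from typing import Callable, Dict, Tuple


class RevisionCache:
    def __init__(self, backend_read: Callable[[int, str], bytes]) -> None:
        self._read = backend_read
        self._store: Dict[Tuple[int, str], bytes] = {}
        self.backend_hits = 0   # how often the expensive read actually ran

    def get(self, revision: int, path: str) -> bytes:
        key = (revision, path)
        if key not in self._store:
            self.backend_hits += 1
            self._store[key] = self._read(revision, path)
        return self._store[key]


def slow_backend(revision: int, path: str) -> bytes:
    """Stand-in for an expensive repository read."""
    return f"{path}@{revision}".encode()


cache = RevisionCache(slow_backend)
for _ in range(3):
    cache.get(42, "data/big.db")
print(cache.backend_hits)   # 1: the second and third requests came from memory
```

The dictionary here grows without bound; a real cache would need an eviction policy and a size limit.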

      --
      http://blog.grcm.net/
  2. Transfer file to compare, then change file by Hey, Retard... · · Score: 2

    Sounds like twice the work for thrice the price.

  3. Rsync? by Karora · · Score: 3, Informative

    Wouldn't Rsync be better for what you want? Why do you need to be able to choose different versions to fetch?

    If the files contain parts that are constant along with parts that vary then rsync will in many cases only transfer the partial file. With Subversion that won't apply for binary files, but rsync will still recognise partial matches even on those.

    --

    ...heellpppp! I've been captured by little green penguins!
  4. times two by Lord Bitman · · Score: 3, Informative

    remember that svn always uses more than double the actual space required to hold the files for a "working copy". For "one-way" updates, svn is _NOT_ the answer.

    --
    -- 'The' Lord and Master Bitman On High, Master Of All
    1. Re:times two by saurik · · Score: 2, Insightful

      By non-working it should be noted that you also mean non-upgradable. Once you do an export, you can't do an update, which makes that feature useless for this purpose.

  5. If this was in java... by hexghost · · Score: 3, Insightful

    You would use Java Web Start. Maybe you should consider writing something like it for this project?

  6. Apt? by cortana · · Score: 2, Funny

    Can the clients run dpkg and apt? A daily apt-get update && apt-get upgrade is very convenient. Server-side, you don't need anything more complicated than a web server.

  7. Not Subversion by the eric conspiracy · · Score: 2, Insightful

    rsync is excellent at this, and rdist can have benefits too if you are updating a bunch of servers at once.

  8. How about bsdiff/patch and some scripts? by Fweeky · · Score: 5, Interesting

    This is the technique used by portsnap; basically you generate binary diffs from a known starting point, and the client keeps track of what new patches it needs to keep in sync. Since you're just serving static files, scaling it should be as easy and cheap as it gets.

    rsync is highly general purpose; your servers will end up generating hashes for every n-bytes of every file for every client, which is a lot more heavyweight than just serving patches you generate once. Subversion may be more efficient since it should know something about the files it's checked out previously, but it's still going to end up dynamically generating diffs between whatever versions each client has and the latest; this likely gets worse if your clients aren't tracking HEAD.

    Also note that a custom solution can likely get away with a single tag file detailing the latest patches; rsync and svn are going to be scanning their directory trees religiously. Both you and your users will probably appreciate a single GET to a small file on a webserver more than a load of CPU use and disk thrashing.
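
The client side of this scheme can be sketched roughly as follows. The manifest layout, version strings and patch file names are all hypothetical; generating and applying the binary diffs themselves (for instance with the bsdiff and bspatch tools) is left out.

```python
# Sketch of a client that works out which pre-built patches it needs.
# The manifest layout and file names are hypothetical; generating and applying
# the binary diffs (e.g. with the bsdiff/bspatch tools) is out of scope here.

from typing import Dict, List, Tuple

# Manifest the server publishes as one small static file:
# installed version -> (patch file to fetch, version that patch produces).
Manifest = Dict[str, Tuple[str, str]]


def patches_needed(manifest: Manifest, installed: str, latest: str) -> List[str]:
    """Return the ordered patch files that take `installed` up to `latest`."""
    chain: List[str] = []
    seen = {installed}
    version = installed
    while version != latest:
        if version not in manifest:
            raise LookupError(f"no patch path from {version!r} to {latest!r}")
        patch_file, version = manifest[version]
        if version in seen:
            raise ValueError(f"manifest loops back to {version!r}")
        seen.add(version)
        chain.append(patch_file)
    return chain


manifest: Manifest = {
    "1.0": ("1.0-to-1.1.bsdiff", "1.1"),
    "1.1": ("1.1-to-1.2.bsdiff", "1.2"),
}
print(patches_needed(manifest, "1.0", "1.2"))
# ['1.0-to-1.1.bsdiff', '1.1-to-1.2.bsdiff']
```

Because the manifest and every patch are static files, the server does no per-client work beyond serving them; a client that is too far behind could simply be told to fetch a full install instead.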

  9. Agreed, rsync rocks by Anonymous Coward · · Score: 2, Informative

    I have several apps like this. One is deployed to more than a dozen locations around the country, each having roughly 5000 users. It's a mod_perl app on BSD.

    My general routine: I have a "development server", and a staging farm (set up exactly like one of the customer's locations, right down to the network hardware). After changes are made and unit-tested, the changes are pushed to the staging servers using rsync. When all the various remaining tests pass, the software is pushed out to a customer's location (if they need to review the changes), or out to all locations.

    Note that I use rsync to PUSH changes on a regular schedule. The apps do not ever "phone home".

    My rsync script basically copies all the files except for unit tests, photoshop files, data, all that stuff, just the stuff it needs for run-time. It depends on an SSH key (which exists only on two machines and has a passphrase, so a key agent is required). It has a "fan-out" setting which allows up to N machines to be done in parallel.

    Also, my app is completely relocatable and cross-platform. I can check it out in any directory on any Mac, BSD, or Linux box and get to work. I can then push my changes directly from that development area to the staging server if needed. I use CVS and Darcs but that's not important, except to note that the rsync script needs to skip those "CVS" or "_darcs" files.
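
For readers who want a picture of such a push script, here is a minimal Python sketch. It is not the poster's script: the host names, paths and exclude patterns are invented, and the `run` parameter is a hypothetical hook so the fan-out logic can be tried without a network. Only standard rsync options (`-a`, `-z`, `-e`, `--exclude`) are used.

```python
# Sketch of a push script with a "fan-out" limit. Host names, paths and the
# exclude list are made up; the injected `run` hook exists so the logic can be
# exercised without touching the network.

import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical patterns: VCS metadata, Photoshop files, a unit-test directory.
EXCLUDES = ("CVS", "_darcs", "*.psd", "t/")


def rsync_command(src: str, host: str, dest: str,
                  excludes: Sequence[str] = EXCLUDES) -> List[str]:
    """Build the argv for pushing `src` to `host:dest` over SSH."""
    cmd = ["rsync", "-az", "-e", "ssh"]
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += [src, f"{host}:{dest}"]
    return cmd


def run_rsync(cmd: List[str]) -> int:
    """Default runner: execute rsync and return its exit status."""
    return subprocess.run(cmd).returncode


def push_all(src: str, hosts: Sequence[str], dest: str, fan_out: int = 4,
             run: Callable[[List[str]], int] = run_rsync) -> Dict[str, int]:
    """Push to every host, with at most `fan_out` transfers running at once."""
    with ThreadPoolExecutor(max_workers=fan_out) as pool:
        codes = pool.map(run, [rsync_command(src, h, dest) for h in hosts])
        return dict(zip(hosts, codes))
```

A call such as `push_all("app/", ["stage1", "stage2"], "/srv/app", fan_out=2)` would return a mapping from host to rsync exit status, so failed hosts can be retried; SSH key and agent setup is assumed to be handled outside the script.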

    Works great, very powerful. Of course I am leaving out details like choosing CVS tags, database schema migration, restarting/upgrading/installing daemons (hint, if you don't use daemontools, your apps will never be reliable), handling 3rd-party open source packages, pulling in changes that were made on the customer's machine (in an emergency for instance) etc., etc. But rsync is the core of it.

  10. CVS by alexpach · · Score: 2, Insightful

    I have been using CVS to manage many different websites and/or projects on various servers. It doesn't store more than it needs (just the CVS folders) and it adds, updates, patches and removes the files according to your repository.

    Additionally you can use branches and sticky tags to keep track of files that don't need to be updated, or files that vary from client to client.

    It is also easy to trigger an update over ssh or cron.

    One downside compared to SVN is the lack of a binary diff mechanism, but I have been able to get by fine without it managing projects up to a GB in size.

    Alex

  11. Disk Accesses by Anonymous Coward · · Score: 2, Informative

    > My largest worry, from my calculations, is disk access on the Subversion server.

    Put enough RAM in your server, and the changed portion will likely fit in cache. If that's not an option, use RAID to speed up disk accesses.

    Others have mentioned rsync. You might also consider xdelta.

  12. Disk I/O by pete-classic · · Score: 2, Insightful

    Let's see. You have a ceiling of 2.01GB worth of updates. You have disk I/O problems.

    Your problem is either that you don't have enough RAM in the system, or you have an OS that doesn't do a rational job of caching disk.

    Or both.

    -Peter

  13. perhaps by /dev/trash · · Score: 2, Informative

    rdiff-backup

  14. Some clarifications, especially about rsync by angel'o'sphere · · Score: 2, Informative

    First of all, thanks for so many replies!

    First I'd like to clarify a bit; my original question was probably not clear enough!

    The clients of the system are customers. They have Windows PCs, as the software runs on Windows. On the server side we need to be able to authenticate every client, as there are several region- and user-level restrictions on who may access which file.

    You can assume there are simply 5 to 10 user levels, where a user on level 10 may access everything and a user on level 5 only a subset.

    So far SVN looks good:

    * authentication via the Apache front end, probably via an LDAP server

    * structuring the "download area" into directories with content appropriate to each user level
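
The level rule described above (a level-10 user sees everything, a level-5 user only a subset) can be sketched like this. The directory names and level assignments are invented for illustration, and the region restrictions are not modelled; in the setup described, the actual enforcement would sit in the Apache/LDAP layer rather than in code like this.

```python
# Sketch of the level rule: each download directory carries the minimum user
# level required to fetch it. Directory names and levels are invented.

from typing import Dict, List

REQUIRED_LEVEL: Dict[str, int] = {
    "level01": 1,
    "level05": 5,
    "level10": 10,
}


def visible_dirs(user_level: int,
                 required: Dict[str, int] = REQUIRED_LEVEL) -> List[str]:
    """Directories a user may fetch: those whose required level is <= the user's."""
    return sorted(name for name, level in required.items() if level <= user_level)


print(visible_dirs(5))    # ['level01', 'level05']
print(visible_dirs(10))   # ['level01', 'level05', 'level10']
```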

    Regarding rsync:

    * first of all, I did not know about it :D

    * my first investigation indicates several drawbacks

    It seems not to run on Windows (without Cygwin), users need to be Unix/Linux users on the server, and building a distribution seems "more complicated" than making a tag/version with SVN.

    Please consider: from the point of view of the service provider, the system is just the same as hosting a huge pile of source code. The starting distribution probably has 3000 files and is about 2.5 GB big.

    The users need to have the ability to fall back on an earlier revision in case of errors during distribution.

    Users need to be able to upgrade to the latest HEAD (there is only one main trunk anyway).

    Regarding performance of SVN, yes we are clear we need to put a lot of RAM into the servers. But we can't get rid of the disk IO it seems, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they either have the previous or the second oldest release installed).

    However: alternatives to SVN are very welcome! I only wanted to make clear why we considered SVN in the first place.

    angel'o'sphere

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    1. Re:Some clarifications, especially about rsync by NuShrike · · Score: 2, Interesting

      Here's a combination of available strategies:

      o DON'T use SVN (imo)
      o check out your latest rev to a staging 'folder'
      o rename your previous release 'folder' to backup name
      o rsync the data from your staging 'folder' to all your clients one by one.

      If you have issues with the release, just roll back to the previous release 'folder'.
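
The folder shuffle and rollback can be sketched in a few lines of Python. This is a variation on the steps above in which the staging folder itself becomes the release folder on the server; the directory names are hypothetical, and pushing the result out with rsync is not shown.

```python
# Sketch of the staging/backup folder swap with rollback. Directory names are
# hypothetical, and rename() is only atomic within a single filesystem.

import shutil
import tempfile
from pathlib import Path


def promote(staging: Path, release: Path, backup: Path) -> None:
    """Keep the current release as `backup`, then make `staging` the release."""
    if backup.exists():
        shutil.rmtree(backup)      # only one previous release is kept
    if release.exists():
        release.rename(backup)
    staging.rename(release)


def rollback(release: Path, backup: Path) -> None:
    """Discard the bad release and restore the previous one."""
    if not backup.exists():
        raise FileNotFoundError(f"no backup at {backup}")
    if release.exists():
        shutil.rmtree(release)
    backup.rename(release)


# Demonstration in a throwaway directory.
root = Path(tempfile.mkdtemp())
release, staging, backup = root / "release", root / "staging", root / "backup"
release.mkdir()
(release / "VERSION").write_text("r1")
staging.mkdir()
(staging / "VERSION").write_text("r2")

promote(staging, release, backup)
after_promote = (release / "VERSION").read_text()
rollback(release, backup)
after_rollback = (release / "VERSION").read_text()
print(after_promote, after_rollback)   # r2 r1
shutil.rmtree(root)
```

Keeping more than one backup would only need a naming scheme for the backup folders (for example by revision number) instead of the single `backup` path used here.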

      The other thought is to rsync a .torrent file and use something like bittornado to distribute from your 'staging' folder.

      All this should let you get by with a master file server that has 1GB of RAM or less, and crappy I/O too.

      You figure out a security-scheme to wrap around this.

    2. Re:Some clarifications, especially about rsync by jrockway · · Score: 3, Insightful

      > Regarding performance of SVN, yes we are clear we need to put a lot of RAM into the servers. But we can't get rid of the disk IO it seems, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they either have the previous or the second oldest release installed).

      Subversion doesn't need to cache requests -- the OS* does this itself. With plenty of RAM, whatever isn't being used by processes is used for cache. If you don't trust the disk caching algorithm, just make a 2.5G ramdisk and copy your files over to that when you want to release them. Then the disk won't be a problem.

      * Assuming you're using a Real OS, and not Windows. Don't use Windows for anything that requires speed or reliability.

      --
      My other car is first.
    3. Re:Some clarifications, especially about rsync by eklitzke · · Score: 3, Informative

      You may be interested in the Unison project. More info can be found here: http://www.cis.upenn.edu/~bcpierce/unison/

      --
      #include ".signature"