Slashdot Mirror


Subversion as Automatic Software Upgrade Service?

angel'o'sphere asks: "I'm working on a contract where the customer wants an automated, Internet-based check-for-updates, update and install system. So far we've considered a Subversion-based solution. The numbers are: a typical upgrade is about 10MB in size. Usually it's about 30 to 50 new files (which have an average size of about 200kB) and 2 database files (which can be anywhere from 500MB to 2GB) that change regularly. Upgrades are released about every 3 months, and this will probably become more frequent as the system matures. The big files are the problem as we estimate about 100-300 changes in every file. The total user base is currently 2000 users, creeping up to probably 5000 over the next year, and might finally end up at some 30,000 users. Any suggestions from the crowd about setting up a meaningful test environment? How about calculating the estimated throughput of our server farm? Does anyone know of projects that have tried something similar using an RCS or a configuration management system?" "We want to support as many concurrent users as possible (bandwidth is not an issue). We use an Apache front end as a load balancer and as many Subversion servers as necessary on the backend. My largest worry, from my calculations, is disk access on the Subversion server. We could not run meaningful tests, because a typical PC kills itself if you try to run more than 4 or 5 parallel Subversion clients doing an upgrade (due to insanely high disk IO, and high seek times)."
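
For the throughput question, one rough starting point is Little's law: average concurrent clients = arrival rate × time each client spends upgrading. The sketch below applies it to the user counts from the question. The rollout window (24 hours), the time per upgrade (300 seconds) and the per-server limit of 4 clients are assumptions picked for illustration; the last one is only loosely borrowed from the "4 or 5 parallel clients" observation, which may describe the test machine rather than the server. Real traffic is bursty, so the peak will sit above this average.

```python
import math


def avg_concurrent_clients(users, upgrade_seconds, window_seconds):
    """Little's law: mean clients in flight = arrival rate * time per upgrade."""
    arrival_rate = users / window_seconds  # clients per second, assumed uniform
    return arrival_rate * upgrade_seconds


def servers_needed(concurrent, clients_per_server):
    """Round up to whole backend servers for a given per-server client limit."""
    return math.ceil(concurrent / clients_per_server)


WINDOW_SECONDS = 24 * 3600   # assumption: all users upgrade within one day
UPGRADE_SECONDS = 300        # assumption: one upgrade takes five minutes
CLIENTS_PER_SERVER = 4       # assumption: see the note above

for users in (2000, 5000, 30000):  # user counts taken from the question
    concurrent = avg_concurrent_clients(users, UPGRADE_SECONDS, WINDOW_SECONDS)
    servers = servers_needed(concurrent, CLIENTS_PER_SERVER)
    print(f"{users:>6} users: ~{concurrent:.1f} concurrent, {servers} server(s)")
```

Measuring the real time per upgrade and the real per-server limit on a small test setup, then plugging them in, would turn this from a guess into an estimate.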

8 of 41 comments

  1. rsync by ¡ · · Score: 4, Insightful

    Why not use rsync instead of Subversion? Subversion wasn't really designed for this, whereas rsync is used for mirroring and syncing large repositories all over the place all the time.

    1. Re:rsync by commanderfoxtrot · · Score: 3, Informative

      Subversion uses binary diffs in a similar way to rsync. The original poster pointed out that bandwidth was not an issue; therefore any bandwidth advantages rsync gives (and yes, there are plenty) are meaningless.

      Subversion gives excellent control (tags anyone?) of binary installations. We use it at for things way beyond the usual source code storage.

      I have also found disk IO is the main killer. I would suggest looking into caching. The Subversion client sends straightforward HTTP commands to the server. I have a custom PostgreSQL backend which does some caching; in his place, I would have a Squid set up to cache some basic data fetches. Obviously, you need to be careful not to cache old data, but that's not hard.

      So yes, Subversion is excellent for this, and with a little thought, the heavy disk IO can be reduced. Cache, cache, cache.

      --
      http://blog.grcm.net/
  2. Rsync? by Karora · · Score: 3, Informative

    Wouldn't Rsync be better for what you want? Why do you need to be able to choose different versions to fetch?

    If the files contain parts that are constant along with parts that vary, then rsync will in many cases transfer only the parts that changed. With Subversion that won't apply for binary files, but rsync will still recognise partial matches even on those.

    --

    ...heellpppp! I've been captured by little green penguins!
  3. times two by Lord+Bitman · · Score: 3, Informative

    remember that svn always uses more than double the actual space required to hold the files for a "working copy". For "one-way" updates, svn is _NOT_ the answer.

    --
    -- 'The' Lord and Master Bitman On High, Master Of All
  4. If this was in java... by hexghost · · Score: 3, Insightful

    You would use Java Web Start. Maybe you should consider writing something like it for this project?

  5. How about bsdiff/patch and some scripts? by Fweeky · · Score: 5, Interesting

    This is the technique used by portsnap; basically you generate binary diffs from a known starting point, and the client keeps track of what new patches it needs to keep in sync. Since you're just serving static files, scaling it should be as easy and cheap as it gets.

    rsync is highly general purpose; your servers will end up generating hashes for every n bytes of every file for every client, which is a lot more heavyweight than just serving patches you generate once. Subversion may be more efficient since it should know something about the files it's checked out previously, but it's still going to end up dynamically generating diffs between whatever versions each client has and the latest; this likely gets worse if your clients aren't tracking HEAD.

    Also note that a custom solution can likely get away with a single tag file detailing the latest patches; rsync and svn are going to be scanning their directory trees religiously. Both you and your users will probably appreciate a single GET to a small file on a webserver more than a load of CPU use and disk thrashing.
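
The client side of the tag-file idea can be sketched roughly as follows. This is not how portsnap does it: the manifest layout (one `from-version to-version patch-file` triple per line) and all version and file names are hypothetical, chosen only to show how a client could work out which pre-generated patches it still needs from one small file.

```python
def parse_manifest(text):
    """Map each from-version to (to-version, patch file name)."""
    steps = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        src, dst, patch = line.split()
        steps[src] = (dst, patch)
    return steps


def plan_patches(steps, installed):
    """Return (latest version, ordered patch list) starting from `installed`."""
    known = set(steps) | {dst for dst, _ in steps.values()}
    if installed not in known:
        raise KeyError(f"unknown installed version: {installed}")
    plan, seen, current = [], set(), installed
    while current in steps:
        if current in seen:
            raise ValueError("cycle in manifest")
        seen.add(current)
        current, patch = steps[current]  # follow the chain one release forward
        plan.append(patch)
    return current, plan


# Hypothetical manifest, as it might be fetched with a single GET.
MANIFEST = """\
# from to patch
1.0 1.1 db-1.0-1.1.patch
1.1 1.2 db-1.1-1.2.patch
"""

latest, patches = plan_patches(parse_manifest(MANIFEST), "1.0")
print(latest, patches)
```

A client on an unknown version gets an error rather than an empty plan, so the caller can fall back to a full download instead of wrongly reporting that it is up to date.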

  6. Re:Some clarifications, especially about rsync by jrockway · · Score: 3, Insightful

    > Regarding performance of SVN, yes we are clear we need to put a lot of RAM into the servers. But we can't get rid of the disk IO, it seems, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they either have the previous or the second oldest release installed)

    Subversion doesn't need to cache requests -- the OS* does this itself. With plenty of RAM, whatever isn't being used by processes is used for cache. If you don't trust the disk caching algorithm, just make a 2.5G ramdisk and copy your files over to that when you want to release them. Then the disk won't be a problem.

    * Assuming you're using a Real OS, and not Windows. Don't use Windows for anything that requires speed or reliability.

    --
    My other car is first.
  7. Re:Some clarifications, especially about rsync by eklitzke · · Score: 3, Informative

    You may be interested in the Unison project. More info can be found here: http://www.cis.upenn.edu/~bcpierce/unison/

    --
    #include ".signature"