Slashdot Mirror


Delta Compression for Linux Security Patches?

cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitutes a portion of my doctoral thesis), and can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"

18 of 289 comments (clear)

  1. How about this... by ufoman · · Score: 1, Interesting

    You go over to a friend's house that has broadband and a CD-Writer (both are very popular these days) and download the patches onto a CD-R and take it home?

    --
    The following statement is false.
    The previous statement is true.
    Welcome to my world.
  2. Here's the problem: by Sonic+McTails · · Score: 2, Interesting

    Linux makes it very easy to install new packages and upgrade packages from sources farther away from the vendor. If a vendor tried to release a patch using delta versioning, it could totally wreck a system. Since neither RPM nor DPKG is designed to check md5sum hashes against each file and make sure the patch can be installed safely, it will have to wait until this feature is incorporated into either system.
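
    As a rough illustration of the check described above, here is a minimal sketch, assuming a delta ships with the MD5 of the file it was built against and the MD5 of the expected result. The names md5_of, safe_to_patch, apply_checked and the apply_delta callback are hypothetical; this is not an RPM or DPKG feature.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def safe_to_patch(installed: bytes, expected_old_md5: str) -> bool:
    """True only if the installed file is exactly the version the delta was built against."""
    return md5_of(installed) == expected_old_md5


def apply_checked(installed, expected_old_md5, expected_new_md5, apply_delta):
    """Apply a delta only when the base matches, then verify the result.

    apply_delta is a hypothetical callback standing in for a real binary patcher.
    """
    if not safe_to_patch(installed, expected_old_md5):
        # A locally modified or third-party file: refuse, so the caller can
        # fall back to downloading the full package instead.
        raise ValueError("installed file differs from the delta's base version")
    patched = apply_delta(installed)
    if md5_of(patched) != expected_new_md5:
        raise ValueError("patched file does not match the expected result")
    return patched
```

    A file that came from another source fails the first check, so the system is left untouched rather than wrecked.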

    --
    This signature was left intentionally blank.
  3. Re:Doesn't make as much sense to use for Linux by bluesguy_1 · · Score: 3, Interesting

    I disagree. I've used smartversion on Windows for a couple years now for making versioned archives of important files, and I wish Linux had something comparable. It's like having a portable single tar.gz of an entire cvs repository without all the headaches...

  4. SUSE by DreadSpoon · · Score: 3, Interesting

    SUSE already does this.

    RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made. While I favor a new format for many reasons other than this, modifying RPM is probably the best solution in order to provide backwards compatibility.

  5. Well... by iamdrscience · · Score: 2, Interesting

    On that topic, why does almost everybody distribute source code as gzipped tars instead of bzip2'ed tars (just about everybody that does use bzip2 also distributes gzips)? Sure, in the beginning gzip made more sense for people on slow machines, but nowadays the difference in the time it takes to decompress is trivial, whereas the compression benefits of bzip2 on text are phenomenal in my experience.

  6. use rsync by stonebeat.org · · Score: 2, Interesting

    Delta-based patch distribution on the Linux platform is quite easy. Just use RSYNC to sync the application files to the source. I have used this technique of patching (i.e. RSYNC) to provide updates/patches to an in-house built application. Works very nicely.

  7. Gentoo by SuperBanana · · Score: 2, Interesting
    Jokes about gentoo aside, the source tarballs are cached in /var, and only removed when they exceed configured limits for max disk space. Patches are contained in the portage tree, along with the "ebuild" files which are the build instruction files.

    If the update is just a patch to the source, there's sometimes a minor revision made and an updated gentoo ebuild file and source code patch added to the portage tree, which is of course done via rsync. All in all, it's decently efficient. This mostly (I think) happens with unstable package versions, where a security update may make it into portage before the official project bumps their release, but that's not the case with stable stuff.

    I think for basic systems, compile time complaints are slightly exaggerated. My -original- celeron 450 isn't shabby at all at compiling most of the more basic system packages and server apps. Even glibc and gcc build with relative ease, and when I set up distcc amongst my three systems, it became even less of a hassle. Even without distcc, the time to clear out 50 packages of updates on a mail server is surprisingly low on a low-powered system.

  8. Gentoo Portage by WamBamBoozle · · Score: 4, Interesting
    I wonder why Gentoo doesn't do this. Gentoo, as far as I can tell, always distributes a bzip2'ed tar of any particular distribution.

    It works beautifully but I can't help but think it is a waste of bandwidth.

  9. It's already doing it. by Mongo222 · · Score: 3, Interesting



    http://www.daemonology.net/bsdiff/

    bsdiff and bspatch are tools for building and applying patches to binary files. By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta, and 15% smaller than those produced by .RTPatch (a $2750/seat commercial patch tool).

    http://sourceforge.net/projects/diffball

    A general delta compression/differencing suite for any platform that supports autoconf/automake, written in c, w/ builtin support for reading,writing, converting between multiple file formats, and an easy framework to drop in new algorithms.

  10. Re:Doesn't make as much sense to use for Linux by timeOday · · Score: 2, Interesting
    Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file. The number of different binary patches would be exponential in the number of compile switches, compiler versions, USE flags, and so on - for both "old" and "new" file versions, so square it again!
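
    To put rough numbers on that blow-up, here is a small sketch under simplifying assumptions: every flag is an independent on/off switch, and a diff is needed for every (old build, new build) pair. The function names and the example figures are made up for illustration.

```python
def variant_count(num_flags: int, num_compilers: int) -> int:
    """Distinct binaries for one package version: each on/off flag doubles the count."""
    return (2 ** num_flags) * num_compilers


def delta_count(num_flags: int, num_compilers: int) -> int:
    """Binary diffs needed if any old variant may be upgraded to any new variant."""
    variants = variant_count(num_flags, num_compilers)
    return variants * variants


# Example: 10 flags and 3 compiler versions for a single package.
print(variant_count(10, 3))  # 3072
print(delta_count(10, 3))    # 9437184
```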

    I guess if you really wanted to be clever, you could send the server enough information (your existing package versions, make and USE flags etc), plus the desired flags for your new file. With this the server could compile a binary matching yours, then compile the new binary for you, then make a binary diff. But that, as Kramer would say, is "kooky talk."

  11. License of BSDiff by gnuman99 · · Score: 3, Interesting
    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't.

    Especially since the license of bsdiff is not even close to a BSD license (don't let the name of BSD Protection License fool you). Unless the license is changed to something like BSD, BSDiff is not going to be implemented anywhere except in closed source software. Debian cannot even package this software because it is non-free.

    I guess the bottom line is if you want to have something accepted in open source *and* in proprietary software, you want to license under BSD. You want to cater to one group (closed source in this case), you will lose the other.

  12. Re:Several reasons, but not all technical by tyrione · · Score: 3, Interesting
    Well congratulations.

    You point out TeTeX at 14+MB which is as bare as it gets for TeTeX, then comes the TeTeX-Doc and the TeTeX-Extra which by now we're up to over 50MBs.

    Oh and here is the real kicker. Debian has updated 2.02 3 if not 4 times this month. Now 150MB+ to over 200MBs of fixes? Nope. SP2 looks a bit smaller now don't it?

    And that doesn't even touch the -1,-2,..-20 Debian patches they keep spewing out for project after project.

    The only plus for a 56k access is they don't cap your downloads on a monthly basis. The bad side obviously is bandwidth, but for me it's time down waiting for important packages like TeTeX to update.

    Having a SVN approach to patching systems makes sense. Or CVS if you prefer a different versioning system approach.

    It's already been said but it is worth repeating, especially when one runs KDE or GNOME. Just build a freakin' base package and update us with binary images that are new or replaced, documentation that is new or revised, and binary diffs to the executables, libraries, and so on that change, and not the mountains of inert parts that don't change.

    You can't tell me KDELIBS, KDEBASE needs to be completely rebuilt each .x revision or -x revision by Debian, and by completely rebuilt I mean all the inert files that don't actually get touched during the build process other than to make sure some wallpaper image still exists. Hell, the wallpaper backdrops, etc. should be add-ons, not part of the distributions. But then again I suppose everyone thinks we all have T1 access.

    K.I.S.S.

  13. Re:Doesn't make as much sense to use for Linux by Anonymous Coward · · Score: 1, Interesting

    Especially with suse it is baaaad!

    Over the last weeks I got a couple kernel patches (each 50 Megs download! They always download the full packet! Dialup users will never be able to do this!) as well as a base kdepackage fix - which also resulted in about 20 megs download. I think bdiff would _really_ benefit suse.

  14. Gentoo now has "source delta's" reducing traffic by rigolo · · Score: 3, Interesting
    Well, gentoo is known for the fact that you download the source of every program and then start compiling. These sources are distributed in .tar.gz or .tar.bz form and can be very large. A version change (even a change from .0.0.1 to .0.0.2) has its own tarball and therefore is downloaded again completely. But the real changes between these 2 can be small.

    Enter "deltup", a tool that looks at two tarballs and gives you a diff between the 2 that you can use to "transform the old tarball to an exact copy of the new tarball"; it even preserves MD5 checksum compatibility. Now some enterprising gentoo user created a "dynamic deltup server" that automates the creation of these delta files, and people can reuse the delta files that other people used.


    Using this technique in combination with gentoo portage, people can reduce their traffic by 75% on average.


    Have a look at the following URLs for more information:

    http://forums.gentoo.org/viewtopic.php?t=215262

    http://linux01.gwdg.de/~nlissne/deltup-status.atime.html


    Rigolo

  15. Re:XDelta3 by Spy+Hunter · · Score: 2, Interesting

    Oh man, I just had a great idea. What if you incorporated XDelta3 into a Reiser4 filesystem plugin? Versioning built into the filesystem would be an *awesome* feature. I'm sure it's been done before on some other OS, but it could really go mainstream on Linux with Reiser4.

    --
    main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
  16. You shouldn't have to get entire packages by Anomalous+Cowturd · · Score: 2, Interesting

    Apparently, most distributions have you download the entire package for each update, although there are efforts underway to break up sections a bit more (if I'm wrong, I apologize - I use BSD).

    It sounds like what's really needed is to build packages of just the updated files. The install manifest would just specify the files in the archive, so there shouldn't be any complaints about missing files. Or does that show my ignorance?

    Actually, if you wanted a more general scheme, the update server would build packages on the fly. The updater would send a list of files in a package to the server, which would return a set of files that need updating. You could use this to upgrade any system, regardless of distribution. You would just have to update the database for whatever to show that it was at the latest version.

    Yes, this would take up a lot more CPU time, and be pretty slow on response time, but the savings in bandwidth should be worth it. All the time the servers were waiting for the network card could be used compressing files.
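
    The on-the-fly scheme above might look something like this minimal sketch. It assumes the updater sends a digest alongside each file path (the comment only says "a list of files", so the digests are an added assumption), and the names digest and files_needing_update are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 digest used to compare file contents without sending them."""
    return hashlib.sha256(data).hexdigest()


def files_needing_update(client_manifest: dict, latest_files: dict) -> dict:
    """Server side: pick the files the client is missing or has an old copy of.

    client_manifest maps path -> digest of the client's copy.
    latest_files maps path -> current content held by the update server.
    Returns path -> content for only the files that must be sent.
    """
    return {
        path: content
        for path, content in latest_files.items()
        if client_manifest.get(path) != digest(content)
    }


# Example: only the changed binary is sent, not the unchanged documentation.
latest = {"bin/app": b"v2", "doc/README": b"same"}
client = {"bin/app": digest(b"v1"), "doc/README": digest(b"same")}
update = files_needing_update(client, latest)
```

    The server here only compares digests; compressing the selected files into a package is left out of the sketch.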

    --

    Java: the bastard demon spawn of C++ and Ada

  17. is there any real benefit? by borud · · Score: 2, Interesting
    Why does Linux need this? How many people have a connection which is so bad they really benefit from this?

    Sure it is always nice to have faster downloads. But is it worth the extra work involved in setting this up both at the distribution point and on the client side?

    I am not being rhetorical. I am just wondering.

  18. Something to try by slashdotjunker · · Score: 2, Interesting
    Something that might be useful to try out is a patch method I developed for a MMORPG. The problem is a challenge since MMOs are constantly updating game files. Also, many of the files are very large art resources which might only change a small percentage from patch to patch. On top of this, players are always messing with their files so you can't assume that any file is in any known state on the basis of some "patch level".

    The method I came up with was to use essentially the rsync algorithm, but I reversed it so that the computationally expensive parts were performed on the client side. The results of each computation were stored on the server side as a "patch" so that the computation was performed only once. This provided a patch system that was dynamic but without generating large server load.

    The advantages are:
    1. Patches are generated dynamically so the files can be in any state (truncated, too big, filled with garbage, missing entirely, etc).
    2. The heavy computation is performed on the client side so that patch generation does not drive up server load.
    3. Computed patches are stored and reused, so a database of patches is quickly built up.
    4. Patches are efficient (based on binary diff).

    A detailed example follows; knowledge of rsync is required for understanding.

    For example, let's suppose you are releasing a new content upgrade. A particular file's signature has changed from F4A3 to 26B1. (For brevity I am using a 16-bit signature, in practice it is much much larger.) When the first client connects to the patch server it receives the updated list of file signatures. The client notices that the file is now old so it requests a patch for F4A3->26B1. The patch does not exist yet, so the reverse-rsync algorithm activates and the client calculates the F4A3->26B1 patch. When that patch has been generated it is returned to the patch server and all future clients can just download the patch and skip the reverse-rsync. After applying a patch, the result is checked to make sure that you actually ended up with 26B1. If you did not, extra rounds of patching are performed.

    Some Notes: These extra rounds are consequences of rsync and a security check as well to prevent bad patches from being uploaded to the server. And, normally the release maintainer would "pre-seed" the patch server by patching a clean current version to the new version just before release.
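
    The flow described above can be sketched roughly as follows. This is a simplified illustration, not the original implementation: it matches only fixed, aligned blocks instead of using rsync's rolling checksum, it raises an error where the real system would run extra patch rounds, and every name in it (PatchServer, compute_patch, apply_patch, update, BLOCK) is hypothetical.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def signature(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


class PatchServer:
    """Holds the new file and a cache of patches uploaded by clients."""

    def __init__(self, new_file: bytes):
        self.new_file = new_file
        self.new_sig = signature(new_file)
        self.patches = {}  # (old_sig, new_sig) -> patch, computed once and reused

    def block_signatures(self) -> list:
        return [signature(b) for b in blocks(self.new_file)]

    def fetch_block(self, index: int) -> bytes:
        return blocks(self.new_file)[index]


def compute_patch(local: bytes, server: PatchServer) -> list:
    """Client side: the expensive matching work happens here, not on the server."""
    have = {signature(b): i * BLOCK for i, b in enumerate(blocks(local))}
    patch = []
    for index, sig in enumerate(server.block_signatures()):
        if sig in have:
            patch.append(("copy", have[sig]))  # reuse a block already on disk
        else:
            patch.append(("data", server.fetch_block(index)))  # download literal bytes
    return patch


def apply_patch(local: bytes, patch: list) -> bytes:
    out = bytearray()
    for op, arg in patch:
        out += local[arg:arg + BLOCK] if op == "copy" else arg
    return bytes(out)


def update(local: bytes, server: PatchServer) -> bytes:
    key = (signature(local), server.new_sig)
    patch = server.patches.get(key)
    if patch is None:
        # First client with this old signature: compute the patch and upload it.
        patch = compute_patch(local, server)
        server.patches[key] = patch
    result = apply_patch(local, patch)
    if signature(result) != server.new_sig:
        # The real system would perform extra rounds of patching here.
        raise ValueError("patch did not produce the expected file")
    return result
```

    A second client holding the same old file finds the stored patch under the same (old, new) signature pair and skips the computation entirely.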