Slashdot Mirror


Delta Compression for Linux Security Patches?

cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitute a portion of my doctoral thesis), and can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"

23 of 289 comments

  1. Doesn't make as much sense to use for Linux by drinkypoo · · Score: 5, Informative

    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.

    Personally I'd prefer to see binary distributions move to a model of using something like cvs, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. This solves both your differential update problem (if the right system is used - I'm thinking that's pretty much not CVS but I don't know if there's a way to make it do all of that - CVS doesn't handle binaries amazingly intelligently from what I understand) and your updates in general. Plus, you can use it both for source and binary updates.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:Doesn't make as much sense to use for Linux by GweeDo · · Score: 3, Informative

      What you are requesting can already be done basically in Gentoo (emerge -Uupv world), Debian (apt-get something or other) and Redhat/Fedora (up2date something or other). So why do we need something else again :) Oh, with Gentoo...add ccache to that for faster compiles too

    2. Re:Doesn't make as much sense to use for Linux by drinkypoo · · Score: 3, Informative

      I use gentoo. I never have found any ccache settings that make much of a difference. None of these systems do a binary differential update; they download a whole package and install it, or in the case of gentoo, download a source package and compile it. Neither of these approaches is what is being called for in this article, nor what I suggest above.

      Mind you, I'm fine with the way gentoo does things, but I have a fairly powerful system - not incredibly fast by modern standards but faster than anything I've run linux on before, or probably any Unix at all for that matter. For a dialup user on an older computer, atomically differential updates would make a really big difference.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:Doesn't make as much sense to use for Linux by morcego · · Score: 5, Informative

      I'm not sure about Gentoo, but I'm positive that is not what happens for Debian, RedHat, Fedora, Mandrake, SuSe, Conectiva etc.

      On those systems, when you do an upgrade (apt-get upgrade), you will get a fresh package, including not only the files that changed, but all the files for that package. And if we have a package with 1 binary and 50 images, and only the binary changed, we get to download all the images again.

      Some distributions have been implementing package fragmentation for this (package-core and package-images for this example), and that is a good thing for these cases, although it is a nightmare to manage. Not as fine grained as proposed by the grandparent post, but good enough for most cases.

      --
      morcego
    4. Re:Doesn't make as much sense to use for Linux by advocate_one · · Score: 2, Informative
      "Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file."

      No, those "binary" diffs for Gentoo would be done against the sources used for the previous version of the gentoo "package", which would then be used to download the diff so that the gentoo computer could then construct new sources to build against. It would require gentoo computers to keep the sources rather than discard them to save space.

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  2. Mindvision by shawnce · · Score: 4, Informative

    The folks at mindvision made an installer/installer creation tool that allowed one to scan two different sets of files and directories to find differences between them (binary differences), and it would package up just those differences in the installer archive. In fact you could use it to diff and package the deltas between several versions at once. When the user ran the installer (really an updater) it would apply the binary patch to the file set as needed.

    I was using this tool over 7 years ago now on Mac OS so I don't see what is so new about this concept... but I am glad it looks like it is starting to be used more.

  3. Re:SP2? by cperciva · · Score: 4, Informative

    Sorry, the writeup was a bit unclear. Windows XP SP2 contains a new version of Windows Installer (or whatever they're calling it today). This new version includes support for downloading updates via binary diffs, and most updates to XP after this point should be done that way.

  4. Too complicated and confusing by avida · · Score: 4, Informative

    Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need to have eleven deltas. I don't think so. Unless you want to do something like Windows Update where an agent scans your binaries and compares the difference with the update and then downloads individual files ... but that's a lot more complicated and isn't justified by the bandwidth savings.
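
    The count above can be sketched as simple arithmetic. This is only an illustration, assuming one direct delta per older version (the original release plus every prior update); the function names are made up for the example.

```python
def deltas_for_next_update(prior_updates: int) -> int:
    """Deltas needed for one new update: one per older version still in the field.

    Older versions = the original release plus each prior update.
    """
    return prior_updates + 1


def cumulative_deltas(total_updates: int) -> int:
    """Total deltas generated over a package's life across all of its updates."""
    return sum(deltas_for_next_update(k) for k in range(total_updates))


if __name__ == "__main__":
    print(deltas_for_next_update(10))  # ten prior updates, as in the comment
    print(cumulative_deltas(11))       # every delta built across eleven updates
```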

  5. Re:SUSE by cperciva · · Score: 3, Informative

    SUSE already does this.

    Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.

    Now, this is certainly a step forward from the common (eg, Debian, RedHat) approach of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.

  6. Re:Gentoo by bzBetty · · Score: 2, Informative

    Gentoo users should make sure they know about this. It's called deltup: basically a script for portage that grabs xdelta patches instead of downloading the entire file again. It seems to save me a lot of bandwidth anyway.

  7. What exactly is binary diff/delta compression ? by Anonymous Coward · · Score: 1, Informative


    Can someone explain, for those people who have no idea, what delta compression is and how it differs from something like zip/rar/gz/7z etc.?

    --ajs

  8. XDelta3 by TheBashar · · Score: 4, Informative

    XDelta3 recently reached its first public release.

    http://xdelta.org/xdelta3.html

    XDelta3 is a library which is designed to foster exactly this kind of functionality. If distributions integrate the xdelta functionality into their package management framework we would be well on our way to what the poster is looking for.

  9. Re:Warez by Anonymous Coward · · Score: 1, Informative

    Well, that's a particular kind of binary diff. It gets harder if you want the process automated, you know, like it has to be if you want to do anything productive with it (as opposed to blotting out a contiguous chunk of someone else's code).

  10. Re:Its well known by Mongo222 · · Score: 2, Informative

    What makes you think you need to take a server out of production while code compiles on it? I never have.

  11. Re:as soon as it gets hacked in to RPM by Waffle+Iron · · Score: 2, Informative
    The real issue is that linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually just require one or very few packages to be updated.

    I beg to differ. SuSE 9.1 came out only 5 months ago:

    $ du -h /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/

    417M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/i586
    14M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/noarch
    431M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/
    That's almost a whole CD worth of patches in half of a year. All of this is to correct for mistakes in probably no more than a few hundred total lines of code.
  12. Re:Gentoo Portage by haruchai · · Score: 3, Informative

    No reason why they couldn't. According to this page:
    http://www.daemonology.net/bsdiff/, this util is already in Gentoo.

    --
    Pain is merely failure leaving the body
  13. Re:Warez by dtfinch · · Score: 2, Informative

    If I understand correctly, a binary diff goes a few steps further than a patch. It stores insertions and deletions while a typical patch (like IPS) only stores replacements, which is optimal for files patched in a hex editor and even most database files but not for recompiled files where everything can change in location.
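
    A minimal sketch of that difference, assuming a delta made of "copy from the old file" and "insert new bytes" instructions. It uses Python's difflib to find the matching regions; real tools such as xdelta or bsdiff use their own matching algorithms, and the function names here are invented for the example.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal inserted bytes."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only where it lives in the old file.
            delta.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes have to be shipped literally.
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: those old bytes are simply never copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog"
    new = b"XX" + old  # two inserted bytes shift everything that follows
    delta = make_delta(old, new)
    assert apply_delta(old, delta) == new
    print(sum(len(op[1]) for op in delta if op[0] == "insert"), "literal bytes shipped")
```

    A replace-only patch would have to rewrite nearly every byte after the insertion point, because every offset has shifted; the copy/insert form ships only the inserted bytes.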

  14. Re:as soon as it gets hacked in to RPM by sPaKr · · Score: 2, Informative
    OK, check out http://www.suse.com/us/private/download/updates/91_i386.html: it turns out SuSE already ships patch RPMs. As for your 'CD's worth of updates', I say PFFFT. Digging into those directories, we can see the updates include the new RPM, the patch RPM, and instructions in both English and German. Example:
    -rw-r--r-- 1 507 1002 62129 Apr 16 09:00 ypserv-2.12.1-44.1.i586.patch.rpm
    -rw-r--r-- 1 507 1002 125191 Apr 16 08:39 ypserv-2.12.1-44.1.i586.rpm
    -rw-r--r-- 1 507 1002 499 Apr 20 07:40 ypserv-2.12.1-44.1.i586_de.info
    -rw-r--r-- 1 507 1002 462 Apr 20 07:40 ypserv-2.12.1-44.1.i586_en.info
    lrwxrwxrwx 1 507 1002 27 May 27 06:02 ypserv.rpm -> ypserv-2.12.1-44.1.i586.rpm
    -rw-r--r-- 1 507 1002 44754 Aug 26 19:30 zlib-1.2.1-70.6.i586.patch.rpm
    -rw-r--r-- 1 507 1002 63682 Aug 26 10:18 zlib-1.2.1-70.6.i586.rpm
    -rw-r--r-- 1 507 1002 593 Sep 02 12:30 zlib-1.2.1-70.6.i586_de.info
    -rw-r--r-- 1 507 1002 553 Sep 02 12:30 zlib-1.2.1-70.6.i586_en.info
    -rw-r--r-- 1 507 1002 44291 Aug 26 19:30 zlib-devel-1.2.1-70.6.i586.patch.rpm
    -rw-r--r-- 1 507 1002 66192 Aug 26 10:18 zlib-devel-1.2.1-70.6.i586.rpm
    -rw-r--r-- 1 507 1002 642 Sep 02 12:30 zlib-devel-1.2.1-70.6.i586_de.info
    -rw-r--r-- 1 507 1002 599 Sep 02 12:30 zlib-devel-1.2.1-70.6.i586_en.info
    lrwxrwxrwx 1 507 1002 30 Sep 02 13:03 zlib-devel.rpm -> zlib-devel-1.2.1-70.6.i586.rpm
    lrwxrwxrwx 1 507 1002 24 Sep 02 13:03 zlib.rpm -> zlib-1.2.1-70.6.i586.rpm


    It looks like the total updates would be about half of that CD, and that would be full replacement RPMs. The patches seem to be only slightly smaller. And these are replacement RPMs for ALL possible installed packages. SuSE, like most distros, ships everything in RPMs, but no one installs all of them. I mean, how many people install every database, every development toolchain (that GNU Ada compiler rocking your world?), every desktop GUI? No one. Like most people, you have taken a small data point and whipped it into a full misunderstanding. Statistics lie, and liars use statistics. My stats prof told me that; still true.
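
    For the three packages in the listing above, the patch-to-full size ratio can be checked directly from the byte counts shown. A small sketch, using only those numbers; the helper name is made up, and three packages say nothing certain about the rest of the update tree.

```python
# (patch.rpm size, full .rpm size) in bytes, copied from the directory listing
SIZES = {
    "ypserv-2.12.1-44.1": (62129, 125191),
    "zlib-1.2.1-70.6": (44754, 63682),
    "zlib-devel-1.2.1-70.6": (44291, 66192),
}


def patch_ratio(patch_bytes: int, full_bytes: int) -> float:
    """Fraction of the full RPM's size that the patch RPM takes up."""
    return patch_bytes / full_bytes


if __name__ == "__main__":
    for name, (patch, full) in SIZES.items():
        print(f"{name}: patch is {patch_ratio(patch, full):.0%} of the full RPM")
    total_patch = sum(p for p, _ in SIZES.values())
    total_full = sum(f for _, f in SIZES.values())
    print(f"all three: {patch_ratio(total_patch, total_full):.0%}")
```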
  15. Nope, not SuSE by Carl+T · · Score: 2, Informative
    The online update part of SuSE's Yast2 doesn't require you to not have modified any of the files it's upgrading, so it can't very well be a binary diff. And it takes forever for minor patches on big things, such as Mozilla or the kernel sources.

    Here's a snippet from a patch description used by the online update, to give you some idea of what it does:

    Filename: libsmbclient.rpm
    Label: Samba client library
    Series: i586
    Size: 689245 309267
    PatchRpmBasedOn: 2.2.8a-107
    PatchRpmSize: 689245 309244
    Buildtime: 1090501315
    BuiltFrom: samba-2.2.8a-220.src.rpm
    --

    This signature is not in the public domain.
  16. Re:Well... by Anonymous Coward · · Score: 1, Informative

    $ time wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.8.1.tar.bz2
    real 6m21.421s
    $ time tar jxf linux-2.6.8.1.tar.bz2
    real 1m10.380s

    See, it all depends.

  17. If I remember correctly, Novell used it ages ago by deniea · · Score: 2, Informative

    I'm quite sure Novell has been doing it in the past. At least with the older versions of Netware (3.x and 4.x versions).

    You had the whole Novell NOS + a couple of services in, let's say, 100 MB or so, and you needed to update tons of NLMs. You just needed to download a quite small patch file (over a POTS line) that usually could fit on a floppy or so, and then it updated the loose NLMs.

    Nothing new I guess..

  18. Re:Is this an Issue? by teqo · · Score: 2, Informative
    Say I wanted to upgrade from fedora core 2 to fedora core 3, I could just download an upgrade iso at a fraction of the size etc.

    As somebody pointed out before in this article, there is rsync, which minimizes transmitted data using some xdelta-like algorithm. This is not really new, and some sites offer anonymous rsync downloads for exactly this reason.

    (Rumours were that some people actually use rsync in the following way to get the latest Debian ISOs from a collection of old, already downloaded packages: They cat'ed all their packages together to one huge binary file and then ran rsync against the remote ISO image and that local file. Since most data was already in that file, only transmission of a few megabytes of new data and some data arranging had to happen....)

    Here I uttered a few quick&dirty thoughts (which most certainly somebody else has had before, as usual) on how rsync could help in mass patching, don't know if they are worth reading for you... .)
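
    The cat-the-packages trick works because matching is done per block, wherever the block sits in the local file. A much simplified sketch of that idea follows. It is not rsync's actual protocol: it hashes every window with SHA-1 instead of using a cheap rolling checksum, and all names are invented for the example.

```python
import hashlib


def block_signatures(local: bytes, block_size: int) -> dict:
    """Hash each fixed-size block of the local file, remembering its offset."""
    sigs = {}
    for off in range(0, len(local) - block_size + 1, block_size):
        digest = hashlib.sha1(local[off:off + block_size]).digest()
        sigs.setdefault(digest, off)
    return sigs


def compute_delta(remote: bytes, sigs: dict, block_size: int) -> list:
    """Scan the remote file; emit block references where possible, literals otherwise."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(remote):
        window = remote[pos:pos + block_size]
        off = None
        if len(window) == block_size:
            off = sigs.get(hashlib.sha1(window).digest())
        if off is not None:
            if literal:  # flush pending unmatched bytes first
                delta.append(("literal", bytes(literal)))
                literal = bytearray()
            delta.append(("block", off))
            pos += block_size
        else:
            literal.append(remote[pos])
            pos += 1
    if literal:
        delta.append(("literal", bytes(literal)))
    return delta


def rebuild(local: bytes, delta: list, block_size: int) -> bytes:
    """Reassemble the remote file from local blocks plus the transmitted literals."""
    out = bytearray()
    for kind, value in delta:
        out += local[value:value + block_size] if kind == "block" else value
    return bytes(out)


if __name__ == "__main__":
    local = b"aaaabbbbccccdddd"            # stand-in for the cat'ed old packages
    remote = b"HDR!" + b"ccccaaaa" + b"new" + b"dddd"  # stand-in for the new ISO
    delta = compute_delta(remote, block_signatures(local, 4), 4)
    assert rebuild(local, delta, 4) == remote
    print(sum(len(v) for k, v in delta if k == "literal"), "bytes sent as literals")
```

    Only the unmatched bytes travel as literals; everything else is a reference to a block the receiver already has, in whatever order the new file needs it.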

  19. Different forms of binary diffs by davidwr · · Score: 2, Informative

    Not all diffs are alike. The simplest diffs are literal - "find string of bytes to replace and replace them with newbytes."

    Back in 1985, Apple did something a bit more sophisticated for their System Update 2.0 patch.

    Their binary files were structured. The particular structure was called a "resource fork" and had a collection of hundreds or thousands of usually-tiny "resources" which could be individually modified. As a made-up example, replace String ID 50 in the file "System" to "Version 2.0" where it previously was "Version 1.0" or replace one linked-in graphic with another.

    The patch program updated, deleted, and added individual resources.
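
    A rough sketch of that kind of structured patch, treating a file as a dictionary of resources keyed by (type, ID). The key layout, the operation names, and the sample data are all made up for illustration; this is not Apple's actual resource fork format.

```python
def apply_resource_patch(resources: dict, patch: list) -> dict:
    """Return a patched copy of `resources` after applying add/update/delete ops."""
    patched = dict(resources)
    for op, key, *payload in patch:
        if op == "add":
            if key in patched:
                raise KeyError(f"resource {key} already exists")
            patched[key] = payload[0]
        elif op == "update":
            if key not in patched:
                raise KeyError(f"resource {key} not found")
            patched[key] = payload[0]
        elif op == "delete":
            del patched[key]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return patched


if __name__ == "__main__":
    system_file = {
        ("STR", 50): b"Version 1.0",
        ("ICON", 128): b"<old icon data>",
        ("STR", 51): b"obsolete message",
    }
    patch = [
        ("update", ("STR", 50), b"Version 2.0"),  # the made-up example from the comment
        ("delete", ("STR", 51)),
        ("add", ("ICON", 129), b"<new icon data>"),
    ]
    print(apply_resource_patch(system_file, patch))
```

    The patch carries only the resources that changed, so its size tracks the number of touched resources and not the size of the whole file.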

    This is important for historical reasons:
    If Microsoft or anyone else gets the funny idea to patent "replacing parts of a structured file in an update mechanism" as a broad-scope patent and the USPTO grants the patent [which they probably will, out of ignorance], the patent will need to be challenged and narrowed significantly.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.