Slashdot Mirror


Delta Compression for Linux Security Patches?

cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitutes a portion of my doctoral thesis), and can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"

55 of 289 comments

  1. Doesn't make as much sense to use for Linux by drinkypoo · · Score: 5, Informative

    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.

    Personally I'd prefer to see binary distributions move to a model of using something like cvs, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. This solves both your differential update problem (if the right system is used - I'm thinking that's pretty much not CVS but I don't know if there's a way to make it do all of that - CVS doesn't handle binaries amazingly intelligently from what I understand) and your updates in general. Plus, you can use it both for source and binary updates.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:Doesn't make as much sense to use for Linux by bluesguy_1 · · Score: 3, Interesting

      I disagree. I've used smartversion on Windows for a couple years now for making versioned archives of important files, and I wish Linux had something comparable. It's like having a portable single tar.gz of an entire cvs repository without all the headaches...

    2. Re:Doesn't make as much sense to use for Linux by GweeDo · · Score: 3, Informative

      What you are requesting can already be done basically in Gentoo (emerge -Uupv world), Debian (apt-get something or other) and Redhat/Fedora (up2date something or other). So why do we need something else again :) Oh, with Gentoo...add ccache to that for faster compiles too

    3. Re:Doesn't make as much sense to use for Linux by drinkypoo · · Score: 3, Informative

      I use gentoo. I have never found any ccache settings that make much of a difference. None of these systems do a binary differential update; they download a whole package and install it, or in the case of gentoo, download a source package and compile it. Neither of these approaches is what is being called for in this article, nor what I suggest above.

      Mind you, I'm fine with the way gentoo does things, but I have a fairly powerful system - not incredibly fast by modern standards but faster than anything I've run linux on before, or probably any Unix at all for that matter. For a dialup user on an older computer, atomically differential updates would make a really big difference.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:Doesn't make as much sense to use for Linux by morcego · · Score: 5, Informative

      I'm not sure about Gentoo, but I'm positive that is not what happens for Debian, RedHat, Fedora, Mandrake, SuSe, Conectiva etc.

      On those systems, when you do an upgrade (apt-get upgrade), you will get a fresh package, including not only the files that changed, but all the files for that package. And if we have a package with 1 binary and 50 images, and only the binary changed, we get to download all the images again.

      Some distributions have been implementing package fragmentation for this (package-core and package-images for this example), and that is a good thing for these cases, although it is a nightmare to manage. Not as fine-grained as proposed by the grandparent post, but good enough for most cases.

      --
      morcego
    5. Re:Doesn't make as much sense to use for Linux by timeOday · · Score: 2, Interesting
      Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file. The number of different binary patches would be exponential in the number of compile switches, compiler versions, USE flags, and so on - for both "old" and "new" file versions, so square it again!

      I guess if you really wanted to be clever, you could send the server enough information (your existing package versions, make and USE flags etc), plus the desired flags for your new file. With this the server could compile a binary matching yours, then compile the new binary for you, then make a binary diff. But that, as Kramer would say, is "kooky talk."
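
      A rough sketch of the counting argument above, in Python. The flag names, compiler versions and CFLAGS sets are made up for illustration; the only point is that the number of distinct builds doubles with each USE flag, and that the server needs one diff per (installed build, target build) pair.

```python
from itertools import product

def count_build_variants(use_flags, compilers, cflag_sets):
    """Count distinct builds of one package version.

    Each USE flag is either on or off, so n flags give 2**n combinations;
    every combination can be built with each compiler and each CFLAGS set.
    """
    use_combinations = list(product([False, True], repeat=len(use_flags)))
    return len(use_combinations) * len(compilers) * len(cflag_sets)

def count_binary_patches(old_variants, new_variants):
    """A binary diff is needed for every (installed build, target build) pair."""
    return old_variants * new_variants

# Hypothetical inputs, chosen only to show the growth rate.
variants = count_build_variants(
    use_flags=["ssl", "ipv6", "pam"],
    compilers=["gcc-3.3", "gcc-3.4"],
    cflag_sets=["-O2", "-O3 -march=pentium4"],
)
patches = count_binary_patches(variants, variants)
print(variants, patches)  # 32 1024
```

      Three flags, two compilers and two CFLAGS sets already mean over a thousand possible diffs for a single version bump of a single package.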

    6. Re:Doesn't make as much sense to use for Linux by advocate_one · · Score: 2, Informative
      Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file.

      No, those "binary" diffs for Gentoo would be done against the sources used for the previous version of the gentoo "package", which would then be used to download the diff so that the gentoo computer could then construct new sources to build against. It would require gentoo computers to keep the sources rather than discard them to save space.

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  2. Right after... by Fermier+de+Pomme+de · · Score: 4, Funny

    ... their biggest customers start using dialup.

  3. SP2? by keiferb · · Score: 5, Funny

    You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!

    1. Re:SP2? by cperciva · · Score: 4, Informative

      Sorry, the writeup was a bit unclear. Windows XP SP2 contains a new version of Windows Installer (or whatever they're calling it today). This new version includes support for downloading updates via binary diffs, and most updates to XP after this point should be done that way.

    2. Re:SP2? by dracvl · · Score: 4, Funny
      You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!

      If you look at the URL...

      http://www.microsoft.com/windows2000/techinfo/planning/redir-binarydelta.asp

      ...you will clearly see that what you downloaded was Windows 2000, with a binary patch that turned it into Windows XP SP2.

  4. as soon as it gets hacked in to RPM by sPaKr · · Score: 2, Insightful

    As soon as binary diffs get hacked into RPM then it might happen. Binary diffs of one rpm to another later version won't really work, as binary diffs are only small when they are produced on uncompressed, unencrypted data. The real issue is that Linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually require just one or very few packages to be updated. Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade. I bet the average RPM is about the same size as the minimum binary diff from MS.

    1. Re:as soon as it gets hacked in to RPM by cperciva · · Score: 2, Insightful

      Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade

      Binary diffs make sense any time you've got large files being updated. On my system, libssl (library archive + shared object file + profiled library) is 600kB; that's large enough to justify using a 10kB binary diff instead.

      I bet the average RPM is about the same size as the minimum binary diff from MS.

      I can't say anything about Microsoft's patches directly, but the patches used by FreeBSD Update are on average 65 times smaller than the individual files being updated. As little faith as I have in Microsoft, I still doubt that they could produce patches which were sub-optimal by more than a factor of 50.
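
      To put the libssl figures above in dialup terms, here is a back-of-the-envelope calculation in Python. It assumes a nominal 56 kbit/s modem with no protocol overhead and reads "kB" as 1000 bytes; real-world throughput would be lower, so treat the numbers as best-case estimates only.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second

MODEM_BPS = 56_000      # assumed nominal dialup rate
FULL_FILES = 600_000    # "600kB" of libssl files, taken as 600,000 bytes
BINARY_DIFF = 10_000    # "10kB" binary diff

full_time = download_seconds(FULL_FILES, MODEM_BPS)
diff_time = download_seconds(BINARY_DIFF, MODEM_BPS)
print(f"full files: {full_time:.1f} s, binary diff: {diff_time:.1f} s")
```

      Under these assumptions the full files take about a minute and a half, while the diff arrives in under two seconds.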

    2. Re:as soon as it gets hacked in to RPM by Waffle+Iron · · Score: 2, Informative
      The real issue is that Linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually require just one or very few packages to be updated.

      I beg to differ. SuSE 9.1 came out only 5 months ago:

      $ du -h /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/

      417M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/i586
      14M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/noarch
      431M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/
      That's almost a whole CD worth of patches in half of a year. All of this is to correct for mistakes in probably no more than a few hundred total lines of code.
    3. Re:as soon as it gets hacked in to RPM by sPaKr · · Score: 2, Informative
      OK, check out http://www.suse.com/us/private/download/updates/91_i386.html; it turns out SuSE already ships patch RPMs. As for your "CD's worth of updates", I say PFFFT. Digging into those directories we can see the updates include the new RPM, the patch RPM, and instructions in both English and German. Example:
      -rw-r--r-- 1 507 1002 62129 Apr 16 09:00 ypserv-2.12.1-44.1.i586.patch.rpm
      -rw-r--r-- 1 507 1002 125191 Apr 16 08:39 ypserv-2.12.1-44.1.i586.rpm
      -rw-r--r-- 1 507 1002 499 Apr 20 07:40 ypserv-2.12.1-44.1.i586_de.info
      -rw-r--r-- 1 507 1002 462 Apr 20 07:40 ypserv-2.12.1-44.1.i586_en.info
      lrwxrwxrwx 1 507 1002 27 May 27 06:02 ypserv.rpm -> ypserv-2.12.1-44.1.i586.rpm
      -rw-r--r-- 1 507 1002 44754 Aug 26 19:30 zlib-1.2.1-70.6.i586.patch.rpm
      -rw-r--r-- 1 507 1002 63682 Aug 26 10:18 zlib-1.2.1-70.6.i586.rpm
      -rw-r--r-- 1 507 1002 593 Sep 02 12:30 zlib-1.2.1-70.6.i586_de.info
      -rw-r--r-- 1 507 1002 553 Sep 02 12:30 zlib-1.2.1-70.6.i586_en.info
      -rw-r--r-- 1 507 1002 44291 Aug 26 19:30 zlib-devel-1.2.1-70.6.i586.patch.rpm
      -rw-r--r-- 1 507 1002 66192 Aug 26 10:18 zlib-devel-1.2.1-70.6.i586.rpm
      -rw-r--r-- 1 507 1002 642 Sep 02 12:30 zlib-devel-1.2.1-70.6.i586_de.info
      -rw-r--r-- 1 507 1002 599 Sep 02 12:30 zlib-devel-1.2.1-70.6.i586_en.info
      lrwxrwxrwx 1 507 1002 30 Sep 02 13:03 zlib-devel.rpm -> zlib-devel-1.2.1-70.6.i586.rpm
      lrwxrwxrwx 1 507 1002 24 Sep 02 13:03 zlib.rpm -> zlib-1.2.1-70.6.i586.rpm


      It looks like the total updates would be about half of that CD. And that would be full replacement RPMs. The patches seem to be only slightly smaller. And these are replacement RPMs for ALL possible installed packages. SuSE, like most distros, ships everything in RPMs, but no one installs all of them. I mean, how many people install every database, every development toolchain (that GNU Ada compiler rockin' your world?), every desktop GUI... no one. Just like most people, you have taken a small data point and whipped it into a full misunderstanding. Statistics lie, and liars use statistics. My stats prof told me that, still true.
  5. Mindvision by shawnce · · Score: 4, Informative

    The folks at mindvision made an installer/installer creation tool that allowed one to scan two different sets of files and directories to find differences between them (binary differences) and it would just package up those differences in the installer archive. In fact you could use it to diff and package deltas between several versions at once. When the user ran the installer (really an updater) it would apply the binary patch to the file set as needed.

    I was using this tool over 7 years ago now on Mac OS so I don't see what is so new about this concept... but I am glad it looks like it is starting to be used more.

  6. Re:Is this an Issue? by AhabTheArab · · Score: 4, Insightful

    Now with broadband being so popular, and still on the rise, is this really an issue?

    Yes, it is. I just switched to broadband less than two months ago. A lot of my friends are still on dialup. Also, do not forget rural areas which do not have access to broadband. You would be surprised how many people still have dialup; I believe the number of broadband users just recently surpassed the number of dialup users. This means, obviously, that nearly half of all internet users are still on dialup.

  7. Here the problem: by Sonic+McTails · · Score: 2, Interesting

    Linux makes it very easy to install new packages and upgrade packages from sources farther away from the vendor. If a vendor tried to release a patch using delta versioning, it could totally wreck a system. Since neither RPM nor DPKG is designed to handle checking md5sum hashes against each file and making sure the patch can be installed safely, it will have to wait until this feature is incorporated into either system.

    --
    This signature was left intentionally blank.
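
    The safety check described in the comment above could look something like the following Python sketch. It is not how RPM or DPKG work; `apply_patch_safely` and the `patch` callable are hypothetical names, and the idea is simply to refuse a delta unless the installed file hashes to the version the delta was built against, then verify the result.

```python
import hashlib

def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

def apply_patch_safely(installed, expected_old_md5, patch, expected_new_md5):
    """Apply `patch` only if the installed bytes are the expected old version.

    `patch` is any callable mapping old bytes to new bytes (a stand-in for a
    real binary patcher). Raises ValueError instead of writing a bad file.
    """
    if md5_hex(installed) != expected_old_md5:
        raise ValueError("installed file differs from the version the patch was built against")
    patched = patch(installed)
    if md5_hex(patched) != expected_new_md5:
        raise ValueError("patched file does not match the expected result")
    return patched

# Toy data standing in for a vendor's old and new file.
old = b"version 1\n"
new = b"version 2\n"
toy_patch = lambda data: data.replace(b"1", b"2")

result = apply_patch_safely(old, md5_hex(old), toy_patch, md5_hex(new))
print(result == new)  # True
```

    A file installed from a third-party source would fail the first check and the updater could fall back to downloading the full package.
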
  8. SUSE by DreadSpoon · · Score: 3, Interesting

    SUSE already does this.

    RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made. While I favor a new format for many reasons other than this, modifying RPM is probably the best solution in order to provide backwards compatibility.

  9. Well... by iamdrscience · · Score: 2, Interesting

    On that topic, why does almost everybody distribute source code as gzipped tars instead of bzip2'ed tars (just about everybody that does use bzip2 also distributed gzips)? Sure, in the beginning gzip made more sense for people on slow machines, but nowadays the difference in the time it takes to decompress is trivial, whereas the compression benefits of bzip2 on text are phenomenal in my experience.

    1. Re:Well... by Attitude+Adjuster · · Score: 2
      Well, I don't know for sure, but here's my $0.02 (twice).

      Ramble 1: In my admittedly limited experience (since '94), it was a while before traditional old-school Unix (tm) like OSF (now Digital Unix) and Solaris abandoned the encumbered compress/uncompress utilities and started having gzip.

      Even now, the old Solaris Ultra 10 sitting in the corner of my office doing nothing (running 5.7, which has had uptimes of a couple of years solid) doesn't have bzip2 - can't be arsed to ask the sysadmin to update it as I'm only using Linux these days, which is cheaper/faster/nicer/more capable for workstations than Solaris ever was.

      Ramble 2: The science software I use knows all about gzip - to save space I keep all my binary data files gzipped (I have 2 TB of disk space, but a lot of data) and the tools internally gunzip and re-gzip. That functionality hasn't been added for bzip (why? I have no idea) despite bzip2 being around for ages.

      So... maybe it's social inertia preventing a complete move to bzip2, or maybe gzip is still more widely available than bzip2.

    2. Re:Well... by p3d0 · · Score: 2, Insightful
      bzip2 is a retarded name, for one thing. It makes it sound like it's in flux (gee, should I wait for bzip3?).

      They definitely should have done whatever was necessary to keep the name as just "bzip".

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    3. Re:Well... by spitzak · · Score: 2, Insightful

      It would help a lot if tar would do it if you just provided -z instead of having to remember to provide -j. Come to think of it, it would be nice if tar just detected compression and you did not have to give it -z either! Can this be done?

    4. Re:Well... by Sunspire · · Score: 5, Insightful

      I always for example grab the "regular" tar.gz version of the kernel for two reasons,

      1) I always forget the j option to tar, since bz2 packages are not that common. It should autodetect it.
      2) I have the perception that the combined download time and unpacking is longer for bz2

      Point two was subjective up until now, but just for the hell of it I decided to measure it. I used the time command to measure how long it took to download the kernels and how long it took to unpack them:

      time to download linux-2.6.8.tar.bz2 1m4.414s
      time to download linux-2.6.8.tar.gz 1m9.706s

      time to unpack linux-2.6.8.tar.bz2 2m05.457s
      time to unpack linux-2.6.8.tar.gz 0m26.309s

      This is on a P4C 3.2GHz, 1GB RAM, 8Mbit connection. So there you have it, with a fast enough connection the difference is significantly in favor of the old gz format. The size difference between the bz2 and gz kernel, about 8.8 MB, is not nearly good enough to merit the slower unpacking. If you have a slower machine but also a slower connection the result is likely in the same ballpark.

      This goes to show that if you want to provide faster (subjective) update times to users, especially in the future with faster connections, you have to study the problem in detail and not just blindly try to optimize some aspect of the process (size in this case), since the overall performance might in fact be worse. Premature optimization and all that... What's the time for patching using delta compression anyway? If a 600KB RPM update can be delta compressed to 10KB, but the patching process takes longer than 15 seconds, I'm likely to see a slowdown in system update time.

      --
      It's like deja vu all over again.
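
      The totals behind the measurement above can be worked out with a few lines of Python. The times are the ones quoted in the comment; the break-even estimate is an extra step that assumes "8.8 MB" means 8.8 * 10**6 bytes and that the unpack times stay the same on the slower link, so it is only a rough guide.

```python
import re

def parse_time(text):
    """Convert the shell's `time` format, e.g. '1m4.414s', to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Figures quoted in the comment above.
measurements = {
    "bz2": {"download": "1m4.414s", "unpack": "2m05.457s"},
    "gz": {"download": "1m9.706s", "unpack": "0m26.309s"},
}

totals = {
    name: parse_time(t["download"]) + parse_time(t["unpack"])
    for name, t in measurements.items()
}

def break_even_bits_per_second(size_saving_bytes, extra_unpack_seconds):
    """Link speed below which the smaller archive wins overall."""
    return size_saving_bytes * 8 / extra_unpack_seconds

extra_unpack = parse_time("2m05.457s") - parse_time("0m26.309s")
threshold = break_even_bits_per_second(8.8e6, extra_unpack)

for name, seconds in totals.items():
    print(f"{name}: {seconds:.3f} s")
print(f"bz2 wins below roughly {threshold / 1000:.0f} kbit/s")
```

      On the quoted numbers gz finishes in about 96 seconds against about 190 for bz2, and the smaller archive only pays off on links slower than roughly 710 kbit/s, which with the same CPU would include dialup.
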
  10. use rsync by stonebeat.org · · Score: 2, Interesting

    Delta-based patch distribution on the Linux platform is quite easy. Just use RSYNC to sync the application files to the source. I have used this technique of patching (i.e. RSYNC) to provide updates/patches to an in-house built application. Works very nicely.

  11. Gentoo by SuperBanana · · Score: 2, Interesting
    Jokes about gentoo aside, the source tarballs are cached in /var, and only removed when they exceed configured limits for max disk space. Patches are contained in the portage tree, along with the "ebuild" files which are the build instruction files.

    If the update is just a patch to the source, there's sometimes a minor revision made and an updated gentoo ebuild file and source code patch added to the portage tree, which is of course done via rsync. All in all, it's decently efficient. This mostly(I think) happens with unstable package versions, where a security update may make it into portage before the official project bumps their release, but that's not the case with stable stuff.

    I think for basic systems, compile time complaints are slightly exaggerated. My -original- celeron 450 isn't shabby at all at compiling most of the more basic system packages and server apps. Even glibc and gcc build with relative ease, and when I set up distcc amongst my three systems, it became even less of a hassle. Even without distcc, the time to clear out 50 packages of updates on a mail server is surprisingly low on a low-powered system.

    1. Re:Gentoo by bzBetty · · Score: 2, Informative

      Gentoo users should make sure they know about this: it's called deltup, basically a script for portage that grabs xdelta patches instead of downloading the entire file again. It seems to save me a lot of bandwidth anyway.

  12. Ummm... diffs? Not for Linux? Are you kidding? by !ucif3r · · Score: 5, Insightful

    Ok before I get berated by the karma (whoring) police I do realize these are not binary diffs. But, seriously, linux has been using diff's as a way to save bandwidth before Windows even offered 'updates'. Another example of Windows 'innovation' I guess.

    Yes, I see how it is neat that there is a binary version of this process with Windows but linux is primarily a source based operating system. It is that way because the software is designed to be compiled for a variety of systems and setups and work with all of them.

    I do understand the author's question though, but it really should be reworded. Linux is not an OS in the sense that Windows is an OS. He should perhaps be more correctly asking when one of the 'binary' distributions of Linux (or of a Linux 'based' OS to be exact) will plan on offering this. Binary packages are really only offered on a per distribution basis with the binaries not being very compatible between distro's and systems (although some basic compatibility is generally there). As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's

    --
    "Take that Lisa's beliefs!" - Homer Simpson
  13. Too complicated and confusing by avida · · Score: 4, Informative

    Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need to have eleven deltas. I don't think so. Unless you want to do something like Windows Update where an agent scans your binaries and compares the difference with the update and then downloads individual files ... but that's a lot more complicated and isn't justified by the bandwidth savings.

    1. Re:Too complicated and confusing by Malc · · Score: 2, Insightful

      And how is this different from source code patches? It seems to me that they'll only provide patches from version to version, like they do with GNU Emacs. If you need to update multiple versions then you have to make a decision about going through 10 patches, or doing a full download of the desired version.
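
      The two strategies in this sub-thread (one delta per older version versus version-to-version patches applied in sequence) trade server-side work against client-side work. A small Python sketch of the counting, with function names invented for illustration:

```python
def deltas_to_publish_direct(previous_versions):
    """Direct strategy: one delta from every older version to the new one."""
    return previous_versions

def deltas_to_publish_chained():
    """Chained strategy: only the step from the immediately preceding version."""
    return 1

def patches_to_apply_chained(installed_version, latest_version):
    """Chained strategy: the client applies every intermediate patch in order."""
    return latest_version - installed_version

# Original release plus ten updates means eleven older versions to upgrade from.
older_versions = 11
print(deltas_to_publish_direct(older_versions))    # 11

# Total deltas the vendor has built across all eleven releases so far.
total_direct = sum(deltas_to_publish_direct(n) for n in range(1, older_versions + 1))
total_chained = sum(deltas_to_publish_chained() for _ in range(older_versions))
print(total_direct, total_chained)                 # 66 11

# A client still on the original release (version 0) upgrading to version 11.
print(patches_to_apply_chained(0, 11))             # 11
```

      Direct deltas cost the vendor quadratic work overall but give the client a single download; chaining keeps the vendor's work linear and pushes the cost onto clients that have fallen far behind.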

  14. Re:SUSE by cperciva · · Score: 3, Informative

    SUSE already does this.

    Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.

    Now, this is certainly a step forward from the common (eg, Debian, RedHat) approach of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.

  15. Re:Huh?? by EvanED · · Score: 2, Insightful

    Um, I don't know about you, but I don't want to recompile for a security patch. I didn't even compile my system in the first place, it's just binaries.

  16. Real hackers... by Black+Parrot · · Score: 2, Funny

    ...toggle their diffs in from the front panel.

    --
    Sheesh, evil *and* a jerk. -- Jade
  17. XDelta3 by TheBashar · · Score: 4, Informative

    XDelta3 recently reached its first public release.

    http://xdelta.org/xdelta3.html

    XDelta3 is a library which is designed to foster exactly this kind of functionality. If distributions integrate the xdelta functionality into their package management framework we would be well on our way to what the poster is looking for.

    1. Re:XDelta3 by Spy+Hunter · · Score: 2, Interesting

      Oh man, I just had a great idea. What if you incorporated XDelta3 into a Reiser4 filesystem plugin? Versioning built into the filesystem would be an *awesome* feature. I'm sure it's been done before on some other OS, but it could really go mainstream on Linux with Reiser4.

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
  18. Shareware by slittle · · Score: 3, Insightful

    Used to do this back in ye olde DOS shareware days. I think RTPatch was the most common of the commercial ones.

    --
    Opportunity knocks. Karma hunts you down.
  19. Re:How about this... by Anonymous Coward · · Score: 4, Funny

    But if you're posting to Slashdot on a Friday night, you probably don't have a friend's house to go over to.

  20. Gentoo Portage by WamBamBoozle · · Score: 4, Interesting
    I wonder why Gentoo doesn't do this. Gentoo, as far as I can tell, always distributes a bzip2'ed tar of any particular distribution.

    It works beautifully but I can't help but think it is a waste of bandwidth.

    1. Re:Gentoo Portage by haruchai · · Score: 3, Informative

      No reason why they couldn't. According to this page:
      http://www.daemonology.net/bsdiff/, this util is already in Gentoo.

      --
      Pain is merely failure leaving the body
  21. Re:Its well known by Mongo222 · · Score: 2, Informative

    What makes you think you need to take a server out of production while code compiles on it? I never have.

  22. Several reasons, but not all technical by lakeland · · Score: 3, Insightful

    Firstly, linux programs tend to be smaller than windows programs (do one thing, and do it well). Even a huge beast like tetex is 'only' 14.4MB -- compare to SP2... This has reduced the demand for delta compression.

    Secondly, in the windows world people release rarely. However, the opposite is true in the linux world -- projects with daily releases are not unheard of, and weekly releases are fairly common. This means enumerating patches (v 3.4 -> v. 3.7) is infeasible in Linux where it is feasible in Windows.

    More sophisticated algorithms than delta checksums do exist (as I guess you know if your thesis is on them) -- rolling checksums have been used in several projects I know of. However, there is a widespread rumour that these techniques are patented. I have never seen any evidence, but it puts a damper on any implementations.

    There is a semi-vapourware project implementing all of this (part of the apache project IIRC). However the project fizzled away several years ago.
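
    For readers unfamiliar with the rolling checksums mentioned above, here is a minimal Python sketch in the style of the weak checksum commonly described for rsync. The modulus and the exact formula are assumptions made for illustration, not a copy of any project's code. The useful property is that sliding the window by one byte updates the checksum in constant time instead of rescanning the whole block.

```python
MOD = 1 << 16  # assumed modulus, 2**16

def weak_checksum(block):
    """Compute the (a, b) pair for a block from scratch.

    a is the plain byte sum; b weights each byte by its distance from the
    end of the block, so the first byte counts len(block) times.
    """
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a, b, outgoing, incoming, window):
    """Slide the window one byte to the right in O(1)."""
    a = (a - outgoing + incoming) % MOD
    b = (b - window * outgoing + a) % MOD
    return a, b

# Check that rolling matches recomputation at every offset.
data = bytes((i * 37 + 11) % 256 for i in range(200))
window = 16
a, b = weak_checksum(data[:window])
matches = True
for start in range(1, len(data) - window + 1):
    a, b = roll(a, b, data[start - 1], data[start + window - 1], window)
    if (a, b) != weak_checksum(data[start:start + window]):
        matches = False
print(matches)  # True
```

    A receiver can therefore test every byte offset of its old file against the sender's block checksums cheaply, and only compute a stronger hash when the weak one matches.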

    1. Re:Several reasons, but not all technical by tyrione · · Score: 3, Interesting
      Well congratulations.

      You point out TeTeX at 14+MB which is as bare as it gets for TeTeX, then comes the TeTeX-Doc and the TeTeX-Extra which by now we're up to over 50MBs.

      Oh and here is the real kicker. Debian has updated 2.02 3 if not 4 times this month. Now 150MB+ to over 200MBs of fixes? Nope. SP2 looks a bit smaller now don't it?

      And that doesn't even touch the -1,-2,..-20 Debian patches they keep spewing out for project after project.

      The only plus for a 56k access is they don't cap your downloads on a monthly basis. The bad side obviously is bandwidth, but for me it's time down waiting for important packages like TeTeX to update.

      Having a SVN approach to patching systems makes sense. Or CVS if you prefer a different versioning system approach.

      It's already been said but it is worth repeating, especially when one runs KDE or GNOME. Just Build a freakin' base package and update us with Binary Images that are new or replaced, documentation that is new or revision updates and binaries to the executables, libraries, so on that change and not the mountains of inert parts that don't change.

      You can't tell me KDELIBS, KDEBASE needs to be completely rebuilt each .x revision or -x revision by Debian, and by completely rebuilt I mean all the inert files that don't actually get touched during the build process other than to make sure some wallpaper image still exists. Hell, the wallpaper backdrops, etc. should be add-ons, not part of the distributions. But then again I suppose everyone thinks we all have T1 access.

      K.I.S.S.

  23. It's already doing it. by Mongo222 · · Score: 3, Interesting



    http://www.daemonology.net/bsdiff/

    bsdiff and bspatch are tools for building and applying patches to binary files. By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta, and 15% smaller than those produced by .RTPatch (a $2750/seat commercial patch tool).

    http://sourceforge.net/projects/diffball

    A general delta compression/differencing suite for any platform that supports autoconf/automake, written in c, w/ builtin support for reading,writing, converting between multiple file formats, and an easy framework to drop in new algorithms.

  24. Re:Warez by dtfinch · · Score: 2, Informative

    If I understand correctly, a binary diff goes a few steps further than a patch. It stores insertions and deletions while a typical patch (like IPS) only stores replacements, which is optimal for files patched in a hex editor and even most database files but not for recompiled files where everything can change in location.
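
    The difference described above is easy to see on a toy example. The Python sketch below compares how many literal bytes a replacement-only patch would have to ship against a simple copy/insert delta built with the standard library's `difflib`. This is not bsdiff, Xdelta or IPS; the op format and function names are invented for illustration.

```python
from difflib import SequenceMatcher

def replacement_literal_bytes(old, new):
    """Bytes a same-offset replacement patch must carry: every position that
    differs, plus anything past the end of the old file."""
    differing = sum(1 for a, b in zip(old, new) if a != b)
    return differing + max(0, len(new) - len(old))

def make_delta(old, new):
    """Build a list of ('copy', start, length) and ('insert', data) ops."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # 'replace' or 'insert': ship the new bytes literally
            ops.append(("insert", new[j1:j2]))
    return ops

def apply_delta(old, ops):
    """Rebuild the new file from the old file and the ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)

def delta_literal_bytes(ops):
    return sum(len(op[1]) for op in ops if op[0] == "insert")

# Two bytes inserted near the start shift everything after them.
old = bytes(range(256))
new = old[:10] + b"\xff\xff" + old[10:]

ops = make_delta(old, new)
print(apply_delta(old, ops) == new)            # True
print(replacement_literal_bytes(old, new))     # 248
print(delta_literal_bytes(ops))                # 2
```

    A two-byte insertion shifts every later byte, so the replacement-only patch ends up carrying almost the whole file, while the copy/insert delta carries just the two new bytes plus a couple of copy instructions.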

  25. Re:Its well known by Stevyn · · Score: 2, Insightful

    While it's more difficult to set up a system with Gentoo than Windows 2000, it's easier to maintain.

    This is probably because of portage. Precompiled packages coming from all different sources can be a bitch to maintain. Mandrake is my example for this if you ever want to update a package they don't have RPMs for. And as for compile time, I'd rather let the computer sit for an hour or two overnight compiling a huge package than having to deal with the dependencies myself.

  26. License of BSDiff by gnuman99 · · Score: 3, Interesting
    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't.

    Especially since the license of bsdiff is not even close to a BSD license (don't let the name of BSD Protection License fool you). Unless the license is changed to something like BSD, BSDiff is not going to be implemented anywhere except in closed source software. Debian cannot even package this software because it is non-free.

    I guess the bottom line is if you want to have something accepted in open source *and* in proprietary software, you want to license under BSD. You want to cater to one group (closed source in this case), you will lose the other.

  27. Why do they make it sound new? by mr_zorg · · Score: 2, Insightful
    ...have started to use delta compression (also known as binary diffs...
    Why does the poster make this sound like a new technology? And why does one of the high ranked comments link to a Microsoft tech note from 3/04 talking about this new thing called Binary Diff Compression?

    What's so new about it? I remember working with InstallShield, RTPatch, and others, way back in the Windows 3.11 days... New? <yawn>

  28. Gentoo now has "source delta's" reducing traffic by rigolo · · Score: 3, Interesting
    Well, gentoo is known for the fact that you download the source of every program and then start compiling. These sources are distributed in .tar.gz or .tar.bz form and can be very large. A version change (even a change from .0.0.1 to .0.0.2) has its own tarball and therefore is downloaded again completely. But the real changes between these two can be small.

    Enter "deltup", a tool that looks at two tarballs and gives you a diff between the two that you can use to "transform the old tarball to an exact copy of the new tarball"; it even preserves MD5 checksum compatibility. Now some enterprising gentoo user has created a "dynamic deltup server" that automates the creation of these delta files, and people can reuse the delta files that other people used.


    Using this technique in combination with gentoo portage, people can reduce their traffic by 75% on average.


    Have a look at the following URL's for more information:

    http://forums.gentoo.org/viewtopic.php?t=215262

    http://linux01.gwdg.de/~nlissne/deltup-status.atime.html


    Rigolo

  29. Nope, not SuSE by Carl+T · · Score: 2, Informative
    The online update part of SuSE's Yast2 doesn't require you to not have modified any of the files it's upgrading, so it can't very well be a binary diff. And it takes forever for minor patches on big things, such as Mozilla or the kernel sources.

    Here's a snippet from a patch description used by the online update, to give you some idea of what it does:

    Filename: libsmbclient.rpm
    Label: Samba client library
    Series: i586
    Size: 689245 309267
    PatchRpmBasedOn: 2.2.8a-107
    PatchRpmSize: 689245 309244
    Buildtime: 1090501315
    BuiltFrom: samba-2.2.8a-220.src.rpm
    --

    This signature is not in the public domain.
  30. You shouldn't have to get entire packages by Anomalous+Cowturd · · Score: 2, Interesting

    Apparently, most distributions have you download the entire package for each update, although there are efforts underway to break up sections a bit more (if I'm wrong, I apologize - I use BSD).

    It sounds like what's really needed is to build packages of just the updated files. The install manifest would just specify the files in the archive, so there shouldn't be any complaints about missing files. Or does that show my ignorance?

    Actually, if you wanted a more general scheme, the update server would build packages on the fly. The updater would send a list of files in a package to the server, which would return a set of files that need updating. You could use this to upgrade any system, regardless of distribution. You would just have to update whatever package database the system uses to show that it was at the latest version.

    Yes, this would take up a lot more CPU time, and be pretty slow on response time, but the savings in bandwidth should be worth it. All the time the servers were waiting for the network card could be used compressing files.
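
    A rough sketch of that exchange, assuming the updater sends a digest per file rather than bare file names (the comment does not say how the server would tell which files differ). All names and sample files here are hypothetical.

```python
import hashlib


def manifest(files: dict) -> dict:
    """Map each path to a SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def files_needing_update(client_manifest: dict, server_files: dict) -> dict:
    """Return only the server files that are missing or different on the client."""
    latest = manifest(server_files)
    return {
        path: server_files[path]
        for path, digest in latest.items()
        if client_manifest.get(path) != digest
    }


# Hypothetical package contents on the server and on one client.
server_files = {
    "bin/tool": b"v2 binary",
    "lib/libfoo.so": b"same library",
    "doc/README": b"new doc",
}
client_files = {
    "bin/tool": b"v1 binary",
    "lib/libfoo.so": b"same library",
}

update = files_needing_update(manifest(client_files), server_files)
print(sorted(update))
```

    The unchanged library is never sent; the server would compress just the returned files into an update package.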

    --

    Java: the bastard demon spawn of C++ and Ada

  31. is there any real benefit? by borud · · Score: 2, Interesting
    Why does Linux need this? How many people have a connection which is so bad they really benefit from this?

    Sure it is always nice to have faster downloads. But is it worth the extra work involved in setting this up both at the distribution point and on the client side?

    I am not being rhetorical. I am just wondering.

  32. If I remember correctly, Novell used it ages ago by deniea · · Score: 2, Informative

    I'm quite sure Novell has been doing it in the past. At least with the older versions of Netware (3.x and 4.x versions).

    You had the whole Novell NOS plus a couple of services in, let's say, 100 MB or so, and you needed to update tons of NLMs. You just needed to download a fairly small patch file (over a POTS line) that usually could fit on a floppy or so, and then it updated the loose NLMs.

    Nothing new I guess..

  33. Re:Is this an Issue? by teqo · · Score: 2, Informative
    Say I wanted to upgrade from fedora core 2 to fedora core 3, I could just download an upgrade iso at a fraction of the size etc.

    As somebody pointed out before in this article, there is rsync, which minimizes transmitted data using some xdelta-like algorithm. This is not really new, and some sites offer anonymous rsync downloads for exactly this reason.

    (Rumours were that some people actually use rsync in the following way to get the latest Debian ISOs from a collection of old, already downloaded packages: They cat'ed all their packages together into one huge binary file and then ran rsync against the remote ISO image and that local file. Since most data was already in that file, only transmission of a few megabytes of new data and some data arranging had to happen....)
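
    To see why that trick can work, here is a simplified sketch of rsync-style block matching, not rsync's real implementation or wire protocol. The local file is indexed in fixed-size blocks, and the remote file is scanned with a sliding window so that blocks found locally are referenced instead of sent. The block size, the plain byte-sum weak checksum, and the sample data are assumptions chosen for the demo.

```python
import hashlib

BLOCK = 8  # tiny block size for the demo; real tools use far larger blocks


def block_index(local: bytes) -> dict:
    """Index every full block of the local file by (weak sum, strong hash)."""
    index = {}
    for off in range(0, len(local) - BLOCK + 1, BLOCK):
        chunk = local[off:off + BLOCK]
        index.setdefault((sum(chunk), hashlib.md5(chunk).digest()), off)
    return index


def make_instructions(remote: bytes, index: dict) -> list:
    """Scan the remote file, emitting block references or literal bytes."""
    weak_sums = {weak for weak, _ in index}
    ops, literal, pos = [], bytearray(), 0
    weak = sum(remote[:BLOCK])
    while pos + BLOCK <= len(remote):
        chunk = remote[pos:pos + BLOCK]
        # Only compute the expensive hash when the cheap sum already matches.
        key = (weak, hashlib.md5(chunk).digest()) if weak in weak_sums else None
        if key in index:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", index[key]))
            pos += BLOCK
            weak = sum(remote[pos:pos + BLOCK])
        else:
            literal.append(remote[pos])
            if pos + BLOCK < len(remote):
                # Roll the weak sum forward by one byte.
                weak += remote[pos + BLOCK] - remote[pos]
            pos += 1
    literal += remote[pos:]
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def rebuild(local: bytes, ops: list) -> bytes:
    """Reassemble the remote file from local blocks and literal data."""
    out = bytearray()
    for kind, arg in ops:
        out += local[arg:arg + BLOCK] if kind == "copy" else arg
    return bytes(out)


# Hypothetical data: three old "packages" cat'ed together, and a new "ISO"
# that reorders two of them and adds some new content.
packages = b"pkg-aaa-pkg-bbb-pkg-ccc-"
iso = b"ISOHDR" + b"pkg-bbb-" + b"pkg-new!" + b"pkg-aaa-"

ops = make_instructions(iso, block_index(packages))
sent = sum(len(arg) for kind, arg in ops if kind == "data")
print(rebuild(packages, ops) == iso, sent, len(iso))
```

    The order of the packages inside the local file does not matter, because each block is looked up by checksum rather than by position.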

    Here I uttered a few quick&dirty thoughts (which most certainly somebody else has had before, as usual) on how rsync could help in mass patching, don't know if they are worth reading for you... .)

  34. Different forms of binary diffs by davidwr · · Score: 2, Informative

    Not all diffs are alike. The simplest diffs are literal - "find string of bytes to replace and replace them with newbytes."

    Back in 1985, Apple did something a bit more sophisticated for their System Update 2.0 patch.

    Their binary files were structured. The particular structure was called a "resource fork" and had a collection of hundreds or thousands of usually-tiny "resources" which could be individually modified. As a made-up example, replace String ID 50 in the file "System" to "Version 2.0" where it previously was "Version 1.0" or replace one linked-in graphic with another.

    The patch program updated, deleted, and added individual resources.
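
    As a toy illustration of patching at the resource level rather than the byte level (this is not Apple's actual resource fork format; the type codes, IDs, and payloads are invented):

```python
def apply_resource_patch(resources: dict, patch: list) -> dict:
    """Apply update/add/delete operations keyed by (type, id)."""
    result = dict(resources)
    for op, key, payload in patch:
        if op in ("add", "update"):
            result[key] = payload
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical contents of a structured file, keyed by (resource type, ID).
system_file = {
    ("STR ", 50): b"Version 1.0",
    ("STR ", 51): b"obsolete string",
    ("ICON", 128): b"<old icon bits>",
}

# The patch names individual resources instead of byte offsets.
patch = [
    ("update", ("STR ", 50), b"Version 2.0"),
    ("delete", ("STR ", 51), None),
    ("add", ("ICON", 129), b"<new icon bits>"),
]

patched = apply_resource_patch(system_file, patch)
print(patched[("STR ", 50)])
```

    Because each operation addresses a resource by type and ID, the patch stays valid even if unrelated resources in the file have moved or changed size.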

    This is important for historical reasons:
    If Microsoft or anyone else gets the funny idea to patent "replacing parts of a structured file in an update mechanism" as a broad-scope patent and the USPTO grants the patent [which they probably will, out of ignorance], the patent will need to be challenged and narrowed significantly.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  35. Something to try by slashdotjunker · · Score: 2, Interesting
    Something that might be useful to try out is a patch method I developed for an MMORPG. The problem is a challenge since MMOs are constantly updating game files. Also, many of the files are very large art resources which might only change a small percentage from patch to patch. On top of this, players are always messing with their files, so you can't assume that any file is in any known state on the basis of some "patch level".

    The method I came up with was to use essentially the rsync algorithm, but I reversed it so that the computationally expensive parts were performed on the client side. The results of each computation were stored on the server side as a "patch" so that the computation was performed only once. This provided a patch system that was dynamic but without generating large server load.

    The advantages are:
    1. Patches are generated dynamically so the files can be in any state (truncated, too big, filled with garbage, missing entirely, etc).
    2. The heavy computation is performed on the client side so that patch generation does not drive up server load.
    3. Computed patches are stored and reused, so a database of patches is quickly built up.
    4. Patches are efficient (based on binary diff).

    A detailed example follows; knowledge of rsync is required for understanding.

    For example, let's suppose you are releasing a new content upgrade. A particular file's signature has changed from F4A3 to 26B1. (For brevity I am using a 16-bit signature, in practice it is much much larger.) When the first client connects to the patch server it receives the updated list of file signatures. The client notices that the file is now old so it requests a patch for F4A3->26B1. The patch does not exist yet, so the reverse-rsync algorithm activates and the client calculates the F4A3->26B1 patch. When that patch has been generated it is returned to the patch server and all future clients can just download the patch and skip the reverse-rsync. After applying a patch, the result is checked to make sure that you actually ended up with 26B1. If you did not, extra rounds of patching are performed.

    Some Notes: These extra rounds are consequences of rsync and a security check as well to prevent bad patches from being uploaded to the server. And, normally the release maintainer would "pre-seed" the patch server by patching a clean current version to the new version just before release.
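
    The caching part of this scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the original implementation: the reverse-rsync exchange is replaced by a toy prefix/suffix delta computed by a client that can read the new file directly, a failed check raises an error instead of triggering extra patch rounds, and the server-side check of uploaded patches is left out. Every class and function name is hypothetical.

```python
import hashlib


def signature(data: bytes) -> str:
    """Short hex signature for the demo; a real one would be far longer."""
    return hashlib.sha256(data).hexdigest()[:8]


def make_patch(old: bytes, new: bytes) -> tuple:
    """Toy delta: keep the common prefix and suffix, ship only the middle."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return (p, s, new[p:len(new) - s])


def apply_patch(old: bytes, patch: tuple) -> bytes:
    p, s, middle = patch
    return old[:p] + middle + old[len(old) - s:]


class PatchServer:
    def __init__(self, current: bytes):
        self.current = current
        self.patches = {}      # (old_sig, new_sig) -> stored patch
        self.computations = 0  # how often a client had to compute a patch

    def latest_signature(self) -> str:
        return signature(self.current)

    def get_patch(self, old_sig: str):
        return self.patches.get((old_sig, self.latest_signature()))

    def store_patch(self, old_sig: str, patch: tuple) -> None:
        self.patches[(old_sig, self.latest_signature())] = patch


def update_client(local: bytes, server: PatchServer) -> bytes:
    old_sig, new_sig = signature(local), server.latest_signature()
    if old_sig == new_sig:
        return local
    patch = server.get_patch(old_sig)
    if patch is None:
        # Stand-in for the client-side reverse-rsync computation.
        patch = make_patch(local, server.current)
        server.computations += 1
        server.store_patch(old_sig, patch)
    result = apply_patch(local, patch)
    if signature(result) != new_sig:
        raise ValueError("patch did not produce the expected signature")
    return result


old_file = b"art resource " * 20 + b"texture v1" + b" trailer"
new_file = b"art resource " * 20 + b"texture v2" + b" trailer"

server = PatchServer(new_file)
first = update_client(old_file, server)   # computes and uploads the patch
second = update_client(old_file, server)  # reuses the stored patch
print(first == new_file, second == new_file, server.computations)
```

    Keying the store on the (old signature, new signature) pair is what lets a client with a damaged or modified file still get a patch: its unusual signature simply misses the cache and triggers one fresh computation.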