Delta Compression for Linux Security Patches?
cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitutes a portion of my doctoral thesis), and can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"
Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.
Personally I'd prefer to see binary distributions move to a model of using something like cvs, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. This solves both your differential update problem (if the right system is used - I'm thinking that's pretty much not CVS but I don't know if there's a way to make it do all of that - CVS doesn't handle binaries amazingly intelligently from what I understand) and your updates in general. Plus, you can use it both for source and binary updates.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
The folks at MindVision made an installer/installer creation tool that allowed one to scan two different sets of files and directories to find the binary differences between them, and it would package up just those differences in the installer archive. In fact, you could use it to diff and package the deltas between several versions at once. When the user ran the installer (really an updater), it would apply the binary patch to the file set as needed.
I was using this tool over 7 years ago on Mac OS, so I don't see what is so new about this concept... but I am glad it looks like it is starting to be used more.
Sorry, the writeup was a bit unclear. Windows XP SP2 contains a new version of Windows Installer (or whatever they're calling it today). This new version includes support for downloading updates via binary diffs, and most updates to XP after this point should be done that way.
Tarsnap: Online backups for the truly paranoid
Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need to have eleven deltas. I don't think so. Unless you want to do something like Windows Update, where an agent scans your binaries, compares them with the update, and then downloads individual files ... but that's a lot more complicated and isn't justified by the bandwidth savings.
SUSE already does this.
Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.
Now, this is certainly a step forward from the common (eg, Debian, RedHat) approach of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.
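To make the distinction concrete, here is a rough Python sketch of a file-level patch of the kind described above: only the files that changed are shipped, but each one is shipped whole. The package contents, paths, and the changed_files helper are made up for illustration; this is not SuSE's actual tooling.

```python
def changed_files(old_pkg: dict[str, bytes], new_pkg: dict[str, bytes]) -> dict[str, bytes]:
    """Return the files that are new or modified, intact, as a file-level patch would."""
    return {
        path: data
        for path, data in new_pkg.items()
        if old_pkg.get(path) != data
    }


# Hypothetical package: one large binary, one library, one doc file.
old_pkg = {
    "usr/bin/tool": b"\x7fELF" + b"A" * 5000,
    "usr/lib/libtool.so": b"\x7fELF" + b"B" * 8000,
    "usr/share/doc/README": b"docs " * 200,
}

# The security fix changes a single byte in one binary.
new_pkg = dict(old_pkg)
fixed = bytearray(old_pkg["usr/bin/tool"])
fixed[100] = ord("Z")
new_pkg["usr/bin/tool"] = bytes(fixed)

patch = changed_files(old_pkg, new_pkg)
full_size = sum(len(data) for data in new_pkg.values())
patch_size = sum(len(data) for data in patch.values())

print("full package:", full_size, "bytes")
print("file-level patch:", patch_size, "bytes")
```

The file-level patch is smaller than the full package, but it still carries the entire binary for a one-byte change, which is the gap a delta would close.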
Gentoo users should make sure they know about this: it's called deltup, basically a script for Portage that grabs xdelta patches instead of downloading the entire file again. It seems to save me a lot of bandwidth, anyway.
Can someone explain, for those people who have no idea, what delta compression is and how it differs from something like zip/rar/gz/7z etc.?
--ajs
XDelta3 recently reached its first public release.
http://xdelta.org/xdelta3.html
XDelta3 is a library designed to foster exactly this kind of functionality. If distributions integrate the xdelta functionality into their package management frameworks, we would be well on our way to what the poster is looking for.
Well, that's a particular kind of binary diff. It gets harder if you want the process automated, you know, like it has to be if you want to do anything productive with it (as opposed to blotting out a contiguous chunk of someone else's code).
What makes you think you need to take a server out of production while code compiles on it? I never have.
I beg to differ. SuSE 9.1 came out only 5 months ago:
That's almost a whole CD worth of patches in half a year. All of this is to correct for mistakes in probably no more than a few hundred total lines of code.
No reason why they couldn't. According to this page, http://www.daemonology.net/bsdiff/, this util is already in Gentoo.
Pain is merely failure leaving the body
If I understand correctly, a binary diff goes a few steps further than a patch. It stores insertions and deletions while a typical patch (like IPS) only stores replacements, which is optimal for files patched in a hex editor and even most database files but not for recompiled files where everything can change in location.
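A small Python sketch of that difference, under some loud assumptions: the replacement-only patch is a toy stand-in for an IPS-style format (not the real IPS layout), and the copy/insert delta uses the standard library's difflib rather than what xdelta or bsdiff actually do. All function names are invented for the example.

```python
from difflib import SequenceMatcher


def replacement_patch(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Replacement-only patch: (offset, bytes) runs where new differs from old.

    Assumes both files have the same length, as a hex-edited file would.
    """
    assert len(old) == len(new)
    patch, i = [], 0
    while i < len(new):
        if old[i] != new[i]:
            start = i
            while i < len(new) and old[i] != new[i]:
                i += 1
            patch.append((start, new[start:i]))
        else:
            i += 1
    return patch


def copy_insert_delta(old: bytes, new: bytes) -> list[tuple]:
    """Delta built from ("copy", offset, length) and ("insert", data) operations."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the old bytes are simply never copied.
    return ops


def apply_delta(old: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the new file from the old file plus the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list[tuple]) -> int:
    """Number of bytes the delta has to carry verbatim."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


# One byte inserted near the start shifts everything after it, roughly what
# happens to code and data offsets when a program is recompiled.
old = bytes(range(256)) * 4
new = old[:10] + b"\xff" + old[10:-1]

repl = replacement_patch(old, new)
delta = copy_insert_delta(old, new)

print("replacement-only patch carries", sum(len(data) for _, data in repl), "bytes")
print("copy/insert delta carries", literal_bytes(delta), "bytes")
```

The replacement-only patch has to resend nearly the whole file because every byte after the insertion point no longer lines up, while the copy/insert delta only carries the inserted data plus a few copy instructions.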
It looks like the total updates would be about half of that CD, and that would be full replacement RPMs. The patches seem to be only slightly smaller. And these are replacement RPMs for ALL possible installed packages. SuSE, like most distros, ships everything in RPMs, but no one installs all of them. I mean, how many people install every database, every development toolchain (that GNU Ada compiler rocking your world?), every desktop GUI? No one. Like most people, you have taken a small data point and whipped it into a full misunderstanding. Statistics lie, and liars use statistics. My stats prof told me that; still true.
Here's a snippet from a patch description used by the online update, to give you some idea of what it does:
This signature is not in the public domain.
$ time wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.8.1.tar.bz2
real 6m21.421s
$ time tar jxf linux-2.6.8.1.tar.bz2
real 1m10.380s
See, it all depends.
I'm quite sure Novell has been doing it in the past. At least with the older versions of Netware (3.x and 4.x versions).
You had the whole Novell NOS plus a couple of services in, let's say, 100 MB or so, and you needed to update tons of NLMs. You just needed to download a quite small patch file (over a POTS line) that usually could fit on a floppy or so, and then it updated the loose NLMs.
Nothing new I guess..
As somebody pointed out before in this article, there is rsync, which minimizes transmitted data using an xdelta-like algorithm. This is not really new, and some sites offer anonymous rsync downloads for exactly this reason.
(Rumour has it that some people actually used rsync in the following way to get the latest Debian ISOs from a collection of old, already downloaded packages: they cat'ed all their packages together into one huge binary file and then ran rsync against the remote ISO image and that local file. Since most of the data was already in that file, only a few megabytes of new data had to be transmitted, plus some rearranging of data.)
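For anyone wondering why the cat trick would work, here is a heavily simplified Python sketch of the block-matching idea. It is not rsync's real protocol: real rsync pairs a cheap rolling checksum with a strong hash, while this version just hashes every window directly, which is slow but short. The block size, function names, and the fake "packages" and "ISO" are all assumptions for the demo.

```python
import hashlib
import random

BLOCK = 64  # hypothetical block size, chosen small for the demo


def block_signatures(local: bytes, block: int = BLOCK) -> dict[bytes, int]:
    """Receiver side: map the digest of each fixed-size local block to its offset."""
    sigs: dict[bytes, int] = {}
    for off in range(0, len(local) - block + 1, block):
        sigs.setdefault(hashlib.sha256(local[off:off + block]).digest(), off)
    return sigs


def make_instructions(remote: bytes, sigs: dict[bytes, int], block: int = BLOCK) -> list[tuple]:
    """Sender side: scan the remote file at every offset for blocks the receiver has.

    Emits ("copy", local_offset) for known blocks and ("data", bytes) otherwise.
    """
    ops, literal, pos = [], bytearray(), 0
    while pos < len(remote):
        chunk = remote[pos:pos + block]
        off = sigs.get(hashlib.sha256(chunk).digest()) if len(chunk) == block else None
        if off is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", off))
            pos += block
        else:
            literal.append(remote[pos])
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def rebuild(local: bytes, ops: list[tuple], block: int = BLOCK) -> bytes:
    """Receiver side: reassemble the remote file from local blocks plus literal data."""
    out = bytearray()
    for kind, arg in ops:
        out += local[arg:arg + block] if kind == "copy" else arg
    return bytes(out)


rng = random.Random(0)
pkg_a, pkg_b, pkg_c, pkg_new = (
    bytes(rng.getrandbits(8) for _ in range(n)) for n in (1000, 1000, 1000, 500)
)

local = pkg_a + pkg_b + pkg_c                  # old packages cat'ed together
remote = b"ISO-HEADER" * 10 + pkg_b + pkg_a + pkg_new  # stand-in for the ISO image

ops = make_instructions(remote, block_signatures(local))
sent = sum(len(arg) for kind, arg in ops if kind == "data")
print("remote size:", len(remote), "bytes; literal data sent:", sent, "bytes")
```

Because the sender checks every offset, it does not matter that the packages sit in a different order or at different positions inside the image; only the header, the new package, and the bits around block boundaries travel as literal data.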
Here I uttered a few quick&dirty thoughts (which most certainly somebody else has had before, as usual) on how rsync could help in mass patching, don't know if they are worth reading for you... .)
Not all diffs are alike. The simplest diffs are literal: "find this string of bytes and replace it with these new bytes."
Back in 1985, Apple did something a bit more sophisticated for their System Update 2.0 patch.
Their binary files were structured. The particular structure was called a "resource fork" and held a collection of hundreds or thousands of usually tiny "resources" which could be individually modified. As a made-up example: change String ID 50 in the file "System" from "Version 1.0" to "Version 2.0", or replace one linked-in graphic with another.
The patch program updated, deleted, and added individual resources.
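As a rough illustration of that kind of structured patch, here is a Python sketch that treats a file as a dictionary of resources keyed by (type, ID) and applies update, delete, and add operations. The data layout, the operation names, and the apply_resource_patch function are invented for the example; this is not Apple's actual resource fork format or patcher.

```python
def apply_resource_patch(resources: dict, patch: list) -> dict:
    """Return a patched copy of a resource collection.

    resources maps (type, id) -> bytes; each patch entry is
    ("update", key, data), ("add", key, data) or ("delete", key).
    """
    result = dict(resources)
    for op, key, *value in patch:
        if op == "update":
            if key not in result:
                raise KeyError(f"cannot update missing resource {key!r}")
            result[key] = value[0]
        elif op == "add":
            result[key] = value[0]
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown patch operation {op!r}")
    return result


# Made-up contents for a file called "System".
system = {
    ("STR ", 50): b"Version 1.0",
    ("STR ", 51): b"obsolete string",
    ("ICON", 128): b"<old icon bits>",
}

patch = [
    ("update", ("STR ", 50), b"Version 2.0"),
    ("delete", ("STR ", 51)),
    ("add", ("ICON", 129), b"<new icon bits>"),
]

patched = apply_resource_patch(system, patch)
print(patched[("STR ", 50)])
```

The patch only needs to carry the resources that change, so its size tracks the size of the edit rather than the size of the file.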
This is important for historical reasons:
If Microsoft or anyone else gets the funny idea to patent "replacing parts of a structured file in an update mechanism" as a broad-scope patent and the USPTO grants the patent [which they probably will, out of ignorance], the patent will need to be challenged and narrowed significantly.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.