Delta Compression for Linux Security Patches?
cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems, starting with FreeBSD over a year ago and more recently Mac OS X and Windows XP SP2, have started to use delta compression (also known as binary diffs, a topic that makes up part of my doctoral thesis), which can often reduce patch sizes by more than a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"
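To make "binary diff" concrete, here is a minimal sketch in Python. It assumes a made-up op format (COPY a byte range from the old file, or ADD literal bytes) and uses the standard library's difflib.SequenceMatcher as the matcher. Real tools such as bsdiff or xdelta use much faster matching and a compact binary encoding, so treat this as an illustration only.

```python
import difflib

def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as COPY ranges taken from `old` plus literal ADD bytes."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # bytes the user already has
        elif j2 > j1:                            # "replace" or "insert"
            ops.append(("add", new[j1:j2]))      # only these bytes are shipped
        # "delete" needs no op: those old bytes are simply never copied
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)

if __name__ == "__main__":
    old = b"libfoo 1.0: " + b"x" * 200 + b" end"
    new = b"libfoo 1.1: " + b"x" * 200 + b" end"
    ops = make_delta(old, new)
    assert apply_delta(old, ops) == new
    literal = sum(len(op[1]) for op in ops if op[0] == "add")
    print(f"{len(new)} bytes rebuilt from {literal} literal byte(s) plus copy ops")
```

On the toy input only the changed byte travels as a literal; everything else is a copy instruction, which is where the large reductions in patch size come from.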
Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.
Personally, I'd prefer to see binary distributions move to a model using something like CVS, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. If the right system is used, this solves both your differential update problem and your updates in general. (I suspect that system is pretty much not CVS itself; I don't know if there's a way to make it do all of that, and from what I understand CVS doesn't handle binaries very intelligently.) Plus, you can use it for both source and binary updates.
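The "overwrite some files, merge config files" rule can be sketched roughly as follows, in the spirit of how Debian's dpkg treats conffiles. The function name, the "overwrite"/"config" labels, and the assumption that the package database records a hash of each file as originally shipped are all hypothetical; the actual merge for the conflicting case is left out.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def resolve_file(kind: str, shipped_old_digest: str, local: bytes, shipped_new: bytes) -> tuple:
    """Decide what an updater should do with one file (hypothetical helper).

    kind: "overwrite" for files always replaced (binaries, libraries),
          "config" for files the admin may have edited.
    shipped_old_digest: hash of the file as the previous package shipped it,
          assumed to be recorded in the package database.
    Returns (action, content_to_leave_on_disk).
    """
    if kind == "overwrite":
        return ("replace", shipped_new)
    local_d, new_d = digest(local), digest(shipped_new)
    if local_d == shipped_old_digest:       # admin never edited it: take the update
        return ("replace", shipped_new)
    if new_d == shipped_old_digest or new_d == local_d:
        return ("keep", local)              # vendor didn't change it, or admin already matches it
    return ("conflict", local)              # both sides changed: needs a real merge or a human

if __name__ == "__main__":
    old = b"port=80\n"
    print(resolve_file("config", digest(old), b"port=81\n", b"port=8080\n"))
```

Comparing hashes means the updater never needs to keep the old shipped copy of every config file around, only its digest.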
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
... their biggest customers start using dialup.
You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!
The folks at MindVision made an installer-creation tool that let you scan two different sets of files and directories, find the (binary) differences between them, and package up just those differences in the installer archive. In fact, you could use it to compute and package deltas between several versions at once. When the user ran the installer (really an updater), it would apply the binary patch to the file set as needed.
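How MindVision's tool worked internally isn't described here. As a hedged sketch, a plausible first step for any such updater builder is to compare two directory trees by content hash, so that only the changed files need binary patches. The function names are hypothetical.

```python
import hashlib
import os

def tree_hashes(root: str) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as f:
                hashes[rel] = hashlib.sha256(f.read()).hexdigest()
    return hashes

def diff_trees(old_root: str, new_root: str) -> dict:
    """Classify files as added, removed, or changed between two trees."""
    old, new = tree_hashes(old_root), tree_hashes(new_root)
    return {
        "added": sorted(new.keys() - old.keys()),      # ship whole
        "removed": sorted(old.keys() - new.keys()),    # delete on update
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),  # binary-patch these
    }
```

Unchanged files drop out entirely, and only the "changed" list would be fed to a binary differ.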
I was using this tool over 7 years ago on Mac OS, so I don't see what is so new about this concept... but I am glad it looks like it's starting to be used more.
Now with broadband being so popular, and still on the rise, is this really an issue?
Yes, it is. I just switched to broadband less than two months ago, and a lot of my friends are still on dialup. Also, don't forget rural areas that have no access to broadband. You would be surprised how many people still have dialup; I believe the number of broadband users only recently surpassed the number of dialup users, which means that nearly half of all internet users are still on dialup.
SUSE already does this.
RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made. While I favor a new format for many reasons other than this, modifying RPM is probably the best solution in order to provide backwards compatibility.
OK, before I get berated by the karma (whoring) police: I do realize these are not binary diffs. But seriously, Linux has been using diffs as a way to save bandwidth since before Windows even offered 'updates'. Another example of Windows 'innovation', I guess.
Yes, I see how it is neat that there is a binary version of this process with Windows, but Linux is primarily a source-based operating system. It is that way because the software is designed to be compiled for a variety of systems and setups and to work with all of them.
I do understand the author's question, but it really should be reworded. Linux is not an OS in the sense that Windows is an OS. He should more correctly be asking when one of the 'binary' distributions of Linux (or of a Linux-'based' OS, to be exact) plans to offer this. Binary packages are really only offered on a per-distribution basis, and the binaries are not very compatible between distros and systems (although some basic compatibility is generally there). As to that question, who knows and who cares? I use Gentoo, and after trying almost every one of the binary distros
Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need eleven deltas (one from the original release and one from each update). I don't think so. Unless you want to do something like Windows Update, where an agent scans your binaries, compares them with the update, and then downloads individual files... but that's a lot more complicated and isn't justified by the bandwidth savings.
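The arithmetic above is easy to check. This sketch (with made-up version labels) lists the deltas a vendor would publish if every older version must be able to jump straight to the newest one; if every release does this, the total over the whole history grows quadratically, as n(n-1)/2 for n versions.

```python
from itertools import combinations

def deltas_for_latest(versions: list) -> list:
    """(old, latest) pairs needed so every older version can update in one step."""
    latest = versions[-1]
    return [(old, latest) for old in versions[:-1]]

def deltas_over_history(versions: list) -> list:
    """Every pair published if each release ships a delta from every earlier one."""
    return list(combinations(versions, 2))

# Hypothetical history: the original package, ten updates, then the new release.
history = ["orig"] + [f"update{i}" for i in range(1, 11)] + ["new"]
print(len(deltas_for_latest(history)))    # 11
print(len(deltas_over_history(history)))  # 66
```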
SUSE already does this.
Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.
Now, this is certainly a step forward from the common approach (e.g., Debian, Red Hat) of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.
XDelta3 recently reached its first public release.
http://xdelta.org/xdelta3.html
XDelta3 is a library designed to foster exactly this kind of functionality. If distributions integrate xdelta into their package management frameworks, we would be well on our way to what the poster is looking for.
Used to do this back in ye olde DOS shareware days. I think RTPatch was the most common of the commercial ones.
But if you're posting to Slashdot on a Friday night, you probably don't have a friend's house to go over to.
It works beautifully but I can't help but think it is a waste of bandwidth.
Firstly, Linux programs tend to be smaller than Windows programs ("do one thing, and do it well"). Even a huge beast like teTeX is 'only' 14.4MB; compare that to SP2... This has reduced the demand for delta compression.
Secondly, in the Windows world people release rarely, while the opposite is true in the Linux world: projects with daily releases are not unheard of, and weekly releases are fairly common. This means that publishing a patch for every pair of versions (v3.4 -> v3.7, and so on) is infeasible in Linux, whereas it is feasible in Windows.
More sophisticated algorithms than delta checksums do exist (as I guess you know if your thesis is on them) -- rolling checksums have been used in several projects I know of. However, there is a widespread rumour that these techniques are patented. I have never seen any evidence, but it puts a damper on any implementations.
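For readers who haven't met rolling checksums, here is a small Python sketch of the weak checksum described in the rsync technical report (Tridgell and Mackerras): a sum of bytes and a position-weighted sum, both modulo 2^16, which can be slid along a file one byte at a time in constant work. It is illustrative only; the function names are hypothetical, the strong checksum rsync pairs it with is replaced by a direct byte comparison, and nothing here bears on the patent question.

```python
M = 1 << 16  # modulus used by rsync's weak checksum

def weak_checksum(block: bytes) -> tuple:
    """Compute (a, b) for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M  # earlier bytes weigh more
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide a window of length n by one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def find_block(data: bytes, block: bytes) -> int:
    """Return the first offset where block occurs in data, or -1."""
    n = len(block)
    if n == 0 or len(data) < n:
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for k in range(len(data) - n + 1):
        # Weak sums can collide, so confirm a candidate with a real comparison.
        if (a, b) == target and data[k:k + n] == block:
            return k
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return -1
```

The point of the rolling update is that checking every offset costs O(1) per byte instead of O(block size).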
There is a semi-vapourware project implementing all of this (part of the Apache project, IIRC). However, the project fizzled out several years ago.
http://www.daemonology.net/bsdiff/
bsdiff and bspatch are tools for building and applying patches to binary files. By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta, and 15% smaller than those produced by
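The core of the suffix-sorting idea can be sketched in a few lines of Python: sort every suffix of the old file once, then, for a position in the new file, binary-search for the longest matching run in the old one. This is a simplified illustration, not bsdiff's code. The naive sort stands in for qsufsort, and bsdiff's extra handling of approximate matches (the "how executable files change" part) is omitted.

```python
def suffix_array(data: bytes) -> list:
    """Naive suffix array (sort all suffixes); fine for a demo, far slower than qsufsort."""
    return sorted(range(len(data)), key=lambda i: data[i:])

def common_prefix(x: bytes, y: bytes) -> int:
    """Length of the shared prefix of x and y."""
    n = 0
    for p, q in zip(x, y):
        if p != q:
            break
        n += 1
    return n

def longest_match(old: bytes, sa: list, target: bytes) -> tuple:
    """Return (offset_in_old, length) of the longest prefix of target found in old."""
    lo, hi = 0, len(sa)
    while lo < hi:                      # find where target would sort among the suffixes
        mid = (lo + hi) // 2
        if old[sa[mid]:] < target:
            lo = mid + 1
        else:
            hi = mid
    best = (0, 0)
    for idx in (lo - 1, lo):            # the best match is adjacent to that position
        if 0 <= idx < len(sa):
            length = common_prefix(old[sa[idx]:], target)
            if length > best[1]:
                best = (sa[idx], length)
    return best

if __name__ == "__main__":
    old = b"hello world, hello suffix arrays"
    sa = suffix_array(old)
    print(longest_match(old, sa, b"suffix sorting"))
```

A differ built on this would walk through the new file, emitting a copy of the matched run from the old file whenever the match is long enough and literal bytes otherwise.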
http://sourceforge.net/projects/diffball
A general delta compression/differencing suite for any platform that supports autoconf/automake, written in C, with built-in support for reading, writing, and converting between multiple file formats, and an easy framework for dropping in new algorithms.
Especially since the license of bsdiff is not even close to a BSD license (don't let the name "BSD Protection License" fool you). Unless the license is changed to something like BSD, bsdiff is not going to be adopted anywhere except in closed-source software. Debian cannot even package this software because it is non-free.
I guess the bottom line is that if you want something accepted in open-source *and* in proprietary software, you want to license it under BSD. If you cater to only one group (closed source in this case), you will lose the other.
For example, I always grab the "regular" tar.gz version of the kernel, for two reasons:
1) I always forget the j option to tar, since bz2 packages are not that common. It should autodetect it.
2) I have the perception that the combined download time and unpacking is longer for bz2
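Autodetection of the kind wished for in point 1 is straightforward, because both formats start with fixed magic bytes: gzip streams begin with 0x1f 0x8b, and bzip2 streams begin with the ASCII letters "BZh". A minimal Python sketch, with hypothetical helper names:

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream (RFC 1952)
BZIP2_MAGIC = b"BZh"       # bzip2 streams start with "BZh" plus a block-size digit

def detect_compression(header: bytes) -> str:
    """Guess the compression format from the first few bytes of a file."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "unknown"

def open_tarball(path: str):
    """Open a possibly compressed file for reading, picking the decompressor automatically."""
    with open(path, "rb") as f:
        kind = detect_compression(f.read(3))
    if kind == "gzip":
        return gzip.open(path, "rb")
    if kind == "bzip2":
        return bz2.open(path, "rb")
    return open(path, "rb")
```

Python's own tarfile.open(path, "r:*") already does this kind of transparent detection, so a tool never has to be told "j" versus "z".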
Point two was subjective up until now, but just for the hell of it I decided to measure it. I used the time command to measure how long it took to download the kernels and how long it took to unpack them:
time to download linux-2.6.8.tar.bz2 1m4.414s
time to download linux-2.6.8.tar.gz 1m9.706s
time to unpack linux-2.6.8.tar.bz2 2m05.457s
time to unpack linux-2.6.8.tar.gz 0m26.309s
This is on a P4C 3.2GHz with 1GB RAM and an 8Mbit connection. So there you have it: with a fast enough connection, the difference is significantly in favor of the old gz format. The size difference between the bz2 and gz kernels, about 8.8 MB, is not nearly enough to justify the slower unpacking. If you have a slower machine but also a slower connection, the result is likely in the same ballpark.
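Summing the measurements above makes the comparison explicit. The break-even estimate at the end is a rough sketch that assumes download time scales linearly with file size and that unpack time does not depend on the link; neither assumption was measured.

```python
# Measurements reported above, in seconds (P4C 3.2GHz, 8 Mbit/s link).
download = {"bz2": 64.414, "gz": 69.706}
unpack = {"bz2": 125.457, "gz": 26.309}

total = {fmt: download[fmt] + unpack[fmt] for fmt in download}
for fmt, seconds in total.items():
    print(f"{fmt}: {seconds:.1f} s to download and unpack")

# Rough break-even link speed (assumption: linear download time, fixed unpack time).
# bz2 only wins when fetching its ~8.8 MB saving takes longer than its extra unpack time.
size_saved_mbit = 8.8 * 8                       # megabytes -> megabits
extra_unpack_s = unpack["bz2"] - unpack["gz"]   # about 99 s
break_even_mbit_s = size_saved_mbit / extra_unpack_s
print(f"bz2 pays off below roughly {break_even_mbit_s:.2f} Mbit/s")
```

By this estimate, bz2 only saves time on links slower than about 0.7 Mbit/s. The measured download gap (about 5.3 s) is smaller than the roughly 8.8 s that 8.8 MB at 8 Mbit/s would predict, so treat the figure as a ballpark.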
This goes to show that if you want to provide faster (perceived) update times to users, especially in the future with faster connections, you have to study the problem in detail and not just blindly optimize one aspect of the process (size, in this case), since overall performance might in fact get worse. Premature optimization and all that... What's the time for patching using delta compression, anyway? If a 600KB RPM update can be delta compressed to 10KB, but the patching process takes longer than 15 seconds, I'm likely to see a slowdown in system update time.
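The trade-off in the 600KB/10KB example can be written down directly. This sketch uses hypothetical helper names, treats 1 KB as 8 kbit, and ignores install time because both paths pay it.

```python
def transfer_s(size_kb: float, link_kbit_s: float) -> float:
    """Seconds to download size_kb kilobytes over the link (1 KB taken as 8 kbit)."""
    return size_kb * 8 / link_kbit_s

def delta_saves_time(full_kb: float, delta_kb: float, patch_s: float, link_kbit_s: float) -> bool:
    """True if fetching the delta and patching beats fetching the full package."""
    return transfer_s(delta_kb, link_kbit_s) + patch_s < transfer_s(full_kb, link_kbit_s)

def break_even_kbit_s(full_kb: float, delta_kb: float, patch_s: float) -> float:
    """Link speed below which the delta wins."""
    return (full_kb - delta_kb) * 8 / patch_s

# The 600KB -> 10KB RPM example above, with a 15 s patch step.
print(delta_saves_time(600, 10, 15, 8000))    # False (8 Mbit/s link)
print(delta_saves_time(600, 10, 15, 56))      # True (nominal 56k modem)
print(round(break_even_kbit_s(600, 10, 15)))  # 315
```

So on the 8Mbit link from the timing test above, a 15-second patch step would indeed be a slowdown, while on a modem the delta still wins easily; the crossover sits at roughly 315 kbit/s.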
Enter "deltup", a tool that looks at two tarballs and gives you a diff between the two that you can use to "transform the old tarball to an exact copy of the new tarball"; it even preserves MD5 checksum compatibility. Now some enterprising Gentoo user has created a "dynamic deltup server" that automates the creation of these delta files, so people can reuse the delta files that other people have already used.
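deltup's own delta format isn't described here. The sketch below only illustrates the property mentioned above: the rebuilt tarball must be byte-identical to the upstream one, so its MD5 still matches and it can be checked against the published checksum. The prefix/suffix delta is a deliberately toy format, and the function names are hypothetical.

```python
import hashlib

def make_delta(old: bytes, new: bytes) -> dict:
    """Toy delta: reuse the common prefix and suffix, ship only the middle plus new's MD5."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in the shorter input.
    while suffix < limit - prefix and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]:
        suffix += 1
    return {
        "prefix": prefix,
        "suffix": suffix,
        "middle": new[prefix:len(new) - suffix],
        "md5": hashlib.md5(new).hexdigest(),
    }

def apply_delta(old: bytes, delta: dict) -> bytes:
    """Rebuild the new file and refuse to return it unless its MD5 matches."""
    p, s = delta["prefix"], delta["suffix"]
    rebuilt = old[:p] + delta["middle"] + old[len(old) - s:]
    if hashlib.md5(rebuilt).hexdigest() != delta["md5"]:
        raise ValueError("rebuilt file does not match the expected MD5")
    return rebuilt

if __name__ == "__main__":
    old = b"linux-2.6.7 source tree ... end"
    new = b"linux-2.6.8 source tree ... end"
    delta = make_delta(old, new)
    print(delta["middle"], apply_delta(old, delta) == new)
```

Because the output is checked against the upstream MD5, a corrupted or mismatched base tarball is caught instead of silently producing a bad file.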
Using this technique in combination with Gentoo's Portage, people can reduce their traffic by 75% on average.
Have a look at the following URLs for more information:
http://forums.gentoo.org/viewtopic.php?t=215262
m e.html
http://linux01.gwdg.de/~nlissne/deltup-status.ati
Rigolo