Slashdot Mirror


Delta Compression for Linux Security Patches?

cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitutes a portion of my doctoral thesis), and can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"

289 comments

  1. Suse by Anonymous Coward · · Score: 0

    I think Suse does this.

    1. Re:SUSE by drinkypoo · · Score: 1

      Any excuse is a good excuse for getting rid of RPM. First of all, RPM puts you in dependency hell. Second of all, there's no reason whatsoever that we can't just do away with RPM tomorrow. All you have to do is create an RPM of your new package manager and start distributing packages. Bonus points if you figure out a way to turn RPM repository information into repository information for your system, but it's not strictly necessary. Extra bonus points if you design all your applications to be installable anywhere and add support to your packaging system, to make it easier for people to maintain multiple versions.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:SUSE by cperciva · · Score: 3, Informative

      SUSE already does this.

      Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.

      Now, this is certainly a step forward from the common (eg, Debian, RedHat) approach of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.

    3. Re:SUSE by Anonymous Coward · · Score: 1, Insightful

      Do programs like yum NOT get rid of the dependency issues?

    4. Re:SUSE by drinkypoo · · Score: 1

      I wouldn't know, because I don't use rpm-based systems any more, and therefore I'm not going to do the research to find out. However, it needs to be an integral part of the system and it needs to be made mostly irrelevant. I've moved on to trading compile time for having things work, and I'm about to go start setting up my dual p3-500 to supplement my dual p3-1000 in the compiling department. (Eventually it's going to be my x86 cluster master and fileserver, but right now it can shovel coal.)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:SUSE by cabbey · · Score: 1

      As you point out, SuSE already does this, using, you guessed it, RPM. So the changes have already been made, they just need to be adopted by other distros (if they haven't already been picked up upstream by redhat) and (probably more importantly) someone needs to document the complex magic required to create the patch rpms.

    6. Re:Suse by xScruffx · · Score: 1

      Sorcerer (and its derivatives, I'd imagine) have been doing this for about two, two-and-a-half years as well.

      xScruffx

    7. Re:SUSE by Anonymous Coward · · Score: 0
      RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made.

      Not necessarily, not if you approach the problem a bit differently. On an RPM-based system, the goal is to get a new RPM downloaded, right? Well, you can apply binary diffs to the RPMs themselves instead of to the files they contain. Then you can keep, say, foo-1.3-i386.rpm and use binary diffs to deliver foo-1.4-i386.rpm to your system.

      Now, I don't know whether the RPM format includes compression as a built-in thing, but even if it does, this doesn't make it impossible to transfer an RPM this way. If it does, then you need to uncompress both the original and the to-be-downloaded RPM first before applying the binary diff algorithm. Then you update the uncompressed data and run compression again to reconstruct the original.

      Finally, when the whole thing is done, you can even compare md5 checksums or something of your RPM files to verify that you have correctly reproduced the RPM file. If it fails, then just download the whole thing.

      Anyway, the point is that the RPM system doesn't need to be changed. The only thing that needs to be changed is the part that downloads the RPMs, although this part may now need to have knowledge of the internal structure of RPMs.
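
      The uncompress / patch / recompress / verify loop described above could look roughly like this in Python. It is only a sketch under loud assumptions: it treats the package as one plain gzip stream (a real RPM has its own headers around the payload), `patch` and `fetch_full` are hypothetical callables, and it only works if recompression reproduces the published file byte for byte, which is not guaranteed across compressor versions or settings.

```python
import gzip
import hashlib


def rebuild_package(old_pkg: bytes, patch, expected_md5: str, fetch_full) -> bytes:
    """Rebuild a compressed package from the old one plus a patch.

    `patch` is any callable mapping the old uncompressed bytes to the new
    ones; `fetch_full` is a callable that downloads the whole new package.
    Both are hypothetical stand-ins for this sketch.
    """
    old_payload = gzip.decompress(old_pkg)
    new_payload = patch(old_payload)
    # mtime=0 keeps the gzip header reproducible; the publisher is assumed
    # to have compressed with the same settings.
    candidate = gzip.compress(new_payload, mtime=0)
    if hashlib.md5(candidate).hexdigest() == expected_md5:
        return candidate
    # Checksum mismatch: give up on the delta and download everything.
    return fetch_full()


# Toy stand-ins for foo-1.3 and foo-1.4.
old_pkg = gzip.compress(b"foo 1.3 contents", mtime=0)
published = gzip.compress(b"foo 1.4 contents", mtime=0)
good_md5 = hashlib.md5(published).hexdigest()

result = rebuild_package(
    old_pkg,
    lambda payload: payload.replace(b"1.3", b"1.4"),
    good_md5,
    lambda: b"FULL",
)
```

      The checksum comparison is what makes the scheme safe: a bad patch or a non-reproducible recompression just costs you the full download.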

    8. Re:SUSE by Anonymous Coward · · Score: 0

      But then, where does that leave the original package, let's say xyz_1.12.rpm? Who takes care of the version numbers on the original packages that are overwritten by the patch rpm?

      And more importantly, if I uninstall xyz, rpm might complain (I don't know for sure) that the overwritten files are corrupt; or, if you succeed in uninstalling it, then the files of the patch RPM are broken...

    9. Re:SUSE by jrexilius · · Score: 1

      Actually this would be a good debate.

      One question, being someone who compiles by hand any package exposed to the network, is what good does a binary diff do me there? I recognize that most people trust and rely on their distro's package manager, and I do as well for non-critical packages (no user has shell access to my machines so I can live with distro managed security updates there).

      If a package manager allowed downloading just the affected source or binary files and not the whole package, that may be a more flexible way of reducing bandwidth without limiting the user's choice.

    10. Re:SUSE by Anonymous Coward · · Score: 0

      I wouldn't know, because I don't use rpm-based systems any more, and therefore I'm not going to do the research to find out.

      You're an idiot. Do everyone a favour and keep your tedious and ignorant opinions to yourself until you can be bothered to check out, even in a simple way, what the fuck it is you are blathering about.

    11. Re:SUSE by juhaz · · Score: 1

      Not necessarily, not if you approach the problem a bit differently. On an RPM-based system, the goal is to get a new RPM downloaded, right? Well, you can apply binary diffs to the RPMs themselves instead of to the files they contain. Then you can keep, say, foo-1.3-i386.rpm and use binary diffs to deliver foo-1.4-i386.rpm to your system.

      The problem with that is that you need to keep the RPMs themselves in addition to the files they contain, which can add up to quite a lot of wasted disk space on a modern distro.

      No big deal for someone running an update server, but it's not an optimal solution for a desktop user or anything like that.

      Now, I don't know whether the RPM format includes compression as a built-in thing, but even if it does

      It does, RPM packages are gzip compressed cpio archives.

    12. Re:SUSE by juhaz · · Score: 1

      Any excuse is a good excuse for getting rid of RPM.


      Care to point to a better replacement, or even a reason? Source does not count, so save your gentoo zealotry. Technically DPKG is on the same level as RPM, so it does not count either.

      First of all, RPM puts you in dependency hell.

      Yeah, it does, it's supposed to, that's the goddamn primary JOB of a package manager. "Dependency hell" is tedious if you need to resolve them by hand one by one, but apps such as yum, up2date and apt-rpm have automated that for years now.

      Second of all, there's no reason whatsoever that we can't just do away with RPM tomorrow.

      Third and most important, there's no reason (such as a working and better package manager existing) whatsoever to DO away with RPM, unless "we can" is a valid reason, which it isn't.

  2. Warez by shird · · Score: 0

    Patchers and crackxors have been using binary diffs for some time, to get rid of copyright protection, which is often just NOPing out a couple bytes. Linux is behind the times...

    --
    I.O.U One Sig.
    1. Re:Warez by Anonymous Coward · · Score: 1, Informative

      Well, that's a particular kind of binary diff. It gets harder if you want the process automated, you know, like it has to be if you want to do anything productive with it (as opposed to blotting out a contiguous chunk of someone else's code).

    2. Re:Warez by dtfinch · · Score: 2, Informative

      If I understand correctly, a binary diff goes a few steps further than a patch. It stores insertions and deletions while a typical patch (like IPS) only stores replacements, which is optimal for files patched in a hex editor and even most database files but not for recompiled files where everything can change in location.
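
      To make the insertion/deletion point concrete, here is a minimal sketch in Python of a copy/insert style delta. It leans on difflib.SequenceMatcher from the standard library as a stand-in matcher; real binary diff tools use much better matching, and the names make_delta and apply_delta are made up for this illustration.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies out of `old` plus literal inserted bytes."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already on the user's disk: send only the range.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Bytes that must actually be transmitted.
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no entry: those old bytes are simply never copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Two bytes inserted near the front shift everything after them.
old = b"HEADER" + b"A" * 20 + b"TAIL"
new = b"HEADER" + b"xx" + b"A" * 20 + b"TAIL"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

      A replacement-only patch would have to resend everything after the shifted point, while the copy/insert form only carries the bytes that are genuinely new.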

    3. Re:Warez by Anonymous Coward · · Score: 0
      Patchers and crackxors have been using binary diffs for some time, to get rid of copyright protection, which is often just NOPing out a couple bytes.

      Heh, you need patches to do that?
      printf '\x90\x90\x90' | dd of=file_to_patch bs=1 seek=$((0xoffset)) conv=notrunc
      all linux users need to know is the offset and count, and we can use dd. go back to your fisher price speak'n'spell, come and talk to me when you want a real os.
  3. Doesn't make as much sense to use for Linux by drinkypoo · · Score: 5, Informative

    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.

    Personally I'd prefer to see binary distributions move to a model of using something like cvs, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. This solves both your differential update problem (if the right system is used - I'm thinking that's pretty much not CVS but I don't know if there's a way to make it do all of that - CVS doesn't handle binaries amazingly intelligently from what I understand) and your updates in general. Plus, you can use it both for source and binary updates.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:Doesn't make as much sense to use for Linux by Anonymous Coward · · Score: 0

      and this is why everyone loves Gentoo! I think...

    2. Re:Doesn't make as much sense to use for Linux by bluesguy_1 · · Score: 3, Interesting

      I disagree. I've used smartversion on Windows for a couple years now for making versioned archives of important files, and I wish Linux had something comparable. It's like having a portable single tar.gz of an entire cvs repository without all the headaches...

    3. Re:Doesn't make as much sense to use for Linux by GweeDo · · Score: 3, Informative

      What you are requesting can already be done basically in Gentoo (emerge -Uupv world), Debian (apt-get something or other) and Redhat/Fedora (up2date something or other). So why do we need something else again :) Oh, with Gentoo...add ccache to that for faster compiles too

    4. Re:Doesn't make as much sense to use for Linux by drinkypoo · · Score: 3, Informative

      I use gentoo. I never have found any ccache settings that make much of a difference. None of these systems do a binary differential update, they download a whole package and install it, or in the case of gentoo, download a source package and compile it. Neither of these approaches is what is being called for in this article, nor what I suggest above.

      Mind you, I'm fine with the way gentoo does things, but I have a fairly powerful system - not incredibly fast by modern standards but faster than anything I've run linux on before, or probably any Unix at all for that matter. For a dialup user on an older computer, atomically differential updates would make a really big difference.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:Doesn't make as much sense to use for Linux by morcego · · Score: 5, Informative

      I'm not sure about Gentoo, but I'm positive that is not what happens for Debian, RedHat, Fedora, Mandrake, SuSE, Conectiva etc.

      On those systems, when you do an upgrade (apt-get upgrade), you will get a fresh package, including not only the files that changed, but all the files for that package. And if we have a package with 1 binary and 50 images, and only the binary changed, we get to download all the images again.

      Some distributions have been implementing package fragmentation for this (package-core and package-images for this example), and that is a good thing for these cases, although it is a nightmare to manage. Not as fine grained as proposed by the grandparent post, but good enough for most cases.

      --
      morcego
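
      The "1 binary and 50 images" case is easy to sketch in Python: compare a digest manifest of what is installed against what is published, and fetch only the files that differ. This is a file-level illustration (roughly the whole-changed-files approach, not a true binary diff), and the file names and the manifest/files_to_fetch helpers are hypothetical.

```python
import hashlib


def manifest(files: dict) -> dict:
    """Map each file name to the SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def files_to_fetch(installed: dict, published: dict) -> list:
    """Names in `published` that are new or whose digest differs."""
    return sorted(
        name for name, digest in published.items()
        if installed.get(name) != digest
    )


# One binary plus 50 images; only the binary changes in the update.
old_files = {"bin/game": b"binary v1"}
old_files.update({f"img/{i}.png": b"pixels %d" % i for i in range(50)})
new_files = dict(old_files)
new_files["bin/game"] = b"binary v2"

changed = files_to_fetch(manifest(old_files), manifest(new_files))
```

      Here `changed` holds just the one binary, so the 50 untouched images never cross the wire.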
    6. Re:Doesn't make as much sense to use for Linux by ArbitraryConstant · · Score: 1

      Yeah... except it's better not to use Gentoo at all.

      I shouldn't say that. It's better not to use Gentoo if you like to have a system that works after your updates more than, oh, 9 times in 10.

      I used Gentoo for almost a year total (most recently in August of this year). Compile-time options change without notice, major and easy to spot bugs slip through on a regular basis, and when something like KDE gets updated, it can be a week before things work again. That is, when you can get YOUR config files updated when you're not even sure it's something you did wrong.

      I figure Suse Pro paid for itself in about a weekend in the time I saved. Would you believe that not a single update has broken anything yet! What a concept. /yes I'm bitter, but it's all true

      --
      I rarely criticize things I don't care about.
    7. Re:Doesn't make as much sense to use for Linux by timeOday · · Score: 2, Interesting
      Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file. The number of different binary patches would be exponential in the number of compile switches, compiler versions, USE flags, and so on - for both "old" and "new" file versions, so square it again!

      I guess if you really wanted to be clever, you could send the server enough information (your existing package versions, make and USE flags etc), plus the desired flags for your new file. With this the server could compile a binary matching yours, then compile the new binary for you, then make a binary diff. But that, as Kramer would say, is "kooky talk."
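
      A back-of-the-envelope version of that blow-up in Python, assuming (purely for illustration) that every flag is an independent on/off switch and every compiler version yields a different binary:

```python
def patch_count(flags: int, compilers: int = 1) -> int:
    """Number of old-to-new binary patches a server would need to carry.

    Assumes each flag is an independent on/off switch, so there are
    2**flags builds per compiler, and that any old build may need to be
    patched to any new build.
    """
    variants = (2 ** flags) * compilers
    return variants * variants


print(patch_count(10))      # 10 flags, one compiler: 1048576
print(patch_count(10, 3))   # 10 flags, three compilers: 9437184
```

      Ten flags is already over a million patches per package update, which is why precomputing them is a non-starter.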

    8. Re:Doesn't make as much sense to use for Linux by advocate_one · · Score: 2, Informative
      Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file.

      No, those "binary" diffs for Gentoo would be done against the sources used for the previous version of the gentoo "package", which would then be used to download the diff so that the gentoo computer could then construct new sources to build against. It would require gentoo computers to keep the sources rather than discard them to save space.

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    9. Re:Doesn't make as much sense to use for Linux by EsbenMoseHansen · · Score: 1

      What features do you need, exactly? Have you tried googling?

      --
      Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
    10. Re:Doesn't make as much sense to use for Linux by Kristoffer+Lunden · · Score: 1

      Gentoo downloads the complete packages and builds them. Yes, the packages are separate and dependencies are tracked, but that is not at all the same thing as above. Gentoo does to some extent use a clever system of small patches that is applied to older source and then builds, but generally, you will see complete downloads.

      Binary patching is not really a good match for Gentoo at all, due to the nature of the distribution.

      In general, Gentoo, at least for the home user (like me), is a distribution that more or less needs broadband of some sort. Even the sync ("what's new") can take minutes via rsync on broadband. Not to mention the actual packages... Not that it isn't possible on dial-up, but then I'd go with something else that isn't as bandwidth intensive. Mind you, I'm fine with bandwidth intensive, because I have the cable for it.

      Now Debian, which I don't use, probably has an excellent match here, because of the prebuilding of their packages.

    11. Re:Doesn't make as much sense to use for Linux by lintux · · Score: 1

      Some distributions have been implementing package fragmentation for this (package-core and package-images for this example)

      Yep, but still, in almost all cases, you have to upgrade both packages when a new version of the game comes out. For Debian, this splitting helps in another way: Instead of having the images in the archive for every supported architecture (more than ten, these days), they're now in the archive only once for all architectures.

      Don't know about other distros, but usually such an images package is generated from the same package as the binaries, and both packages will get a higher version number on an upgrade. So the update tool will probably try to update them both.

    12. Re:Doesn't make as much sense to use for Linux by Anonymous Coward · · Score: 1, Interesting

      Especially with suse it is baaaad!

      Over the last weeks I got a couple kernel patches (each 50 Megs download! They always download the full package! Dialup users will never be able to do this!) as well as a base KDE package fix - which also resulted in about 20 megs download. I think bdiff would _really_ benefit suse.

    13. Re:Doesn't make as much sense to use for Linux by sp0rk173 · · Score: 1

      Having used Gentoo for three years now, and using it right now, I can say my experience does not match yours. Not to be rude, but PEBKAC?

    14. Re:Doesn't make as much sense to use for Linux by JazzXP · · Score: 1

      I'll agree with that. I've been using Gentoo for about a month, and it's worked perfectly. I like it so much in fact, that I'm currently installing it on my laptop (literally downloading the kernel source now).

    15. Re:Doesn't make as much sense to use for Linux by MarkByers · · Score: 1

      Never use the '-U' (--upgradeonly) switch for emerge. This can break things. Use: emerge -auDv world && revdep-rebuild

      --
      I'll probably be modded down for this...
    16. Re:Doesn't make as much sense to use for Linux by Anonymous Coward · · Score: 0

      > On those systems, when you do an upgrade (apt-get update), you will get a fresh package, including not only the files that changes, but all the files for that package.

      With Debian the most impressive one was a typo in the install script of mozilla some months ago. If you updated daily you got 15mb twice for just two bytes you almost certainly were able to fix yourself before.

      cb

    17. Re:Doesn't make as much sense to use for Linux by cortana · · Score: 1
      On those systems, when you do an upgrade (apt-get upgrade), you will get a fresh package, including not only the files that changed, but all the files for that package. And if we have a package with 1 binary and 50 images, and only the binary changed, we get to download all the images again.

      Then that package should be split into two packages. The foo-bin package would contain the small, arch-dependent binary, and foo-data would contain the arch-independent image data. I think there is a note in Policy about when to do this, though I couldn't find it just now.

      Of course, this is not always done. The Mozilla package is sadly rather monolithic, and openoffice.org-bin actually does consist of 130 MB or so of code!

    18. Re:Doesn't make as much sense to use for Linux by Anonymous Coward · · Score: 0

      dear god, are you kidding? unix has had RCS since at least the early 80's.

      This functionality is so simple to replicate with rcs, I didn't even have to look it up. I'll bet you can do it with cvs, subversion, or anything else.

    19. Re:Doesn't make as much sense to use for Linux by ArbitraryConstant · · Score: 1

      I wouldn't be so quick to assume that.

      -Xinerama switched from being compiled in by default to being a compile time option that had to be specified in make.conf. I don't see a reasonable way I could have known that, at least not without spending hours and hours every week tracking changes.

      -There was a week or so where KDE would not compile because it required a tool that was masked. The only possible way for this bug to slip through was if no one had tried to compile it on a stable system before releasing it, not even once. I don't see a reasonable way I could have avoided this, as I actually did check the forums pretty extensively before I sync'd.

      -There has been a persistent problem where the second head on my computer is not initialized properly, causing the image to shake. This happened with XFree86 and X.org after the transition. I have no idea what causes it, but Suse, Fedora, Slackware, FreeBSD, and OpenBSD don't have the problem. They all used the same version of XFree86 that had the problem under Gentoo. I have yet to figure this one out, and probably never will.

      -The 2004.0 and 2004.1 LiveCDs just plain wouldn't boot on my computer, or would crash soon after boot. I don't have weird hardware. I don't see any way I could be responsible for this.

      In short, Gentoo is very powerful but I neither have the time nor the inclination to spend the kind of time required to keep it working. You will note, however, that some of the other OSes I tried (Slackware, FreeBSD, OpenBSD) do very little in the way of handholding. If I were simply an inept user, it's doubtful I could get those working properly.

      --
      I rarely criticize things I don't care about.
    20. Re:Doesn't make as much sense to use for Linux by Bert64 · · Score: 1

      Well, gentoo downloads the complete new source and recompiles it. That works well but doesn't solve the problem of time; in fact it makes it worse. Gentoo also doesn't download patches for packages when you already have the older source, it downloads the whole new version.
      What would be better for binary distributions is an rsync style system for updating the system binaries. rsync can do delta compression and won't touch any files that haven't been modified.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
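
      For the curious, the trick that makes an rsync style delta cheap is a weak checksum that can be "rolled" one byte at a time across a file. Below is a sketch of that idea in Python. It follows the commonly described two-part sum (a plain sum and a position-weighted sum, both modulo 2^16); it is an illustration, not rsync's actual code, and the function names are made up.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes):
    """Compute the two-part weak checksum of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte carries weight n, the last byte weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int):
    """Slide an n-byte window forward by one byte in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = b"the quick brown fox jumps over the lazy dog" * 3
n = 16
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    # The rolled value must match a from-scratch computation.
    assert (a, b) == weak_checksum(data[k:k + n])
```

      Because each slide is constant time, one side can look for blocks the other side already has at every byte offset, then confirm candidates with a stronger hash.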
    21. Re:Doesn't make as much sense to use for Linux by isaidi · · Score: 1

      Have you tried Portage in Gentoo?
      http://www.gentoo.org/

      thats the closest i have found to CVS like update.

      i love the emerge tool

      config files are merged, might have to do some manual diff editing, and sources are updated and then u start recompiling.. :)

      $>emerge rsync
      $>emerge -U world ;)

    22. Re:Doesn't make as much sense to use for Linux by drinkypoo · · Score: 1

      Well, I was just saying I use gentoo and I'm fine with the way it does things now, I don't think that this is a good idea in any case. If you were to do anything you'd be autogenerating diffs from one source package to another. A compressed archive will differ too much from one version to another to be worth diffing, or so I suspect.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    23. Re:Doesn't make as much sense to use for Linux by drinkypoo · · Score: 1

      First of all, BSD's ports system is a CVS update. Second, a Score: 4 post under this topic that I wrote says that I use gentoo. Third, emerge sync works just fine. Fourth, if you really want to update everything, you should do -Ud. I also specify -a (--ask) so that I get a confirmation request before I build.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    24. Re:Doesn't make as much sense to use for Linux by Rich0 · · Score: 1

      For gentoo it sometimes works this way and sometimes doesn't.

      When a patch is released, a new ebuild is created (an ebuild is basically a fetch-and-compile script). Often what will happen is that the ebuild will use the unpatched source, and apply the patch to it. In that case, if you've already downloaded the unpatched source then all your computer will download is the source patch and recompile.

      However, this system isn't intelligent enough to look at what you have and figure out the minimum you need to download to get to where you want to go. So, if you have linux-2.6.8-r2 and linux-2.6.8-r3 comes out, you'll probably just download a patch file. On the other hand, if you started with 2.6.8-r1, you'll probably download both the r2 and r3 patches and apply them both. And when 2.6.9 comes out instead of downloading the 2.6.8->2.6.9 patch you'll probably download all of 2.6.9.

      It works reasonably well, but not perfectly...
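
      Figuring out "the minimum you need to download" is a shortest-path problem once you treat versions as nodes and available downloads as weighted edges. A rough Python sketch follows; the sizes are invented for illustration and this is not how Portage works.

```python
import heapq


def cheapest_route(downloads: dict, have: str, want: str):
    """Find the smallest total download that turns `have` into `want`.

    `downloads` maps (from_version, to_version) to a size; a full tarball
    is modelled as a direct edge from the current version to the target.
    Returns (total_size, [versions along the way]) or None.
    """
    graph = {}
    for (src, dst), size in downloads.items():
        graph.setdefault(src, []).append((dst, size))
    best = {have: 0}
    queue = [(0, have, [have])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == want:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, size in graph.get(node, []):
            new_cost = cost + size
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


# Hypothetical sizes in KB.
downloads = {
    ("2.6.8-r1", "2.6.8-r2"): 40,
    ("2.6.8-r2", "2.6.8-r3"): 25,
    ("2.6.8-r3", "2.6.9"): 3000,
    ("2.6.8-r1", "2.6.9"): 35000,  # full source tarball
}
route = cheapest_route(downloads, "2.6.8-r1", "2.6.9")
```

      With these made-up numbers the chain of three patches (3065 KB) beats the full tarball by a wide margin.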

    25. Re:Doesn't make as much sense to use for Linux by advocate_one · · Score: 1
      if you're on dialup, Gentoo sucks... the only way to install it is to use a reference cd set, and then you're limited in how up to date you want the various components to be... same with a binary distro, they suck if you're on dialup... but at least with the old redhat, I could get hold of cds burnt by firms such as cheeplinux which had the contents of the update directories as of the night before ordering them.

      I downloaded some 150 mb of updates for Suse 9.1 last night and I'd have been most dischuffed on dialup, which is why I would look forward to a binary diff method of doing updates so that those on dialup would be able to stay up to date easily, and Microsofties couldn't point out how XP has a binary diff update system...

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    26. Re:Doesn't make as much sense to use for Linux by jonadab · · Score: 1

      VMS has versioning built into the filesystem. This is high on my list of
      useful non-mainstream features that I'd like to see in more OSes.

      --
      Cut that out, or I will ship you to Norilsk in a box.
    27. Re:Doesn't make as much sense to use for Linux by sp0rk173 · · Score: 1

      Well, i wasn't assuming, just asking. Gentoo for the past couple of years has been slowly solidifying (it's a very young distro - younger than most of the other major ones). It did have its problems in the past, but as far as things go now, it's pretty damn reliable. I'm not sure why the live CDs wouldn't boot, but thousands of other people have used them without error, the repeatability of your problem is clearly not universal, thus there must be something on your computer that isn't agreeing with it. Did you try any boot options? If it didn't boot, i can only think you didn't burn the disk right. I've booted both the 2004.0 and 2004.1 CDs without a problem, as have many many others. These all sound like transition bugs that get hammered out as time goes by. I guess i can't really say much to each particular issue other than, "it works for me." That's what it comes down to - you use what works for you. But, in the spirit of openmindedness, i would ask that you give it another try in about a year and see how it feels then.

    28. Re:Doesn't make as much sense to use for Linux by ArbitraryConstant · · Score: 1

      Keeping it running for a month isn't the problem. Keeping it running in a major transition (eg KDE 3.1 -> 3.2 or XFree86 -> X.org) is the problem.

      That, and the random bugs that get thrown in from time to time that you may not even notice.

      --
      I rarely criticize things I don't care about.
    29. Re:Doesn't make as much sense to use for Linux by ArbitraryConstant · · Score: 1

      "It did have it's problems in the past, but as far as things go now, it's pretty damn reliable."

      Unless there's a portage branch that I don't know about that undergoes proper regression testing, or the breakage is focused in areas I used and you didn't, you and I differ greatly on our definition of "reliable".

      "I'm not sure why the live CD's wouldn't boot, but thousands of other people have used them without error, the repeatability of your problem is clearly not universal, thus there must be something on your computer that isn't agreeing with it. Did you try any boot options? If it didn't boot, i can only think you didn't burn the disk right."

      Given that I've spent the last few years experimenting with OSes installed from disks I burnt, and all of them have been able to boot that machine, I think this is unlikely. I don't recall what troubleshooting I did when attempting to boot the machine, but I do remember following the installation docs extensively (as they were/are quite good), and I probably checked the forums (I did that a lot, I don't remember one instance from the other). If that didn't turn it up... well, then it was too hard to find.

      "These all sound like transition bugs that get hammered out as time goes by."

      This, though, is the problem. "Transition bugs" are introduced every time anything major gets updated, which happens quite a bit with a distro like Gentoo. It goes live before it's undergone sufficient testing. This means you have to keep a very close watch on the forums and what emerge wants to do, and it also means you have reduced freedom to update when there's a security patch. I don't see how this can be done in the amount of time I'm willing to devote to it.

      Gentoo does stuff that others can't. But it doesn't do what I need.

      --
      I rarely criticize things I don't care about.
    30. Re:Doesn't make as much sense to use for Linux by garaged · · Score: 1

      Come on!! Have you tried to update on dial up with rpm or in windows?? It's quite the same problem. Dial up is OLD, not to be used for most professional systems now. A better solution is to get a better connection!! please!!

      --
      I'm positive, don't belive me look at my karma
  4. How about this... by ufoman · · Score: 1, Interesting

    You go over to a friend's house that has broadband and a CD-Writer (both are very popular these days) and download the patches onto a CD-R and take it home?

    --
    The following statement is false.
    The previous statement is true.
    Welcome to my world.
    1. Re:How about this... by Anonymous Coward · · Score: 4, Funny

      But if you're posting to Slashdot on a Friday night, you probably don't have a friend's house to go over to.

    2. Re:How about this... by aimsmith · · Score: 1

      Here in South Africa the friend's broadband has hit the pathetic 3 gigabyte monthly download limit so she can't help.

    3. Re:How about this... by Anonymous Coward · · Score: 0

      aren't you doing the same thing?

    4. Re:How about this... by jonadab · · Score: 1

      > You go over to a friends house that has broadband and a CD-Writer (both very
      > popular these days) and download the patches onto a CD-R and take it home?

      Huh? This makes no sense. I read /., which means I'm probably a computer
      geek. Among other things, this means that if good broadband were *available*
      in my area, I'd have it. It's not, so I don't. Also, it means that most of my
      friends live hundreds of miles away, and the only regular contact I have with
      them is via the internet. "go over to a friend's house" is a nonsensical
      statement; by the time I could plan enough time off work and arrange for the
      trip, I could just use wget to download half a dozen ISO images over my dialup
      connection.

      --
      Cut that out, or I will ship you to Norilsk in a box.
    5. Re:How about this... by lachlan76 · · Score: 1

      That doesn't compete with the unsecured wireless LAN method. Or secured and you have a password, which works for me because my friend doesn't have a CD burner. And that other one is just lazy. And that other guy is just a shithead who says he'll download me Debian, and just keeps my CDs.

  5. Right after... by Fermier+de+Pomme+de · · Score: 4, Funny

    ... their biggest customers start using dialup.

  6. SP2? by keiferb · · Score: 5, Funny

    You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!

    1. Re:SP2? by cperciva · · Score: 4, Informative

      Sorry, the writeup was a bit unclear. Windows XP SP2 contains a new version of Windows Installer (or whatever they're calling it today). This new version includes support for downloading updates via binary diffs, and most updates to XP after this point should be done that way.

    2. Re:SP2? by dracvl · · Score: 4, Funny
      You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!

      If you look at the URL...

      http://www.microsoft.com/windows2000/techinfo/planning/redir-binarydelta.asp

      ...you will clearly see that what you downloaded was Windows 2000, with a binary patch that turned it into Windows XP SP2.

    3. Re:SP2? by York+the+Mysterious · · Score: 1

      Before even SP2 you get the BITS and HTTP 5.1 update the first time you login to Windows Update V5. The problem is it made the downloads faster, but the installs painfully slow.

      --

      Tim Smith - Ramblings from Nerd Land
    4. Re:SP2? by pommiekiwifruit · · Score: 1

      Ouch - a 75Mbyte patch. And now I am downloading the 10 Mbyte patch to that patch already! (.net) And have to download huge patches to various software (e.g. Nero, 70Mbyte or so) to make them work with XP SP2... argh.

    5. Re:SP2? by cfuse · · Score: 1
      ...you will clearly see that what you downloaded was Windows 2000, with a binary patch that turned it into Windows XP SP2.

      It would have been 5Mb without all that fucking chrome ...

  7. as soon as it gets hacked in to RPM by sPaKr · · Score: 2, Insightful

    As soon as binary diffs get hacked into RPM, then it might happen. Binary diffs of one RPM against a later version won't really work, as binary diffs are only small when they are produced on uncompressed, unencrypted data. The real issue is that Linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually just require one or very few packages to be updated. Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade. I bet the average RPM is about the same size as the minimum binary diff from MS.

    1. Re:as soon as it gets hacked in to RPM by cperciva · · Score: 2, Insightful

      Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade

      Binary diffs make sense any time you've got large files being updated. On my system, libssl (library archive + shared object file + profiled library) is 600kB; that's large enough to justify using a 10kB binary diff instead.

      I bet the average RPM is about the same size as the minimum binary diff from MS.

      I can't say anything about Microsoft's patches directly, but the patches used by FreeBSD Update are on average 65 times smaller than the individual files being updated. As little faith as I have in Microsoft, I still doubt that they could produce patches which were sub-optimal by more than a factor of 50.

    2. Re:as soon as it gets hacked in to RPM by cbreaker · · Score: 1

      65 times smaller? So a patch that's normally 100k is now 1.5k?

      Maybe sometimes, but I don't see that happening on average.

      --
      - It's not the Macs I hate. It's Digg users. -
    3. Re:as soon as it gets hacked in to RPM by cperciva · · Score: 1

      65 times smaller? So a patch that's normally 100k is now 1.5k?

      Maybe sometimes, but I don't see that happening on average.


      Look at the statistics yourself. The average patch compression ratio (ie, [size of new file] / [size of patch file]) for FreeBSD Update is 66.404 right now. (Ignore the "Speedup due to patching" line -- that includes files which were downloaded before delta compression support was added.)
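      For anyone who wants to check the arithmetic behind that ratio, here is a minimal sketch. The 66.404 figure is the average quoted above; the 100 kB input is just the hypothetical from the parent comment, and an average says nothing about how any single file will behave.

```python
# Patch compression ratio as defined above: size of new file / size of patch.
AVERAGE_RATIO = 66.404  # average quoted in the comment above


def expected_patch_size(new_file_bytes: float, ratio: float = AVERAGE_RATIO) -> float:
    """Estimate the patch size implied by a given compression ratio."""
    return new_file_bytes / ratio


# Hypothetical 100 kB file from the parent comment, not a measurement.
estimate = expected_patch_size(100_000)
print(f"{estimate:.0f} bytes")  # 1506 bytes
```

      So the parent's "100k becomes 1.5k" is what the quoted average works out to.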

      In fact, my current development code produces patches around 30% smaller than that, but I haven't released it yet.

    4. Re:as soon as it gets hacked in to RPM by Waffle+Iron · · Score: 2, Informative
      The real issue is that Linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually just require one or very few packages to be updated.

      I beg to differ. SuSE 9.1 came out only 5 months ago:

      $ du -h /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/
      417M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/i586
      14M  /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/noarch
      431M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/

      That's almost a whole CD worth of patches in half of a year. All of this is to correct for mistakes in probably no more than a few hundred total lines of code.
    5. Re:as soon as it gets hacked in to RPM by sPaKr · · Score: 2, Informative
      OK, check out http://www.suse.com/us/private/download/updates/91_i386.html; it turns out SuSE already ships patch RPMs. As for your "CD's worth of updates", I say PFFFT. Digging into those directories, we can see the updates include the new RPM, the patch RPM, and instructions in both English and German. Example:
      -rw-r--r-- 1 507 1002  62129 Apr 16 09:00 ypserv-2.12.1-44.1.i586.patch.rpm
      -rw-r--r-- 1 507 1002 125191 Apr 16 08:39 ypserv-2.12.1-44.1.i586.rpm
      -rw-r--r-- 1 507 1002    499 Apr 20 07:40 ypserv-2.12.1-44.1.i586_de.info
      -rw-r--r-- 1 507 1002    462 Apr 20 07:40 ypserv-2.12.1-44.1.i586_en.info
      lrwxrwxrwx 1 507 1002     27 May 27 06:02 ypserv.rpm -> ypserv-2.12.1-44.1.i586.rpm
      -rw-r--r-- 1 507 1002  44754 Aug 26 19:30 zlib-1.2.1-70.6.i586.patch.rpm
      -rw-r--r-- 1 507 1002  63682 Aug 26 10:18 zlib-1.2.1-70.6.i586.rpm
      -rw-r--r-- 1 507 1002    593 Sep 02 12:30 zlib-1.2.1-70.6.i586_de.info
      -rw-r--r-- 1 507 1002    553 Sep 02 12:30 zlib-1.2.1-70.6.i586_en.info
      -rw-r--r-- 1 507 1002  44291 Aug 26 19:30 zlib-devel-1.2.1-70.6.i586.patch.rpm
      -rw-r--r-- 1 507 1002  66192 Aug 26 10:18 zlib-devel-1.2.1-70.6.i586.rpm
      -rw-r--r-- 1 507 1002    642 Sep 02 12:30 zlib-devel-1.2.1-70.6.i586_de.info
      -rw-r--r-- 1 507 1002    599 Sep 02 12:30 zlib-devel-1.2.1-70.6.i586_en.info
      lrwxrwxrwx 1 507 1002     30 Sep 02 13:03 zlib-devel.rpm -> zlib-devel-1.2.1-70.6.i586.rpm
      lrwxrwxrwx 1 507 1002     24 Sep 02 13:03 zlib.rpm -> zlib-1.2.1-70.6.i586.rpm


      It looks like the total updates would be about half of that CD. And that would be full replacement RPMs. The patches seem to be only slightly smaller. And these are replacement RPMs for ALL possible installed packages. SuSE, like most distros, ships everything in RPMs, but no one installs all of them. I mean, how many people install every database, every development toolchain (that GNU Ada compiler rocking your world?), every desktop GUI? No one. Just like most people, you have taken a small data point and whipped it into a full misunderstanding. Statistics lie, and liars use statistics. My stats prof told me that; still true.
    6. Re:as soon as it gets hacked in to RPM by Anonymous Coward · · Score: 0

      Well, it seems like every other time I do an apt-get upgrade, I see I'm going to get a newer release of kdebase, kdelib, qt, qt-devel... and about once a month openoffice. That adds up to a lot of bytes to xfer pretty quick. AFAIK from what I've read, Linux distros like Red Hat have required quite a bit more data to be downloaded than MS in order to keep the system up to date. If MS is making binary diffs available now, then that margin will grow.

      I'm not too concerned; I have a decent connection to the net. Although I think my favorite method is to compile my system. In that respect, if you are downloading all the source, then it is easy to apply diffs that patch your local source tree and recompile.

    7. Re:as soon as it gets hacked in to RPM by Waffle+Iron · · Score: 1
      I don't have "everything installed". The bulk of the patch files on my system are 5 sets of kernel and kernel source RPMs at 50MB per set. I only have patches that got automatically downloaded for what is installed on my typical development workstation install, and that's what I showed.

      Obviously, SuSE didn't bother with deltas on those patches even though they supposedly have this technology; they sent the whole thing. The ".patch" versions for the kernel and many others aren't listed in SuSE's site either.

      Five kernel updates is redundant, but you have to download each one when it comes out, so the bandwidth still got used 5 times. On dialup, it would have been extremely annoying.

      Your assertion was that Linux always uses trivially small packages; you claimed that delta patches were therefore unnecessary anyway. I proved that assertion false with the example of how many large package updates have been downloaded to my real-world system in 5 months. My "statistics" are valid.

    8. Re:as soon as it gets hacked in to RPM by cowbutt · · Score: 1
      Actually, Red Hat were using binary diffs a long time ago - see rhmask. Of course, when they switched from shipping some proprietary software (CDE, Red Baron, Metrolink's(?) X11) to only shipping 100% FOSS, rhmask fell into disuse.

      It probably wouldn't take much to take rhmask and update it to use xdelta or something, though. Note what the xdelta manpage says about using it on compressed data, though:

      Gzip processing
      Attempting to compute a delta between compressed input files usually
      results in poor compression. This is because small differences between
      the original contents causes changes in the compression of whole blocks
      of data. To simplify things, Xdelta implements a special case for
      gzip(1) compressed files. If any version input to the delta command is
      recognized as having gzip compression, it will be automatically
      decompressed into a temporary location prior to comparison.

      [...]

      There is one potential problem when automatically processing gzip
      compressed files, which is that the recompressed content does not always
      match byte-for-byte with the original compressed content. The
      uncompressed content still matches, but if there is an external integrity
      check such as cryptographic signature verification, it may fail.

      That would clash with rpm's MD5 and GPG signature checking.
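      The first point in that manpage excerpt is easy to see for yourself. Below is a minimal sketch using Python's standard zlib module (the same DEFLATE family as gzip). The sample data and the position of the flipped byte are arbitrary choices of mine, not anything taken from xdelta or rpm.

```python
import zlib


def differing_positions(a: bytes, b: bytes) -> int:
    """Count byte positions where a and b differ, plus any length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


# Synthetic, compressible "file": a few thousand text lines.
old = b"".join(b"line %d: value %d\n" % (i, i * i) for i in range(5000))

# New version: exactly one byte changed near the start.
changed = bytearray(old)
changed[10] ^= 0x01
new = bytes(changed)

raw_diff = differing_positions(old, new)
packed_diff = differing_positions(zlib.compress(old), zlib.compress(new))

print("uncompressed bytes that differ:", raw_diff)
print("compressed bytes that differ:  ", packed_diff)
```

      A one-byte edit stays a one-byte difference in the raw data but spreads across the compressed stream, which is why a delta tool wants to see the uncompressed payload.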

      --
    9. Re:as soon as it gets hacked in to RPM by LibrePensador · · Score: 1

      I disagree. Binary diffs are needed now.

      I have followed the release cycle of Mandrake 10 very closely and the number of updates is huge, which is fine; it means that bugs are being addressed. However, the updates can come 100MB at a time, simply because there is no way of doing real patches, so you redownload the whole of OpenOffice or kdelibs for a small change.

      I love Mandrake to death, but this is something that needs to be addressed as soon as possible. This issue has been enough of a showstopper that I have avoided putting Mandrake on my brother's computer until I can either get him on DSL or Mandrake makes its updates dial-up friendly.

      --
      Pragmatism as an ideology is not particularly pragmatic in the long term. Keep it in mind when you dismiss Free Software
    10. Re:as soon as it gets hacked in to RPM by Anonymous Coward · · Score: 0

      "Security updates usally just require a one or very few packegs to be updated."

      Tell that to the dialup user that needs to upgrade the suse kernel package => 50 megs download for a few byte changes :/

    11. Re:as soon as it gets hacked in to RPM by Anomalous+Cowturd · · Score: 1


      Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade.


      Sort of like KDE and Gnome, then?

      --

      Java: the bastard demon spawn of C++ and Ada

    12. Re:as soon as it gets hacked in to RPM by Anonymous Coward · · Score: 0


      http://www.suse.com/us/private/download/updates/91 _i386.html


      So the people who are willing to pay for it get the 65-times benefit? Ahhh - the cognitive dissonance!

      Yes, I'm kidding. Sorry.

    13. Re:as soon as it gets hacked in to RPM by sPaKr · · Score: 1

      KDE and GNOME are not single packages. They are combinations of smaller packages. Waving the KDE and GNOME flags doesn't make your point. OpenOffice.org might, but that's really just a statement against code-bloated office suites.

    14. Re:as soon as it gets hacked in to RPM by Anomalous+Cowturd · · Score: 1


      KDE and GNOME are not single packages. They are combinations of smaller packages. Waving the KDE and GNOME flags doesn't make your point. OpenOffice.org might, but that's really just a statement against code-bloated office suites.


      My mistake, I didn't read it clearly enough. Anyway, it's still the same principle - an update to one thing causing updates of lots of other things. Consider it a statement against code bloat in general.

      --

      Java: the bastard demon spawn of C++ and Ada

    15. Re:as soon as it gets hacked in to RPM by fitten · · Score: 1

      Consider it a statement against code bloat in general.

      The Linux kernel is quite large these days and I've gotten patch notifications for it twice on SuSE 9.1 Professional in the past month or so....

  8. Mindvision by shawnce · · Score: 4, Informative

    The folks at MindVision made an installer/installer creation tool that allowed one to scan two different sets of files and directories to find differences between them (binary differences), and it would just package up those differences in the installer archive. In fact, you could use it to diff and package deltas between several versions at once. When the user ran the installer (really an updater), it would apply the binary patch to the file set as needed.

    I was using this tool over 7 years ago now on Mac OS, so I don't see what is so new about this concept... but I am glad it looks like it is starting to be used more.

    1. Re:Mindvision by v1 · · Score: 1

      Does anyone remember "rescompare" for the older Macs? Same idea: you'd give it two apps and turn it loose, and it would isolate the differences for you to browse, and had the option to generate a self-contained executable patch file that would turn one of the images into the other. VERY handy tool for generating updates, and overall a very smart app to be able to find inserted and removed blocks of code cleanly without biting off more than needed to be chewed, so to speak.

      Version 2.6 was (c) 1989-1996, so it's been around quite a long time too.

      --
      I work for the Department of Redundancy Department.
    2. Re:Mindvision by Anonymous Coward · · Score: 0

      I don't see what is so new about this concept...

      It's being used in highly visible projects. It wasn't that nobody did this before because they didn't know how; nobody did this before because they didn't need to. Updates are getting larger and more frequent, so the extra complexity really pays off.

  9. Re:Is this an Issue? by AhabTheArab · · Score: 4, Insightful

    Now with broadband being so popular, and still on the rise, is this really an issue?

    Yes, it is. I just switched to broadband less than two months ago. A lot of my friends are still on dialup. Also, do not forget rural areas which do not have access to broadband. You would be surprised how many people still have dialup; I believe the number of broadband users just recently surpassed the number of dialup users. This means, obviously, that nearly half of all internet users are still on dialup.

  10. Huh?? by themoodykid · · Score: 1, Insightful

    In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?

    You make it sound like it's a sweeping trend and Linux is in the dark ages for not doing it. This is the first I've heard of this!

    Also, wouldn't normal source patches be compressed quite a bit more anyway b/c of the nature of redundancy in text? This is a benefit for binary-only systems as you say. Are there really a lot of users hurting because they just can't download all the new patched binaries?

    1. Re:Huh?? by damiam · · Score: 1

      95% of Linux systems (everything using apt, red carpet, yum, up2date, etc) receive binary updates. On Debian unstable, my apt-get dist-upgrade after a month vacation runs about 200MB. Even over my crappy DSL, that's a few hours. If that could be reduced (and of course it could, often it's downloading all 50MB of OpenOffice for what was maybe a 50KB source patch), that'd save me a lot of time.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
    2. Re:Huh?? by EvanED · · Score: 2, Insightful

      Um, I don't know about you, but I don't want to recompile for a security patch. I didn't even compile my system in the first place, it's just binaries.

    3. Re:Huh?? by Nutria · · Score: 1

      95% of Linux systems (everything using apt, red carpet, yum, up2date, etc) receive binary updates. On Debian unstable, my apt-get dist-upgrade after a month vacation runs about 200MB. Even over my crappy DSL, that's a few hours. If that could be reduced (and of course it could, often it's downloading all 50MB of OpenOffice for what was maybe a 50KB source patch), that'd save me a lot of time.

      The problem is that two or three "dash versions" of packages may have been released in that month. Thus, the number of patches would be number_of_packages * number_of_versions (a very large number) and managing those packages would be a nightmare.
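      To put a rough scale on that product, here is a small sketch. The package and version counts are made-up round numbers for illustration only, not Debian's actual figures, and the two counting schemes are my own reading of what "a patch per version" could mean.

```python
def deltas_to_latest(packages: int, old_versions: int) -> int:
    """One delta per (package, old version) pair, each targeting the newest build."""
    return packages * old_versions


def deltas_all_pairs(packages: int, versions: int) -> int:
    """A delta for every older-to-newer version pair of each package."""
    return packages * versions * (versions - 1) // 2


# Hypothetical figures, chosen only for illustration.
PACKAGES = 10_000
VERSIONS_KEPT = 4  # the current build plus three older ones

print(deltas_to_latest(PACKAGES, VERSIONS_KEPT - 1))  # 30000
print(deltas_all_pairs(PACKAGES, VERSIONS_KEPT))      # 60000
```

      Only supporting "any old version to newest" grows linearly with the number of versions kept; supporting every pair grows quadratically.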

      So, the Debian Developers have deemed that the cost of "release early, release often" applied to a distro's worth of packages is too high, regarding binary diffs.

      --
      "I don't know, therefore Aliens" Wafflebox1
    4. Re:Huh?? by LnxAddct · · Score: 1

      200MB? Wow, that times however many people update monthly adds up to a ton of bandwidth. Hmm... that leads me to wonder if anyone has tried implementing a BitTorrent-style update system for a distro; it seems like it'd speed up updates and save the project tons of cash on bandwidth.
      Regards,
      Steve

    5. Re:Huh?? by damiam · · Score: 1

      It's not like they have to manage any of that by hand. It'd be pretty easy to automate the whole thing; the maintainer would just upload the complete new package and the system could take care of the diffing. It'd take a bit more server space, but I think the payoffs would be quite worth it.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
  11. Here the problem: by Sonic+McTails · · Score: 2, Interesting

    Linux makes it very easy to install new packages and upgrade packages from sources farther away from the vendor. If a vendor tried to release a patch using delta versioning, it could totally wreck a system. Since neither RPM nor DPKG is designed to check md5sum hashes against each file and make sure the patch can be installed safely, it will have to wait until this feature is incorporated into either system.

    --
    This signature was left intentionally blank.
    1. Re:Here the problem: by p00py · · Score: 1

      Err, I'm using RPM 4.3.1-0.3 and MD5Sums exist for all RPMs. It would be easy to verify them against valid versions of the binary patches, and abort patching any packages with files whose md5sum is different.
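      The check described here could look roughly like the sketch below. It is not how RPM itself does it; the file name is a stand-in, and `safe_to_patch` is a hypothetical helper name. The idea is only that a delta is refused unless the installed file still hashes to the value recorded for the version the delta was built against.

```python
import hashlib
import os
import tempfile


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_patch(path: str, expected_md5: str) -> bool:
    """A delta only applies cleanly to the exact bytes it was built against."""
    return os.path.isfile(path) and md5_of(path) == expected_md5


# Demo with a throwaway file standing in for an installed binary.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "libexample.so")  # hypothetical file name
    with open(target, "wb") as fh:
        fh.write(b"vendor build 1.0")
    recorded = md5_of(target)  # what the package database would hold

    unmodified_ok = safe_to_patch(target, recorded)

    with open(target, "ab") as fh:  # simulate a locally modified file
        fh.write(b" + local change")
    modified_ok = safe_to_patch(target, recorded)

print(unmodified_ok, modified_ok)  # True False
```

      When the check fails, the sensible fallback is presumably to download the full package instead of the delta.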

      --
      1+1=3 for sufficiently large values of 1.
    2. Re:Here the problem: by Kris_J · · Score: 1

      If you maintained a library of all the installation packages you've ever downloaded (assuming space is not an issue), a network installer could just look through your library, check hashes on matching filenames and download the smallest available new diff that can be used to build the desired new installation package. Then it doesn't even matter what's been installed or compiled, just what you've downloaded. Kind of like a cross between a cache and a progressive JPEG.

    3. Re:Here the problem: by kbmccarty · · Score: 1

      Here's another potential problem:

      % apt-cache show prelink

      ...

      Description: ELF prelinking utility to speed up dynamic linking
      The prelink package contains a utility which modifies ELF shared libraries
      and executables, so that far fewer relocations need to be resolved at
      runtime and thus programs come up faster.

      Used to make big C++ programs like OpenOffice, KDE, etc. load faster, but of course it has to modify the binaries to do so. I suppose this would break binary diffs against them.

      --
      - Kevin B. McCarty
  12. SUSE by DreadSpoon · · Score: 3, Interesting

    SUSE already does this.

    RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made. While I favor a new format for many reasons other than this, modifying RPM is probably the best solution in order to provide backwards compatibility.

  13. Well... by iamdrscience · · Score: 2, Interesting

    On that topic, why does almost everybody distribute source code as gzipped tars instead of bzip2'ed tars (just about everybody that does use bzip2 also distributes gzips)? Sure, in the beginning gzip made more sense for people on slow machines, but nowadays the difference in the time it takes to decompress is trivial, whereas the compression benefits of bzip2 on text are phenomenal in my experience.

    1. Re:Well... by Attitude+Adjuster · · Score: 2
      Well, I don't know for sure, but here's my $0.02 (twice).

      Ramble 1: In my admittedly limited experience (since '94), it was a while before traditional old-school Unix (tm) like OSF (now Digital Unix) and Solaris abandoned the encumbered compress/uncompress utilities and started having gzip.

      Even now, the old Solaris Ultra 10 sitting in the corner of my office doing nothing (running 5.7, which has had uptimes of a couple of years solid) doesn't have bzip2 - can't be arsed to ask the sysadmin to update it as I'm only using Linux these days, which is cheaper/faster/nicer/more capable for workstations than Solaris ever was.

      Ramble 2: The science software I use knows all about gzip - to save space I keep all my binary data files gzipped (I have 2 TB of disk space, but a lot of data) and the tools internally gunzip and re-gzip. That functionality hasn't been added for bzip (why? I have no idea) despite bzip2 being around for ages.

      So... maybe it's social inertia preventing a complete move to bzip2, or maybe gzip is still more widely available than bzip2.

    2. Re:Well... by cperciva · · Score: 1

      One advantage of gzip is that it requires less memory to decompress. It probably doesn't matter if someone's old Pentium 90 with 16MB of RAM takes a while to decompress a file, but that machine will probably *never* successfully extract an archive compressed with bzip2 (at least with the default 900kB block size).

    3. Re:Well... by p3d0 · · Score: 2, Insightful
      bzip2 is a retarded name, for one thing. It makes it sound like it's in flux (gee, should I wait for bzip3?).

      They definitely should have done whatever was necessary to keep the name as just "bzip".

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    4. Re:Well... by spitzak · · Score: 2, Insightful

      It would help a lot if tar would do it if you just provided -z instead of having to remember to provide -j. Come to think of it, it would be nice if tar just detected compression and you did not have to give it -z either! Can this be done?

    5. Re:Well... by cperciva · · Score: 1

      Come to think of it, it would be nice if tar just detected compression and you did not have to give it -z either! Can this be done?

      Yes. bsdtar does this.

    6. Re:Well... by random_static · · Score: 1

      actually, i used to use bzip (version 1, even) on a 16MB 486 back in the day. slow as fuck, but it did successfully (de)compress kernel tarballs and patches. bzip2 uses no more memory than v1 did, in my experience.

    7. Re:Well... by Mnemia · · Score: 1

      I think Gentoo has a policy of using bzip2 to compress all the source tarballs that they mirror themselves. Gzip is of course still used extensively for files that Gentoo legally cannot mirror on their own servers or cannot repackage from a binary format.

    8. Re:Well... by evilviper · · Score: 1

      bzip2 compresses only slightly better than gzip, but uses up MUCH more time to do it. It's more of an issue for the distributors than the users.

      bzip2 uses up much more memory just to uncompress. This makes low-end machines incredibly slow because it has to swap lots of data. On my Psion 5mx, it makes it impossible to uncompress bzip2 files, unless they were originally compressed with "-s" (not common).

      bzip2's legality is questionable. Nobody has done a patent search, so the methods it uses could very well be covered by patents.

      Decompression time is trivial only on very fast machines. Since you need a fast machine to run KDE at all, it makes sense they would require bzip2.

      bzip2 support isn't built into many applications yet. My browser can read a gzip compressed html file, but not a bzip2 compressed one. In an FTP session, I can just use "more" to read a gzip-compressed text file. But a bzip2 file, I would have to download locally, and uncompress it manually before I can read it.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    9. Re:Well... by Stinking+Pig · · Score: 1

      You mean like Mandrake does? Mandrake won't accept a src.rpm unless it's bzip2.

      --
      "Nothing was broken, and it's been fixed." -- Jon Carroll
    10. Re:Well... by Anonymous Coward · · Score: 0

      That functionality hasn't been added for bzip (why? I have no idea)

      bzip2 is about 10x slower than gzip for both compression and decompression. For on-the-fly compression it has to be gzip or lzf, otherwise it is perceptibly slow.

    11. Re:Well... by sploo22 · · Score: 1

      I don't see why not - just add a trivial front-end that parses the output of the file command. IIRC less already does this; typing "less foo.gz" decompresses the file on the fly.

      --
      Karma: Segmentation fault (tried to dereference a null post)
    12. Re:Well... by Sunspire · · Score: 5, Insightful

      I always for example grab the "regular" tar.gz version of the kernel for two reasons,

      1) I always forget the j option to tar, since bz2 packages are not that common. It should autodetect it.
      2) I have the perception that the combined download time and unpacking is longer for bz2

      Point two was subjective up until now, but just for the hell of it I decided to measure it. I used the time command to measure how long it took to download the kernels and how long it took to unpack them:

      time to download linux-2.6.8.tar.bz2 1m4.414s
      time to download linux-2.6.8.tar.gz 1m9.706s

      time to unpack linux-2.6.8.tar.bz2 2m05.457s
      time to unpack linux-2.6.8.tar.gz 0m26.309s

      This is on a P4C 3.2GHz, 1GB RAM, 8Mbit connection. So there you have it, with a fast enough connection the difference is significantly in favor of the old gz format. The size difference between the bz2 and gz kernel, about 8.8 MB, is not nearly good enough to merit the slower unpacking. If you have a slower machine but also a slower connection the result is likely in the same ballpark.

      This goes to show that if you want to provide faster (subjective) update times to users, especially in the future with faster connections, you have to study the problem in detail and not just blindly try to optimize some aspect of the process (size in this case), since the global performance might in fact be worse. Premature optimization and all that... What's the time for patching using delta compression anyway? If a 600KB RPM update can be delta compressed to 10KB, but the patching process takes longer than 15 seconds, I'm likely to see a slowdown in system update time.
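      Adding up the numbers quoted above gives the totals, and the same figures give a rough break-even link speed. This is only a sketch: it assumes the unpack times stay the same on whatever machine sits behind the slower link, and it treats 1 MB as 1000 kB.

```python
# Timings quoted above, in seconds.
download = {"bz2": 64.414, "gz": 69.706}
unpack = {"bz2": 125.457, "gz": 26.309}

total = {fmt: download[fmt] + unpack[fmt] for fmt in download}
for fmt, seconds in total.items():
    print(f"{fmt}: {seconds:.3f} s")

# Break-even link speed: bz2 wins once fetching the extra bytes of the
# .gz file takes longer than the extra time bz2 needs to unpack.
SIZE_DIFFERENCE_MB = 8.8  # figure quoted above
extra_unpack = unpack["bz2"] - unpack["gz"]
break_even_kB_per_s = SIZE_DIFFERENCE_MB * 1000 / extra_unpack
print(f"break-even: about {break_even_kB_per_s:.0f} kB/s")
```

      On these figures gzip comes out roughly 94 seconds ahead on the fast link, while anything slower than about 89 kB/s would tip the balance toward bzip2.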

      --
      It's like deja vu all over again.
    13. Re:Well... by Anonymous Coward · · Score: 1, Informative

      $ time wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.8.1.tar.bz2
      real 6m21.421s
      $ time tar jxf linux-2.6.8.1.tar.bz2
      real 1m10.380s

      See, it all depends.

    14. Re:Well... by perler · · Score: 1

      because

      a) the speed difference in decompressing is still extremely noticeable. (my servers are usually 1-2 GHz x86 machines..)

      b) bandwidth is growing faster than processing power. means, in real life i'm done quicker when i compress /fast/ and then transfer a /bigger/ file than vice versa..

      PAT

    15. Re:Well... by Kjella · · Score: 1

      What's the time for patching using delta compression any way? If a 600KB RPM update can be delta compressed to 10KB, but the patching process takes longer than 15 seconds, I'm likely see a slow down in system update time.

      About the same. You get one extra reading from the original into memory, but this is pretty much identical to the size saved in the patch (e.g. you read a 600kB file and 10kB patch instead of a 600kB patch). The patching process is very fast, on the patcher's side (Creating a delta patch is more time-consuming).

      As for your numbers, I'd say there's a lot more people with less than 1/100th of your bandwidth (= 80kb/s) than 1/100th of your CPU power (30MHz). So I would say your example is skewed against using delta compression.

      Also note that most computers today have more free CPU power than bandwidth. If I was doing something pulling 100% load, then I probably wouldn't want updates and disk access running at the same time anyway, whereas there are often many network transfers running at once. As such, a smaller download size typically degrades the rest of the system's performance less.

      Kjella

      --
      Live today, because you never know what tomorrow brings
    16. Re:Well... by True+Grit · · Score: 1
      with a fast enough connection

      You do realize that this added qualifier renders your comparison utterly meaningless?

      For someone with a 56k connection (which I had until 2 weeks ago) it doesn't take 2 minutes to download the kernel, it takes 12.

      Isn't this whole issue about helping those with low-bandwidth connections? Heck, we would be helping everyone by reducing the amount of data being pushed around, right?

      I wonder what a site like debian.org would save in bandwidth costs by converting dpkg and apt to using binary diffs on updates, rather than downloading the entire package every time? And no, micro-packaging (separating the binary executable from the program's "data" files) doesn't solve the problem (on Debian, besides kernel-source, look at the size of xemacs21-bin, or openoffice-bin). Think about it.....
    17. Re:Well... by samhalliday · · Score: 1

      what you just wrote is redundant without the equivalent times for the gzipped tarball.

    18. Re:Well... by Anonymous Coward · · Score: 0
      It may have been incomplete; but certainly not redundant since no one posted this info before.

      Now your post is simply flamebait because you posted neither times for a .gz nor a .bz2.

  14. use rsync by stonebeat.org · · Score: 2, Interesting

    Delta-based patch distribution on the Linux platform is quite easy. Just use RSYNC to sync the application files to the source. I have used this technique of patching (i.e. RSYNC) to provide updates/patches to an in-house built application. Work very nicely.

    1. Re:use rsync by cperciva · · Score: 1

      Rsync is certainly good, but it has limitations. First, it is a protocol, which means that you need to be running a daemon (possible security issue), and it needs to be accessible (offline patching is impossible). Second, rsync tends to perform very poorly on compiled binaries, due to artifacts introduced in the linking process.

    2. Re:use rsync by Minna+Kirai · · Score: 1

      First, it is a protocol, which means that you need to be running a daemon (possible security issue)

      There is an rsyncd available, but that's just a rarely-used alternative to running it over rsh/ssh. The situation is similar to CVS: there is a native daemon version, but smart people just run it via ssh.

      Although even if rsync was only a daemon, that'd still be backwards: the client of a protocol never needs to run a daemon to download!

    3. Re:use rsync by Anonymous Coward · · Score: 0

      Second, rsync tends to perform very poorly on compiled binaries, due to artifacts introduced in the linking process.

      Then so will the delta compression. I think most of the delta programs you see are children of rsync. The original rsync thesis even winds up explaining delta compression as further development of the rsync ideas.

    4. Re:use rsync by Anonymous Coward · · Score: 0

      " Work very nicely. "

      How kind of you to advise me.
      I do work very nicely, 'thou sometimes I leave the desk in a mess....

    5. Re:use rsync by cperciva · · Score: 1
      rsync tends to perform very poorly on compiled binaries, due to artifacts introduced in the linking process.
      Then so will the delta compression. I think most of the delta programs you see are children of rsync.

      Yes and no. Most delta compression programs suffer from the same difficulty which plagues rsync, but to a lesser extent because smaller block sizes are used when building a patch between two local files. bsdiff avoids that problem entirely by using a more sophisticated encoding method.

      It isn't unusual for bsdiff to provide 50-fold compression, xdelta to provide 10-fold compression, and rsync to save only 50% (ie, 2-fold compression).
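
      The block-size effect can be illustrated with a toy model. This is only a sketch: it compares aligned fixed-size blocks, with none of rsync's rolling checksums and no inserted bytes, and the "relinked binary" is a made-up byte pattern with one altered byte every 1000 bytes standing in for shifted addresses.

```python
def unchanged_block_fraction(old: bytes, new: bytes, block_size: int) -> float:
    """Fraction of aligned fixed-size blocks that are byte-identical in both files."""
    total = len(old) // block_size
    same = sum(
        old[i * block_size:(i + 1) * block_size]
        == new[i * block_size:(i + 1) * block_size]
        for i in range(total)
    )
    return same / total


# Hypothetical "old binary": 64 KiB of arbitrary but deterministic bytes.
old = bytes((i * 7) % 251 for i in range(65536))

# Hypothetical "relinked binary": same length, one byte flipped every 1000 bytes.
patched = bytearray(old)
for offset in range(0, len(patched), 1000):
    patched[offset] ^= 0xFF
new = bytes(patched)

for block_size in (2048, 64):
    fraction = unchanged_block_fraction(old, new, block_size)
    print(f"block size {block_size}: {fraction:.1%} of blocks reusable")
```

      With large blocks every block contains at least one altered byte and nothing can be reused, while small blocks isolate the changes; this says nothing about how bsdiff's encoding works, only why block size matters.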
    6. Re:use rsync by khanyisa · · Score: 1

      I think the solution is to store local caches of previous binary packages.
      Then tell rsync to get the new package based on the previous one. Then you can still do offline patching (download the patch first)
      The rsync protocol / intelligence could be improved to handle compiled binaries better.
      Anyway http is also a protocol ...

  15. I have to answer... by evilviper · · Score: 1
    In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?

    Perhaps, right after they get a good package management system...

    I can't even imagine the mess that would be caused if someone tried to uninstall a binary-diff RPM/DEB.


    There are some rsync servers out there, which provide essentially the same service, and then some.

    Also, if download size is your #1 concern, why not download the source patches, and compile? A whole 10K may need to be downloaded...
    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    1. Re:I have to answer... by mrchaotica · · Score: 1

      You mentioned .rpm and .deb, what about .ebuild? : )

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

  16. RED HAT NETWORK WAS BROKEN by Anonymous Coward · · Score: 0, Troll

    Imagine my surprise when my servers, patched scrupulously on a rigorous schedule, suddenly were listed as needing 114 patches this morning!

    Yup, for the last month or so up2date has been failing to install packages - it downloads 'em, and makes lots of pretty hash marks like it is installing, then DOESN'T INSTALL THEM!

    Red Hat fixed this LESS THAN 24 HOURS AGO

    CHECK YOUR SERVERS PEOPLE

    OR BE ROOTED!

    1. Re:RED HAT NETWORK WAS BROKEN by hey · · Score: 0

      This posting is bogus.

  17. Re:Is this an Issue? by Another+Voice · · Score: 0

    Yes - not everyone has broadband infrastructure like in America/Canada/Korea/Japan. Even in Australia, where I live, broadband uptake is slow and most accounts are capped - that's another story. Most of the people I know, friends included, are still on dialup due to cost.

  18. Re:Speaking of Linux and security... by Anonymous Coward · · Score: 1, Funny

    I have reason to believe that Daniel Lyons, author of the Forbes article is in fact the pen name of one John Doe, a 12 year old trailer park dwelling asexual dwarf who suffers from a severe debilitating mental illness known as "stupidity".

    Hope that helps.

  19. Gentoo by SuperBanana · · Score: 2, Interesting
    Jokes about gentoo aside, the source tarballs are cached in /var, and only removed when they exceed configured limits for max disk space. Patches are contained in the portage tree, along with the "ebuild" files which are the build instruction files.

    If the update is just a patch to the source, there's sometimes a minor revision made and an updated gentoo ebuild file and source code patch added to the portage tree, which is of course done via rsync. All in all, it's decently efficient. This mostly(I think) happens with unstable package versions, where a security update may make it into portage before the official project bumps their release, but that's not the case with stable stuff.

    I think for basic systems, compile time complaints are slightly exaggerated. My -original- celeron 450 isn't shabby at all at compiling most of the more basic system packages and server apps. Even glibc and gcc build with relative ease, and when I set up distcc amongst my three systems, it became even less of a hassle. Even without distcc, the time to clear out 50 packages of updates on a mail server is surprisingly low on a low-powered system.

    1. Re:Gentoo by bzBetty · · Score: 2, Informative

      Gentoo users should make sure they know about this: it's called deltup, basically a script for portage that grabs xdelta patches instead of downloading the entire file again. It seems to save me a lot of bw anyway.

  20. Makes more sense for proprietary operating systems by iamdrscience · · Score: 1

    It makes a lot more sense for non open source operating systems because when you have the source, it means that there are going to be more people who compiled programs by themselves and thus are unable to use binary diffs. I'm sure the fact that FreeBSD has it is more because it's a neat hack than because a lot of people find it necessary (although I'm sure a few people do). So it's not really a necessity for Linux distros to have this, but I bet that in the near future Debian and Redhat (maybe a few others) will set up functionality for binary diff patching just because.

  21. Re:Speaking of Linux and security... by Anonymous Coward · · Score: 0

    Yeah, I guess you are right. Hey wait a second, what about the direct quote from those people who really did pick Windows over Linux due to the lower TCO?

  22. Ummm... diffs? Not for Linux? Are you kidding? by !ucif3r · · Score: 5, Insightful

    Ok before I get berated by the karma (whoring) police I do realize these are not binary diffs. But, seriously, Linux has been using diffs as a way to save bandwidth since before Windows even offered 'updates'. Another example of Windows 'innovation' I guess.

    Yes, I see how it is neat that there is a binary version of this process with Windows but Linux is primarily a source based operating system. It is that way because the software is designed to be compiled for a variety of systems and setups and work with all of them.

    I do understand the author's question though, but it really should be reworded. Linux is not an OS in the sense that Windows is an OS. He should perhaps be more correctly asking when one of the 'binary' distributions of Linux (or of a Linux 'based' OS to be exact) will plan on offering this. Binary packages are really only offered on a per distribution basis with the binaries not being very compatible between distro's and systems (although some basic compatibility is generally there). As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's

    --
    "Take that Lisa's beliefs!" - Homer Simpson
    1. Re:Ummm... diffs? Not for Linux? Are you kidding? by cperciva · · Score: 1

      He should perhaps be more correctly asking when one of the 'binary' distributions of Linux (or of a Linux 'based' OS to be exact) will plan on offering this.

      I asked about Linux vendors... isn't that clear enough? Certainly when I hear "Linux vendors" I think "Redhat, SuSE, Mandrake, and other companies which make money by distributing operating systems built around the Linux kernel".

    2. Re:Ummm... diffs? Not for Linux? Are you kidding? by !ucif3r · · Score: 1

      Sorry about that, you are correct, you did. My own impatience has foiled me again.

      --
      "Take that Lisa's beliefs!" - Homer Simpson
    3. Re:Ummm... diffs? Not for Linux? Are you kidding? by !ucif3r · · Score: 1

      Sorry to double post, but I just checked your sig link and I suppose the question one might ask is if and when YOU might port your FreeBSD update tool to one of the binary distributions ;-).

      --
      "Take that Lisa's beliefs!" - Homer Simpson
    4. Re:Ummm... diffs? Not for Linux? Are you kidding? by bergeron76 · · Score: 1

      What about the grammar police? Since I'm such a nerd, I think I'll assume that role for the next minute (my Karma's so bright I have to wear shades):

      As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's

      a) As You really shouldn't start a sentence with a preposition.

      b) Run-on sentences are very hard to read and don't often tend to make very much sense since they are run-on sentences but I guess I shouldn't worry about it in this situation I just thought I'd point it out.

      c) I think you forgot to end your point

      d) with punctuation.

      --
      Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
    5. Re:Ummm... diffs? Not for Linux? Are you kidding? by FiloEleven · · Score: 1

      You missed one. "...the binary distro's"

      The binary distro's what?

  23. Too complicated and confusing by avida · · Score: 4, Informative

    Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need to have eleven deltas. I don't think so. Unless you want to do something like Windows Update where an agent scans your binaries and compares the difference with the update and then downloads individual files ... but that's a lot more complicated and isn't justified by the bandwidth savings.

    1. Re:Too complicated and confusing by Malc · · Score: 2, Insightful

      And how is this different from source code patches? It seems to me that they'll only provide patches from version to version, like they do with GNU Emacs. If you need to update multiple versions then you have to make a decision about going through 10 patches, or doing a full download of the desired version.

    2. Re:Too complicated and confusing by jackb_guppy · · Score: 1

      You are exactly right... I did limited delta patching in the 80's and 90's. All of our modules were under 64k so it was a matter of finding the set that actually changed and sending it out.

      Why is delta patching coming up? Because of poor design: linking unnecessary functions into a runtime to MAYBE save some time. The only reason for the delta patching the original poster is asking about is poor development standards.

      Yes, this leads back to the monolithic vs. microkernel arguments. Each has its advantages: monolithic designs tend to be easier (and, to some, sloppier), while microkernels tend to make maintenance easier and more localized.

      Delta patching is a wet dream for fixing the issues in a monolithic design and giving you microkernel-style "support".

      Hell, if it were not for this sloppiness, MS could not say that IE was integrated into Windows and could not be removed because of the integration.

    3. Re:Too complicated and confusing by Anonymous Coward · · Score: 0
      Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need to have eleven deltas.

      That's not the only way to do it. The simplest way to go about it is to simply produce a delta between each version and its previous one. Then, if you have missed 10 updates, you simply download all the deltas all the way from your very old version up to the current one. Sure, the efficiency will drop some, but not that much since basically the same bits are being added and removed whether you do it in several tiny pieces or in bigger pieces.

      However, there is an even better way that still doesn't require you to have an update for every possible combination. Produce a delta between all releases in the sequence. But then, also produce a delta between every other item in the sequence, so that you produce a delta that takes you straight from 0 to 2, and one that goes straight from 2 to 4. Then also produce one that jumps four releases in the sequence, so that you have one that takes you from 0 to 4, and then another from 4 to 8. Continue with all powers of 2. Once you've done that, if you are N releases behind, the maximum number of deltas you would have to download is O(log N). If you are 65536 versions behind, you will still not have to download more than 32 deltas. :-) (It would be only 16 except that the endpoints might not fall at the most convenient spot, so you might have to download some extras.)

      So actually, this would take some extra work, certainly, but there is no technical problem that prevents this from being totally practical.
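
      The power-of-2 scheme above can be sketched as follows. This assumes releases are numbered 0, 1, 2, ... and that the vendor publishes a delta from k*2^j to (k+1)*2^j for every k and j; the function name and the (from, to) tuple format are only illustrative.

```python
def delta_path(current: int, target: int) -> list:
    """List the (from, to) deltas to download, always taking the largest
    published jump that starts at the current release and does not overshoot."""
    path = []
    while current < target:
        step = 1
        # A jump of size 2*step is published only from multiples of 2*step.
        while current % (step * 2) == 0 and current + step * 2 <= target:
            step *= 2
        path.append((current, current + step))
        current += step
    return path


print(delta_path(0, 8))
print(delta_path(3, 8))
print(delta_path(5, 11))
```

      The path first climbs through ever larger jumps and then descends through ever smaller ones, which is where the roughly 2*log2(N) bound on the number of downloads comes from.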

    4. Re:Too complicated and confusing by devilspgd · · Score: 1

      Assuming the files are properly versioned, the WU client would simply check the file's current version and send that to the WU server. The WU server would reply with all the diffs needed to get you up to speed, or if it would be faster, the updated file.

      It probably wouldn't be too hard to combine multiple diffs into one single diff and strip out any redundant or unnecessary modifications -- There are only a fixed number of versions in place between each service pack, and the service packs could be used as milestones (post-SP3 hotfixes might not install on pre-SP3 systems using a diff method)
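
      The server-side choice described above (send the diffs, or the whole file if that would be faster) could be sketched like this. It assumes download size is the only cost that matters, and the function name and sizes are made up for illustration; nothing here reflects how Windows Update actually decides.

```python
from typing import Sequence


def plan_download(diff_sizes: Sequence[int], full_size: int) -> str:
    """Return 'diffs' if the whole chain of diffs is smaller than the full file."""
    return "diffs" if sum(diff_sizes) < full_size else "full"


# Two small hotfix diffs against a 600 kB file: the chain wins.
print(plan_download([10_000, 12_000], 600_000))
# Two large diffs that together exceed the file itself: send the file.
print(plan_download([300_000, 350_000], 600_000))
```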

      --
      Give a man a fish, he'll eat for a day, but teach a man to phish...
    5. Re:Too complicated and confusing by Spy+Hunter · · Score: 1

      Elegant, but too complex. Just keep a diff from the previous version to the current one, and if they don't have the previous version then just send them the whole file. That way you don't have to keep making and storing tons of diffs, but the bandwidth for the most common case is still reduced substantially. You might want to keep two or three diffs around if people upgrade infrequently or the software is updated quickly, but after three versions the saved bandwidth is going to be very small compared with the overhead of maintaining this diff sequence (server disk space, processor time, complexity of associated software).

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
    6. Re:Too complicated and confusing by mpcooke3 · · Score: 1

      I'm not sure it's that difficult. Rsync already does a lot of the work.

    7. Re:Too complicated and confusing by Keeper · · Score: 1

      Source code patches are text and generally follow a simple set of rules. Ie: replace this line of text (surrounded by these other lines of text) with this other line. Source code patches generally don't automatically resolve conflicts (ie: the line of text is different than the source, or the surrounding lines aren't quite what was expected). Even then, it's still possible for the patch to go bad, depending on what else has changed.

      Binary diffs don't have any rules other than the start/end point. It isn't really possible to intelligently change part of the binary unless the whole file is what you were expecting, as it isn't possible to make any reasonable assumptions that your change is 'compatible' with the other binary changes in the file. For example, the previous change may have inserted a new string, and added some code to use that string. The insertion changed the address of several other pieces of data, and the other do-dads that referenced that data were also fixed up. Let's say you have a second patch which was created before the first patch, and it knows nothing about this new string or the other data that got moved around. Anything that it patches is now wrong. Your binary is now useless.

    8. Re:Too complicated and confusing by SmittyTheBold · · Score: 1

      It isn't really possible to intelligently change part of the binary unless the whole file is what you were expecting,
      The only reasons the binary *wouldn't* be what you were expecting is if the end user had modified the binary, or the file is corrupted. About the only reference you'll find to an end user editing hex is when cracking or otherwise modifying an executable in a way the writer did not intend. Either way, the person providing the patch should be able to run a checksum algorithm on the installed file to verify that it is pristine, then apply the patch or re-install the complete file as needed.

      --
      ± 29 dB
    9. Re:Too complicated and confusing by hab136 · · Score: 1
      The only reasons the binary *wouldn't* be what you were expecting is if the end user had modified the binary, or the file is corrupted. About the only reference you'll find to an end user editing hex is when cracking or otherwise modifying an executable in a way the writer did not intend. Either way, the person providing the patch should be able to run a checksum algorithm on the installed file to verify that it is pristine, then apply the patch or re-install the complete file as needed.

      Unless the user has used UPX to compress their binaries, or strip to remove debugging data, both completely valid, normal, and reasonable reasons to have different binaries.

    10. Re:Too complicated and confusing by Keeper · · Score: 1

      You are exactly correct. That scenario is easy to support with source code diffs, not possible to support with binary diffs, which is why you either need to apply the diffs in a serialized order or have a huge matrix of diffs to apply based on the contents of the original file.

      Though your proposal to have the patch install the complete file if it wasn't what was expected amuses me, as it kind of defeats the purpose of sending the patch as a diff in the first place ...

    11. Re:Too complicated and confusing by SmittyTheBold · · Score: 1

      The only time I've seen UPX used in any serious fashion is in the unattended-install community for fitting more third-party software on Windows installer CDs. Similarly, stripping debug symbols from an executable is only of interest to a rather fringe group of users.

      I'm not going to go so far as to say "you shouldn't be doing that" but the fact is the vast majority of users will not; those motions may be valid and reasonable, but certainly are not "normal."

      I would imagine downloading the full file in such unusual circumstances would be far from voiding the benefit of binary patching seen in the majority of cases.

      --
      ± 29 dB
    12. Re:Too complicated and confusing by SmittyTheBold · · Score: 1

      What if you don't send the diff in the first place? To me, this whole procedure implies on-line patching; but we already have "network installers" and the like for patching a machine with no Internet access.

      The patch installer queries what version of software currently is installed. Then, it downloads a list of file names, sizes, and checksums to match against. It compares against what is currently installed, and either gets the serialized patches necessary to bring the installed files to current, or the full updated files in cases where the installed files are of undetermined status.

      I don't think providing a back-up mechanism to distribute full files defeats the purpose at all; in fact, it covers important corner cases that cannot be dealt with in any other sane manner while still providing the benefits of file patching to the vast majority of users.
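
      The checksum step of that procedure might look like the sketch below. It assumes the vendor publishes a SHA-256 digest for each pristine file; the function name, the sample bytes and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib


def choose_update(installed: bytes, expected_sha256: str) -> str:
    """Pick the small patch only when the installed file is exactly the
    version the patch was built against; otherwise fetch the full file."""
    digest = hashlib.sha256(installed).hexdigest()
    return "patch" if digest == expected_sha256 else "full"


pristine = b"\x7fELF pristine vendor build"       # stand-in for the shipped binary
published = hashlib.sha256(pristine).hexdigest()  # what the vendor would list

print(choose_update(pristine, published))          # untouched file
print(choose_update(pristine + b"!", published))   # stripped, packed or corrupted file
```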

      --
      ± 29 dB
  24. Yes, Linux has low HW mins by AHumbleOpinion · · Score: 1

    Yes, because one of the nice things about Linux is that it has relatively low hardware requirements.

  25. Shhhh! by Malc · · Score: 1

    This was supposed to be the last word.

  26. What exactly is binary diff/delta compression ? by Anonymous Coward · · Score: 1, Informative


    can someone explain for those people who have no idea what delta compression is and how it differs from something like zip/rar/gz/7z etc

    --ajs

    1. Re:What exactly is binary diff/delta compression ? by random_static · · Score: 1
      a "diff" is a file listing just the differences between two other files. (those two are assumed to be fairly similar. that's what makes diffs be small and efficient.)

      binary diffs are diffs on non-text, binary, files. turns out it's a little bit trickier to create these (and make them optimally small) than it is on text files, since you can't rely on there being newlines in a non-text file.

      delta compression is about sending a gzipped diff between file1 and file2, uncompressing it, and applying it to your already-extant copy of file1 to get file2, as opposed to just sending a gzipped file2. so long as file1 and file2 are similar, this can produce gi-normous "compression ratios".

    2. Re:What exactly is binary diff/delta compression ? by Dever · · Score: 1
      yeah, google.

      the second one down should do ya.

      --
      - I'd prefer not to.
    3. Re:What exactly is binary diff/delta compression ? by Darby · · Score: 1

      turns out it's a little bit trickier to create these (and make them optimally small) than it is on text files, since you can't rely on there being newlines in a non-text file.



      Why do the newlines matter?

    4. Re:What exactly is binary diff/delta compression ? by cowbutt · · Score: 1
      Binary diffs/deltas just record the changes from one binary file to another. If my original file was:
      123567890ABCEEE
      and I modified it to become:
      1234567890ABD
      Then convention would require me to distribute the second version as a replacement for the first. Using a binary diff, I could distribute a command file that represents something like:
      [insert,4,1]4;[delete,13,16];[append]D;
      (i.e. insert 1 character '4' at position 4, delete the characters at positions 13-16 (counted after that insertion), then append D to the end). Of course, the command file itself probably wouldn't be using ASCII to represent those commands. A real command file might look something like:
      0xff 0x04 0x00 0x00 0x00 0x01 0x00 0x34
      0xfe 0x0d 0x00 0x00 0x00 0x10 0x00 0x00 0x00
      0xfd 0x44
      (where each of those is a single byte). For short files, like the example I've given, there isn't much of a win (or any!), but for longer files (e.g. the mozilla binary), it's much more likely that binary deltas will be shorter than a complete new copy.
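
      The same idea can be tried out with a few lines of Python. This sketch uses the standard library's difflib to find the matching regions, which is not how bsdiff or xdelta work internally, and the copy/data instruction format is invented for illustration rather than matching the byte layout above.

```python
import difflib


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies out of `old` plus literal new bytes."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif j2 > j1:
            ops.append(("data", new[j1:j2]))  # 'insert' or 'replace': ship the bytes
        # 'delete' needs no instruction: those old bytes are simply never copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"123567890ABCEEE"
new = b"1234567890ABD"
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta))
```

      Only the "data" instructions carry file contents, so the more the two versions share, the smaller the delta becomes relative to the new file.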

      --

    5. Re:What exactly is binary diff/delta compression ? by random_static · · Score: 1
      Why do the newlines matter?
      in a text file, they make natural "region markers"; because of the way people write text files, changes are naturally constrained to being inside a line, or to be additions/deletions of lines, or some combination. that means you can break your text files up into one-line units, scan for differences/similarities between (extents of) those units, and likely have fairly good luck - even if your diff file format considers a line the smallest possible unit of change, that's not going to be outrageously wasteful even to change just one byte in a line.

      whereas with a binary file, if the changes touch bytes 3164, 3166 and 3190-3403 inclusive, without any line breaks anywhere in those ranges, you'll have to do a lot of byte-by-byte searching to figure it out. you'll have to look for insertions and deletions byte by byte, too, you can't assume that entire lines (with convenient newline end markers) will have been pushed in or cut out from between existing ones.

    6. Re:What exactly is binary diff/delta compression ? by Darby · · Score: 1

      Aha.

      Thanks.

    7. Re:What exactly is binary diff/delta compression ? by hostyle · · Score: 0

      You have file-old.tar.gz and file-new.tar.gz. A delta is the difference between the two files. Instead of downloading the complete file-new.tar.gz, you just download the differences between the two files (and then the entire file-new.tar.gz is created locally via your "delta-package-manager").

      --
      Caesar si viveret, ad remum dareris.
  27. Real hackers... by Black+Parrot · · Score: 2, Funny

    ...toggle their diffs in from the front panel.

    --
    Sheesh, evil *and* a jerk. -- Jade
    1. Re:Real hackers... by rampant+mac · · Score: 1
      I whispered that very same phrase into my wife's ear last night while I attempted to force myself upon her.

      She slapped me.

      I kinda liked it.

      --
      I like big butts and I cannot lie.
  28. Re:Makes more sense for proprietary operating syst by whovian · · Score: 1

    It makes a lot more sense for non open source operating systems

    Yes. Sun does this with Solaris, just as SGI did with IRIX.

    --
    To-do List: Receive telemarketing call during a tornado warning. Check.
  29. Re:Makes more sense for proprietary operating syst by JamesKPolk · · Score: 1

    Maybe compiling source for every little update is fine for a hobbyist running Debian Unstable, but people just trying to run a server and get work done really would just like a quick little download to be done with it.

  30. XDelta3 by TheBashar · · Score: 4, Informative

    XDelta3 recently reached its first public release.

    http://xdelta.org/xdelta3.html

    XDelta3 is a library which is designed to foster exactly this kind of functionality. If distributions integrate the xdelta functionality into their package management framework we would be well on our way to what the poster is looking for.

    1. Re: XDelta3 by Omniscient+Ferret · · Score: 1

      I tried xdelta3 on a large text file; it was much slower & produced a much larger patch than xdelta1. That may be a pathologically bad example. Also, that's the first public release of xdelta3; the author's written that there's lots of tuning to be done.
      The article link for "binary diff" talks about a utility offering 50%-80% reduction from the equivalent xdelta1. If the point is saving space, I would compare the patch size first.
      The overhead of tracking deltas with the versions involved seems like enough to make me think about rsync instead. I think that's currently relatively CPU intensive on the server, though.

    2. Re:XDelta3 by Spy+Hunter · · Score: 2, Interesting

      Oh man, I just had a great idea. What if you incorporated XDelta3 into a Reiser4 filesystem plugin? Versioning built into the filesystem would be an *awesome* feature. I'm sure it's been done before on some other OS, but it could really go mainstream on Linux with Reiser4.

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
    3. Re:XDelta3 by cowbutt · · Score: 1
      Versioning built into the filesystem would be an *awesome* feature. I'm sure it's been done before on some other OS

      Yeah, VMS.

      --

  31. Suse does this, partially by osho_gg · · Score: 1

    Suse already does this, but partially. Suse sends out patches for ascii configurations etc. and probably some non-ascii stuff as well. But, for things like kernel upgrade, KDE version upgrade etc. (where the diff is really needed due to the large size) - this is not done. I do not know why it is not done. It is either that it is not really feasible to do a binary diff across kernel version or KDE version changes or the technology is just not in place yet. Osho

  32. Shareware by slittle · · Score: 3, Insightful

    Used to do this back in ye olde DOS shareware days. I think RTPatch was the most common of the commercial ones.

    --
    Opportunity knocks. Karma hunts you down.
    1. Re:Shareware by Jordy · · Score: 1

      Yep, RTPatch is some 12 years old and has been able to do binary diff patches since the beginning. Heck, you could even package patches against patches in order to work against multiple versions of the installed software.

      It actually still exists and there are versions available for Linux, DOS, Windows, etc. I imagine the support for binary patches in WISE and InstallShield have hurt them quite a bit.

      --
      The world is neither black nor white nor good nor evil, only many shades of CowboyNeal.
    2. Re:Shareware by Anonymous Coward · · Score: 0

      I don't know about that. I know that America's Army, for one, uses RTPatch a lot (OK, the patches are still freak'n huger than seems necessary, but that probably has more to do with the game than the patch program). That's just the example that comes to mind, because I've been using those patches regularly over time, but I've had a number of other games that used RTPatch for a few patches recently (and of course, Doom used it extensively for the umpteen patches that came out for that).

      One of the biggest problems with delta patches, though, is that if the original files were somehow modified, the delta either fails or just screws up. Then you're forced to do a re-install/repair. "Whole file" patches just overwrite the old files, so a little corruption here and there just disappears.

      This is especially important for things like configuration files, although a patch utility mucking around with configuration files is also not such a good idea...

  33. Jigdo in debian by Anonymous Coward · · Score: 0

    Is another useful technique. Downloads only the changed portions of ISO images.

  34. Specifix Linux by ken_vandine · · Score: 1

    Specifix' Linux distribution, based on Conary, does similar things when transferring updates from the repository to the client machine. It only transfers files that have actually changed. It doesn't do xdelta-style binary diffs for various reasons, but that functionality could theoretically be implemented.

    http://wiki.specifixinc.com/

  35. Linux doesn't need it by Anonymous Coward · · Score: 0

    All the others are proprietary, and don't distribute source. Linux distributes source, and is totally free-as-in-beer, so a) source=text=good compression, b) not monolithic, so you can update just those parts you need, and c) everyone can get CDs from their friends without any worries.

  36. It is NOT. Download the stuff at work by Anonymous Coward · · Score: 0

    What is the big deal? Download the stuff at work and take it home. Everybody does it! I barely use the Internet at home; I download everything I need at work. My employer is Stanford University, and I had broadband 12 years ago, when I started work here. I have never thought about getting broadband at home. I still use dialup; I don't need higher speeds at home.

  37. Gentoo Portage by WamBamBoozle · · Score: 4, Interesting
    I wonder why Gentoo doesn't do this. Gentoo, as far as I can tell, always distributes a bzip2'ed tar of any particular distribution.

    It works beautifully but I can't help but think it is a waste of bandwidth.

    1. Re:Gentoo Portage by haruchai · · Score: 3, Informative

      No reason why they couldn't. According to this page:
      http://www.daemonology.net/bsdiff/, this util is already in Gentoo.

      --
      Pain is merely failure leaving the body
  38. Re:Makes more sense for proprietary operating syst by Minna+Kirai · · Score: 1

    Maybe compiling source for every little update is fine for a hobbyist running Debian Unstable,

    I don't see how a Debian user would do that. Maybe you meant to say Gentoo. But Debian provides binary packages by default (although each is paired with its source code on the package server).

    Is it really possible to instruct Debian to compile all updates locally? You'd need a decent quantity of custom scripting to get it going.

  39. That method is not as effective by Anonymous Coward · · Score: 0

    as one might usually think. Example: each compiler uses some sort of optimization to create executable code. I guess that has a similar effect on the overall binary like modifying one byte and then gzipping the file: nothing will be the same...
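
    The gzip half of that analogy is easy to try. A minimal sketch, assuming Python's zlib as a stand-in for gzip (same deflate algorithm) and a made-up input file: flip one byte, compress both versions, and count how many positions differ before and after compression. It says nothing about compiler output, only about why deltas are usually taken against uncompressed data.

```python
import zlib

def differing_bytes(x: bytes, y: bytes) -> int:
    """Count positions where two byte strings differ, plus any length gap."""
    return sum(p != q for p, q in zip(x, y)) + abs(len(x) - len(y))

# Made-up "package file": repetitive text that compresses well.
old = b"".join(b"line %d of a pretend package file\n" % i for i in range(2000))
new = bytearray(old)
new[100] ^= 0xFF          # flip every bit of a single byte near the start
new = bytes(new)

raw_diff = differing_bytes(old, new)
gz_diff = differing_bytes(zlib.compress(old), zlib.compress(new))

print("raw bytes that differ:       ", raw_diff)
print("compressed bytes that differ:", gz_diff)
```

    On typical inputs the second number comes out far larger than the first, which is the effect described above.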

  40. Re:Is this an Issue? by Anonymous Coward · · Score: 0

    I believe the number of broadband users just recently surpassed the number of dialup users. This means, obviously, that nearly half of all internet users are still on dialup.

    FFS, not everyone lives in the USA -- and those figures were for the USA only. Do some more research. (I'd guess it's more like 75% worldwide still on dialup, but who knows?)

  41. Re:Its well known by Mongo222 · · Score: 2, Informative

    What makes you think you need to take a server out of production while code compiles on it? I never have.

  42. Several reasons, but not all technical by lakeland · · Score: 3, Insightful

    Firstly, linux programs tend to be smaller than windows programs (do one thing, and do it well). Even a huge beast like tetex is 'only' 14.4MB -- compare to SP2... This has reduced the demand for delta compression.

    Secondly, in the windows world people release rarely. However, the opposite is true in the linux world -- projects with daily releases are not unheard of, and weekly releases are fairly common. This means enumerating patches (v 3.4 -> v. 3.7) is infeasible in Linux where it is feasible in Windows.

    More sophisticated algorithms than delta checksums do exist (as I guess you know if your thesis is on them) -- rolling checksums have been used in several projects I know of. However, there is a widespread rumour that these techniques are patented. I have never seen any evidence, but it puts a damper on any implementations.
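
    For readers who have not met rolling checksums: the idea is a weak checksum over a fixed-size window that can be updated in constant time as the window slides by one byte. The sketch below follows the two-sum scheme popularized by rsync; the function names and the 16-bit modulus are illustrative assumptions, not code from any of the projects mentioned.

```python
M = 1 << 16  # modulus for both halves of the weak checksum (assumed)

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute both sums from scratch for one block."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b

def find_block(old_block: bytes, data: bytes) -> int:
    """Return the first offset in data where old_block occurs, or -1."""
    n = len(old_block)
    if n == 0 or len(data) < n:
        return -1
    target = weak_checksum(old_block)
    a, b = weak_checksum(data[:n])
    for i in range(len(data) - n + 1):
        # Cheap checksum test first, then confirm with a real comparison.
        if (a, b) == target and data[i:i + n] == old_block:
            return i
        if i + n < len(data):
            a, b = roll(a, b, data[i], data[i + n], n)
    return -1

print(find_block(b"patch", b"security patch here"))  # 9
```

    A real tool would confirm weak matches with a strong hash rather than comparing bytes, since the receiver normally does not have the sender's block.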

    There is a semi-vapourware project implementing all of this (part of the apache project IIRC). However the project fizzled away several years ago.

    1. Re:Several reasons, but not all technical by tyrione · · Score: 3, Interesting
      Well congratulations.

      You point out TeTeX at 14+MB which is as bare as it gets for TeTeX, then comes the TeTeX-Doc and the TeTeX-Extra which by now we're up to over 50MBs.

      Oh and here is the real kicker. Debian has updated 2.02 3 if not 4 times this month. Now 150MB+ to over 200MBs of fixes? Nope. SP2 looks a bit smaller now don't it?

      And that doesn't even touch the -1,-2,..-20 Debian patches they keep spewing out for project after project.

      The only plus for a 56k access is they don't cap your downloads on a monthly basis. The bad side obviously is bandwidth, but for me it's the time spent waiting for important packages like TeTeX to update.

      Having a SVN approach to patching systems makes sense. Or CVS if you prefer a different versioning system approach.

      It's already been said but it is worth repeating, especially when one runs KDE or GNOME. Just build a freakin' base package and update us with binary images that are new or replaced, documentation that is new or revised, and binaries for the executables, libraries, and so on that change, and not the mountains of inert parts that don't change.

      You can't tell me KDELIBS, KDEBASE needs to be completely rebuilt each .x revision or -x revision by Debian and by completely rebuilt I mean all the inert files that don't actually get touched during the build process other than to make sure some wallpaper image still exists. Hell the Wallpaper backdrops, etc should be add-ons, not part of the distributions. But then again I suppose everyone thinks we all have T1 access.

      K.I.S.S.

    2. Re:Several reasons, but not all technical by Flammon · · Score: 1

      The current packaging systems are not granular enough. A package should contain exactly 2 files, 1 for meta and the other for content - content being text, image, library, whatever. The system should be more like a relational database where it is properly normalised. That way the same data is not replicated all over the place like it is done now with the GPL licence on my Debian system and I would have the ability to update 1 file at a time, which would be a huge bandwidth saver.

      Another nice feature to the package manager would be a Reiser4 plugin. This plugin would take the package's meta information and store it as meta files on the filesystem.

    3. Re:Several reasons, but not all technical by Anonymous Coward · · Score: 0

      The only plus for a 56k access??? I've never had usage caps with my internet access, from 28.8 modem, to the adsl line I use now, find a proper isp.

    4. Re:Several reasons, but not all technical by bfree · · Score: 1

      You're not really meant to update to every new revision of a debian package for no reason! The point of debian is that it works, and when you need new features that have arrived in something you should be able to find packages for a suitable version. If you want to update TeTeX every time they make any change to it (could even just be changing a dependency) then don't complain about the bandwidth you use doing it! Or are you suggesting that all of those packages were security releases? It's like insisting you have to get a new version of knoppix when they make a small update to fix a driver you don't even use! The second choice (rather than simply updating for features) is to run an update every day/week/month/quarter as balanced by your willingness to risk overhauling your system for no reason and the bandwidth it takes up!

      --

      Never underestimate the dark side of the Source

    5. Re:Several reasons, but not all technical by ewe2 · · Score: 1

      And how, pray, will binary diffs improve this situation? The download cost isn't the problem, it's a symptom. The rudimentary/non-existent mechanisms to evaluate packages before they're allowed distribution is the problem.

      Hence the gcc-3.3 changelog problem continues to repeat itself - having the means to split a package up into component packages should carry with it the means to prevent this kind of persistent error.

      Me, I'm glad they get around to updating KDE at all.

      --
      insecurity asks the wrong question irritation gives the wrong answer
    6. Re:Several reasons, but not all technical by Anonymous Coward · · Score: 0

      Obviously, you're not using Debian stable. I've been hoping for years that they'd release updates for it. ;)

  43. It's already doing it. by Mongo222 · · Score: 3, Interesting



    http://www.daemonology.net/bsdiff/

    bsdiff and bspatch are tools for building and applying patches to binary files. By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta, and 15% smaller than those produced by .RTPatch (a $2750/seat commercial patch tool).

    http://sourceforge.net/projects/diffball

    A general delta compression/differencing suite for any platform that supports autoconf/automake, written in c, w/ builtin support for reading,writing, converting between multiple file formats, and an easy framework to drop in new algorithms.

    1. Re:It's already doing it. by Anonymous Coward · · Score: 0

      Yeah, and the creator of diffball is also a portage dev. He made a GLEP for it (GLEP 25), but it hasn't been implemented yet. It would be very helpful for dialup users. But since it would require more mirror space, its completion has been greatly slowed.

      http://www.gentoo.org/proj/en/glep/glep-0025.html

  44. What has become of our education system by Anonymous Coward · · Score: 0

    I would think that someone working on their doctoral thesis would be able to find answers all by themselves.

  45. My question is... by CaptainPinko · · Score: 1, Funny

    when will Gentoo get this? ;)

    --
    Your CPU is not doing anything else, at least do something.
    1. Re:My question is... by russ_allegro · · Score: 1

      You must not have read this week's newsletter...
      http://www.gentoo.org/news/en/gwn/20040830-newsletter.xml

  46. Re:Makes more sense for proprietary operating syst by Anonymous Coward · · Score: 0

    I run (too) many servers and in fact there is no distribution that ships the configurations we require, even excluding cases where we are running modified codebases. patching and rebuilding is trivial, supporting idiot users is not.

  47. Re:Its well known by Stevyn · · Score: 2, Insightful

    While it's more difficult to set up a system with Gentoo than Windows 2000, it's easier to maintain.

    This is probably because of portage. Precompiled packages coming from all different sources can be a bitch to maintain. Mandrake is my example for this if you ever want to update a package they don't have RPMs for. And as for compile time, I'd rather let the computer sit for an hour or two overnight compiling a huge package than having to deal with the dependencies myself.

  48. Patches? by dedazo · · Score: 1

    What patches???

    --
    Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
  49. The technology is already here by Anonymous Coward · · Score: 0

    With xdelta the computer science part is already done. Just throw an uncompressed version of the old and new packages at it, and add that difference file to ftp://update.$DISTRIBUTION.org's overburdened harddisk. Apt-emerge now only has to reconstitute the already installed package (maybe tricky), or at least insist on the package from the install disk. Rub the two files together with xdelta, maybe run a checksum, and you have your new package.
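
    The workflow described above (diff the uncompressed old and new packages, ship the difference, rebuild, checksum) can be sketched in a few lines. This is a toy, not xdelta: it uses Python's difflib to find matching runs, and the copy/add instruction format, function names and sample "packages" are all made up for illustration.

```python
import hashlib
from difflib import SequenceMatcher

def make_delta(old: bytes, new: bytes) -> list:
    """Encode new as copies of ranges of old plus literal new bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # offset and length in old
        elif j2 > j1:
            ops.append(("add", new[j1:j2]))     # replace/insert: ship the bytes
        # a pure delete needs no instruction at all
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the installed old file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

old_pkg = b"#!/bin/sh\necho version 1.0\nexit 0\n" * 20
new_pkg = old_pkg.replace(b"1.0", b"1.0.1", 1) + b"# security fix\n"

delta = make_delta(old_pkg, new_pkg)
rebuilt = apply_delta(old_pkg, delta)

# "Maybe run a checksum": compare against the digest the mirror would publish.
checksum_ok = hashlib.sha256(rebuilt).digest() == hashlib.sha256(new_pkg).digest()
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "add")
print(checksum_ok, literal_bytes, len(new_pkg))
```

    The only data that has to travel is the literal bytes plus a handful of copy instructions. difflib is far too slow for real packages, which is where purpose-built tools come in.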

  50. Doctoral thesis by Noah+Adler · · Score: 0, Troll

    In order to avoid to reduce patch sizes
    Maybe you should've done a doctoral thesis on reading over what you write before submitting it to Slashdot.

  51. Re:Is this an Issue? by ticktockticktock · · Score: 1

    Also, don't forget what you have to use (dial-up) when your cable does go out. It once took a full week after my cable went out for a cable repair guy to come out and fix the problem.

  52. Another problem by Captain+Tripps · · Score: 1

    That's not the only problem with the name. Some x86 kernels have a `make bzimage` build option to build a special big zipped image that does tricks to get around BIOS memory limits. But someone working on the m68k port, which doesn't have the limitation, thought that bzimage meant bzip'ed image, and so for a while the m68k kernel had the option of using a superior compression scheme in its bootloader. I think they finally removed it, though.

  53. Bah by Anonymous Coward · · Score: 0

    SuSE has patch.RPMs

  54. Re:Its well known by Anonymous Coward · · Score: 0

    There is one source for Windows security updates, it's called Windows Update, and it takes care of dependencies too. And it installs right when it's done downloading, no need for compiling, and it downloads quickly due to the technology described in the original article. No need to leave your production servers up with known, exposed security holes while compiling the update that would fix it; as soon as the patch is out, you can install it and you're good to go.

  55. License of BSDiff by gnuman99 · · Score: 3, Interesting
    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't.

    Especially since the license of bsdiff is not even close to a BSD license (don't let the name of BSD Protection License fool you). Unless the license is changed to something like BSD, BSDiff is not going to be implemented anywhere except in closed source software. Debian cannot even package this software because it is non-free.

    I guess the bottom line is if you want to have something accepted in open source *and* in proprietary software, you want to license under BSD. You want to cater to one group (closed source in this case), you will lose the other.

    1. Re:License of BSDiff by SnowZero · · Score: 1

      Excellent post, I wish I had mod points.

      It's pretty strange, as the author must really hate the GPL derivatives even though he doesn't have beef with proprietary companies doing the same thing and not releasing any source at all. If he really likes OSS but not the GPL, he should just use BSD with the advertising clause. Good enough for most OSS distributions, but not GPL compatible.

    2. Re:License of BSDiff by Haeleth · · Score: 1

      The license of bsdiff, inconvenient as it is, is irrelevant to the discussion at hand, for the simple reason that bsdiff is not the only binary diff utility in the world. In my personal experience, xdelta normally does nearly as good a job, and that's GPL - and already available on most Linux systems. And doesn't plain old rsync use delta compression anyway?

      There are also various other alternatives, like jojodiff (also GPL), which I prefer over xdelta because it's more portable, simpler to incorporate into another application, and also produces marginally smaller deltas on the specific data sets I tend to be modifying.

  56. Re:Its well known by Stevyn · · Score: 1

    While windows update is generally good, one can see with SP2, which is part of windows update, that it can break your computer. If this happened in linux, you might have a chance of fixing it. If this happens in windows, your chances of fixing it are lower because there is less free support. In effect, your system becomes a hostage.

  57. Use a GUI tool by JThundley · · Score: 1

    Just use a simple gui tool like Ark or probably file roller to do it. Or write a simple shell script and call it "extract"

    #!/bin/bash
    # Quote "$1" so file names containing spaces survive word splitting.
    TYPE=$(file "$1")
    if echo "$TYPE" | grep -q bzip && echo "$TYPE" | grep -q tar; then
    tar xvjf "$1"
    else if....
    fi
    Simple stuff :)

  58. Dependancy Hell by Air-conditioned+cowh · · Score: 1

    Someone correct me if I'm wrong. But isn't the reason RPMs are so particular about dependencies that whoever does the packaging doesn't research whether their app will actually work with an _older_ version of a distribution? If it did work, they could define a broader set of other packages it would work with in the spec file.

    Other than the RPMs needlessly not installing in older environments, applications like urpmi, yum, yast and redcarpet take care of other dependencies painlessly.

  59. Sun, are you kidding me? by Ayanami+Rei · · Score: 1

    Sun just sends out tarballs with funky headers, for both installation packages and patches thereof.

    They only replace individual files, never binary diff.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  60. Fedora Core 3 DVD ISO could use this by linuxguy · · Score: 1


    The current Fedora Core 2 DVD ISO is about a 5GB beast. It would sure be nice to only have to download a 500MB Fedora Core 3 DVD ISO patch when FC3 does become available later this year.

    1. Re:Fedora Core 3 DVD ISO could use this by JoeBuck · · Score: 1

      You should be able to use apt or yum to upgrade from the test release to the final release, and the Fedora team should be testing to assure that this flow works.

    2. Re:Fedora Core 3 DVD ISO could use this by linuxguy · · Score: 1

      Most all packages are going to be updated so you're going to end up downloading it all in the end. While yum/apt-get serve a purpose they don't save you a whole lot of bandwidth when upgrading your entire distro.

      Not to mention that I would like to have a DVD of the newly released distro to install on new computers and lend to friends and co-workers.

  61. Gentoo has the best package management yet! by commonloon · · Score: 0

    Having used Redhat AS, etc., Fedora, and SuSE, and even a while back BSD... I am so incredibly impressed w/ Gentoo

  62. Re:Its well known by Anonymous Coward · · Score: 0

    Why don't you tell your boss that you have left systems up in production use for days at a time with known security exploits on them?

  63. rzip by MikeCapone · · Score: 1

    Or even better, rzip.

    You can set it to have a buffer of up to 900 megs, as opposed to bzip2's 900k. So instead of looking for redundant information in small blocks of 900k, it looks for it in everything you compress (up to 900 megs).
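
    The block-size point can be seen without installing rzip. A rough sketch, assuming Python's lzma module (default preset, 8 MiB dictionary) as a stand-in for a long-range compressor; it is not rzip and uses a different algorithm, it just has a window much larger than bzip2's 900k block. The test data is 1 MB of random bytes followed by an exact copy, so the only redundancy sits a megabyte away.

```python
import bz2
import lzma
import os

chunk = os.urandom(1_000_000)   # incompressible on its own
data = chunk + chunk            # the repeat starts 1 MB after the original

bz_size = len(bz2.compress(data, 9))   # level 9 = 900k blocks, the maximum
xz_size = len(lzma.compress(data))     # default preset, 8 MiB dictionary

print("input:", len(data))
print("bzip2:", bz_size)   # close to the input size: the repeat spans blocks
print("lzma: ", xz_size)   # roughly half: the second copy becomes back-references
```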

    And surprisingly, I haven't found it to be noticeably slower than bzip2, even on my ancient hardware (the only thing is that if you want to use it to its full potential, you need a lot of ram, but it'll work anyway without that.. just slower).

  64. Linux MS SP2 and = to FreeBSD already. by Anonymous Coward · · Score: 0

    Because it's retarded.

    Think about it. If you have to download binary diffs, then each update and each patch needs to be applied incrementally. And even if it doesn't, I still end up using lots of third party software that is available through things like apt-get but not necessarily through official channels.

    For instance I use Fedora with Dag's add-on RPMs. By updating my OS against both repositories I get the best of both worlds. I get easy reliable software from Fedora and all the extras that you don't normally get like libdvdcss from Dag.

    Now, as part of the update using apt-get against Dag, I updated APT itself.

    Now imagine that Fedora releases a binary diff against the official apt, and it updates against my apt-get from dag.

    I'd be SCREWED!!!

    When you update a Linux distro you're not just updating system files.

    When Windows does an update it's only against the core OS. That makes diffs manageable, but if you're running Apache on W2k no Microsoft patch for it will ever exist.

    But Linux updates both core system AND all applications!

    The way of using Apt-get, Yum, Portage is MUCH superior to what any vendor out there can possibly provide using closed source software.

    With RPMs I can download them automatically using Apt/Yum. I can burn them to a CD. I can skip revisions when updating. I can choose what to update and what not to, I can update against 3rd parties, I can set up my own mirror with a simple ftp site for a business campus.

    Whatever.

    These diffs may sound like a good idea, but would make things an unmanageable mess. The developer's time and effort is worth a lot more than shaving 10 minutes off of a download.

    Especially with Linux we actually have a REAL MULTITASKING OS. Which Ms still doesn't have.

    I can update my OS + play games + browse the internet + compile code + whatever without ever having to reboot (except kernel updates) or even stop what I am doing. Hell I can have it running in the background and not even give a shit about when it updates, really.

    What we have is not nearly as time-critical as the hell you have to go through when patching a MS OS.

    FreeBSD may do it, but it is a much more tightly controlled environment than Linux is.

  65. FreeBSD support not official by IronChef · · Score: 1

    The FreeBSD link looks like some dude's pet project. Cool, but it is not the official method for distributing patches.

  66. Why do they make it sound new? by mr_zorg · · Score: 2, Insightful
    ...have started to use delta compression (also known as binary diffs...
    Why does the poster make this sound like a new technology? And why does one of the high ranked comments link to a Microsoft tech note from 3/04 talking about this new thing called Binary Diff Compression?

    What's so new about it? I remember working with InstallShield, RTPatch, and others, way back in the Windows 3.11 days... New? <yawn>

  67. this gives me an idea... by null-sRc · · Score: 1

    I see some people talking about linux isos being so huge, and patches and all;

    so it brought me to think on something I saw at a .net conference...

    the guy had an app check for an updated dll on a server each time it was used, so that it automatically patched itself...

    so in taking this one step further, take an office suite for example: i never use most of the features (and i know ms office has that install on first use option) why not extend this idea to all software, but have it dl itself... this would rid users of the typical bloat for unused features while keeping downloads and updates very modular :D

    think of it like a page table or something, you run word.exe... is it on the computer? no (miss) check the online location and download... now inside word user clicks insert word art.. is it on computer? no (miss) download from remote path and cache it

    user clicks spell check .. spell check on computer? yes! check version with online version, is there update? no! keep using this one

    that kinda thing. sounds fun! and complicated to implement :|

    but really a tech support department's dream since all you have to say for a user to patch is: connect to the internet

    --
    -judging another only defines yourself
    1. Re:this gives me an idea... by Minna+Kirai · · Score: 1

      think of it like a page table or something, you run word.exe..

      Have you used Microsoft Office lately? It already attempts this, and it's poorly implemented. You frequently get a message "Office setup needs to install new components" when you click on a certain feature.

      It frequently gets broken so you have to go through the install progressbar each time you restart the program. (this was on slashdot 10 days ago)

    2. Re:this gives me an idea... by null-sRc · · Score: 1

      like i said:

      (and i know ms office has that install on first use option)

      and yes it's poorly implemented right now, and not to the extent i was thinking.

      their implementation is cd based only; and for install only; i was thinking move that idea to the online world, as well as for each use of a component it checks for an update.

      this could reduce the base install to be very fast, as well as very small.

      then everything should basically load from an online location for the first time (probably not the fastest right now, but with 30mbps fttp available soon, why the hell not :))

      --
      -judging another only defines yourself
    3. Re:this gives me an idea... by JamesGecko · · Score: 1

      Knowing Micro$oft, this would be slow, hard to use and bloated.
      It would be awesome in something like open office, however, provided that they did it the right way.

  68. -1, 0s on comments in own story -nt by Anonymous Coward · · Score: 0

    no
    text

  69. Patches are always deltas by Anonymous Coward · · Score: 0

    Patches are by definition deltas (== a difference between two things, comes from physics).

    Vendors only mis-used the term for whole-file downloads/updates.

  70. Oh, please... by UranusReallyHertz · · Score: 1

    I've downloaded huge files including SP2 over a shitty rural modem connection (right now I'm at 28.8. Sometimes, when the planets are in the correct alignment and Satan accepted my sacrifice, I even connect at 41.1. Sadly, it actually does feel noticeably faster). The key is patience! A modem is a very good teacher of patience.

    --
    Smoking is an expensive, slow, and unreliable method of suicide.
  71. Won't happen on Linux by Donny+Smith · · Score: 1

    I don't think it will happen on Linux for the reason that it is "too free".

    o Gentoo - builds from sources so you can't ship binary diffs
    o RPM based - symlinking and nature of open source (lots of individuality between systems running the same version of OS; such as workarounds and such)
    o APT-GET - similar to RPM
    o Others - wouldn't know but it just doesn't sound feasible

    Some may call this insignificant but when you have to patch a kernel for a vulnerability then every minute could be important. Downloading a 30MB RPM to hundreds of systems, I don't like that... Well, binary diffs are definitely a better idea.

    On Windows this comes in extremely useful with anti-virus definitions - I know Norton used to have huge downloads whereas Sophos used binary diffs that would significantly shorten exposure to new viruses (especially for big corporations with hundreds and thousands of desktops).

    On Linux it's certainly possible but because of the way it is, it may take years before that can be done reliably...

    That's a standardization advantage that closed and semi-closed OS'es enjoy.

    1. Re:Won't happen on Linux by Anonymous Coward · · Score: 0

      Years? You are so wrong about gentoo:

      http://www.gentoo.org/proj/en/glep/glep-0025.html

      You can binary patch the tarball, then unpack it. This in fact makes it fairly easy to implement in gentoo.

    2. Re:Won't happen on Linux by Anonymous Coward · · Score: 0

      I think you misuse the word "standardization". (Is Microsoft offering this delta compression as a documented standard, or even providing API hooks on just their specific platform for anyone but themselves?) Methinks you meant to use "proprietary". But anyhow.

      I don't use binary kernels, so whenever I need to do a security update, I use the source patches, which are probably much smaller than a binary patch could be (since the compilation process tends to scramble data around).

      I think binary patches would be a godsend for Windows software (every patch I've ever downloaded seem to have consisted of 100's of MBs of files I already had, but were slightly tweaked in a few places), but I think you're overestimating the security implications of all this. Over a fast connection, the difference between downloading 1 MB and 10 MB or even 100 MB of data isn't that huge. Over a slow connection, nothing will save you! Bwahahahaha...

    3. Re:Won't happen on Linux by Anonymous Coward · · Score: 0

      Incidentally, any exploit will probably be much smaller than any binary patch, too. Just a few hundred bytes for root shell. Let's see any delta compression algorithm beat that.

  72. Re:Its well known by LibrePensador · · Score: 1

    What in god's name are you talking about? If Mandrake does not have a package,you build one.

    That is the proper way to maintain an rpm distribution. And it would serve you well to learn the power of urpmi.

    By the way, when was the last time that you failed to find a package for Mandrake? Put together contrib, plf, elsac, just to name a few, and you have thousands of packages available.

    Do you need to berate other distributions to feel better about whatever it is you run, Gentoo in this case?

    --
    Pragmatism as an ideology is not particularly pragmatic in the long term. Keep it in mind when you dismiss Free Software
  73. Its done already by lonemonk · · Score: 1


    Some distros do already. I hope the expectations are better than a factor of fifty, cause all largish systems are patching out of control.

  74. Gentoo now has "source delta's" reducing traffic by rigolo · · Score: 3, Interesting
    Well, gentoo is known for the fact that you download the source of every program and then start compiling. These sources are distributed in .tar.gz or .tar.bz form and can be very large. A version change (even a change from .0.0.1 to .0.0.2) has its own tarball and therefore is downloaded again completely. But the real changes between these 2 can be small.

    Enter "deltup", a tool that looks at two tarballs and gives you a diff between the 2 that you can use to "transform the old tarball to an exact copy of the new tarball"; it even preserves MD5 checksum compatibility. Now some enterprising gentoo user has created a "dynamic deltup server" that automates the creation of these delta files, and people can reuse the delta files that other people used.


    Using this technique in combination with gentoo portage, people can reduce their traffic by 75% on average.


    Have a look at the following URL's for more information:

    http://forums.gentoo.org/viewtopic.php?t=215262

    http://linux01.gwdg.de/~nlissne/deltup-status.atime.html


    Rigolo

  75. Re:Its well known by Anonymous Coward · · Score: 0

    Well you know *snort* us gentoo users are just the most 3733t mother fuckers on the planet.. we just have to say gentoo rules everything so we can remind you stupid fuckers we took days to compile all of our software from the source! Becuase it is faster and you are slow.

    urpmi is stupid emerge rulez. urpmi on;y installs binaries emerge can take days to make a binary file

    get with the program get with gentoo, but by that time i will move to hurd, just to be a bad ass like i am now saying that i run gentoo...

  76. Binary Diff Patching in Windows by Anonymous Coward · · Score: 0

    is going to present Microsoft a lot of problems in the future I think. All it takes is for some virus/malware to get funky with a file that is targetted by a patch and the patch application will fail.

    Think about it:

    step 1. a worm gets hold starts taking over blaster style - and changes the file used to break into the system in random ways

    step 2. microsoft releases a binary difference patch to correct the issue

    step 3. patch fails because target file does not match original hash

    step 4. virus continues unabated

    step 5. ???

    step 6. profit?

    I have already played around a lot with the tech MS is using to do this - it is built into Windows Installer - and it is EXTREMELY finicky.

    SP2 supposedly includes Windows Installer 3.0 - maybe things are different there I haven't checked yet.
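
    Step 3 above does not have to be a dead end. A minimal sketch of one way an updater could handle it, assuming it ships the hash of the base file the delta was built against: check the installed file first and fall back to a whole-file download on mismatch. The function name and sample data are hypothetical, and this is not a description of how Windows Installer actually behaves.

```python
import hashlib

def choose_update(installed: bytes, expected_sha256: str) -> str:
    """Return "delta" if the installed file is exactly the base the patch
    was built against, otherwise "full" so the whole file is replaced."""
    actual = hashlib.sha256(installed).hexdigest()
    return "delta" if actual == expected_sha256 else "full"

pristine = b"original system library contents"
tampered = pristine.replace(b"library", b"LIBRARY")   # what a worm might do
base_digest = hashlib.sha256(pristine).hexdigest()

print(choose_update(pristine, base_digest))   # delta
print(choose_update(tampered, base_digest))   # full
```

    The cost is that the vendor has to keep whole-file packages around as well as the deltas.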

  77. Factor of 50 ? by Ricx · · Score: 1

    According to the link http://www.daemonology.net/bsdiff/ you get binaries 50%-80% smaller - not 50 times smaller. Obviously 50% is still good ... :)

    1. Re:Factor of 50 ? by MarkByers · · Score: 1
      No - it says:
      bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta
      --
      I'll probably be modded down for this...
  78. Nope, not SuSE by Carl+T · · Score: 2, Informative
    The online update part of SuSE's Yast2 doesn't require you to not have modified any of the files it's upgrading, so it can't very well be a binary diff. And it takes forever for minor patches on big things, such as Mozilla or the kernel sources.

    Here's a snippet from a patch description used by the online update, to give you some idea of what it does:

    Filename: libsmbclient.rpm
    Label: Samba client library
    Series: i586
    Size: 689245 309267
    PatchRpmBasedOn: 2.2.8a-107
    PatchRpmSize: 689245 309244
    Buildtime: 1090501315
    BuiltFrom: samba-2.2.8a-220.src.rpm
    --

    This signature is not in the public domain.
  79. Diff for image compression? by UranusReallyHertz · · Score: 1

    MPEG basically makes use of this concept, only encoding the differences from frame to frame. I always wondered if this couldn't be used on a series of similar photos. Many model photoshoots tend to be a huge number of very similar photos. Could they not all be encoded as a series of diffs? Start with photo one, then store a diff between photo one and 2, then the diff between photo 2 and 3, etc. I would think this could have some really dramatic space savings. You maybe could even have the software automatically sort the pics, choosing the most similar photos to diff, but I assume this would require diffing every pic with every other, a nasty O(n^2) problem.

    --
    Smoking is an expensive, slow, and unreliable method of suicide.
    1. Re:Diff for image compression? by Anonymous Coward · · Score: 0

      All compression schemes use this principle; it's called "redundancy". LZ and the deflate algorithms make use of encoding the differences within and between blocks of data. JPEG and MPEG compare blocks within an image and a video stream. Delta compression is a generalized form of the LZ/deflate-type compression, providing an expanded set of operations, except it compares to pre-existing data (the source) in order to generate the updated data (the target).

      Now, MPEG is a lossy scheme based on signal processing techniques (Fourier transforms) and is almost completely unrelated (other than by the fact that they're both compression schemes) to delta compression, which is a lossless format (with as much as bit-level granularity).

      Doing a naive pixel-by-pixel comparison of two images or video frames would produce very poor compression results, as there usually just isn't that much similarity (GIF is a lossless image compression mechanism that makes use of LZW and run length encoding (RLE)). The high compression ratios on MPEG and JPEG (100-200 or more) are a result of fudging things a bit, but obviously this process can never reproduce the original signal exactly.
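
      The description above of delta compression as LZ/deflate that compares against pre-existing data can be tried directly with zlib's preset-dictionary feature. This is only a sketch of the idea, not how bsdiff or xdelta work; the sample data and the names make_delta and apply_delta are made up for illustration.

```python
import hashlib
import zlib


def make_delta(old: bytes, new: bytes) -> bytes:
    # Deflate `new`, letting the compressor back-reference `old` (the source).
    comp = zlib.compressobj(zdict=old)
    return comp.compress(new) + comp.flush()


def apply_delta(old: bytes, delta: bytes) -> bytes:
    # The receiver must hold the same `old` bytes to rebuild the target.
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(delta) + decomp.flush()


# 4096 bytes of deterministic, essentially incompressible sample data.
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
# The "new version" differs from the old one in only seven bytes.
new = old[:2000] + b"PATCHED" + old[2007:]

delta = make_delta(old, new)
plain = zlib.compress(new)  # ordinary compression, no reference to `old`
```

      With the old version as the dictionary, almost the whole target is encoded as copies, so the delta comes out far smaller than the plain compressed file. Deflate's 32 KB window limits this trick to small files.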

  80. You shouldn't have to get entire packages by Anomalous+Cowturd · · Score: 2, Interesting

    Apparently, most distributions have you download the entire package for each update, although there are efforts underway to break up sections a bit more (if I'm wrong, I apologize - I use BSD).

    It sounds like what's really needed is to build packages of just the updated files. The install manifest would just specify the files in the archive, so there shouldn't be any complaints about missing files. Or does that show my ignorance?

    Actually, if you wanted a more general scheme, the update server would build packages on the fly. The updater would send a list of files in a package to the server, which would return a set of files that need updating. You could use this to upgrade any system, regardless of distribution. You would just have to update the database for whatever to show that it was at the latest version.

    Yes, this would take up a lot more CPU time, and be pretty slow on response time, but the savings in bandwidth should be worth it. All the time the servers were waiting for the network card could be used compressing files.
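
    A rough sketch of that exchange, assuming the updater sends a hash per file rather than just file names (that detail, the SHA-1 choice, and the function names are assumptions, not part of any existing update tool): the server compares the client's manifest with the latest files and returns only what differs.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def files_needing_update(client_manifest: dict, latest_files: dict) -> dict:
    """Return {path: content} for files the client lacks or holds a stale copy of.

    client_manifest maps path -> hash of the client's copy;
    latest_files maps path -> current content on the server.
    """
    return {
        path: content
        for path, content in latest_files.items()
        if client_manifest.get(path) != digest(content)
    }


latest = {"bin/tool": b"tool v2", "etc/tool.conf": b"opt=1"}
client = {"bin/tool": digest(b"tool v1"), "etc/tool.conf": digest(b"opt=1")}

update = files_needing_update(client, latest)
```

    Here only bin/tool is sent; the unchanged config file never crosses the wire.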

    --

    Java: the bastard demon spawn of C++ and Ada

    1. Re:You shouldn't have to get entire packages by pe1chl · · Score: 1

      What you describe is like an rsync server that sends the correct files to the client. That can already be done, you just need someone to setup the rsync server.

      But indeed I have always wondered why I need to download an entire 25MB RPM for each and every security fix of some packages. Sometimes the fix is not even in the binary files, but in some config file or the installation script of the RPM!

      What you would expect is a small update, that depends on the original RPM (program-1.0) and upgrades it to a fixed program (program-1.0-2 or program-1.1). Installation of the fix RPM should apply the patch and change the RPM database to the new situation, just as if you had removed the old version and installed the new version from a big RPM file.

    2. Re:You shouldn't have to get entire packages by Anonymous Coward · · Score: 0

      Quite - the point is you want complete packages.

      It makes sense to manage updates as complete packages, and the Debian way makes far more sense than the Windows Service Pack route (especially when you had to keep reinstalling the SPs - Windows really needs decent modularity). "Patches" are an unnecessary headache.

      So it comes down to how the package repositories mirror themselves. For me this is a "black box" issue - I don't care for as long as it happens.

      Efficient would be nice, but I keep several Debian instances up to date over a 64kbps link, and if I was just to sync security updates it would be a lot less bandwidth hungry.

      Security is overplayed - if the vulnerability is so bad it could affect a properly configured box downloading a patch, then you need to get it via other channels anyway.

      Part of the driving force for the Microsoft way is that they maintain a centralised distribution system (presumably to spy on us, otherwise why would they object to filesharing distribution of patches?). Where the software is FREE, anyone can decide it needs a distributed, or peer to peer, distribution mechanism.

      I don't make source diffs of my updates, anyone who wants to get into such headaches is free to do so, it is free software.

      Besides, Microsoft patching has FAR bigger issues than the amount of bandwidth it takes up; it is just that the bandwidth costs Microsoft money, whereas crap user interfaces for patching cost their users time and money.

    3. Re:You shouldn't have to get entire packages by Proteus · · Score: 1

      I apologize - I use BSD

      I forgive you. Please come home now?

      Love,
      Tux

      --
      We may not imagine how our lives could be more frustrating and complex—but Congress can. – Cullen Hightower
  81. is there any real benefit? by borud · · Score: 2, Interesting
    Why does Linux need this? How many people have a connection which is so bad they really benefit from this?

    Sure it is always nice to have faster downloads. But is it worth the extra work involved in setting this up both at the distribution point and on the client side?

    I am not being rhetorical. I am just wondering.

    1. Re:is there any real benefit? by Anonymous Coward · · Score: 0

      I am on a connection right now for which this would be useful. My father only has dial-up and only uses the software that comes with the distro (either Mandrake or SuSE). Thus this would be great for him.

    2. Re:is there any real benefit? by pilybaby · · Score: 1
      How many people have a connection which is so bad they really benefit from this?
      According to Ofcom, in the UK, 15% of homes have broadband. That's 85% still using dial up or nothing at all.
    3. Re:is there any real benefit? by kcb93x · · Score: 1

      I happen to still be on dialup. Despite having been told by USWest (which was bought by Qwest ~5 years ago) back at a Minnesota State Fair in about '99 or so that I'd have DSL in 6 months. They just put in the local trunk line, which I can now shoot. But they still haven't gotten my line to be able to use it...And no, cable's not in my area (never will...houses are too far apart) and satellite's too laggy and expensive.

      So I'm on dialup.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    4. Re:is there any real benefit? by qcubed · · Score: 1

      There are plenty of people with dialup remaining.

      Also, I have two philosophical reasons for supporting this:

      1. Small is beautiful, particularly when it comes to any sort of download. Even on broadband, I would rather get a 4mb patch as opposed to a 8mb patch. Not only that, but a 4mb patch would be a bit more portable. Also, why spend 30 minutes to get a patch which will modify one program that I don't use all that much when a compressed patch that takes 5 minutes will do the trick?

      2. Linux is supposed to be "for the people". Open-source and all that. If it's free, for everyone, why not put the money where the rhetoric is, and do this for those even on dialup? Let others be exclusionary. Let open-source be open to everyone.

    5. Re:is there any real benefit? by lachlan76 · · Score: 1

      How many people have a connection which is so bad they really benefit from this?

      I do. Almost all my friends do. I think it's safe to say that most people would benefit from this.

  82. Re:Its well known by Anonymous Coward · · Score: 0

    WTF are you compiling on a server that takes days to finish? Odds are it'll be on a timescale of minutes/10s of minutes, not hours. Unless it's X or KDE or something, shit doesn't take that long to compile on a decent spec system.

  83. Why not using PAR2 for the updates? by Anonymous Coward · · Score: 0


    PAR is good for protecting your files, and as far as I understand PAR2 it could be used in the same way to update older file versions to new ones, right?

    http://parchive.sourceforge.net/

    that would save some bandwidth...

  84. Re:Is this an Issue? by MoogMan · · Score: 1

    More to the point... As this is an issue generally for just 56k users, the linux kernel would need (proper) support for modems (read:winmodems) before this would be as effective as intended.

    I like the sound of using delta compression for making an "upgrade" iso. Say I wanted to upgrade from fedora core 2 to fedora core 3, I could just download an upgrade iso at a fraction of the size etc.

  85. If I remember correctly, Novell used it ages ago by deniea · · Score: 2, Informative

    I'm quite sure Novell has been doing it in the past. At least with the older versions of Netware (3.x and 4.x versions).

    You had the whole Novell NOS plus a couple of services in, let's say, 100MB or so, and you needed to update tons of NLMs. You just needed to download a quite small patch file (over a POTS line) that usually could fit on a floppy or so, and then it updated the loose NLMs.

    Nothing new I guess..

  86. Because it won't work by repvik · · Score: 1

    Yep. Gentoo can't use binary diffs, because they'll be guaranteed to fuck up the system.

    There are several reasons for this:
    1. CFLAGS. The users can adjust the optimizations of every binary compiled. Which means that *many* systems will not have identical binaries.
    2. Varying versions of gcc/binutils. GCC will produce different binaries depending on which revision you have used. So unless *everyone* uses the same version of GCC, you can be damn sure things will get stuffed.
    3. USE-flags. Every user can adjust the dependencies of every single package by adjusting the USE-flag. Which results in differences in the binary.

    Now, *source* patches would make a great difference. But at the moment, portage isn't intelligent enough to do this. I know there has been some discussion about this, and I was actually asked to submit this as a bug to bugzilla. I haven't done this yet (*blush*).

    I will submit it later today, unless it's already been done ;)

    1. Re:Because it won't work by WamBamBoozle · · Score: 1

      Yes. Source patches is what is needed. Silly me.

      For example, the set of patches needed to take us from mozilla 1.1 to mozilla 1.2 is much smaller than sending the entire source ball.

      Under /usr/portage there are a lot of bzipped patch files so clearly something like this is going on.

  87. Gentoo kinda has this by repvik · · Score: 1

    Of course not direct binary patches, since that'd be virtually impossible to implement on gentoo (Every system has different binaries).

    But *do* read this: http://forums.gentoo.org/viewtopic.php?t=215262
    ( downloads only a patch to the sources you already have)

  88. Re:Its well known by Mongo222 · · Score: 1

    Even if it did take "days at a time" to compile a simple security fix.. (which I've never seen take more than 30 minutes)

    It still beats the hell out of waiting months for Microsoft to get around to deciding to do something about the gaping security holes they have in their offering, and praying that they didn't create a bigger problem with the fix than the original problem.

  89. Re:use rsync (and apply it to mass patching..?) by teqo · · Score: 1
    Yes, rsync is very convenient for sync'ing systems and saving bandwidth. When applying this to large-scale xdelta patch distribution, how about having a client-server scenario which works as follows:
    1. Have the client read the package database and send the package inventory to the server
    2. Have the server create a representation of all the packages involved in the client's installation, as some kind of 'virtual filesystem snapshot'
    3. Have the server (virtually) apply the updates to the virtual image, getting an up-to-date image, and creating the xdelta between both
    4. Have the server send that delta and the client apply it, with some automagical final updates to the local package database on the client side

    While implementing this is nothing for the faint of heart and would suggest some huge resources on the server side at first sight, it would make xdelta updates possible, while taking care of all the different package combinations on all the possible systems to update.

    Now, would SuSE/Novell, RedHat or whoever please hire me in order to do this? ;)

    Any comments?

  90. Re:Is this an Issue? by teqo · · Score: 2, Informative
    Say I wanted to upgrade from fedora core 2 to fedora core 3, I could just download an upgrade iso at a fraction of the size etc.

    As somebody pointed out before in this article, there is rsync which minimizes transmitted data using some xdelta-like algorithm. This is not really new, and some sites offer anonymous rsync downloads for exactly this reason.

    (Rumours were that some people actually use rsync in the following way to get the latest Debian ISOs from a collection of old, already downloaded packages: They cat'ed all their packages together to one huge binary file and then ran rsync against the remote ISO image and that local file. Since most data was already in that file, only transmission of a few megabytes of new data and some data arranging had to happen....)

    Here I uttered a few quick&dirty thoughts (which most certainly somebody else has had before, as usual) on how rsync could help in mass patching, don't know if they are worth reading for you... .)

  91. My experiences by pilybaby · · Score: 1

    I did my final year project / dissertation on delta compression and created a java web service & GUI to allow the distribution of delta files that users could download and apply.

    It still requires a fair bit of work to make it very usable (hardly the best software engineering development ever), the Swing code is awful because I had to learn it in a week and it could do with some object serialisers on the data it returns. It worked ok though.

    If anyone's interested they can read my report [PDF] (2.3MB). The reason for doing that project was that it is a technology that was massively underexploited. It is quite limited for some things however, especially compressed archives and to a certain extent binary compiled files. However if you want to compress tars of source code it's brilliant and massively improves over zip technology.

    The package I used at the time was a java port of the xdelta project, javaxdelta. It had some bugs in it at the time however which meant that it didn't always work, I think from the discussion on their mailing list that they've been fixed recently. I don't think it's as fast as the normal C++ xdelta implementation and xdelta as an algorithm isn't as good as some others, notably vcdiff and zdelta (see Suel & Memon, Algorithms for Delta Compression and Remote File Synchronization (2002))

    I'm happy though, there may be some money making opportunities for my project =)

  92. Join the Cooker list and bug Warly by leonbrooks · · Score: 1

    I did something like this for 10.0 (I wanted the patch indexes to be incremental) but my lone voice wasn't enough. If other people join Cooker, link to the article, and express themselves it may come to pass for 10.2 (10.1 is too far down the pipe already).

    It's not as simple as it seems, 'coz those RPMs are bzip2-compressed internally, so a simple binary diff isn't likely to help much; they'll need to do a special stream of RPMs that are binary diffed before compression, which will probably require (more) surgery to RPM as well.

    --
    Got time? Spend some of it coding or testing
  93. Different forms of binary diffs by davidwr · · Score: 2, Informative

    Not all diffs are alike. The simplest diffs are literal - "find string of bytes to replace and replace them with newbytes."

    Back in 1985, Apple did something a bit more sophisticated for their System Update 2.0 patch.

    Their binary files were structured. The particular structure was called a "resource fork" and had a collection of hundreds or thousands of usually-tiny "resources" which could be individually modified. As a made-up example, replace String ID 50 in the file "System" to "Version 2.0" where it previously was "Version 1.0" or replace one linked-in graphic with another.

    The patch program updated, deleted, and added individual resources.
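
    As a rough illustration of that kind of structured patch (the in-memory layout here is invented for the example and is not Apple's actual resource fork format), a file can be modelled as a dictionary of resources keyed by type and ID, with the patch as a list of update, delete, and add operations:

```python
def apply_resource_patch(resources: dict, patch: list) -> dict:
    """Apply (op, key, data) operations to a {(type, id): bytes} mapping.

    The input mapping is left untouched; a patched copy is returned.
    """
    updated = dict(resources)
    for op, key, data in patch:
        if op == "update":
            if key not in updated:
                raise KeyError(f"cannot update missing resource {key}")
            updated[key] = data
        elif op == "add":
            if key in updated:
                raise KeyError(f"resource {key} already exists")
            updated[key] = data
        elif op == "delete":
            del updated[key]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return updated


# Hypothetical contents of the file "System" before the update.
system = {
    ("STR ", 50): b"Version 1.0",
    ("STR ", 51): b"obsolete string",
    ("ICON", 1): b"old icon bits",
}
patch = [
    ("update", ("STR ", 50), b"Version 2.0"),
    ("delete", ("STR ", 51), None),
    ("add", ("ICON", 2), b"new icon bits"),
]
patched = apply_resource_patch(system, patch)
```

    The patch carries only the three touched resources, however large the rest of the file is.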

    This is important for historical reasons:
    If Microsoft or anyone else gets the funny idea to patent "replacing parts of a structured file in an update mechanism" as a broad-scope patent and the USPTO grants the patent [which they probably will, out of ignorance], the patent will need to be challenged and narrowed significantly.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:Different forms of binary diffs by Anonymous Coward · · Score: 0

      Not only that, but the technique is fairly obvious to any practitioner familiar with the current state of the art, or whatever the gibber-jabber legalese is.

      For example, any database system performs these operations of updating specific parts of a structured file based on some sort of key mechanism. (Update record 52 with this new value...) I use the term database loosely here; such techniques are the basis of any file system or decent binary file format. (There are many crappy binary file formats which are nothing more than bit soup, but the good ones are broken down into possibly random accessible records and subrecords with well-defined structures.)

      I hadn't heard that individually updatable subunits was one of the benefits of Apple's resource fork system, but it was truly ingenious of them to design it right into the basic file format of their operating system. I've always admired the original Macintosh OS's design, even if some of it is somewhat primitive by modern standards (which is naturally understandable).

  94. Re:Frist Psot! by Anonymous Coward · · Score: 0

    Senator Frist, your Psot would be more "Fristy" if you weren't so damn slow.

  95. Re:Its well known by Stevyn · · Score: 1

    I'm not berating Mandrake. I've used both Mandrake and Gentoo and I'm giving my personal opinions on their packaging systems. I had times with mandrake when I couldn't find precompiled binaries for packages and I had to go back to hunting down dependencies and dealing with "./config" errors.

    I've found that portage offers a lot more packages that are more up to date.

    My concern is people hear mandrake is the easiest distro, try it out, and go back to windows prematurely because they figured if they had trouble with mandrake, they'd surely have more trouble with other distros. I just want people to experiment around. I did, and I feel my opinions on the matter could help people out if they were frustrated with mandrake as I was.

  96. RPM and binary diff by Alan+Cox · · Score: 1

    RPM can support this. You need to package the rpm with rsync friendly gzip then on the target box assemble the bits you have on disk from the original and rsync the two. That's CPU-costly for the server end unfortunately.

    Rusty did a talk on the same things with dpkg a few years ago using rsync friendly dpkg formats to cut down international costs for Debian mirrors in AU

    1. Re:RPM and binary diff by cperciva · · Score: 1

      RPM can support this. You need to package the rpm with rsync friendly gzip then on the target box assemble the bits you have on disk from the original and rsync the two.

      As I've noted several times already, rsync does a really horrible job of synchronizing compiled binaries. Your rsync-friendly-deflate will allow you to effectively avoid retransmitting unmodified files (which, as also noted here, SuSE achieves simply by creating "patch" rpms with only the modified files), but it won't help you take advantage of the similarity between the old and new versions of any individual binary.

  97. gzip can be made to work better with rsync by Sits · · Score: 1

    One reason could be that gzip can be made to produce rsync friendlier archives.

    Often, changing even a small part of an archive will change it sufficiently that rsync is forced to retransfer the entire lot. However, there is a patch for gzip (which I believe is in Fedora Core 2 and is scheduled for mainline gzip inclusion too) that produces rsync friendly archives at the cost of slightly larger archives.

  98. Prior Art by Anonymous Coward · · Score: 0

    I already do this. The utility I use to apply the patches is called "make".

  99. Sounds like *BSD Ports to me by nurb432 · · Score: 1

    Or that gentoo-emerge thingie.. But isnt that only source?

    Redhat's updater does that too.. and i know its binary..

    But the point is that concept of upgrading 'parts' is already here..

    --
    ---- Booth was a patriot ----
  100. Something to try by slashdotjunker · · Score: 2, Interesting
    Something that might be useful to try out is a patch method I developed for a MMORPG. The problem is a challenge since MMOs are constantly updating game files. Also, many of the files are very large art resources which might only change a small percentage from patch to patch. On top of this, players are always messing with their files so you can't assume that any file is in any known state on the basis of some "patch level".

    The method I came up with was to use essentially the rsync algorithm, but I reversed it so that the computationally expensive parts were performed on the client side. The results of each computation were stored on the server side as a "patch" so that the computation was performed only once. This provided a patch system that was dynamic but without generating large server load.

    The advantages are:
    1. Patches are generated dynamically so the files can be in any state (truncated, too big, filled with garbage, missing entirely, etc).
    2. The heavy computation is performed on the client side so that patch generation does not drive up server load.
    3. Computed patches are stored and reused, so a database of patches is quickly built up.
    4. Patches are efficient (based on binary diff).

    A detailed example follows; knowledge of rsync is required for understanding.

    For example, let's suppose you are releasing a new content upgrade. A particular file's signature has changed from F4A3 to 26B1. (For brevity I am using a 16-bit signature, in practice it is much much larger.) When the first client connects to the patch server it receives the updated list of file signatures. The client notices that the file is now old so it requests a patch for F4A3->26B1. The patch does not exist yet, so the reverse-rsync algorithm activates and the client calculates the F4A3->26B1 patch. When that patch has been generated it is returned to the patch server and all future clients can just download the patch and skip the reverse-rsync. After applying a patch, the result is checked to make sure that you actually ended up with 26B1. If you did not, extra rounds of patching are performed.

    Some Notes: These extra rounds are consequences of rsync and a security check as well to prevent bad patches from being uploaded to the server. And, normally the release maintainer would "pre-seed" the patch server by patching a clean current version to the new version just before release.
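
    A heavily simplified sketch of the flow described above, with many assumptions: fixed 4-byte blocks, SHA-256 for every signature, a naive scan of every offset where real rsync would use a rolling checksum, an exception where the description uses extra rounds, and class and function names that are all hypothetical. The first client computes the old-to-new patch itself; later clients with the same old file reuse the stored one.

```python
import hashlib

BLOCK = 4  # tiny block size, chosen only to keep the example readable


def sig(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class PatchServer:
    def __init__(self, new_file: bytes):
        self.new_sig = sig(new_file)
        self.new_blocks = [new_file[i:i + BLOCK] for i in range(0, len(new_file), BLOCK)]
        self.block_sigs = [sig(b) for b in self.new_blocks]  # cheap to precompute
        self.patches = {}  # (old_sig, new_sig) -> stored patch


def compute_patch(old: bytes, server: PatchServer) -> list:
    """Client-side work: find which blocks of the new file the old file already holds."""
    have = {}
    for off in range(len(old)):  # naive scan of every offset
        have.setdefault(sig(old[off:off + BLOCK]), off)
    patch = []
    for block, s in zip(server.new_blocks, server.block_sigs):
        if s in have:
            patch.append(("copy", have[s], len(block)))
        else:
            patch.append(("data", block))  # only these bytes need downloading
    return patch


def apply_patch(old: bytes, patch: list) -> bytes:
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            _, off, n = op
            out += old[off:off + n]
        else:
            out += op[1]
    return bytes(out)


def update(old: bytes, server: PatchServer):
    key = (sig(old), server.new_sig)
    patch = server.patches.get(key)
    reused = patch is not None
    if not reused:
        patch = compute_patch(old, server)
    result = apply_patch(old, patch)
    # Check the result before trusting or storing the patch. The scheme
    # described above runs extra rounds here; this sketch simply gives up.
    if sig(result) != server.new_sig:
        raise ValueError("patch did not produce the expected file")
    server.patches[key] = patch
    return result, reused


server = PatchServer(b"hello world, new art!")
first = update(b"hello world, old art", server)   # computes and stores the patch
second = update(b"hello world, old art", server)  # reuses the stored patch
```

    In a real deployment the server would have to re-verify an uploaded patch itself rather than trust the client's check.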

    1. Re:Something to try by Anonymous Coward · · Score: 0

      That's an interesting approach, using caching techniques and client-side processing to reduce server load. On the other hand, I could see how it'd be easy for a malicious user to force a huge number of updates by manipulating their files to force a re-patch operation, quickly taking up an ungodly amount of disk space on the server.

      I'm also assuming the 32-bit checksum was just for your example, since it's obviously insufficient if you want to prevent the upload of bad patches. A strong cryptographic hash like SHA-1 would be necessary. I'm not even sure if MD5 would be good enough, since it's vulnerable to a birthday attack, and the ability to generate an unlimited number of patches to search for collisions would make that easier. Granted, you probably wouldn't be able to sneak in a malicious payload, but you could probably find a corrupting payload. In fact, it'd be inevitable if only hashes were checked, since hashes have a large number of collisions by nature.

      That's the whole problem with doing these calculations client side... the problem of corrupted assets is probably small enough that it's probably not worth worrying about, or caching the necessary patches. Patching between multiple version steps would probably make more sense, but it'd be better to adapt the algorithm to patch several times in a row instead.

    2. Re:Something to try by Anonymous Coward · · Score: 0

      Whoops, 16-bit checksum. I knew I was thinking that was a 16-bit checksum, but I fooled myself into thinking that it was a 32-bit checksum. In that case, it's probably not even good enough for a single large file. CRC-32 (used by Ethernet and ZModem) is generally considered the minimum for the large data transfers of today.

    3. Re:Something to try by slashdotjunker · · Score: 1
      ... I could see how it'd be easy for a malicious user to force a huge number of updates by manipulating their files to force a re-patch operation, quickly taking up an ungodly amount of disk space on the server.
      Ungodly? The size of the patch depends on the size of the binary diff. If the patches are very large then maybe direct download would be easier than using a differential patch system. Also, it would be quite trivial to stop an attack like this.
      A strong cryptographic hash like SHA-1 would be necessary.
      I don't think you really understand the rsync algorithm. Andrew Tridgell has an excellent discussion on signature size in section 4.1 of his PhD thesis. You should estimate the correct size for yourself since it depends on the size of your files. BTW, Tridgell asserts that 32 bits of strength is enough for files up to 64GB in size.
      Patching between multiple version steps would probably make more sense, but it'd be better to adapt the algorithm to patch several times in a row instead.
      I thought I mentioned this in my original post. A single pass is sometimes sufficient but not always. You must be ready to handle multiple passes.
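
      For readers who have not seen it, the cheap half of rsync's signature is a rolling checksum that can slide along a file one byte at a time. The sketch below follows the two-sum form from the rsync technical report as best recalled; the function names are made up, and the strong per-block hash that rsync pairs with it is left out.

```python
M = 1 << 16  # both running sums are kept modulo 2^16


def weak_checksum(block: bytes):
    """Compute the (a, b) pair for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int):
    """Slide an n-byte window one position: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = b"delta compression for linux security patches"
n = 8

# Roll the checksum across every window position...
a, b = weak_checksum(data[:n])
rolled = [(a, b)]
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    rolled.append((a, b))

# ...and compare with recomputing each window from scratch.
direct = [weak_checksum(data[k:k + n]) for k in range(len(data) - n + 1)]
```

      The two lists agree, which is what lets a scan check every byte offset at roughly constant cost per step.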
  101. why not just tell it what to change? by aminorex · · Score: 1

    Rather than giving the computer the data, or even
    a delta patch from old data to new data, just tell
    it what 2k data blocks change to what new values,
    where the set of data block values is indexed by a
    hash and registration index. Store a map between
    and data blocks in
    DNS.

    OK, so I'm only half-joking.

    --
    -I like my women like I like my tea: green-
    1. Re:why not just tell it what to change? by Anonymous Coward · · Score: 0

      Other than the admittedly silly bit about storing this in DNS, this is essentially the rsync algorithm. Adding the silly bit about DNS, and you have what some people have been talking about doing with BitTorrent. I've been thinking about something along these lines for a UDP "spew" update system, where random UDP packets are fired off into the aether, and slotted into place at the receiving end... sorta shotgun approach to file transfers, rather than using TCP's incremental approach of send-send-send-retransmit-send-send-retransmit-retransmit-send...

  102. some Linux distros already do by Anonymous Coward · · Score: 0

    SuSE provides .patch.rpm files which are RPMs containing only the changed files in it. It's not a binary diff though.

  103. Re:Its well known by Anonymous Coward · · Score: 0

    If it's a security patch, it's probably a good idea, since compiling can take awhile and you don't want to leave your box on the net while there's a known vulnerability. On the other hand, you can just cross compile on another server (or better yet, your personal workstation, where the CPU time probably isn't needed and you can work interactively with the files quickly and offline), and upload the change later. Still doesn't eliminate the need to take the box down to keep out the script kiddies, but it does avoid the problem of loading down the server.