Slashdot Mirror


New Binary Diffing Algorithm Announced By Google

bheer writes "Today Google's Open-Source Chromium project announced a new compression technique called Courgette geared towards distributing really small updates. Courgette achieves smaller diffs (about 9x in one example) than standard binary-diffing algorithms like bsdiff by disassembling the code and sending the assembler diffs over the wire. This, the Chromium devs say, will allow them to send smaller, more frequent updates, making users more secure. Since this will be released as open source, it should make distributing updates a lot easier for the open-source community."

192 comments

  1. Google by Anonymous Coward · · Score: 0, Funny

    can suck my diff!

  2. dictionary by neonprimetime · · Score: 2, Informative

    courgette (kr-zht)

    n. Chiefly British

    A zucchini.

    1. Re:dictionary by koxkoxkox · · Score: 1

      it is from French. A cute word in French, but I can't imagine how it would be pronounced in English.

    2. Re:dictionary by 4D6963 · · Score: 1

      it is from French. A cute word in French

      Indeed, I couldn't help but lol when I first saw the name.

      --
      You just got troll'd!
    3. Re:dictionary by Anonymous Coward · · Score: 1, Informative
    4. Re:dictionary by Anonymous Coward · · Score: 0

      Who wouldn't want to get backdoored by a zucchini?

    5. Re:dictionary by MrMr · · Score: 2, Funny

      Stoopid Brits,
      Because the Americans pronounce the Italian word Zucchini flawlessly.

    6. Re:dictionary by Anonymous Coward · · Score: 1, Funny

      A faggot?

    7. Re:dictionary by Ed+Avis · · Score: 1

      A cute word in French, but I can't imagine how it would be pronounced in English.

      Think 'Corvette' and change the V to a soft J or zh sound.

      --
      -- Ed Avis ed@membled.com
    8. Re:dictionary by Big_Monkey_Bird · · Score: 2, Insightful

      zoo-kee-nee

    9. Re:dictionary by Anonymous Coward · · Score: 0

      Easy: check out its entry on dict.cc (an English<->German online dictionary), and click on the speaker icon in the leftmost column. They have both an automatically-generated computer version and a sound recording by a British user.

    10. Re:dictionary by 4D6963 · · Score: 1

      Thanks but in case you missed it I'm French, I know what that is.

      --
      You just got troll'd!
    11. Re:dictionary by youthoftoday · · Score: 1

      It's pronounced exactly the same as French. Only with a British accent.

      --
      -1 not first post
    12. Re:dictionary by xouumalperxe · · Score: 1

      Only if you read it "coor vet".

    13. Re:dictionary by 4D6963 · · Score: 1

      Yeah, I figured.. thanks for that?

      --
      You just got troll'd!
    14. Re:dictionary by Anonymous Coward · · Score: 0

      koos-jet

    15. Re:dictionary by TeXMaster · · Score: 1

      Stoopid Brits, Because the Americans pronounce the Italian word Zucchini flawlessly.

      So I guess we Italians are lucky that 'zucchini' is _not_ an Italian word (the Italian word being zucchina (s.) / zucchine (pl.), the latter being pronounced zook-kee-neh) ;-)

      --
      "I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
    16. Re:dictionary by TeXMaster · · Score: 1

      He said "would_n't_"

      --
      "I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
    17. Re:dictionary by daem0n1x · · Score: 1

      Coor jet

    18. Re:dictionary by MrMr · · Score: 1

      Sure it isn't dzook-kee-neh?

  3. uses a primitive automatic disassembler by flowsnake · · Score: 4, Insightful

    An interesting approach - I wonder if this would also work as well on compiled bytecode like .NET or Java uses?

    1. Re:uses a primitive automatic disassembler by hattig · · Score: 3, Insightful

      Why not just ship the affected .class files for Java? Disassemble the Jar that is being updated, replace/add/remove the class(es) as per the update instructions and rebuild the Jar.

      Chrome's problem is that a massive, bulky, chrome.dll file needs to be sent out with each update, and that it isn't easy to split it up into its constituent parts because it is lower level.

      It's not a new idea, but quite impressive that they've actually gone and worked it all out so that it is reliable. And nicer than patching via jumps (noop the old code, add a jump to new code, append new code to end).

    2. Re:uses a primitive automatic disassembler by Anonymous Coward · · Score: 0

      It's an interesting approach to a problem that should never have arisen. Why does everything have to be thrown into a 10MB dll? People are browsing on computers, not wristwatches, and there's no reason to abandon modular design to tangle things together for trivial performance gains. Google loves throwing its massive revenues at building and maintaining bad code that's obsessively optimized. Look at the unholy amounts of research and tricky implementation they put into Apps and Gmail and Wave.. inexplicably designing all their own widgets and developing the mind-bogglingly expensive Google App Engine into a Python- and Java- linkable library and stretching and extending the capabilities of Javascript and the DOM far beyond their intended purpose to get drag-and-drop, fancy transitions, live editing... all of this is standard fare in a desktop app but Google insists on coding it (inevitably ugly) to run in the browser. I should have expected the same thing from Chrome.

    3. Re:uses a primitive automatic disassembler by FunkyELF · · Score: 4, Insightful

      It's an interesting approach to a problem that should never have arisen. Why does everything have to be thrown into a 10MB dll? People are browsing on computers, not wristwatches, and there's no reason to abandon modular design to tangle things together for trivial performance gains.

      Just because the end result is a 10MB dll doesn't mean that the code and design isn't modular. Thats like saying Linux isn't modular because the live CDs / DVDs come in a single gigantic 4.7Gb .iso file.

    4. Re:uses a primitive automatic disassembler by Cyberax · · Score: 3, Interesting

      It would work fine, if you include Java/.NET specific disassemblers.

      In fact, you can already compress Java JARs about ten times by using pack200 algorithm (it works essentially the same way).

    5. Re:uses a primitive automatic disassembler by PCM2 · · Score: 2, Interesting

      An interesting approach - I wonder if this would also work as well on compiled bytecode like .NET or Java uses?

      I suspect that it would. I once heard Manuel De Icaza give a talk in the early days of the Mono project. He said the "bytecode" that came out of the C# compiler was analogous to the output that comes out of the front end of a C compiler. The JIT is the equivalent of the C compiler's back end, only it runs right before execution. I suspect that what Google's decompiler is doing is reverting the binary to something similar to the C compiler's internal representation -- and if so, this method would work pretty much the same for bytecode. But that's just a guess.

      --
      Breakfast served all day!
    6. Re:uses a primitive automatic disassembler by Imagix · · Score: 1

      If you disassemble and reassemble a jar file, then it would no longer be correctly signed.

    7. Re:uses a primitive automatic disassembler by pjt33 · · Score: 1

      If they ship the signature then the manifest can be updated. Shouldn't be a problem.

    8. Re:uses a primitive automatic disassembler by Grishnakh · · Score: 1

      Look at the unholy amounts of research and tricky implementation they put into Apps and Gmail and Wave.. inexplicably designing all their own widgets and developing the mind-bogglingly expensive Google App Engine into a Python- and Java- linkable library and stretching and extending the capabilities of Javascript and the DOM far beyond their intended purpose to get drag-and-drop, fancy transitions, live editing... all of this is standard fare in a desktop app but Google insists on coding it (inevitably ugly) to run in the browser.

      I think this is unfortunately completely unavoidable if you're going to run apps (like GMail) remotely, and within a web browser. Yes, it's a lot simpler to do this stuff in a desktop app, but Google doesn't have the luxury of making things like GMail a desktop app. A big part of the reason for using webmail in the first place is that you can run it from just about any computer, without needing to install special software. If you needed special software to run GMail, why even bother with it? You could just download Thunderbird or Mutt or whatever email client you like the best and download your email with POP or IMAP. People like webmail services like GMail because it frees them from being tied to using one computer for their email.

      However, running apps remotely doesn't require a web browser either. Unix users have been able to run apps remotely for almost 25 years now using the X Window System. However, that never caught on outside of Unix OSes and Linux, plus there's security issues. You can't exactly hop on your friend's XP box and run an X application from a remote server, unless he happens to have Exceed installed (for $$$).

      I do agree that modern web development is a nightmare, with PHP or other scripting engines, HTML, Javascript, etc. all mashed together to create a very unclean programming environment. It'd be a lot better if web development just went away altogether, replaced by something basically just like X, running apps remotely. Unfortunately, because of history, we're stuck with things the way they are.

    9. Re:uses a primitive automatic disassembler by youthoftoday · · Score: 1

      Yes but they'd have to assume that the head revision was exactly the same. What if people applied some patches but not others?

      --
      -1 not first post
    10. Re:uses a primitive automatic disassembler by turbidostato · · Score: 1

      "You can't exactly hop on your friend's XP box and run an X application from a remote server, unless he happens to have Exceed installed (for $$$)."

      Well, at least you can install the Cygwin environment for free, don't you?

    11. Re:uses a primitive automatic disassembler by 644bd346996 · · Score: 1

      That's a problem with or without digital signing.

    12. Re:uses a primitive automatic disassembler by disambiguated · · Score: 1

      I don't know about Java, but .net already has symbolic offsets, and so would not benefit from (or need) this. I'd guess that Java is the same.

    13. Re:uses a primitive automatic disassembler by WuphonsReach · · Score: 1

      You can't exactly hop on your friend's XP box and run an X application from a remote server, unless he happens to have Exceed installed (for $$$).

      Actually, you can. There do exist free X servers for Windows, see XMing.

      --
      Wolde you bothe eate your cake, and have your cake?
    14. Re:uses a primitive automatic disassembler by Grishnakh · · Score: 1

      Actually, you can't, unless your friend just happens to have one of these free X servers installed. How many Windows boxes have these things installed? Just having the software available for download somewhere doesn't cut it.

  4. Also less overhead for Google by Enderandrew · · Score: 4, Insightful

    Google has to pay the cost for maintaining servers and handling bandwidth for all the OS updates they push out. The more efficient they are in this process, the more money the save.

    The good news is that the same benefits could be applied to Red Hat, Ubuntu, openSUSE, etc. Lower costs helps the profitability of companies trying to make a profit on Linux.

    The end users also see benefits in that their packages download quicker. I'd be honestly pretty disappointed in any major distro that doesn't start implementing a binary diff solution around this.

    --
    http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
    1. Re:Also less overhead for Google by maxume · · Score: 2, Insightful

      Yeah, but it is probably more like unplugging wall warts than insulating your house.

      That is, it will be a hassle, be visible, and show a tiny benefit, all the while distracting from more worthwhile activity.

      --
      Nerd rage is the funniest rage.
    2. Re:Also less overhead for Google by pembo13 · · Score: 1

      > I'd be honestly pretty disappointed in any major distro that doesn't start implementing a binary diff solution around this.

      Fedora already has Presto using DeltaRPMs

      --
      "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
    3. Re:Also less overhead for Google by Ed+Avis · · Score: 2, Interesting

      Fedora is already using binary diffs to speed up downloading updates - see yum-presto. With a better binary diff algorithm, the RPM updates can hopefully be made even smaller. As the Google developers note, making smaller update packages isn't just a 'nice to have' - it really makes a difference in getting vulnerabilities patched faster and cuts the bandwidth bill for the vendor and its mirror sites. Remembering my experiences downloading updates over a 56k modem, I am also strongly in favour of anything that makes updating faster for the user.

      --
      -- Ed Avis ed@membled.com
    4. Re:Also less overhead for Google by Rayban · · Score: 2, Informative

      DeltaRPM uses bsdiff - an impressive but generic binary diff algorithm.

      Courgette is designed to replace this binary diff with one that understands compiled code well enough to optimize the binary diffs by a significant amount.

      --
      æeee!
    5. Re:Also less overhead for Google by mzs · · Score: 1

      Courgette uses bsdiff as well, after it breaks down into some parts that bsdiff better.

    6. Re:Also less overhead for Google by klui · · Score: 2, Interesting

      It's pretty cool they do this. As opposed to iTunes "updates" of 70+MB--really no updates at all since they're just the monolithic install package.

    7. Re:Also less overhead for Google by Anonymous Coward · · Score: 0

      Less overhead for Google but more overhead for everyone else. How much extra power is being consumed by making thousands, maybe millions of machines recompile code rather doing it a handful of times on Google's servers?

    8. Re:Also less overhead for Google by bill_mcgonigle · · Score: 1

      DeltaRPM also have to deal with non-code objects as well. I wonder how much of an average RPM is actually machine code.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    9. Re:Also less overhead for Google by Anonymous Coward · · Score: 0

      CPU is cheaper than bandwidth.

    10. Re:Also less overhead for Google by shird · · Score: 1

      I doubt this is of much concern to them. How many people will be running their new OS and doing updates, vs the bandwidth required for your average google maps session or you-tube video, done by many more people? Even the daily search results and ad serving to the content network would far exceed this.

      I'm pretty sure the amount of bandwidth would be trivial in comparison. It's not like Ubuntu or MS require people to re-download the entire OS to perform a patch.

      Besides which, with the network of mirrors they already have in place, they are quite capable of serving this at minimal relative cost.

      --
      I.O.U One Sig.
    11. Re:Also less overhead for Google by Enderandrew · · Score: 2, Insightful

      I don't run Ubuntu, but rather openSUSE. When I download updates (including weekly snapshot builds of KDE 4, Firefox and OpenOffice) I end up downloading around 3 gigs each time. That is practically the whole OS. Small binary diffs would make a huge difference here.

      I doubt I pull 3 gigs of bandwidth from all Google sites combined in a week.

      The other difference is that other Google sites generate revenue. The OS likely will not.

      --
      http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
    12. Re:Also less overhead for Google by xtracto · · Score: 1

      Google has to pay the cost for maintaining servers and handling bandwidth for all the OS updates they push out. The more efficient they are in this process, the more money the save.

      The good news is that the same benefits could be applied to Red Hat, Ubuntu, openSUSE, etc. Lower costs helps the profitability of companies trying to make a profit on Linux.

      The end users also see benefits in that their packages download quicker. I'd be honestly pretty disappointed in any major distro that doesn't start implementing a binary diff solution around this.

      Open source Binary diff/patch tools have been available for a while. However, the majority of the stupid Linux distributions (Ubuntu, Fedora, Mint, etc) keep sending me hundreds of Megabites for each *update* I do, because they send the whole programs to reinstall (on top of the others). That is not an "update" that is a reinstallation!

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    13. Re:Also less overhead for Google by Anonymous Coward · · Score: 0

      A correct analogy would be using a single switch (a hard to build switch, but easy to use after that) to flip off milliong of wall warts at the same time.

      And a small benefit times million makes plenty.

    14. Re:Also less overhead for Google by Anonymous Coward · · Score: 0

      An OpenSUSE 11.1 user calls bullshit.

      A quick check with YaST shows that the complete set of packages for installing all of those apps from scratch totals ~475 MB, including translations. I don't pull down 3 GB to do a complete network-based developer-workstation installation.

      So poster is either lying or clueless. (And poster's history demonstrates that, for him, the two alternatives are perhaps not mutually exclusive.)

  5. Rsync by Gerald · · Score: 1, Interesting

    Would this algorithm be useful to the rsync team?

    1. Re:Rsync by six · · Score: 1

      Simple answer: no

      statistically, I think rsync has very few binary files to deal with, at least the way I'm using it.

      also, their technique may make the diff data smaller, but it also makes the diffing/patching process a LOT slower, something many rsync users don't want because on a LAN you don't care much about bandwidth usage.

    2. Re:Rsync by Anonymous Coward · · Score: 3, Interesting

      Umm, many of us use rsync like mad on binaries such as ISO images or repository trees full of RPMs which are full of compressed data.

      The rsync algorithm (see the cool thesis) is actually quite adept at processing "non-text" byte sequences. It's main innovation is having a heuristic to break the stream into smaller phrases by identifying canonical start/end boundaries by running the same heuristic on both source and destination files (without ever possessing both files in the same place). It transfers a description of the source file as a sequence of phrases identified by their hash. It transfers entire phrases when the recipient doesn't already have one with the same hash. It doesn't compute "diffs" in any normal sense, e.g. comparing before/after copies and creating a patch.

      An application of rsync could be immensely improved in any situation where it operates on structured data inside a serialization "container" by cracking open the container and operating on the internal structured data stream instead of the container representation. Then, the recipient would reencode the container after generating a full copy of the internal structured data. However, if the container encoding is not normalized and deterministic, you would not get back a byte-for-byte identical container on the other side.

      This google strategy is to open up the "assembled container" and do diffs on the source representation. It sounds more like an attempt to recover easy newline boundaries for doing diffs without a clever phrase boundary heuristic like used in rsync. I wonder whether their assembler is deterministic. Operating on source code in general could be great, but compilation with optimizers and linkers is not necessarily deterministic, so a different sort of integrity protection system would be needed to validate the results of the process.

    3. Re:Rsync by bcrowell · · Score: 3, Interesting

      Simple answer: no statistically, I think rsync has very few binary files to deal with, at least the way I'm using it. also, their technique may make the diff data smaller, but it also makes the diffing/patching process a LOT slower, something many rsync users don't want because on a LAN you don't care much about bandwidth usage.

      Well, your use of rsync is not necessarily typical. I use unison for file synchronization, and unison uses rsync; I have quite a few binary files that I sync. You're right that it would be a tradeoff of cpu versus bandwidth, but actually that's the whole point of rsync.

      On the other hand, I can think of at least some other reasons that this algorithm would be less appropriate for rsync than for google's intended use. (1) Google knows that their file is an executable, and knows that it's an x86 executable. Rsync would have to detect this using the kind of heuristic algorithms used by the unix "file" utility. (2) It's kind of ugly to imagine building all this x86-specific stuff into a generic program like rsync, which people may be using on ARM machines, or which people may use in the future on architectures that haven't been invented yet. (3) Google is doing one-to-many file distribution (pushing out a single patch to millions of users), so that means that the tradeoff of cpu versus bandwidth is an excellent one for them. With rsync, the typical use is one-to-one (syncing machine A with machine B), so the tradeoff isn't as awesomely spectacular.

      BTW, a completely different application of this that Google may be interested in is for Google Native Client, which runs x86 code sandboxed in a browser. Suppose google has 10^8 users using a particular web app that runs x86 code in their browser via Native Client. As long as the program doesn't change, most of the users are probably just getting it from cache, so it loads fast and doesn't cost google any bandwidth. Then on a certain day, google updates the software. Potentially this causes a bottleneck where 10^8 users are all trying to download the same code (could be especially bad on a corporate network with lots of people simultaneously downloading the same stuff on a Monday morning). If Courgette is incorporated in the browser, then potentially it could be smart enough to realize that it already has version x cached, and what it's trying to download is version x+1, and work out the patch without having to download the whole thing again.

    4. Re:Rsync by relguj9 · · Score: 1

      I wonder whether their assembler is deterministic. Operating on source code in general could be great, but compilation with optimizers and linkers is not necessarily deterministic, so a different sort of integrity protection system would be needed to validate the results of the process.

      Yah, that was my question kind of too (I'm assuming it is). Generally, I'd think that you want everyone with the same version to have the exact same deliverables.

    5. Re:Rsync by NoCowardsHere · · Score: 3, Interesting

      Not really. Where Google's algorithm really shines is in exactly the field they designed it for: efficiently trying to update a large number of identical binary files of known content (particularly those representing code) on many remote computers, by sending a compact list of the differences.

      Rsync actually has to solve a different problem: figuring out where differences exist between two files separated over a slow link when you DON'T precisely know the content of the remote file, but know it's likely similar to a local one. Its rolling-checksum algorithm is very good at doing this pretty efficiently for many types of files.

    6. Re:Rsync by mzs · · Score: 2, Interesting

      That was a very good high level explanation of how rsync works, thanks. It shows though that this approach is uninteresting to rsync. The hashes and heuristics are so that you do not have to compare the two ends over a potentially slow link, that is the beauty of rsync. In google's case they have all the versions locally and can do the comparisons orders of magnitude faster. So even if there was some way to get binary 1-1 (which I bet they do in fact) it would be useless for rsync as you would have to read the whole file to do the comparison, so why not just send the whole file instead of changed to begin with?

      The other point is that when google says assembler they do not mean something like MASM or gas, but rather something that can take two binary blobs (one that you have and one that they send) and run them to generate a third binary blob that matches what they intended the result to be after encapsulating it again. I think you get that but I wanted to make that clear to other readers.

    7. Re:Rsync by ChrisMounce · · Score: 1

      Maybe. But I can think of a few issues:

      1. This is geared toward a specific type of file (x86 executable), not generic data files.
      2. Adding an educated-guess-where-all-the-pointers-are system might just mess with the rsync protocol.
      3. Google has the advantage of knowing, with a quick version number check, exactly what changes need to be made: most data flows from server to client. The rsync destination would have to send back two sets of rolling checksums: first for the disassembly, then for the guess made using the patched disassembly. Don't know how big of an effect this would be on efficiency, but it would be at least slightly slower than what Google can achieve.

      I really like the idea, though, and I'm a big fan of rsync. It would be interesting if there was a general purpose system for guessing changes in files, given that X changed.

    8. Re:Rsync by Anonymous Coward · · Score: 0

      I think rsync sends whole files when it detects you're sending across a local network anyway, or via a command-line switch.

    9. Re:Rsync by drinkypoo · · Score: 1
      1. In your example (backups) there's no reason there couldn't be a metadatabase with the types of your files (once determined by magic, as in file(1).)
      2. There's no reason why you couldn't do this with other architectures as well. Any binaries for which you had the compiler/decompiler set would be supported.

        Taking this idea a step further, there's no particular reason that it couldn't be done for other types of file as well. For example, uncompressed images could be losslessly compressed and transmitted, then recomposed into the original file.

      3. If you were backing up a lot of binaries over a slow link, the difference could be fairly sizable.

      I'm sure there's other problems :)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re:Rsync by Kz · · Score: 1

      not exactly; rsync sends whole files when you use it to copy local directories. That is, when there's no network involved. No matter how fast your LAN is, rsync will default to do its delta magic to send to another machine

      of course, there's a command line to override it either way

      --
      -Kz-
    11. Re:Rsync by Anonymous Coward · · Score: 1, Insightful

      Well, not to mention 90%+ of Windows users lack a compiler and linker. And for the other 10% it can be either some version of VC or a windows port of GCC. This is doubly annoying because C++ code has no standardized ABI (in the vein of early 90's UNIX adding extensions that did nothing but make sysadmin work twice as hard) so you have to ensure all your .dlls have been compiled by the same compiler, etc.

      Asking a Windows user to compile something is like asking a painter to engineer a building.

    12. Re:Rsync by fulldecent · · Score: 1

      >> re: the computation overhead of disassembling, diffing, reassambling versus normal binary diffing

      Executable files releases are one-to-many. Front loading computation overhead is useful.

      Rsync transfers are (normally) one-to-one.

      --

      -- I was raised on the command line, bitch

    13. Re:Rsync by shitzu · · Score: 1

      Rdiff-backup, anyone?

    14. Re:Rsync by Anonymous Coward · · Score: 0

      One thing you missed in this reply: Google is generating a diff based on having BOTH versions on one file system. In contrast, rsync generally only has knowledge of one end of the transaction on any given machine. I doubt this algorithm would work well in that case.

    15. Re:Rsync by Tubal-Cain · · Score: 1

      Umm, many of us use rsync like mad on binaries such as ISO images or repository trees full of RPMs which are full of compressed data.

      Neither of which are edited much by the end user, in my experience. What good is a diff algorithm on a file that hasn't changed?

    16. Re:Rsync by Anonymous Coward · · Score: 0

      Rsync does a wonderful job of diffing entire trees as well, so a replicated repository will download new RPMs, reuse existing ones, and delete files which have gone away.

      But to truly be impressed by rsync, do the following in your spare time. Prepare to be amazed and/or horrified:

      1. acquire the entire RPM tree from an exploded ISO, say rsync://mirrors.kernel.org/fedora/releases/11/Fedora/x86_64/os/
      2. cat every file in that tree into one long appended file and name it, say ./Fedora-11-x86_64-DVD.iso
      3. run rsync -vP --stats rsync://mirrors.kernel.org/fedora/releases/11/Fedora/x86_64/iso/Fedora-11-x86_64-DVD.iso .

      you will see that you can recover a byte-for-byte ISO image with a remarkably small amount of downloads, if you already have a complete mirror of the filesystem contents. The awesome part is that it works for partial mirrors too, so people in bandwidth-challenged circumstances can very effectively restore good copies of bulk material by feeding rsync whatever input fodder they have on hand.

      Of course you can do similar things on trees instead of ISO files, but the explanation is a lot more elaborate with application-specific details.

    17. Re:Rsync by Anonymous Coward · · Score: 0

      Google knows that their file is an executable, and knows that it's an x86 executable. Rsync would have to detect this using the kind of heuristic algorithms used by the unix "file" utility.

      There's heuristic and heuristic. Detecting an x86 binary in one of the handful of common formats is pretty easy and totally reliable.

    18. Re:Rsync by Lennie · · Score: 1

      Rsync is usefull when you don't know anything about the data, if you actually know something about, their is a better way to do it.

      --
      New things are always on the horizon
    19. Re:Rsync by Pollardito · · Score: 1

      Everyone is throwing around the term "binary file" as if this is some generic solution for transferring anything that isn't text. This solution is great for compiled code, but it won't do anything for image binaries, music binaries, and video binaries because none of those can be disassembled into code. To take advantage of any of the optimizations that this new diff is using, rsync would have to first check every file to determine if it is compiled code, then check the architecture that it's designed for to be sure that it's compatible with its disassembler, then do the work of disassembling both sides to then compare the result, and then hope that it's assembler isn't doing some optimization that the original didn't use (or failing to do some optimization that they did use) otherwise the resulting file won't necessarily run the same (and whoever wrote it will thank you for creating a support nightmare).

      That's a lot of work for one specific type of rsync target, and a potential nightmare given the assembler and disassembler compatibility issues. On top of that you're going to throw a lot of computation into disassembling both the source and destination version of the files, whereas Google's distribution system presumably already has their version disassembled ahead of time and ready for comparison.

      This sounds like a really nice tool, but the lesson here is that the more that a program knows about its problem domain the more efficient it can be. This tool is completely tuned to the differencing of programs that Google compiled

  6. The cool thing is... by salimma · · Score: 3, Interesting

    The cool thing is, one can easily extend this to other executable formats, as long as the assembler is readily available client-side: Windows users could relate to those pesky, resource-hogging Java updates, and .NET and Mono applications could similarly benefit.

    This is, interestingly, the second binary diffing innovation that affects me in the past few months. Fedora just turned on delta updates with Fedora 11, a feature borrowed from the openSUSE folks.

    --
    Michel
    Fedora Project Contribut
    1. Re:The cool thing is... by mzs · · Score: 5, Informative

      You didn't RTFA before posting did you? When they say assembler they mean something of their own that takes a binary blob and one you have already and reassembles the original binary. It just so happens that the disassembler knows a lot about windows executables and the archive format that google uses breaks it up into some portions and assigns labels to addresses. Then it runs bsdiff on this smaller subset.

      The code outlined in the blog post is really in these files:

      win32_x86_generator.h
      win32_x86_patcher.h

      Notice these names? This is the disappointing aspect to all of this, it is one more new reason that Chrome is x86 and primarily Windows. You would need one for Mach-O and ELF to do this on other platforms and then if you were on another processor, say ARM or PPC, you would need something that understood that as well. Then there is the issue about 64-bit. In any case the assembler is not something like gas or MASM which is what you imagined.

    2. Re:The cool thing is... by salimma · · Score: 1

      Ah, that is indeed correct. A bit disappointing, but upon reflection, going binary->assembly code->diff would likely result in a larger patch anyway, even if the diff is then compressed efficiently.

      --
      Michel
      Fedora Project Contribut
    3. Re:The cool thing is... by salimma · · Score: 1

      Ah, delta RPM turns out to use normal compression, no binary diffing is involved:

      $ rpm -q deltarpm --requires
      libbz2.so.1()(64bit)
      libc.so.6()(64bit)
      libc.so.6(GLIBC_2.2.5)(64bit)
      libc.so.6(GLIBC_2.3.4)(64bit)
      libc.so.6(GLIBC_2.4)(64bit)
      libc.so.6(GLIBC_2.7)(64bit)
      librpm.so.0()(64bit)
      librpmio.so.0()(64bit)
      rpmlib(CompressedFileNames) = 3.0.4-1
      rpmlib(FileDigests) = 4.6.0-1
      rpmlib(PayloadFilesHavePrefix) = 4.0-1
      rtld(GNU_HASH)

      --
      Michel
      Fedora Project Contribut
    4. Re:The cool thing is... by Rayban · · Score: 2, Informative

      The source for the disassembler is pretty simple.

      http://src.chromium.org/viewvc/chrome/trunk/src/courgette/disassembler.cc

      Porting that to parse x86 out of ELF or another executable container wouldn't be too difficult. Porting it to parse x64 or PPC would be tougher.

      --
      æeee!
    5. Re:The cool thing is... by mzs · · Score: 2, Informative

      Actually it would be pretty hard to go x86 PEX to x86 ELF as relocations are done completely differently in those formats. They are often names or indexes in DLLs with PEX and handles (think dlopen) while there are all sorts of relocation entries in ELF where the instructions themselves get modified by the run time linker to the correct address or offset (the premunged instruction either has an index in it to a name or another relocation table with more info).

    6. Re:The cool thing is... by xiaomai · · Score: 1

      win32_x86_generator.h win32_x86_patcher.h Notice these names? This is the disappointing aspect to all of this, it is one more new reason that Chrome is x86 and primarily Windows

      I hope that Chromium's multi-platform support does improve (the Linux port is getting better and better every day), but I think that this particular example is doesn't affect the multi-platform situation. Windows is (for the foreseeable future) by far the most significant install base. I'm assuming this was done to decrease Google's bandwidth costs. Linux/Mac (or ARM/PPC) could receive the traditional binary diffs without making a noticeable impact on bandwidth.

    7. Re:The cool thing is... by K.+S.+Kyosuke · · Score: 1

      Fail. deltarpm uses bsdiff. It has been already mentioned here.

      --
      Ezekiel 23:20
    8. Re:The cool thing is... by salimma · · Score: 1

      It's a modified bsdiff algorithm; it does not actualy reuse the original bsdiff

      --
      Michel
      Fedora Project Contribut
  7. wait a minute by six · · Score: 3, Insightful

    announced a new compression technique called Courgette geared towards distributing really small updates

    I just RTFA, this has nothing to do with a compression technique.

    What they developed is a technique to make small diffs from *executable binary files* and it doesn't look like it's portable to anything other than x86 because the patch engine has to embed an architecture specific assembler + disasembler.

    1. Re:wait a minute by hattig · · Score: 2, Informative

      Another problem is that you would need to maintain every little diff from previous versions, and apply them one after another. Every so often you would have a cumulative diff maintained, so you'd do: 3.0.3 -> 3.0.4 -> 3.0.5 -> 3.2 (cumulative 3.0.5 to 3.2) -> 3.2.1 -> 3.2.2 -> 3.5 (cumulative 3.2.2 to 3.5) -> 3.5.1

    2. Re:wait a minute by flowsnake · · Score: 5, Informative

      Not really a problem. Every n releases you push a complete patch - a bit like key frames in MPEG. People who keep their stuff reasonably up-to-date benefit from the smaller patches, those who don't just have to go back to the 'key frame' equivalent. And on the client - the latest version on the host is effectively the sum of all the diffs up to that point. OK so there is not enough information there to revert to an arbitrary earlier version, but usually we don't revert to older versions of executables. If we absolutely have to revert, maybe to undo a bad update, we can always just download a complete version of the required version.

    3. Re:wait a minute by hattig · · Score: 1

      Yeah, it's very manageable, and the build tools would generate each update binary diff and put them on the update servers anyway, and the idea is to keep people up to date often, so most people would be on the current or previous minor version anyway.

      Reversion would probably be best handled by having a new update. As long as the update code wasn't the aspect that got broken.

    4. Re:wait a minute by minor_deity · · Score: 1

      It would be a good idea to backup the file being updated before patching it; so the backup could easily be saved in order to revert.

    5. Re:wait a minute by Anonymous Coward · · Score: 0

      I read it and thought: Wait a minute, that doesn't make any sense.

      I can see how diffing the commands instead of the pure bytes leads to better results. But why do you need the text representation of these commands (the assembler code)? The algorithm should work equally well when used on the machine code. So just split up the binary file into single machine code commands and then use diff.

    6. Re:wait a minute by Pollardito · · Score: 1

      They'd only have to do that if they wanted to be able to patch from an old version to another old version, otherwise they just need to store a diff comparing each old version to the current. That'd be more work ahead of time since each new release would require you to run and store a diff to every old release (rather than just one diff to the previous version), but that's work you'd have to do just once versus work that you have to do for every client patching.

  8. I had this idea by Fantom42 · · Score: 1

    I remember having this idea a while back and thinking that it would surely be out there already implemented. When it wasn't, I didn't follow through and try to invent it myself. I was focused on another project at the time. Never really came back to that original idea. Awesome that someone did, and that they are keeping it open so everyone can use it! Thanks, Google!

    1. Re:I had this idea by Lennie · · Score: 1

      What is cool about this text is, you could claim it for anything and still look smart. ;-)

      --
      New things are always on the horizon
  9. Microsoft version? by Anonymous Coward · · Score: 1, Funny

    Any takers that Microsoft will release their own version of this, but compile the assembly before sending it over the wire?

    Maybe they can call it "Compiled Assembly from Disassembly" (CAD).

  10. Bad explanation by DoofusOfDeath · · Score: 4, Insightful

    Courgette achieves smaller diffs (about 9x in one example)

    That's potentially very misleading. I can compress any document, down to a single but, if my compression algorithm is sufficiently tailored to that document. For example:

    if (compressed_data[0] == 0):
          return = get_Magna_Carta_text()
    else:
          return unzip(compressed_data[1:])

    What we need to know is the overall distribution of compression ratios, or at least the average compression ratio, over a large population of documents.

    1. Re:Bad explanation by six · · Score: 1

      their technique is specifically geared towards executable binary files, it doesn't make sense to use it, and also won't work at all with any other type of file.

      so yes the 9x thing is quite unfair, that's like comparing bzip2 vs. lame for compressing music.

    2. Re:Bad explanation by DoofusOfDeath · · Score: 2, Funny

      I can compress any document, down to a single but,

      Oh crap. There goes any chance of this being a technical discussion.

    3. Re:Bad explanation by DoofusOfDeath · · Score: 1

      their technique is specifically geared towards executable binary files, it doesn't make sense to use it, and also won't work at all with any other type of file.

      so yes the 9x thing is quite unfair, that's like comparing bzip2 vs. lame for compressing music.

      Even if you consider the set of binary strings that represent object code to be a subset of the set of all binary strings, I think my point still holds.

      The only way my point would be totally invalid, I think, is if were always the case that the assembly-language diff can be expressed in fewer bits than can the corresponding object-code diff. If that is always true, then I agree that my point is invalid.

    4. Re:Bad explanation by Anonymous Coward · · Score: 0

      You could reduce that to zero bytes. Return the result of get_Magna_Carta_text() every time, and require that the sourcetext be the Magna Carta. The output is undefined for all other inputs :)

    5. Re:Bad explanation by CarpetShark · · Score: 2, Funny

      return = get_Magna_Carta_text()
      return unzip(compressed_data[1:]

      Most of your point is good, but I suspect that, no matter what language you're using, ONE of these will give you a syntax error ;)

    6. Re:Bad explanation by Calyth · · Score: 1

      It is a rather poor explanation...
      But their compressor is catered to executables, so that it transmit a sort of primitive assembly language, so that the diffs are smaller, transmit that, and have the client end to apply and reassemble.

      So their compression algorithm has little applications outside of executables.

    7. Re:Bad explanation by Anonymous Coward · · Score: 0

      Oh crap. There goes any chance of this being a technical discussion.

      Au contraire!
      "but" refers to what is colloquially known as "white girl booty" or "flat butt."
      "butt" refers to the more preferable "buttle butt," which provides for a better booty slapping experience.
      "buttt" is the result of morbid obesity and is considered an aquired taste.
      "but!" is the sound someone makes while trying (and usually failing) to interrupt a speaker.
      "btu" is either the result of drinking and typing or refers to british thermal units

      I could go on, btu i need another drink .

    8. Re:Bad explanation by Anonymous Coward · · Score: 0

      Where does compression figure into this?

      You have two executables, you need to transmit the smallest amount of data to turn a copy of one into the other. Usually this is just a code change or two, but a few bytes difference can cause a ripple of changes in jump locations (anything not branching by a relative offset for example). Some branches also would be affected by this.

      If you disassemble the exe first and assign labels to jump and branch locations, and then transmit the difference in the code, you drop the need to explicitly change these jump targets, because they are implicitly changed when you apply the diff and "re-assemble".

      It's not the same as compression.

    9. Re:Bad explanation by pescadero · · Score: 1

      Luckily, he wrote it in a programming language which was specifically designed to handle this exact piece of code.

    10. Re:Bad explanation by Anonymous Coward · · Score: 0

      My pseudocode compiler (aka. "brain") parsed both of them just fine.

    11. Re:Bad explanation by petermgreen · · Score: 1

      that's like comparing bzip2 vs. lame for compressing music.
      more like comparing bzip2 vs flac (or a flac variant modified slightly to ensure that it could produce a file that was completely identical to the original rather than just a file containing identical audio) for compressing music

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  11. Not a 'binary diffing' program. . . by Anonymous Coward · · Score: 0

    It seems to me that the moniker 'binary diff' doesn't really apply here, does it? My understanding of a binary diff algorithm/program has always been one that can take any arbitrary binary file, not knowing anything about what *type* of file it was (e.g. the file could be an image file, video file, executable, word document, zip file, CAD file, or whatever) and create a diff for that binary 'data'.

    It sounds like this is very specific to x86 executables?

    Still, whatever you call it, it's good to see progress being made. I just wonder why you can't create small, efficient diffs for any kind of binary file?

    1. Re:Not a 'binary diffing' program. . . by Grishnakh · · Score: 1

      Still, whatever you call it, it's good to see progress being made. I just wonder why you can't create small, efficient diffs for any kind of binary file?

      You can. It's called "bsdiff".

      The problem is that when you change a large piece of compiled code, even if just to insert a subroutine, it ends up changing a giant portion of the code because all the jump addresses shift to account for this new code in the middle. So though most of the code is identical, every place that has an address is now shifted. Google's new utility takes that into account and has a way of optimizing for this case.

      However, for any other type of binary file, besides an x86 executable, it won't help.

  12. Today? by Blakey+Rat · · Score: 3, Funny

    Google's Open-Source Chromium project announced a new compression technique called Courgette geared towards distributing really small updates today.

    Better hurry! It won't work tomorrow!

    1. Re:Today? by Night+Goat · · Score: 1

      I was thinking the same thing! Proofreading is a lost art these days.

    2. Re:Today? by Anonymous Coward · · Score: 0

      Would you like some cheese with that whine?

  13. Like many brilliant ideas... by istartedi · · Score: 4, Informative

    ...it makes you smack yourself on the head and go "why hasn't everybody been doing this for years?".

    The idea is simple, and reminds me of something I learned in school regarding signals. Some operations are easy to perform in the frequency domain, so you do the Fourier transform, perform the operation, and then transform back.

    This is really just the same idea applied to the problem of patches. They're small in source; but big in binary. It seems so obvious that you could apply a transform,patch,reverse process... but only when pointed out and demonstrated.

    It's almost like my favorite invention: the phonograph.

    The instructions for making an Edison phonograph could have been understood and executed by any craftsman going back thousands of years. Yet, it wasn't done until the late 19th century.

    Are the inventors that brilliant, or are we just that stupid.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
    1. Re:Like many brilliant ideas... by BitZtream · · Score: 1

      Or perhaps hindsight is 20/20.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    2. Re:Like many brilliant ideas... by ashtophoenix · · Score: 1

      Are the inventors that brilliant, or are we just that stupid.

      Isn't this simply "another level of indirection"...? :)

      I think to answer your question, its not brilliance but the ability to step away from the norm and take a fresh look at established things that brings about such innovation. (IMHO)

      --
      Life is about being a Phoenix!
    3. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      I think to answer your question, its not brilliance but the ability to step away from the norm and take a fresh look at established things that brings about such innovation. (IMHO)

      IMHO that's how you define brilliance :)

    4. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      "The instructions for making an Edison phonograph could have been understood and executed by any craftsman going back thousands of years. Yet, it wasn't done until the late 19th century.

      Are the inventors that brilliant, or are we just that stupid."

      Yet Leonardo Da Vinci had the idea for a helicopter and it wasn't for several centuries that man took to the skies, let alone in an actual helicopter. Sometimes the ideas are there but not the desire or the means to move the process along, and sometimes the idea comes at exactly the right time. It's certainly possible that earlier craftsmen could have been able to reproduce a phonograph if shown how, but I have to wonder if society would have been ready for such a device.

    5. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      It's definitely not its brilliance... Maybe it's its frame of reference?

    6. Re:Like many brilliant ideas... by merreborn · · Score: 3, Insightful

      Like many brilliant ideas... it makes you smack yourself on the head and go "why hasn't everybody been doing this for years?".

      Probably because the old solution was:

      A) Simple
      B) Good enough for most purposes.

      Sure, you can shave 80% off your patch filesize... but unless you're as big as google, patch bandwidth probably isn't a major priority -- you've likely got much more important things to dedicate engineering resources to.

      You know how they say "Necessity is the mother of invention"? Well, when an invention isn't really necessary for most folks, it tends to show up a little later than it might otherwise have.

    7. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      The phonograph could have been executed thousands of years ago using what materials? I don't belive a practical phonograph could be made without modern materials.

    8. Re:Like many brilliant ideas... by MBCook · · Score: 1

      It's only a helicopter because it has a spinning thing who's axel is perpendicular to the ground. And it wouldn't actually work. Even if it could achieve lift by turning the screw, without a tail rotor (or counter-rotating screw) it would simply spin and go nowhere. Then there is the problem of the tremendous power it takes to power a helicopter rotor.

      istartedi is right that any decent craftsman could have made a phonograph. Even with the correct information to design a good wing/blade, a helicopter wouldn't have been possible 150+ years ago. A very high quality phonograph could have been made by any watchmaker.

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    9. Re:Like many brilliant ideas... by Just+Some+Guy · · Score: 2, Insightful

      Actually, it made me smack my head and say "I assumed we were already doing this".

      Sure, you can shave 80% off your patch filesize... but unless you're as big as google, patch bandwidth probably isn't a major priority

      So, you've never installed a service pack or another OS update? I'd be more than happy to run "sudo aptitude patch-upgrade" to download the 3KB difference between OpenOffice 3.1.foo and 3.1.foo+1.

      --
      Dewey, what part of this looks like authorities should be involved?
    10. Re:Like many brilliant ideas... by xenocide2 · · Score: 3, Informative

      I've been reviewing various proposals like this, and basically, it's a tradeoff mirrors don't want. This sorta stuff has been proposed for ages. I listened to a recording of the author of rsync give an introduction to the algorithm and program, and among the questions was "is this suitable for .deb?". The answer was "Not unless the archive is completely recompressed with gzip patched to be rsync compatible".

      Eventually that patch landed, and you could conceivably do this. Except it expands the entire archive by 1-2 percent. And there's a lot of CPU overhead where there was none before.

      Then someone cooked up zsync. It's the same thing as rsync except with a precalculated set of results for the server side. This looked like a winner. But someone else really wants LZMA compressed packages to fit more onto pressed CDs. So now we're at a fundamental impass: optimize for the distribution of install media to new users, or optimize for the distribution of updates to existing users.

      The best resolution I've seen is to use LZMA compression on the entire CD volume, but that requires the kernel to get their ass in gear and allow yet another compression in the kernel. That may have finally happened, I haven't checked recently. But generally LZMA requires more RAM to operate, so that could raise the minimum requirements on installs.

      In short, it's a balancing act of effort, bandwidth, CPU and RAM. What works for some may not work for all.

      --
      I Browse at +4 Flamebait

      Open Source Sysadmin

    11. Re:Like many brilliant ideas... by bill_mcgonigle · · Score: 1

      The phonograph could have been executed thousands of years ago using what materials? I don't belive a practical phonograph could be made without modern materials.

      Edison's first device used wax cylinders. That was also found to be impractical for scale production, but it seems conceivable that some machine using a crank or a wound spring and a clay/beeswax/shellac medium and analog amplification could have been made during the Bronze Age.

      Because we have no evidence of such devices and we assume people are only recently smart we conclude that no such device was ever made.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    12. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      Brass... not exactly iron age technology.

    13. Re:Like many brilliant ideas... by timeOday · · Score: 1

      All compression is based on basically the same idea - a model of the process that created the data. The model makes somewhat less error in predicting what will come next compared to a random model / no model, and hence needs less information to correct those predictions. For instance, in speech compression it is normal to start with a model of the human vocal tract. Even general compression algorithms like gzip work this way, making basic assumptions, such as real-world data tends to have values which repeat, a weak model but useful nonetheless.

    14. Re:Like many brilliant ideas... by 0xABADC0DA · · Score: 1

      ...it makes you smack yourself on the head and go "why hasn't everybody been doing this for years?". ... It seems so obvious that you could apply a transform,patch,reverse process... but only when pointed out and demonstrated.

      Basically google created a big custom 'transform' that applies only to x86 exe files. The reason why this is a boring story and why nobody has done this before is because nobody has found a generic, simple way to do this. And they still haven't.

      If google had actually created a 'new binary diffing algorithm' instead of a specific hack, and this worked for most binary files that have similarities that would be newsworthy. For instance if it could out of the box create small diffs for .exe, .doc, .xls, .ttf, .3ds, ... that would be something.

    15. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      This is really just the same idea applied to the problem of patches. They're small in source; but big in binary. It seems so obvious that you could apply a transform,patch,reverse process... but only when pointed out and demonstrated.

      This is nothing new. Transform-patch-reverse is an age-old technique for binary diffs of compressed files, for example.

      Even this specific approach is nothing new; I can think of at least one example of a patch to a program being distributed in the form of a script that disassembled the program, patched the source code, and compiled it again. I think that was done in about 2005, and I doubt it was the first.

    16. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      Because we have no evidence of such devices and we assume people are only recently smart we conclude that no such device was ever made.

      It's nothing to do with "assuming people are only recently smart". We know of plenty of impressive accomplishments of ancient civilisations. The ancient Greeks even invented a form of steam engine, though it's not clear that they realised it could be more than a toy. We know there were plenty of very clever people around in ancient times. Chronological snobbery is invariably a sign of ignorance.

      So it's not unthinkable that they might have worked out enough about sound to realise it could be recorded. Empedocles (5th century BC) is said to have taught that sound was transmitted by motions in the air that struck the ear and made it ring like a bell. You don't have to go too far from that to think of recording the motion of the air by attaching a stylus to something that the air can move. Once you've done that the phonograph can be derived easily enough. Plenty of people worked it out independently at roughly the same time in the 19th century. It didn't take a genius (though Edison was certainly a genius at publicity and self-promotion).

      The problem is the complete absence of any evidence that they did think of that. There's no mention of it. Sound recording does not even appear as a plot device in ancient fiction! Given the amount of knowledge that has survived, it is simply not very likely that anyone invented a complete technology that subsequently vanished without leaving the faintest trace of any form.

    17. Re:Like many brilliant ideas... by Pollardito · · Score: 1

      Sure, you can shave 80% off your patch filesize... but unless you're as big as google, patch bandwidth probably isn't a major priority

      It's priority tends to scale though. Once you're as big as Google you have so many people downloading it that if you can shave just a little off each download it adds up to something that matters even to someone as big as you.

    18. Re:Like many brilliant ideas... by Anonymous Coward · · Score: 0

      Well, the question is, is it worth it for the improvement versus just using an existing binary diff algorithm. Maybe with a binary diff, it'd take 500KB instead of 3KB, but you already saved 120MB by not downloading a bunch of files that didn't change.

    19. Re:Like many brilliant ideas... by VoltageX · · Score: 1

      I'd ask Blizzard what their patch bandwidth is too. Even though they use BT, I'm sure the users would appreciate the smaller download/upload.

      --
      "Anonymous could not immediately be reached for further comment." - International Business Times
  14. Solving the wrong problem by Animats · · Score: 0, Flamebait

    A better binary diffing algorithm is useful for source control. But for security? If the code is so awful that the bandwidth required for security updates is a problem, the product is defective by design.

    It sounds like Google tried "agile programming" on trusted code, and now has to deal with the consequences of debugging a pile of crap.

    1. Re:Solving the wrong problem by Joe+Random · · Score: 3, Insightful

      If the code is so awful that the bandwidth required for security updates is a problem, the product is defective by design.

      No one is saying that the bandwidth is a problem. They're just saying that the bandwidth is unnecessary. FSM-forbid that anyone try to optimize something.

      Plus, as the article points out, smaller updates mean more people can receive the update per unit-bandwidth, which means faster distribution of security updates when something critical is fixed.

    2. Re:Solving the wrong problem by salimma · · Score: 1

      Binaries should, with rare occasion, not be under source control anyway.

      --
      Michel
      Fedora Project Contribut
    3. Re:Solving the wrong problem by BuR4N · · Score: 1

      It has of course nothing to do with what you implying, its all about saving money (yea, stuff like bandwidth etc do cost money)

      --
      http://www.intellipool.se/ - Intellipool Network Monitor
    4. Re:Solving the wrong problem by Anonymous Coward · · Score: 0

      Bandwidth isn't the problem, user time is. Imagine seamless updates where you don't even have to click OK, watch progress bars, restart the app. Just set up the app to autoupdate, and because the updates are small, it can do that without bothering you and without any perceivable lag. Imagine not having to micromanage updates anymore because they're totally transparent.

      This technique can be part of that solution. It requires more serverside and clientside CPU but much less downloading time, and the net is increasingly the bottleneck in everything. So it solves the right problem in the right place.

    5. Re:Solving the wrong problem by Anonymous Coward · · Score: 1, Insightful

      Out of curiosity, could you please point us to some of your code so we can do a comparison?

      Thanks.

    6. Re:Solving the wrong problem by Ed+Avis · · Score: 1

      Right because we all know that sensible coders get it right first time, and have every feature you'll ever want already implemented in version 1.0, and so never need to push out a new version of their program, and if for some unthinkable reason they do want to make a new release, the users positively enjoy watching the download creep through as slowly as possible...

      BTW, have you seen the size of the package updates for your Linux distribution recently?

      --
      -- Ed Avis ed@membled.com
    7. Re:Solving the wrong problem by PCM2 · · Score: 2, Informative

      If the code is so awful that the bandwidth required for security updates is a problem, the product is defective by design.

      You don't understand what the phrase "defective by design" means. It's used by anti-DRM folks to describe "features" that nobody wants and that actually reduce the usefulness of a product, but which are inserted into the product intentionally by the manufacturer out of a misguided desire to support DRM. If a bug/feature is "by design" then you should not expect a patch for it, ever.

      A product that needs lots of security patches, on the other hand, is not defective by design; rather, it is simply badly designed.

      Don't go out of your way to use catchphrases when simple English will do.

      --
      Breakfast served all day!
  15. Re:Evil bastages by megamerican · · Score: 1

    How dare people ask questions. Nothing good ever came from doing that!

    --
    If you have something that you dont want anyone to know, maybe you shouldnt be doing it in the first place -Eric Schmidt
  16. Nothing terribly new by davidwr · · Score: 2, Interesting

    There was a old, pre-X MacOS binary-diff program that diffed each element of the resource fork independently.

    Back in those days, non-code elements usually were stored in unique "resources," one icon might be in ICON 1, another might be in ICON 2, etc.

    Code was split among one or more resources.

    If a code change only affected one section of code, the only 1 CODE resource would be affected.

    Since each resource was limited to 32KB, a diff that only affected one resource would never be larger than twice that size.

    If only a single byte changed, the diff was only the overhead to say what byte changed, the old value, and the new value, much like "diff" on text files only on a byte rather than line basis.

    So, conceptually, this isn't all that new.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:Nothing terribly new by mzs · · Score: 1

      One thing is very different from back then and now, disk space was at a premium, so there were a ton of optimizations to code fragments in the object file format to that end. For example most data was zero, so only portions that were not were in the object file bunched-up. The utility you refer to was able to take advantage of this. It was otherwise a simple ed-like that could insert, remove, and replace. I wish I could remember the name.

  17. Courgette? by Anonymous Coward · · Score: 0

    Sounds like a old horny midget woman.

  18. Cougarette? by Anonymous Coward · · Score: 0

    Am I the only one who misread "Courgette"? :)

  19. binary diff by visible.frylock · · Score: 2, Informative

    If you're not familiar with the process of binary diff (I wasn't) there's a paper linked from the article that explains some about bsdiff:

    http://www.daemonology.net/papers/bsdiff.pdf

    Wayback from 2007/07/09:
    http://web.archive.org/web/20070709234208/http://www.daemonology.net/papers/bsdiff.pdf

    --
    Billy Brown rides on. Yolanda Green bypasses Gary White.
    1. Re:binary diff by mzs · · Score: 1

      Yes and it is a generic approach that does not need to know anything about pex, x86, and archives. In fact it is used by this google implementation. And there was something much like what google has here implemented back in 1999 cited in the paper you posted the link to:

      B.S. Baker, U. Manber, and R. Muth, Compressing Differences of Executable Code, ACM SIGPLAN Workshop on Compiler Support for System Software, 1999.

  20. Can a layman get an explanation in English? by AP31R0N · · Score: 2, Insightful

    Please?

    --
    Utilizing the synergization of benchmark e-solutions to pre-workaround action items!
    1. Re:Can a layman get an explanation in English? by ledow · · Score: 2, Informative

      Rather than send the difference in the executable, they send the difference in a sort of source code. Saves space if a small source change (move this line past this function) causes a big executable difference.

    2. Re:Can a layman get an explanation in English? by six · · Score: 5, Informative

      Binary executable files contain a lot of addresses (variables, jump locations, ...) that are generated by the assembler at compile time.

      Now consider you just add one 1-byte instruction somewhere in the middle of your program (let's say "nop"). When you compile it again, all the code that reference addresses beyond the insert point will have changed because the address has been incremented. So these 4 bytes added to your source code could mean addresses that get incremented in the compiled file in thousands of places.

      What they do basically is take the binary file, disassemble it back to pseudo source code (not real asm I guess), and diff that against old version. The patch engine on the client end does the same disassembling, applies the patch, and reassembles the patched source code to an executable file.

      This means diffs gets much smaller (4 bytes vs. 1000s in my extreme example), but also makes the diff/patch process much more complex, slower, and not portable.

    3. Re:Can a layman get an explanation in English? by chris_eineke · · Score: 3, Informative

      A compiler takes source codes and turns them into assembler code. That's lines of human-readable machine instruction mnemonics (for example, "Copy from here to here." "Is that bigger than zero?"). The assembler takes those lines and turns them into machine instructions, a sequence of binary numbers.

      Finding the difference between two huge gobs of binary numbers is difficult. Instead, they turn the binary numbers back into lines of mnemonics and use a algorithm that finds the difference between two huge listings of mnemonics.

      That method is easier because the listings of a program that has been changed slightly can be very similar to the listing of a unmodified program. That has to do with how compilers work.

      Capiche? ;)

      --
      "All you have to do is be fragile and grateful. So stay the underdog." Chuck Palahniuk, Choke
    4. Re:Can a layman get an explanation in English? by Anonymous Coward · · Score: 0

      I'm waiting until they announce a project "geared towards distributing really small updates yesterday." Now THAT will be fast.

      (Yes I'm looking at you, editors.)

    5. Re:Can a layman get an explanation in English? by AP31R0N · · Score: 1

      Thanks! That makes sense.

      --
      Utilizing the synergization of benchmark e-solutions to pre-workaround action items!
    6. Re:Can a layman get an explanation in English? by unfunk · · Score: 1

      I'd like to see a car analogy here, but it needs to involve the car's transmission diff

    7. Re:Can a layman get an explanation in English? by DiegoBravo · · Score: 1

      I'm not sure about the benefits.

      1) The bulk of many (most?) software packages are resources, not executables
      2) A lot of the executables is a lot of linker/DLL overhead, specially the smaller ones
      3) The optimizers (I think) remix the assembly instructions, so small changes in the program logic result in a lot of changes in assembly, ergo, in machine code. The best solution in terms of BW remains sending diffs in high level source code.

    8. Re:Can a layman get an explanation in English? by Anonymous Coward · · Score: 1, Funny

      Everything except "capiche", yeah.

    9. Re:Can a layman get an explanation in English? by eric-x · · Score: 1

      It makes patches smaller by doing a few extra voodoo pre and post processing steps. Many changes in the binary can be tracked back to only 1 change in the original source code, the voodoo knows how to do this and stores just that one master change instead all of the resulting changes. Car analogy: instead of storing the activity of each piston in a car engine it just stores the fact if the engine is running or not.

    10. Re:Can a layman get an explanation in English? by sorak · · Score: 1

      Wouldn't the diff patch make the code more portable? IANA open source hacker, so this is an honest question.
      .
      Let say you are using the example you gave. All the addresses are referencing memory locations that may change based on the distribution being used (although that wouldn't be a factor for Google). You might have #ifdef statements in the code that include one set of files if compiled under BSD, as an example, and another set if compiled under GNU/Linux, and your memory locations may vary based on the system architecture, optimization methods in use, and platform.
      .
      Under these circumstances, each patch would have to be tailored to the specific platform used. There would be no portability between patches. But, if the compilers are similar enough to create the same pseudocode/bytecode/ASM, or smart enough to save the source code, and use it for future comparisons, then wouldn't one patch be just as portable as the original source code?

    11. Re:Can a layman get an explanation in English? by Anonymous Coward · · Score: 0

      ok, now this time in olde english.

    12. Re:Can a layman get an explanation in English? by PCM2 · · Score: 4, Insightful

      But, if the compilers are similar enough to create the same pseudocode/bytecode/ASM, or smart enough to save the source code, and use it for future comparisons, then wouldn't one patch be just as portable as the original source code?

      It's a good theory and you're a smart person, but:

      1. The compilers probably wouldn't be similar enough. Even developers who use GCC to compile something for Linux usually use Visual Studio to compile the same code for Windows. (The source code for Chrome, for example, shipped as a Visual Studio project.) Mac OS X likes to have everything written in Objective C, so that output would probably be very different.
      2. Different operating systems rely on different shared libraries to do the same things. So a function call that opens a file in Linux might not look like a function call to do the same thing on Windows -- it might take a different number of arguments, for example, which means it would look rather different in machine language.

      Portability doesn't appear to be Google's primary concern, though. They seem to be keen on the idea of delivering binaries over the wire (real binaries, not bytecode) -- see Google Native Client.

      --
      Breakfast served all day!
    13. Re:Can a layman get an explanation in English? by Fantom42 · · Score: 1

      Imagine you had a tape of somenoe talking and you needed to tell someone what was different from the last time they spoke. One way to do that would be to listen to the language and words and compare their meanings to figure out the difference. The other way would be to just send them the new tape to replace the previous one. A similar problem happens with new versions of software. You want to send the changes but you end up having to send the whole thing.

      This technology essentially writes down what the compiled code "says" inasmuch as what is required to see what is different, and then sends only that part of it.

    14. Re:Can a layman get an explanation in English? by Estanislao+Mart�nez · · Score: 1

      1) The bulk of many (most?) software packages are resources, not executables
      2) A lot of the executables is a lot of linker/DLL overhead, specially the smaller ones

      These are not a problem, because the bulk of the changes involved in security fixes are code changes. Or to put it differently, even if the bulk of most software packages is resources, linker and DLL overhead, the bulk of most security fix diffs is code.

      3) The optimizers (I think) remix the assembly instructions, so small changes in the program logic result in a lot of changes in assembly, ergo, in machine code. The best solution in terms of BW remains sending diffs in high level source code.

      I think you overestimate the effect of optimizing compilers in this regard. Small changes in code--things like inserting a line for a missing bounds or null check--should not produce large-scale changes in the assembly code, but mostly the deletion or insertion of a bit of code.

      There's also the fact that (a) there's little motive to use full-blown optimized compilation on most of an application's code (best save it for the 20% of the code that runs 80% of the time), and (b) most security fixes are small anyway.

    15. Re:Can a layman get an explanation in English? by DiegoBravo · · Score: 1

      God points; you're pretty right regarding security patches. I hope those ideas could be implemented in the Debian package manager, since currently many little security fixes carry the update of several full packages because of the dependency rules. Of course this is a different but related problem.

      >There's also the fact that (a) there's little motive to use full-blown optimized compilation on most of an application's code (best save it for the 20% of the code that runs 80% of the time),

      AFAIK, the distributors usually apply the same kind of optimizations to the 99% of the code, just because it's easier. But of course the 80/20 rule is valid.

  21. Microsoft patch API by Anonymous Coward · · Score: 0

    A bit related, Microsoft have provided at binary patch API for years that can take advantage of the internal structure of .EXE files to achieve better compression. It also does multi-way patching so you can specify n source files and one target file... the patch will contain enough information to patch any of those n source files up to the target file.

  22. A solution in search of a problem? by alexborges · · Score: 1

    I mean, whats the need, really?

    Cant we all be merry and unixize our minds and think: hey, all components of the sw are files, lets just separate the heavy parts that are less prone to any security issue (GUI), from all the libs and executables, and lets just send those through the wire as a package....

    We do that in linux and it works(TM): we package everything, have dependencies and then only update what needs to be updated without weird binary patching.

    Now, not to say that its not usefull to have a new bindiff algorithm: lets plug it into git and svn, a specialized switch (--Gelf-exec-compression) in rsync also comes to mind, and that should make them more powerfull and all...

    --
    NO SIG
  23. That's just a dissembler. How about bittorrent? by Khopesh · · Score: 1

    This is diffs the dissembled version of the original against the update on the server, then does the opposite on the client. I couldn't help but think of this as similar to Gentoo's model ... download a compressed diff of the source and then recompile. Both have the same problem: too much client-side CPU usage (though Gentoo's is an extreme of this). Isn't Google Chrome OS primarily targeting netbooks? Can such things handle that level of extra client-side computation without leaving users frustrated?

    I'd rather improve the distribution model. Since packages are all signed, SSL and friends aren't needed for the transfer, nor does it need to come from a trusted authority. Bittorrent comes to mind. I'm quite disappointed that the apt-torrent project never went anywhere. It's clearly the solution.

    --
    Use my userscript to add story images to Slashdot. There's no going back.
  24. Re:That's just a dissembler. How about bittorrent? by BobMcD · · Score: 1

    Torrents are problematic in that they are contrary to the way many, if not all, ISP's are set up. Nodes are intended to be consumers, not data providers. Shifting the downloaded data to the UPSTREAM is against the intent of the architects and produces problems.

    Redesign the network, sure. Until then, torrents for updates are only taking your problem and shifting it to someone with fewer resources (without permission.)

  25. BCJ filter by hpa · · Score: 2, Interesting

    The concept of a Branch-Call-Jump (BCJ) filter is well-known in the data compression community, and is a standard part of quite a few deployed compression products. Used as a front end to a conventional compression algorithm -- or, in this case, a binary compression algorithm -- does indeed give significant improvements. The application to binary diff is particularly interesting, since it means you can deal with branches and other references *over* the compressed region, so this is really rather clever.

  26. Re:That's just a dissembler. How about bittorrent? by tvjunky · · Score: 1

    This is diffs the dissembled version of the original against the update on the server, then does the opposite on the client. I couldn't help but think of this as similar to Gentoo's model ... download a compressed diff of the source and then recompile. Both have the same problem: too much client-side CPU usage (though Gentoo's is an extreme of this). Isn't Google Chrome OS primarily targeting netbooks? Can such things handle that level of extra client-side computation without leaving users frustrated?

    I don't think this is really a problem in this case. In the time even a slow computer, by today's standards, has downloaded a kilobyte over a WAN link it has easily performed millions of CPU operations on it. The same would be true for any kind of compression really. Since Bandwidth through your pipe is just orders of magnitudes slower than anything that happens within your machine, this added level of complexity is clearly more beneficial than a direct approach. That's why it makes sense to compress files on the server (or the client uploading it to the server), transfer them and decompress them on the client, even if the client is quite slow.

    I'd rather improve the distribution model. Since packages are all signed, SSL and friends aren't needed for the transfer, nor does it need to come from a trusted authority. Bittorrent comes to mind. I'm quite disappointed that the apt-torrent project never went anywhere. It's clearly the solution.

    With patches between minor versions at about 80kB (as stated in TFA), I don't think that a distribution using bittorrent would really be the way to go here. Add to this the fact that google has quite a lot of bandwidth at their disposal and I don't see this happening anytime soon.
    I aggree however that it may be a good idea to transfer large amounts of linux packeges that way. But with a lot of smaller packages the protocol overhead of bittorrent might become a limiting factor regarding its usefulness.

  27. LLVM by reg · · Score: 1

    If you're going to do this in the long run, it would make more sense to distribute LLVM byte code and compile that to native code on the host... That way you have small diffs to the byte code, and also you get targeted optimization (and maybe even custom profile guided optimization).

  28. Not terribly novel by Anonymous Coward · · Score: 0

    This is a classic trade-off; you can compress better by sacrifizing platfom-indepence. It has been done before, e.g, the ExeDiff tool for the DEC Alpha. The only news here is that its being done by Google. Interested parties should refer to: G. Motta, J. Gustafson, and S. Chen. "Differential Compression of Executable Code," Proceedings of the Data Compression Conference (DCC 07), Mar. 2007, pp. 103-112 for an overview of the available tools and how they perform.

    Regards,
    Jacob Gorm Hansen, author of the EDelta linear-time executable differ.

  29. wrong title, of course by TheGratefulNet · · Score: 1

    its not 'binary diffing'.

    does it work with gif files? jpg? png? dng? mov? mpg?

    its meant for PROGRAM binary files. of a specific type of processor, at that.

    that's fine.

    but please say so, and don't imply its binary. its really specific and not helpful for image (pic, sound) formats.

    --

    --
    "It is now safe to switch off your computer."
  30. That's odd. by haelduksf · · Score: 1

    Google's Open-Source Chromium project announced a new compression technique called Courgette geared towards distributing really small updates today.

    That's a pretty small market- I can't imagine that there are many teams that will release really small updates today.

  31. Semantic Content FTW by Tiger · · Score: 1

    Of course, if you just generate a patch file with the source changes, and zip that up, it will be even tinier.

    Except for that upgrade of the underlying FooLib from 2.1 to 2.2 that's part of your hotfix. Well, all you need then is to just get a patch file for that too, and include that.

    And then compile the whole sucker on the other end.

    Everyone's got plenty of CPU, right? And we're just about all using trivially decompilable bytecode anyway, so if you make your patchfile based on the source changes after compilation and decompilation, you've got all the right transforms in place to come up with a pretty effectively minimal changeset.

    Of course, we have plenty of bandwidth anyway, and programs are small. Media files are not, but I'm not going to get any space savings trying to disassemble that picture of your mom.

  32. Anyone like zucchini? by Anonymous Coward · · Score: 1, Insightful

    In case anyone has missed the reference of the name "Courgette", it's French for summer squash/zucchini type vegetables. So, Courgette as in squash, and squash as in make smaller.

  33. More frequent updates? by watanuki · · Score: 1

    This, the Chromium devs say, will allow them to send smaller, more frequent updates, making users more secure.

    Is patch size usually a factor in determining whether to send out a patch or not?

    In my experience, people update because a patch fixes some bugs or introduce some features they want. The size of the patch doesn't matter unless the user is severely bandwidth-limited.

    For developers, patches are usually sent out on a schedule (e.g. Microsoft's patch Tuesday), when certain milestone is reached, or when a sufficiently dangerous bug is found and need to be fixed quickly. I am not any organization that sends out patches based on 'the size of the diffs accumulated so far', but please enlighten me if you know of one.

    1. Re:More frequent updates? by watanuki · · Score: 1

      Oops, the last sentence should of course read 'I am not aware of any organization that sends out patches ...'.

    2. Re:More frequent updates? by Anonymous Coward · · Score: 0

      I think it does matter. Time again I click 'Install Later' when Firefox wants to install all over again. I cannot be bothered to wait for this. I want to browse and browse now!

  34. The same as in French by Kupfernigk · · Score: 1

    The sort of English people who need to pronounce "courgette" are the sort who still go to old fashioned food shops, not supermarkets, and so have to ask for things. The sort who tend to be better off, older and who have spent more time on holiday in France and Italy (which is where we learnt to eat them in the first place.)

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
  35. What is EFF take on this? by Anonymous Coward · · Score: 0

    Electronic Frontier Foundation

  36. What? by microbee · · Score: 1

    Since this will be released as open source, it should make distributing updates a lot easier for the open-source community.

    We open source community don't have no F**KING business in binary distribution!

    1. Re:What? by doshell · · Score: 1

      Since this will be released as open source, it should make distributing updates a lot easier for the open-source community.

      We open source community don't have no F**KING business in binary distribution!

      Not all of us run Gentoo.

      --
      Score: i, Imaginary
  37. The dynamic deltup server network by xororand · · Score: 1

    A similar approach for distributing updates to source packages has been around for years: The dynamic deltup server network. You can tell their servers which source archives you already have and which new version you want. The server then unpacks both archives and sends you a deltup diff that can be used to create a bit-by-bit copy of the desired source archive, using the deltup program.
    An example use case for this are source based operating system distributions, like Gentoo GNU/Linux. The saved bandwidth is usually significant, often more than 90%.

    http://linux01.gwdg.de/~nlissne/dynamic.html

  38. Re:Evil bastages by CarpetShark · · Score: 1

    Somebody will troll and it'll get marked insightful or something stupid. And that last sentence guaranteed I won't get nothing but a -1 troll either, but oh well.

    Don't be so hard on yourself. I'm sure you were already modded down way before that sentence.

  39. Compare2Crack by anton_kg · · Score: 1

    Hey, my favorite Compare2Crack.v.0.10.(S)1995 just got opensourced! I wonder how long will take for crackers to adopt distribution of "bugfixes" for commercial software as well.

  40. Open Source community benefits from binary diff by omkhar · · Score: 1

    So the OPEN SOURCE community will benefit from BINARY diffs.

    uhuh. I think we're just fine with patch/diff.

    oh, what's that? You meant the people that DISTRIBUTE BINARY version of OPEN SOURCE programs will benefit? Ahhh, now I see.

    1. Re:Open Source community benefits from binary diff by zblach · · Score: 1

      I've never understood why distributing binaries has been a bad idea. Sure, having the source available is fantastic, but why is that diametrically opposed to binary distribution? Sometimes it's nice not having to compile everything from source, and especially for this use case. It's the same software compiled for the same platform. Compiling once and distributing the binary patch makes more sense than to send out a diff file and require a recompilation. Not every machine running google chrome is a development machine with all required libraries. If you want the source it's available. Why attack this method of updates? Isn't open source about choice and multiple solutions?

      --
      # cat /dev/mem | strings | grep -i sheep | wc -l i can't sleep.
  41. Re:That's just a dissembler. How about bittorrent? by Anonymous Coward · · Score: 0

    I think you mean "disassemble". Dissemble is not the opposite of assemble and means something quite different.

  42. Defective design by Anonymous Coward · · Score: 0

    Defective design (design is defective, intentional or not)
    Defective by design (intentionally defective design)
    Bad/ineffective design (unintentionally defective)

    The first one is ambiguous.

  43. source code patch by shird · · Score: 1

    If Google's OS is to become open source - why the need for binary diffs at all?

    Isn't it feasible that the client store all the source and the patch process just involves a diff to the original source code that gets recompiled?

    Also, this dissassemble,edit,reassemble scheme isn't new. There are a few viruses out there that perform similar actions in order to create space in the executable for themselves. pretty ingenious actually.

    --
    I.O.U One Sig.
    1. Re:source code patch by doshell · · Score: 1

      If Google's OS is to become open source - why the need for binary diffs at all?

      Isn't it feasible that the client store all the source and the patch process just involves a diff to the original source code that gets recompiled?

      Compiling takes time and CPU. Considering that they're aiming the Chrome OS at netbooks, I don't think they want to put that burden in the client. And even for better-than-average machines, I suspect (drawing from experience acquired during my Gentoo days) that compiling a whole browser application would take a considerable amount of time.

      --
      Score: i, Imaginary
  44. Overly Complex.. by thesupraman · · Score: 1

    From my first pass over it this actually looks like a very strange way to describe a rather simple approach.

    More generically they could achieve the same, if not more, by using a differential PPM or LZ method, which is very simple to design/implement, needs zeto knowledge of what data it is handling, and is in no way a new idea.

    I suspect whomever designed this had a good idea but not that enough compression experience to know solutions were already on hand.

    In effect the new binary should be compressed with a dictionary compresser pre-initialised with the contents of the old binary as an initial seed and everything needed is achieved - the initial compression time can be large for a large binary, however as this is a single compress, millions of decompress situation, it would not matter at all (and by large I mean minutes, not days).

    Still, their more complex description/implementation sounds much more whizzy and gets more attention, no doubt.

  45. many ways to do that by ei4anb · · Score: 2, Interesting
    I worked on a project back in the 80's where we had half a million lines of code running on a high availability machine with 384 32bit CPUs.

    As bandwidth was a tad limited in those days we too looked for an efficient way to distribute updates. The solution was to distribute the smaller bug fixes as patches, similar to debug scripts. The loader would run those debug scripts after loading the program. To apply a patch the customer would put the patch file in the same folder as the program, restart the program on the hot standby side of the cluster and provoke a switchover to the standby.

    The patch was applied without any downtime. If the customer wanted to back-out the bug fix then all they had to do was delete the patch file and switch back to the unpatched side of the cluster.

    Most patches were small and we only had a few hundred bytes to send out at a time. Afterwards the world upgraded to Windows and forgot such technology :-(

  46. Courgette-in-the-middle? by javarome · · Score: 1

    I may be missing something but I'm surprised that nobody raised the security issue of putting an man-in-the-middle able to disassemble the expected code and replace it by its own. What about signatures in this stuff?