New Binary Diffing Algorithm Announced By Google
bheer writes "Today Google's Open-Source Chromium project announced a new compression technique called Courgette geared towards distributing really small updates. Courgette achieves smaller diffs (about 9x in one example) than standard binary-diffing algorithms like bsdiff by disassembling the code and sending the assembler diffs over the wire. This, the Chromium devs say, will allow them to send smaller, more frequent updates, making users more secure. Since this will be released as open source, it should make distributing updates a lot easier for the open-source community."
An interesting approach - I wonder if this would work as well on compiled bytecode like .NET or Java uses?
Google has to pay the cost for maintaining servers and handling bandwidth for all the OS updates they push out. The more efficient they are in this process, the more money they save.
The good news is that the same benefits could be applied to Red Hat, Ubuntu, openSUSE, etc. Lower costs help the profitability of companies trying to make a profit on Linux.
The end users also see benefits in that their packages download quicker. I'd be honestly pretty disappointed in any major distro that doesn't start implementing a binary diff solution around this.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
announced a new compression technique called Courgette geared towards distributing really small updates
I just RTFA; this has nothing to do with a compression technique.
What they developed is a technique to make small diffs from *executable binary files*, and it doesn't look like it's portable to anything other than x86 because the patch engine has to embed an architecture-specific assembler + disassembler.
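To make the "disassemble, then diff" idea concrete, here is a toy sketch in Python. It is not Courgette's actual format or code; the tiny instruction set, the `assemble` helper and the `changed_lines` helper are all made up for illustration. The assumption being illustrated is that inserting one instruction shifts every absolute address after it, so a diff of the assembled form touches many lines, while a diff of the symbolic (disassembled) form touches only the line that really changed.

```python
import difflib


def assemble(program):
    """Toy 'assembler' (hypothetical): turn symbolic labels into absolute indices."""
    addresses = {}
    index = 0
    for item in program:
        if item.endswith(":"):          # a label definition such as "main:"
            addresses[item[:-1]] = index
        else:                           # a real instruction occupies one slot
            index += 1
    binary = []
    for item in program:
        if item.endswith(":"):
            continue
        op, _, arg = item.partition(" ")
        # Replace a label operand with the absolute index it refers to.
        binary.append(f"{op} {addresses[arg]}" if arg in addresses else item)
    return binary


def changed_lines(old, new):
    """Count lines removed from `old` plus lines added in `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


old_src = ["nop",
           "f1:", "load", "ret",
           "f2:", "store", "ret",
           "main:", "call f1", "call f2", "call f1", "jmp main"]
# The "update": a single instruction inserted near the top.
new_src = old_src[:1] + ["push"] + old_src[1:]

symbolic_diff = changed_lines(old_src, new_src)
binary_diff = changed_lines(assemble(old_src), assemble(new_src))
print("symbolic diff lines:", symbolic_diff)
print("assembled diff lines:", binary_diff)
```

In this toy, every `call` and `jmp` target in the assembled form moves by one, so the assembled diff is several times larger than the symbolic one. A real patcher also has to re-assemble on the client, which is where the architecture-specific part comes in.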
That's potentially very misleading. I can compress any document, down to a single bit, if my compression algorithm is sufficiently tailored to that document. For example:
def decompress(compressed_data):
    if compressed_data[0] == 0:
        return get_Magna_Carta_text()
    else:
        return unzip(compressed_data[1:])
What we need to know is the overall distribution of compression ratios, or at least the average compression ratio, over a large population of documents.
Please?
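For what it's worth, measuring that kind of distribution is not much code. The sketch below uses Python's standard `zlib` as a stand-in compressor and a made-up four-document corpus; neither has anything to do with Courgette or with the numbers in the article. The helper names `compression_ratio` and `summarize` are hypothetical.

```python
import statistics
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means better compression)."""
    return len(data) / len(zlib.compress(data))


def summarize(corpus):
    """Report the spread of ratios over a corpus instead of one cherry-picked example."""
    ratios = sorted(compression_ratio(doc) for doc in corpus)
    return {
        "min": ratios[0],
        "median": statistics.median(ratios),
        "mean": statistics.fmean(ratios),
        "max": ratios[-1],
    }


# Illustrative corpus only; a real evaluation would use many real update payloads.
corpus = [
    b"a" * 1000,
    b"the quick brown fox jumps over the lazy dog " * 25,
    bytes(range(256)) * 4,
    b"To be, or not to be, that is the question.",
]

summary = summarize(corpus)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A single "9x" figure corresponds to one point in such a list; the minimum and median say more about what a typical update would look like.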
Utilizing the synergization of benchmark e-solutions to pre-workaround action items!
If the code is so awful that the bandwidth required for security updates is a problem, the product is defective by design.
No one is saying that the bandwidth is a problem. They're just saying that the bandwidth is unnecessary. FSM-forbid that anyone try to optimize something.
Plus, as the article points out, smaller updates mean more people can receive the update per unit-bandwidth, which means faster distribution of security updates when something critical is fixed.
zoo-kee-nee
Probably because the old solution was:
A) Simple
B) Good enough for most purposes.
Sure, you can shave 80% off your patch filesize... but unless you're as big as Google, patch bandwidth probably isn't a major priority -- you've likely got much more important things to dedicate engineering resources to.
You know how they say "Necessity is the mother of invention"? Well, when an invention isn't really necessary for most folks, it tends to show up a little later than it might otherwise have.
Out of curiosity, could you please point us to some of your code so we can do a comparison?
Thanks.
Actually, it made me smack my head and say "I assumed we were already doing this".
Sure, you can shave 80% off your patch filesize... but unless you're as big as Google, patch bandwidth probably isn't a major priority
So, you've never installed a service pack or another OS update? I'd be more than happy to run "sudo aptitude patch-upgrade" to download the 3KB difference between OpenOffice 3.1.foo and 3.1.foo+1.
Dewey, what part of this looks like authorities should be involved?
Well, not to mention 90%+ of Windows users lack a compiler and linker. And for the other 10% it can be either some version of VC or a Windows port of GCC. This is doubly annoying because C++ code has no standardized ABI (in the vein of early '90s UNIX adding extensions that did nothing but make sysadmin work twice as hard), so you have to ensure all your .dlls have been compiled by the same compiler, etc.
Asking a Windows user to compile something is like asking a painter to engineer a building.
In case anyone has missed the reference of the name "Courgette", it's French for summer squash/zucchini type vegetables. So, Courgette as in squash, and squash as in make smaller.