Exhaustive Data Compressor Comparison

← Back to Stories (view on slashdot.org)

Exhaustive Data Compressor Comparison

Posted by kdawson on Sunday April 22, 2007 @02:10PM from the pick-one-smaller-or-faster dept.

crazyeyes writes "This is easily the best article I've seen comparing data compression software. The author tests 11 compressors: 7-zip, ARJ32, bzip2, gzip, SBC Archiver, Squeez, StuffIt, WinAce, WinRAR, WinRK, and WinZip. All are tested using 8 filesets: audio (WAV and MP3), documents, e-books, movies (DivX and MPEG), and pictures (PSD and JPEG). He tests them at different settings and includes the aggregated results. Spoilers: WinRK gives the best compression but operates slowest; AJR32 is fastest but compresses least."

4 of 305 comments (clear)

Min score:

Reason:

Sort:

Re:What's the point of compressing JPEG,MP3,DivX e by trytoguess · 2007-04-22 15:25 · Score: 5, Interesting

Er... did ya check out the comparisons? As you can see here here jpeg at least can be compressed considerably with Stuffit. According to this the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding." I've no idea what that means, but it does seem to be more thorough and complex than what you wrote.
Agreed completely. by Kadin2048 · 2007-04-22 16:28 · Score: 5, Interesting

Back in the early/mid 90s I was pretty obsessed with data compression because I was always short on hard drive space (and short on money to buy new hard drives with); as a result I tended to compress things using whatever the format du jour was if it could get me an extra percentage point or two. Man, was that a mistake.

Getting stuff out of some of those formats now is a real irritation. I haven't run into a case yet that's been totally impossible, but sometimes it's taken a while, or turned out to be a total waste of time once I've gotten the archive open.

Now, I try to always put a copy of the decompressor for whatever format I use (generally just tar + gzip) onto the archive media, in source form. The entire source for gzip is under 1MB, trivial by today's standards, and if you really wanted to cut size and only put the source for deflate on there, it's only 32KB.

It may sound tinfoil-hat, but you can't guarantee what the computer field is going to look like in a few decades. I had self-expanding archives, made using Compact Pro on a 68k Mac, thinking they'd make the files easy to recover later, which didn't help me at all now -- a modern (Intel) Mac won't touch it (although to be fair a PPC Mac will run OS 9 which will, and allegedly there's a Linux utility that will unpack CPP archives, although maybe not self-expanding ones).

Given the rate at which bandwidth and storage space are expanding, I think the market for closed-source, proprietary data compression schemes should be very limited; there's really no good reason to use them for anything that you're storing for an unknown amount of time. You don't have to be a believer in the "infocalypse" to realize that operating systems and entire computing-machine architectures change over time, and what's ubiquitous today may be unheard of in a decade or more.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:duh by Firethorn · 2007-04-22 16:37 · Score: 4, Interesting

Not only that, but you can sacrifice compression to create recovery capability in the case of lost/corrupted data, especially in the newer ones.

Missing part 3 of 10? No problem!

Of course, I'm a holder of a license for Rar from way back when. I like it.

--
I don't read AC A human right
Re:What's the point of compressing JPEG,MP3,DivX e by kyz · 2007-04-22 22:36 · Score: 4, Interesting

While the main thrust of JPEG is to do "lossy" compression, the final stage of creating a JPEG is to do lossless compression on the data. There are two different official methods you can use: Huffman Coding and Arithmetic Coding.

Both methods do the same thing: they statistically analyse all the data, then re-encode it so the most common values are encoded in a smaller way than the least common values.

Huffman's main limitation is that each value compressed needs to consume at least one bit. Arithmetic coding can fit several values into a single bit. Thus, arithmetic coding is always better than Huffman, as it goes beyond Huffman's self-imposed barrier.

However, Huffman is NOT patented, while most forms of arithmetic coding, including the one used in the JPEG standard, ARE patented. The authors of Stuffit did nothing special - they just paid the patent fee. Now they just unpack the Huffman-encoded JPEG data and re-encode it with arithmetic coding. If you take some JPEGs that are already compressed with arithmetic coding, Stuffit can do nothing to make them better. But 99.9% of JPEGs are Huffman coded, because it would be extortionately expensive for, say, a digital camera manufacturer, to get a JPEG arithmetic coding patent license.

So Stuffit doesn't have remarkable code, they just paid money to get better compression that 99.9% of people specifically avoid because they don't think it's worth the money.

--
Does my bum look big in this?