Exhaustive Data Compressor Comparison

Posted by kdawson on Sunday April 22, 2007 @02:10PM from the pick-one-smaller-or-faster dept.

crazyeyes writes "This is easily the best article I've seen comparing data compression software. The author tests 11 compressors: 7-zip, ARJ32, bzip2, gzip, SBC Archiver, Squeez, StuffIt, WinAce, WinRAR, WinRK, and WinZip. All are tested using 8 filesets: audio (WAV and MP3), documents, e-books, movies (DivX and MPEG), and pictures (PSD and JPEG). He tests them at different settings and includes the aggregated results. Spoilers: WinRK gives the best compression but operates slowest; ARJ32 is fastest but compresses least."

duh by Gen.+Malaise · 2007-04-22 14:12 · Score: 5, Funny

Nothing to see. High compression = slow and low compression = fast. umm duh?
1. Re:duh by setirw · 2007-04-22 14:18 · Score: 5, Funny
  
  High compression = slow and low compression = fast
  
  You compressed the article into that statement. How long did it take to write the comment?
  
  --
  This message printed on 100% post-consumer recycled electrons.
2. Re:duh by h2g2bob · 2007-04-22 15:19 · Score: 5, Funny
  
  Oh, if only they'd compressed the article onto a single page!
WOW! by vertigoCiel · 2007-04-22 14:15 · Score: 5, Funny

I never would have guessed that there was a tradeoff between the quality and speed of compression! No way! Next they'll be saying things like 1080p HD offers quality at the expense of computational power required!
Screw speed, size reduction: gimme compatibility by xxxJonBoyxxx · 2007-04-22 14:17 · Score: 5, Insightful

Screw speed and size reduction. All I want is compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.
small = slow by Anonymous Coward · 2007-04-22 14:17 · Score: 5, Funny

So that's why smaller computers are slower, right?
I keep it simple by Anonymous Coward · 2007-04-22 14:17 · Score: 5, Funny

I fill an old station wagon with backup tapes, and then put it in the crusher.
Skip the blogspam by Anonymous Coward · 2007-04-22 14:19 · Score: 5, Informative

as it's slashdotted

this site
http://www.maximumcompression.com/
has been up for years and performs tests on all the compressors with various input sources, much more comprehensive
Re:/. effect rears its ugly head once again! by killa62 · 2007-04-22 14:23 · Score: 5, Funny

yep, looks like they're using WinRK on the fly to decompress the website from storage
Poor article. by FellowConspirator · 2007-04-22 14:24 · Score: 5, Insightful

This is a poor article on several points. First, the entropy of the data in the files isn't quantified. Second, the strategy used for compression isn't described at all. If WinRK compresses so well on very high entropy data, there must be some filetype specific strategies used.

Versions of the programs aren't given, nor the compile-time options (for the open source ones).

Finally, Windows Vista isn't a suitable platform for conducting the tests. Most of these tools target WinXP in their current versions and changes to Vista introduced systematic differences in very basic things like memory usage, file I/O properties, etc.

The idea of the article is fine, it's just that the analysis is half-baked.
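
One rough way to quantify the entropy point above is an order-0 estimate: Shannon entropy over byte frequencies, in bits per byte. This is only a sketch, and the function name `byte_entropy` is made up for illustration. It ignores correlations between bytes, so it is a crude proxy for compressibility and not what any of the tested archivers actually measure.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    # H = sum over byte values of p * log2(1/p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    print(byte_entropy(b"aaaaaaaa"))        # one symbol only: no information
    print(byte_entropy(bytes(range(256))))  # every byte value equally likely
```

Values near 8 bits per byte suggest there is little left for a general-purpose compressor to remove; values well below that suggest room for reduction.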
1. Re:Poor article. by RedWizzard · 2007-04-22 15:10 · Score: 5, Insightful
  
  I've got some more issues with the article. They didn't test filesystem compression. This would have been interesting to me because often the choice I make is not between different archivers, but between using an archiver or just compressing the directory with NTFS' native compression.
  They also focused on compression rate when I believe they should have focused on decompression rate. I'll probably only archive something once, but I may read from the archive dozens of times. What matters to me is the trade-off between space saved and extra time taken to read the data, not the one-off cost of compressing it.
What's the point of compressing JPEG,MP3,DivX etc by mochan_s · 2007-04-22 14:26 · Score: 5, Insightful

What's the point of compressing JPEG,MP3,DivX etc since they already do the compression? The streams are close to random (with max information) and all you could compress would be the headers between blocks in movies or the ID3 tag in MP3.
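
A small experiment illustrates the claim. It uses Python's standard `zlib` (deflate) on highly repetitive text and on pseudo-random bytes. The random bytes are only a stand-in for an already-compressed stream, which is an assumption: real JPEG or MP3 data is not perfectly random, so format-aware tools may still find some redundancy.

```python
import random
import zlib


def ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, level)) / len(data)


# Highly redundant input: the same sentence repeated many times.
text = b"High compression = slow and low compression = fast. " * 2000

# Stand-in for an already-compressed stream: seeded pseudo-random bytes.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

if __name__ == "__main__":
    print(f"repetitive text: {ratio(text):.3f}")
    print(f"random bytes:    {ratio(noise):.3f}")
```

The text should shrink to a small fraction of its size, while the random bytes should come out at roughly their original size or slightly larger because of container overhead.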
Exhaustive?! by jagilbertvt · 2007-04-22 15:12 · Score: 5, Informative

It seems odd that they didn't include executables/dlls in the comparison (where maximumcompression.com does). I also find it odd that they are compressing items that normally don't compress very well with most data compression programs (divx/mpegs/jpegs/etc). I'm guessing this is why 7-zip ranked a bit lower than most.

I did some comparison last year, and found 7-zip to do the best job for what I needed (great compression ratio without requiring days to complete). The article also doesn't take into account the network speed at which the file is going to be transmitted. I use 7-zip for pushing application updates and such to remote offices (most over 384k/768k WAN links). Compressing w/ 7-zip has saved users quite a bit of time compared to winrar or winzip.

I would definitely recommend checking out maximumcompression.com (As others have, as well) over this article. It goes into a lot greater detail.
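
The network-speed point can be put into rough numbers: end-to-end time is compression time plus transfer time of the compressed file plus decompression time. The sketch below is back-of-the-envelope arithmetic only. The function name, the 100 MB update size, the ratios, and the timings are all hypothetical values chosen for illustration, not measurements of 7-zip or any other tool.

```python
def total_seconds(size_bytes: float, ratio: float, link_bits_per_s: float,
                  compress_s: float = 0.0, decompress_s: float = 0.0) -> float:
    """Compress + transfer + decompress time, ignoring protocol overhead.

    ratio is compressed size / original size (1.0 means no compression).
    """
    sent_bits = size_bytes * ratio * 8
    return compress_s + sent_bits / link_bits_per_s + decompress_s


if __name__ == "__main__":
    size = 100_000_000  # hypothetical 100 MB application update
    link = 384_000      # 384 kbit/s WAN link, as mentioned in the comment above

    # Hypothetical settings: (ratio, compress seconds, decompress seconds)
    for name, (r, c, d) in {
        "uncompressed":   (1.00, 0, 0),
        "fast, weaker":   (0.50, 20, 10),
        "slow, stronger": (0.40, 120, 15),
    }.items():
        print(f"{name:15s} {total_seconds(size, r, link, c, d) / 60:6.1f} min")
```

On a slow link the transfer term dominates, so under these made-up numbers a slower compressor with a better ratio can still finish first overall; on a fast link the balance shifts the other way.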
Re:What's the point of compressing JPEG,MP3,DivX e by trytoguess · 2007-04-22 15:25 · Score: 5, Interesting

Er... did ya check out the comparisons? As you can see here, JPEG at least can be compressed considerably with StuffIt. According to this, the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm than default Huffman coding." I've no idea what that means, but it does seem to be more thorough and complex than what you wrote.
Agreed completely. by Kadin2048 · 2007-04-22 16:28 · Score: 5, Interesting

Back in the early/mid 90s I was pretty obsessed with data compression because I was always short on hard drive space (and short on money to buy new hard drives with); as a result I tended to compress things using whatever the format du jour was if it could get me an extra percentage point or two. Man, was that a mistake.

Getting stuff out of some of those formats now is a real irritation. I haven't run into a case yet that's been totally impossible, but sometimes it's taken a while, or turned out to be a total waste of time once I've gotten the archive open.

Now, I try to always put a copy of the decompressor for whatever format I use (generally just tar + gzip) onto the archive media, in source form. The entire source for gzip is under 1MB, trivial by today's standards, and if you really wanted to cut size and only put the source for deflate on there, it's only 32KB.

It may sound tinfoil-hat, but you can't guarantee what the computer field is going to look like in a few decades. I had self-expanding archives, made using Compact Pro on a 68k Mac, thinking they'd make the files easy to recover later, which didn't help me at all now -- a modern (Intel) Mac won't touch it (although to be fair a PPC Mac will run OS 9 which will, and allegedly there's a Linux utility that will unpack CPP archives, although maybe not self-expanding ones).

Given the rate at which bandwidth and storage space are expanding, I think the market for closed-source, proprietary data compression schemes should be very limited; there's really no good reason to use them for anything that you're storing for an unknown amount of time. You don't have to be a believer in the "infocalypse" to realize that operating systems and entire computing-machine architectures change over time, and what's ubiquitous today may be unheard of in a decade or more.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."