Exhaustive Data Compressor Comparison

Posted by kdawson on Sunday April 22, 2007 @02:10PM from the pick-one-smaller-or-faster dept.

crazyeyes writes "This is easily the best article I've seen comparing data compression software. The author tests 11 compressors: 7-zip, ARJ32, bzip2, gzip, SBC Archiver, Squeez, StuffIt, WinAce, WinRAR, WinRK, and WinZip. All are tested using 8 filesets: audio (WAV and MP3), documents, e-books, movies (DivX and MPEG), and pictures (PSD and JPEG). He tests them at different settings and includes the aggregated results. Spoilers: WinRK gives the best compression but operates slowest; ARJ32 is fastest but compresses least."

29 of 305 comments

duh by Gen.+Malaise · 2007-04-22 14:12 · Score: 5, Funny

Nothing to see. High compression = slow and low compression = fast. umm duh?
1. Re:duh by setirw · 2007-04-22 14:18 · Score: 5, Funny
  
  High compression = slow and low compression = fast
  
  You compressed the article into that statement. How long did it take to write the comment?
  
  --
  This message printed on 100% post-consumer recycled electrons.
2. Re:duh by h2g2bob · 2007-04-22 15:19 · Score: 5, Funny
  
  Oh, if only they'd compressed the article onto a single page!
3. Re:duh by morcego · 2007-04-22 16:03 · Score: 4, Informative
  
  So you already knew WinRK gave the best compression? I didn't; never even heard of it. My money would have been on bzip2.
  
  I agree with you on the importance of this article but ... bzip2 ? C'mon.
  Yes, I know it is better than gzip, and it is also supported everywhere. But it is much worse than the "modern" compression algorithms.
  
  I have been using LZMA for some time now for things I need to store longer, and getting good results. It is not on the list, but should give results a little bit better than RAR. Too bad it is only fast when you have a lot of memory.
  
  For short/medium time storage, I use bzip2. Online compression, gzip (zlib), of course.
  
  --
  morcego
4. Re:duh by Firethorn · 2007-04-22 16:37 · Score: 4, Interesting
  
  Not only that, but you can sacrifice compression to create recovery capability in the case of lost/corrupted data, especially in the newer ones.
  
  Missing part 3 of 10? No problem!
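
  For anyone wondering how a missing part can come back at all, here is a minimal sketch using a single XOR parity block. This is an assumption made for illustration: real archivers use stronger error-correcting codes, and nothing here is claimed to be what RAR does internally. The part contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Ten equal-sized data parts (hypothetical contents).
parts = [bytes([i]) * 8 for i in range(1, 11)]

# The recovery part costs one extra block of storage.
parity = xor_blocks(parts)

# Lose "part 3 of 10" (index 2), then rebuild it from the survivors plus parity.
survivors = parts[:2] + parts[3:]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == parts[2])
```

  One parity block only covers one lost part; surviving more losses needs more redundancy, which is the space you sacrifice.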
  
  Of course, I'm a holder of a license for Rar from way back when. I like it.
  
  --
  I don't read AC A human right
5. Re:duh by timeOday · 2007-04-22 17:07 · Score: 4, Informative
  
  I agree with you on the importance of this article but ... bzip2 ? C'mon.
  Well, now I know.
  Here's a scatterplot of resulting file sizes and compression times from the text compression data (lower is better), and as my luck would have it, bzip2 is really the only one that's out of line - i.e. the furthest from the pareto frontier. But then, looking at the same data with file sizes plotted in the range of [0.0, 1.0], it seems like there's a major case of diminishing returns for the expensive algorithms anyways. If you care at all about compression time, good ol' gzip is still a pretty decent choice!
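
  For reference, "furthest from the pareto frontier" can be checked mechanically. A rough sketch, assuming each result is a (compressed size, seconds) pair where lower is better on both; the names and numbers are hypothetical and are not the article's measurements.

```python
def pareto_frontier(results):
    """Return the names that no other entry beats on both size and time.

    results maps name -> (compressed_size, seconds); lower is better for both.
    """
    frontier = []
    for name, (size, secs) in results.items():
        dominated = any(
            (s <= size and t <= secs) and (s < size or t < secs)
            for other, (s, t) in results.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical numbers for illustration only, NOT the article's measurements.
sample = {
    "fast": (60, 5),
    "balanced": (45, 20),
    "slow-and-big": (50, 40),  # beaten by "balanced" on both axes
    "max": (35, 120),
}
print(pareto_frontier(sample))
```

  Anything missing from the returned list is a setting you would never pick, because some other entry is both smaller and faster.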
6. Re:duh by Compact+Dick · 2007-04-22 17:30 · Score: 4, Informative
  
  LZMA ... is not on the list
  
  7-Zip [included in the test] is based on LZMA.
  
  --
  Use ISO 8601 dates [YYYY-MM-DD]
WOW! by vertigoCiel · 2007-04-22 14:15 · Score: 5, Funny

I never would have guessed that there was a tradeoff between the quality and speed of compression! No way! Next they'll be saying things like 1080p HD offers quality at the expense of computational power required!
Screw speed, size reduction: gimme compatibility by xxxJonBoyxxx · 2007-04-22 14:17 · Score: 5, Insightful

Screw speed and size reduction. All I want is compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.
small = slow by Anonymous Coward · 2007-04-22 14:17 · Score: 5, Funny

So that's why smaller computers are slower, right?
I keep it simple by Anonymous Coward · 2007-04-22 14:17 · Score: 5, Funny

I fill an old station wagon with backup tapes, and then put it in the crusher.
Skip the blogspam by Anonymous Coward · 2007-04-22 14:19 · Score: 5, Informative

as it's slashdotted

this site
http://www.maximumcompression.com/
has been up for years and performs tests on all the compressors with various input sources, much more comprehensive
Re:/. effect rears its ugly head once again! by killa62 · 2007-04-22 14:23 · Score: 5, Funny

yep, looks like they're using WinRK on the fly to decompress the website from storage
Interesting, needs better graphs by MBCook · 2007-04-22 14:23 · Score: 4, Informative

I read this earlier today through the firehose. It was interesting, but the graphs are what struck me. It seems to me all the graphs should have been XY plots instead of pairs of histograms. That way you could easily see the relationship between compression ratio and time taken. Their "metric" for showing this, basically multiplying the two numbers, is pretty bogus and isn't nearly as easy to compare. With the XY plot the four corners are all very meaningful. One is slow with no compression, two have either good compression or good time but not both, and one is the sweet spot of good compression and good time. It's easy to tell those on two opposing corners apart (good compression vs good time), whereas with the article's metric they could look very similar.
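
A quick sketch of why a product-style score hides the difference between opposite corners. The numbers are made up, and the exact formula the article uses is an assumption here (size fraction times seconds).

```python
# Hypothetical results: (compressed size as a fraction of the original, seconds).
# These are made-up numbers, not taken from the article.
small_but_slow = (0.25, 120.0)
big_but_fast = (0.75, 40.0)

def product_score(result):
    """Collapse both measurements into one number by multiplying them."""
    ratio, seconds = result
    return ratio * seconds

print(product_score(small_but_slow))  # 30.0
print(product_score(big_but_fast))    # 30.0
```

Both settings get the same score even though one is a third of the size and the other is three times as fast; on an XY plot they sit in opposite corners.
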
Still, interesting to see. The popular formats are VERY well established at this point (ZIP in Windows and Mac (stuffit seems to be fading fast), and GZIP and BZIP2 on Linux). They are so common (especially with ZIP support built into Windows since XP and also built into OS X) I don't think we'll see them replaced any time soon. Of course, with CPU power getting cheaper and cheaper we are seeing formats that are more and more compressed (MP3, H264, Divx, JPEG, etc) so these utilities are becoming less and less necessary. I no longer need to stuff files on floppies (I've got the net, DVD-Rs, and flash drives). Heck, if you look at some of the formats they "compressed" (at like 4% max) you almost might as well use TAR.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:Screw speed, size reduction: gimme compatibilit by Nogami_Saeko · 2007-04-22 14:24 · Score: 4, Insightful

Nice comparison, but there's really only two that matter (at least on PCs):

ZIP for cross-platform compatibility (and for simplicity for less technically-minded users).

RAR for everything else (at 3rd in their "efficiency" list, it's easy to see why it's so popular, not to mention ease of use for splitting archives, etc).

--
"Nothing strengthens authority so much as silence." - Charles de Gaulle
Poor article. by FellowConspirator · 2007-04-22 14:24 · Score: 5, Insightful

This is a poor article on several points. First, the entropy of the data in the files isn't quantified. Second, the strategy used for compression isn't described at all. If WinRK compresses so well on very high entropy data, there must be some filetype specific strategies used.

Versions of the programs aren't given, nor the compile-time options (for the open source ones).

Finally, Windows Vista isn't a suitable platform for conducting the tests. Most of these tools target WinXP in their current versions and changes to Vista introduced systematic differences in very basic things like memory usage, file I/O properties, etc.

The idea of the article is fine, it's just that the analysis is half-baked.
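
One crude way to put a number on the entropy of a test file is the Shannon entropy of its byte histogram. A minimal sketch follows; it is an order-0 estimate only (it ignores repetition across bytes, so it understates how compressible text really is), and the sample inputs are stand-ins, not the article's filesets.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

text = b"the quick brown fox jumps over the lazy dog " * 200
noise = os.urandom(len(text))  # stand-in for already-compressed data

print(f"text:  {byte_entropy(text):.2f} bits/byte")
print(f"noise: {byte_entropy(noise):.2f} bits/byte")
```

A figure close to 8 bits/byte means a general-purpose compressor has little left to work with unless it understands the file format.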
1. Re:Poor article. by RedWizzard · 2007-04-22 15:10 · Score: 5, Insightful
  
  I've got some more issues with the article. They didn't test filesystem compression. This would have been interesting to me because often the choice I make is not between different archivers, but between using an archiver or just compressing the directory with NTFS' native compression.
  They also focused on compression rate when I believe they should have focused on decompression rate. I'll probably only archive something once, but I may read from the archive dozens of times. What matters to me is the trade-off between space saved and extra time taken to read the data, not the one-off cost of compressing it.
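
  A rough sketch of measuring both directions, using the codecs that ship with Python's standard library (zlib, bz2, lzma) as stand-ins for the archivers in the article. The input is synthetic and the timings depend entirely on the machine, so treat the output as illustrative only.

```python
import bz2
import lzma
import time
import zlib

def timed(func, payload):
    """Run func(payload) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(payload)
    return result, time.perf_counter() - start

# Synthetic, highly repetitive input; real files will behave differently.
data = b"archive once, read dozens of times. " * 50_000

codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

round_trips = {}
for name, (compress, decompress) in codecs.items():
    packed, pack_secs = timed(compress, data)
    unpacked, unpack_secs = timed(decompress, packed)
    round_trips[name] = unpacked == data
    print(f"{name:5s} {len(packed):8d} bytes  "
          f"compress {pack_secs:.4f}s  decompress {unpack_secs:.4f}s")
```

  If you read an archive many times, weigh the decompress column by the number of reads before comparing.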
What's the point of compressing JPEG,MP3,DivX etc by mochan_s · 2007-04-22 14:26 · Score: 5, Insightful

What's the point of compressing JPEG,MP3,DivX etc since they already do the compression? The streams are close to random (with max information) and all you could compress would be the headers between blocks in movies or the ID3 tag in MP3.
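
A small sketch of the effect with a general-purpose compressor. Random bytes from `os.urandom` stand in for the payload of an already-compressed file, which is an assumption: real MP3/JPEG/DivX files also carry headers and some leftover structure.

```python
import os
import zlib

text = b"High compression = slow and low compression = fast. " * 2000
random_like = os.urandom(100_000)  # stand-in for an MP3/JPEG/DivX payload

packed_text = zlib.compress(text, 9)
packed_random = zlib.compress(random_like, 9)

print(len(text), "->", len(packed_text))
print(len(random_like), "->", len(packed_random))
```

The text shrinks dramatically; the random data comes out no smaller than it went in, because the container overhead outweighs anything deflate can find.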
english language is mostly fluff by Blue+Shifted · 2007-04-22 14:27 · Score: 4, Funny

the most interesting thing about text compression is that there is only about 20% information in the english language (or less). yes, that means that 4/5ths of it is meaningless filler. filled up with repetitive patterns. as you can see, i really didn't need four sentences to tell you that, either.

i wonder how other languages compare, and if there is a way to communicate much more efficiently.
7zip by Lehk228 · 2007-04-22 14:33 · Score: 4, Insightful

7-zip cribsheet:

weak on retarded things to zip like WAV files (use FLAC) mp3's, jpegs and divx movies.

7zip does quite well in documents (2nd) and ebooks (2nd) 3rd on MPEG video, 2nd in PSD

also i expect 7zip will improve in higher end compressions settings, when possible i give it hundreds of megs and unlike commercial apps 7zip can be configured well into the "insane" range

--
Snowden and Manning are heroes.
Re:What about LHA, TAR by 644bd346996 · 2007-04-22 14:42 · Score: 4, Informative

TAR is not a compressor.
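
A quick way to see it, sketched with Python's standard `tarfile` and `gzip` modules. The file name and payload are made up for the example.

```python
import gzip
import io
import tarfile

payload = b"TAR is not a compressor. " * 4000

def make_tar(data: bytes, name: str = "comment.txt") -> bytes:
    """Pack one in-memory file into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

plain_tar = make_tar(payload)
tar_gz = gzip.compress(plain_tar)

print(len(payload), len(plain_tar), len(tar_gz))
```

The plain tar comes out larger than the input (headers plus padding); the size only drops once gzip is applied on top.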
Archive Comparison Test by Repton · 2007-04-22 15:09 · Score: 4, Insightful

See also: the Archive Comparison Test. Covers 162 different archivers over a bunch of different file types.

It hasn't been updated in a while (5 years), but have the algorithms in popular use changed much? I remember caring about compression algorithms when I was downloading stuff from BBSs at 2400 baud, or trading software with friends on 3.5" floppies. But in these days of broadband, cheap writable CDs, and USB storage, does anyone care about squeezing the last few bytes out of an archive? zip/gzip/bzip2 are good enough for most people for most uses.

--
Repton.
They say that only an experienced wizard can do the tengu shuffle.
Exhaustive?! by jagilbertvt · 2007-04-22 15:12 · Score: 5, Informative

It seems odd that they didn't include executables/dlls in the comparison (where maximumcompression.com does). I also find it odd that they are compressing items that normally don't compress very well with most data compression programs (divx/mpegs/jpegs/etc). I'm guessing this is why 7-zip ranked a bit lower than most.

I did some comparison last year, and found 7-zip to do the best job for what I needed (great compression ratio without requiring days to complete). It also doesn't take into account the network speed at which the file is going to be transmitted. I use 7-zip for pushing application updates and such to remote offices (most over 384k/768k WAN links). Compressing w/ 7-zip has saved users quite a bit of time compared to winrar or winzip.
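
A back-of-the-envelope sketch of how link speed changes the answer. The file size, compressed sizes, and compression times are hypothetical; only the 384k link speed comes from the comment, and decompression time at the far end is ignored.

```python
def delivery_seconds(size_bytes, compress_secs, link_bits_per_sec):
    """Time to compress a file and then push it over a link of the given speed."""
    return compress_secs + (size_bytes * 8) / link_bits_per_sec

LINK = 384_000          # the 384k WAN link from the comment, in bits/s
ORIGINAL = 50_000_000   # hypothetical 50 MB update

# Hypothetical (compressed size, compression time) pairs, not measured results.
candidates = {
    "uncompressed": (ORIGINAL, 0),
    "quick, larger": (25_000_000, 10),
    "slow, smaller": (20_000_000, 90),
}

totals = {
    name: delivery_seconds(size, secs, LINK)
    for name, (size, secs) in candidates.items()
}
for name, total in totals.items():
    print(f"{name:14s} {total / 60:6.1f} min")
```

On a slow link the extra compression time pays for itself; rerun it with a 100 Mbit/s link and the quick setting wins instead.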

I would definitely recommend checking out maximumcompression.com (As others have, as well) over this article. It goes into a lot greater detail.
Re:What's the point of compressing JPEG,MP3,DivX e by trytoguess · 2007-04-22 15:25 · Score: 5, Interesting

Er... did ya check out the comparisons? As you can see here, jpeg at least can be compressed considerably with Stuffit. According to this the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding." I've no idea what that means, but it does seem to be more thorough and complex than what you wrote.
Re:This is nothing new by Starburnt · 2007-04-22 15:29 · Score: 4, Funny

So they've compressed it to 11. I'd say that's a step forward.
Agreed completely. by Kadin2048 · 2007-04-22 16:28 · Score: 5, Interesting

Back in the early/mid 90s I was pretty obsessed with data compression because I was always short on hard drive space (and short on money to buy new hard drives with); as a result I tended to compress things using whatever the format du jour was if it could get me an extra percentage point or two. Man, was that a mistake.

Getting stuff out of some of those formats now is a real irritation. I haven't run into a case yet that's been totally impossible, but sometimes it's taken a while, or turned out to be a total waste of time once I've gotten the archive open.

Now, I try to always put a copy of the decompressor for whatever format I use (generally just tar + gzip) onto the archive media, in source form. The entire source for gzip is under 1MB, trivial by today's standards, and if you really wanted to cut size and only put the source for deflate on there, it's only 32KB.

It may sound tinfoil-hat, but you can't guarantee what the computer field is going to look like in a few decades. I had self-expanding archives, made using Compact Pro on a 68k Mac, thinking they'd make the files easy to recover later, which didn't help me at all now -- a modern (Intel) Mac won't touch it (although to be fair a PPC Mac will run OS 9 which will, and allegedly there's a Linux utility that will unpack CPP archives, although maybe not self-expanding ones).

Given the rate at which bandwidth and storage space are expanding, I think the market for closed-source, proprietary data compression schemes should be very limited; there's really no good reason to use them for anything that you're storing for an unknown amount of time. You don't have to be a believer in the "infocalypse" to realize that operating systems and entire computing-machine architectures change over time, and what's ubiquitous today may be unheard of in a decade or more.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:rar? by Petrushka · 2007-04-22 17:19 · Score: 4, Funny

I take it you come from a planet where very few people use Windows. Please, I'm curious to know, what are things like there?
How about by DrSkwid · 2007-04-22 19:34 · Score: 4, Funny

Give it an MD5 hash and a file length and it will compute all the possible files that could have produced the hash. Automatically filter out the invalid files and the set you're left with can't be that large.
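
For scale, a counting sketch of how large that set is on average, assuming MD5 spreads inputs roughly evenly over its 128-bit output. The function name is made up for the example.

```python
HASH_BITS = 128  # MD5 digest size

def average_candidates_log2(file_length_bytes: int) -> int:
    """log2 of (files of this length) / (possible MD5 values)."""
    return 8 * file_length_bytes - HASH_BITS

for length in (16, 1024, 1_000_000):
    print(f"{length} bytes: about 2^{average_candidates_log2(length)} files per hash")
```

At 16 bytes there is about one candidate per hash; at 1 KiB it is already around 2^8064.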

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Re:What's the point of compressing JPEG,MP3,DivX e by kyz · 2007-04-22 22:36 · Score: 4, Interesting

While the main thrust of JPEG is to do "lossy" compression, the final stage of creating a JPEG is to do lossless compression on the data. There are two different official methods you can use: Huffman Coding and Arithmetic Coding.

Both methods do the same thing: they statistically analyse all the data, then re-encode it so the most common values are encoded in a smaller way than the least common values.

Huffman's main limitation is that each value compressed needs to consume at least one bit. Arithmetic coding can fit several values into a single bit. Thus, arithmetic coding is always better than Huffman, as it goes beyond Huffman's self-imposed barrier.
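
A small sketch of that one-bit floor, comparing the average Huffman code length with the Shannon entropy that an ideal arithmetic coder approaches. The two-symbol source and its probabilities are hypothetical, and this is a textbook Huffman construction, not JPEG's actual entropy coder.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over probs."""
    if len(probs) == 1:
        return {symbol: 1 for symbol in probs}
    # Heap entries: (probability, tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical, heavily skewed source: one value dominates.
probs = {"zero": 0.90, "other": 0.10}

lengths = huffman_lengths(probs)
huffman_bits = sum(probs[s] * lengths[s] for s in probs)
entropy_bits = -sum(p * math.log2(p) for p in probs.values())

print(f"Huffman: {huffman_bits:.3f} bits/symbol")
print(f"Entropy: {entropy_bits:.3f} bits/symbol")
```

With these probabilities Huffman is stuck at 1 bit per symbol while the entropy is about 0.469 bits, so roughly two values per bit are possible in principle. The gap shrinks as the distribution gets less skewed.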

However, Huffman is NOT patented, while most forms of arithmetic coding, including the one used in the JPEG standard, ARE patented. The authors of Stuffit did nothing special - they just paid the patent fee. Now they just unpack the Huffman-encoded JPEG data and re-encode it with arithmetic coding. If you take some JPEGs that are already compressed with arithmetic coding, Stuffit can do nothing to make them better. But 99.9% of JPEGs are Huffman coded, because it would be extortionately expensive for, say, a digital camera manufacturer, to get a JPEG arithmetic coding patent license.

So Stuffit doesn't have remarkable code, they just paid money to get better compression that 99.9% of people specifically avoid because they don't think it's worth the money.

--
Does my bum look big in this?