A Look at Data Compression
With the new year fast approaching, many of us face the unenviable task of backing up last year's data to make room for more of the same. To that end, rojakpot has taken a look at some of the data compression programs available and offers a few insights that may help you find the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."
Is it just me, or is that site really difficult to navigate amid all those ads? Compression speed figures would have been nice too.
I ran a small test of the common Linux compression commands back in 2000. Here are the results (note that some command options have changed since then; for example, GNU tar now uses -j for bzip2):
THE COMPRESSION UTILITY TEST
Compression utilities tested: zip, rar, gzip, bzip2, and tgz (tar with the z flag). Each test was run three times, and the system was rebooted after each completed test. Hardware: Pentium II 350 MHz, 256 MB RAM. OS: Linux Mandrake 7.1. System load was minimal. The "time" command was used to measure elapsed time, "ls -l" to determine file sizes, and a script to total the sizes of the gzip output files.
Note: gzip compresses each file individually (recursively) rather than building a single archive. For bzip2, the command invoked was tar -cvIf file.bz2 dir (in GNU tar at the time, the I flag invoked bzip2). For tgz, the z flag makes tar invoke gzip.
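The gzip total had to be summed by a script because gzip leaves one .gz file per input file. The original script isn't shown; a minimal Python sketch, assuming every compressed file ends in .gz and no unrelated .gz files live under the directory, might look like this (total_gz_size is a hypothetical name):

```python
import sys
from pathlib import Path


def total_gz_size(root):
    """Return the combined size in bytes of every *.gz file under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*.gz") if p.is_file())


if __name__ == "__main__":
    # Usage: python total_gz_size.py DIRECTORY  (defaults to the current directory)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{total_gz_size(target)} bytes")
```

This gives the same number as adding up the sizes reported by "ls -l" for each .gz file, without doing it by hand.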
TEST 1 - compressing multiple files
Total size of the directory: 91.621.857 bytes; total files: 3540 (mostly ASCII text and HTML, plus a few GIFs and JPEGs).
Default compression settings:
tool   compress time  compress MB/s  compressed size (bytes)  decompress time
gzip   1m44s          0.88           24.884.124               37s
zip    1m10s          1.3            25.813.958               41s
rar    3m25s          0.44           20.784.489               48s
bzip2  3m54s          0.39           17.399.561               1m17s
tgz    1m09s          1.32           23.821.446               36s
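The MB/s column appears to be the input size divided by the compression time, with 1 MB = 10^6 bytes and the result truncated (not rounded) to two decimals; that reading reproduces every TEST 1 figure. A small Python sketch of the arithmetic, assuming that interpretation (the helper names are hypothetical), which accepts times written either as "1m44s" or "1m.44s":

```python
import re

# Size of the TEST 1 directory, in bytes (from the post).
TEST1_BYTES = 91_621_857


def parse_elapsed(text):
    """Convert a time string such as '1m44s', '1m.44s' or '42s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m\.?)?(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


def throughput_mb_s(size_bytes, elapsed):
    """Input bytes per second in MB/s (1 MB = 10**6 bytes), truncated to 2 decimals."""
    seconds = parse_elapsed(elapsed)
    # Integer floor division truncates exactly, avoiding floating-point rounding surprises.
    hundredths = (size_bytes * 100) // (seconds * 1_000_000)
    return hundredths / 100


def space_saved_pct(original, compressed):
    """Percentage of the original size removed by compression."""
    return 100 * (1 - compressed / original)


if __name__ == "__main__":
    # (tool, compression time, compressed size) for TEST 1 at default settings.
    rows = [
        ("gzip", "1m44s", 24_884_124),
        ("zip", "1m10s", 25_813_958),
        ("rar", "3m25s", 20_784_489),
        ("bzip2", "3m54s", 17_399_561),
        ("tgz", "1m09s", 23_821_446),
    ]
    for tool, elapsed, compressed in rows:
        print(f"{tool:6} {throughput_mb_s(TEST1_BYTES, elapsed):5.2f} MB/s "
              f"{space_saved_pct(TEST1_BYTES, compressed):5.1f}% saved")
```

By this measure gzip at default settings removes roughly 72.8% of the directory's size, and bzip2 roughly 81.0%.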
Maximum compression settings:
tool   compress time  compress MB/s  compressed size (bytes)  decompress time
gzip   2m00s          0.76           24.670.516               36s
zip    1m42s          0.89           25.593.448               39s
rar    10m12s         0.14           18.698.710               1m02s
bzip2  n/a (the compression level cannot be specified through tar; is the default already the maximum?)
tgz    n/a (the compression level cannot be specified through tar; is the default already the maximum?)
CONCLUSION: use tgz (tar with the z flag) if time is an issue; otherwise use bzip2 (tar with the I flag).
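On the n/a rows: bzip2's default block size (-9) is already its maximum setting, whereas gzip's command-line default is -6, so only the tgz row would be expected to change at a higher level. If you want to pick the level explicitly while still producing a single tar archive, Python's standard tarfile module accepts a compresslevel argument for gzip and bzip2 archives. A minimal sketch (make_archive is a hypothetical helper, not part of tar):

```python
import os
import tarfile


def make_archive(source_dir, archive_path, method="gz", level=9):
    """Write source_dir into a tar archive compressed with gzip ("gz") or bzip2 ("bz2").

    level ranges from 1 (fastest) to 9 (smallest output) for both methods.
    """
    if method not in ("gz", "bz2"):
        raise ValueError("method must be 'gz' or 'bz2'")
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    # tarfile passes compresslevel on to the gzip or bzip2 compressor.
    with tarfile.open(archive_path, f"w:{method}", compresslevel=level) as tar:
        # Store entries under the directory's own name, e.g. "dir/index.html".
        tar.add(source_dir, arcname=os.path.basename(os.path.normpath(source_dir)))
```

For example, make_archive("dir", "file.tar.gz", "gz", 9) is roughly the tgz case at maximum gzip compression, and make_archive("dir", "file.tar.bz2", "bz2", 9) matches bzip2's default.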
TEST 2 - compressing 1 ascii file
Size of the ASCII file: 53.819.786 bytes (the file was taken from my mailbox).
Default compression settings:
tool   compress time  compress MB/s  compressed size (bytes)  decompress time
gzip   42s            1.28           15.560.144               15s
zip    41s            1.31           15.560.261               17s
rar    1m57s          0.45           11.507.387               17s
bzip2  1m58s          0.45           10.788.502               39s
tgz    54s            0.99           15.560.907               8s
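To repeat something like TEST 2 on a current machine, here is a rough Python sketch using the standard gzip and bz2 modules in memory rather than the command-line tools. It is an approximation, not the method used above: there is no disk I/O, no reboot between runs, and the three runs are simply averaged. The levels chosen (6 and 9 for gzip, 9 for bzip2) are meant to mirror the command-line defaults and maximums, and the file path is whatever you pass on the command line.

```python
import bz2
import gzip
import sys
import time

# Assumed level choices: gzip's CLI default is -6, bzip2's default is -9.
CODECS = {
    "gzip -6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "gzip -9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
}


def benchmark(data, compress, decompress, repeat=3):
    """Return (mean compress s, compressed size, mean decompress s) over `repeat` runs."""
    pack_total = unpack_total = 0.0
    for _ in range(repeat):
        start = time.perf_counter()
        packed = compress(data)
        pack_total += time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        unpack_total += time.perf_counter() - start

        if unpacked != data:  # round-trip sanity check
            raise RuntimeError("decompressed data does not match the input")
    return pack_total / repeat, len(packed), unpack_total / repeat


def run(path):
    """Benchmark every codec on one file and return a list of result rows."""
    with open(path, "rb") as f:
        data = f.read()
    rows = []
    for label, (compress, decompress) in CODECS.items():
        pack_s, size, unpack_s = benchmark(data, compress, decompress)
        mb_s = len(data) / max(pack_s, 1e-9) / 1e6  # 1 MB = 10**6 bytes
        rows.append((label, pack_s, mb_s, size, unpack_s))
    return rows


if __name__ == "__main__":
    for label, pack_s, mb_s, size, unpack_s in run(sys.argv[1]):
        print(f"{label:9} {pack_s:8.2f}s {mb_s:7.2f} MB/s {size:>12,} bytes {unpack_s:8.2f}s")
```

Absolute times will be far lower than on a 350 MHz Pentium II; the relative ordering of gzip and bzip2 is the more useful comparison.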
Maximum compression settings:
tool   compress time  compress MB/s  compressed size (bytes)  decompress time
gzip   44s            1.22           15.486.842               15s
zip    45s            1.19           15.486.959               16s
rar    6m40s          0.08           9.582.810