A Look at Data Compression
With the new year fast approaching, many of us look to the unenviable task of backing up last year's data to make room for more of the same. That being said, rojakpot has taken a look at some of the data compression programs available and has a few insights that may help when looking for the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."
Just downloaded it and I find that it compresses significantly better than WinRAR when both are set to maximum. Decompression is quite slow. I use it to compress a small collection of utilities.
Humor from a Genetically Molested Mind
WinRK may have won only because he used the fast compression setting on all the compressors he tested. Results for default setting and best compression settings are TBA.
I did a short review and benchmarking of unix compressors people might be interested in.
Mouse powered Chips, Open source Processors and Lego
In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression.
Because when you are storing Petabytes of information it makes a difference in cost.
Besides, all the problems you mention with data corruption can be solved by backing up the information more than once. Any place that puts a high value on its info is going to have multiple backups in multiple places anyway. The most useful application of compression is in archiving old customer records. Being mostly text, they can easily be compressed by more than 50%. Also, these are going to be backed up to tape (not disk). Being able to reduce the volume of tapes being stored by 50% can save a lot of money for a large organization.
Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
"My concern with all the 'new' compression programs is that they, unlike Zip, haven't survived the test of time. I've recovered damaged zip archives in the past and they have come through mostly intact. I've used archive/compression programs like ARJ with options to recover data even if there are multiple bad sectors on a hard drive or floppy disk. How many of the new compression programs have the tools available to adequately recover every possible byte of data?"
The solution to this issue is popular on usenet, since it's common for large files to be damaged. There's a utility called par2 that allows recovery information to be sent, and it's extremely effective. It's format-neutral, but most large binaries are sent as multi-part RAR archives. par2 can handle just about any damage that occurs, up to and including missing files.
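To illustrate the idea of sending recovery information alongside the parts, here is a minimal sketch. It is not how par2 works internally: par2 uses Reed-Solomon codes and can rebuild several damaged or missing blocks, while this sketch swaps in plain XOR parity, which can rebuild exactly one missing block. The function names and the sample parts are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """Build one parity block covering all data blocks (all the same length)."""
    return xor_blocks(blocks)


def recover_missing(blocks, parity):
    """Rebuild the single missing block (marked as None) from the survivors."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    restored = list(blocks)
    # XOR of all surviving blocks plus the parity block yields the lost block.
    restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


# Hypothetical three-part post where the middle part never arrived.
parts = [b"part-one", b"part-two", b"part-3!!"]
parity = make_parity(parts)
received = [parts[0], None, parts[2]]
restored = recover_missing(received, parity)
print(restored[1])
```

The cost is one extra block per set; tolerating more than one loss is what the Reed-Solomon approach buys you.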
Most of the time however, when it's simply someone downloading something it is only necessary to detect damage so they can download it again. All the formats I have experience with can detect damage, and it's common for MD5 and SHA1 sums to be sent separately anyway for security reasons.
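A rough sketch of that detect-and-redownload check, using Python's standard hashlib module. The helper names and the sample data are made up for the example; in practice the published sums would come from the poster, not be computed locally.

```python
import hashlib


def file_digests(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def is_intact(data: bytes, published: dict) -> bool:
    """True only if every published digest matches the received data."""
    actual = file_digests(data)
    return all(actual[name] == value for name, value in published.items())


# Stand-in for an archive and the sums sent separately with it.
original = b"archive contents " * 1000
published = file_digests(original)

# Flip a single bit to simulate damage in transit.
damaged = bytearray(original)
damaged[500] ^= 0x01

print("original intact:", is_intact(original, published))
print("damaged intact:", is_intact(bytes(damaged), published))
```

This only tells you that something is wrong, not where, which is fine when the fix is simply to download the file again.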
I rarely criticize things I don't care about.
If you look at the methodology - all the results were obtained using the software set to the fastest mode - not the best compression mode.
So, I would consider gzip the best performer by this criterion. After all, if I cared most about space savings I'd have picked the best mode - not the fast mode. All this article suggests is that a few archivers are REALLY lousy for doing FAST compression.
If my requirements were realtime compression (maybe for streaming multimedia) then I wouldn't be bothered with some mega-compression algorithm that takes 2 minutes per MB to pack the data.
Might I suggest a better test? If interested in best compression, then run each program in a mode which optimizes purely for compression ratio. On the other hand, if interested in realtime compression then take each algorithm and tweak the parameters so that they all run in the same time (which is a relatively fast time), and then compare compression ratios.
With the huge compression of multimedia files I'd also want the reviewers to state explicitly that the compression was verified to be lossless. I've never heard of some of these proprietary apps, but if they're getting significant ratios out of .wav and .mp3 files I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...
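A minimal sketch of the kind of test suggested above, assuming Python's built-in zlib and bz2 modules stand in for the archivers (none of the programs from the article are used here). Each codec is run at its fastest and its best level, timed, and the output is decompressed and compared byte for byte with the input to confirm it is lossless. The sample data is synthetic and very repetitive, so the ratios it produces say nothing about real filesets.

```python
import bz2
import time
import zlib

# name -> (compress(data, level), decompress(data), fastest level, best level)
CODECS = {
    "zlib": (lambda d, lvl: zlib.compress(d, lvl), zlib.decompress, 1, 9),
    "bz2": (lambda d, lvl: bz2.compress(d, lvl), bz2.decompress, 1, 9),
}


def benchmark(data: bytes) -> list:
    """Compress data with each codec in fast and best mode and record results."""
    results = []
    for name, (compress, decompress, fast, best) in CODECS.items():
        for label, level in (("fast", fast), ("best", best)):
            start = time.perf_counter()
            packed = compress(data, level)
            elapsed = time.perf_counter() - start
            results.append({
                "codec": name,
                "mode": label,
                "packed_bytes": len(packed),
                "ratio": len(packed) / len(data),
                "seconds": elapsed,
                # Binary compare of the restored data against the original.
                "lossless": decompress(packed) == data,
            })
    return results


# Synthetic stand-in for a text-heavy fileset.
sample = b"customer record: name, address, order history\n" * 5000

for row in benchmark(sample):
    print(f'{row["codec"]:5} {row["mode"]:5} {row["packed_bytes"]:8d} bytes '
          f'ratio={row["ratio"]:.4f} time={row["seconds"]:.4f}s '
          f'lossless={row["lossless"]}')
```

For the realtime comparison you would instead adjust the levels until the timings roughly match, then compare the ratios.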
Since WinZip does not handle .7z, .ace or .rar files, it has lost much of its appeal for me. With my old serial no longer working, I now have absolutely no reason to use it. Now when I need a compressor for Windows I choose WinAce & 7-Zip. Between those two programs, I can de-/compress just about any format you're likely to encounter online.
/dev/random
It's interesting to note that Stuffit produces worthwhile compression of JPG images, something long thought to be impossible.
I'd heard the makers of Stuffit were claiming this, but I was sceptical; it's good to see independent confirmation.
Quidquid Latine dictum sit, altum videtur (anything said in Latin sounds important)
If you download a file over GPRS and each megabyte costs you $3, then saving 200 megabytes means saving $600, which is the price of a low-end PC or almost that of a laptop.
Another case is if you only have 100 megabytes you can use, and only a zzzxxxyyy archiver can compress the data into 100MB while gzip -9 leaves you with 102MB.
So it really depends on whether you need it or not. Sometimes you need it, mostly you don't.
But bashing the issue as if nobody ever needs it is certainly wrong.
I'd tell you the chances of this story being a dupe, but you wouldn't like it.
Phillip W. Katz, better known as Phil Katz (November 3, 1962-April 14, 2000), was a computer programmer best-known as the author of PKZIP, a program for compressing files which ran under the PC operating system DOS.
http://en.wikipedia.org/wiki/Phil_Katz