A Look at Data Compression
With the new year fast approaching, many of us look to the unenviable task of backing up last year's data to make room for more of the same. That being said, rojakpot has taken a look at some of the data compression programs available and has a few insights that may help when looking for the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."
Just downloaded it and I find that it compresses significantly better than WinRAR when both are set to maximum. Decompression is quite slow. I use it to compress a small collection of utilities.
Humor from a Genetically Molested Mind
It's a real shame that 1) the guy only did Windows archivers, and 2) SBC Archiver is no longer in active development, closed source, and Windows-only.
Disinfect the GNU General Public Virus!
WinRK may have won only because he used the fast compression setting on all the compressors he tested. Results for default setting and best compression settings are TBA.
I did a short review and benchmarking of unix compressors people might be interested in.
Mouse powered Chips, Open source Processors and Lego
Speed aside [and speed would be a huge concern if you insisted on compression], I just don't understand the desire for compression in the first place.
As the administrator, your fundamental obligation is data integrity. If you compress, and if the compressed file store is damaged [especially if the header information on a compressed file - or files - is damaged], then you will tend to lose ALL of your data.
On the other hand, if your file store is ASCII/ANSI text, then even if file headers are damaged, you can still read the raw disk sectors and recover most of your data [might take a while, but at least it's theoretically do-able].
In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression.
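The fragility argument above can be tried on a small scale. The following is a minimal sketch, not a statement about any particular archiver: it uses Python's standard zlib module as a stand-in compressor, the sample data is made up, and the helper names flip_byte and try_decompress are hypothetical. It damages one byte in a plain-text buffer and one byte in its compressed copy, then compares what survives.

```python
import zlib

def flip_byte(blob: bytes, index: int) -> bytes:
    """Return a copy of blob with every bit of one byte inverted."""
    damaged = bytearray(blob)
    damaged[index] ^= 0xFF
    return bytes(damaged)

def try_decompress(blob: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None

# Hypothetical sample data: 2000 short lines of plain text.
original = b"".join(b"record %d: some plain text\n" % i for i in range(2000))
packed = zlib.compress(original, 9)

# Damage one byte in the middle of each representation.
damaged_text = flip_byte(original, len(original) // 2)
damaged_packed = flip_byte(packed, len(packed) // 2)

# Count how many bytes of the plain text differ from the original.
text_bytes_changed = sum(a != b for a, b in zip(original, damaged_text))
recovered = try_decompress(damaged_packed)

print("plain text bytes changed:", text_bytes_changed)
print("compressed copy fully recoverable:", recovered == original)
```

The one-shot zlib.decompress call is all-or-nothing, so this sketch probably overstates the loss somewhat: a streaming decoder may still salvage the data that precedes the damaged byte.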
Looks like the site got slashdotted while I was in the middle of reading it. What file types were used as input? Clearly compression algorithms differ on the file types that they work best on. Also, a better metric would probably have been space/time, rather than just using time. Also, I know that zlib, for example, allows you to choose the compression level - was this explored at all?
Also, do any of you know any lossless algorithms for media (movies, images, music, etc)? Most algorithms perform poorly in this area, but I thought that perhaps there were some specifically designed for this.
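On the zlib compression-level question above, here is a rough sketch of how one might measure the space/time trade-off. It uses Python's standard zlib module (levels 1 to 9); the sample data and the helper name measure_levels are assumptions for illustration, and real results depend heavily on the input.

```python
import time
import zlib

def measure_levels(data: bytes, levels=(1, 6, 9)):
    """Compress data at each zlib level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(packed), elapsed))
    return results

# Hypothetical input: repetitive text, which compresses well at any level.
sample = b"".join(
    b"entry %d: the quick brown fox jumps over the lazy dog\n" % (i % 500)
    for i in range(20000)
)

for level, size, seconds in measure_levels(sample):
    share = size / len(sample)
    print(f"level {level}: {size} bytes ({share:.1%} of original) in {seconds:.4f}s")
```

Running the same loop over different file types (text, executables, already-compressed media) is one way to see how much the input matters.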
Not only time matters, but also how much memory the algorithm uses, though the author did not mention how much each one uses. gzip, for instance, does not use much, but others, like rzip (http://rzip.samba.org/), use a lot. rzip may use up to 900MB during compression.
I did a test compressing a 4GB tar archive with rzip, which resulted in a compressed file of 2.1 GB. gzip at max compression gave about 2.7 GB.
So one should choose an algorithm based upon need and, of course, availability of source code. Using a proprietary, closed source compression algorithm with no open source alternative implementation is begging for trouble down the road.
If you look at the methodology - all the results were obtained using the software set to the fastest mode - not the best compression mode.
So, I would consider gzip the best performer by this criterion. After all, if I cared most about space savings I'd have picked the best-mode - not the fast-mode. All this article suggests is that a few archivers are REALLY lousy for doing FAST compression.
If my requirements were realtime compression (maybe for streaming multimedia) then I wouldn't be bothered with some mega-compression algorithm that takes 2 minutes per MB to pack the data.
Might I suggest a better test? If interested in best compression, then run each program in a mode which optimizes purely for compression ratio. On the other hand, if interested in realtime compression then take each algorithm and tweak the parameters so that they all run in the same time (which is a relatively fast time), and then compare compression ratios.
With the huge compression of multimedia files I'd also want the reviewers to state explicitly that the compression was verified to be lossless. I've never heard of some of these proprietary apps, but if they're getting significant ratios out of .wav and .mp3 files I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...
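A binary compare of that kind is easy to script. This is a minimal sketch under stated assumptions: zlib stands in for whichever archiver is under test, the sample bytes stand in for a media file, and the names is_lossless_round_trip and quantize_then_compress are hypothetical. The second "compressor" deliberately throws away information to show what a failed check looks like.

```python
import hashlib
import zlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def is_lossless_round_trip(original: bytes, compress, decompress) -> bool:
    """True only if decompress(compress(x)) reproduces x bit for bit."""
    restored = decompress(compress(original))
    return (len(restored) == len(original)
            and sha256_hex(restored) == sha256_hex(original))

def quantize_then_compress(data: bytes) -> bytes:
    """Deliberately lossy stand-in: drop the low 4 bits of every byte first."""
    return zlib.compress(bytes(b & 0xF0 for b in data))

# Hypothetical stand-in for a media file's raw bytes.
sample = bytes(range(256)) * 1024

print("zlib round trip lossless:",
      is_lossless_round_trip(sample, zlib.compress, zlib.decompress))
print("quantizing codec lossless:",
      is_lossless_round_trip(sample, quantize_then_compress, zlib.decompress))
```

For real archives the same idea applies at the file level: hash each file before packing, unpack to a fresh directory, and compare the digests.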
Since WinZip does not handle .7z, .ace or .rar files, it has lost much of its appeal for me. With my old serial no longer working, I now have absolutely no reason to use it. Now when I need a compressor for Windows I choose WinAce & 7-Zip. Between those two programs, I can de-/compress just about any format you're likely to encounter online.
/dev/random
3 hours 47 minutes with WinRK versus gzipping in 3 minutes 16 seconds. Is it really worth watching the progress bar for a file that is 200 megs smaller?
/not surprisingly, the article is about 2.5 gb files
If your file starts out as 250 mb, it might be worth it. However, if you start with a 2.5 gb file, then it's almost certainly not -- especially once you take the closed-source and undocumented nature of the compression algorithm into account.
http://rzip.samba.org/ is a phenomenal compressor. It does much better than bzip2 or rar on large files and is open source.
Interesting that the article talks about compression ratio and compression speed. When considering compression, decompression time is extremely relevant. I don't mind waiting longer to compress the fileset, as long as decompression is fast. I normally compress once, and then decompress many times (media files and games, for example).
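For the compress-once, decompress-often case described above, timing both directions separately is straightforward. The sketch below is only illustrative: it covers the three compressors in Python's standard library (zlib, bz2, lzma) at their default settings rather than the programs from the article, and the sample data and the helper name time_codec are assumptions.

```python
import bz2
import lzma
import time
import zlib

# Standard-library codecs used as stand-ins for the reviewed programs.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def time_codec(data: bytes, compress, decompress):
    """Return (compressed size, compress seconds, decompress seconds)."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError("round trip did not reproduce the input")
    return len(packed), t1 - t0, t2 - t1

# Hypothetical input: roughly 800 KB of repetitive text.
sample = b"".join(
    b"frame %d: player position and score data\n" % (i % 1000)
    for i in range(20000)
)

for name, (compress, decompress) in CODECS.items():
    size, c_secs, d_secs = time_codec(sample, compress, decompress)
    print(f"{name}: {size} bytes, compress {c_secs:.4f}s, decompress {d_secs:.4f}s")
```

A single run on a small buffer is noisy, so repeating each measurement and taking the best time gives a fairer comparison.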
It's interesting to note that Stuffit produces worthwhile compression of JPG images, something long thought to be impossible.
I'd heard the makers of Stuffit were claiming this, but I was sceptical. It's good to see independent confirmation.
Quidquid Latine dictum sit, altum videtur (anything said in Latin sounds important)
Proprietary, costs money...
I use ZipGenius - handles 20 compression formats including RAR, ACE, JAR, TAR, GZ, BZ, ARJ, CAB, LHA, LZH, RPM, 7-Zip, OpenOffice/StarOffice Zip files, UPX, etc.
You can encrypt files with one of four algorithms (CZIP, Blowfish, Twofish, Rijndael AES).
If you set an antivirus path in ZipGenius options, the program will prompt you to perform an AV scan before running the selected file.
It has an FTP client, TWAIN device image importing, file splitting, RAR-to-SFX conversion, conversion of any Zip archive into an ISO image file, etc.
And it's totally free.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
if you download a file over gprs and each megabyte costs you $3, then saving 200 megabytes means saving $600, which is the price of a low-end pc or almost a laptop.
another case is if you only have 100 megabytes you can use and only a zzzxxxyyy archiver can compress it into the 100mb while gzip -9 leaves you with 102mb.
so it really depends if you need it or not. sometimes you need it, mostly you don't.
but bashing on the issue "like nobody ever needs it" is certainly wrong.
I'd tell you the chances of this story being a dupe, but you wouldn't like it.
Phillip W. Katz, better known as Phil Katz (November 3, 1962-April 14, 2000), was a computer programmer best-known as the author of PKZIP, a program for compressing files which ran under the PC operating system DOS.
http://en.wikipedia.org/wiki/Phil_Katz
I have ported ppmd to a nice pzip style utility and a pzlib style library. Find it at http://pzip.sf.net/
Speed is better than bzip2 and compression is top class, beaten only by 7zip and LZMA compressors (which require much more time and memory). The problem is that decompression is the same speed as compression, unlike bzip2/gzip/zip, where decompression is much faster.
The review quoted above is totally useless because 7zip, for example, uses a 32KB dictionary. Given a 200MB dictionary it really starts to perform quite well! I would not be surprised if 7zip came out the winner there given a better compression parameter.
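The effect of dictionary size can be illustrated with a small experiment. This sketch does not use 7-Zip itself and makes no claim about its settings in the review: it relies on the LZMA2 filter in Python's standard lzma module, the data is synthetic, and the helper name xz_size is hypothetical. The input is a block of random bytes followed by an exact copy of itself, so the only redundancy lies 256 KiB back, out of reach of a 64 KiB dictionary.

```python
import lzma
import random

def xz_size(data: bytes, dict_size: int) -> int:
    """Compressed size of data using LZMA2 with the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

# Synthetic input: 256 KiB of seeded pseudo-random bytes, repeated once.
block_len = 256 * 1024
block = random.Random(0).getrandbits(8 * block_len).to_bytes(block_len, "little")
data = block + block

small_dict = xz_size(data, 64 * 1024)      # cannot see back to the first copy
large_dict = xz_size(data, 1024 * 1024)    # can match the whole second copy

print("input size:          ", len(data))
print("64 KiB dictionary:   ", small_dict)
print("1 MiB dictionary:    ", large_dict)
```

Real filesets are far less extreme than this, but the same mechanism applies: a larger dictionary lets the compressor reuse data it saw further back, at the cost of more memory.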
7-zip is the 16th most popular download on SourceForge (8544268 downloads so far), and it gets downloaded about 18000 times per day, so it must be going somewhere in terms of popularity.
Erlang.org: wow
I knew I had seen this story before but it wasn't here. This article was up on Digg three days ago--with only three Diggs to its name (at the time of this writing), but it's front page news here? Interesting to say the least...
I predict that this Digg will become frontpage Slashdot news shortly. It was quite popular (914 diggs so far) and it's hit the three-day mark...
I know, this is all so OT, but it's no worse than whining about duplicate postings here...
Oh the irony here is just too much to take without laughing! My comment gets hammered with the REDUNDANT pummel when I point out that /. is being REDUNDANT in posting old Diggs? Man, it just doesn't get any better than this to make a point.
Moderators: did you catch the not-so-subtle play I made here by quoting ALL of my original message? In case you didn't, I'm being REDUNDANTLY sarcastic...
Enjoy!