Slashdot Mirror


A Look at Data Compression

With the new year fast approaching, many of us look to the unenviable task of backing up last year's data to make room for more of the same. That being said, rojakpot has taken a look at some of the data compression programs available and has a few insights that may help when looking for the best fit. From the article: "The best compressor of the aggregated fileset was, unsurprisingly, WinRK. It saved over 54MB more than its nearest competitor - Squeez. But both Squeez and SBC Archiver did very well, compared to the other compressors. The worst compressors were gzip and WinZip. Both compressors failed to save even 200MB of space in the aggregated results."

10 of 252 comments

  1. WinRK is excellent by drsmack1 · · Score: 4, Interesting

    Just downloaded it and I find that it compresses significantly better than WinRAR when both are set to maximum. Decompression is quite slow. I use it to compress a small collection of utilities.

  2. Actually by Sterling+Christensen · · Score: 5, Interesting

    WinRK may have won only because he used the fast compression setting on all the compressors he tested. Results for default setting and best compression settings are TBA.

  3. Unix compressors by brejc8 · · Score: 5, Interesting

    I did a short review and benchmark of Unix compressors that people might be interested in.

  4. Re:Why compress in the first place? by Ironsides · · Score: 4, Interesting

    "In this day and age, when magnetic storage is like $0.50 to $0.75 per GIGABYTE, I just can't fathom why a responsible admin would risk the possible data corruption that could come with compression."

    Because when you are storing Petabytes of information it makes a difference in cost.

    Besides, all the problems you mention with data corruption can be solved by backing up the information more than once. Any place that puts a high value on its info is going to have multiple backups in multiple places anyway. The most useful application of compression is in archiving old customer records. Being mostly text, they can easily be compressed by more than 50%. Also, these are going to be backed up to tape (not disk). Being able to reduce the volume of tapes being stored by 50% can save a lot of money for a large organization.

    --
    Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
  5. Re:Why compress in the first place? by ArbitraryConstant · · Score: 4, Interesting

    "My concern with all the 'new' compression programs is that they, unlike Zip, haven't survived the test of time. I've recovered damaged zip archives in the past and they have come through mostly intact. I've used archive/compression like ARJ with options to be able to recover data even if there are multiple bad sectors on a harddrive or floppy disk. How many of the new compression programs have the tools available to adequately recover every possible byte of data?"

    The solution to this issue is popular on usenet, since it's common for large files to be damaged. There's a utility called par2 that allows recovery information to be sent, and it's extremely effective. It's format-neutral, but most large binaries are sent as multi-part RAR archives. par2 can handle just about any damage that occurs, up to and including missing files.
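    To illustrate the idea of sending recovery information alongside a multi-part archive, here is a minimal sketch. It is not par2's algorithm: par2 uses Reed-Solomon codes and can rebuild several damaged or missing parts, whereas this toy uses a single XOR parity block and can rebuild only one. The function names are made up for the example, and all parts are assumed to have the same length.

```python
def xor_parity(blocks):
    """XOR equal-length blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(blocks, parity):
    """Rebuild the single block marked None from the survivors and the parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    # XOR-ing the survivors with the parity cancels them out,
    # leaving only the bytes of the missing block.
    rebuilt = xor_parity(survivors + [parity])
    index = missing[0]
    return blocks[:index] + [rebuilt] + blocks[index + 1:]


if __name__ == "__main__":
    parts = [b"part", b"two!", b"3333"]      # stand-ins for archive parts
    parity = xor_parity(parts)               # posted alongside the parts
    damaged = [parts[0], None, parts[2]]     # the second part never arrived
    print(recover_missing(damaged, parity) == parts)
```

    The cost is one extra block of the same size as a part; handling more than one lost part is what the heavier Reed-Solomon machinery in par2 is for.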

    Most of the time, however, when someone is simply downloading something, it is only necessary to detect damage so they can download it again. All the formats I have experience with can detect damage, and it's common for MD5 and SHA1 sums to be sent separately anyway for security reasons.
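    Checking a download against a separately published sum can be sketched with Python's standard hashlib module. The helper names and the chunk size are choices made for this example, not part of any archiver.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex, algorithm="sha1"):
    """True if the file's digest matches the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

    A mismatch only says the file differs from what was published; it cannot repair anything, which is where recovery data comes in.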

    --
    I rarely criticize things I don't care about.
  6. Re:More time = More compression by Rich0 · · Score: 5, Interesting

    If you look at the methodology - all the results were obtained using the software set to the fastest mode - not the best compression mode.

    So, I would consider gzip the best performer by this criterion. After all, if I cared most about space savings I'd have picked the best mode, not the fast mode. All this article suggests is that a few archivers are REALLY lousy for doing FAST compression.

    If my requirements were realtime compression (maybe for streaming multimedia) then I wouldn't be bothered with some mega-compression algorithm that takes 2 minutes per MB to pack the data.

    Might I suggest a better test? If interested in best compression, then run each program in a mode which optimizes purely for compression ratio. On the other hand, if interested in realtime compression then take each algorithm and tweak the parameters so that they all run in the same time (which is a relatively fast time), and then compare compression ratios.
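    A rough sketch of that kind of test, using the codecs in Python's standard library as stand-ins for the archivers in the article (which are not scriptable this way). It reports both ratio and time for each setting so the two can be traded off; it does not do the equal-time tuning itself. The sample data and the chosen levels are assumptions for the example.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, data):
    """Time one compressor on one input and report its size ratio."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return {"name": name, "ratio": len(packed) / len(data), "seconds": elapsed}


def compare(data):
    """Run a fast setting and several high-ratio settings on the same data."""
    candidates = [
        ("zlib level 1", lambda d: zlib.compress(d, 1)),   # fast mode
        ("zlib level 9", lambda d: zlib.compress(d, 9)),   # best mode
        ("bz2 level 9", lambda d: bz2.compress(d, 9)),
        # Higher lzma presets exist but need far more memory.
        ("lzma preset 6", lambda d: lzma.compress(d, preset=6)),
    ]
    return [measure(name, fn, data) for name, fn in candidates]


if __name__ == "__main__":
    sample = b"customer record: name, address, order history\n" * 20000
    for row in compare(sample):
        print(f"{row['name']:<14} ratio={row['ratio']:.4f} "
              f"time={row['seconds']:.3f}s")
```

    Highly repetitive sample text like this flatters every codec; a fair comparison needs the same mixed fileset for all of them.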

    With the huge compression of multimedia files I'd also want the reviewers to state explicitly that the compression was verified to be lossless. I've never heard of some of these proprietary apps, but if they're getting significant ratios out of .wav and .mp3 files I'd want to do a binary compare of the restored files to ensure they weren't just run through a lossy codec...
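    The binary compare could look something like this sketch. zlib stands in for whatever archiver is under test, and the function name and the three paths are hypothetical; with a real tool, the compress and restore steps would be replaced by running that tool.

```python
import filecmp
import zlib


def roundtrip_file(original_path, archive_path, restored_path):
    """Compress a file with zlib, restore it, and byte-compare the result."""
    with open(original_path, "rb") as src:
        packed = zlib.compress(src.read(), 9)
    with open(archive_path, "wb") as archive:
        archive.write(packed)
    with open(archive_path, "rb") as archive, open(restored_path, "wb") as out:
        out.write(zlib.decompress(archive.read()))
    # shallow=False compares file contents instead of trusting size and mtime.
    return filecmp.cmp(original_path, restored_path, shallow=False)
```

    A lossy codec would fail this check even if the restored file sounds or looks the same.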

  7. small mistake by ltwally · · Score: 4, Interesting
    There is a small mistake on page 3 of the article, in the first table: WinZip no longer offers free upgrades. If you have a serial for an older version (1-9), that serial will only work on the older versions. You need a new serial for v10.0, and that serial will not work when v11.0 comes out.

    Since WinZip does not handle .7z, .ace or .rar files, it has lost much of its appeal for me. With my old serial no longer working, I now have absolutely no reason to use it. Now when I need a compressor for Windows I choose WinAce & 7-Zip. Between those two programs, I can de-/compress just about any format you're likely to encounter online.

    --
    /dev/random
  8. JPG compression by The+Famous+Druid · · Score: 5, Interesting

    It's interesting to note that Stuffit produces worthwhile compression of JPG images, something long thought to be impossible.
    I'd heard the makers of Stuffit were claiming this, but I was sceptical; it's good to see independent confirmation.

    --
    Quidquid Latine dictum sit, altum videtur (anything said in Latin sounds important)
  9. Re:Speed by moro_666 · · Score: 5, Interesting

    if you download a file over GPRS and each megabyte costs you $3, then saving 200 megabytes means saving $600, which is the price of a low-end PC or almost a laptop.

    another case is if you only have 100 megabytes you can use and only a zzzxxxyyy archiver can compress it into the 100mb while gzip -9 leaves you with 102mb.

    so it really depends if you need it or not. sometimes you need it, mostly you don't.

    but bashing on the issue "like nobody ever needs it" is certainly wrong.

    --
    I'd tell you the chances of this story being a dupe, but you wouldn't like it.
  10. Lest We Forget - Phillip W. Katz by BigFoot48 · · Score: 4, Interesting
    While we're discussing compression and PKZip, I thought a little reminder of who started it all, and who died before his time, may be in order.

    Phillip W. Katz, better known as Phil Katz (November 3, 1962 - April 14, 2000), was a computer programmer best known as the author of PKZIP, a program for compressing files which ran under the PC operating system DOS.

    http://en.wikipedia.org/wiki/Phil_Katz