Slashdot Mirror


Best Format for Archive Distribution?

Meostro asks: "I'm looking for the best format to use to distribute arbitrary datasets. Tarballs compressed with gzip seem to be the most common thing out there, with zip coming in a close second. What advanced compression packages are the most widely recognized or available on the widest array of systems? Cross-platform compatibility is my most important goal, followed by compression ratio, decompression time, compression time and extra features (solid archives, support for multiple files, etc.). I'm starting up a free data site to provide test data for anything you can imagine: images for compression and format interpretation, text and audio for language processing, programming language examples to test parsing, and more. I hope this will grow to be a significant (read: multi-gigabyte) archive, so I want to start off right with my distribution format. Right now the plan is data.tar.bz2, but I'm open to anything that will give me better compression as long as it's available for Linux, Windows and Mac."
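
The criteria listed in the question (compression ratio, decompression time, compression time) can be measured on a sample of the actual data before committing to a format. The following is a minimal sketch, assuming Python's standard `gzip` and `bz2` modules stand in for the `.gz` and `.bz2` tools; the function names `measure` and `compare` and the sample payload are hypothetical, and timings will vary by machine and data.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Return (name, ratio, compress_seconds, decompress_seconds) for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    # A codec that does not round-trip is useless for distribution.
    if restored != data:
        raise ValueError(f"{name} round trip failed")
    # Ratio is compressed size over original size, so smaller is better.
    return name, len(packed) / len(data), compress_s, decompress_s


def compare(data):
    """Measure gzip and bz2 on the same in-memory payload."""
    codecs = [
        ("gzip", gzip.compress, gzip.decompress),
        ("bz2", bz2.compress, bz2.decompress),
    ]
    return [measure(n, c, d, data) for n, c, d in codecs]


if __name__ == "__main__":
    # Hypothetical sample; substitute a representative slice of the real dataset.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for name, ratio, c_s, d_s in compare(sample):
        print(f"{name:5s} ratio={ratio:.4f} compress={c_s:.3f}s decompress={d_s:.3f}s")
```

Highly repetitive text like the sample above flatters every codec, so the numbers only mean something when run against real images, audio, or source files.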

8 of 109 comments

  1. One other choice by gowen · · Score: 4, Insightful

    tar.gz and tar.bz2 are ok for small archives (20MB or so), but if you're dealing with large archives there's only one solution.

    RAR -- cross platform, built in integrity checking, and when used with Parity files, makes splitting and reassembling archives an absolute doddle.

    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
    1. Re:One other choice by harrkev · · Score: 5, Insightful

      One problem with this is that it is not a common format. For limited use (one-time distribution, short-term backup), this is OK. But what about long-term archives?

      If you want to decompress this stuff in 10 or 20 years, will you be able to find software then that can handle it? Especially if the new cell processors somehow become popular, will Windows BOHICA 2025 edition be able to run 20-year-old binaries in order to read this thing?

      If the source is available, the job is easier in Linux, but if the format is not actively maintained, it may take a lot of work to modify the program to run on whatever Linux looks like in 20 years.

      --
      "-1 Troll" is apparently the same as "-1 I disagree with you."
  2. Zip by isaac · · Score: 2, Insightful

    Zip.

    Zip is miles more common than anything else and compresses better (generally) than gzip. It's supported out of the box on almost every OS either natively or with bundled software. Even Solaris comes with unzip.

    Forget .tar.bz2 unless your audience is the type of people you'd expect to have cygwin or 3rd-party compression tools installed on their windows peecees.

    -Isaac

    --
    I am not a lawyer, and this is not legal advice. For Entertainment Purposes Only.
  3. It really depends... by node+3 · · Score: 2, Insightful

    Zip is probably the most commonly installed archiver across all systems.

    tar/tar.gz/tar.bz2 is supported out-of-the-box on Linux and Mac OS X, but can throw Windows users for a loop (easily remedied, but they aren't likely to have a tar extractor installed, and will find the file extension at least a bit odd). For some data tar.bz2 will result in noticeably smaller files, but at a greater cost in compression/decompression time.

    After that, you're not really going to find an archival format that's really common.

    In the end, it depends on what type of data you are archiving, and your target audience, but unless you have a specific reason otherwise, zip with an md5 checksum file is probably the solution of least effort (just make sure you back up the archive--don't want to have a problem with the only copy you have!).
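
The "zip with an md5 checksum file" suggestion could look roughly like this. It is a minimal sketch using Python's standard `zipfile` and `hashlib` modules; the helper names `make_zip`, `md5_of` and `write_md5_file` are hypothetical, and the checksum line follows the common `md5sum` layout (digest, two spaces, file name) as an assumption about what downloaders will expect.

```python
import hashlib
import zipfile
from pathlib import Path


def make_zip(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to it.

    zip_path should live outside source_dir, otherwise the archive
    would be picked up while it is being written.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(source_dir).as_posix())
    return Path(zip_path)


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so multi-gigabyte archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_file(zip_path):
    """Write '<digest>  <name>' next to the archive as <name>.md5."""
    zip_path = Path(zip_path)
    checksum_path = zip_path.with_name(zip_path.name + ".md5")
    checksum_path.write_text(f"{md5_of(zip_path)}  {zip_path.name}\n")
    return checksum_path
```

A downloader can then recompute the digest locally and compare it with the published line to detect a damaged transfer. MD5 is adequate for catching accidental corruption, not for resisting deliberate tampering.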

  4. Multi-format by sporktoast · · Score: 4, Insightful

    Have you considered going multi-format?

    Either increase the size of your storage to handle 2 or 3 of the more popular and widely available formats (zip, rar, tar.(gz|bz2)), or use compression-on-the-fly libraries (behind a cache to reduce server load). This would allow the recipient to decide, and end up supporting perhaps a larger population.

    --
    In a related story, the IRS has recently ruled that the cost of Windows upgrades can NOT be deducted as a gambling loss.
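
The pre-built multi-format option can be sketched with Python's standard `shutil.make_archive`, which knows the `zip`, `gztar` and `bztar` formats. The function name `build_all` and the directory layout are hypothetical; RAR is left out because the standard library cannot write it, so that format would need a separate tool.

```python
import shutil
from pathlib import Path

# Formats understood by shutil.make_archive: .zip, .tar.gz, .tar.bz2
FORMATS = ("zip", "gztar", "bztar")


def build_all(source_dir, out_dir, name):
    """Build one archive per format from the same source directory.

    out_dir should not be inside source_dir, or later archives would
    swallow the earlier ones.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base = out_dir / name
    built = []
    for fmt in FORMATS:
        # make_archive appends the right extension and returns the full path.
        built.append(Path(shutil.make_archive(str(base), fmt, root_dir=str(source_dir))))
    return built
```

Building every format ahead of time trades disk space for zero per-request CPU cost, which is the first of the two options described above.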
  5. Whatever you choose... by BinLadenMyHero · · Score: 3, Insightful

    ...avoid closed formats.
    Using Free software will help you achieve your number one goal: that everyone can access the data, now and forever.

  6. For Longevity by 4of12 · · Score: 4, Insightful

    Pick any system for which the source code is available, e.g., .tar.bz2

    Anything else is gambling.

    I still gamble, but only that a C compiler will exist in the future.

    --
    "Provided by the management for your protection."
  7. you need real time web compression by mozkill · · Score: 2, Insightful

    If you are making a site for people to download stuff from, then why not use real-time web compression?

    1. Your web server displays the uncompressed version of the file on the server. 2. The user starts the download from the browser. 3. The web server compresses it on the fly and delivers it to the browser, which decompresses it when it's done.

    This saves you from having to zip and unzip all the time, and it serves your original purpose.

    --
    Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
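
The on-the-fly approach relies on HTTP content encoding: the client advertises `Accept-Encoding: gzip`, and the server answers with a gzip-compressed body marked `Content-Encoding: gzip`. The following is a minimal sketch of the server-side decision using Python's standard `gzip` module. The function names `accepts_gzip` and `encode_body` are hypothetical, the header parsing is deliberately simplified (it only checks for an explicit zero quality value), and a real server would normally delegate this to its built-in compression module.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value allows gzip.

    Simplified: only an explicit q=0 is treated as a refusal.
    """
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() == "gzip":
            q = params.replace(" ", "").lower()
            return q not in ("q=0", "q=0.0", "q=0.00", "q=0.000")
    return False


def encode_body(body, accept_encoding=""):
    """Return (payload, headers), compressing only when the client allows it."""
    if accepts_gzip(accept_encoding):
        packed = gzip.compress(body)
        headers = {
            "Content-Encoding": "gzip",
            "Content-Length": str(len(packed)),
            # Tells caches that the response differs by Accept-Encoding.
            "Vary": "Accept-Encoding",
        }
        return packed, headers
    return body, {"Content-Length": str(len(body))}
```

Compressing on every request costs CPU, so for large files this is usually paired with a cache of the compressed output, as suggested in the multi-format comment above.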