Slashdot Mirror


New 25x Data Compression?

modapi writes "StorageMojo is reporting that a company at Storage Networking World in San Diego has made a startling claim of 25x data compression for digital data storage. A combination of de-duplication and calculating and storing only the changes between similar byte streams is apparently the key. Imagine storing a terabyte of data on a single disk, and it all runs on Linux." Obviously nothing concrete or released yet so take with the requisite grain of salt.

7 of 438 comments (clear)

  1. What kind of data? by Short+Circuit · · Score: 4, Insightful

    I can create a compression algorithm that compresses my 2GB of data to 1 bit. But it would be crap for any other datastream fed to it.

    1. Re:What kind of data? by ivan256 · · Score: 5, Insightful

      The article says:

      it can compress anything: email, databases, archives, mp3's, encrypted data or whatever weird data format your favorite program uses.

      In other words, they're full of crap.

    2. Re:What kind of data? by slimey_limey · · Score: 4, Insightful

      So it can compress its own output? Sweet....

    3. Re:What kind of data? by devjoe · · Score: 5, Insightful

      Well, there's an idea here that might hold some truth. Note that they are marketing it to data centers, people with LOTS and LOTS of files. Because people tend to have multiple copies of the same files, they can achieve great compression by eliminating the duplicate copies in the archive -- or likewise, any files with large sections that are the same among various files.

      20 email accounts subscribed to the same mailing list? Store the bodies of those e-mails only once, and you save a big chunk of disk space. A bunch of people downloaded the same MP3 file? We only need one copy in the archive. As long as there are multiple copies of the same data, it can compress any type of data.

      The difference here is that they are taking advantage of the redundancy of files across an entire filesystem (and a HUGE one), rather than the redundancies within an individual file. (I would assume they also do the latter type of compression with a conventional algorithm.) 25x compression seems extreme, but I am sure they can achieve some extra compression here.

  2. *sniff* by bryanp · · Score: 4, Insightful

    *sniff* *sniff* *sniff*

    I smell ... vapor.

    --
    "An unarmed man can only flee from evil, and evil is not overcome by fleeing from it." Col. Jeff Cooper
  3. Dubious by pilkul · · Score: 4, Insightful

    Stuff like new compression algorithms generally comes out in academic papers, which are then applied in practice by regular programmers. That's what happened with the Burrows-Wheeler algorithm at the core of bzip2. Some company concerned with mostly implementation rather than theory wouldn't come up with a revolutionary advance. The writeup is very vague, but it sounds to me like they're just using a simple LZ type algorithm, and they're only claiming 25x compression if the data is mostly the same already. Well duh.

  4. TFA by pcosta · · Score: 4, Insightful

    If everybody stopped laughing and actually RTFA, they aren't claiming 25x compression on anything. The algorithm is targeted at data backup, i.e. very large files and works by comparing incoming data patterns to patterns already stored. Looks like a modification of LZH that uses the compressed file as the pattern table. I'm not saying that it works or that is a breakthrough, but they are not claiming impossible lossless compression on anything. It might actually be interesting for the application it was designed for.