Slashdot Mirror


Data Deduplication Comparative Review

snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."

11 of 195 comments

  1. Don't forget to weigh in the cost by leathered · · Score: 2, Informative

    The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe; however, we could have had three times as much conventional storage for the same cost.

    NetApp is neat and all but horribly overpriced.
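
    A back-of-the-envelope sketch of that trade-off, assuming "saves around 30%" means deduplicated data occupies 30% less physical space. The 10 TB appliance size and the function name are made up for illustration; only the 30% and the 3x figures come from the comment.

```python
def effective_capacity(raw_tb: float, dedupe_savings: float) -> float:
    """Logical data that fits on raw_tb of disk when dedupe removes
    the given fraction (0 <= fraction < 1) of what would be written."""
    if not 0 <= dedupe_savings < 1:
        raise ValueError("dedupe_savings must be in [0, 1)")
    return raw_tb / (1 - dedupe_savings)


raw_tb = 10.0                                   # hypothetical appliance size
with_dedupe = effective_capacity(raw_tb, 0.30)  # 30% savings, per the comment
conventional = 3 * raw_tb                       # 3x plain disk for the same money

print(f"dedupe appliance:     {with_dedupe:.1f} TB of logical data")
print(f"conventional storage: {conventional:.1f} TB of logical data")
```

    Under those assumptions the plain disks hold roughly twice as much; dedupe would need to save about two thirds of the space to break even at that price ratio.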

    --
    For all intensive porpoises your a bunch of rediculous loosers
    1. Re:Don't forget to weigh in the cost by hardburn · · Score: 2, Informative

      Was it near the end of the fiscal year? Good department managers know that if they use up their full budget, then it's harder to argue for a budget cut next year. Managers will sometimes blow any excess funds at the end of the year on things like this for that very reason.

      --
      Not a typewriter
  2. Re:Wrong layer by KiloByte · · Score: 2, Informative

    It's not fully automatic, I assume? Since that would cause a major slowdown.

    For manual dedupes, btrfs can do that as well, and part of the vserver patchset (not related to its main functionality) includes a hack that works for most Unix filesystems.

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  3. Use ZFS. It offers dedupe, compression, etc. by jgreco · · Score: 3, Informative

    ZFS offers dedupe, and is even available in prepackaged NAS distributions such as Nexenta and OpenNAS. You too can have these great features, for much less than NetApp and friends.

  4. Re:Wrong layer by phantomcircuit · · Score: 4, Informative

    It is fully automatic and it's not that much of a slowdown. The reduced IO might actually provide a performance boost.

  5. Re:Wrong layer by suutar · · Score: 5, Informative

    Actually, it is automatic. ZFS already assumes you have a multithreaded OS running on more CPU than you probably need (e.g. Solaris), so it's already doing checksums (up to and including SHA256) for each data block in the filesystem. Comparing checksums (and optionally entire data blocks) to determine which blocks are duplicates isn't that much extra work at that point, although for deduplication you probably want to use a beefier checksum than you might choose otherwise, so there is some increase in work. http://blogs.sun.com/bonwick/entry/zfs_dedup has some more information on it. Getting it onto my Linux box, now... there's the rub. Userspace ZFS exists, but I've only seen one pointer to a patch for it that includes dedup, and I haven't heard any stability reports on it yet.
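
    The checksum-comparison idea can be sketched roughly as follows. This is a toy in-memory model, not ZFS's actual data structures: the DedupStore class, its methods, and the 4096-byte block size are hypothetical names and values chosen for illustration.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one physical copy per distinct block."""

    def __init__(self, verify: bool = False):
        self.verify = verify  # also compare full block contents on a checksum hit
        self.blocks = {}      # checksum -> block bytes (the "physical" storage)
        self.refcount = {}    # checksum -> number of logical references

    def write(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        if key in self.blocks:
            # Checksum already known: optionally confirm byte-for-byte equality.
            if self.verify and self.blocks[key] != block:
                raise RuntimeError("checksum collision")
            self.refcount[key] += 1
        else:
            self.blocks[key] = block
            self.refcount[key] = 1
        return key

    def read(self, key: str) -> bytes:
        return self.blocks[key]


store = DedupStore(verify=True)
logical = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
keys = [store.write(block) for block in logical]
print(len(logical), "logical blocks,", len(store.blocks), "physical blocks")
```

    The verify flag stands in for the optional whole-block comparison: it costs an extra read and compare on every hit in exchange for not trusting the checksum alone.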

  6. Re:Um.. by cetialphav · · Score: 2, Informative

    "AFAIK this is pretty much how every compression algorithm works. No need to give it a fancy name."

    The reason it has a different name is to distinguish this from a compressed file system. The blocks of data are not compressed in these systems. Imagine that you have a file system that stores lots of vmware images. In this system, there are lots of files that store the same information because the underlying data is OS system files and applications. Even if you compress each image, you will still have lots of blocks that have duplicate values.

    Deduplication says that the file system recognizes and eliminates duplicate blocks across the entire file system. If a given block has redundant data within it, that redundancy is not removed because the blocks themselves are not actually compressed. This is the difference between a compressed file system and a deduplicated file system. In fact, there is no reason that you could not combine both of these methods into a single system.
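
    A small sketch of that distinction, using zlib as a stand-in for a compressed file system and a SHA-256 block table as a stand-in for deduplication. The 4096-byte block size, the helper names, and the synthetic "image" data are assumptions made for illustration only.

```python
import hashlib
import random
import zlib

BLOCK = 4096  # assumed block size


def unique_blocks(data: bytes) -> dict:
    """Map checksum -> block for every distinct BLOCK-sized block in data."""
    found = {}
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        found.setdefault(hashlib.sha256(block).digest(), block)
    return found


def dedup_size(data: bytes) -> int:
    """Bytes stored when each distinct block is kept once, uncompressed."""
    return sum(len(b) for b in unique_blocks(data).values())


def dedup_then_compress_size(data: bytes) -> int:
    """Bytes stored when distinct blocks are kept once and each is compressed."""
    return sum(len(zlib.compress(b)) for b in unique_blocks(data).values())


rng = random.Random(0)
# Eight blocks of incompressible "OS files" shared by two VM images.
shared = bytes(rng.getrandbits(8) for _ in range(8 * BLOCK))
two_images = shared + shared
# One block with lots of redundancy inside it.
zeros = bytes(BLOCK)

print("two images, raw:             ", len(two_images))
print("two images, each compressed: ", 2 * len(zlib.compress(shared)))
print("two images, deduplicated:    ", dedup_size(two_images))
print("zero block, deduplicated:    ", dedup_size(zeros))
print("zero block, compressed:      ", len(zlib.compress(zeros)))
print("everything, dedup + compress:", dedup_then_compress_size(two_images + zeros))
```

    Compressing each image separately saves nothing here because the shared data is random, while deduplication halves it; the zero-filled block is the opposite case, and the last line combines both methods.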

  7. Re:Use ZFS. It offers dedupe, compression, etc. by lisany · · Score: 2, Informative

    Except NexentaStor (3.0.3) has a kernel bug, inherited from its OpenSolaris upstream (which has gone away, by the way), that hung our Nexenta test box. Not a real good first impression.

  8. Re:Wrong layer by hoytak · · Score: 2, Informative

    The latest stable version of zfs-fuse, 0.6.9, includes pool version 23 which has dedup support. Haven't tried it out yet, though.

    http://zfs-fuse.net/releases/0.6.9

    --
    Does having a witty signature really indicate normality?
  9. Re:De-Dupe on Linux? by suutar · · Score: 2, Informative

    There's a few. I've read there's a patchset for ZFS on FUSE that can do deduplication; there's also opendedup and lessfs. The problem is that none of these has been around long enough to be considered bulletproof yet, and for a filesystem whose job is to play fast and loose with file contents in the name of space savings, that's kinda worrisome.

  10. Re:Wrong layer by TheRaven64 · · Score: 2, Informative

    Nexenta is developed by the people behind the Illumos Foundation, who have created a 'spork' of OpenSolaris, which will continue to import code from each of the source dumps that Oracle has said they will do after each Solaris release, will fix bugs, and will replace the binary-only components of OpenSolaris with open ones.

    --
    I am TheRaven on Soylent News