Data Deduplication Comparative Review
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe, however we could have had 3 times conventional storage for the same cost.
NetApp is neat and all but horribly overpriced.
For all intensive porpoises your a bunch of rediculous loosers
It's not fully automatic, I assume? Since that would cause a major slowdown.
For manual dedupes, btrfs can do that as well, and a part of vserver patchset (not related to the main functionality) includes a hack that works for most Unix filesystems.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
ZFS offers dedupe, and is even available in prepackaged NAS distributions such as Nexenta and OpenNAS. You too can have these great features, for much less than NetApp and friends.
It is fully automatic and it's not that much of a slow down. The reduced IO might actual provide a performance boost.
Actually, it is automatic. ZFS already assumes you have a multithreaded OS running on more cpu than you probably need (e.g. Solaris), so it's already doing checksums (up to and including SHA256) for each data block in the filesystem. Comparing checksums (and optionally entire datablocks) to determine what blocks are duplicates isn't that much extra work at that point, although for deduplication you probably want to use a beefier checksum than you might choose otherwise, so there is some increase in work. http://blogs.sun.com/bonwick/entry/zfs_dedup has some more information on it. Getting it onto my linux box, now.. there's the rub. userspace ZFS exists, but I've only seen one pointer to a patch for it that includes dedup, and I haven't heard any stability reports on it yet.
AFAIK this is pretty much how every compression algorithm works. No need to give it a fancy name.
The reason it has a different name is to distinguish this from a compressed file system. The blocks of data are not compressed in these systems. Imagine that you have a file system that stores lots of vmware images. In this system, there are lots of files that store the same information because the underlying data is OS system files and applications. Even if you compress each image, you will still have lots of blocks that have duplicate values.
Deduplication says that the file system recognizes and eliminates duplicate blocks across the entire file system. If a given block has redundant data within it, that redundancy is not removed because the blocks themselves are not actually compressed. This is the difference between a compressed file system and a deduplicated file system. In fact, there is no reason that you could not combine both of these methods into a single system.
Except NexentaStor (3.0.3) has an OpenSolaris upstream (which has gone away, by the way) kernel bug that hanged our Nexenta test box. Not a real good first impression.
The latest stable version of zfs-fuse, 0.6.9, includes pool version 23 which has dedup support. Haven't tried it out yet, though.
http://zfs-fuse.net/releases/0.6.9
Does having a witty signature really indicate normality?
There's a few. I've read there's a patchset for ZFS on FUSE that can do deduplication; there's also opendedup and lessfs. The problem is that none of these has been around long enough to be considered bulletproof yet, and for a filesystem whose job is to play fast and loose with file contents in the name of space savings, that's kinda worrisome.
Nexenta is developed by the people behind the Illumous Foundation, who have created a 'spork' of OpenSolaris, which will continue to import code from each of the source dumps that Oracle has said they will do after each Solaris release, will fix bugs, and will replace the binary-only components of OpenSolaris with open ones.
I am TheRaven on Soylent News