Data Deduplication Comparative Review

Posted by samzenpus on Wednesday September 15, 2010 @11:10AM from the a-little-order-please dept.

snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."

Second post by Anonymous Coward · 2010-09-15 11:13 · Score: 2, Funny

Same as the first.
Wrong layer by Hatta · 2010-09-15 11:15 · Score: 4, Insightful

Filesystems should be doing this.

--
Give me Classic Slashdot or give me death!
1. Re:Wrong layer by suutar · 2010-09-15 11:23 · Score: 1
  
  Actually, this feature is a recent addition to ZFS, and it's the main reason I'm interested in putting ZFS on my file server. I just have to get around to picking up another drive to serve as the backup first.
2. Re:Wrong layer by bersl2 · 2010-09-15 11:24 · Score: 2, Interesting
  
  No, deduplication has quite a bit of policy attached to it. Sometimes you want multiple independent copies of a file (well, maybe not in a data center, but why should the filesystem know that?). The filesystem should store the data it's told to; leave the deduplication to higher layers of a system.
3. Re:Wrong layer by PCM2 · 2010-09-15 11:31 · Score: 2, Interesting
  
  The filesystem should store the data it's told to; leave the deduplication to higher layers of a system.
  But if that's the kind of deduplication you're talking about, does it really make sense to try to do it at the block level, as these boxes seem to be doing? Seems like you'd want to analyze files or databases in a more intelligent fashion.
  
  --
  Breakfast served all day!
4. Re:Wrong layer by JWSmythe · 2010-09-15 11:40 · Score: 1
  
  Wouldn't a compressed filesystem already do this? They don't just get the compression from nowhere. They eliminate duplicate blocks and empty space. You don't just get compression from nowhere.
  Pick your platform. I know in both Linux and Windows, there have been compressed filesystems for quite some time.
  It doesn't really negate the need for good housekeeping routines, nor good programming. Do you really want 100 copies of record X, or would one suffice? Sadly, people tend to think that they have unlimited space, until the time comes when they've run out of space. "Oh shit, what do we do now!" is way too common an occurrence.
  
  --
  Serious? Seriousness is well above my pay grade.
5. Re:Wrong layer by KiloByte · 2010-09-15 11:40 · Score: 2, Informative
  
  It's not fully automatic, I assume? Since that would cause a major slowdown.
  For manual dedupes, btrfs can do that as well, and a part of vserver patchset (not related to the main functionality) includes a hack that works for most Unix filesystems.
  
  --
  The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
6. Re:Wrong layer by dougmc · 2010-09-15 11:45 · Score: 2, Interesting
  
  But if that's the kind of deduplication you're talking about, does it really make sense to try to do it at the block level, as these boxes seem to be doing? Seems like you'd want to analyze files or databases in a more intelligent fashion
  This isn't a new thing -- it's a tried and true backup strategy, and it's quite effective at making your backup tapes go further. It increases the complexity of the backup setup, but it's mostly transparent to the user beyond the saved space.
  As for doing it at the file level rather than the block level, yes, that makes sense, but the block level does too. Think of a massive database file where only a few rows in a table changed, or a massive log file that only had some stuff appended to the end.
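
To make the appended-log case concrete, here is a minimal sketch that counts how many fixed-size blocks of a new version are not already present in the old one. The 4096-byte block size, the function names, and the synthetic "log" are assumptions for illustration, not details of any product in the review.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, not taken from any reviewed product


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return a SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def new_blocks(old: bytes, new: bytes) -> int:
    """Count blocks in `new` whose content does not already appear in `old`."""
    known = set(block_digests(old))
    return sum(1 for d in block_digests(new) if d not in known)


# Hypothetical log file: 100 full blocks, then two more blocks appended.
old_log = b"".join(bytes([i]) * BLOCK_SIZE for i in range(100))
new_log = old_log + b"x" * BLOCK_SIZE + b"y" * BLOCK_SIZE
print(new_blocks(old_log, new_log))  # 2
```

A whole-file comparison would treat the new log as entirely different, while the block view only has to store the two appended blocks.
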
7. Re:Wrong layer by phantomcircuit · 2010-09-15 11:50 · Score: 1
  
  Data deduplication is mostly being used for virtual servers. So no, this is being done at the right level: the block level.
8. Re:Wrong layer by phantomcircuit · 2010-09-15 11:51 · Score: 4, Informative
  
  It is fully automatic and it's not that much of a slowdown. The reduced IO might actually provide a performance boost.
9. Re:Wrong layer by suutar · 2010-09-15 11:52 · Score: 5, Informative
  
  Actually, it is automatic. ZFS already assumes you have a multithreaded OS running on more CPU than you probably need (e.g. Solaris), so it's already doing checksums (up to and including SHA256) for each data block in the filesystem. Comparing checksums (and optionally entire data blocks) to determine what blocks are duplicates isn't that much extra work at that point, although for deduplication you probably want to use a beefier checksum than you might choose otherwise, so there is some increase in work. http://blogs.sun.com/bonwick/entry/zfs_dedup has some more information on it. Getting it onto my Linux box, now... there's the rub. Userspace ZFS exists, but I've only seen one pointer to a patch for it that includes dedup, and I haven't heard any stability reports on it yet.
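
The checksum-then-compare idea can be sketched as a toy content-addressed store. This is only an illustration under stated assumptions: `DedupStore`, its `verify` flag, and the in-memory dictionaries are hypothetical names, and nothing here reflects how ZFS lays out data on disk.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (illustrative, not ZFS's on-disk design)."""

    def __init__(self, verify: bool = True) -> None:
        self.verify = verify
        self.blocks: dict[str, bytes] = {}   # digest -> stored block
        self.refcount: dict[str, int] = {}   # digest -> number of references

    def write(self, block: bytes) -> str:
        """Store a block once per distinct content and return its digest."""
        digest = hashlib.sha256(block).hexdigest()
        existing = self.blocks.get(digest)
        if existing is not None:
            # Optionally compare the full blocks, not just their checksums.
            if self.verify and existing != block:
                raise RuntimeError("checksum collision: blocks differ")
            self.refcount[digest] += 1
            return digest
        self.blocks[digest] = block
        self.refcount[digest] = 1
        return digest

    def read(self, digest: str) -> bytes:
        return self.blocks[digest]


store = DedupStore()
refs = [store.write(b) for b in (b"A" * 4096, b"B" * 4096, b"A" * 4096)]
print(len(store.blocks))  # 2 unique blocks stored for 3 writes
```

The checksum is already computed for integrity purposes, so the extra cost of dedup in this sketch is one dictionary lookup plus the optional byte comparison.
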
10. Re:Wrong layer by icebike · 2010-09-15 12:11 · Score: 1
  
  True, compression does a lot of this.
  But De-duplication does that and goes one step further.
  Multiple copies of the same block of data (either entire files or portions of files) that match even if stored in separate directories can be replaced by a pointer to a single copy of that file or block.
  How many times would, say, the boilerplate at the bottom of a lawyer/doctor/accountant's file systems appear verbatim in every single document filed in every single directory?
  A proper system might allow you to have just one of these.
  
  --
  Sig Battery depleted. Reverting to safe mode.
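
One way to see the "one step further" point: a general-purpose compressor such as zlib (DEFLATE) only looks back 32 KiB, so it cannot exploit a second copy that sits farther away, while block-level dedup can. The sketch below uses random data and a 4 KiB block size as assumptions; it is a demonstration, not a benchmark.

```python
import hashlib
import random
import zlib

rng = random.Random(0)
# 100 KiB of incompressible data, so any saving must come from the repeat.
payload = bytes(rng.getrandbits(8) for _ in range(100 * 1024))
doubled = payload + payload  # the second copy starts 100 KiB after the first

compressed_len = len(zlib.compress(doubled, 9))

# Block-level dedup: keep one copy of each distinct 4 KiB block.
unique_blocks = {
    hashlib.sha256(doubled[i:i + 4096]).digest(): doubled[i:i + 4096]
    for i in range(0, len(doubled), 4096)
}
deduped_len = sum(len(b) for b in unique_blocks.values())

print(len(doubled), compressed_len, deduped_len)
```

Here the compressed size stays close to the full doubled length, while the deduplicated size is the length of a single copy.
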
11. Re:Wrong layer by dgatwood · 2010-09-15 12:15 · Score: 1
  
  Yes and no. Compression generally does involve reduction of duplication of information in one form or another, but does so at a finer level of granularity. With a compressed filesystem, you'll generally see compression of the data within a block, maybe across multiple blocks to some degree, but for the most part, you'd expect the lookup tables to be most efficient at compressing when they are employed on a per-file basis. The more data that shares a single compression table, the closer your input gets to being essentially random, and the lower your overall compression rate typically is.
  Deduplication, as I understand it (and I've read very little about this, so I could be misunderstanding) takes this a step further, taking advantage of the fact that multiple copies and/or multiple generations of a given file often exist in storage, and that when compressing two files results in very similar or identical compression tables, you can easily throw away one copy and express the other copy relative to the first.
  Although this is conceptually related to the way many compression schemes work (Huffman coding and LZW in particular), the mechanism for doing so must inherently be a lot smarter. Arbitrarily combining random or contiguous chunks of such large data sets would result in expansion, not compression. Thus, the deduplication algorithms use various techniques to determine how similar two files are before deciding to try to express one in terms of the other.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
12. Re:Wrong layer by Anonymous Coward · 2010-09-15 12:17 · Score: 1, Interesting
  
  This technology is not just deduplicated backups, this is deduplicated STORAGE. Big difference. Combine with a SAN that has thin provisioning and automatic on the fly tiering between cache, SSD, FC, and some ATA disks and you can have a decent cost effective setup. Oddly, the cost per GB is about the same but you buy less and get fast and slow disks and it also has a lot of integrated DR features. I'll know how it all works in a few months, we are about two months away from a rolling upgrade of several Clariion CX3-80's to CX4's. It looked really good in the lab ;)
  Although we will have to increase our MPLS bandwidth, we will also be getting rid of tapes. I know people claim tapes are cheap but even with the great backup software setup and as automated as possible, you still have people on the ground loading and unloading, you pay for the Iron Mountain or Recall trucks, and you are paying dearly for the tape hardware. We have older StorageTek SL500's of various sizes. Those bitches can cost like 250K with a support contract and you are pushing data over your network or best case over your fiber network every night. Need to do a recovery? Call Iron Mountain and wait a few hours for the tapes to arrive. blah..
  I guess every situation is different but for us, getting rid of tape, retiring our older CX3-80's and migrating to a CX4 with more features was a sound decision over keeping our existing setup. The ROI is less than 2 years and we can use the additional features immediately.
  Kind of unrelated but I'd like to get rid of FC and move to 10GB iSCSI or FCoE but I guess I'm happy with the intermediate steps for now.
13. Re:Wrong layer by hawguy · 2010-09-15 12:29 · Score: 1
  
  No, deduplication has quite a bit of policy attached to it. Sometimes you want multiple independent copies of a file (well, maybe not in a data center, but why should the filesystem know that?). The filesystem should store the data it's told to; leave the deduplication to higher layers of a system.
  Why do you want multiple independent copies of a file? If you're doing it because your disk storage system is so flakey that you aren't sure you can read the file, deduplication policy is not what you need -- you need a more reliable storage system and backups.
  Most disks have a fine line between throwing random unrecoverable read errors and failing completely, so there's little value in having multiple copies of the same file on the same physical disk. (and most storage systems will have automatically replaced the drive with a hot spare once it started throwing too many soft read errors)
14. Re:Wrong layer by icebike · 2010-09-15 12:29 · Score: 1
  
  Thus, the deduplication algorithms use various techniques to determine how similar two files are before deciding to try to express one in terms of the other.
  But I understood de-duplication to be not concerned with files at all. Simply blocks of data on the device.
  As such it might de-duplicate the boilerplate out of a couple hundred thousand Word documents scattered across many different directories.
  Is that not the case? Are they not yet that sophisticated?
  
  --
  Sig Battery depleted. Reverting to safe mode.
15. Re:Wrong layer by hoggoth · 2010-09-15 12:51 · Score: 2, Interesting
  
  > Getting it onto my linux box, now.. there's the rub
  So don't put it on Linux. Set up a Solaris or Nexenta box. I just did it. I installed a Nexenta server with 1TB of mirrored, checksummed storage in 15 minutes. I wrote it up here http://petertheobald.blogspot.com/ - it was extremely easy. Now all of my computers back up to the Nexenta server. All of my media is on it. I have daily snapshots of everything at almost no cost in disk storage.
  
  --
  - For the complete works of Shakespeare: cat /dev/random (may take some time)
16. Re:Wrong layer by dgatwood · 2010-09-15 13:02 · Score: 3, Interesting
  
  I think it depends on which scheme you're talking about.
  Basic de-duplication techniques might focus only on blocks being identical. That would work for eliminating actual duplicated files, but would be nearly useless for eliminating portions of files unless those files happen to be block-structured themselves (e.g. two disk images that contain mostly the same files at mostly the same offsets).
  De-duplicating the boilerplate content in two Word documents, however, requires not only discovering that the content is the same, but also dealing with the fact that the content in question likely spans multiple blocks, and more to the point, dealing with the fact that the content will almost always span those blocks differently in different files. Thus, I would expect the better de-duplication schemes to treat files as glorified streams, and to de-duplicate stream fragments rather than operating at the block level. Block level de-duplication is at best a good start.
  What de-duplication should ideally not be concerned with (and I think this is what you are asking about) are the actual names of the files or where they came from. That information is a good starting point for rapidly de-duplicating the low hanging fruit (identical files, multiple versions of a single file, etc.), but that doesn't mean that the de-duplication software should necessarily limit itself to files with the same name or whatever.
  Does that answer the question?
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
17. Re:Wrong layer by hoytak · 2010-09-15 13:09 · Score: 2, Informative
  
  The latest stable version of zfs-fuse, 0.6.9, includes pool version 23 which has dedup support. Haven't tried it out yet, though.
  http://zfs-fuse.net/releases/0.6.9
  
  --
  Does having a witty signature really indicate normality?
18. Re:Wrong layer by h4rr4r · 2010-09-15 13:12 · Score: 2, Insightful
  
  Open Solaris is dead, and there are kernel bugs in the latest version, so good luck with that. I looked at doing it at one time and due to fears about Opensolaris I stayed away. I consider myself lucky.
19. Re:Wrong layer by h4rr4r · 2010-09-15 13:30 · Score: 1
  
  I thought everything that used FUSE was slow as hell, is this not true?
20. Re:Wrong layer by suutar · 2010-09-15 13:46 · Score: 1
  
  Sweet, thanks for the pointer. I was also concerned about the death of OpenSolaris but it sounds like Nexenta may be just what I want.
21. Re:Wrong layer by JWSmythe · 2010-09-15 13:46 · Score: 1
  
  How many times would, say, the boilerplate at the bottom of a lawyer/doctor/accountant's file systems appear verbatim in every single document filed in every single directory?
  I won't argue about that. I'm still shocked to see the bad housekeeping practices on various servers I've worked on. No, really, you don't need site_old, site_back, site_backup, site_backup_1988, and site_backup_y2k. Has anyone even considered getting rid of those? Nope. They're kept "just in case". What "just in case"? Just in case you want to roll back to a 20 year old copy of your data?
  
  --
  Serious? Seriousness is well above my pay grade.
22. Re:Wrong layer by drsmithy · 2010-09-15 15:33 · Score: 2, Insightful
  
  Sweet, thanks for the pointer. I was also concerned about the death of OpenSolaris but it sounds like Nexenta may be just what I want.
  Nexenta is built off Open Solaris and is, therefore, also dead - though it may take longer for the thrashing to stop.
23. Re:Wrong layer by Bigjeff5 · 2010-09-15 15:37 · Score: 1
  
  Google luck on finding solutions to your problems that are based on logic and rational thinking, I doubt you can pull it off judging by your statements so far.
  I dunno, I found it pretty easy. I got some interesting results too:
  Critical Thinking - HowTo.Lifehack
  Virgo free weekly horoscope
  Actually that's pretty funny.
  Maybe you're right, maybe it is hard to google luck on finding solutions to your problems that are based on logic and rational thinking.
  
  --
  Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
24. Re:Wrong layer by drsmithy · 2010-09-15 15:41 · Score: 3, Insightful
  
  Filesystems should be doing this.
  No, block devices should be doing this. Then you get the benefits regardless of which filesystem you want to layer on top.
25. Re:Wrong layer by drsmithy · 2010-09-15 15:44 · Score: 2, Interesting
  
  Wouldn't a compressed filesystem already do this? They don't just get the compression from nowhere. They eliminate duplicate blocks and empty space. You don't just get compression from nowhere.
  No, because compression is limited to a single dataset. Deduplication can act across multiple datasets (assuming they're all on the same underlying block device).
  Consider an example with 4 identical files of 10MB in 4 different locations on a drive, that can be compressed at 50%.
  "Logical" space used is 40MB.
  Using compression, they will fit into 20MB.
  Using dedupe, they will fit somewhere in between 5MB and 10MB.
  Using dedupe and compression, they will fit into ~5MB (probably a bit less).
  It doesn't really negate the need for good housekeeping routines, nor good programming. Do you really want 100 copies of record X, or would one suffice?
  Far better to let the computer do the heavy lifting, than trying to impose partial order on an inherently chaotic situation.
  Not to mention that the three textbook scenarios where dedupe really shines are backups, email and virtual machines, none of which can really be helped by "better housekeeping".
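
The arithmetic in the four-file example can be written out as a small idealized model. The function name and the decision to ignore metadata and block-pointer overhead are assumptions. In this simplified model dedupe alone lands at 10MB, the upper end of the quoted range; getting below that would require duplicate blocks inside a single file as well.

```python
def footprint_mb(copies: int, size_mb: float, compress_ratio: float) -> dict[str, float]:
    """Idealized on-disk size for `copies` identical files of `size_mb` each.

    compress_ratio is compressed/original size (0.5 means "compresses at 50%").
    Metadata and block-pointer overhead are ignored.
    """
    logical = copies * size_mb
    return {
        "logical": logical,
        "compression_only": logical * compress_ratio,
        "dedupe_only": size_mb,                       # one stored copy
        "dedupe_and_compression": size_mb * compress_ratio,
    }


print(footprint_mb(copies=4, size_mb=10, compress_ratio=0.5))
```
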
26. Re:Wrong layer by drsmithy · 2010-09-15 15:48 · Score: 1
  
  But I understood de-duplication to be not concerned with files at all. Simply blocks of data on the device.
  It depends.
  Simplistic dedupe schemes only operate at the file level. More advanced schemes operate at the block/cluster level.
27. Re:Wrong layer by drsmithy · 2010-09-15 16:36 · Score: 1
  
  Actually, this feature is a recent addition to ZFS, and it's the main reason I'm interested in putting ZFS on my file server.
  You'll probably be disappointed. Dedupe savings for the kind of stuff you'd typically find on a home file server are minuscule.
28. Re:Wrong layer by hoytak · 2010-09-15 17:19 · Score: 1
  
  Well, I don't think the userspace file system layer is the main slowdown on my file server box (using old hardware + a slower ethernet card; for a background backup system, it works), so I'm not speaking from experience here. I've heard the general idea is a 30-60 % slowdown is standard, depending on the operation.
  
  --
  Does having a witty signature really indicate normality?
29. Re:Wrong layer by stenWolf · 2010-09-15 19:21 · Score: 1
  
  FUSE is bad on so many levels: basically it doesn't work, crashes repeatedly, and is generally unsuited as hell for an FS that wasn't created for it. I tried several instances of ZFS on FUSE over the years and never got it to work anywhere near as well as on native Solaris. That said, there are other alternatives. OpenIndiana is based on the OpenSolaris codebase, created by whatever remains of the community after Oracle killed the project (? in some press releases Oracle tech strategists claimed OpenSolaris is being transformed into a new Solaris Express...). And Slashdot already reported on native ZFS for Linux: http://linux.slashdot.org/story/10/08/27/2259253/Native-ZFS-Is-Coming-To-Linux-Next-Month
30. Re:Wrong layer by dgatwood · 2010-09-15 19:42 · Score: 1
  
  Well, ultimately that's the way most of your basic compression algorithms work. It's not computationally infeasible. It's pretty trivial to perform a series of checksums on various parts of a file, and with a little knowledge of the file format, it should be possible to improve the search patterns significantly. The hard part ends up being that the storage for all the potentially interesting checksums might well exceed the space savings, depending on the nature of the data set.
  Either way, without that level of introspection, I would expect a block-level de-duplication algorithm to not do much better than a very basic whole-file de-duplication algorithm except when you're working with very specialized and unusual data sets. *shrugs* Maybe I'm just too cynical.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
31. Re:Wrong layer by badkarmadayaccount · 2010-09-15 20:05 · Score: 1
  
  *cough*VMS*cough*Files-11*cough*
  
  --
  I know tobacco is bad for you, so I smoke weed with crack.
32. Re:Wrong layer by Nefarious+Wheel · 2010-09-15 20:17 · Score: 1
  
  This is just compression on a larger scale.
  
  --
  Do not mock my vision of impractical footwear
33. Re:Wrong layer by noidentity · 2010-09-15 21:31 · Score: 1
  
  Filesystems should be doing this.
  
  No, block devices should be doing this. Then you get the benefits regardless of which filesystem you want to layer on top.
  
  No, filesystems should be doing this. Then you get the benefits regardless of which hardware you want to put on the bottom.
34. Re:Wrong layer by smallfries · 2010-09-15 22:04 · Score: 1
  
  Filesystems should be doing this.
  No, block devices should be doing this. Then you get the benefits regardless of which filesystem you want to layer on top.
  No, filesystems should be doing this otherwise you are introducing a new type of component into the system: a block device with a variable number of blocks depending on its contents. Good luck not breaking the semantics of file operations on any application running above it.
  Oh that's strange. I've edited a block (ie done a write into the middle of a file) and it failed because the device is now full...
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
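
The failure mode described here (an overwrite in the middle of existing data failing with "device full") can be reproduced with a toy model. `DedupBlockDevice`, its fields, and the tiny capacities are hypothetical; the sketch only shows why a deduplicating block device has a data-dependent number of usable blocks.

```python
from typing import Optional


class DedupBlockDevice:
    """Toy block device: logical blocks map to shared physical blocks."""

    def __init__(self, logical_blocks: int, physical_blocks: int) -> None:
        self.capacity = physical_blocks
        self.mapping: list[Optional[bytes]] = [None] * logical_blocks
        self.refs: dict[bytes, int] = {}  # block content -> reference count

    def write(self, lba: int, data: bytes) -> None:
        old = self.mapping[lba]
        if old == data:
            return
        # Overwriting the last reference to a block frees its physical slot.
        freed = 1 if old is not None and self.refs[old] == 1 else 0
        needs_new = data not in self.refs
        if needs_new and len(self.refs) - freed >= self.capacity:
            raise OSError("no space left on device")
        if old is not None:
            self.refs[old] -= 1
            if self.refs[old] == 0:
                del self.refs[old]
        self.refs[data] = self.refs.get(data, 0) + 1
        self.mapping[lba] = data


dev = DedupBlockDevice(logical_blocks=4, physical_blocks=2)
for lba in range(4):
    dev.write(lba, b"A" * 512)   # four logical blocks, one physical block
dev.write(1, b"B" * 512)         # second physical block now in use
try:
    dev.write(2, b"C" * 512)     # in-place overwrite of an existing block
except OSError as exc:
    print("overwrite failed:", exc)
```

All four logical blocks were already written, yet changing one of them fails because the new content no longer deduplicates.
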
35. Re:Wrong layer by TheRaven64 · 2010-09-15 22:42 · Score: 2, Informative
  
  Nexenta is developed by the people behind the Illumos Foundation, who have created a 'spork' of OpenSolaris, which will continue to import code from each of the source dumps that Oracle has said they will do after each Solaris release, will fix bugs, and will replace the binary-only components of OpenSolaris with open ones.
  
  --
  I am TheRaven on Soylent News
36. Re:Wrong layer by Mysticalfruit · 2010-09-15 22:48 · Score: 1
  
  I'm currently using zfs in production on Solaris (not open solaris) and I'm happy with it.
  
  However, it's currently DOA for the foreseeable future unless there's a viable stable port to linux. Yeah, there's the KQ port (possibly) coming, and we already have the posix layerless llnl port for Lustre.
  
  The fly in the ointment for this all is btrfs. My feeling is that by the time we get a mature not slow as shit port of zfs to linux, btrfs will have matured to a point that it'll have all the features of zfs I like (which it at this point already has) such as snapshots, clones, etc, plus deduping. It'll be something that's in active development with RedHat and will be getting the shit beat out of it in Fedora so by the time I'm ready to actually put it into production, it'll be mature.
  
  --
  Yes Francis, the world has gone crazy.
37. Re:Wrong layer by TheRaven64 · 2010-09-15 22:52 · Score: 1
  
  Which is actually how ZFS works - it's done at the block layer, and you can store any FS on top of it. Usually you'll use ZPL, which delegates layout entirely to the low layers and just provides POSIX semantics on top, but you can also use ZVOL and use another FS inside that container.
  
  --
  I am TheRaven on Soylent News
38. Re:Wrong layer by stenWolf · 2010-09-16 00:00 · Score: 1
  
  I beg to differ. btrfs is mainly Oracle driven. Most of the development is by Oracle-paid engineers, and Oracle bought Sun with zfs already mature and in place. What's their incentive to continue pouring money to compete with their own product?
39. Re:Wrong layer by drsmithy · 2010-09-16 02:32 · Score: 1
  
  No, filesystems should be doing this. Then you get the benefits regardless of which hardware you want to put on the bottom.
  If you make it filesystem-specific, the benefits can only be realised by systems that can use that filesystem. Further, it means the data under consideration for dedupe must be within that filesystem.
  When you're talking about centralised storage - which we are - you want the dedupe happening low in the storage stack so the benefits are spread across the widest range of systems and datasets.
40. Re:Wrong layer by drsmithy · 2010-09-16 02:44 · Score: 1
  
  The fly in the ointment for this all is btrfs. My feeling is that by the time we get a mature not slow as shit port of zfs to linux, btrfs will have matured to a point that it'll have all the features of zfs I like (which it at this point already has) such as snapshots, clones, etc, plus deduping.
  So far as I know, btrfs currently doesn't support parity-based RAID schemes and dedupe. I'd estimate it's 3-5 years away from being "production ready".
41. Re:Wrong layer by h4rr4r · 2010-09-16 04:43 · Score: 1
  
  I pay redhat to fix those, so why bother fixing them in an OS I was lucky enough to avoid?
42. Re:Wrong layer by StikyPad · 2010-09-16 09:20 · Score: 2, Funny
  
  Sounds like what we need is a giant table of all possible byte values up to 2^n length, then we can just provide the index to this master table instead of the data itself. I call this proposal the storage-storage tradeoff where, in exchange for requiring large amounts of storage, we require even more storage. I'll even throw in the extra time requirements for free.
  
  --
  https://www.eff.org/https-everywhere
43. Re:Wrong layer by maraist · 2010-09-16 16:26 · Score: 1
  
  This thread seems to be getting too defocused from reality.
  
  Here's the rub.
  
  Checksumming == good. All else being equal, we should have more of it.
  But checksumming is expensive (adds latency to your write).
  
  So once you have it, might as well use it.
  
  Background thread can compare checksums of blocks as starting points to identifying identical blocks (since checksum collisions are more than possible, they're only a matter of time - I see colliding MD5 sums all the time in BackupPC - you can tell because they append a semi-colon + sequence ID to the file-name to disambiguate).
  
  As some thread posters have listed - file-names prevent entire files from being block-shared.. Rubbish. File-names in Unix file systems have never been coupled with file-metadata. Files are identified by inode numbers, not file-names.. file-names are meta-data stored in directory files (which is why hard links are possible). Now unless you have noatime in your mount options, replicating inode descriptors will be nearly impossible, but that should only be a small fraction of your disk blocks anyway.
  
  Historically, the main way you'd leverage shared blocks is through snapshot images - which all use copy-on-write. LVM and netapp and I'm sure dozens of other vendors supply this because it's trivial to do.
  
  All this is really likely doing is extending the existing SNAPSHOT copy-on-write logic to merge blocks from different file-systems (which snapshots technically are) AND from within the same file-system. And most likely done through block-level checksum comparisons. Though since ext and many other file-systems don't naturally support check-sums at the block level, I doubt this is leveraging file-system level operations.
  
  --
  -Michael
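
Two points from this comment, that a matching checksum is only a starting point and that names are separate from inodes, can be combined in a file-level sketch that replaces verified duplicates with hard links. This is an illustration only: `dedupe_by_hardlink` is a hypothetical helper, SHA-256 is swapped in for the MD5 sums mentioned above, and it assumes a filesystem that supports hard links.

```python
import filecmp
import hashlib
import os
import tempfile


def dedupe_by_hardlink(paths: list[str]) -> int:
    """Replace byte-identical files with hard links; return how many were linked."""
    by_digest: dict[str, str] = {}
    linked = 0
    for path in paths:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        keeper = by_digest.setdefault(digest, path)
        if keeper == path or os.path.samefile(keeper, path):
            continue
        # A matching checksum is only a hint; compare the bytes before linking.
        if filecmp.cmp(keeper, path, shallow=False):
            os.remove(path)
            os.link(keeper, path)
            linked += 1
    return linked


with tempfile.TemporaryDirectory() as tmp:
    names = [os.path.join(tmp, n) for n in ("a.txt", "b.txt", "c.txt")]
    for name, body in zip(names, (b"same", b"same", b"different")):
        with open(name, "wb") as fh:
            fh.write(body)
    count = dedupe_by_hardlink(names)
    same_inode = os.stat(names[0]).st_ino == os.stat(names[1]).st_ino
    print(count, same_inode)
```

Unlike copy-on-write block sharing, hard-linked names share one inode, so a later write through either name changes both. That is one reason this approach suits read-mostly data such as backups.
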
44. Re:Wrong layer by maraist · 2010-09-16 17:11 · Score: 1
  
  Haha. I call small-minded skizzies on you, sir!
  
  Imagine a specialized net-appliance (screw netapp). It has 32 Gig of RAM and a 512Gig high-speed random-access SSD (where read speed is more important).
  
  Split the 512Gig into two 256Gig portions.
  
  The first portion contains 4 bytes of the MD5 sum of each 512B block (represents up to 32TB of block storage).
  
  Every 2048B block being back-ground scanned for deduping does an SSD lookup against the 256G SSD hash-map which is open-chained and points back to existing 2048B blocks on disk. This lets us efficiently cross-link (reverse copy-on-write). I'd prefer 512B block boundaries, but most file systems use 2048B blocks (or larger) and HD's are starting to move to this to increase ECC efficiency. Plus it just reduces the overhead.
  
  So that's just a minor optimization of whatever people have already been doing in software.. bla bla. Boring stuff, right.
  
  For those blocks that DIDN'T match...
  
  We do a modified version of zlib compression (which only stores 32KB worth of back-data). We extend this to store 4 Gig worth of code points (assuming a 4Byte identity prefix match and 4 Byte SSD disk block pointer). Each reference is a 256Byte block which thus supports up to an 8-bit length pointer.
  
  So now as you scan through the 2048 byte blocks being stored on disk, you do a hash-lookup of every consecutive 7 bytes. You hash the first 4 bytes and lookup in RAM. If matched, you lookup in SSD the remaining byte-string and see how many bytes match. If more than 7, you store a disk-pos + length vector. Saving you at least 1 byte (1 byte magic, 4 byte pointer, 1 byte length), and possibly the entire 256 bytes. If you can compress to one of 50% or 75%, you store at 1024 bytes or 512 bytes. As soon as you reach either of these two boundaries, you stop compressing. Though this does assume you're not using 2048B boundary HDs. You then store into one of 2 special areas on disk that are 1/2 and 1/4 block compressed.
  
  So this solves highly compressible but single byte-offset situations.. e.g. I copy sections of source-code (at least 512 bytes) and paste them either into the same file or some other file.
  
  So long as you don't pull out the HD, the ref-map in SSD matches the previous runs on disk, so you don't have to do random disk-seeks to reconstruct the blocks. So now reading highly compressed blocks not only reduces the number of bytes read from physical disk, it increases the ratio of SSD to HD reads. :)
  
  I'm only joking of course. But not really.. You hiring netapp??
  
  --
  -Michael
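
The sizing claim in this proposal (a 256 GB table of 4-byte MD5 prefixes covering 32 TB of 512-byte blocks) checks out arithmetically, and the lookup it implies can be sketched as a chained index that verifies candidates. `PrefixIndex` and its in-memory buckets are assumptions standing in for the SSD hash-map; binary units (GiB/TiB) are assumed for the arithmetic.

```python
import hashlib
from collections import defaultdict

GIB = 1 << 30
TIB = 1 << 40

index_bytes = 256 * GIB   # half of the 512 GB SSD
entry_bytes = 4           # truncated MD5 prefix per block
block_bytes = 512

entries = index_bytes // entry_bytes
covered = entries * block_bytes
print(entries, covered // TIB)  # number of entries, TiB of block storage covered


class PrefixIndex:
    """Maps a 4-byte digest prefix to candidate blocks (chained on collision)."""

    def __init__(self) -> None:
        self.buckets: dict[bytes, list[bytes]] = defaultdict(list)

    def add_or_find(self, block: bytes) -> bool:
        """Return True if an identical block was already indexed."""
        key = hashlib.md5(block, usedforsecurity=False).digest()[:4]
        for candidate in self.buckets[key]:
            if candidate == block:   # the prefix alone cannot prove equality
                return True
        self.buckets[key].append(block)
        return False
```

With only 32 bits of hash per entry, false prefix matches are expected at this scale, which is why the sketch compares full blocks before declaring a duplicate.
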
45. Re:Wrong layer by maraist · 2010-09-16 17:15 · Score: 1
  
  No, NFS should be doing this, that way you aren't tied to specific file-system or disk systems limitations. ;)
  
  --
  -Michael
46. Re:Wrong layer by maraist · 2010-09-16 17:19 · Score: 1
  
  Uhh, what does LVM do then? Oh yeah, you OVER-ALLOCATE.. My bad. And yes, with LVM-snapshots, you very well can crash the system if free space is maxed out. I don't recall, but I believe it deletes the snapshot, but since that's a mounted file-system, it's just as bad.
  
  There's also commercial NAS hardware which works like this. They have little green, yellow and red lights next to each physical disk. Supposedly you should swap out a yellow or red disk with a larger one to avoid either automatically reducing RAID redundancy (e.g. 2 disk redundancy reduces down to 1 disk redundancy), and then ultimately producing seek errors when no remaining physical blocks can map to a requested virtual block. I forget the name of the vendor in question, but it was far cheaper than a netapp - but really meant to sit next to your workstation (obviously).
  
  It's not a new concept at all.
  
  --
  -Michael
47. Re:Wrong layer by smallfries · 2010-09-16 21:08 · Score: 1
  
  So you argue that it can be done that way, because some commercial system does it that way: AND IT SUFFERS FROM THE PROBLEM THAT I SAID MEANS THAT YOU *SHOULD* DO IT ANOTHER WAY.
  I'm sorry, but I don't think I have the mad explanation skills necessary to put this in a way that you can understand.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
48. Re:Wrong layer by Mysticalfruit · 2010-09-16 22:45 · Score: 1
  
  Good point. They did just stick a fork in OpenSolaris. I'd be psyched if they just re-licensed ZFS as open source.
  
  Who knows... at the same time, Oracle has relationships with vendors like EMC and NetApp. I can't imagine they would be happy with the idea of pumpkin heads like me putting together very scalable storage solutions out of linux servers with a huge pile of JBOD with cheap fast disks.
  
  We recently built a ~40TB Solaris ZFS iscsi/NFS/Samba file server that's really kickass fast and scalable. The whole thing to build was under 10k.
  
  The other side of the coin is that ZFS brings to the table an easy way for storage companies to go after the low end without a ton of R&D on their end.
  
  --
  Yes Francis, the world has gone crazy.
49. Re:Wrong layer by maraist · 2010-09-17 00:31 · Score: 1
  
  Sorry, but we're either in disagreement, or you're not understanding.
  
  When a Unix partition gets to the last xth percent remaining space it locks the partition down to all but root.
  
  When a partition gets to 15% free, all sorts of monitor / alarm bells SHOULD be ringing (if you have a properly configured system).
  
  If you get to the 50% mark, then you need to start planning ahead for an upgrade.
  
  By over-allocating, you can do this at a group level instead of on a per-partition level.
  
  Thus keep all partitions 15% full (without wasting 85% of disk-space - due to over-allocation).
  
  Running out of disk space is running out of disk space - whether it's at the ext layer or LVM layer or NAS layer.. You should be monitoring and planning ahead no matter what layer it's in.
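
  A minimal Python sketch of the kind of monitoring described above. The 15% and 50% thresholds are one reading of the figures in this post (both taken as fractions of free space), and the function names are hypothetical; `shutil.disk_usage` is the only library call relied on.

```python
import shutil


def classify_free_space(total_bytes, free_bytes, alarm_free=0.15, plan_free=0.50):
    """Map a free-space ratio onto the three states discussed in the post.

    alarm_free and plan_free are assumed thresholds, expressed as the
    fraction of the volume that is still free.
    """
    free_ratio = free_bytes / total_bytes
    if free_ratio <= alarm_free:
        return "alarm"          # monitors should be ringing by now
    if free_ratio <= plan_free:
        return "plan-upgrade"   # start planning for more capacity
    return "ok"


def check_mount(path="/"):
    """Classify one mounted filesystem (or any directory on it)."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.total, usage.free)


if __name__ == "__main__":
    print(check_mount("/"))
```

  The same check applies whatever layer hands out the space; with over-allocation you would feed it the pool's totals rather than a single partition's.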
  
  The fact that editing a pre-existing block CAN cause a failure (because of sys-admin negligence) is NOT a fault of the application or technology. Especially since there is no difference between it failing due to pre-existing disk-allocation vs. appending to a file (/var/log/messages). It's the same as saying 'well, my application polled the disk-free in second one, then assumed it was safe to allocate an extra 4Meg.. But then when it came time to do so, there was no free space. waaah'.
  
  --
  -Michael
50. Re:Wrong layer by smallfries · 2010-09-17 02:11 · Score: 1
  
  Everything that you've pointed out is true, so we have no disagreement there. But I suspect that you are not understanding the point that I have made.
  Any file-system makes a set of semantic guarantees / constraints on any application that uses it. It is the responsibility of the application to make sure that what it does matches these constraints, while relying on these guarantees. Same as any API.
  
  It's the same as saying 'well, my application polled the disk-free in second one, then assumed it was safe to allocate an extra 4Meg.. But then when it came time to do so, there was no free space. waaah'.
  No. It is very different. It is taking a situation that exists in every application out there (modifying the contents of a file) and allowing there to be a failure where previously there was none. Every application assumes that changing bytes it has already allocated on the disk is not a situation that can fail due to disk space running out. Several applications depend upon this behaviour.
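
  The failure mode being argued about can be sketched with a toy model. This is an illustration only, not any vendor's or filesystem's implementation: `DedupPool` and its methods are hypothetical names, and the "pool" is just a dictionary with a fixed number of physical slots.

```python
import errno
import hashlib


class DedupPool:
    """Toy block store in which identical blocks share one physical slot."""

    def __init__(self, physical_slots):
        self.physical_slots = physical_slots
        self.blocks = {}     # digest -> block contents (one entry per physical slot)
        self.refcount = {}   # digest -> number of logical blocks pointing at it
        self.logical = {}    # logical block number -> digest

    def write(self, lbn, data):
        digest = hashlib.sha256(data).hexdigest()
        old = self.logical.get(lbn)
        if old == digest:
            return  # same contents, nothing to do
        if digest not in self.blocks:
            # New contents need a physical slot. The old slot is only
            # reclaimed if this logical block was its sole owner.
            frees_old = old is not None and self.refcount[old] == 1
            in_use = len(self.blocks) - (1 if frees_old else 0)
            if in_use >= self.physical_slots:
                raise OSError(errno.ENOSPC, "no free physical blocks")
            self.blocks[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        self.logical[lbn] = digest
        if old is not None:
            self.refcount[old] -= 1
            if self.refcount[old] == 0:
                del self.refcount[old]
                del self.blocks[old]


if __name__ == "__main__":
    pool = DedupPool(physical_slots=2)
    for lbn in range(4):
        pool.write(lbn, b"A" * 4096)   # four logical blocks, one physical slot
    pool.write(4, b"B" * 4096)         # pool is now physically full
    try:
        pool.write(0, b"C" * 4096)     # overwrite of an already-written block
    except OSError as exc:
        print("overwrite failed:", exc)
```

  In this model the overwrite of block 0 fails because its old contents are still shared by three other logical blocks, so nothing is freed. That is the case an application written against a conventional filesystem never had to handle.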
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
51. Re:Wrong layer by drsmithy · 2010-09-18 08:46 · Score: 1
  
  Oh that's strange. I've edited a block (ie done a write into the middle of a file) and it failed because the device is now full...
  The real problem here is that your disk is full, not that it's being deduped.
  However, if you knew your systems monitoring was so tragically bad (/nonexistent), you'd simply have a space guarantee on the volume (or the file) - any system capable of overallocating should support that.
52. Re:Wrong layer by drsmithy · 2010-09-18 11:53 · Score: 1
  
  I hope you see now how a block layer dedup scheme is quite dumb in this scenario, which can thoroughly be properly addressed at the file system layer since files are byte-streams.
  The question is how frequently that happens compared to the cases where file-level dedupe is basically worthless - virtual machines, email attachments, databases, in fact basically anything where duplicated data may appear within files (or files masquerading as block devices) but not as discrete files.
  I am quite confident those scenarios are vastly more common than one of inserting bytes randomly into the middle of existing files.
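
  For reference, the insert-in-the-middle scenario from the parent post can be reproduced in a few lines of Python. This is a sketch assuming fixed-size 4096-byte blocks aligned to the start of the file; the function names are hypothetical and real products may chunk data differently.

```python
import hashlib


def block_digests(data, block_size=4096):
    """Hash each fixed-size, offset-aligned block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a, b, block_size=4096):
    """Count distinct blocks that a and b have in common."""
    return len(set(block_digests(a, block_size)) & set(block_digests(b, block_size)))


def sample_file():
    """16 KiB of non-repeating data: 4096 four-byte counters."""
    return b"".join(i.to_bytes(4, "big") for i in range(4096))


if __name__ == "__main__":
    original = sample_file()
    overwritten = original[:10] + b"X" + original[11:]   # change one byte in place
    inserted = original[:10] + b"X" + original[10:]      # insert one byte
    print(shared_blocks(original, overwritten))  # only the touched block differs
    print(shared_blocks(original, inserted))     # every later block is shifted
```

  An in-place change leaves the other blocks dedupable, while a one-byte insertion shifts every following block boundary. Whether that matters in practice is exactly the frequency question raised above.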
53. Re:Wrong layer by drsmithy · 2010-09-18 12:33 · Score: 1
  
  Nexenta is developed by the people behind the Illumos Foundation, who have created a 'spork' of OpenSolaris, which will continue to import code from each of the source dumps that Oracle has said they will do after each Solaris release, will fix bugs, and will replace the binary-only components of OpenSolaris with open ones.
  Like I said, it might take a bit longer for the body to stop thrashing.
  Basing your whole product almost entirely on the generosity of another company - especially one not known for being generous - is nuts, and is basically what Nexenta are doing. The only people crazier are the ones buying storage solutions from them.
Don't forget to weigh in the cost by leathered · 2010-09-15 11:19 · Score: 2, Informative

The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe, however we could have had 3 times conventional storage for the same cost.
NetApp is neat and all but horribly overpriced.

--
For all intensive porpoises your a bunch of rediculous loosers
1. Re:Don't forget to weigh in the cost by ischorr · 2010-09-15 11:28 · Score: 1
  
  I assume they didn't spend the money only for dedupe? That box has a whole lot of features.
2. Re:Don't forget to weigh in the cost by hardburn · 2010-09-15 11:35 · Score: 2, Informative
  
  Was it near the end of the fiscal year? Good department managers know that if they use up their full budget, then it's harder to argue for a budget cut next year. Managers will sometimes blow any excess funds at the end of the year on things like this for that very reason.
  
  --
  Not a typewriter
3. Re:Don't forget to weigh in the cost by alen · 2010-09-15 11:53 · Score: 1
  
  No shit
  We have dedupe and plain old tape. 20 LTO-4 tapes cost $700. That's 20 - 60 terabytes depending on compression.
  We also pay $20,000 a year in support for a dedupe software app. Plus the disk, servers and power to keep it running, and you have to buy at least 2, since if your OS crashes then your data is gone
  Cheap disk my ass
  The tape backup was a little pricey at first but the tapes hold so much and are so fast that we hardly buy any more tape. Like we used to blow $25,000 a year or more for DLT tape
4. Re:Don't forget to weigh in the cost by hawguy · 2010-09-15 12:39 · Score: 1
  
  I don't think it takes a NetApp sales rep to recognize the value of a reliable storage system. I'm sure he would say the same of EMC - it's expensive but worth every penny when you've got hundreds (or thousands) of people relying on your storage.
  If you're in a 10 person office, you can get by with less, but when you've got a large corporate environment, you'll recognize the advantage of paying for Netapp or EMC.
5. Re:Don't forget to weigh in the cost by h4rr4r · 2010-09-15 13:08 · Score: 3, Insightful
  
  More disk is still so much cheaper it really cannot be justified on that front. More disks also mean more IOPS, so reducing sinning platters can be a bad thing.
  There are some reasons to go for it, but even with thousands of clients it may or may not be suitable for what you are doing.
6. Re:Don't forget to weigh in the cost by mlts · 2010-09-15 14:01 · Score: 1
  
  The Netapp box does a lot more than deduping:
  1: The newer models come with 512GB-1TB of SSD, and automatically place data either on the SSD, the FC drives, or the SATA platters depending on how much it is used. If the chunk of data is used all the time, it sits on the SSD. This helps a lot with the bottleneck of a lot of machines needing to access the same data block with deduplication. This is different from other disk solutions, as the NetApp chooses the "tier" of disk for you. However, a lot of servers don't put out the throughput requiring someone to select between T1 and T2 disks, so for this, the NetApp is fine. Carve your LUNs out, carry on.
  2: NetApp's WAFL system has been around saving butts for a long time. People don't realize this until you walk in and see that a junior admin blew away /net, and is looking at you with the deer in the headlight glance. A quick move from a snapshot directory, and nobody is the wiser.
  3: You can put two NetApp SAN clusters in two geographically disparate locations and have them send changes via the WAN. This way, DR can be automated and made quite fast.
  4: SANs are a lot more than just a bunch of disks shoved in a rack. They tend to be very intelligent about where data is placed, and on the backend, at least use RAID 6, where more than two drives have to fail at the same time for data to get lost. Almost all have multiple controllers, so if one path via the network fabric gets stomped on, machines are still able to access their LUN via the second one.
  This isn't to say the NetApp is for everyone. If someone just needs a bunch of disk and no other features, a BackBlaze pod or a tower full of eSATA JBOD drives may be good enough. However, if one has a number of machines and is doing large amounts of random I/O, having an enterprise grade SAN goes without saying.
7. Re:Don't forget to weigh in the cost by zooblethorpe · 2010-09-15 14:31 · Score: 2, Funny
  
  ...so reducing sinning platters can be a bad thing.
  Satan, is that you?
  Cheers,
  
  --
  "What in the name of Fats Waller is that?"
  "A four-foot prune."
8. Re:Don't forget to weigh in the cost by drsmithy · 2010-09-15 15:52 · Score: 1
  
  The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe, however we could have had 3 times conventional storage for the same cost.
  Where are you going to get three times as much storage for the same cost (well, actually it'd need to be a lot less to pay for all the additional physical and logical infrastructure) that has redundant controllers, FC, iSCSI, NFS, SMB, no-impact snapshotting, dedupe, replication and 24x7x4 support ?
9. Re:Don't forget to weigh in the cost by Krahar · 2010-09-15 15:54 · Score: 3, Insightful
  
  Sinning platters cause original spin.
10. Re:Don't forget to weigh in the cost by Trepidity · 2010-09-15 15:55 · Score: 1
  
  Are they really that superior to the Sun storage products (the ones Sun invented ZFS for) to be worth the big multiple in price? I mean, Sun isn't stuff-you-put-together-at-Frys prices either, but it's a lot cheaper than EMC or NetApp.
  
  --
  10 PRINT CHR$(205.5+RND(1)); : GOTO 10
11. Re:Don't forget to weigh in the cost by drsmithy · 2010-09-15 16:01 · Score: 1
  
  More disk is still so much cheaper it really cannot be justified on that front.
  Sure it can, easily.
  If your primary concern is up-front cost, you shouldn't be buying equipment in an enterprise environment. The up-front cost is the _least_ of your concerns.
12. Re:Don't forget to weigh in the cost by hawguy · 2010-09-15 17:04 · Score: 1
  
  More disks don't necessarily mean more IOPS, a better storage system means better IOPS. If all you're looking for is raw IOPS, I'm sure you can build a system from commodity components that outperforms a reasonably sized Netapp or EMC filer. But you wouldn't be able to scale that system to 100TB or more.
  And I wouldn't trust that home-brew system to run my company's database and other critical servers that have to run 24x7x365.
13. Re:Don't forget to weigh in the cost by drsmithy · 2010-09-15 17:32 · Score: 1
  
  Are they really that superior to the Sun storage products (the ones Sun invented ZFS for) to be worth the big multiple in price? I mean, Sun isn't stuff-you-put-together-at-Frys prices either, but it's a lot cheaper than EMC or NetApp.
  The issues that will give most people pause are those of maturity and future. Sun's solution hasn't been around for very long, and some features like FCP target and dedupe are *very* new. There's also something of a question mark over where it's going in the future with Oracle's acquisition of Sun.
  The "big multiple in price" isn't really accurate either. I quickly priced out a rough equivalent to our 3140 in a 7410 with 4x20-spindle shelves, 2 "read accelerators", some 10GbE cards and 4Gb HBAs and it came out at over $250k. Obviously that's before discounting, and it has more/faster CPUs, but it's certainly in the same _ballpark_ as the ~$175k we paid.
14. Re:Don't forget to weigh in the cost by drsmithy · 2010-09-15 17:36 · Score: 1
  
  The "big multiple in price" isn't really accurate either. I quickly priced out a rough equivalent to our 3140 in a 7410 with 4x20-spindle shelves, 2 "read accelerators", some 10GbE cards and 4Gb HBAs and it came out at over $250k. Obviously that's before discounting, and it has more/faster CPUs, but it's certainly in the same _ballpark_ as the ~$175k we paid.
  I also just realised that $250k is probably not including support, which is likely to be knocking on the door of 6 figures for 3 years of 24x7x4 support. Half of the price of our NetApp was the support contract.
15. Re:Don't forget to weigh in the cost by TheRaven64 · 2010-09-15 23:07 · Score: 2, Insightful
  
  No, good department managers don't know that. Department managers in companies with bad senior management know that. Companies with competent senior management are willing to increase the budgets for departments that have shown that they are fiscally responsible, and cut the budgets or fire the department heads of others.
  
  --
  I am TheRaven on Soylent News
16. Re:Don't forget to weigh in the cost by maraist · 2010-09-16 17:48 · Score: 1
  
  HDFS disk size is meaningless herein.
  
  Likewise, with mysql-INNODB, I can utilize 25 4TB eSATA externally managed machines (assuming 4 2TB disks RAID-1'd together), each mapped to a 4TB block-device, which INNODB treats efficiently.. (Or if using ext4, I could use LVM to map those eSATA devices together for a general purpose disk).
  
  I could even have LVM stripe those remote volumes to get better IOPs.
  
  At $700 a base machine (gray boxes) (including disks), that's $17,500.
  
  If I didn't care about random-write-speed, then I could go with RAID5. Put 3 disks in each machine and reduce costs to $15,000.
  
  Or I could go with RAID5 on a hot-swappable 16-disk $1,000 RAID controller and reduce it down to 4 machines. Bringing the price down to $11,600.
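
  The three price points above can be checked with a little arithmetic. This sketch assumes the $100-per-2TB-disk figure quoted later in this post, and from it a $300 base machine ($700 less four disks); the base price is an inference, not something the post states. The helper name `build` is hypothetical.

```python
DISK_COST = 100                      # $ per 2TB disk (figure quoted later in the post)
DISK_TB = 2
BASE_COST = 700 - 4 * DISK_COST      # assumed: $700 gray box minus its four disks


def build(machines, disks_per_machine, usable_disks, controller_cost=0):
    """Return (total cost in $, usable capacity in TB) for one layout.

    usable_disks is the number of disks per machine left after redundancy.
    """
    per_machine = BASE_COST + controller_cost + disks_per_machine * DISK_COST
    return machines * per_machine, machines * usable_disks * DISK_TB


if __name__ == "__main__":
    print(build(25, 4, 2))                         # mirrored pairs, 4 disks per box
    print(build(25, 3, 2))                         # RAID5, 3 disks per box
    print(build(4, 16, 15, controller_cost=1000))  # RAID5 on a 16-disk controller
```

  Under those assumptions the totals come out at $17,500, $15,000 and $11,600, matching the figures above, with 100TB, 100TB and 120TB usable respectively.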
  
  We're assuming either with HDFS or mysql or any other app, that you build redundancy on TOP of the applications. Which is the ONLY smart thing to do with enterprise grade applications.
  
  Failure of a disk is ASSUMED.. Meaning your $100,000 netapp WILL fail one day.. You are an IDIOT if you don't believe this.. Sure it may take 10 years. But what happens then? Replace it every 5 years? How about every 3? That's not a capital cost, that's a variable cost. Sure your data may be worth it. But as a business, can you attain the same degree of reliability cheaper? HELL YEAH. And the 3 year replacement cycle doesn't handle a power surge which blows the hardware (say a rogue UPS). Sure put redundant power supplies on isolated UPSes - ok, I'm unplugging an ethernet cable and accidentally cause an electrical surge which blows the network controller.. Oops, data is safe, no access! 2 days down-time!!
  
  The point is RAID was invented to solve a class of problems in a cost-effective way. It doesn't solve every problem, and I completely agree that solutions LIKE netapp are GREAT when you want to start medium and leave room to grow large. When you want to consolidate and dynamically repartition disk space AND spindles (e.g. vmware solutions). When you want lower maintenance costs (avoiding having to rebuild lots of regularly failing gray-boxes, constantly swapping out one of hundreds of $100 2TB disks).. BUT I submit to you that at scale, this is cheap / slave-labor. You hire high-school students to replicate base OS hard drives (with a replicating station), you buy 25% overstock of base hardware so you have a fast (30 minute) build-new-machine / deploy-into-cluster environment. You only need accuracy, you don't need intelligence. Yes, a $30k / year salary is more than the extra $30k you spend on the netapp.. But you'd have to buy dozens of netapps to really scale an enterprise solution.. And that same $30k can usually handle it.
  
  So there is absolutely a scale region where a netapp makes sense.. But it is NOT the high end - which is what they'd like you to believe. And I submit that there are application-level redundancy solutions which are more reliable (though at the cost of semi-static configurations - which a netapp type system does provide value-add).
  
  --
  -Michael
Not enough products by ischorr · 2010-09-15 11:24 · Score: 2, Interesting

Odd that if they reviewed this class of products they didn't review the most common deduping NAS/SAN appliance - the EMC NS-series (particularly NS20).
1. Re:Not enough products by drdrgivemethenews · 2010-09-15 11:45 · Score: 1
  
  I found it odd too, though they seem to be reviewing boxes that do dedup on live data, as opposed to backup streams. Appliances like the NS-series claim dedup percentages of 95%+, but they accomplish this seeming miracle when slowly changing datasets are backed up over and over (even differential backup systems usually do a full backup fairly regularly).
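
  A rough worked example of why repeated full backups produce such high ratios. The numbers are hypothetical (20 retained full backups, 1% of the data changing between each), the function name is made up, and the model ignores metadata and compression.

```python
def backup_dedup_savings(full_backups, change_rate):
    """Fraction of logical backup data that dedupe avoids storing.

    Assumes each full backup after the first adds only change_rate of
    one dataset's worth of new, unique blocks.
    """
    logical = full_backups                           # in units of one full dataset
    physical = 1 + (full_backups - 1) * change_rate
    return 1 - physical / logical


if __name__ == "__main__":
    print(f"{backup_dedup_savings(20, 0.01):.1%}")
```

  With those inputs the saving is about 94%, and it comes almost entirely from storing the same dataset twenty times. A single copy of live data has no such repetition to exploit.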
2. Re:Not enough products by georgewilliamherbert · 2010-09-15 11:47 · Score: 1
  
  Thirded. Data Domain (now part of EMC) really started the commercial use of this...
3. Re:Not enough products by alen · 2010-09-15 11:56 · Score: 1
  
  If it's emc then you need to be a global fortune 10 company to afford it
  I used to joke that they are like crack dealers. The initial hardware is not that much, but they get you on the disk upgrades, licenses to go above some storage size, backend bandwidth, etc
4. Re:Not enough products by ischorr · 2010-09-15 12:24 · Score: 1
  
  I can't say that I've ever heard a dedup percentage of 95% related to the NS series, which is very similar to the products in this article (NAS/SAN server that does dedupe on live data that lives on the array). Maybe you're confusing it with products like Data Domain or Avamar or something?
5. Re:Not enough products by ischorr · 2010-09-15 12:26 · Score: 1
  
  The NS20 goes head-to-head with that NetApp box, so I'm not sure if that's true in this case (need to be fortune 10 to afford it). And from what I read a couple of days ago, it's the most commonly sold NAS product in this class...which is why I thought it was weird not to include it in the review. I'm curious what they would have said about it.
6. Re:Not enough products by dchaffey · 2010-09-16 01:04 · Score: 1
  
  I think you are confusing the prevalence of deduplication in backup and retention with in-line deduplication of live data as it is accessed.
  
  The algorithms to deduplicate data are one thing when you are looking at a backup window or replication cycle, but implementing them with a negligible enough impact on storage IO latency is another story entirely.
Foredown your data by HTH+NE1 · 2010-09-15 11:31 · Score: 1

I can't wait until the Dilbert strip hits where the PHB does this across all their backups and deduplicates them all away, thinking he's just saved a ton of money on backup media.
Redundancy can be a very good thing!

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
Re:Um.. by HTH+NE1 · 2010-09-15 11:37 · Score: 1

Diffs are fine until you lose the root file upon which they are based. Then you lose everything you've never changed. You need to do periodic full backups.

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
Which filesystem should be doing this??? by DanDD · 2010-09-15 11:40 · Score: 2, Insightful

Filesystems should be doing this.
The one on your desktop machine, or the primary NAS storage that you access shared data from, or the backup server that ends up getting it all anyway? You see, this is a shared database problem. If your local filesystem does this, then it has to 'share' knowledge of all the unique blocklets with every other server/filesystem that wishes to share in this compressed file space. De-duplication is a means of compression that works across many filesystems - or at least it can be, if it is properly implemented.

--
"Every time I see an adult on a bicycle, I no longer despair for the future of the human race." - H. G. Wells
1. Re:Which filesystem should be doing this??? by DeadDecoy · 2010-09-15 12:02 · Score: 1
  
  Plus if you want speed and safety, having redundant, mirrored copies of data can be useful.
2. Re:Which filesystem should be doing this??? by icebike · 2010-09-15 12:05 · Score: 1
  
  Well in the end, does not the filesystem running on the device end up controlling the actual reads and writes regardless of whether the file is shared across the network or across the world?
  My take is that there is not much to justify the claim that this should be in the filesystem vs the hardware. If you don't want to de-duplicate some data (for what ever reason) then you don't put it on that type of storage.
  But it seems to me that a hardware approach is a perfectly reasonable layer to do this. It eliminates several potential points of failure, (FS version changes, FS bugs, memory failures, bus failures, end user fiddling, etc).
  It's OS insensitive, and when you replace the server OS or hardware the search for drivers is eliminated. Obsolescence is defined by when the NAS fails to meet your needs, not by when the developer moves on to something new, or the company declines to release new drivers for the next version of your OS.
  As long as I can get out exactly the same data I put in, why would I want to do this at the FS layer? Why would I care, as long as it was reliable?
  I'm aware there are traps, such as having to make minor unique changes in thousands of files, forcing the system to un-de-duplicate many megabytes of data, potentially over-flowing the available storage. But that's equally possible in an FS based solution as a hardware based one.
  
  --
  Sig Battery depleted. Reverting to safe mode.
3. Re:Which filesystem should be doing this??? by Vancorps · 2010-09-15 12:12 · Score: 1
  
  Rarely is it useful on the same local storage. Keeping live copies offsite or in separate hardware is a good strategy but on the same hardware is just wasteful.
4. Re:Which filesystem should be doing this??? by Eristone · 2010-09-15 14:57 · Score: 1
  
  Ah, so you want to go to other hardware to restore a file that you have a snapshot of on your local hardware? And that fileset happens to be oh say a few hundred gigabytes. Out of curiosity, do you manage production fileservers with end users that are able to do stupid things?
5. Re:Which filesystem should be doing this??? by hawguy · 2010-09-15 16:55 · Score: 1
  
  You and Vancorps are talking about two different things. Deduplication (whether done by the filesystem or the storage system) doesn't preclude having snapshots.
  Vancorps was talking about the futility of keeping multiple copies of files on the same storage device as an aid to recovering corrupt data. He was not arguing that regular snapshots should not be made, just that redundant data could be deduped away without sacrificing any real measure of file integrity.
De-Dupe on Linux? by MarcQuadra · 2010-09-15 11:40 · Score: 1

Are there any open-source filesystems that offer deduplication?
It seems that the FS du-jour changes faster than any of the promised 'optional' features ever materialize.
Instead of working full-bore on The Next Great FS, it would be really nice to have compression, encryption, deduplication, shadow copies, and idle optimization running in EXT4.
Maybe I'm just jaded, but I've been a Linux user for 12 years now. Sometimes it feels like the names of the technologies are changing, but nothing ever gets 'finished'. Maybe the NTFS/BSD model (good core design, long intervals with only minor changes) would be wise in Linux filesystem development.

--
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
1. Re:De-Dupe on Linux? by Microlith · 2010-09-15 12:14 · Score: 1
  
  Maybe the NTFS/BSD model (good core design, long intervals with only minor changes) would be wise in Linux filesystem development.
  You mean like the extremely long lived EXT* series of filesystems?
  Eventually the things you want to add lead you to rethink the core design instead of hacking on things on the outside more and more. But that process takes a long time and requires a lot of work to accomplish. Which is why BTRFS is a break from EXT* and IIRC it supports most (if not all) of the features you mentioned.
2. Re:De-Dupe on Linux? by cetialphav · 2010-09-15 12:20 · Score: 1
  
  Instead of working full-bore on The Next Great FS, it would be really nice to have compression, encryption, deduplication, shadow copies, and idle optimization running in EXT4.
  To do all these things, you have to change how data is stored on the disk and what information is present. When you do this, you necessarily create a new file system. These aren't simple features that you can just tack onto an existing file system.
  I suspect that one of these days we will be running the ext10 file system that has most of these features and evolved from ext3 in a methodical way, but it will in no way actually resemble ext3. There will always be other systems being developed to try out new ideas but getting things both reliable and fast is hard enough that the new systems never cross over the experimental hurdle at which point their innovations will migrate into ext\d.
3. Re:De-Dupe on Linux? by suutar · 2010-09-15 13:28 · Score: 2, Informative
  
  There's a few. I've read there's a patchset for ZFS on FUSE that can do deduplication; there's also opendedup and lessfs. The problem is that none of these has been around long enough to be considered bulletproof yet, and for a filesystem whose job is to play fast and loose with file contents in the name of space savings, that's kinda worrisome.
4. Re:De-Dupe on Linux? by Slashcrap · 2010-09-15 21:17 · Score: 1
  
  Instead of working full-bore on The Next Great FS, it would be really nice to have compression, encryption, deduplication, shadow copies, and idle optimization running in EXT4.
  Maybe I'm just jaded, but I've been a Linux user for 12 years now. Sometimes it feels like the names of the technologies are changing, but nothing ever gets 'finished'. Maybe the NTFS/BSD model (good core design, long intervals with only minor changes) would be wise in Linux filesystem development.
  So you're saying you'd like to see it evolve slowly like NTFS, while adding all these whiz-bang new features stat?
5. Re:De-Dupe on Linux? by Christian+Smith · 2010-09-15 22:21 · Score: 1
  
  Are there any open-source filesystems that offer deduplication?
  I've been meaning to look at http://www.lessfs.com/wordpress/, a Linux FUSE based dedup system.
  But ZFS is open source, you know. If you don't like the way OpenSolaris has gone, try it with FreeBSD.
Re:Um.. by Znork · 2010-09-15 11:44 · Score: 1

No need to give it a fancy name.
It's much easier for sales if you give it a fancy name, and preferably one that doesn't trigger comparisons with other solutions.
Of course, as deduplication is mainly a solution for enterprises that have been tricked into buying obscenely expensive storage, and who lack any coherent data storage policy and tiering strategy, the fancy name might be superfluous; they're spread wide and lubed up already.
If you get it just for dedupe maybe by Sycraft-fu · 2010-09-15 11:45 · Score: 1

However they have a ton of features including extremely high performance and reliability. For example they monitor your unit and if a drive fails, they'll send you one next day air. Sometimes the first you know of the failure is a disk shows up at your office.
Don't get me wrong, they aren't the only way to go, we have a much cheaper storage solution for less critical data, but the people who think dropping a bunch of disks in a Linux server gives you the same thing as a NetApp for less cost are fooling themselves.
It is exceedingly high end stuff, which is why it costs so much.
1. Re:If you get it just for dedupe maybe by h4rr4r · 2010-09-15 12:58 · Score: 1
  
  You can just have nagios monitoring for errors and even order a drive off amazon if you really wanted. NetApps have a lot of neat features, mailing you drives is not really one of them.
2. Re:If you get it just for dedupe maybe by drsmithy · 2010-09-15 15:57 · Score: 1
  
  You can just have nagios monitoring for errors and even order a drive off amazon if you really wanted.
  Not even touching on all the things that could go wrong with this (and there are many), the best response time you're going to be looking at for this is ~12 hours, and that's only in ideal circumstances.
  NetApp will have a replacement drive on your doorstep in 4-8 hours, often less.
3. Re:If you get it just for dedupe maybe by jabuzz · 2010-09-15 20:34 · Score: 1
  
  If you are relying on getting a fast replacement disk from *ANY* vendor to assure that you don't suffer data loss from disk failures then you are doomed from the get go. Your array needs to have a sufficient number of global hot spares to automatically replace any failed disk. At which point the four hour response time is not actually worth it (do you really want to get up at 03:00 on Christmas Day to replace a disk?) and next day is just fine.
  The reality is that all the cool features of a NetApp are not really worth it, because they can be replaced at a fraction of the cost with cheaper conventional disk. Now I need to get back to setting up the extra 180TB of conventional disk we have just purchased.
4. Re:If you get it just for dedupe maybe by Degrees · 2010-09-16 02:03 · Score: 1
  
  We just converted from Xiotech to NetApp, and the NetApp is crap. "High end" isn't how we would describe NetApp. And their sales people lied to us (er, said things that may technically be true but are about as honest as 'pigs CAN fly with sufficient initial velocity'). They also claimed that de-dupe would save us 50% storage space. Lies.
  It was a huge mistake. If it weren't for the political loss of face of having spent so much money, we would scrap it all and start over with any vendor other than NetApp.
  
  --
  "The most sensible request of government we make is not, "Do something!" But "Quit it!"
5. Re:If you get it just for dedupe maybe by drsmithy · 2010-09-16 02:39 · Score: 1
  
  If you are relying on getting a fast replacement disk from *ANY* vendor to assure that you don't suffer data loss from disk failures then you are doomed from the get go.
  I'm not. It's called risk minimisation.
  Your array needs to have a sufficient number of global hot spares to automatically replace any failed disk. At which point the four hour response time is not actually worth it (do you really want to get up a 03:00 on Christmas Day to replace a disk?) and next day is just fine.
  Whether I want to or not is not particularly relevant. I'm getting *paid* to do it because it's my job.
  If you came to me and said a disk failed but we don't need to bother replacing it for a few days because there is (or was) a hot spare, you'd be lucky to walk away still employed.
6. Re:If you get it just for dedupe maybe by drsmithy · 2010-09-16 02:56 · Score: 1
  
  We just converted from Xiotech to NetApp, and the NetApp is crap. "High end" isn't how we would describe NetApp. And their sales people lied to us (er, said things that may technically be true but are about as honest as 'pigs CAN fly with sufficient initial velocity').
  For example ?
  They also claimed that de-dupe would save us 50% storage space. Lies.
  On what sort of data ?
Use ZFS. It offers dedupe, compression, etc. by jgreco · 2010-09-15 11:48 · Score: 3, Informative

ZFS offers dedupe, and is even available in prepackaged NAS distributions such as Nexenta and OpenNAS. You too can have these great features, for much less than NetApp and friends.
This is new? by Angst+Badger · 2010-09-15 11:48 · Score: 2, Interesting

Didn't Plan 9's filesystem combine journaling and block-level de-duplication years ago?

--
Proud member of the Weirdo-American community.
1. Re:This is new? by BitZtream · 2010-09-15 14:39 · Score: 1
  
  Plan 9 could have the cure for cancer too but still no one gives a shit about it.
  Dedup is a good 30 years old at least, if you want to point out that it isn't new.
  Only slashdotters and Linux children get excited at silly things like this.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
2. Re:This is new? by Trepidity · 2010-09-15 16:00 · Score: 1
  
  Besides, what Plan 9 user needs journaling anyway?
  Ah yes, the wonders of logic: making vacuously true statements about the empty set. ;-)
  
  --
  10 PRINT CHR$(205.5+RND(1)); : GOTO 10
3. Re:This is new? by clueelf · 2010-09-16 07:04 · Score: 1
  
  Plan9 has what's called Venti,
  "...Venti, intended for archival data. In this system, a unique hash of a block's contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage and simplifying the implementation of clients. Venti is a building block for constructing a variety of storage applications such as logical backup, physical backup, and snapshot file systems."
  http://plan9.bell-labs.com/sys/doc/venti/venti.html
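The idea in that quote, a block addressed by the hash of its own contents, can be sketched in a few lines. This is a toy in-memory illustration, not Venti's actual code or protocol; the class and method names are made up for the example, and `hashlib.sha1` is used only as a convenient hash.

```python
import hashlib


class ContentAddressedStore:
    """Toy write-once block store keyed by the hash of each block's contents."""

    def __init__(self):
        self._blocks = {}  # hex digest -> block contents

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Writing identical contents again is a no-op: duplicates coalesce,
        # and an existing block is never overwritten with different data.
        self._blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        data = self._blocks[score]
        # The address doubles as an integrity check on what was read back.
        if hashlib.sha1(data).hexdigest() != score:
            raise IOError("block contents do not match their address")
        return data

    def block_count(self) -> int:
        return len(self._blocks)


store = ContentAddressedStore()
a = store.write(b"hello venti")
b = store.write(b"hello venti")    # same contents, same address
c = store.write(b"another block")
print(a == b, store.block_count())
```

Because the identifier is derived from the contents, there is no way to "update" a block in place, which is where the write-once behaviour comes from.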
Nor do they give proper mention to Quantum DXi by DanDD · 2010-09-15 11:52 · Score: 1

Quantum was one of the first to bring variable-block data deduplication products to market, so in a sense their omission is rather odd.
However, the article seems centered on primary storage, and not the marriage of backup/replication/physical tape, which is Quantum's focus.
Personally, I'd be _terrified_ of using dedup for primary storage. What this does is exactly the opposite of RAID - it squeezes every last bit of redundancy out of your data, and makes everything dependent upon the integrity of your blockpool database. Lose a single blocklet and you stand to lose _all_ of your data.
Compressing common data across many filesystems for things like backups makes a lot more sense, and seems more cost effective.

--
"Every time I see an adult on a bicycle, I no longer despair for the future of the human race." - H. G. Wells
1. Re:Nor do they give proper mention to Quantum DXi by ischorr · 2010-09-15 12:40 · Score: 1
  
  "Personally, I'd be _terrified_ of using dedup for primary storage. What this does is exactly the opposite of RAID - it squeezes every last bit of redundancy out of your data, and makes everything dependent upon the integrity of your blockpool database. Lose a single blocklet and you stand to lose _all_ of your data."
  Dedupe reduces multiple copies of the same data *on the same storage*
  I think you're implying that having - probably purely at random - multiple copies of some files on the same FS is somehow a proper backup/redundancy strategy. It sounds like you're saying that WITHOUT dedupe, if a file got corrupted you'd at least be able to go restore it from some other copy of the file. I can't imagine how this is true - you can't rely on chance copies of multiple files to be able to recover from a file corruption. That's crazy. With or without dedupe you better have BACKUPS of the data on some other storage.
  Maybe you mean that if something gets corrupted in some of the deduped data that you'll lose ALL dedupe data (so maybe half of your filesystem or something). Most dedupe technologies don't work that way - if corruption occurs it will impact the actual data or possibly file that was affected (and obviously each copy of that data throughout the FS). But not more than that.
2. Re:Nor do they give proper mention to Quantum DXi by immortalpob · 2010-09-15 13:49 · Score: 2, Interesting
  
  You are missing his point. On a non-deduplicated system if one block goes bad you lose one file, on a deduplicated system you can lose any number of files due to one bad block. It gets worse when you consider the panacea of non-backup deduplication, consider all of your servers are VMs and reside on the same deduplicated storage, one bad block can take them ALL DOWN. Now admittedly any dedupe solution will sit on some type of raid, however there is still the possibility of something terrible, and this is made worse by the likelihood of a URE during a raid-5 rebuild.
3. Re:Nor do they give proper mention to Quantum DXi by jgreco · 2010-09-15 14:30 · Score: 1
  
  That's why you have the system store more than one copy, and you have it validate their integrity when reading them. Think of it as sensible RAID. I suggest a quick Google for "zfs data integrity", etc.
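A minimal sketch of "store more than one copy and validate on read", under the assumption of a single checksum kept alongside two copies of a block. It is not how ZFS is implemented; `MirroredBlock` and its fields are hypothetical names for illustration only.

```python
import hashlib


class MirroredBlock:
    """One logical block kept as several copies plus a checksum of the data."""

    def __init__(self, data: bytes, copies: int = 2):
        self.checksum = hashlib.sha256(data).digest()
        self.copies = [bytearray(data) for _ in range(copies)]

    def read(self) -> bytes:
        good = None
        for copy in self.copies:
            # Accept the first copy whose contents still match the checksum.
            if hashlib.sha256(copy).digest() == self.checksum:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("every copy failed its checksum")
        # Rewrite all copies from the verified one, repairing any bad copy.
        self.copies = [bytearray(good) for _ in self.copies]
        return good


original = b"block shared by many deduplicated VM images"
block = MirroredBlock(original)
block.copies[0][0] ^= 0xFF         # simulate silent corruption of one copy
recovered = block.read()
print(recovered == original)
```

The point for a deduplicated pool is that a block referenced by many files is exactly the block worth keeping extra verified copies of.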
4. Re:Nor do they give proper mention to Quantum DXi by ischorr · 2010-09-15 14:49 · Score: 1
  
  "You are missing his point. On a non-deduplicated system if one block goes bad you lose one file, on a deduplicated system you can lose any number of files due to one bad block."
  This is true, but he was saying "This is the opposite of RAID...it squeezes every bit of redundancy out of your data". Like having random duplicate copies of files scattered around a filesystem was a redundancy mechanism that is somehow on-par with RAID, and so enabling dedupe means that you have eliminated a serious data redundancy mechanism. It's true that it might be a higher-impact loss when you lose a single file (and require more restores or mean more users will be impacted), but it's not a situation where you've suddenly killed your data backup plan and lost all your data. You're just not going to be using random duplicate copies of files on your FS in this way.
5. Re:Nor do they give proper mention to Quantum DXi by drsmithy · 2010-09-15 16:08 · Score: 1
  
  On a non-deduplicated system if one block goes bad you lose one file, on a deduplicated system you can lose any number of files due to one bad block.
  That's why you have RAID and block-level checksumming.
  What scenario are you envisaging where this can happen ?
6. Re:Nor do they give proper mention to Quantum DXi by drsmithy · 2010-09-15 16:10 · Score: 1
  
  Personally, I'd be _terrified_ of using dedup for primary storage. What this does is exactly the opposite of RAID - it squeezes every last bit of redundancy out of your data, and makes everything dependent upon the integrity of your blockpool database. Lose a single blocklet and you stand to lose _all_ of your data.
  If you're striving for availability by keeping multiple copies of the same data on the same physical device(s), You're Doing It (Very) Wrong.
Re:Um.. by georgewilliamherbert · 2010-09-15 11:54 · Score: 1

No, it's not.
Differential backups take a single filesystem and see what changed, either at the file level (whole changed/updated/new files) or at the block level (changed blocks within files).
Block level deduplication is noticing that the storage appliance on which you back up 100 desktops and 10 servers has 50 copies of the same version of each data block in each Microsoft OS file from XP, 25 from Win 7, and 35 from Fedora, and only storing 1 copy of each of those blocks rather than 100 separate ones. It's returning those blocks to the usable storage pool and remapping without having to "compress" anything, not having to rewrite the backup data images, etc. It's just saying "This is block 3 of the binary for Internet Explorer 8, and I already have a copy of that", for each and every common block out there.
You still have to upload the blocks, and the system still needs to scan them to notice the duplication, but it's a lot more than "oh, compression".
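To make the backup example concrete, here is a rough sketch of block-level dedupe across many machine images. The 4 KB block size, the fake "OS" data and the function names are all assumptions made for the illustration; a real appliance differs in almost every detail.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


def split_blocks(image: bytes, size: int = BLOCK_SIZE):
    return [image[i:i + size] for i in range(0, len(image), size)]


def ingest(pool: dict, image: bytes) -> list:
    """Store an image in the shared pool and return its list of block digests."""
    recipe = []
    for block in split_blocks(image):
        digest = hashlib.sha256(block).digest()
        pool.setdefault(digest, block)   # keep only the first copy of each block
        recipe.append(digest)
    return recipe


def restore(pool: dict, recipe: list) -> bytes:
    return b"".join(pool[d] for d in recipe)


# Hypothetical data: 50 distinct "OS" blocks shared by every machine...
os_files = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(50))


# ...plus one block that is unique to each machine.
def user_block(n: int) -> bytes:
    return f"machine-{n}".encode().ljust(32, b".") * 128


pool = {}
recipes = [ingest(pool, os_files + user_block(n)) for n in range(110)]

logical_blocks = sum(len(r) for r in recipes)
print(logical_blocks, len(pool))
```

Every image was still uploaded and hashed in full, as the comment says; the saving is only in what ends up stored.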
Re:De-Dupe on Linux? - yes by Anonymous Coward · 2010-09-15 11:57 · Score: 1, Interesting

http://www.opendedup.org/
Re:Um.. by cetialphav · 2010-09-15 11:58 · Score: 2, Informative

AFAIK this is pretty much how every compression algorithm works. No need to give it a fancy name.
The reason it has a different name is to distinguish this from a compressed file system. The blocks of data are not compressed in these systems. Imagine that you have a file system that stores lots of vmware images. In this system, there are lots of files that store the same information because the underlying data is OS system files and applications. Even if you compress each image, you will still have lots of blocks that have duplicate values.
Deduplication says that the file system recognizes and eliminates duplicate blocks across the entire file system. If a given block has redundant data within it, that redundancy is not removed because the blocks themselves are not actually compressed. This is the difference between a compressed file system and a deduplicated file system. In fact, there is no reason that you could not combine both of these methods into a single system.
Re:Um.. by icebike · 2010-09-15 12:18 · Score: 1

Although there is nothing to say compression of data might not also happen. I don't believe compression and de-duplication are mutually exclusive.
This is actually a good argument for de-duplication to run on the device. It can surf thru files more or less at leisure looking for duplicate blocks all over the file system, without tying up the server's bus/controller.
That could be done independent of File System compression, which generally, as you pointed out, works best on large blocks of repetitive bytes within a single file.

--
Sig Battery depleted. Reverting to safe mode.
Re:Use ZFS. It offers dedupe, compression, etc. by lisany · 2010-09-15 12:21 · Score: 2, Informative

Except NexentaStor (3.0.3) has an OpenSolaris upstream (which has gone away, by the way) kernel bug that hanged our Nexenta test box. Not a real good first impression.
Re:Data Deduplication . . by initdeep · 2010-09-15 12:40 · Score: 1

they don't have to do this at the file level.
they do it at the block level.
so in your example, since the only change would be the signature on the bottom of each email, the email blocks themselves would be deduped, and the signatures would be retained.
think of backing up a whole bunch of similar desktops in an enterprise situation where the majority of the OS files are going to be the same or have only slight variations.
even if the files have slight variations, only the actual bits that are different would be stored and the rest would be deduped and only one copy kept.
personally i know a fairly large company using avamar for this, however they do it on their backups only. And iirc, they still keep different sets of backups, just dedupe within the backup itself which saves them quite a bit of space per backup.
Re:Data Deduplication . . by hawguy · 2010-09-15 12:50 · Score: 1

I think what you're talking about is single instance storage in your mail server. But as you mentioned, it only works well on identical emails and attachments.
No dedupe system that I'm aware of does what you'd need to do to dedupe forwarded emails. It's technically possible by recognizing similar messages and doing diffs on them to find identical sections. But, it's computationally difficult and there's not much payback -- better to go after the low-hanging fruit and dedupe all of the identical GIFs and MP3s that people have downloaded off the internet.
When we deduped our corporate fileserver, we got around 40% of our space back.
Ya it is by Sycraft-fu · 2010-09-15 13:20 · Score: 3, Insightful

Something you start to appreciate when you are called on to do a really high availability, high reliability system is to have features like this. For one thing it reduces the time it takes to get a replacement. Unless a drive fails late at night, you get one the next day. You don't have to rely on someone to notice the alert, place the order, etc. It just happens. Also, like most high end support companies, their shipping time is fairly late so even late in the day it is next day service. What arrives is the drive you need, in its caddy, ready to go.
Then there's just the fact of having someone else help monitor things. It's easy to say "Oh ya I'll watch everything important and deal with it right away," but harder to do it. I've known more than a few people who are not nearly as good at monitoring their critical system as they ought to be. A backup is not a bad thing.
You have to remember that the kind of stuff you are talking about for things like NetApps is when no downtime is ok, when no data loss is ok. You can't say "Ya a disk died and before we got a new one in another died, so sorry, stuff is gone."
Not saying that your situation needs it, but there are those that do. They offer other features along those lines like redundant units, so if one fails the other continues no problem.
Basically they are for when data (and performance) is very important and you are willing to spend money for that. You put aside the tech-tough guy attitude of "I can manage it all myself," and accept that the data is that important.
1. Re:Ya it is by h4rr4r · 2010-09-15 13:47 · Score: 2, Insightful
  
  I mean have the nagios server order the drive without any human intervention.
  Also if it was really critical you would keep several disks ready to go on site. You know for when you can't wait for next day. Also like netapp you too can have many hot spares in the volume.
  If you have problems with people not noticing or reacting to alerts you need to fire them.
2. Re:Ya it is by h4rr4r · 2010-09-15 13:50 · Score: 1
  
  You have to remember that the kind of stuff you are talking about for things like NetApps is when no downtime is ok, when no data loss is ok.
  Then what you want is redundancy, because downtime and loss of data are guarantees in life. The real service NetApp provides is letting companies hire MCSEs and be ok with the job they do. They spend money to outsource this part of their IT, which is fine. Just do not pretend that they are doing anything else.
3. Re:Ya it is by Trepidity · 2010-09-15 15:58 · Score: 1
  
  It's not just small to medium sized businesses; most tech companies, even really huge ones, don't buy this kind of enterprise equipment. You won't find any of it at Google or Amazon, for example, even though they are quite large.
  
  --
  10 PRINT CHR$(205.5+RND(1)); : GOTO 10
4. Re:Ya it is by Anonymous Coward · 2010-09-15 16:05 · Score: 2, Insightful
  
  I'll butt in and say that firing people is a piss poor way to fix problems unless you've made very sure that the person in question needs to go. What you do is find out what happened if an alert goes unnoticed and make a change that removes the root cause of that failure. That may be that you have to let go of the guy doing drugs in the corner, but it may also be that your hardware issues alerts in a way that it is easy to miss. You may also realize that perhaps an alert happens only once a year, and in that case you may need to issue spurious alerts to make sure that people know what to do and remain vigilant. The root cause may even be that your staff is completely overworked, and just think where firing someone is going to put you then. Or maybe what you need is to put a siren on the damn thing that will make it impossible to miss even at 3 in the night when the guy at watch falls asleep because he's been pulling all-nighters to keep your company in business. Firing someone just because a fuck-up happened is sometimes a very bad response.
5. Re:Ya it is by totally+bogus+dude · 2010-09-15 16:16 · Score: 2, Insightful
  
  Developing a monitoring system for a complicated piece of storage that reacts properly to every possible failure mode is a massive undertaking. It will take a lot of time just to figure out everything that you need to monitor, and the possible values for them during normal operation; let alone actually test that your system correctly detects and responds to every possibility.
  If your business is providing SAN management/support services, then I can see this as being worthwhile. It's a massive investment in technology and skills amongst your staff, but if that's what you make your money doing, it may well give you a competitive edge.
  But if your business is anything else, why are you going to invest so much into something that's really just a background piece of infrastructure? What's your plan for retaining the staff that know how the monitoring system works, and know your storage system in sufficient detail to be able to understand all the things it's checking, etc?
  If you really have the expertise on-hand to implement such a thing in a way that you're comfortable relying on, why on earth wouldn't you use them for something more productive that will actually make your business money? Again, if your business is monitoring storage infrastructure, it makes sense. If your business is anything else, why are you spending the time of highly skilled people to implement something you can easily buy off-the-shelf (i.e. a standard support contract)?
6. Re:Ya it is by cetialphav · 2010-09-15 16:56 · Score: 1
  
  You won't find any of it at Google or Amazon, for example, even though they are quite large.
  Have no illusions, though. The Google and Amazon solutions are neither cheap nor easy to implement. They rely on top-notch engineers being able to build an intelligent storage layer on top of a bunch of dumb commodity disks. Their needs are specialized enough (and they have enough cash) that this makes some sense. But most businesses do not have that kind of talent or that kind of cash.
  Like everything in engineering, this comes down to looking at your business requirements and finding a solution that meets them for the best price. When you need what NetApp and competitors do, there is no cheaper alternative than buying their product. If you think you can cheaply build something with equivalent functionality, then you do not understand what these products are really doing.
7. Re:Ya it is by dbIII · 2010-09-15 17:05 · Score: 1
  
  You put aside the tech-tough guy attitude of "I can manage it all myself,"
  And then you wonder if the guys you outsourced it to care enough to do things as advertised.
  It's the attitude of them losing a small contract versus you losing your job. Unless you are a HUGE customer you have to assume their care factor is zero and have at least something to fall back on if they take too long or don't come through at all. Last week's backup will be missing things but if it's on site and you can get stuff from it NOW that can save the pain of waiting a few days for others to get their act together (e.g. for me a four day wait on an expensive 12 hour replacement deal because the vendor "restructured").
8. Re:Ya it is by drsmithy · 2010-09-15 17:59 · Score: 1
  
  It's not just small to medium sized businesses; most tech companies, even really huge ones, don't buy this kind of enterprise equipment.
  I think you'll find they do. Small businesses less so, but certainly medium and up will almost certainly have something from the usual suspects of NetApp, EMC, IBM, Equallogic, etc.
  You won't find any of it at Google or Amazon, for example, even though they are quite large.
  Not really a good example. Both companies are huge and therefore have (more than) sufficient staff to both engineer and support their own internally developed solutions. Further, they can probably derive a business benefit from those developments, something rarely true for the average company, especially when talking about generic, commodity functionality like storage.
  For businesses in the meat of the bell curve, most technology is a means to an end, not an end in itself. This is true even for "technology-driven" businesses, since the bulk of their IT functions are neither unique nor interesting. A couple of hundred grand of CapEx for a mid-range NetApp filer is not a lot compared to the couple of hundred grand *per annum* OpEx you'll need to spend on dedicated staff to develop, support and maintain something equivalent in-house, especially when taking into consideration how vulnerable you become when they decide to jump ship to another employer, and the fact that investment is almost certainly not delivering any direct benefits to the business.
Re:Um.. by igny · 2010-09-15 13:33 · Score: 2, Funny

Yeah! To fight dupes I compute CRC checksum for each file and store it (and only it) on my back up drive. That method removes dupes almost automatically and there is a side effect of a huge compression ratio too. I have been downloading the high def videos from Internet for quite a while now and with my compression method I have used less than 10 percent of 1GB flash drive! I strongly recommend this method to everyone!

--
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
Re:Data Deduplication . . by h4rr4r · 2010-09-15 13:51 · Score: 1

Did it cost less than buying 40% more disks? Heck, did it cost less than building another fileserver with 100% more disk and then syncing between them?
I already do this by MyLongNickName · 2010-09-15 13:57 · Score: 3, Funny

After an analysis of a 1TB drive, I noticed that roughly 95% were 0's with only 5% being 1's.
I was then able to compress this dramatically. I just record that there are 950M 0's and 50M 1's. The space taken up drops to around 37 bits. Throw in a few checksum bits, and I am still under eight bytes.
I am not sure what is so hard about this disaster recovery planning. Heck, I figure I am up for a promotion after I implement this.

--
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
1. Re:I already do this by smallfries · 2010-09-15 22:07 · Score: 1
  
  Why settle for promotion?
  I reckon that the Ministry for the Economy would be desperate to hire you with those mad skills.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Use ZFS. It offers dedupe, compression, etc. by jgreco · 2010-09-15 14:24 · Score: 1

I found a ton of stuff I didn't really care for with Nexenta. They've put some good effort into it, and it'd be a fine way to go if you wanted commercial support, but overall it doesn't really seem to fit our needs here. ZFS itself is a resource pig, but on the other hand, resources have become relatively cheap. It's not unthinkable to jam gigs of RAM in a storage server ... today. Five years ago, though, that would have been much more likely to be a deal-breaker.
Re:Um.. by Trepidity · 2010-09-15 15:50 · Score: 1

That's still a particular type of compression, isn't it? I mean, I can buy giving it a new name, since it has a bunch of infrastructure around it, but it's a perfectly general kind of data-compression algorithm as well, even if not the world's most efficient: break the data into fixed-size blocks, then, for any blocks that appear more than once, replace all occurrences after the first with a pointer to the first. Block-based RLE compression is basically a simpler version of that (where you only deduplicate the blocks when they appear in sequence).
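Taken literally, the scheme described above fits in a few lines, and the comparison with block-based RLE falls out of it. This is only a sketch of the idea as worded in the comment, with made-up function names; it is not any product's on-disk format.

```python
def dedupe_encode(data: bytes, block_size: int):
    """Replace every repeat of a block with a reference to its first occurrence."""
    first_seen = {}   # block contents -> index of its first occurrence in `out`
    out = []          # entries are ("lit", block) or ("ref", index)
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        if block in first_seen:
            out.append(("ref", first_seen[block]))
        else:
            first_seen[block] = len(out)
            out.append(("lit", block))
    return out


def dedupe_decode(encoded) -> bytes:
    blocks = []
    for kind, value in encoded:
        blocks.append(value if kind == "lit" else blocks[value])
    return b"".join(blocks)


def rle_encode(data: bytes, block_size: int):
    """Block-based RLE: only merges a block with an identical *adjacent* block."""
    runs = []         # entries are [block, repeat count]
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        if runs and runs[-1][0] == block:
            runs[-1][1] += 1
        else:
            runs.append([block, 1])
    return runs


data = b"AAAABBBBAAAAAAAA"            # blocks: A, B, A, A
encoded = dedupe_encode(data, 4)
literals = sum(1 for kind, _ in encoded if kind == "lit")
runs = rle_encode(data, 4)
print(literals, len(runs))
```

Dedupe stores block A once no matter where it recurs, while RLE has to store it again after the intervening B.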

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Re:Data Deduplication . . by drsmithy · 2010-09-15 16:15 · Score: 1

aside from the mentioned 'to reduce duplicate data to increase available storage space' are there any other benefits to de-duplicating your storage?
An intelligent caching layer will only store the deduped data once, allowing it to cache more data, get more cache hits, reduce physical disk I/O and improve response times.
Re:Um.. by flyingfsck · 2010-09-15 16:24 · Score: 1

I just bookmark the movies on movie2k and huku and play them when I want to watch them. That saves even more space.

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!
No DataDomain/EMC? by microbee · 2010-09-15 16:27 · Score: 1

If the market leader isn't included in the review, I am wondering how worthy this report is.
1. Re:No DataDomain/EMC? by TheRaven64 · 2010-09-15 23:16 · Score: 1
  
  You've never read an InfoWorld article before, have you?
  
  --
  I am TheRaven on Soylent News
Re:Um.. by drsmithy · 2010-09-15 16:51 · Score: 1

Of course, as deduplication is mainly a solution for enterprises that have been tricked into buying obscenely expensive storage, [...]
If we were "tricked", what is the cheaper, but equally capable alternative ?
Re:Um.. by drsmithy · 2010-09-15 17:03 · Score: 1

That's still a particular type of compression, isn't it?
Not really. Compression is taking a chunk of data and replacing it with a different, smaller chunk of data plus instructions (albeit in abbreviated form) about how to turn it back into the original chunk of data. Dedupe is taking a chunk of data and replacing it with a pointer to a "remote", identical chunk of data.
Compression is nearly always applied on a per-file basis, whereas dedupe is applied on a per-volume basis. Conceptually similar to the difference between compressing every individual file in a volume vs tar.gz-ing the whole volume.
Compression is generally much more limited over the "window" of data it can work with at any time (eg: if you have two similar bits of data at the start and finish of a file, but they're separated from each other by 10 gigabytes of other data, then they'll probably be compressed independently).
Consider also the difference in processing necessary to decompress data vs dereference a pointer on disk. Some compression schemes (eg: h.264) are quite CPU intensive to decompress, while the overhead for reading deduped data is basically zero.
Re:Data Deduplication . . by hawguy · 2010-09-15 17:17 · Score: 1

Since the dedupe license came for "free" with my filer, yes, that 40% improvement cost less than buying 40% more disks.
And yes, it's much cheaper than building another fileserver with 100% more disk and syncing between them. How much do you think it costs to build a fileserver with 150TB of disk space, and how would you recommend that I sync the 75TB of data between them? I don't think this is a job for rsync.
I do actually replicate between two identical (nearly identical) arrays, but I use my array vendor's software to do this -- they can replicate only modified blocks, much faster than a tool that only has visibility at the filesystem level.
EMC is worth its weight in manure by Nicolas+MONNET · 2010-09-15 21:21 · Score: 1

EMC's software is the most buggy, unintuitive, poorly documented and abysmally supported piece of shit I've ever had the displeasure to use. It is simply revolting. Considering how much it costs, it's mind boggling.
(Well, what could I expect from an appliance that runs on Windows 95?)
EMC is mature like an Alzheimer patient by Nicolas+MONNET · 2010-09-15 21:23 · Score: 1

I have to use that shit and it's obviously designed by complete morons. Seriously, I have to work with a spanking new SAN worth hundreds of thousands and it's so full of bugs I can't imagine why people buy that crap. Oh wait I do know but I can't tell.
Re:Lots of stupidity being displayed today by smallfries · 2010-09-15 22:11 · Score: 1

Yup, I totally agree with you. There is a *lot* of stupidity on display today.
Tell you what, why don't you image a 100GB disk into a single file, then run zip over it and come back and tell us how it worked out for you when you're done...
Here is a hint: the buffer is a constant-size because it has been tuned for a particular type of file where the granularity of repetition is very fine/small. Trying to do the same on much larger files where the granularity of repetition is much coarser requires a much bigger buffer. And it does not increase linearly...

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Lots of stupidity being displayed today by Whuffo · 2010-09-15 22:57 · Score: 1

Well, if that 100GB disk is full of the usual mix of programs and data it'll deflate to about 50% of its size. Maybe even better; whole hard drives are a special case with lots of "slack" and unallocated space.
Software implementations of compression have some issues - because of the way the compressed data is stored it's not tolerant of errors and a problem in the driver that implements the compression can render the entire disk unreadable. Doing it in hardware can improve the reliability, but it's not an invention. I suppose that next we'll hear about how a patent has been issued and when that gets discussed the trolls will come out for another party.
Product announcements like this one wouldn't happen and discussions like this would be much shorter if the people involved knew what they were talking about. But I suppose that's too much to expect these days...
Re:Um.. by TheRaven64 · 2010-09-15 23:03 · Score: 1

That's still a particular type of compression, isn't it?

Yes, it's a fairly simple kind of compression too. Basically, your FS is a mapping from names to sets of numbers, then from numbers to blocks of data. With deduplication, you are just tweaking the second map so that identical blocks are only stored once. This is compression - you are storing data in less space - but it is a very specific kind of compression, which is why it gets the name.
Compression is a very generic term, but in the context of filesystems usually refers to per-file or per-block compression. With something like ZFS, you can combine the two, so individual blocks are LZJB-compressed and duplicated blocks are not stored twice.
Pretty much any compression algorithm works on a window. With something stream-based, like gzip, it builds the compression tables as it scans the file, discarding data when they get too big. With something like bzip2, it scans a fixed size (up to 900KB, as I recall), calculates redundancy within that, and then stores the result. If you had 900KB of random data, repeated over and over in a file, these algorithms would do quite badly.
The reason for deduplication is that the coarse granularity can catch compression opportunities that conventional compression algorithms miss. It would not be feasible, for example, to build a Huffman tree for the contents of a 250GB hard disk. If you did, you'd almost certainly get a lot of compression, but you'd need a huge amount of RAM, and you'd need to rebuild the tree periodically when you modified the FS. In contrast, deduplication can run quite quickly. ZFS computes checksums for every block anyway - it just needs to store a table of these hashes to find duplicates. This takes quite a bit of RAM, but not an unfeasibly huge amount, and the CPU load is relatively small (just one hash table lookup per store - the bottleneck is almost certainly the disk, rather than the CPU).
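The "one hash table lookup per store" point can be sketched as a table keyed by block checksum with a reference count per entry. This is a simplified illustration under assumed names (`DedupTable`, `write`, `free`); it does not reproduce ZFS's actual dedup table layout or its choice of checksum.

```python
import hashlib


class DedupTable:
    """Checksum -> [block, reference count]; one lookup per block written."""

    def __init__(self):
        self.entries = {}
        self.lookups = 0

    def write(self, block: bytes) -> bytes:
        checksum = hashlib.sha256(block).digest()
        self.lookups += 1
        entry = self.entries.get(checksum)
        if entry is None:
            self.entries[checksum] = [block, 1]   # first copy: actually store it
        else:
            entry[1] += 1                         # duplicate: just bump the count
        return checksum

    def free(self, checksum: bytes) -> None:
        entry = self.entries[checksum]
        entry[1] -= 1
        if entry[1] == 0:
            del self.entries[checksum]            # last reference gone: reclaim


table = DedupTable()
k1 = table.write(b"x" * 4096)
k2 = table.write(b"x" * 4096)    # identical block, not stored again
k3 = table.write(b"y" * 4096)
stored_after_writes = len(table.entries)
refs_to_x = table.entries[k1][1]
table.free(k1)
still_there = k1 in table.entries
table.free(k2)
gone = k1 not in table.entries
print(stored_after_writes, refs_to_x, still_there, gone)
```

The RAM cost mentioned above comes from keeping one such entry per unique block in the pool.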

--
I am TheRaven on Soylent News
Re:Lots of stupidity being displayed today by TheRaven64 · 2010-09-15 23:23 · Score: 1

Well done, you've missed the point. That 32KB buffer is not so useful if you have the same 32KB of data repeated - the deflate algorithm will miss it entirely because there's little redundancy within the 32KB blocks, but a lot of redundancy among them. Now what happens when you scale up the window to the 250GB of a cheap hard disk? First, your RAM usage goes to several TB. Second, your CPU usage spikes so high that it takes a few days to write a single 512-byte block. Not so great.
The point of deduplication is to find identical blocks on the disk and replace them with copy-on-write references to the same block. This is very cheap, in terms of CPU time, and relatively cheap in terms of RAM usage.
This is not a substitute for block-level compression, it's an orthogonal approach. A filesystem with dedup can also use block-level compression. ZFS, for example, uses LZJB for compressing individual blocks, then uses deduplication to avoid storing redundant blocks (it also implements copy-on-write for snapshots and clones, so if you copy a filesystem you don't actually allocate any space until you modify either version).
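A rough sketch of how the two approaches stack, in Python. This is a toy model, not ZFS's design: zlib stands in for LZJB, and the DedupStore and Volume classes, the reference counting scheme and the 4KB blocks used in the example are all assumptions for illustration.

```python
import hashlib
import zlib


class DedupStore:
    """Toy block store: compress each block, keep one copy per distinct block."""

    def __init__(self):
        self.blocks = {}    # digest -> compressed block bytes
        self.refcount = {}  # digest -> number of references to that block

    def put(self, block):
        digest = hashlib.sha256(block).digest()
        if digest not in self.blocks:
            # Block-level compression happens once, when a new block is stored.
            self.blocks[digest] = zlib.compress(block)
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def get(self, digest):
        return zlib.decompress(self.blocks[digest])

    def release(self, digest):
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.blocks[digest], self.refcount[digest]


class Volume:
    """A list of block references; clones share blocks until one is rewritten."""

    def __init__(self, store, refs=None):
        self.store = store
        self.refs = list(refs or [])

    def append(self, block):
        self.refs.append(self.store.put(block))

    def clone(self):
        # Copying a volume only bumps reference counts; no block is duplicated.
        for digest in self.refs:
            self.store.refcount[digest] += 1
        return Volume(self.store, self.refs)

    def write(self, index, block):
        # Copy-on-write: point this volume at the new block, then drop the old one.
        old = self.refs[index]
        self.refs[index] = self.store.put(block)
        self.store.release(old)

    def read(self, index):
        return self.store.get(self.refs[index])
```

Cloning a volume in this model stores nothing new, and the first write to the clone allocates exactly one block while the original keeps reading its old data.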

--
I am TheRaven on Soylent News
Re:Lots of stupidity being displayed today by smallfries · 2010-09-15 23:39 · Score: 1

Which "usual" mix of programs and data is that then?
Or are you extrapolating from your experience of zipping smaller things, which "usually" decrease in size by about 50%? Because I've already explained to you why that experience doesn't translate very well into spotting duplicate blocks at much larger sizes...
The test is not exactly hard: take a file that doesn't compress very well under zip (i.e., almost any media file) and build a fake file hierarchy out of lots of copies of it. Then make a disk image and try to zip it. What you've described will fail, for the reasons that I've described to you. But de-duplication would definitely work on that disk image.
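That test can be approximated in miniature with a few lines of Python. This is a sketch under stated assumptions: random bytes stand in for a media file, zlib (deflate, with its 32KB window) stands in for zip, and the 64KB chunk, 16 copies and the function names are made up for the example.

```python
import hashlib
import os
import zlib


def deflate_ratio(data):
    """Compressed size divided by original size, using zlib (deflate)."""
    return len(zlib.compress(data, 9)) / len(data)


def unique_block_ratio(data, block_size):
    """Fraction of fixed-size blocks that are distinct, i.e. what dedup keeps."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(unique) / len(blocks)


def make_image(chunk_size=64 * 1024, copies=16):
    """Stand-in for the disk image: one incompressible chunk, repeated."""
    chunk = os.urandom(chunk_size)  # random bytes mimic an already-compressed file
    return chunk * copies
```

Because each repeat sits 64KB away from the previous one, it is out of reach of deflate's 32KB window, so deflate_ratio(make_image()) should stay close to 1.0, while unique_block_ratio(make_image(), 4096) shows that only a small fraction of the 4KB blocks are actually distinct.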

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
where are the products by dosguru · 2010-09-16 01:10 · Score: 1

In the realm of big IT, where I have about 13PB of backup data on deduplicated disk, we didn't even look at these products. Data Domain's DD690 and 880 are excellent overall but couldn't compress Oracle data if their lives depended on it, at least not the way the DBAs like to back up Oracle. IBM's Diligent product is a fantastic piece of technology for both Open and Mainframe systems, but it is VTL-based and does not come cheap.
Optimized replication between sites is one of the best parts of dedupe, even over storage. If something can actually get 10x compression then that 1GbE link I have between locations functionally acts more like a 10GbE for no more cost. A huge boost on the WAN for DR.
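The back-of-the-envelope arithmetic behind that, as a small Python sketch. The function names, the 10TB example and the assumption that the reduction ratio applies uniformly to everything sent over the link are mine, not measurements from any product.

```python
def effective_link_gbps(link_gbps, reduction_ratio):
    """Logical data rate the link appears to carry after deduplication."""
    return link_gbps * reduction_ratio


def transfer_hours(logical_tb, link_gbps, reduction_ratio=1.0):
    """Hours needed to replicate logical_tb terabytes over the link."""
    logical_bits = logical_tb * 1e12 * 8
    bits_on_wire = logical_bits / reduction_ratio  # only unique data is sent
    return bits_on_wire / (link_gbps * 1e9) / 3600
```

Under these assumptions, transfer_hours(10, 1, 10) and transfer_hours(10, 10) come out the same: 10TB at 10x reduction over 1GbE takes as long as the raw 10TB would over 10GbE.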
What about Windows Home Server's backup? by jbarr · 2010-09-16 01:25 · Score: 1

I know this is functionally different, but I have a Windows Home Server backing up my PCs at home. WHS apparently uses a "Single Instance Store" model to store backup sets. If the same file is detected on multiple computers, it is stored only once, saving storage space. I'm backing up the C: drive of three Windows 7 Home Premium PCs to the WHS. Each PC uses between 40 and 60GB of space, yet the backup sets on the WHS total only 80GB. I'm sure that could partly be attributed to compression, but still, this seems to be pretty cool.

--
My mom always said, "Jim, you're 1 in a million." Given the current population, there are 7000 of me. God help us all!
File based disk deduplication is trivial by lopgok · 2010-09-16 02:04 · Score: 1

I wrote a simple Python program that does file-based disk de-duplication. It will work anywhere Python runs, as long as the filesystem supports hard links.

It is available under the GPL at: http://jdeifik.com/

With the hardware solutions in the article, prices start at $25k or so, and you are beholden to the hardware vendor. Doing it in software will work with any reasonable OS, any reasonable filesystem, and any hardware. Sure, block based de-duplication is more efficient, but it is filesystem specific, and right now ZFS is the only somewhat reasonable filesystem that supports it.
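
For readers who want a feel for the approach, here is a minimal sketch of file-level deduplication with hard links. It is not the program linked above; the function names, the use of SHA-256 and the ".dedup-tmp" suffix are assumptions for illustration. It also assumes every file under root lives on one filesystem, and that sharing ownership, permissions and later writes between the linked names is acceptable.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def hardlink_duplicates(root):
    """Replace duplicate regular files under root with hard links.

    Returns the number of files that were relinked.
    """
    first_seen = {}  # (size, digest) -> path of the first file with that content
    relinked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and anything that is not a regular file
            key = (os.path.getsize(path), file_digest(path))
            original = first_seen.setdefault(key, path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Link under a temporary name, then rename over the duplicate,
            # so the path never disappears if something fails midway.
            tmp = path + ".dedup-tmp"
            os.link(original, tmp)
            os.replace(tmp, path)
            relinked += 1
    return relinked
```

Running hardlink_duplicates on a directory a second time should relink nothing, since the duplicates already share an inode with the first copy.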
Re:Um.. by Nesman64 · 2010-09-17 00:48 · Score: 1

It's the decompression time that will kill you (and cause the heat death of the universe).

--
coffee | nose > keyboard
ZFS-FUSE stability is great. by jafo · 2010-09-20 04:26 · Score: 1

The ZFS-FUSE setup is fantastic. For most things you are very much limited by platter speed; I've found the performance to be quite acceptable.

As far as stability goes, the 0.6.9 release, which has been out for around 3 or 4 months, has been exceptional. I did extensive stress testing of it over the last 9 months or so, and all the issues I found were resolved (quickly) by the ZFS-FUSE folks.

I currently have a 16TB backup system running with something around 2,000 snapshots and 80% space used, and it works just great. I also have a personal storage system that I have been running ZFS-FUSE on for around 3 years now, and it also has been great. I was originally running 0.5.0 on it but upgraded to 0.6.9 after my above stress testing.

I used Nexenta in the past. It was OK, but I think there were definite ZFS issues in it at that time, maybe 3 years ago. The systems I had would reboot every 30 to 60 days. Then I upgraded to the latest Nexenta a year or 18 months later and had all sorts of data loss issues while trying to do the zfs send/recv from the old systems. An annoyance with Nexenta was the hardware support: coming from Linux, which supports just about anything, I had to dig around to find compatible storage controllers. I recently (maybe 10 months ago now, before I started really hard down the ZFS-FUSE testing path) tried OpenSolaris and ran into some weirdness where I did the install, then did an update and it spent an hour downloading updates, then bombed out. So I started it again and it spent an hour downloading updates and bombed out.

There are two reasons I'm using ZFS-FUSE so heavily: 3 years ago there was no option for encrypting my ZFS storage system in Solaris, and I am just so much more comfortable with Linux than Solaris. My home storage system stores a lot of private data that I want to have close at hand at home, but if someone steals it I don't want to worry about the scanned check images and bills we have saved there, etc. So crypto was a huge deal for me in that server.

Sean