Slashdot Mirror


The State of ZFS On Linux

An anonymous reader writes: Richard Yao, one of the most prolific contributors to the ZFSOnLinux project, has put up a post explaining why he thinks the filesystem is definitely production-ready. He says, "ZFS provides strong guarantees for the integrity of [data] from the moment that fsync() returns on a file, an operation on a synchronous file handle is returned or dirty writeback occurs (by default every 5 seconds). These guarantees are enabled by ZFS' disk format, which places all data into a Merkle tree that stores 256-bit checksums and is changed atomically via a two-stage transaction commit. ... Sharing a common code base with other Open ZFS platforms has given ZFS on Linux the opportunity to rapidly implement features available on other Open ZFS platforms. At present, Illumos is the reference platform in the Open ZFS community and despite its ZFS driver having hundreds of features, ZoL is only behind on about 18 of them."
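The Merkle-tree idea in the quote can be illustrated with a minimal Python sketch. It assumes a generic binary hash tree over fixed-size blocks; the 4096-byte block size and the function names are illustrative, and this is not ZFS's on-disk layout (where each block pointer carries the checksum of the block it points to). The point it shows is that flipping a single byte anywhere in the data changes the root checksum, so the damage cannot go unnoticed.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a ZFS constant


def sha256(data: bytes) -> bytes:
    """Return the 256-bit SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list) -> bytes:
    """Hash every data block, then hash pairs of child digests until one root remains."""
    if not blocks:
        return sha256(b"")
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last digest with itself
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    data = [bytes([i]) * BLOCK_SIZE for i in range(5)]
    root = merkle_root(data)
    damaged = list(data)
    damaged[3] = b"\xff" + damaged[3][1:]  # flip the first byte of block 3
    print(root.hex())
    print(merkle_root(damaged) == root)  # False: the damage propagates to the root
```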

42 of 370 comments

  1. Re:rsync causes lockups? by gbkersey · · Score: 2

    We use ZFS to store backups, and we back up with rsync. No problems so far.

  2. Working well for me by zeigerpuppy · · Score: 4, Informative

    I've been using ZFSonLinux for a year in production. No problems at all. It's my storage back end for Xen virtual machines. Just make sure you use ECC RAM and a decent hard disk controller. Instant snapshots and ZFS send/receive functions are awesome and have reduced my backup times by an order of magnitude. I use a Debian Wheezy/Unstable hybrid.

    1. Re: Working well for me by zeigerpuppy · · Score: 2

      The technical descriptions I've read say that you absolutely should use ECC because ZFS will eventually hit a checksum mismatch. This could result in valid data being flagged as corrupt. ECC RAM is not much more expensive these days but you do need a mobo that supports it.

    2. Re: Working well for me by zeigerpuppy · · Score: 2

      Good sane description here: http://ianhowson.com/do-you-re...

  3. Re:rsync causes lockups? by QuietLagoon · · Score: 2

    Been using rsync on ZFS for many months (FreeBSD 10.0). No issues whatsoever.

  4. Re:rsync causes lockups? by NFN_NLN · · Score: 3, Informative

    Is the target not a zfs filesystem as well? If so zfs send/recv allows for replication and handles deltas at the filesystem level. It should be more efficient.

  5. Re:rsync causes lockups? by NFN_NLN · · Score: 2
  6. Re: Unfamiliar by zeigerpuppy · · Score: 5, Informative

    ZFS is a layer below LVM. It's best to give it direct control over your drives (no hardware RAID), so that it can do data integrity checks on the actual data being written. It's similarly fast compared to hardware RAID but guarantees data integrity much more completely. I use a striped mirrored setup, which is similar to RAID 10 (over 4x 3TB drives with caches on a pair of SSDs). If you cache like this, frequent reads don't need to go to the spindles. It also has built-in compression and deduplication. The best thing IMO is instant snapshots, though; that's one feature I can't believe I lived without.

  7. Re:Unfamiliar by Anrego · · Score: 5, Interesting

    I too have kinda been watching passively with a kinda "I'll look into this once it's ready" attitude.

    The gist as far as I understand it is (again, take with a huge helping of salt (it's not that bad for your health any more!); I'm posting these partly to be told I'm wrong):

    Pros:
    - data integrity (checksums and more rigorous checks that something is actually written to the disk)

    Cons:
    - cpu and ram overhead (even by current standards, uses a tonne of resources)
    - doesn't like hardware raid (apparently a lot of the pros rely on talking to an actual disk)
    - expandability sucks (can be done, but weird rules based on pool sizes and such) compared to most raid levels where you can easily toss a new disk in there and expand.

  8. No thanks, I'll stick with ReiserFS by Anonymous Coward · · Score: 2, Funny

    It's a killer file system. Once you've used it, you won't be able to leave it.

    1. Re:No thanks, I'll stick with ReiserFS by rssrss · · Score: 2

      Groan

      --
      In the land of the blind, the one-eyed man is king.
  9. Magic by N7DR · · Score: 2

    I've been using ZFS on Linux for about a year. I can summarise my position on the experience with two words: it's magic.

    It is still tricky to run one's root system off ZFS (at least on Debian). That, I think, is for those who are brave and have the time to deal with issues that might arise following updates. But for non-root filesystems, ZFS is, as I said, magic. It's fast and reliable, caches intelligently, adapts to a large variety of mirror/striping/RAID configurations, snapshots with incredible efficiency, and simply works as advertised.

    Someone once (before the port to other OSes) said that ZFS was Solaris' "killer app". Having used it in production for a year, I can understand why they said that.

    1. Re:Magic by brambus · · Score: 2

      it updated zfs code, updated a disk format encoding but you could not revert it

      You can thank your package maintainer for this. ZFS never ever ever upgrades the on-disk format silently. You always have to do a manual "zpool upgrade" to do it. It'll tell you when a pool's format is out of date in "zpool status", but it'll never do the upgrade by itself.

      updating a disk image format and not allowing n-1 version of o/s to read it is a huge design mistake and I'm not sure I understand the reasoning behind it, but until that is changed, I won't run zfs

      Again, this is not ZFS' fault, it's your package maintainer for auto-upgrading all your imported zpools. ZFS never does this by itself.

  10. Re: Unfamiliar by zeigerpuppy · · Score: 3, Interesting

    Actually it's pretty friendly on resources but likes lots of RAM to perform well (1GB per TB of storage is a good minimum). One of my servers runs on an Atom processor (8x 3TB drives in a RAID 6 equivalent get a throughput of about 200MB/sec). Adding disks is also a strength. You can grow data sets quite easily, but naturally performance degrades until you update the whole drive set. A lot of RAID controllers can be put in HBA mode, so you may be lucky. However, Adaptec controllers go cheap second-hand ($100).

  11. Still no SELinux support by Kahenraz · · Score: 2

    How can it be production-ready if it still lacks SELinux support? The ZoL FAQ suggests either running SELinux in permissive mode or disabling it entirely.

  12. Re:rsync causes lockups? by yup2000 · · Score: 3, Insightful

    I've been using ZFS on linux for years with nightly backup jobs that rely on rsync. I've never had a problem.

  13. Re:Unfamiliar by mcrbids · · Score: 5, Insightful

    There are so many pros for ZFS that I don't even. Until you try it, you won't "get it" - it's more like trying to describe purple to a lifelong blind guy. But I'd adjust your list to at least include:

    Pros:
    - Data integrity
    - Effortless handling of failure scenarios (RAIDZ makes normal RAID look like a child's crayon drawing)
    - Snapshots.
    - Replication. Imagine being able to DD a drive partition without taking it offline, and with perfect data integrity.
    - Clones. Imagine being able to remount an rsync backup from last Tuesday, and make changes to it, in seconds, without affecting your backup?
    - Scrub. Do an fsck mid-day without affecting any end users. Not only "fix" errors, but actually guarantee the accuracy of the "fix" so that no data is lost or corrupted.
    - Expandable. Add capacity at any time with no downtime. Replace every disk in your array with no downtime, and it can automatically use the extra space.
    - Redundancy, even on a single device! Can't provide multiple disks, but want to defend against having a block failure corrupting your data?
    - Flexible. Imagine having several partitions in your array, and being able to resize them at any time. In seconds. Or, don't bother to specify a size and have each partition use whatever space it needs.
    - Native compression. Double your disk space, while (sometimes) improving performance! We compressed our database backup filesystem and not only do we see some 70% reduction in disk space usage, we saw a net reduction in system load as IO overhead was significantly reduced.
    - Sharp cost savings. ZFS obviates the need for exotic RAID hardware to do all the above. It brings back the "Inexpensive" in RAID. (Remember: "Redundant Array of Inexpensive Disks"?)

    Cons:
    - CPU and RAM overhead comparable to Software RAID 5.
    - Requires you to be competent and know how it operates, particularly when adding capacity to an existing pool.
    - ECC RAM strongly recommended if using scrub.
    - Strongly recommended for data partitions, YMMV for native O/S partitions (e.g. /).

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
  14. Re:Unfamiliar by guruevi · · Score: 5, Informative

    The CPU and RAM overhead is relatively minimal. You can get away with very few resources, even after enabling compression.

    I have a ZFS server ~5 years old right now, serving over 100 NFS and a handful of Samba/Netatalk connections simultaneously (home directories mounted on NFS, SMB and AFP for other mounts). There is a fairly steady 1000-2000 IOPS with spikes up to 100k IOPS; the machine has an uptime of over 300 days, and the CPU load (eight 2.4GHz Xeon CPUs) hovers around 5-10% (100TB of data in 8 RAIDZ2 stripes of 8 disks (2 and 4TB), 800GB in SSD read cache, 120GB in mirrored SSD write cache, directly attached with SAS).

    It will of course eat as much RAM as you give it, but for the amount you spend on a halfway decent SAS RAID controller, you can easily buy 100GB of RAM and a set of SSDs. You don't WANT a RAID controller. Regular SAS controllers with ZFS are so much faster; RAID controllers are limited by their on-board chips, which are typically sub-GHz RISC (ARM, Intel, MIPS) processors. An external SAS RAID controller will cost you about $2,000-5,000 extra and have a throughput of a few hundred MB/s and a few hundred IOPS. In contrast, my setup (36 disks, 4 6G SAS channels) can give a whopping 20Gbps and 1M IOPS.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  15. Re:Be sure to use ECC RAM on home set-ups by Anonymous Coward · · Score: 3, Interesting

    No... their numbers are about right.

    And the numbers go back to times before Google existed.

    Even on the old Cray Y systems, there was roughly one single bit error every day, corrected by ECC. Every week or so there would be roughly 1 double bit error, recovered by data reload...

    The only times the memory got disabled was when double bit errors were NOT recovered OR the error rate exceeded 10 (from my memory, number could be higher) in a day. The hardware itself would remap memory so that the system would keep running until the CE could run diagnostics on it and either replace it or restore it to use as an identified transient error.

  16. Re: Unfamiliar by csirac · · Score: 2

    For the same reasons your package manager bothers with shasums on the software you install, even though the several network layers responsible for delivering it already faithfully checksummed each little packet as it flew past: the filesystem is the earliest and only point which knows exactly what files are supposed to actually look like in their entirety. That ZFS/BTRFS scrubs turn up errors on large pools with otherwise perfectly fine hardware means those block/packet-level validations are at too low a level to make assurances for the higher-level data structures using them.

  17. above, below, and at the same level. ZFS is everyt by raymorris · · Score: 4, Interesting

    > ZFS is a layer below LVM.

    Typically you'd layer raid, then LVM, then the filesystem. ZFS tries to be all three. It's raid, and it's a volume manager, and it's a filesystem. There are some benefits to integration, and some drawbacks. With the raid>lvm>filesystem approach, it's trivial to add dm-cache, bcache, iscsi, or any other piece of storage technology. With ZFS, anything you want to add has to be specifically supported within ZFS.

    The Unix tradition is small, single purpose tools that do one thing well. Witness sort, grep, wc, etc. Want to count the log entries that mention Slashdot? You don't need a special tool for that, just grep slashdot | wc -l . Tools like mdadm and lvm are building blocks that can be combined to suit your need, the Unix way. ZFS is a big monolithic package that does everything, much like Microsoft Word or Outlook. ZFS is more in the Microsoft tradition.

  18. Re:Unfamiliar by Anonymous Coward · · Score: 5, Informative

    The point of ZFS is that hardware raid sucks.

    With hardware raid you're trusting a small, underpowered embedded computer to manage data at a block level.

    1. That computer is purposefully kept in the dark about the data being stored as it's designed to be agnostic. Thus it has no way to gracefully recover from errors. It's either your whole volume is consistent, or an unknown state of corruption. This is bad.

    2. RAID schemes are mathematically unable to deal with large modern hard drives. The unavoidable error rates for 4TB+ drives (and their interconnects) mean that you are guaranteed to have corruption within the useful lifetime of the drive. This means even if everything works perfectly with 0 hardware failures, your raid array will have to rebuild sometime in its lifetime. This is bad. It's why you're stupid to go with RAID5 with large hard drives.

    3. RAID controllers are pretty much all unique and their volumes are non portable. They are also not documented well. Your drives are useless without the controller, and even recovering with a new controller of the same type is a crapshoot.
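    Point 2 is essentially a probability argument. A minimal sketch, assuming the spec-sheet figure of one unrecoverable read error (URE) per 10^14 bits that is common for consumer drives, and assuming errors are independent (real drives often do better than their spec): rebuilding a hypothetical four-drive RAID 5 of 4TB disks means reading the three survivors in full, about 12TB. The function name is illustrative.

```python
import math


def p_clean_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of reading bytes_read bytes without a single URE.

    Computes (1 - p) ** bits via exp/log1p so tiny per-bit rates stay accurate.
    """
    bits = bytes_read * 8
    return math.exp(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    survivors_bytes = 3 * 4e12  # three surviving 4TB drives in a 4-drive RAID 5
    print(f"{p_clean_read(survivors_bytes):.2f}")  # 0.38
```

    Under those assumptions the rebuild has only about a 38% chance of finishing without hitting a URE. A second parity disk (RAID 6 or raidz2) lets the array correct such an error during a single-disk rebuild instead of failing.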

    ZFS throws the above model away because:

    1. Your computer is fast, has lots of processors, and lots of cheap ram. Why ignore all that and use a small, embedded computer that's slower and costs extra?

    2. Being part of the filesystem, it's aware of everything on both the block and the file level. It's aware of every file, the blocks it uses, the checksum of the file, and the checksum of every block. You can give yourself as many or as few redundant blocks as you want for some or all of your files.

    3. Your volume can be imported on to any other computer that supports ZFS. It's a standard and is portable.

    4. Because of all of the above, you can implement a whole list of amazing features you can't even begin to dream of in RAID. Look up what you can do with copy-on-write filesystems and you'll wonder how you ever lived without them. (Basically free versioning/snapshotting that almost paradoxically improves performance at the same time.)

  19. Re:Unfamiliar by MightyYar · · Score: 2, Insightful

    I would add to your "cons" list that it requires* ECC RAM, though you should probably be using that anyway.

    * It's not technically a requirement, but you'll probably be sorry if you don't use it.

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  20. Re:above, below, and at the same level. ZFS is eve by Vesvvi · · Score: 3, Interesting

    I think you're giving the wrong idea here. I have yet to find a kind of storage that zfs won't support, with one exception: you can't create a zvol on a zpool, then attach that zvol as back-end storage for the same zpool. That is specifically disallowed, and I'm guessing that you can't use a zvol from one zpool to back-end another zpool either. This is a very bizarre (also, probably dumb) thing to do, but even this can be overridden if you're really desperate. For more practical applications, everything else just works: at least in FreeBSD, you can "hide" the block devices behind all different kinds of abstractions to provide 4k writes, encryption, whatever, and zfs will consume those virtual block devices just fine.

  21. Re:Little Baby Linux by TangoMargarine · · Score: 2

    FreeBSD has had ZFS for what, over five years now? They are the reason it exists in any actual use (OpenSolaris/Illumos don't count) on any non-Sun/Oracle platform.

    God forbid it take the Linux guys longer to get it up and running when Sun purposely licensed it to be difficult to do so on Linux.

    And Linux's wannabee ZFS competitor BTRFS (oooh, look at us) sucks so bad it can't get off the ground.

    So, this being Linux, some guys* also designed Btrfs to do the same things in the meantime. How dare they!? Sun released ZFS after 4 years of work; Btrfs, 2. Presumably they were working under more of an "agile" setup? Which doesn't really make sense for an FS but hey.

    So what does Linux do.... import (steal) ZFS from OpenZFS/FreeBSD

    It's called porting, and I don't see how you can call it "stealing" in any honest way.

    and start posting about how great all their work with ZFS is, and how Linux bloggers now say 'oh yeah, ZFS is actually solid, so we can use it'. As if they are the only/first ones to certify ZFS.

    If you actually skim the article he is saying ZFS On Linux is ready, not ZFS itself.

    Thing is, ZFS was always solid. When bashing ZFS Linux was really just babbling about ZFS's more open and free BSD License and their own failure of BTRFS.

    Was there bashing of it? Being on Slashdot only since 2007/8 I thought it was more Linux people being irked that they couldn't play with it due to the licensing rather than saying it was crap.

    Also, I really hope you're aware that the CDDL and the BSD License(s) are not the same thing. ZFS is CDDL.

    If you want an integrated system that just works, try FreeBSD.

    You're using "just works" and ZFS in the same argument? With a straight face? The intersection of "Just Works" and people who use ZFS has to be pretty small. If you want Just Works just slap an ext3 or ext4 partition on your desktop and be done with it.

    * Interestingly, Wikipedia says Btrfs is (was?) actually an Oracle project. Oracle, of course, bought Sun, which made ZFS. So maybe "competitor" isn't entirely accurate?

    --
    Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
  22. Re:above, below, and at the same level. ZFS is eve by brambus · · Score: 2

    iSCSI doesn't need to be baked into ZFS, in fact, even on Illumos it isn't. It's in a completely different subsystem and will happily work with any block device as its backend storage (be it a physical drive, a ZFS zvol, a loopback block device or anything else, really).

  23. Re:above, below, and at the same level. ZFS is eve by devman · · Score: 5, Informative

    Anything that can be represented as a block device can be added to a zpool. This also includes files, which is handy: when you're trying to understand complicated interactions, you can mock up a small zpool based on files instead of devices for testing.
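    A minimal Python sketch of that testing trick: it creates sparse backing files and builds (but deliberately does not run) the matching `zpool create` command line. The pool name `testpool`, the file names, the 256 MiB size, and the helper functions are all hypothetical; file vdevs are given as absolute paths, and actually running the command needs root and a working ZFS install.

```python
import os
import shlex
import tempfile


def make_backing_files(directory: str, count: int, size_bytes: int) -> list:
    """Create `count` files of `size_bytes` each and return their absolute paths."""
    paths = []
    for i in range(count):
        path = os.path.abspath(os.path.join(directory, f"disk{i}.img"))
        with open(path, "wb") as f:
            f.truncate(size_bytes)  # extends the file without writing data (sparse on most filesystems)
        paths.append(path)
    return paths


def zpool_create_argv(pool: str, vdev_type: str, paths: list) -> list:
    """Build the argv for `zpool create <pool> <vdev_type> <files...>` without running it."""
    return ["zpool", "create", pool, vdev_type, *paths]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp(prefix="zfs-lab-")
    files = make_backing_files(workdir, count=3, size_bytes=256 * 1024 * 1024)
    print(shlex.join(zpool_create_argv("testpool", "raidz", files)))
```

    After experimenting, `zpool destroy testpool` and deleting the files cleans everything up, which makes it cheap to rehearse operations before trying them on real disks.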

    On the other side of the abstraction, ZFS can also expose block devices called zvols that are backed by the zpool. So if you wanted to run a dm-crypt-encrypted EXT4 filesystem backed by a zpool, you can certainly do that using a zvol and still get all the benefits of ZFS integrity protection and snapshotting.

    Plenty of layering can be done with ZFS.

  24. Re: Unfamiliar by devman · · Score: 2

    1GB RAM per TB of storage is only needed if you require dedupe. Dedupe is honestly more trouble than it's worth anyway, and it isn't enabled by default. Without dedupe, RAM requirements are closer to those of a standard fileserver.

  25. Re:Unfamiliar by Guspaz · · Score: 5, Interesting

    Adding additional drives to a raidz vdev is not supported, no. Apparently it's a use case that is extremely rare in enterprise, which is where zfs was intended for. Adding additional capacity is easy if you have no redundancy (12x2TB drives in a pool? Just add 2x2TB more drives to the pool and boom, more space), but not as easy if you want redundancy.

    So you can't expand an existing vdev, but you can add a new vdev to the zpool. For example, say your current configuration is 12x2TB in raidz2 (the zfs equivalent of raid6). That's giving you 20TB of capacity, after redundancy. You need to add 4TB of additional usable capacity...

    There are a few options. ZFS doesn't enforce redundancy, so there's nothing stopping you from adding two bare 2TB drives to the zpool. You'd get your extra 4TB, but data on those drives would be unprotected. Instead, you'd probably take 4x2TB, put them in a new raidz2 vdev, and then add that to your zpool. Then you'd have 12x2TB & 4x2TB, giving you 24TB of usable capacity, and every disk in the array has dual redundancy.
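    The capacity figures here follow from simple arithmetic: a raidz vdev with n disks and p parity disks provides roughly (n - p) disks' worth of space, and a pool's capacity is the sum of its vdevs. A minimal sketch (function names hypothetical) that ignores metadata, padding, reserved space, and the TB/TiB distinction, so real `zfs list` numbers come out somewhat lower:

```python
def raidz_usable_tb(disks: int, disk_tb: float, parity: int) -> float:
    """Approximate usable space of one raidz vdev: (disks - parity) * disk size."""
    if disks <= parity:
        raise ValueError("a raidz vdev needs more disks than parity disks")
    return (disks - parity) * disk_tb


def pool_usable_tb(vdevs) -> float:
    """A pool stripes across its vdevs, so usable capacity is the sum over vdevs."""
    return sum(raidz_usable_tb(n, size, parity) for n, size, parity in vdevs)


if __name__ == "__main__":
    print(pool_usable_tb([(12, 2, 2)]))             # 20: original 12x2TB raidz2
    print(pool_usable_tb([(12, 2, 2), (4, 2, 2)]))  # 24: plus a 4x2TB raidz2 vdev
```

    The same arithmetic gives the 32TB usable on 44TB raw for the 7x4TB + 8x2TB raidz2 pool described next.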

    My home file server currently has 7x4TB & 8x2TB. They're both raidz2 arrays, in the same zpool, for 32TB of usable capacity on 44TB of raw storage. I started out with 5x2TB in raidz1 and migrated the data between various configurations. The iterations looked like this:

    Configuration 1: 5x2TB (raidz1)

    Configuration 2: 5x2TB (raidz1) + 5x2TB (raidz1)

    Configuration 3: 7x4TB (raidz2) + 8x2TB (raidz2)

    The migration process was:

    1 to 2: Add the new 5x2TB (raidz1) vdev to the existing storage pool

    2 to 3: Add the new 7x4TB (raidz2) vdev to a new storage pool, zfs send the file system from the old pool to the new pool, wipe the old 2TB drives, add back 8 of them in a new raidz2 vdev, and add that vdev to the new pool

    The server only has 15 hotswap bays (the 2-to-3 migration required opening the case to get some of the drives hooked up directly), so my next migration will involve replacing the 2TB drives with something larger (probably 8TB by the time I need to expand). To do that, the process in zfs is that you replace a drive, resilver the array, replace a drive, resilver the array, etc. When you have replaced the last drive, zfs will automatically expand the vdev to use the new capacity. Resilvering a completely empty drive is not fast, so I expect the process will probably take me about a week, since I'd probably start a new resilver each night before bed. But since I run raidz2, at no point would I be without redundancy, so it should be safe.
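    Why the extra space only appears at the very end: each disk in a raidz vdev can only contribute as much space as the vdev's smallest member. A minimal sketch, ignoring metadata overhead (function names hypothetical), that models the replace-and-resilver cycle for an 8x2TB raidz2 vdev being upgraded to 8TB drives. Whether the pool picks up the new size automatically or only after an export/import depends on settings such as the pool's `autoexpand` property.

```python
def vdev_usable_tb(disk_sizes_tb: list, parity: int) -> float:
    """Each member contributes only as much as the smallest disk in the vdev."""
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


def replacement_history(old_tb: float, new_tb: float, disks: int, parity: int) -> list:
    """Usable capacity before any swap and after each one-at-a-time replacement."""
    sizes = [old_tb] * disks
    history = [vdev_usable_tb(sizes, parity)]
    for i in range(disks):
        sizes[i] = new_tb  # swap one drive, then wait for the resilver to finish
        history.append(vdev_usable_tb(sizes, parity))
    return history


if __name__ == "__main__":
    print(replacement_history(2, 8, disks=8, parity=2))
    # [12, 12, 12, 12, 12, 12, 12, 12, 48]
```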

  26. Re:rsync causes lockups? by wagnerrp · · Score: 3, Interesting

    If you intend to send the snapshots over the network, as is often the case with rsync, you need to pair it with some independent communication tool, and since the output of "zfs send" tends to be very bursty, you need a sizable memory buffer.

  27. Re:rsync causes lockups? by TechyImmigrant · · Score: 2

    Does the sky fall in if your buffer isn't 'sizable'? Or does it just run a bit slower?

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  28. Re:Unfamiliar by wagnerrp · · Score: 3, Informative

    So you can't expand an existing vdev

    While you cannot add new drives to a vdev, you can expand a vdev by incrementally replacing all of its drives with larger versions. Replace a drive, resilver, replace a drive, resilver... and when you're all done, just export the pool, import it back, and you have the full capacity of the new drives available.

  29. Re:rsync causes lockups? by bragr · · Score: 3, Informative

    Back when I did OpenSolaris work, we used a tool called mbuffer, which is basically netcat with a buffer on each end. It isn't suitable for internet backups (no encryption), but it works pretty well for cross-campus backups and the like.

    IIRC it works like this on the sending side: 'zfs send pool/fs@snap | mbuffer -s 128k -m 4G -O 10.0.0.1:9090'

    And on the receive side: 'mbuffer -s 128k -m 4G -I 9090 | zfs receive pool/fs'

    It can still be pretty bursty but it smoothes out a lot of it.
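    A minimal Python sketch of why a buffer helps with a bursty `zfs send` stream: a producer thread pushes chunks into a bounded FIFO while a consumer drains it at its own pace. This only models the idea behind mbuffer's `-m` memory buffer; it is not mbuffer's implementation, and the chunk sizes, burst pattern, and `maxsize` value are illustrative. In this model a small buffer does not lose data; it just makes the sender block more often, so the cost is throughput rather than correctness.

```python
import queue
import threading
import time

_END = object()  # sentinel marking the end of the stream


def bursty(chunks, burst: int = 10, pause_s: float = 0.01):
    """Yield chunks in bursts separated by short pauses, imitating uneven send output."""
    for i, chunk in enumerate(chunks):
        if i and i % burst == 0:
            time.sleep(pause_s)
        yield chunk


def buffered_copy(chunks, maxsize: int = 64) -> bytes:
    """Copy chunks from a producer thread to a consumer through a bounded FIFO."""
    fifo = queue.Queue(maxsize=maxsize)
    received = []

    def producer():
        for chunk in chunks:
            fifo.put(chunk)  # blocks only while the buffer is full
        fifo.put(_END)

    def consumer():
        while True:
            chunk = fifo.get()  # blocks only while the buffer is empty
            if chunk is _END:
                break
            received.append(chunk)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b"".join(received)


if __name__ == "__main__":
    stream = [bytes([i % 256]) * 128 * 1024 for i in range(100)]  # 100 x 128 KiB chunks
    assert buffered_copy(bursty(stream), maxsize=8) == b"".join(stream)
    print("stream copied intact")
```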

  30. Re: Unfamiliar by ericloewe · · Score: 3, Informative

    Dedup easily needs 5GB of RAM per TB.

    For general usage (no dedup), 1GB per TB is a good rule of thumb.
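    Where a figure like 5GB per TB can come from: the dedup table keeps one entry per unique block, so its size scales with the number of blocks rather than bytes. A back-of-the-envelope sketch, assuming the commonly quoted rule of thumb of roughly 320 bytes of RAM per dedup-table entry and the worst case where every block is unique (both are assumptions; actual entry size and block mix vary). The function name is hypothetical.

```python
KIB, GIB, TIB = 2**10, 2**30, 2**40


def ddt_ram_bytes(unique_data_bytes: int, avg_block_bytes: int, entry_bytes: int = 320) -> int:
    """Estimated RAM needed to keep the whole dedup table in memory.

    One table entry per unique block; entry_bytes is a rule-of-thumb assumption.
    """
    unique_blocks = unique_data_bytes // avg_block_bytes
    return unique_blocks * entry_bytes


if __name__ == "__main__":
    for block in (128 * KIB, 64 * KIB, 8 * KIB):
        gib = ddt_ram_bytes(TIB, block) / GIB
        print(f"{block // KIB:>4} KiB blocks: {gib:.1f} GiB of RAM per TiB")
```

    Smaller average blocks (small files, databases, VM images) push the estimate up quickly, which fits the advice elsewhere in the thread that dedup is rarely worth it.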

  31. Re: Unfamiliar by stoploss · · Score: 3, Funny

    Dedup easily needs 5GB of RAM per TB.

    For general usage (no dedup), 1GB per TB is a good rule of thumb.

    This. Don't starve the ARC. You wouldn't like it when it's angry.

  32. Re: Unfamiliar by smash · · Score: 2

    1 GB of RAM is worth about $20 these days anyhow (less?).

    And yes, de-dup is expensive. Most of the time in my experience you get far better benefits from compression anyhow (source: real world enterprise datasets at work).

    --
    I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
  33. I used it for about a year by Solandri · · Score: 2

    And was very impressed. It was a new 4-drive system I'd put together to operate as both a NAS/fileserver and a host for virtual machines. I had originally intended to use RAID 5, but decided to give ZFS a try after reading about it. My initial config had it booting Ubuntu (maybe Mint? I don't recall), with ZFS on Linux installed as the main non-boot filesystem with one-drive redundancy. I had all sorts of problems with drives dropping out of the array, which I eventually tracked down to the motherboard shipping with bad SATA cables. ZFS handled this admirably. At first I didn't notice one of the drives had dropped, and continued using the system for about a day. When I got the drive working again, as I understand it, RAID 5 would have had to do a complete array rebuild because of the changed files. ZFS noticed most of my old data was on the "new" drive and simply validated the checksums as still accurate, then noticed I had written new files and automatically created new redundancy data for them on the "new" drive. The entire "rebuild" only took a little over an hour instead of the 20+ hours I was expecting (how long it takes me to back up the data over eSATA).

    If you're wondering why ZFS trusts the checksums on the "new" drive instead of reading the entire file: it verifies the data against the checksum every time you access it. Once a month by default, it runs a "scrub" where it reads every file and verifies that nothing has suffered bit rot and everything still matches the checksums. Apparently the strategy after a dropped drive is to get the redundant filesystem up and running again ASAP, then do the file integrity scrub afterwards at its leisure. (You can manually force this check at any time with zpool scrub.)
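    A scrub, as described above, reads every copy of every block and checks it against a checksum stored separately from the data. A minimal sketch for a two-way mirror, assuming SHA-256 checksums for simplicity (ZFS supports SHA-256 but defaults to a different, faster algorithm) and using in-memory lists as hypothetical stand-ins for disks:

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum used for verification; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(block).digest()


def scrub_mirror(disks: list, checksums: list) -> int:
    """Verify every copy of every block and rewrite bad copies from a good one.

    `disks` is a list of mirror sides, each a list of blocks; `checksums[i]` is
    the expected checksum of block i, stored apart from the data itself.
    Returns the number of copies repaired.
    """
    repaired = 0
    for i, expected in enumerate(checksums):
        good = next((disk[i] for disk in disks if checksum(disk[i]) == expected), None)
        if good is None:
            raise OSError(f"block {i}: no copy matches its checksum")
        for disk in disks:
            if checksum(disk[i]) != expected:
                disk[i] = good  # self-heal the rotten copy
                repaired += 1
    return repaired


if __name__ == "__main__":
    blocks = [bytes([i]) * 512 for i in range(4)]
    sums = [checksum(b) for b in blocks]
    mirror = [list(blocks), list(blocks)]
    mirror[1][2] = b"rot" + mirror[1][2][3:]  # simulate bit rot on the second disk
    print(scrub_mirror(mirror, sums))  # 1
    print(scrub_mirror(mirror, sums))  # 0
```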

    The other main advantage I'd say is that it's incredibly flexible when you're putting together redundant arrays. RAID 5 normally requires 3+ drives or partitions of the same size. ZFS lets you mix together drives, partitions, files (yes, one of your ZFS "drives" can be a file on another filesystem), other devices like SAS drives, etc. You can even put the 3+ "drives" needed for redundancy onto a single drive if you just want to play around with it for testing.

    The only problem I ran into was with deduplication. Dedup was part of the reason I decided to try ZFS, and is one of the features frequently mentioned by ZFS advocates. While dedup does work, it is an incredible memory and performance hog. Writes to the ZFS array went from 65+ MB/s (bunch of mixed random files) down to about 8 MB/s with dedup turned on, and memory use climbed to where I ordered more RAM to bump the system up to 16 GB. In the end I decided the approx 2% disk space I was saving with dedup wasn't worth it and disabled it.

    I eventually switched to FreeNAS (based on FreeBSD, which has a native port of ZFS) because it was annoying having to reinstall ZFS on Linux after an Ubuntu/Mint update, and I couldn't see myself doing that after every new release because I wanted features which were added to the core OS. (And if you're wondering, dedup performance is just as bad under FreeNAS.)

  34. Re:rsync causes lockups? by kingramon0 · · Score: 3, Funny

    The sky won't fall but the walls might.

    -Shaka

  35. Re:rsync causes lockups? by Guspaz · · Score: 2

    They're working on fixing that, but in the meantime you can pipe it through mbuffer or something similar to resolve the issue.

  36. Re: Unfamiliar by Guspaz · · Score: 2

    ZFS only supports on-the-fly dedupe. For batch dedupe, you're probably thinking of HAMMER in DragonFly BSD.

    Dedupe consumes insane amounts of RAM and has a massive performance penalty. It's almost never worth it, because the cost of extra RAM will be more than if you had just bought more disks in the first place.

    Compression, on the other hand, requires very little RAM or CPU resources, gives a tangible performance improvement, and saves space. Once ZFS implemented LZ4 (which is extremely fast), it began making sense to simply always enable compression globally on every filesystem. They should probably enable it by default.
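    A minimal sketch of the trade-off, using Python's built-in zlib at its fastest level as a stand-in for LZ4 (which is not in the standard library). It also shows the usual safeguard of storing a block uncompressed when compression doesn't save enough space; the 12.5% threshold and the function name are assumptions for this sketch.

```python
import os
import zlib


def store_block(block: bytes, min_saving: float = 0.125) -> tuple:
    """Return (compressed?, payload); keep the compressed form only if it saves enough."""
    packed = zlib.compress(block, 1)  # level 1: fastest, in the spirit of LZ4
    if len(packed) <= len(block) * (1 - min_saving):
        return True, packed
    return False, block


if __name__ == "__main__":
    log_text = b"2014-09-01 12:00:00 INFO nightly backup completed\n" * 80
    noise = os.urandom(len(log_text))  # already-compressed or encrypted data looks like this
    for name, block in (("log text", log_text), ("random", noise)):
        compressed, payload = store_block(block)
        print(f"{name}: {len(block)} -> {len(payload)} bytes, compressed={compressed}")
```

    Fewer bytes on disk also means fewer bytes to read and write, which is why compression can lower I/O load rather than raise it.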

  37. Re:Example? by Sloppy · · Score: 2

    (I still do things the classic way: filesystem on LVM on LUKS on mdadm. Not using ZFS yet.) I'm not sure it's exactly about what's required.

    Consider wear leveling on SSDs. Only the filesystem really understands which blocks need to preserve data and which ones are don't-care. So to do SSDs right, it needs to pass info about unallocated storage down to the volume manager, which then passes it to the encryption, which then passes it to the RAID, which then gives it to the old-school "real" block device (which then passes it to the wear-leveling firmware, I guess). Sure, that can work. But when the filesystem can talk to the physical block device, it's easier. If you're writing block devices that implement things like volumes and encryption and RAID, from your PoV, things that are allocated vs not-allocated are totally different than how the filesystem sees it. To you, a block is just a block and a whole bunch of ioctls are totally irrelevant and not related to what you're working on. You're going to find this type of information to be pesky and you might not handle it right (or more likely, it takes a long time before you handle it at all). And in fact that has happened a few times, where certain block devices' feature set lagged a bit behind what people with SSDs needed.

    I suppose another easily-contrived example would be if you have a few gigabytes of data on a few terabytes of RAID, and need to [re]build the RAID. If your RAID doesn't know which blocks actually have data, then it'll need to copy/xor a few terabytes. If it's a unified system, then it can be complete after copying/xoring a few gigabytes.
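    The arithmetic in that example, as a minimal sketch: a block-level RAID layer with no allocation information has to rebuild the whole disk, while a filesystem-aware rebuild only touches allocated extents. The extent list of (offset, length) pairs and the helper names are hypothetical.

```python
def merge_extents(extents):
    """Merge overlapping or touching (offset, length) extents so no byte is counted twice."""
    merged = []
    for start, length in sorted(extents):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end - start) for start, end in merged]


def bytes_to_rebuild(disk_bytes: int, allocated_extents=None) -> int:
    """Without allocation info, every byte must be rebuilt; with it, only live data."""
    if allocated_extents is None:
        return disk_bytes
    return sum(length for _, length in merge_extents(allocated_extents))


if __name__ == "__main__":
    GB, TB = 10**9, 10**12
    live = [(0, 2 * GB), (500 * GB, 1 * GB), (500 * GB + GB // 2, 1 * GB)]
    print(bytes_to_rebuild(4 * TB))        # 4000000000000 (block-level RAID)
    print(bytes_to_rebuild(4 * TB, live))  # 3500000000 (filesystem-aware)
```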

    --
    As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  38. Re:rsync causes lockups? by Syberghost · · Score: 3, Informative

    You can kludge on encryption in the pipeline:

    http://sourceforge.net/project...