Slashdot Mirror


Oracle Engineer Talks of ZFS File System Possibly Still Being Upstreamed On Linux (phoronix.com)

New submitter fstack writes: Senior software architect Mark Maybee who has been working at Oracle/Sun since '98 says maybe we "could" still see ZFS be a first-class upstream Linux file-system. He spoke at the annual OpenZFS Developer Summit about how Oracle's focus has shifted to the cloud and how they have reduced investment in Solaris. He admits that Linux rules the cloud. Among the Oracle engineer's hopes is that ZFS needs to become a "first class citizen in Linux," and to do so Oracle should port their ZFS code to Oracle Linux and then upstream the file-system to the Linux kernel, which would involve relicensing the ZFS code.

131 comments

  1. Having it NOT be in upstream is more flexible by ZorinLynx · · Score: 5, Insightful

    One nice thing about ZFS not being in upstream is that it is currently maintained and updated separate from the Linux kernel.

    Now, it would be nice to relicense ZFS under GPL so that it can be included in the kernel. But this should wait until the port is a bit more mature. Right now development is very active on ZFS and we have new versions coming out every few weeks; having to coordinate this with kernel releases will complicate things.

    All this said, relicensing ZFS would definitely help Oracle redeem themselves a bit. After mercilessly slaughtering Sun after acquiring them, they have a long way to go to get from the "evil" side back to the forces of good.

    1. Re:Having it NOT be in upstream is more flexible by davecb · · Score: 1

      It might also cut their maintenance costs, something Oracle often likes.

      --
      davecb@spamcop.net
    2. Re:Having it NOT be in upstream is more flexible by Neo-Rio-101 · · Score: 2

      Now, it would be nice to relicense ZFS under GPL so that it can be included in the kernel. But this should wait until the port is a bit more mature. Right now development is very active on ZFS and we have new versions coming out every few weeks; having to coordinate this with kernel releases will complicate things.

      Funny, I thought ZFS was very mature by now.
      Getting it open and into Linux would result in perhaps some cross-pollination between OpenZFS and Oracle's official ZFS.

      --
      READY.
      PRINT ""+-0
    3. Re:Having it NOT be in upstream is more flexible by GerryGilmore · · Score: 1

      I'm gonna disagree with you a bit here. Each portion of the kernel has its own maintainer and I don't see how ZFS being upstreamed would change that at all. Likewise, does not the maintainer of, say, the TTY subsystem (just a random pick...) make active changes *between* release cycles, submitting their LAG to the various RCs? Not saying that you are 100% wrong, but...help me out here.

    4. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 0

      Nice post, "+1 Informative". But "-1 Wrong" about Oracle ever being non-evil. Ever.

    5. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 3, Insightful

      Oracle is evil ... period. There is no going back.

    6. Re:Having it NOT be in upstream is more flexible by JBMcB · · Score: 5, Interesting

      Funny, I thought ZFS was very mature by now.

      It's very mature, on Solaris. Linux has a different ABI to the storage layer, and different requirements on how filesystems are supposed to behave. So it's not so much a port as a re-implementation.

      --
      My Other Computer Is A Data General Nova III.
    7. Re:Having it NOT be in upstream is more flexible by Kjella · · Score: 1

      Likewise, does not the maintainer of, say, the TTY subsystem (just a random pick...) make active changes *between* release cycles, submitting their LAG to the various RCs?

      Not to RCs. As I understand it the kernel is on a three month cycle, one month merge window and roughly two months of weekly RCs that are only supposed to be bug fixes. Otherwise you might get an undiplomatic response from Mr. Torvalds. Worse yet, many distros ship kernels much older than that and despite having "proper channels" bugs often go directly upstream with a resolution of "we fixed that two years ago, update... sigh, waste of time". So if you're not really ready for production use, being in the kernel might just be a bother. Plus if you make a kernel API then Linus will make you support it forever. From what I understand you can mark your code "experimental" though, which basically means all bets are off.

      --
      Live today, because you never know what tomorrow brings
    8. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 2, Insightful

      I don't believe this is Oracle's better nature or whatever; ZFS has to transition from Solaris to Linux because Solaris is dead.

      It's really that simple. If Oracle can gin up a little excitement and maybe score some kudos then great, why not? But ultimately this has to happen or the official Oracle developed ZFS will die with its only official platform.

    9. Re: Having it NOT be in upstream is more flexible by dilvish_the_damned · · Score: 1

      Oracle is evil ... period. There is no going back

      More like "completely ambivalent", not really the same as "evil".
      At least that's what I was going to say until I remembered the click-through mess they put in front of downloading the jre and jdk. Pure malice.

      --
      I think you underestimate just how much I just dont care.
    10. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 0

      The Oracle dev seems to be talking about Oracle's ZFS implementation which is not what most of the rest of the world uses. FreeBSD, Illumos, FreeNAS, Linux, etc are using OpenZFS. Having Oracle relicense their version to GPL would guarantee it would never get used on any platform other than Linux (no one else would allow GPLed code in their kernel) and the Linux version would not be completely compatible with other operating systems (including existing Linux OpenZFS volumes).

      Which means Oracle would not only need to relicense the code AND port it to Linux AND get it upstreamed, but they would also need to work with OpenZFS to make the two implementations compatible again after Oracle forced a fork. Considering how hostile Oracle is, I don't see all of that happening.

    11. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 0

      Maintainer Andrew Morton weighed in on this years ago, saying "ZFS is a rampant layering violation" ( https://lkml.org/lkml/2006/6/9... ). Even with a compatible license, moving ZFS in-tree is very unlikely since folks like Andrew are the ones that would have to be convinced.

      A GPL compatible license would allow distributions to include it in their kernels however, and skip the dkms hassles.

    12. Re:Having it NOT be in upstream is more flexible by barbariccow · · Score: 1

      Generally, the filesystem follows a storage spec. Just because it's an alternate implementation doesn't mean the underlying data is incompatible, in fact quite the opposite. That's why both are called "ZFS" - They follow the same data model. If they didn't, it would have a different name.

    13. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 0

      Obligatory response to "rampant layering violation".

      To be fair, that statement was made a decade ago; more than enough time for one to appreciate being wrong, which does happen to the best of us. However the Linux maintainers feel today, Jeff Bonwick's reply is worth repeating.

    14. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · · Score: 0

      This is where someone's supposed to pick up the ball and tell us all what some of the differences are!

      Please? Grey beards? Anyone? :)

    15. Re:Having it NOT be in upstream is more flexible by DrXym · · Score: 1

      I don't see how the choice of being an upstream is lost by going GPL. If the ZFS group said "we're GPL now but we're not ready to land yet, give us some time" then the kernel won't land it.

      But this is Oracle we're talking about. I doubt they would GPL something because in their minds they'd lose control of it and allow the competition to exploit their code. After all, that's what Oracle has done itself to competitors like Red Hat. Aside from that, assuming they did GPL it, then it would immediately fork because Oracle suck at stewarding open source projects.

    16. Re:Having it NOT be in upstream is more flexible by ls671 · · Score: 1

      Why does Oracle often like maintenance costs? This seems awkward to me.

      --
      Everything I write is lies, read between the lines.
    17. Re:Having it NOT be in upstream is more flexible by pr0nbot · · Score: 1

      ZFS is mature, but has some curious omissions.

      For example, as far as I know it can't use a disk span within a RAID set. I.e., you can't mirror a 4TB drive, say, with 2x2TB drives spanned to present as a 4TB device. (Which is the kind of thing that would make a small home NAS be able to really easily re-use small disks.) If I'm not wrong then I can only assume that's too niche a case to be interesting in the enterprise environment.

    18. Re:Having it NOT be in upstream is more flexible by JBMcB · · Score: 1

      You can't mirror a 4TB drive, say, with 2x2TB drives spanned to present as a 4TB device.

      It doesn't do it natively, but you can hardware RAID the 2x2TB drives and it will treat it like a single 4TB device. It's not best practice, because ZFS uses the SMART counters to warn you of impending drive failure, and hardware RAID masks those, but you can do it.

      --
      My Other Computer Is A Data General Nova III.
    19. Re:Having it NOT be in upstream is more flexible by Aaden42 · · Score: 1

      Do any existing RAID systems allow you to do that in one step?

      You could do it in Linux w/ ZoL by using md to stripe or concatenate the two smaller devices & feed the md block device to ZFS. It should work, but I could see ZFS making some ungood decisions based on hiding the underlying hardware from it. Dunno about performance either.
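
      A rough sketch of that md approach, for the 4TB + 2x2TB case discussed above. The device names, /dev/md0 and the pool name "tank" are hypothetical placeholders, and the function only prints the commands rather than running them, since both commands wipe the named devices. The two sides will not match in size exactly, so the mirror is limited to the smaller one.

      ```shell
      # Dry run: the function only prints the commands, it does not run them.
      # /dev/sdX, /dev/sdY, /dev/sdZ, /dev/md0 and "tank" are hypothetical names.
      plan_spanned_mirror() {
          big=$1; small_a=$2; small_b=$3
          # Step 1: concatenate the two small drives into one md block device.
          echo "mdadm --create /dev/md0 --level=linear --raid-devices=2 $small_a $small_b"
          # Step 2: mirror the big drive against the md device in a new pool.
          echo "zpool create tank mirror $big /dev/md0"
      }

      plan_spanned_mirror /dev/sdX /dev/sdY /dev/sdZ
      ```

      ZFS then sees /dev/md0 as a single opaque device, which is exactly the "hiding the underlying hardware" trade-off mentioned above.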

    20. Re:Having it NOT be in upstream is more flexible by Aaden42 · · Score: 1

      If Oracle licensed ZFS as GPL in addition to the current CDDL, (nearly) everyone could use it. CDDL is incompatible with GPL (intentionally so), but CDDL is NOT incompatible with BSD or most other non-GPL open licenses.

      The BSD's used Sun's own ZFS code for years before OpenZFS was founded. I ran my NAS on FreeBSD with it for about three years until ZoL stabilized enough that I jumped back to Gentoo. CDDL isn't copyleft, so the BSD's can use it without any problem. If they stripped the CDDL option and released as GPL-only, that would screw the BSD's.

      Alternatively, they could release it under a BSD style license, and everybody could use it under one license, but I'm probably dreaming...

    21. Re:Having it NOT be in upstream is more flexible by Aaden42 · · Score: 2

      OpenZFS and Oracle ZFS have diverged a bit. The on-disk pool contains a version number which identifies with certainty whether you can import it on a given implementation, so there's at least no chance of mistaken mis-importing & data loss from that. They're interoperable for pools that aren't upgraded past the highest pool version supported in the final CDDL release of Oracle ZFS. Beyond that, they won't work.

      Oracle ZFS has since added file-level encryption. The encryption and the on-disk structure aren't readable by OpenZFS. OpenZFS has incremented the pool version number by a large jump (5000) past the last Oracle ZFS version and has fixed & enhanced some things in such a way that the on-disk isn't compatible with Oracle ZFS. For info about OpenZFS version & feature flags, see http://open-zfs.org/wiki/Featu...

      I don't think it would take a tremendous amount of effort to merge the functionality one way or the other if the licensing issues were solved, but they're definitely not on-disk compatible if you're running the latest pool version supported by either release.
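
      As an illustration of how the version number separates the two lines, here is a small sketch that classifies a value as reported by `zpool get version <pool>`. The cutoff of 28 for the last version shared by both implementations is an assumption not stated in this thread, as is the "-" that some OpenZFS builds may print for feature-flag pools; treat both as placeholders to verify against your own system.

      ```shell
      # Classify a pool version number, e.g. from `zpool get version <pool>`.
      # Assumption: 28 is the last version both implementations can import.
      classify_pool_version() {
          case "$1" in
              -|5000) echo "OpenZFS feature flags" ;;   # the 5000 jump described above
              *[!0-9]*|"") echo "unknown" ;;            # not a number at all
              *) if [ "$1" -le 28 ]; then
                     echo "legacy: importable by both"
                 else
                     echo "Oracle ZFS only"
                 fi ;;
          esac
      }

      classify_pool_version 28
      classify_pool_version 5000
      classify_pool_version 30    # stands in for some later Oracle version
      ```

      On a real system, `zpool upgrade -v` lists what the local implementation supports.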

    22. Re:Having it NOT be in upstream is more flexible by the_B0fh · · Score: 1

      Says who? I've done that. A bunch of examples if you google for it.

    23. Re:Having it NOT be in upstream is more flexible by the_B0fh · · Score: 1

      Nice post, "+1 Informative". But "-1 Wrong" about Oracle ever being non-evil. Ever.

      https://www.youtube.com/watch?...

      You need to think of Larry Ellison the way you think of a lawnmower. You don't anthropomorphize your lawnmower, the lawnmower just mows the lawn, you stick your hand in there and it'll chop it off, the end. You don't think 'oh, the lawnmower hates me' -- lawnmower doesn't give a shit about you, lawnmower can't hate you. Don't anthropomorphize the lawnmower. Don't fall into that trap about Oracle." -- Bryan Cantrill https://www.youtube.com/watch?...

    24. Re:Having it NOT be in upstream is more flexible by devman · · Score: 1

      You can add any block device you want to a vdev. The recommendation is that it be a physical drive, but there is no software limitation.

    25. Re:Having it NOT be in upstream is more flexible by ilsaloving · · Score: 1

      they have a long way to go to get from the "evil" side back to the forces of good.

      What do you mean "back"? I can't ever remember a time when Oracle wasn't obnoxious.

    26. Re:Having it NOT be in upstream is more flexible by pr0nbot · · Score: 1

      I googled extensively at the time I was setting up my home NAS (~4 years ago). If it's possible without doing the spanning using something outside ZFS (e.g. hw RAID as others have suggested) I'd be really interested, as from time to time I grow the storage and have to partition the disks in interesting ways.

    27. Re:Having it NOT be in upstream is more flexible by rl117 · · Score: 1

      That would remove some of the data integrity guarantees and healing of corrupt data which ZFS provides. There are good reasons for the layered design of ZFS, and giving it full control of the underlying storage is required if you want the best performance and robustness out of it. Create a mirrored vdev like ZFS wants and you'll have a much happier experience if one of them fails. Like fast resilvering instead of a full array sync.

    28. Re:Having it NOT be in upstream is more flexible by rl117 · · Score: 1

      You can absolutely do this. You create two mirrored vdevs, so that reads and writes are striped across the pair of mirrors. I have this exact setup in my home NAS; see this example for how it's set up. Note the sizes aren't strictly true for the used space since it takes compression into account; the discs are paired 1.8 and 2.7T discs; I've upgraded one pair to larger capacity drives, and I'll likely do the same for the other pair next time I upgrade it (or add a third mirror).

    29. Re:Having it NOT be in upstream is more flexible by davecb · · Score: 1

      It's cutting maintenance costs they like. A lot: they just laid off most of the Solarii.

      --
      davecb@spamcop.net
    30. Re:Having it NOT be in upstream is more flexible by bill_mcgonigle · · Score: 1

      No. I've been running ZoL for almost a decade. It's constantly being bitten by kernel API changes and the kernel devs will break ZFS without a second thought and it happens all the time.

      It's been a little while since we've been three months without a working ZFS-head build on Fedora (or other newish kernels) but there's still nothing stopping it from happening.

      Dual-licensing to something GPL-compatible would allow parts of the SPL/ZFS stack to be brought in-kernel, even if most of it stayed outside, at least maintaining compatibility when a new kernel release is cut.

      It's so bad that I moved all of my main storage to FreeBSD, even though I prefer to run linux for most everything else. My laptops stay with ZoL but for application compatibility, not because it's easy to maintain.

      Anybody who thinks they will start a new fork of ZFS to be quickly in-kernel is seriously delusional and has a month's worth of reading github issues to look forward to, but easing compatibility and distribution would be a heck of a good start.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    31. Re: Having it NOT be in upstream is more flexible by Brockmire · · Score: 1

      How does raid hide those? When I log into the various raid configs, they have smart reporting. Do you just mean it doesn't take action on the smart values?

    32. Re:Having it NOT be in upstream is more flexible by rl117 · · Score: 1

      This has always been possible. You grow the storage by adding more vdevs, or by upgrading the capacity of an existing vdev. So you can start with a vdev of a single mirror, and then you can add another mirror vdev, and another... and the zpool will be striped across all the vdevs. This increases the number of iops the pool can sustain linearly with the number of vdevs. For each mirror vdev, you can swap out a disc with a larger capacity drive, resilver it and then repeat for the other drive, which will increase the capacity of that vdev; you can also do this for raid vdevs, all while online with no service interruption. While you can't remove vdevs or shrink them, both of these options provide easy means to increase the capacity of a pool.
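
      The growth path described above could be sketched roughly as follows. The pool name and the /dev/sdA to /dev/sdF device names are hypothetical, and the function only prints the commands instead of running them.

      ```shell
      # Dry run: prints the commands only. Pool and device names are hypothetical.
      plan_grow_pool() {
          pool=$1
          # Start with a single mirror vdev.
          echo "zpool create $pool mirror /dev/sdA /dev/sdB"
          # Add a second mirror vdev; the pool now stripes across both.
          echo "zpool add $pool mirror /dev/sdC /dev/sdD"
          # Let a vdev grow once all of its discs have been replaced by larger ones.
          echo "zpool set autoexpand=on $pool"
          # Swap one disc for a larger one, wait for the resilver, then do its partner.
          echo "zpool replace $pool /dev/sdA /dev/sdE"
          echo "zpool status $pool"
          echo "zpool replace $pool /dev/sdB /dev/sdF"
      }

      plan_grow_pool tank
      ```

      The extra capacity of the first mirror only shows up after both of its discs have been replaced.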

    33. Re:Having it NOT be in upstream is more flexible by the_B0fh · · Score: 1

      You realize you can use a file as a vdev, right? This means if you want to use two 1TB drives as "one" device, you just create a pool using 2x 1TB drives, and then in your new set up you just refer to the file that was created.

      zpool create test /tank/test/zpool
      or
      zpool create test /dev/blah/fake-2TB-device

    34. Re:Having it NOT be in upstream is more flexible by thegarbz · · Score: 1

      You can't mirror a 4TB drive, say, with 2x2TB drives spanned to present as a 4TB device.

      Err yes you can. You just really shouldn't given you're doubling the failure rate of one of the vdevs, and also hosing the flexibility of ZFS which would at best benefit from using all three drives in a single RAIDZ and upgrading them on failure at which point your pool automatically grows to the lowest sized drive.

      You can do whatever you want. ZFS won't stop you from turning your hardware into an unmaintainable mess. The fact that you CAN do this in ZFS I see as a huge downside. The only end result here will be you posting in a forum a few years down the line for help trying to get your pool recovered after a hardware failure and a bunch of people asking you why the heck you set it up like that.

  2. Good: it's about time. by davecb · · Score: 1

    Some folks don't like the particular set of tradeoffs, but for a filesystem (as opposed to an object store, one of which I'm testing right now), it's a very good offering. I definitely want it on my Fedora dev laptop, along with a write cache on flash.

    --
    davecb@spamcop.net
  3. Mainstream in FreeBSD... by Anonymous+Cashews · · Score: 0, Redundant

    I've been using ZFS in my FreeNAS (FreeBSD) file server for over two years now.

    1. Re:Mainstream in FreeBSD... by Anonymous Coward · · Score: 0

      The problem with ZFS on Linux is that some aspects of it are redundant with the kernel. The result is generally poor performance on Linux. That's why you're probably better off running BTRFS.

    2. Re:Mainstream in FreeBSD... by Anonymous Coward · · Score: 1, Interesting

      Holy shit are you serious? Like SERIOUS? OMG why don't we all switch to BSD! Everyone stop! I know Linux is *everywhere* but BSD has ZFS! Did you guyz know this????

    3. Re:Mainstream in FreeBSD... by HuguesT · · Score: 1

      The version in BSD is an older version derived from when Solaris was open-source, in 2007. It is independently maintained and a part of OpenZFS. In fact the ZFS stacks in IllumOS (a fork of open-source Solaris), FreeBSD, Linux and OS/X share a lot of code and are compatible, in the sense that if you create a ZFS filesystem on one of these OSes, it will work on the others.

      OpenZFS has made enormous progress. I have been using it on my FreeBSD, Linux and OS X (macOS) boxes for over 3 years now.

    4. Re:Mainstream in FreeBSD... by HuguesT · · Score: 1

      As you may know, RedHat has deprecated BTRFS in RHEL7.4 whereas many distributions like Ubuntu fully support ZFS.

      I would say that the status of BTRFS is worse than that of OpenZFS on Linux. See also here for an interesting article.

    5. Re:Mainstream in FreeBSD... by DeHackEd · · Score: 1

      Yes, there is a lot of duplicated code in ZFS for Linux, such as an SHA256 implementation, RAID parity, compression, and lately a whole crypto library.

      The reason is either the kernel doesn't reliably support this natively or the implementation isn't usable. Linux doesn't allow non-GPL modules to access a lot of features (eg: the crypto library) or some features are version-specific (eg: LZ4 (de)compressor). The simplest solution is to import the Solaris versions.

      But they've improved. SSE and AVX instructions are available for many of the above. And if ZFS does get re-licensed to GPL, then sure maybe we can make use of some of that stuff natively. Until then, ZFS on Linux has to deal with the reality of a non-GPL non-Linux driver on a GPL Linux kernel.

    6. Re:Mainstream in FreeBSD... by Jerry · · Score: 1

      I would say you are wrong.

      That RH has not retained qualified Btrfs programmers is their business decision and has little to nothing to do with Btrfs or its usability.
      https://www.itwire.com/open-sa...

      KDE Neon User Edition has zfs-fuse and a version of OpenZFS in its repository. I've played with the fuse version and was unimpressed.

      After I tried zfs-fuse I tried Btrfs. I've been using it without a single fault or problem for 2 1/2 years.

      --

      Running with Linux for over 20 years!

    7. Re:Mainstream in FreeBSD... by lucm · · Score: 1

      Whatever the reason, btrfs is not supported in production on RHEL. It has never been, it's always been in "preview" and will soon be out of the picture completely.

      It's been going on for years so I would agree with the above that OpenZFS would have a brighter future.

      --
      lucm, indeed.
    8. Re:Mainstream in FreeBSD... by barbariccow · · Score: 1

      The issue is they wanted to own all the code so instead of donating fixes or adding hooks as needed they encapsulated the whole thing into something they could own and control.

    9. Re:Mainstream in FreeBSD... by Anonymous Coward · · Score: 0

      I hope you aren't using the raid5/6 feature, since that will lead to almost guaranteed data loss if you experience disk failure.
      The file system is marked experimental in its own wiki.
      No idea why people continue to insist it is stable.

    10. Re:Mainstream in FreeBSD... by TheRaven64 · · Score: 1

      It's probably more accurate to say that the version in Solaris is a fork of an older version. Most of the ZFS developers left Oracle quite early on after they bought Sun and most of the rest left when Oracle decided to stop releasing CDDL versions of Solaris. The version that ended up in OpenZFS has been actively developed by the same people who created ZFS for the last 10 years. The version that Oracle owns has had a few incremental changes. This also means that it would be difficult for Oracle to GPL ZFS in a useful way: the version that's had all of the work done on it for the last decade contains a load of CDDL code that Oracle doesn't own.

      --
      I am TheRaven on Soylent News
    11. Re:Mainstream in FreeBSD... by pnutjam · · Score: 1

      openZFS is not even in the Kernel, btrfs is in the upstream kernel. It's silly to think that OpenZFS will ever get better RHEL support than btrfs.
      I use btrfs on openSuSE and it works great.

    12. Re: Mainstream in FreeBSD... by CAIMLAS · · Score: 1

      Right, btrfs has proven itself as a stable filesystem which is not prone to corrupting itself on single or mirrored drive configurations.

      Except it isn't, and does so regularly.

      The practical implications of zfs over btrfs far outweigh the architectural encapsulation of zfs. This limitation primarily relates to arc, a situation which has plagued freebsd, illumos, and even Solaris since the changeover from sparc to x86. It is drastically better now across the board, particularly on linux, where the native memory mapping has been taught to play nice.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    13. Re:Mainstream in FreeBSD... by rl117 · · Score: 1

      Use the right tool for the right job. If you care about your data and you want to use ZFS, it makes sense to use FreeBSD for its top-notch ZFS implementation. Use it when it's the best tool for the job. When I set up a home NAS I specifically went with FreeBSD because this is a genuinely excellent feature, and ZFS on Linux is not yet up to scratch in comparison. We also use it on FreeBSD VMs at work for CI and testing work where the snapshot support is worth having. But at home and work everything else is still Linux on Ubuntu, Ubuntu LTS and CentOS as appropriate; we use them rather than FreeBSD because they have features and usability which FreeBSD does not. We use ZFS on Linux where appropriate as well.

  4. maybe by Anonymous Coward · · Score: 0

    Mark Maybee [...] says maybe.

    Yeah right.

  5. My butthole itches by Anonymous Coward · · Score: 0

    Feels like a donkey trying to fist fuck a cat.

  6. And once it's in the kernel, Oracle will sue... by sconeu · · Score: 1

    *cough*Java*cough*

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  7. Careful there by JBMcB · · Score: 1

    ZFS wants to live in a fairly specific configuration. It wants a bunch of drives, a bunch of memory, and not much competition for system resources. It's really a NAS filesystem, which is why there are no recovery utilities for it. If your filesystem takes a dump, you're SOL, hope you have a backup.

    You can run it on a single drive on a desktop machine, but you are incurring a bunch of overhead and not getting the benefits of a properly set up ZFS configuration.

    --
    My Other Computer Is A Data General Nova III.
    1. Re:Careful there by dnaumov · · Score: 4, Insightful

      ZFS wants to live in a fairly specific configuration. It wants a bunch of drives, a bunch of memory, and not much competition for system resources.

      Except for the part where it works with 2 drives, on a system with 4GB of RAM and under constant heavy load just fine.

    2. Re:Careful there by HuguesT · · Score: 1

      Precisely, a bunch of drives, or a RAID, starts at two drives.

    3. Re:Careful there by Anonymous Coward · · Score: 0

      Calling "two" a "bunch" is a stretch. And you know it.

    4. Re:Careful there by whoever57 · · Score: 1

      Precisely, a bunch of drives, or a RAID, starts at two drives.

      Being pedantic here, but you are wrong, and there are circumstances where this matters.

      You can make a RAID1 array with one drive plus a failed (non-existent) drive. Hence the minimum is actually 1 drive, not two.

      --
      The real "Libtards" are the Libertarians!
    5. Re:Careful there by Aighearach · · Score: 1

      Generally in computers it is best to go from "only 1 device" directly to "n devices" and not to waste time special-casing 2 devices, 3 devices, 4 devices.

    6. Re:Careful there by dfghjk · · Score: 1

      RAID, as defined in the original paper, involves data striping and striping cannot be implemented with less than 2 drives.

      If you desire redundancy, RAID requires a minimum of 3 drives. A mirrored drive pair is not RAID, it is just mirroring.

    7. Re:Careful there by bingoUV · · Score: 1

      Depends on what you mean by a drive. I have a horrible hard drive which was declared almost in its grave by SMART long ago. I made 2 partitions, run "software RAID1" across the 2 partitions , and store one final backup on it.

      If it dies, nothing is lost.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    8. Re:Careful there by thegarbz · · Score: 1

      Precisely, a bunch of drives, or a RAID, starts at two drives.

      Actually you're more than happy to run it on 1 drive as well. There's nothing "precise" about the GP's assertion that ZFS wants a fairly specific configuration.

    9. Re:Careful there by Anonymous Coward · · Score: 0

      RAID, as defined in the original paper, involves data striping and striping cannot be implemented with less than 2 drives.

      That excludes RAID1. So, you are basically saying that the original paper started at RAID2.

    10. Re:Careful there by Bongo · · Score: 1

      Whilst I run it on bunches of drives, I also use it on single drives when I want to know all data is correct. Backups are great, but silent data corruption, which gets copied to backup, can mess everything up.

    11. Re:Careful there by TheRaven64 · · Score: 1

      I use ZFS on a NAS with a bunch of drives, but I also use it on a hosted VM with under 1GB of RAM on a single (virtual) drive and a few local VMs. The benefits that I'm apparently not getting include:

      • It's trivial to add more storage. If I want to expand a VM, I attach a new virtual disk and simply expand my storage onto it.
      • It's trivial to back up - I can snapshot all of my ZFS filesystems and use zfs send / zfs receive to send incremental snapshots of them to another system (where I can reconstruct all of the snapshots and have the ability to look at earlier configs, or I can simply store the latest one).
      • Block-level checksums mean that I can detect errors early, even if I can't recover from them unless I've set copies=2 or more on a filesystem.
      • I can snapshot a filesystem before I make big config changes, so if I make a mistake I have an easy undo function.
      • I get transparent LZ4 compression, which saves on storage space and bandwidth.
      • I can run jails inside my VMs that use CoW copies of a single system install, so have very low space overhead
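
      For readers who have not used these features, the list above maps roughly onto the commands below. Every pool, dataset, snapshot, device and host name here (tank, tank/data, @sunday, /dev/vdb, backuphost and so on) is a hypothetical placeholder, and the function only prints the commands instead of running them.

      ```shell
      # Dry run: prints the commands only. All names are hypothetical placeholders.
      plan_zfs_features() {
          ds=$1
          # Expand the pool onto a newly attached virtual disk.
          echo "zpool add tank /dev/vdb"
          # Transparent LZ4 compression.
          echo "zfs set compression=lz4 $ds"
          # Keep two copies of each block so checksum errors can be repaired.
          echo "zfs set copies=2 $ds"
          # Snapshot before a big config change, roll back if it goes wrong.
          echo "zfs snapshot $ds@before-change"
          echo "zfs rollback $ds@before-change"
          # Send only the changes between two snapshots to another system.
          echo "zfs send -i $ds@sunday $ds@monday | ssh backuphost zfs receive backup/data"
          # Copy-on-write clone of a base install for a jail.
          echo "zfs clone tank/jails/base@clean tank/jails/www"
      }

      plan_zfs_features tank/data
      ```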
      --
      I am TheRaven on Soylent News
    12. Re:Careful there by thegarbz · · Score: 1

      ZFS wants to live in a fairly specific configuration.

      ZFS wants nothing, but many of its advanced features require certain configuration. You want to run it with 12 drives, 32GB of RAM on a simple file server, go for it, it really shines. You want to run it on a single drive on a system with 2GB of RAM, go for it, there's no downsides there vs any other file system.

      It's really a NAS filesystem, which is why there are no recovery utilities for it.

      There's no recovery utilities because they are rarely needed. The single most common configuration involves redundancy. ZFS's own tools include those required to fix zdb errors and recover data on a block level if you ever get to that stage. The system itself is its own recovery system with native checksumming and error correction, snapshotting, and really simple ways to duplicate / backup pools. It's like complaining that there's no GUIs available for windows because windows fundamentally includes one.

      but you are incurring a bunch of overhead

      The only overheads are copy-on-write overheads. You'll start seeing these with every modern filesystem eventually. The result is that small file writes and deletes become expensive. Standard read and write operations are just as fast as any other competing filesystem.

    13. Re:Careful there by cas2000 · · Score: 1

      Two drives is no big deal - anyone who cares at all about their data always stores it on, at minimum, two mirrored drives.

      Storage costs are effectively doubled but since disk drives are the least reliable part of any computer, mirroring removes the biggest/worst single-point-of-failure on any desktop or laptop or server system.

      (backups are still essential to cope with other potential disasters/failures, but mirroring keeps the system running for the common disk-failure case)

      and if you're going to have RAID-1 on all your drives, you may as well use ZFS rather than mdadm and get all the benefits of ZFS as well (subvolumes aka datasets, zvols, snapshots, error detection and correction, compression, zfs send/receive, and more)

      or if you want to stick with mainline kernel code, use btrfs and get many, but not all, of those benefits (and one really useful one that ZFS doesn't offer, rebalancing)

  8. A step beyond cloud storage: pingfs by Anonymous Coward · · Score: 0

    Why bother storing data on servers when you can store it in the network itself?!
    http://code.kryo.se/pingfs/

    1. Re:A step beyond cloud storage: pingfs by barbariccow · · Score: 1

      Considering ICMP traffic is the lowest priority and frequently dropped in congestion situations, they would have been better off implementing it via their own protocol rather than piggy-backing off of that. I don't suppose I need to explain why the theory is bad, and just how DMCA violations would just stack up using this filesystem..

  9. Not the best fit since it's schizophrenic by raymorris · · Score: 2

    > The problem with ZFS on Linux is that some aspects of it are redundant with the kernel.

    Probably ALL aspects of it. Linux already has a raid implementation in-kernel. It already has filesystems. It already has multiple volume managers, which handle whichever type of snapshots you prefer. It already has IO schedulers. ZFS, or rather something that looks just like it, can be implemented as a few configuration lines for pre-existing Linux components.

    Because Linux normally lets you use your choice of file system on top of your choice of volume manager, on top of whichever RAID implementation you choose, with your choice of IO scheduling options, ZFS isn't exactly the best fit. ZFS mashes all those different things into one big blob. That's not really how Linux is designed.

    That's the same issue as systemd - it may (or may not) be a good init system. It may or may not be a good logging system. It may possibly be a good DNS server (probably not). But it can't seem to decide wtf it is.
     

    1. Re:Not the best fit since it's schizophrenic by DeHackEd · · Score: 1
      ZFS manages the whole stack for a reason. Its first priority is data safety. With checksums everywhere it can detect corruption and repair it.

      That last bit is important. If ZFS doesn't have a way to put its hands into the RAID, it can't attempt to rebuild known corrupted data. Until mdadm and hardware RAID controllers allow you to issue a "read, but try to give a different result" operation you can't do this. (Said operation would attempt to use parity even on a healthy array in an attempt to give a different block content by pretending a disk is dead). BTRFS does the same thing as ZFS - handles the RAID internally, and can repair corruption all on its own.
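
      A minimal sketch of that idea, assuming a two-way mirror and a checksum stored separately from the data (the function names are hypothetical, and SHA-256 stands in for whichever checksum is configured). Because the filesystem knows what the block should hash to, it can ask each mirror side in turn and rewrite the bad one.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def mirror_read(copies, expected):
    """Return (data, repaired_indices) for one block of a mirror.

    `copies` is a mutable list holding each side's version of the block and
    `expected` is the checksum recorded when the block was written.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; nothing to repair from")
    repaired = []
    for i, c in enumerate(copies):
        if c != good:
            copies[i] = good      # rewrite the bad side from the good one
            repaired.append(i)
    return good, repaired


if __name__ == "__main__":
    block = b"important data"
    expected = checksum(block)
    # Side 0 returns wrong data without reporting any I/O error.
    copies = [b"important dat\x00", block]
    data, repaired = mirror_read(copies, expected)
    print(data, "repaired sides:", repaired)
```

      A mirror without the separate checksum sees only two different blocks and has no basis for choosing between them.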

      And yes, "first priority" means exactly that. ZFS has some design decisions that negatively affect performance in the name of data protection.

    2. Re:Not the best fit since it's schizophrenic by Anonymous Coward · · Score: 1

      The ZFS architecture is a very well-factored, layered design with clear abstraction barriers between the different pieces, and it solves a single well-defined problem: managing your storage. Comparing it to systemd is a gross disservice to ZFS.

      Sometimes it really works better to have a monolithic, well thought out abstraction providing some service as opposed to the kernel giving you a handful of ill-fitting subcomponents that you're responsible for gluing together. This is clear from the example of Linux containers, it's just a half-baked mess that will never quite work right. Compared to Solaris zones which had a coherent, secure design from day 1. Or looking to userland programs, you have git: a random collection of tools for debugging the internal data structures which you're expected to build a UI on top of.

    3. Re:Not the best fit since it's schizophrenic by UnknownSoldier · · Score: 5, Insightful

      > Because Linux normally lets you use your choice of file system on top of your choice of volume manager, on top of whichever RAID implementation you choose, with your choice of IO scheduling options, ZFS isn't exactly the best fit. ZFS mashes all those different things into one big blob. That's not really how Linux is designed.

      Criticizing ZFS for "rampant layering violation" has been discussed to death before

      "Dumb" API's, such as the ones implemented in Linux, have a STRICT layered approach like this:

      * Volume Management
      * File Management
      * Block (RAID)

      Problems start when each layer needs information at the layer above it. This is epitomized with the design flaw in hardware RAID via the write-hole. Link to English version

      In contradistinction ZFS takes a holistic, unified approach:

      * Volume Management <--> File Management <--> Block

      e.g.
      The original RAIDZ implementation was written in 599 lines of code in vdev_raidz.c -- less code equals less bugs.
      https://github.com/illumos/ill...

      > That's the same issue as systemd

      No it isn't. You are comparing apples to oranges. ZFS works because it intentionally "Flattened the stack" -- Yes, this runs counter to the layered Unix approach -- but sometimes that is NOT the best design decision.

      Meanwhile Oracle keeps flailing about with Btrfs.

    4. Re:Not the best fit since it's schizophrenic by Anonymous Coward · · Score: 2, Interesting

      ZFS mashes all those different things into one big blob. That's not really how Linux is designed.

      That's because Linux isn't designed, it's grown organically in a hodgepodge fashion. Some people think this is a good thing. Others do not.

      A weblog post by Jeff Bonwick, one of the creators of ZFS, from ten years ago**:

      Andrew Morton has famously called ZFS a "rampant layering violation" because it combines the functionality of a filesystem, volume manager, and RAID controller. I suppose it depends what the meaning of the word violate is. While designing ZFS we observed that the standard layering of the storage stack induces a surprising amount of unnecessary complexity and duplicated logic. We found that by refactoring the problem a bit -- that is, changing where the boundaries are between layers -- we could make the whole thing much simpler.

      https://blogs.oracle.com/bonwick/rampant-layering-violation

      He gives a reasonable answer as to why glomming all that together has its advantages. Good intro slide deck:

      https://wiki.illumos.org/download/attachments/1146951/zfs_last.pdf

      Note that "ZFS" is actually made up of three layers: the SPA (which talks to disks), the DMU (which takes objects and breaks them up into the RAID stripes to send to the SPA), and the ZPL (ZFS POSIX layer, which is your Unix-y file system).

      You can actually link directly to the DMU (which has a userland library) and treat "ZFS" as a pure object store without POSIX semantics. You could also take another file system (ext3/4, UFS, XFS) and plug it into the DMU as well, and treat the lower layers as a replacement for LVM.

      ** Ten years? Holy shit! I remember reading that shortly after it was posted.

    5. Re:Not the best fit since it's schizophrenic by dfghjk · · Score: 0

      "If ZFS doesn't have a way to put its hands into the RAID, it can't attempt to rebuild known corrupted data."

      Nonsense.

      "Until mdadm and hardware RAID controllers allow you to issue a "read, but try to give a different result" operation you can't do this."

      More nonsense.

      "(Said operation would attempt to use parity even on a healthy array in an attempt to give a different block content by pretending a disk is dead)."

      Apparently you believe that redundancy information can't be checked unless hardware provides an option to do it for you.

      You should probably stop talking.

    6. Re:Not the best fit since it's schizophrenic by barbariccow · · Score: 1

      That's irrelevant. The point of raid is that with a redundant configuration it automatically handles errors and rebuilds (best it can). The problem is ZFS tries to implement raid, I/O scheduling, and filesystem as one big blob. If there's a better way discovered to keep data safe, it will be implemented in the COMMON layer, which already has years of theory and practice baked into its stability. It's like writing an application that directly writes pixels to the graphics card. It becomes huge, doesn't interact well with the existing layers present, nor does it benefit from the widespread testing and fixes applied to those layers as issues are discovered.

    7. Re:Not the best fit since it's schizophrenic by DeHackEd · · Score: 1

      Oh it can be "checked" by RAID controllers. The question is, how do you know which copy is correct? In the case of a RAID-1, if the 2 disks don't have identical data, which do you assume is the right data? ZFS has checksums to figure out which is right. MDADM doesn't.

      And if there is an API to allow you to ask for data from a specific disk rather than letting the RAID driver pick one, I'm interested.

    8. Re:Not the best fit since it's schizophrenic by UnknownSoldier · · Score: 1

      > Because Linux normally lets you use your choice of file system on top of your choice of volume manager,

      The problem is: btrfs, exfat, ext3, ext4, fat, jfs, reiserfs, and xfs ALL SUCK -- they all propagated write errors


      FS       | read | write  | silent
      btrfs    | prop | prop   | prop
      exfat    | prop | prop   | ignore
      ext3     | prop | prop   | ignore
      ext4     | prop | prop   | ignore
      fat      | prop | prop   | ignore
      jfs      | prop | ignore | ignore
      reiserfs | prop | prop   | ignore
      xfs      | prop | prop   | ignore

    9. Re:Not the best fit since it's schizophrenic by thegarbz · · Score: 1

      Probably ALL aspects of it.

      That's like saying ext4 is redundant because ext2 exists. Just because parts of ZFS serve the same function as parts in all places of the Linux I/O stack, doesn't make it comparable or redundant.

      Actually the only word I would use is "better" given the feature list.

    10. Re:Not the best fit since it's schizophrenic by jabuzz · · Score: 1

      Unfortunately ZFS does not allow you to detect corruption until it is too late to do anything about other than throw the file away.

      That is you write something to disk with checksums which will allow you to detect that it is corrupt. However unless you read what you have written back immediately you have no idea whether what you have just written actually made it to the storage device intact.

      If you want to make sure what you have just written to the storage device made it there intact, and is still intact when you read it back later then you need DIX/DIF which happens to be file system independent.

  10. Config? by JBMcB · · Score: 1

    Are you doing Z+1? Or just striping with an L2ARC, which is nearly pointless? What's the areal density of the drives? 'Cause if you are using anything above 2TB the odds of getting uncorrectable errors on both drives becomes non-trivial.
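
    A rough back-of-the-envelope for that claim, under loudly stated assumptions: the rate of one unrecoverable read error per 10^14 bits is a commonly quoted consumer-drive spec figure and does not come from this thread, errors are treated as independent, and the result covers one full read of a single drive. Real drives vary widely, so read it as an order-of-magnitude sketch only.

```python
import math


def p_unrecoverable_read(capacity_bytes, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while reading
    the whole drive once, assuming independent errors at `ure_per_bit`
    (an assumed spec-sheet figure, not a measured one)."""
    bits = capacity_bytes * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for terabytes in (1, 2, 4, 8):
        p = p_unrecoverable_read(terabytes * 10**12)
        print(f"{terabytes} TB: {p:.1%}")
```

    With those assumptions a full read of a 2 TB drive comes out at roughly a 15% chance of hitting at least one such error, and the figure climbs quickly with capacity.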

    At this point you are better off using XFS with a really good backup strategy.

    --
    My Other Computer Is A Data General Nova III.
    1. Re:Config? by dfghjk · · Score: 1

      So they say. Don't you find it odd that a drive can't possibly correct for errors but a filesystem can?

      I wonder if drive vendors acknowledge that 100% of their high capacity drives are incapable of functioning without uncorrectable errors. Perhaps they should implement ZFS internally and all problems would be solved.

    2. Re:Config? by barbariccow · · Score: 1

      So they say. Don't you find it odd that a drive can't possibly correct for errors but a filesystem can?

      That's because the filesystem can just write to a different spot on the device, but if a specific spot on the physical device goes bad it's bad. In fact, almost all drives automatically error correct, you can see the stats through utils like "smartctl". A drive generally has +10%-+20% of advertised capacity, and exports a virtual mapping of the drive. As sectors start to show signs of failing, the address is transparently mapped to some of this "extra" space and things continue as normal. It's only a drive-failure scenario when you've mapped the majority of this extra space, or there is a sudden failure (like water damage) versus a worn sector.
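
      The remapping described above can be sketched as a toy model. `RemappingDrive` and its method names are hypothetical and real firmware is far more involved; the point is only that the host keeps using the same logical address while the drive quietly points it at a spare.

```python
class RemappingDrive:
    """Toy model of a drive that hides failing sectors behind spares."""

    def __init__(self, advertised, spares):
        self.advertised = advertised
        self.remap = {}   # logical sector -> spare physical sector
        # Spares live past the end of the advertised range.
        self.free_spares = list(range(advertised, advertised + spares))

    def physical(self, logical):
        """Translate the address the host uses into where data really lives."""
        if not 0 <= logical < self.advertised:
            raise IndexError("sector out of range")
        return self.remap.get(logical, logical)

    def mark_failing(self, logical):
        """Reallocate a worn sector; return False once spares run out."""
        if not self.free_spares:
            return False
        self.remap[logical] = self.free_spares.pop(0)
        return True

    @property
    def reallocated_count(self):
        # Analogous to the reallocation counters that SMART tools report.
        return len(self.remap)


if __name__ == "__main__":
    drive = RemappingDrive(advertised=1000, spares=2)
    drive.mark_failing(42)
    print(drive.physical(42), drive.physical(43), drive.reallocated_count)
```

      Once `mark_failing` starts returning False the drive has nothing left to hide errors behind, which matches the "majority of the extra space is used" failure scenario.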

    3. Re:Config? by Tough+Love · · Score: 1

      the filesystem can just write to a different spot on the device, but if a specific spot on the physical device goes bad it's bad.

      That's not true at all. Modern HDDs can remap sectors, to other tracks if necessary.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
  11. ZFS sure, but what about boot environments? by Anonymous Coward · · Score: 0

    I can't tell you how many times boot environments saved my bacon after a particularly hairy upgrade flopped. I would really like to see that added to Linux.

    1. Re:ZFS sure, but what about boot environments? by barbariccow · · Score: 1

      Why not just have a small partition with a stable system installed on it for recovery purposes? Or a live cd?

  12. What does Oracle ZFS offer at this point? by Anonymous Coward · · Score: 0

    OpenZFS hasn't been static, and has become a very successful cross-platform effort, which may be considered the main ZFS branch at this point, and most of the developers have left Oracle anyway. The introduction of feature flags has enabled a number of native features to be developed in parallel and development is thriving, with a native zfs-crypto implementation awaiting integration, among other goodies.

    The thought of ZFS in the mainline kernel is attractive, but how would it be done without fragmenting the community? If Oracle contributes ZFS as GPL-only, that would be worse than useless, as it would require a break from the larger ZFS community and leave everyone with incompatible pools. Even if Oracle were more willing to reconcile their poor behavior, it isn't clear how the licensing differences with OpenZFS could be resolved, though ideas are welcome.

    1. Re:What does Oracle ZFS offer at this point? by Anonymous Coward · · Score: 0

      Well, I think Oracle's goal with getting a GPLed ZFS into mainline is to effectively hinder the other forks.

    2. Re:What does Oracle ZFS offer at this point? by Anonymous Coward · · Score: 0

      That might be the goal, but OpenZFS is a unified cross-platform effort, not disparate haphazard forks. There is ongoing effort to minimize differences and maintain compatibility across all systems supporting OpenZFS. Most ZFS developers are no longer employed at Oracle, and it seems unlikely that they would abandon years of effort, community, and cross-platform compatibility for a handful of closed-source scraps from a company that has essentially destroyed Sun and all of their good works.

      If the goal is to hobble competition and splinter the community, I'd be surprised if Linus didn't have some choice words for Oracle, and I'd look forward to hearing them.

    3. Re:What does Oracle ZFS offer at this point? by TheRaven64 · · Score: 1

      Add to that, a lot of the most active ZFS developers work on the various OpenSolaris forks. A GPL'd version is completely useless to them. It's also not clear if Oracle even could release a GPL'd version. If they've taken any code from OpenZFS, then their version will include CDDL'd code that they don't own the copyright to, which would make relicensing impossible without replacing all of that code.

      --
      I am TheRaven on Soylent News
  13. Oracle ZFS != ZoL / OpenZFS by Anonymous Coward · · Score: 0

    One nice thing about ZFS not being in upstream is that it is currently maintained and updated separate from the Linux kernel.

    Now, it would be nice to relicense ZFS under GPL so that it can be included in the kernel.

    I think you are a bit confused. You are thinking of ZFS-on-Linux (ZoL) which takes its code from the OpenZFS project (OpenZFS also runs on Illumos/OpenSolaris, BSD, etc.). What the presenter was talking about was Oracle-ZFS, which currently is Oracle Solaris-only.

  14. Btrfs by Jerry · · Score: 1

    I played with zfs-fuse on KDE Neon a couple years ago after reading from its acolytes that it was "more advanced" and "better" than EXT4 or Btrfs. It wasn't. A lot of it is missing in the fuse rendition.

    I switched to Btrfs. I have three 750GB HD's in my laptop. I use one as a receiver of @ and @home backup snapshots. I've configured the other two as a 2 HD pool and then as a RAID1, and then back to a pool again. In 2 1/2 years of using Btrfs I've never had a single hiccup with it.

    There are some excellent posts on the KubuntuForums.net website by Oshunluver which describe how to use Btrfs to install many different distros to a single Btrfs installation, and how to use Btrfs in general.

    --

    Running with Linux for over 20 years!

    1. Re:Btrfs by caseih · · Score: 1

      ZFS fuse is not ZFS on Linux. Not sure why you'd pass judgement on ZFS having only used it years ago with the fuse version. If you want a real test, try the latest ZFS on Linux releases. They are kernel modules not fuse drivers.

      I have run BtrFS for about 5 years now, and I must say it works well on my Laptop with SSD. However on my desktop with spinning disk, it completely falls over. It started out pretty fast for the first few years, but now it's horrible. The slightest disk I/O can freeze my system for long periods of time. I've done a lot of research but haven't found any suggestions for fixing this. I've tried recent kernels, rebalancing, defragging, etc. Nothing had any effect on the I/O load problem. The drive itself is fine. No smart errors, no bad sector relocations, etc. I know the on-disk structure has changed over the years; I'm reasonably certain the on-disk format is the latest. Anyway, I'm tired of waiting for BtrFS. I'm guessing BtrFS has had a lot of issues for others as Red Hat has officially abandoned support for it. It's a shame because on paper BtrFS truly rocks.

      I've talked to a couple of other BtrFS users and they have also had horrible experiences with spinning disks. I'm going to re-install my system here this winter, and it won't be going back to BtrFS. I'll miss write-able snapshots. But Ext4 is much much faster on my spinning disks. I might try ZFS and see how it runs with just 8 GB of RAM.

      I used ZFS for years in the enterprise and it rocked there. I used snapshots heavily for backup and archival purposes. Was awesome, though I hated working with Solaris.

    2. Re:Btrfs by mvdwege · · Score: 1

      Quick question for you: do you have quotas enabled? Updating qgroups takes an enormous amount of time, I had the same symptoms on my laptop on a 1T drive, and turning off quotas and removing qgroups solved it.

      --
      "I know I will be modded down for this": where's the option '-1, Asking for it'?
    3. Re:Btrfs by Anonymous Coward · · Score: 0

      So, which drive is your swap partition on?
      Since as far as I've been able to determine btrfs still doesn't support swap.

    4. Re:Btrfs by thegarbz · · Score: 1

      I played with zfs-fuse on KDE Neon a couple years ago after reading from its acolytes that it was "more advanced" and "better" than EXT4 or Btrfs.

      They should have mentioned no such thing. ZFS-Fuse was a shitty work around to a licensing issue that many people are still arguing may not actually be real. It has effectively been undeveloped for many years and also as a fuse module was not capable of implementing the entire ZFS stack as required.

      Switching to btrfs from zfs-fuse has nothing to do with ZFS itself. You just switched from the worst option to the second best. btrfs is still preferable to ext4 in my opinion, but it doesn't hold a candle to ZFS in performance, maturity, support, and active development.

    5. Re:Btrfs by caseih · · Score: 1

      Oh you had me excited there for a second. But no, alas, quotas and qgroups are not enabled, as near as I can tell.

    6. Re:Btrfs by mvdwege · · Score: 1

      Heh. It's a nice system, and I like it for snapshotting and incremental backups, but yeah, it still has weird hangups here and there.

      --
      "I know I will be modded down for this": where's the option '-1, Asking for it'?
  15. Can he make that decision? by Anonymous Coward · · Score: 0

    My guess is that an engineer is not authorized to make that decision. His comments sound like pure speculation.

  16. New to ZFS by AlanObject · · Score: 3, Informative

    Just as this article popped up I was assembling a JBOD array (twelve 4TB drives) for a new data center project, my first in quite a while. Also self funded so I don't have to defer to anyone in decisions.

    When I started I did a bit of reading trying to decide what RAID hardware to get. To make a long story short once I read the architecture of ZFS and several somewhat-polemic-but-well-reasoned blog entries I decided that is what I wanted.

    Only two months ago I had an aged Dell RAID array let me down. I have no idea what actually happened, but it appears some error crept in one of the drives and it got faithfully spread across the array and there was just no recovering it. If I didn't have good backups that would have been about 12 years of the company's IP up in smoke. I just thought I'd share.

    So I ended up as a prime candidate (with new found distrust for hardware RAID) to be a new ZFS-as-my-main-storage user. I've just recently learned stuff that was well established five years ago and I can't understand why everybody doesn't do it this way.

    Wow. snapshots? I can do routine low-cost snapshots? Data compression? Sane volume management? (I consider LVM to be the crazy aunt in the attic. Part of the family but ...) Old Solaris hands are probably rolling their eyes but this is like manna from heaven to me.

    Given the plethora of benefits I am sure the incentive is high enough to keep ZFS on Linux going onward. ZFS root file system would be nice but I am more than willing to work around that now.

    1. Re:New to ZFS by UnknownSoldier · · Score: 1

      > Only two months ago I had an aged Dell RAID array let me down. I have no idea what actually happened, but it appears some error crept in one of the drives and it got faithfully spread across the array and there was just no recovering it. If I didn't have good backups that would have been about 12 years of the company's IP up in smoke. I just thought I'd share.

      It may have been the RAID write hole ?

      See Page 17

    2. Re:New to ZFS by Drew+Sullivan · · Score: 1

      I have a similar configuration at home. zfs send/recv is a godsend for backups in that you can have all of the old snapshots sent as well as the current top level and it ships only the data that has changed, not everything.

      I have run this configuration where I have had controllers, power supplies, multiple drives (more than 2 at the same time) go bad and it still kept on chugging with no errors and full confidence in the data.

      --
      -- Linux Consultant
    3. Re:New to ZFS by JoeRandomHacker · · Score: 1

      You may also want to take a look at btrfs. It sounds like a match for the feature set that interests you, and it is already available on Linux.

    4. Re:New to ZFS by Anonymous Coward · · Score: 0

      Given the plethora of benefits I am sure the incentive is high enough to keep ZFS on Linux going onward. ZFS root file system would be nice but I am more than willing to work around that now.

      I regularly use this Debian on ZFS root script.

    5. Re:New to ZFS by thegarbz · · Score: 1

      ZFS is also "already available" on Linux and has been for several years. By comparison btrfs is still in diapers, and currently support has been dropped by all major Linux vendors save for SUSE, and whatever the fuck Oracle is doing in the Linux world right now (trying to appear relevant).

      ZFS is more mature and in far more active development.

    6. Re:New to ZFS by pnutjam · · Score: 1

      Only Red Hat has dropped support for btrfs. Mainly because they use a patchwork kernel that is really old.

    7. Re:New to ZFS by AlanObject · · Score: 1

      It may have been the RAID write hole ?

      I was wondering if that is what it was, but with the stress of having a major file server down I just couldn't justify the hours it would take to a) learn how to diagnose it and then b) do an analysis. That system had only the one VM left on it so I was just happy enough to take the latest VM image and put it on another hypervisor.

      One drive was making ugly noises so maybe (probably) a head crash. The confident product theory of hardware RAID is that it shouldn't have mattered: the remaining good drive(s) should have just continued service, but they didn't. Fortunately I never fell into the trap of i-got-hardware-raid-so-i-don't-need-backup. In my mind that is about the same level of false confidence as a drunk has getting behind the wheel.

    8. Re:New to ZFS by thegarbz · · Score: 1

      That would be insignificant if anyone else other than SUSE was throwing anything behind btrfs. btrfs seems to be losing favour ever since Ubuntu decided to change their roadmap from potentially including btrfs as a default to declaring it outright experimental with the current roadmap favouring zfs as the future default.

      By support being dropped I don't mean technical support or official support, I mean that the major vendors (other than SUSE) are no longer supporting the idea of btrfs becoming the next gen default filesystem for Linux. At least Ubuntu is doing something, Debian placed their arse firmly on the fence, Google have dropped the idea of btrfs on Android due to the lack of native encryption so that was a feature based decision, but Red Hat's drop back to XFS seems particularly bizarre. Mind you, the entire Red Hat situation in general is, given that Oracle themselves also favour btrfs on their Linux distribution, which is built on Red Hat's kernel.

      It used to be that btrfs was the future filesystem while ZFS was a license encumbered also ran. Now that has flipped around, and Red Hat's decision is just one of a couple of votes of no confidence.

    9. Re:New to ZFS by rl117 · · Score: 1

      You can use ZFS on the root filesystem. I'm writing this on a system which has been using ZFS on root for 18 months now (since Ubuntu 16.04, upgraded through to 17.10 without a hitch).

    10. Re:New to ZFS by Anonymous Coward · · Score: 0

      FreeBSD supports ZFS on root.

    11. Re:New to ZFS by pnutjam · · Score: 1

      btrfs is still the future, Red Hat and Ubuntu are not cutting edge distros, they are focusing on stability.

    12. Re:New to ZFS by thegarbz · · Score: 1

      btrfs is still the future

      For whom?

      Dismissing the two biggest players in the industry as not cutting edge, despite the fact that they aren't abandoning it out of conservatism (Ubuntu isn't anyway) doesn't paint it as "the future".

      Especially given that ZFS is further in development, more actively developed and has a more advanced roadmap, I question if btrfs is the same kind of "future" as Clean Coal or a slightly more efficient car, etc. If btrfs is the future, you're going to have a hard time convincing people of it.

    13. Re:New to ZFS by pnutjam · · Score: 1

      Show me the distribution that is shipping zfs as a core component. I can show you the one shipping btrfs, opensuse.

    14. Re:New to ZFS by thegarbz · · Score: 1

      Ubuntu.

      But nice attempt to change the focus of the discussion. Remember the word I used over and over again? "Future". Now please scroll back to the start and read that entire thread over again.

  17. It's called scrubbing, and RAID has always done it by raymorris · · Score: 1

    > Until mdadm and hardware RAID controllers allow you to issue a "read, but try to give a different result" operation you can't do this. (Said operation would attempt to use parity even on a healthy array in an attempt to give a different block content by pretending a disk is dead).

    So until the late 1980s? That's called RAID scrubbing and I believe it was mentioned toward the end of the original RAID paper in 1987 or 1988. Certainly 10 years ago I had a "mdadm check" command in my crontab. I know this for sure because I still have a copy of my 2007 server image.

    The "mdadm repair" command was also in use by then.
    Cool "new feature" you've got there.

    I'll respond to your other two gross misunderstandings about raid by replying to your other post.

  18. Heard of RAID levels 2 through 6? by raymorris · · Score: 1

    > ZFS has checksums to figure out which is right. MDADM doesn't.

    You have no idea how RAID works, do you? Neither through the mdadm UI or any other.

    RAID level 2 uses Hamming error correction codes.
    Levels 3 through 5 use checksums much like ZFS does. Level 6 uses two independent sets of checksums, so even if you lose half your checksums, you're still okay.

    > if there is an API to allow you to ask for data from a specific disk rather than letting the RAID driver pick one, I'm interested.

    An API to read from sda? Uhm, it's called read(). You very simply read from sda or whichever drive rather than reading from md0. That's how you can boot from a RAID 1 partition without the BIOS or bootloader knowing anything about RAID - it just reads from any of the member disks.

    1. Re:Heard of RAID levels 2 through 6? by DeHackEd · · Score: 1

      Again, wrong. RAID-2 might have ECC, but mdadm doesn't support it. You got RAID-1, 5 and 6 (4 is identical to 5 except that 5 distributes the parity rather than keeping it on a single disk). But that's not a checksum, it's parity. It recovers from a drive outright failing, not from errors returning incorrect data but not detected as bad. I have seen it happen. RAID-5 can only tell you there's an inconsistency, not which disk has the bad data. The RAID controller's consistency check usually just updates the parity under the assumption that parity is at fault if that happens.
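
      A small sketch of the difference, assuming plain single-parity XOR over a three-disk stripe (the helper names and block contents are made up for illustration). When you know which disk is gone, parity rebuilds it. When a disk silently returns wrong data, parity only says the stripe is inconsistent, and every "disk i is the liar" hypothesis yields a self-consistent stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def is_consistent(stripe, parity):
    return xor_blocks(stripe) == parity


def rebuild(stripe, parity, missing):
    """Recompute the block at index `missing` from the others plus parity."""
    others = [b for i, b in enumerate(stripe) if i != missing]
    return xor_blocks(others + [parity])


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Case 1: disk 1 is known dead, so parity recovers it exactly.
    print(rebuild(data, parity, missing=1) == data[1])

    # Case 2: disk 1 silently returns wrong data and reports no error.
    stripe = [data[0], b"BxBB", data[2]]
    print(is_consistent(stripe, parity))   # the scrub notices a mismatch
    for suspect in range(3):
        fixed = list(stripe)
        fixed[suspect] = rebuild(stripe, parity, suspect)
        # Each hypothesis produces a stripe that passes the parity check.
        print(suspect, is_consistent(fixed, parity))
```

      A per-block checksum kept outside the stripe is what breaks the tie: only one of those candidate stripes would also match it.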

    2. Re:Heard of RAID levels 2 through 6? by Anonymous Coward · · Score: 0

      uses Hamming error correction codes.

      You have no idea how checksumming works. Maybe pick your answers in a way that actually addresses the discussion.

  19. Note "HOPES" by Torp · · Score: 1

    He hopes that... but he has no decision power, i bet. Maybe he's on the next firing list.

    This is Oracle that we're talking about, it's more likely they'll let you license ZFS for a couple thousand per month...

    --
    I apologize for the lack of a signature.
  20. Drawback of separate development by DrYak · · Score: 2

    One nice thing about ZFS not being in upstream is that it is currently maintained and updated separate from the Linux kernel.

    And that's actually a huge problem that makes it a major obstacle to its upstream adoption.
    Mainly due to code duplication.

    ZFS (and its competitor BTRFS) is peculiar, because it's not just a filesystem. It's a whole integrated stack that includes a filesystem layer on top, but also a volume management and replication layer underneath (ZFS and BTRFS on their own are the equivalent of a full EXT4 + LVM + MDADM stack).

    That is a necessity, due to some of their features: e.g. the checksumming going on in the filesystem layer is also useful for determining which copies are correct in case of bitrot in the replication layer.
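
    As a rough illustration of why the two layers need to talk to each other, here is a toy mirrored read in Python. It is a sketch under assumptions, not ZFS or BTRFS code: SHA-256 stands in for whatever checksum the filesystem actually stores, and `read_mirrored` is a made-up helper.

```python
import hashlib

def checksum(block):
    """Stand-in for the checksum the filesystem layer keeps for a block."""
    return hashlib.sha256(block).digest()

def read_mirrored(copies, expected):
    """Return the first copy matching the stored checksum, plus the bad indices."""
    bad = [i for i, copy in enumerate(copies) if checksum(copy) != expected]
    for copy in copies:
        if checksum(copy) == expected:
            return copy, bad
    raise IOError("no copy matches the stored checksum")

original = b"important data"
stored = checksum(original)      # recorded by the filesystem layer at write time

# The replication layer holds two copies; copy 0 has bit-rotted silently.
copies = [b"important dat\x00", original]
good, bad = read_mirrored(copies, stored)
```

    A plain mirror without the stored checksum would see two differing copies and have no way to tell which one to trust.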

    But how this is handled is the big difference between ZFS and BTRFS.

    ZFS on Linux just packs all the needed bits together with it.
    It comes with its own volume management and replication code.
    That is a duplicate of functionality existing elsewhere in the kernel.
    And duplication is always bad for maintenance.

    BTRFS, being developed on Linux, tries to leverage as much existing kernel code as possible:
    - the Zstd compression currently being introduced to BTRFS uses the same routines as the Zstd compression being introduced into the kernel loader: both leverage the in-kernel compression facilities of the crypto modules
    - the device mapper facilities are used by lvm, mdadm and dmraid but also by btrfs. There was a plan to develop code to support more than 2 parity blocks (more than RAID6), which would have been beneficial to both btrfs and mdadm.

    That's why developers complain about boundary/layering violations with ZFS but not with BTRFS.
    ZFS comes with its own tangled mess of layers, BTRFS is just a wrapper around facilities already existing in-kernel.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Drawback of separate development by the_B0fh · · Score: 1

      Don't worry about it. One day, systemd will manage it all.

  21. ZFS vs BTRFS by DrYak · · Score: 2

    In contradistinction ZFS takes a holistic, unified approach:

    * Volume Management <--> File Management <--> Block

    {...}

    ZFS works because it intentionally "Flattened the stack" -- Yes, this runs counter to the layered Unix approach

    The problem is that ZFS implements this by rolling everything into the same "rampant layering violation" package.
    It is one single "flattened stack".

    On the other hand, BTRFS shares as much code as possible with in-kernel facilities (it leverages "device mapper" routines that are also used by lvm, mdadm, mdraid, etc.; it leverages in-kernel compression routines that are also used by the kernel loader and the crypto module, etc.)
    It's not so much a "rampant layering violation" as a wrapper around layered facilities already existing in the kernel.

    -- but sometimes that is NOT the best design decision.

    So basically, the problem isn't the overall design, but actual code re-use vs. re-write.

    Meanwhile Oracle keeps flailing about with Btrfs.

    Btrfs works. It's in the kernel, it's a first-class filesystem in openSUSE, and its copy-on-write facilities are extensively used for versioning with snapper.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  22. Scrub of death by emil · · Score: 1
    1. Re:Scrub of death by thegarbz · · Score: 1

      Why? All the articles you link to describe a single failure mode, which is not only theoretical but can also be avoided by simply not scrubbing the pool. No one is forcing you to do that, and you can run ZFS just as happily as any other file system with non-ECC RAM and still get some of the benefits, including the filesystem potentially alerting you to failing RAM rather than silently screwing your system as it would with any other filesystem.

    2. Re:Scrub of death by thegarbz · · Score: 1

      Correction: I have sat down and read all your links in detail. All of the claims that ZFS scrubbing will destroy your pool on non-ECC RAM are actually garbage: they don't take into account the actual failure mechanism of the RAM, or the response of the scrub, which is to leave data untouched if an unfixable error occurs. So scrub away.

      GP was right, there are no special hardware requirements for ZFS and you should have no problem letting him administer your sensitive data.
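
      The scrub response described above (repair when a good copy exists, otherwise report and leave everything alone) can be sketched as follows. This is a toy model in Python under stated assumptions, not the actual ZFS scrub code: CRC32 stands in for the real block checksum, and `scrub_block` and its status strings are invented names.

```python
import zlib

def scrub_block(copies, expected_crc):
    """Verify every copy of a block; repair from a good copy if one exists.

    Returns (copies_after_scrub, status). If no copy verifies, the copies
    are returned unchanged and the block is only reported.
    """
    good = next((c for c in copies if zlib.crc32(c) == expected_crc), None)
    if good is None:
        return list(copies), "unrecoverable"
    status = "ok" if all(c == good for c in copies) else "repaired"
    return [good] * len(copies), status

block = b"payload"
crc = zlib.crc32(block)

healthy = scrub_block([block, block], crc)
one_bad = scrub_block([b"paylaod", block], crc)
all_bad = scrub_block([b"paylaod", b"pay1oad"], crc)
```

      In the last case the scrub has nothing trustworthy to copy from, so the sketch changes nothing and only flags the block.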

  23. Antergos native ZFS for the root filesystem by emil · · Score: 1

    Unfortunately, it's not quite there. Very close though.

    https://antergos.com/wiki/miscellaneous/zfs-under-antergos/

  24. Btrfs works? No it most certainly does not. by emil · · Score: 1

    Just ask SUSE:

    "we are still refusing to support 'Automatic Defragmentation', 'In-band Deduplication' and higher RAID levels, because the quality of these options is not where it ought to be"

  25. Feature set by DrYak · · Score: 1

    Just ask SUSE:

    Just learn to read the docs if you insist on having esoteric options turned on.
    In 2017, RAID56 are still marked incomplete.

    Modern filesystems are a huge pile of diverse features: some are fully stable and used in production (e.g. RAID0 and 1), others are still in development (e.g. RAID56).
    Complaining that BTRFS is complete crap because RAID5/6 isn't fully functional and production-ready is like complaining that the venerable XFS is utter crap because its copy-on-write and snapshotting don't work yet.

    (and BTW, in-band deduplication doesn't even exist yet in BTRFS. ZFS is supposed to have it, but it is an ultra-massive performance killer from what I've heard)
    (auto-defrag works, but is a write-performance killer. The alternatives are no defrag at all, which is a read-performance killer, or out-of-band defrag, which requires maintenance and kills snapshot correlation.
    All these are problems specific to copy-on-write (ZFS, BTRFS) and log-structured (UDF, F2FS) filesystems)

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  26. Which docs? by emil · · Score: 1
  27. Errors by JBMcB · · Score: 1

    A drive can correct for errors if a block is bad. The problem is that, as areal densities increase, the odds of data changing randomly increase too. This is mainly due to cosmic rays or other natural sources of radiation, but there can be other factors. The drive doesn't know anything about the data itself; it only knows whether it can read a block or not, and that's really the way you want it. You want the drive to be structure and data agnostic. Otherwise you would need a specific drive for a specific file system, which would be a nightmare.

    --
    My Other Computer Is A Data General Nova III.
  28. I still like BTRFS way better by cmaurand · · Score: 1

    BTRFS is the future. ZFS is an incredible memory hog.