Oracle Engineer Talks of ZFS File System Possibly Still Being Upstreamed On Linux (phoronix.com)

Posted by BeauHD on Wednesday October 25, 2017 @10:00AM from the partly-cloudy-with-a-chance-of-rain dept.

New submitter fstack writes: Senior software architect Mark Maybee, who has been working at Oracle/Sun since '98, says maybe we "could" still see ZFS be a first-class upstream Linux file-system. He spoke at the annual OpenZFS Developer Summit about how Oracle's focus has shifted to the cloud and how they have reduced investment in Solaris. He admits that Linux rules the cloud. Among the Oracle engineer's hopes is that ZFS becomes a "first class citizen in Linux," and to do so Oracle should port their ZFS code to Oracle Linux and then upstream the file-system to the Linux kernel, which would involve relicensing the ZFS code.

Having it NOT be in upstream is more flexible by ZorinLynx · 2017-10-25 10:03 · Score: 5, Insightful

One nice thing about ZFS not being in upstream is that it is currently maintained and updated separate from the Linux kernel.
Now, it would be nice to relicense ZFS under GPL so that it can be included in the kernel. But this should wait until the port is a bit more mature. Right now development is very active on ZFS and we have new versions coming out every few weeks; having to coordinate this with kernel releases will complicate things.
All this said, relicensing ZFS would definitely help Oracle redeem themselves a bit. After mercilessly slaughtering Sun after acquiring them, they have a long way to go to get from the "evil" side back to the forces of good.
1. Re:Having it NOT be in upstream is more flexible by Neo-Rio-101 · 2017-10-25 10:30 · Score: 2
  
  > Now, it would be nice to relicense ZFS under GPL so that it can be included in the kernel. But this should wait until the port is a bit more mature. Right now development is very active on ZFS and we have new versions coming out every few weeks; having to coordinate this with kernel releases will complicate things.
  Funny, I thought ZFS was very mature by now.
  Getting it open and into Linux would result in perhaps some cross-pollination between OpenZFS and Oracle's official ZFS.
  
  --
  READY.
  PRINT ""+-0
2. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · 2017-10-25 10:47 · Score: 3, Insightful
  
  Oracle is evil ... period. There is no going back.
3. Re:Having it NOT be in upstream is more flexible by JBMcB · 2017-10-25 10:59 · Score: 5, Interesting
  
  > Funny, I thought ZFS was very mature by now.
  It's very mature, on Solaris. Linux has a different ABI to the storage layer, and different requirements on how filesystems are supposed to behave. So it's not so much a port as a re-implementation.
  
  --
  My Other Computer Is A Data General Nova III.
4. Re:Having it NOT be in upstream is more flexible by Anonymous Coward · 2017-10-25 11:02 · Score: 2, Insightful
  
  I don't believe this is Oracle's better nature or whatever; ZFS has to transition from Solaris to Linux because Solaris is dead.
  It's really that simple. If Oracle can gin up a little excitement and maybe score some kudos then great, why not? But ultimately this has to happen or the official Oracle developed ZFS will die with its only official platform.
5. Re:Having it NOT be in upstream is more flexible by Aaden42 · 2017-10-26 01:52 · Score: 2
  
  OpenZFS and Oracle ZFS have diverged a bit. The on-disk pool contains a version number which identifies with certainty whether you can import it on a given implementation, so there's at least no chance of mistaken mis-importing & data loss from that. They're interoperable for pools that aren't upgraded past the highest pool version supported in the final CDDL release of Oracle ZFS. Beyond that, they won't work.
  Oracle ZFS has since added file-level encryption. The encryption and the on-disk structure aren't readable by OpenZFS. OpenZFS has incremented the pool version number by a large jump (to 5000) past the last Oracle ZFS version and has fixed & enhanced some things in such a way that the on-disk format isn't compatible with Oracle ZFS. For info about OpenZFS version & feature flags, see http://open-zfs.org/wiki/Featu...
  I don't think it would take a tremendous amount of effort to merge the functionality one way or the other if the licensing issues were solved, but they're definitely not on-disk compatible if you're running the latest pool version supported by either release.
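The import rule described in this comment can be sketched as a small check. This is only a toy model, not ZFS code: `Implementation`, `can_import`, the feature names, and the Oracle maximum version are illustrative assumptions. The value 28 is, to the best of my knowledge, the last pool version from the open CDDL release, and 5000 is the OpenZFS feature-flags version mentioned above. The sketch also ignores read-only imports.

```python
from dataclasses import dataclass

LAST_SHARED_VERSION = 28      # assumed: last pool version in the open CDDL release
FEATURE_FLAGS_VERSION = 5000  # OpenZFS pool version once feature flags are in use


@dataclass(frozen=True)
class Implementation:
    """Hypothetical description of what one ZFS implementation understands."""
    name: str
    max_legacy_version: int              # highest numbered pool version it can read
    supports_feature_flags: bool = False
    features: frozenset = frozenset()    # feature flags it knows about


def can_import(pool_version, pool_features, impl):
    """Return True if `impl` can import a pool with this version and feature set."""
    if pool_version == FEATURE_FLAGS_VERSION:
        # Feature-flag pools: every active feature must be known to the importer.
        return impl.supports_feature_flags and frozenset(pool_features) <= impl.features
    # Legacy numbered pools: importable only up to the highest version understood.
    return pool_version <= impl.max_legacy_version


# Example implementations; the Oracle number is a placeholder, not a real figure.
openzfs = Implementation("OpenZFS", LAST_SHARED_VERSION, True,
                         frozenset({"async_destroy", "lz4_compress"}))
oracle = Implementation("Oracle ZFS", 40)

shared_ok = can_import(28, [], openzfs) and can_import(28, [], oracle)
oracle_only = can_import(30, [], oracle) and not can_import(30, [], openzfs)
openzfs_only = (can_import(5000, ["lz4_compress"], openzfs)
                and not can_import(5000, ["lz4_compress"], oracle))
```

A pool left at the shared version imports on both sides; once it is upgraded on either side, the other implementation refuses it rather than misreading it.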
Re:Careful there by dnaumov · 2017-10-25 10:37 · Score: 4, Insightful

> ZFS wants to live in a fairly specific configuration. It wants a bunch of drives, a bunch of memory, and not much competition for system resources.
Except for the part where it works with 2 drives, on a system with 4GB of RAM and under constant heavy load just fine.
Not the best fit since it's schizophrenic by raymorris · 2017-10-25 10:47 · Score: 2

> The problem with ZFS on Linux is that some aspects of it are redundant with the kernel.
Probably ALL aspects of it. Linux already has a raid implementation in-kernel. It already has filesystems. It already has multiple volume managers, which handle whichever type of snapshots you prefer. It already has IO schedulers. ZFS, or rather something that looks just like it, can be implemented as a few configuration lines for pre-existing Linux components.
Because Linux normally lets you use your choice of file system on top of your choice of volume manager, on top of whichever RAID implementation you choose, with your choice of IO scheduling options, ZFS isn't exactly the best fit. ZFS mashes all those different things into one big blob. That's not really how Linux is designed.
That's the same issue as systemd - it may (or may not) be a good init system. It may or may not be a good logging system. It may possibly be a good DNS server (probably not). But it can't seem to decide wtf it is.
1. Re:Not the best fit since it's schizophrenic by UnknownSoldier · 2017-10-25 11:30 · Score: 5, Insightful
  
  > Because Linux normally lets you use your choice of file system on top of your choice of volume manager, on top of whichever RAID implementation you choose, with your choice of IO scheduling options, ZFS isn't exactly the best fit. ZFS mashes all those different things into one big blob. That's not really how Linux is designed.
  Criticizing ZFS for "rampant layering violation" has been discussed to death before.
  "Dumb" APIs, such as the ones implemented in Linux, have a STRICT layered approach like this:
  * Volume Management
  * File Management
  * Block (RAID)
  Problems start when each layer needs information at the layer above it. This is epitomized with the design flaw in hardware RAID via the write-hole.
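For readers unfamiliar with the write hole, here is a minimal sketch in Python. It assumes a toy two-data-disk stripe with XOR parity; the block contents and the helper `xor_blocks` are made up for illustration and do not model any particular RAID controller.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks (how RAID-5 style parity is formed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A consistent stripe: two data blocks plus their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Sanity check: with consistent parity, a lost d1 can be rebuilt exactly.
rebuilt_ok = xor_blocks(d0, parity)

# Now d0 is overwritten in place, and power is lost before the parity update lands.
d0_new = b"CCCC"
stale_parity = parity  # parity on disk still describes the OLD d0

# Later the disk holding d1 dies; the RAID layer rebuilds it from what is left.
rebuilt_bad = xor_blocks(d0_new, stale_parity)

print(rebuilt_ok == d1)   # consistent stripe rebuilds correctly
print(rebuilt_bad == d1)  # stale parity silently yields wrong data
```

The block layer cannot notice the second case on its own, because it has no record of which write was in flight and no checksum from the layer above to compare against.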
  In contradistinction ZFS takes a holistic, unified approach:
  * Volume Management <--> File Management <--> Block
  e.g.
  The original RAIDZ implementation was written in 599 lines of code in vdev_raidz.c -- less code equals less bugs.
  https://github.com/illumos/ill...
  > That's the same issue as systemd
  No, it isn't. You are comparing apples to oranges. ZFS works because it intentionally "Flattened the stack" -- Yes, this runs counter to the layered Unix approach -- but sometimes that is NOT the best design decision.
  Meanwhile Oracle keeps flailing about with Btrfs.
2. Re:Not the best fit since it's schizophrenic by Anonymous Coward · 2017-10-25 12:16 · Score: 2, Interesting
  
  > ZFS mashes all those different things into one big blob. That's not really how Linux is designed.
  That's because Linux isn't designed, it's grown organically in a hodgepodge fashion. Some people think this is a good thing. Others do not.
  A weblog post by Jeff Bonwick, one of the creators of ZFS, from ten years ago**:
  
  > Andrew Morton has famously called ZFS a "rampant layering violation" because it combines the functionality of a filesystem, volume manager, and RAID controller. I suppose it depends what the meaning of the word violate is. While designing ZFS we observed that the standard layering of the storage stack induces a surprising amount of unnecessary complexity and duplicated logic. We found that by refactoring the problem a bit -- that is, changing where the boundaries are between layers -- we could make the whole thing much simpler.
  https://blogs.oracle.com/bonwick/rampant-layering-violation
  He gives a reasonable answer as to why glomming all that together has its advantages. Good intro slide deck:
  https://wiki.illumos.org/download/attachments/1146951/zfs_last.pdf
  Note that "ZFS" is actually made up of three layers: the SPA (which talks to disks), the DMU (which takes objects and breaks them up into the RAID stripes to send to the SPA), and the ZPL (ZFS POSIX layer, which is your Unix-y file system).
  You can actually link directly to the DMU (which has a userland library) and treat "ZFS" as a pure object store without POSIX semantics. You could also take another file system (ext3/4, UFS, XFS) and plug it into the DMU as well, and treat the lower layers as a replacement for LVM.
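The layering described here can be pictured with a toy model. The classes below (`ToySPA`, `ToyDMU`, `ToyZPL`) are hypothetical stand-ins written for this illustration; they are not the real ZFS interfaces and leave out checksums, transactions, and redundancy entirely.

```python
class ToySPA:
    """Stand-in for the pool layer: allocates and stores small fixed-size blocks."""
    BLOCK_SIZE = 4

    def __init__(self):
        self._blocks = {}
        self._next_addr = 0

    def write_block(self, data):
        addr = self._next_addr
        self._next_addr += 1
        self._blocks[addr] = data
        return addr

    def read_block(self, addr):
        return self._blocks[addr]


class ToyDMU:
    """Stand-in for the object layer: an object is a list of block addresses."""

    def __init__(self, spa):
        self._spa = spa
        self._objects = {}
        self._next_id = 1

    def create(self, payload):
        size = self._spa.BLOCK_SIZE
        chunks = [payload[i:i + size] for i in range(0, len(payload), size)]
        obj_id = self._next_id
        self._next_id += 1
        self._objects[obj_id] = [self._spa.write_block(c) for c in chunks]
        return obj_id

    def read(self, obj_id):
        return b"".join(self._spa.read_block(a) for a in self._objects[obj_id])


class ToyZPL:
    """Stand-in for the POSIX layer: maps path names onto object ids."""

    def __init__(self, dmu):
        self._dmu = dmu
        self._names = {}

    def write_file(self, path, data):
        self._names[path] = self._dmu.create(data)

    def read_file(self, path):
        return self._dmu.read(self._names[path])


spa = ToySPA()
dmu = ToyDMU(spa)
fs = ToyZPL(dmu)

fs.write_file("/etc/motd", b"hello world")   # file-system style access
obj = dmu.create(b"raw object, no path")     # object-store style access, no POSIX layer
```

Both access styles share the same lower layers, which is the sense in which the object layer can be used without the file-system layer on top.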
  ** Ten years? Holy shit! I remember reading that shortly after it was posted.
New to ZFS by AlanObject · 2017-10-25 14:45 · Score: 3, Informative

Just as this article popped up I was assembling a JBOD array (twelve 4TB drives) for a new data center project, my first in quite a while. Also self funded so I don't have to defer to anyone in decisions.
When I started I did a bit of reading trying to decide what RAID hardware to get. To make a long story short once I read the architecture of ZFS and several somewhat-polemic-but-well-reasoned blog entries I decided that is what I wanted.
Only two months ago I had an aged Dell RAID array let me down. I have no idea what actually happened, but it appears some error crept in one of the drives and it got faithfully spread across the array and there was just no recovering it. If I didn't have good backups that would have been about 12 years of the company's IP up in smoke. I just thought I'd share.
So I ended up as a prime candidate (with new found distrust for hardware RAID) to be a new ZFS-as-my-main-storage user. I've just recently learned stuff that was well established five years ago and I can't understand why everybody doesn't do it this way.
Wow. Snapshots? I can do routine low-cost snapshots? Data compression? Sane volume management? (I consider LVM to be the crazy aunt in the attic. Part of the family but ...) Old Solaris hands are probably rolling their eyes but this is like manna from heaven to me.
Given the plethora of benefits I am sure the incentive is high enough to keep ZFS on Linux going onward. ZFS root file system would be nice but I am more than willing to work around that now.
Drawback of separate development by DrYak · 2017-10-25 22:42 · Score: 2

> One nice thing about ZFS not being in upstream is that it is currently maintained and updated separate from the Linux kernel.
And that's actually a huge problem that makes it a major obstacle to its upstream adoption.
Mainly due to code duplication.
ZFS (and its competitor BTRFS) is peculiar, because it's not just a filesystem. It's a whole integrated stack that includes a filesystem layer on the top, but also a volume management and replication layer underneath (ZFS and BTRFS on their own are the equivalent of a full EXT4 + LVM + MDADM stack).
That is a necessity, due to some features in these: e.g. the checksumming going on in the filesystem layer is also useful to determine correct copies in case of bitrot in the replication layer.
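A minimal sketch of that idea, assuming a two-way mirror and a checksum stored separately from the data. SHA-256 is used here only because it is in the Python standard library; the helper names `checksum` and `self_healing_read` are made up for this example.

```python
import hashlib


def checksum(data):
    """Checksum kept by the filesystem layer, apart from the data block itself."""
    return hashlib.sha256(data).digest()


def self_healing_read(copies, expected):
    """Return a copy that matches the checksum and overwrite any that do not."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise ValueError("every copy fails checksum verification")
    for i, copy in enumerate(copies):
        if copy != good:
            copies[i] = good  # repair the rotten copy from the verified one
    return good


block = b"important data"
expected = checksum(block)

# Two mirrored copies; the first one has suffered silent corruption.
mirror = [b"important dat\x00", block]
result = self_healing_read(mirror, expected)
```

A mirror layer without access to the checksum sees two copies that disagree and has no way to know which one is correct; with the checksum, the choice is unambiguous.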
But how this is handled is the big difference between ZFS and BTRFS.
ZFS on Linux just packs all the needed bits together with it.
It comes with its own volume management and replication code.
That is a duplicate of functionality existing elsewhere in the kernel.
And duplication is always bad for maintenance.
BTRFS, being developed on Linux, tries to leverage as much as possible:
- the Zstd compression currently being introduced to BTRFS, uses the same routines as the Zstd compression being introduced into the kernel loader : both leverage the in-kernel compression facilities of the crypto modules
- the device mapper facilities are used by lvm, mdadm and dmraid but also by btrfs. There was a plan to develop code to support more than 2 parity blocks (more than RAID6), that would have been beneficial to both btrfs and mdadm.
That's why developers complain of boundaries/layers violation with ZFS but not about BTRFS.
ZFS comes with its own tangled mess of layers, BTRFS is just a wrapper around facilities already existing in-kernel.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
ZFS vs BTRFS by DrYak · 2017-10-25 22:52 · Score: 2

> In contradistinction ZFS takes a holistic, unified approach:
> * Volume Management <--> File Management <--> Block
> {...}
> ZFS works because it intentionally "Flattened the stack" -- Yes, this runs counter to the layered Unix approach
The problem is that ZFS implements this by rolling everything in the same "rampant layering violation" package.
It is one single "flattened stack".
On the other hand, BTRFS shares as much code as possible with in-kernel facilities (it leverages "device mapper" routines that are also used by lvm, mdadm, dmraid, etc.; it leverages in-kernel compression routines that are also used by the kernel loader and the crypto module, etc.)
It's not as much a "rampant layering violation" as a wrapper against layer facilities already existing in kernel.

> -- but sometimes that is NOT the best design decision.
So basically, the problem isn't the overall design, but the actual code re-use vs. re-write.

> Meanwhile Oracle keeps flailing about with Btrfs.
Btrfs works. It's in the kernel, it's a first-class filesystem in openSUSE, and its copy-on-write facilities are extensively used for versioning with snapper.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]