Ubuntu Plans To Make ZFS File-System Support Standard On Linux
An anonymous reader writes: Canonical's Mark Shuttleworth revealed today that they're planning to make ZFS standard on Ubuntu. They are planning to include ZFS file-system as "standard in due course," but no details were revealed beyond that. However, ZFS On Linux contributor Richard Yao has said they do plan on including it in their kernel for 16.04 LTS and the GPL vs. CDDL license worries aren't actually a problem. Many Linux users have been wanting ZFS on Linux, but aside from the out of tree module there hasn't been any luck in including it in the mainline kernel or with tier-one Linux distributions due to license differences.
is anything like "ZFS will be the default". He just said that it would be in the distro.
-73, de n1ywb
www.n1ywb.com
It's really quite simple. ZFS is a great filesystem. It's reliable, performant, featureful, and very well documented. Btrfs has a subset of the ZFS featureset, but fails on all the other counts. It has terrible documentation and it's one of the least reliable and least performant filesystems I've ever used. Having used both extensively over several years, and hammered both over long periods, I've suffered repeated Btrfs data loss and performance problems. ZFS, on the other hand, has worked well from day one, and I've yet to experience any problems. Neither is as fast as ext4 on single discs, but you're getting resilience and reliability, not raw speed, and it scales well as you add more discs; exactly what I want for storing my data. And having a filesystem which works on several operating systems has a lot of value. I took the discs comprising a ZFS zpool mirror from my Linux system and slotted them into a FreeBSD NAS. One command to import the pool (zpool import) and it was all going. Later on I added L2ARC and ZIL (cache and log) SSDs to make it faster; each took one command to add and both were entirely trouble-free.
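For anyone wondering what those "one command" steps look like, here is a minimal sketch in Python that only builds the zpool command lines and prints them; it does not run anything. The pool name "tank" and the device paths are made up for illustration, and the helper function names are hypothetical.

```python
import shlex


def zpool_import(pool):
    """Command line to import an existing pool by name."""
    return ["zpool", "import", pool]


def zpool_add_cache(pool, device):
    """Command line to add an L2ARC (cache) device to a pool."""
    return ["zpool", "add", pool, "cache", device]


def zpool_add_log(pool, device):
    """Command line to add a separate log (ZIL) device to a pool."""
    return ["zpool", "add", pool, "log", device]


def migration_plan(pool, cache_dev, log_dev):
    """The three steps from the comment above, in order."""
    return [
        zpool_import(pool),
        zpool_add_cache(pool, cache_dev),
        zpool_add_log(pool, log_dev),
    ]


if __name__ == "__main__":
    # "tank" and the /dev paths are placeholders, not from the original post.
    for argv in migration_plan("tank", "/dev/ada2", "/dev/ada3"):
        print(shlex.join(argv))
```

Building the argument lists without executing them makes it easy to review the plan before touching a real pool.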
Over the years there has been a lot of publicity about the Btrfs featureset and development. But as you said in your comment, it's "rapidly getting there". That's been the story since day one. And it's not got there. Not even close. Until its major bugs and unfortunate design flaws (getting unbalanced to unusability, silly link limits) are fixed, it will never get there. I had high hopes for Btrfs, and I was rewarded with severe data loss or complete unusability each and every time I tried it over the years since it was started. Eventually I switched to ZFS out of a need for something that actually worked and could be relied upon. Maybe it will eventually become suitable for serious production use, but I lost hope of that a good while back.
zfsonlinux hit both unstable and stable releases on Linux earlier than btrfs: if your only definition of stable is how long it's been around on Linux, then btrfs is still less mature.
Being in-tree says nothing about the stability of a module, but ZFS doesn't need to be under the GPL to be in Linus' tree: the GPL does not forbid code aggregation. That said, neither Linus nor the ZoL team want ZoL in Linus' tree.
To name a few: A variety of flavors of built-in RAID / replication. Built in error detection and correction. Snapshots. The ability to send and receive deltas between snapshots from one server to another.
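To make the "deltas between snapshots" idea concrete, here is a toy sketch in Python. It models a snapshot as a plain dict of block id to contents, which is an assumption made purely for illustration; it is not ZFS's on-disk layout or its send stream format, and the function names are hypothetical.

```python
def snapshot_delta(old, new):
    """Return (changed, deleted) describing how to get from old to new.

    changed: blocks that are new or differ from the old snapshot.
    deleted: block ids present in old but gone in new.
    """
    changed = {bid: data for bid, data in new.items() if old.get(bid) != data}
    deleted = [bid for bid in old if bid not in new]
    return changed, deleted


def apply_delta(base, delta):
    """Apply a delta on the receiving side, returning the new state."""
    changed, deleted = delta
    result = dict(base)
    for bid in deleted:
        result.pop(bid, None)
    result.update(changed)
    return result


if __name__ == "__main__":
    snap1 = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
    snap2 = {0: b"aaaa", 1: b"BBBB", 3: b"dddd"}

    # The receiver already holds snap1, so only the difference is sent.
    delta = snapshot_delta(snap1, snap2)
    replica = apply_delta(snap1, delta)
    print(replica == snap2)
```

The point is that the unchanged block 0 never crosses the wire; only what differs between the two snapshots does.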
I mean the performance gains as you add more discs.
And regarding adding discs to an array, you certainly can. Just add additional RAID sets to the pool. That is, rather than adding discs to the existing array, you scale it up by adding additional arrays to the same pool. See the documentation.
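As a rough sketch of what "adding arrays to the same pool" means for capacity, here is a toy Python model. The class names and disk sizes are hypothetical, and the arithmetic deliberately ignores metadata, padding, and reserved space, so treat the numbers as back-of-the-envelope only.

```python
from dataclasses import dataclass, field


@dataclass
class RaidzVdev:
    """One parity RAID set: `disks` drives of `disk_tb` TB, `parity` of them for parity."""
    disks: int
    disk_tb: float
    parity: int = 1

    def usable_tb(self):
        # Simplified: capacity of the data drives only.
        return (self.disks - self.parity) * self.disk_tb


@dataclass
class Pool:
    """A pool is a collection of RAID sets; it grows by adding whole sets."""
    vdevs: list = field(default_factory=list)

    def add_vdev(self, vdev):
        self.vdevs.append(vdev)

    def usable_tb(self):
        return sum(v.usable_tb() for v in self.vdevs)


if __name__ == "__main__":
    pool = Pool()
    pool.add_vdev(RaidzVdev(disks=4, disk_tb=2.0))
    print(pool.usable_tb())  # one 4-disc set
    pool.add_vdev(RaidzVdev(disks=4, disk_tb=4.0))
    print(pool.usable_tb())  # same pool after a second set is added
```

Note that the existing set is never resized; the pool gets bigger because it now spans two sets.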
I don't know what your definition of "significant" is, but the BTRFS wiki says "The one missing piece, from a reliability point of view, is that it is still vulnerable to the parity RAID 'write hole', where a partial write as a result of a power failure may result in inconsistent parity data." ZFS RAIDZ is expressly free from the write hole. That is very significant to me.
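For readers who haven't met the write hole before, here is a small Python sketch of the general problem with XOR parity when a data block is overwritten in place and the crash happens before the parity is rewritten. It is a generic toy model of parity RAID, not how btrfs or ZFS lay out data, and the block contents and function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(data, parity, lost_index):
    """Rebuild the block at lost_index from the surviving blocks plus parity."""
    survivors = [blk for i, blk in enumerate(data) if i != lost_index]
    return xor_blocks(survivors + [parity])


def simulate_write_hole():
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Consistent stripe: losing disc 1 is recoverable.
    before = reconstruct(data, parity, 1)

    # Overwrite block 0 in place, then "lose power" before parity is updated.
    data[0] = b"ZZZZ"

    # Now disc 1 dies. The stale parity silently yields the wrong contents.
    after = reconstruct(data, parity, 1)
    return before, after


if __name__ == "__main__":
    before, after = simulate_write_hole()
    print(before == b"BBBB")
    print(after == b"BBBB")
```

The nasty part is that the block being rebuilt (disc 1) was never written to; it is corrupted purely because data and parity on the other discs went out of step.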
RAIDZ's write hole advantage is a product of three specifics: (1) RAID5 has n data disks plus one dedicated parity-only disk; ZFS distributes all data and all parity across all disks - (2) ZFS updates metadata before data; RAID5 has no concept of metadata - and (3) COW (both have this).
And before you object "but UPS" - UPSs and power supplies can fail, too - and a kernel panic is essentially a "power failure" too; one which a UPS is powerless to prevent.
If that Wiki should be out of date, you can show me something that isn't, but all I find out there is a lot of outdated stuff.
The features you list as "specific" to zfs exist in btrfs. btrfs can have dedicated parity drives or you can spread the data and parity across multiple drives in any order or pattern you would like.
The write hole in btrfs is AFAIK also present in zfs and listed as a risk of a power failure during write on a raid pool with COW filesystems. The risk is that loss of power during a write can result in multiple different parity blocks for the same data, and that in such an instance the filesystem cannot identify the correct data or parity (depending on the order you write them). There are only a few solutions to this, and they involve reverting to a known good (older) copy, which loses the data from that write.
IIRC this is a listed risk in the FAQ for ZFS. Just as the same write hole risk exists in btrfs. Also IIRC ZFS takes the path of writing parity before data such that it will lose new data rather than risk a corruption of existing parity blocks. Whereas, again IIRC btrfs COW's the new data then COW's the parity block which risks inconsistent parity but at less risk of data loss (as parity can be recomputed).
Two different solutions to the same problem that is intrinsic to COW filesystems with parity data. Neither is particularly better IMO as both run the risk of data loss in an extreme event. Though such events are rare.
Regardless of what Ubuntu has convinced themselves of, in this context the ZFS filesystem driver would be an unlicensed derivative work. If they don't want it to be so, it needs to be in user-mode instead of loaded into the kernel address space and using unexported APIs of the kernel.
A lot of people try to deceive themselves (and you) that they can do silly things, like putting an API between software under two licenses, and that such an API becomes a "computer condom" that protects you from the GPL. This rationale was never true and was overturned by the court in the appeal of Oracle v. Google.
Bruce Perens.