OpenZFS Project Launches, Uniting ZFS Developers

Posted by Soulskill on Tuesday September 17, 2013 @12:05PM from the putting-the-band-together dept.

Damek writes "The OpenZFS project launched today, the truly open source successor to the ZFS project. ZFS is an advanced filesystem in active development for over a decade. Recent development has continued in the open, and OpenZFS is the new formal name for this community of developers, users, and companies improving, using, and building on ZFS. Founded by members of the Linux, FreeBSD, Mac OS X, and illumos communities, including Matt Ahrens, one of the two original authors of ZFS, the OpenZFS community brings together over a hundred software developers from these platforms."

30 of 297 comments

I'm addicted by MightyYar · 2013-09-17 12:09 · Score: 4, Interesting

I love ZFS, if one can love a file system. Even for home use. It requires a little bit nicer hardware than a typical NAS, but the data integrity is worth it. I'm old enough to have been burned by random disk corruption, flaky disk controllers, and bad cables.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
1. Re:I'm addicted by Anonymous Coward · 2013-09-17 12:18 · Score: 5, Funny
  
  I love ZFS too, but I'd fucking kill for an open ReiserFS...
2. Re:I'm addicted by Virtucon · 2013-09-17 12:42 · Score: 4, Funny
  
  I think that anything having to do with ReiserFS is a dead end.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
3. Re:I'm addicted by The+Last+Gunslinger · 2013-09-17 21:19 · Score: 4, Insightful
  
  I'm sure most readers here "got" it. It just wasn't funny.
4. Re:I'm addicted by drinkypoo · 2013-09-17 23:31 · Score: 3, Funny
  
  OK stop already, you guys are driving this joke into the woods.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
all i want is BP-rewrite by Anonymous Coward · 2013-09-17 12:13 · Score: 5, Informative

If this gets us BP-rewrite, the holy grail of ZFS, I'll be a happy man.
For those who don't know what it is: BP-rewrite is block pointer rewrite, a feature that has been promised for many years now but has never arrived. It's a lot like cold fusion in that it's always X years away.
BP-rewrite would allow implementation of the following features:
- Defrag
- Shrinking vdevs
- Removing vdevs from pools
- Evacuating data from a vdev (say you wanted to destroy your old 10-disk vdev and add it back to the pool as a vdev with a different number of disks)
1. Re:all i want is BP-rewrite by saleenS281 · 2013-09-17 15:36 · Score: 5, Informative
  
  Because a COW filesystem will become fragmented over time simply by the way it works. As you delete files, you're only freeing up small segments of contiguous blocks. Over time, this leads to fragmentation, because writes are sometimes forced into non-optimal disk placement due to lack of free space. Granted, if you never fill the pool beyond 50%, it won't be a problem. For everyone else, it's a matter of when, not if, it will become fragmented.
2. Re:all i want is BP-rewrite by Above · 2013-09-18 01:02 · Score: 3, Insightful
  
  You are correct that the disk will become fragmented, but the implication that fragmentation is a problem is simply not true. One of the prime causes of the misunderstanding is that fragmentation in Unix file systems is night-and-day different from fragmentation in a FAT file system, which is where most people got used to defragging Windows drives. Unix file systems use much better algorithms to control fragmentation, so there is (generally) a lot less of it on a per-file basis. They also defragment automatically: in some cases, when a fragmented file is written to, the file system will defragment part of that file and rewrite it.
  The Berkeley FFS was the first to "solve" this problem, reserving 10% of the disk space primarily to avoid fragmentation. Decades of experience show that for all but the rarest corner cases, that is enough, causing no significant fragmentation or performance degradation.
  * http://www.eecs.harvard.edu/~keith/research/tr94.html
  * http://www.cs.berkeley.edu/~brewer/cs262/FFS.pdf
  * http://www.cs.rutgers.edu/~pxk/416/notes/12-fs-studies.html
  * http://pages.cs.wisc.edu/~remzi/OSTEP/file-ffs.pdf
  The result is that for most applications fragmentation is a complete non-issue. After 25 years of playing with various file systems I've only seen it be an issue once, on an NNTP server that reached 20% fragmentation. Most user desktops and general purpose servers have under 1% fragmentation at all times. Generally, if you have a fragmentation problem it's because the storage is too full, and you need to add storage anyway (the aforementioned NNTP server was a good example). Adding the storage makes the problem go away.
  Most users of Unix file systems will never need to give fragmentation a second thought.
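The mechanism saleenS281 describes, where new writes on a nearly full pool get forced into whatever small holes deletes have left behind, can be illustrated with a toy allocator. This is a minimal sketch under stated assumptions: the disk is one flat array of blocks and the allocator is first fit, falling back to filling holes in disk order. Real ZFS allocates from metaslabs tracked by space maps, so the names `free_extents` and `allocate` are hypothetical and the model only shows the general idea.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start block, length)


def free_extents(used: List[bool]) -> List[Extent]:
    """Return the runs of free blocks in disk order."""
    runs: List[Extent] = []
    start = None
    for i, taken in enumerate(used):
        if not taken and start is None:
            start = i
        elif taken and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(used) - start))
    return runs


def allocate(used: List[bool], nblocks: int) -> List[Extent]:
    """Allocate nblocks, preferring a single contiguous hole (first fit).

    If no single hole is big enough, fill holes in disk order instead;
    that fallback is what turns one logical write into many extents.
    """
    holes = free_extents(used)
    if sum(length for _, length in holes) < nblocks:
        raise RuntimeError("pool is full")
    for start, length in holes:
        if length >= nblocks:
            chosen = [(start, nblocks)]
            break
    else:
        chosen, remaining = [], nblocks
        for start, length in holes:
            take = min(length, remaining)
            chosen.append((start, take))
            remaining -= take
            if remaining == 0:
                break
    for start, length in chosen:
        for i in range(start, start + length):
            used[i] = True  # mark the blocks as allocated
    return chosen


if __name__ == "__main__":
    empty = [False] * 16
    print(allocate(empty, 4))  # [(0, 4)]: one contiguous extent

    # A pool where deletes have left only one-block holes.
    swiss = [i % 2 == 0 for i in range(16)]
    print(allocate(swiss, 4))  # [(1, 1), (3, 1), (5, 1), (7, 1)]
```

The same 4-block write costs one extent on an empty pool and four on the "Swiss cheese" pool, which is also why Above's advice to keep free space in reserve (or add storage) makes the problem go away.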
Still CDDL... by volkerdi · 2013-09-17 12:15 · Score: 4, Informative

Oh well. I'd somehow hoped "truly open source" meant BSD license, or LGPL.
1. Re:Still CDDL... by larry+bagina · 2013-09-17 13:26 · Score: 3, Informative
  
  CDDL is basically LGPL on a per-file basis.
  
  --
  Do you even lift?
  These aren't the 'roids you're looking for.
Patents? by Danathar · 2013-09-17 12:19 · Score: 3, Insightful

Not to rain on anybody's parade, but will the commercial holders of ZFS allow this? Or will they unleash some unholy patent suit to keep it from happening?
1. Re:Patents? by utkonos · 2013-09-17 12:49 · Score: 4, Informative
  
  FAQ much? There is no central source repository for OpenZFS. Each supported operating system has its own repository. The previously linked page also has a link to the source tree for each of the supported projects under the umbrella.
Re:FINALLY. by Anonymous Coward · 2013-09-17 12:53 · Score: 3, Informative

Been using btrfs for several non-essential file systems. Working great so far, and have even done several successful bedup runs. Has worked great for minimizing disk usage on some Maven repositories with lots of duplicate files between Jenkins and Nexus. Maybe not tested enough for your server that you need to stay up all the time, but great for the home desktop (provided you're sane and are keeping backups, which you should be doing already anyway). The more testing it gets, the sooner it becomes "tested enough" for the needs-to-always-be-available server.
Re:Advantages of ZFS over BTRFS? by Vesvvi · 2013-09-17 12:59 · Score: 5, Informative

I don't have any practical experience with BTRFS, but I use ZFS heavily at work.
The advantage of ZFS is that it's tested, and it just works. When I started with our first ZFS testbed, I abused that thing in scary ways trying to get it to fail: hotplugging RAID controller cards, etc. Nothing really scratched it. Over the years I've made additional bad decisions such as upgrading filesystem versions while in a degraded state, missing logs, etc, but nothing has ever caused me to lose data, ever.
The one negative to ZFS (if you can call it that) is that it makes you aware of inevitable failures (scrubs catch them). I'll lose about 1 or 2 files per year (out of many, many terabytes) just due to lousy luck, unless I store redundant high-level copies of data and/or metadata. Right now I use stripes over many sets of mirrored drives, but that's not enough when you read or write huge quantities of data. I've run the numbers and our losses are reasonable, but it's sobering to see the harsh reality that "good enough" efforts just aren't good enough for 100% at scale.
Re:If you're successful, Larry will come a callin' by stoploss · 2013-09-17 13:08 · Score: 4, Funny

Collecting money from open-source companies? Darl McBride will turn in his grave if Larry is even stupid enough to try it...
Eh? I don't think the Mormons bury their living, no matter how ghoulish the corporations they helm.
I'm afraid Darl McBride will be quite operational when your friends' commits arrive...
Re:FINALLY. by Virtucon · 2013-09-17 13:13 · Score: 3, Informative

licensing or patent issues?
What you also forget is that Oracle was the leading proponent of BTRFS and yes it had to do with licensing and patents from Sun. Once they acquired Sun that all went out the window. If I were the CEO at Oracle I'd ask "Why two file systems that essentially do the same thing? One's mature and the other, not so much" That's why BTRFS still survives but now with less Oracle support. Wait, is that a bad thing?

--
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Re:If you're successful, Larry will come a callin' by Bengie · 2013-09-17 13:15 · Score: 5, Informative

Oracle released ZFS under a BSD-compatible license. Anyone is allowed to do whatever they want with the open-source code. Going forward, Oracle has not opened any code after v28, which is the last open-source version to be compatible with Oracle ZFS.
Re:Cool, but.. by Bengie · 2013-09-17 13:17 · Score: 4, Insightful

Everything else is already handled with LVM and software RAID.
You have a great sense of humor, keep it up.
Re:Cool, but.. by smash · 2013-09-17 13:38 · Score: 3, Informative

That. Those who don't understand ZFS are condemned to reinvent it, poorly.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 13:45 · Score: 3, Informative

You don't understand. ZFS didn't lose that data; ZFS detected that the underlying disk drives lost that data. You can run ZFS in highly redundant modes that allow it to reconstruct lost data, but it sounds like the OP's redundancy is such that enough drives can lose data to cause lost files.
Re:ZFS for Windows? by tlambert · 2013-09-17 14:15 · Score: 5, Informative

It doesn't have to be POSIX compliant to have it ported to it, and it doesn't require somebody to pay for licensing. With the features of ZFS, one could argue that a port to at least Windows Server would be great, and it would garner quite a following from those who've had to put up with the way NTFS views disk volumes and storage.
Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries. Part of this is an intentional legal defense against WINE and CrossOver Office, and part of it is just scale management by limiting the support community requirements to "serious developers".
In addition, a lot of the installable filesystem and similar code, as well as a lot of the necessary VM internals (memory-mapped files and paging/swapping from filesystems), are not adequately explained. For example, getting the offsets on the physical media for the blocks in question involves locking text regions with level 0 locks, which requires taking a level 3 lock and then a level 0 lock. This used to not work on removable media in NT as of 4.0.1; I'm not sure if it's supported yet, but it was the reason you couldn't install it on Jaz drives or even on regular hard drives in removable carriers.
Having developed a filesystem for the Windows 95 IFSMgr, and reverse engineered all this crap, and having done it again for NT 3.51, I would not look forward to having to repeat the process for Windows 7 or Windows 8, which are the only useful versions to target by the time the code ends up functional.
So unless someone wanted to seriously underwrite the effort (read: it would have to be done by Oracle, or by a startup with a monetization strategy that Microsoft wouldn't preempt), no thanks. Microsoft preempted exactly that when my team, at a previous employer, ported UFS + Soft Updates to Windows 95: they announced Longhorn-which-never-happened, and then put together a lawsuit about "deep reverse engineering" which would have precluded using it as a bootable FS.
aka bcache + any filesystem you want by raymorris · 2013-09-17 14:26 · Score: 3, Informative

Using a small, fast SSD as a cache for large, slow disks can be awesome for some workloads, mostly servers with many concurrent users.
To do that with ANY filesystem, you can use bcache, which is now part of the mainline kernel. dm-cache does the same thing, and there is another one that Facebook uses (flashcache).
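The basic idea of a small, fast device in front of a large, slow one can be sketched as a least-recently-used read cache. This is a hedged illustration, not how bcache, dm-cache, or flashcache actually work: those also handle writeback, bypass sequential I/O, and persist cache metadata on the SSD. The class `LRUReadCache` and the `slow_disk` function are hypothetical stand-ins for the SSD and the spinning disk.

```python
from collections import OrderedDict
from typing import Callable


class LRUReadCache:
    """Keep the most recently read blocks in a small fast store."""

    def __init__(self, read_slow: Callable[[int], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least one block")
        self._read_slow = read_slow
        self._capacity = capacity
        self._cached: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> bytes:
        if block in self._cached:
            self._cached.move_to_end(block)  # now the most recently used
            self.hits += 1
            return self._cached[block]
        self.misses += 1
        data = self._read_slow(block)  # fall through to the slow disk
        self._cached[block] = data
        if len(self._cached) > self._capacity:
            self._cached.popitem(last=False)  # evict the least recently used
        return data


if __name__ == "__main__":
    slow_reads = []

    def slow_disk(block: int) -> bytes:
        slow_reads.append(block)
        return f"block-{block}".encode()

    cache = LRUReadCache(slow_disk, capacity=2)
    for b in [1, 2, 1, 3, 2]:
        cache.read(b)
    print(cache.hits, cache.misses, slow_reads)  # 1 4 [1, 2, 3, 2]
```

The payoff depends entirely on the workload: many concurrent users re-reading a hot working set get lots of hits, while one big sequential scan mostly evicts useful blocks, which is one reason real block caches treat sequential I/O specially.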
Re: Data integrity by MightyYar · 2013-09-17 14:51 · Score: 4, Informative

Not sure what you mean. You certainly can set up a mirrored pair (or triplet or quadruplet), but you can also set up what's referred to as raidz, where it stripes the redundancy across multiple disks. You can configure how much redundancy... 1, 2, or more disks if you like. You can also tell ZFS to keep multiple copies of blocks, and it will spread those copies out among the disks. You can set that policy per sub-volume (file system in zfs-speak), so that if you decide that some of your data deserves more redundancy, you can set up a folder that will keep 2 copies of everything, but leave all the other folders at 1 copy. It's super geeky. I've had it detect (and correct) corruption in a failing disk, detect corruption because of a flaky disk controller that would otherwise pretend to work fine, and detect corruption when a SATA cable came loose. Combined with the ECC RAM in the server, I feel more comfortable about the integrity of my data than I ever have. I've lost family photos before to random drive corruption, so I'm sensitive to this stuff :)
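The space cost of the layouts mentioned here (mirrors, RAID-Z with 1, 2, or 3 parity disks, and the per-dataset copies property) is simple arithmetic. The sketch below is idealized and assumes no RAID-Z allocation padding, metadata, or reserved space, so real usable capacity comes out somewhat lower; the function names are hypothetical. ZFS's RAID-Z levels top out at triple parity (raidz3), and copies accepts 1, 2, or 3.

```python
from fractions import Fraction


def raidz_usable(ndisks: int, parity: int) -> Fraction:
    """Idealized fraction of raw space available for data in one RAID-Z vdev."""
    if parity not in (1, 2, 3):
        raise ValueError("RAID-Z supports single, double or triple parity")
    if ndisks <= parity:
        raise ValueError("need more disks than parity disks")
    return Fraction(ndisks - parity, ndisks)


def mirror_usable(width: int) -> Fraction:
    """An n-way mirror stores one disk's worth of data across n disks."""
    if width < 2:
        raise ValueError("a mirror needs at least two disks")
    return Fraction(1, width)


def with_copies(fraction: Fraction, copies: int) -> Fraction:
    """copies=N stores every block of a dataset N times on top of the layout."""
    if copies not in (1, 2, 3):
        raise ValueError("the copies property accepts 1, 2 or 3")
    return fraction / copies


if __name__ == "__main__":
    print(raidz_usable(3, 1))                  # 2/3: three disks, one parity
    print(raidz_usable(6, 2))                  # 2/3: six disks, double parity
    print(mirror_usable(2))                    # 1/2: a plain two-way mirror
    print(with_copies(mirror_usable(2), 2))    # 1/4: mirror plus copies=2
```

Because copies is set per dataset, the last line only applies to the "extra precious" folder; everything else on the pool keeps the plain mirror ratio.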

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by saleenS281 · 2013-09-17 15:34 · Score: 4, Informative

One point to be extremely clear on, however: when you set copies=2 at the folder level, it does NOT guarantee those copies end up on different physical spindles. Early on, many people lost files because they skipped RAID, thinking that copies=X would protect their data. It is NOT meant as a means to protect against hardware failures.
Re:Advantages of ZFS over BTRFS? by batkiwi · 2013-09-17 16:36 · Score: 5, Insightful

Nice FUD there. You picked the btrfs-progs, which are the userspace tools, not the actual btrfs filesystem driver.
http://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/log/
Re:Advantages of ZFS over BTRFS? by Vesvvi · 2013-09-17 17:30 · Score: 4, Interesting

This is correct.
It is statistically assured that you will lose some data with anything less than obscene redundancy. I've run the numbers and we've settled on what's acceptable to us: we have offline backups far more frequently than 2 times/year for everything, so dropping about 2 files/year that are completely unrecoverable without backups isn't a big deal.
These systems are running a moderate number of very large static files, mixed with a very large number of very small files. The small files are SQLite-style records, and we churn through them very rapidly. I don't know exactly why, but it is always these small files that we lose: there is clearly a bias towards things that are written frequently. The analyst in me is quick to point out that this implies failures in ZFS itself, beyond just the disks and "bit rot", but the accelerated failure rate isn't enough to worry about. So our non-failure rate is easily six nines or better per year on the live storage system, but it's still a bit uncomfortable to know that some data is going to be gone despite that.
With a minimal amount of effort you can get hardware and software which is no longer the biggest threat to your data. I am personally the most likely source of a catastrophic failure: operator error is more likely than an obscure hardware failure. ZFS has allowed me to reduce that operator error (snapshots, piping filesystems, nested datasets with inheritance), and simultaneously it's outperforming other options on both speed and security. Overall, I'm extremely pleased.
Re: Data integrity by greg1104 · 2013-09-17 17:40 · Score: 4, Interesting

ECC RAM is an important part here, due to how scrubbing works in ZFS. The background disk scrubbing can check every block on the filesystem to see if it still matches its checksum, and it tries to repair issues found too. But if your memory is prone to flipping a bit, that can result in scrubbing actually destroying data that was perfectly fine until then. The worst case impact could even destroy the whole pool like that. It's a controversial issue; the odds of a massive pool failure and associated doom and gloom are seen as overblown by many people too. There's a quick summary of a community opinion survey at ZFS and ECC RAM, but sadly the mailing list links are broken and only lead to Oracle's crap now.
Re: Data integrity by kthreadd · 2013-09-17 18:09 · Score: 3, Informative

That's what you have backups for.
Re: Data integrity by TheRaven64 · 2013-09-17 21:06 · Score: 4, Insightful

ZFS doesn't have ECC, but it does checksum each block, so it can detect per-block errors. If you have valuable data, you can set the copies property to a value greater than 1 for that dataset, and it will store each block more than once on disk, so if one copy fails its checksum, another will be used to recover it. If you have three disks, you can use RAID-Z, which costs you 1/3 of the space (not 1/2) and allows any single-disk failure to be recovered. Running zpool scrub makes ZFS validate all of the data, and when a block fails its checksum, the data is reconstructed from the other two disks.
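The detect-then-repair behaviour of checksums plus copies can be modelled in a few lines. This is a simplified sketch, not ZFS code: the `Block` and `scrub` names are hypothetical, SHA-256 stands in for ZFS's default fletcher4 checksum, and the copies are just in-memory buffers, so (as saleenS281 warns) nothing here says which physical disk each copy lives on. The one structural point borrowed from ZFS is that the expected checksum is kept apart from the data it covers, the way ZFS stores it in the parent block pointer.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


def checksum(data: bytes) -> bytes:
    # ZFS defaults to fletcher4; SHA-256 (also a ZFS option) keeps this short.
    return hashlib.sha256(data).digest()


@dataclass
class Block:
    """One logical block: the expected checksum plus N stored copies."""
    expected: bytes
    copies: List[bytearray]

    @classmethod
    def write(cls, data: bytes, ncopies: int = 2) -> "Block":
        return cls(checksum(data), [bytearray(data) for _ in range(ncopies)])

    def bad_copies(self) -> List[int]:
        return [i for i, c in enumerate(self.copies)
                if checksum(bytes(c)) != self.expected]

    def read(self) -> bytes:
        """Return verified data, rewriting any copy that fails its checksum."""
        bad = self.bad_copies()
        if len(bad) == len(self.copies):
            raise OSError("every copy failed its checksum")
        good = next(bytes(c) for i, c in enumerate(self.copies) if i not in bad)
        for i in bad:
            self.copies[i] = bytearray(good)  # self-healing repair
        return good


def scrub(blocks: List[Block]) -> Tuple[int, int]:
    """Read every block; return (copies repaired, blocks unrecoverable)."""
    repaired = unrecoverable = 0
    for blk in blocks:
        bad = len(blk.bad_copies())
        try:
            blk.read()
            repaired += bad
        except OSError:
            unrecoverable += 1
    return repaired, unrecoverable


if __name__ == "__main__":
    pool = [Block.write(b"photo-1"), Block.write(b"photo-2")]
    pool[0].copies[1][0] ^= 0xFF  # simulate a corrupted copy on disk
    print(scrub(pool))  # (1, 0): detected and repaired
    print(scrub(pool))  # (0, 0): nothing left to fix
```

When every copy of a block is bad, the sketch reports the block as unrecoverable rather than returning garbage, which matches the "ZFS detected that the disks lost data" point made earlier in the thread.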
The reason it doesn't use ECC is that ECC doesn't mesh well with the failure modes of disks. ECC is used in RAM because when it gets hot, is hit by a cosmic ray, or whatever, it is common for a single bit to flip (in a single direction, which makes the error correction easier). In a disk, you typically have an entire block fail, not a single bit. Modern disks use multiple levels, so the smallest failure that is even theoretically possible might be a single byte (or nibble) in a block. And since the failure isn't biased, you'd need a fairly large amount of space for the correction codes. A better approach would be for the filesystem to generate something like Reed–Solomon code blocks for every n blocks that are written. This would allow single-block errors to be recovered, as long as the other blocks are okay. The downside of this approach is that the error-correcting block would need to be rewritten whenever any of the other blocks is modified, so a single-block write would end up triggering a lot of reads, and that would hurt performance. This might be relatively easy to add to ZFS: it uses a CoW structure, so block overwrites are relatively rare (although erasing a lot of data would require a lot of checksums to be recalculated), and blocks are written out in transaction groups, so including an error-correction block at the end of each group might be a fairly simple modification.
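The "one extra block per group" idea can be sketched with XOR parity, which is the simplest erasure code and the single-parity special case of what Reed–Solomon generalizes; it is swapped in here for brevity and can rebuild only one lost block per group. The assumption is that a per-block checksum has already identified which block is bad (marked `None` below). All function names are hypothetical, and nothing here reflects an actual ZFS on-disk format.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def parity_block(group: List[bytes]) -> bytes:
    """Parity for a group of equal-sized blocks, e.g. one transaction group."""
    if not group or len({len(b) for b in group}) != 1:
        raise ValueError("need a non-empty group of equal-sized blocks")
    return reduce(xor_blocks, group)


def rebuild(group: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recover at most one block, marked None because it failed its checksum."""
    lost = [i for i, b in enumerate(group) if b is None]
    if len(lost) > 1:
        raise ValueError("single parity can rebuild only one lost block")
    survivors = [b for b in group if b is not None]
    result = list(survivors)
    if lost:
        # XOR of the parity with every surviving block leaves the missing one.
        result.insert(lost[0], reduce(xor_blocks, survivors, parity))
    return result


def update_parity(parity: bytes, old: bytes, new: bytes) -> bytes:
    """Parity after one block changes in place: needs the old block and parity."""
    return xor_blocks(xor_blocks(parity, old), new)


if __name__ == "__main__":
    txg = [b"abcd", b"efgh", b"ijkl"]
    p = parity_block(txg)
    print(rebuild([b"abcd", None, b"ijkl"], p))  # [b'abcd', b'efgh', b'ijkl']
```

`update_parity` shows the cost TheRaven points out: modifying one block in place means reading the old block and the old parity first. Appending a fresh parity block to each transaction group, as the CoW design would allow, avoids that read-modify-write for newly written data.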

--
I am TheRaven on Soylent News
Re: FINALLY. by Eunuchswear · 2013-09-18 01:14 · Score: 3, Funny

You don't have a multi-petabyte array with mission-critical data at home?

--
Watch this Heartland Institute video