Slashdot Mirror


OpenZFS Project Launches, Uniting ZFS Developers

Damek writes "The OpenZFS project launched today, the truly open source successor to the ZFS project. ZFS is an advanced filesystem in active development for over a decade. Recent development has continued in the open, and OpenZFS is the new formal name for this community of developers, users, and companies improving, using, and building on ZFS. Founded by members of the Linux, FreeBSD, Mac OS X, and illumos communities, including Matt Ahrens, one of the two original authors of ZFS, the OpenZFS community brings together over a hundred software developers from these platforms."

19 of 297 comments

  1. I'm addicted by MightyYar · · Score: 4, Interesting

    I love ZFS, if one can love a file system. Even for home use. It requires a little bit nicer hardware than a typical NAS, but the data integrity is worth it. I'm old enough to have been burned by random disk corruption, flaky disk controllers, and bad cables.

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    1. Re:I'm addicted by Anonymous Coward · · Score: 5, Funny

      I love ZFS too, but I'd fucking kill for an open ReiserFS...

    2. Re:I'm addicted by Virtucon · · Score: 4, Funny

      I think that anything having to do with ReiserFS is a dead end.

      --
      Harrison's Postulate - "For every action there is an equal and opposite criticism"
    3. Re:I'm addicted by The+Last+Gunslinger · · Score: 4, Insightful

      I'm sure most readers here "got" it. It just wasn't funny.

  2. all i want is BP-rewrite by Anonymous Coward · · Score: 5, Informative

    If this gets us BP-rewrite, the holy grail of ZFS, I'll be a happy man.

    For those who don't know what it is - BP-rewrite is block pointer rewrite, a feature that has been promised for many years now but has never arrived. It's a lot like cold fusion in that it's always X years away from us.

    BP-rewrite would allow implementation of the following features
    - Defrag
    - Shrinking vdevs
    - Removing vdevs from pools
    - Evacuating data from a vdev (say you wanted to destroy your old 10-disk vdev and add it back to the pool as a vdev with a different number of disks)

    1. Re:all i want is BP-rewrite by saleenS281 · · Score: 5, Informative

      Because a COW filesystem will become fragmented over time simply by the way it works. As you delete files, you're only freeing up small segments of contiguous blocks. Over time, this leads to fragmentation because writes are sometimes forced into non-optimal disk placement due to lack of free space. Granted - if you never fill the pool beyond 50%, it won't be a problem. For everyone else, it's a matter of when, not if, it will become fragmented.
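
      A minimal sketch of the effect described above, using a toy allocator rather than anything resembling ZFS's real one. The pool sizes, file names, and the "first run that fits, otherwise scatter" policy are all assumptions made for illustration.

```python
def free_runs(disk):
    """Return (start, length) for every contiguous run of free blocks."""
    runs, start = [], None
    for i, owner in enumerate(disk):
        if owner is None and start is None:
            start = i
        elif owner is not None and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(disk) - start))
    return runs


def write_file(disk, name, size):
    """Allocate `size` blocks for `name`; return how many extents were used."""
    runs = free_runs(disk)
    if sum(length for _, length in runs) < size:
        raise ValueError("pool is full")
    # Prefer a single free run that holds the whole file.
    for start, length in runs:
        if length >= size:
            disk[start:start + size] = [name] * size
            return 1
    # No run is big enough: scatter the file across the free runs.
    extents, remaining = 0, size
    for start, length in runs:
        take = min(length, remaining)
        disk[start:start + take] = [name] * take
        extents += 1
        remaining -= take
        if remaining == 0:
            break
    return extents


def delete_file(disk, name):
    """Free every block owned by `name`."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def churn(pool_blocks):
    """Write six 2-block files, delete every other one, then write a 4-block file."""
    disk = [None] * pool_blocks
    for name in "ABCDEF":
        write_file(disk, name, 2)
    for name in "BDF":
        delete_file(disk, name)
    return write_file(disk, "G", 4)


full_pool_extents = churn(12)   # pool was 100% full before the deletes
roomy_pool_extents = churn(24)  # same files, pool never above 50%
print(full_pool_extents, roomy_pool_extents)
```

      In the pool that was filled completely, the new file has to be split across the holes left by the deletes; in the half-empty pool it still lands in one piece.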

  3. Still CDDL... by volkerdi · · Score: 4, Informative

    Oh well. I'd somehow hoped "truly open source" meant BSD license, or LGPL.

  4. Re:Patents? by utkonos · · Score: 4, Informative

    FAQ much? There is no central source repository for OpenZFS. Each supported operating system has its own repository. The previous link also has a link to the source tree for each of the supported projects under the umbrella.

  5. Re:Advatages of ZFS over BTRFS? by Vesvvi · · Score: 5, Informative

    I don't have any practical experience with BTRFS, but I use ZFS heavily at work.

    The advantage of ZFS is that it's tested, and it just works. When I started with our first ZFS testbed, I abused that thing in scary ways trying to get it to fail: hotplugging RAID controller cards, etc. Nothing really scratched it. Over the years I've made additional bad decisions such as upgrading filesystem versions while in a degraded state, missing logs, etc, but nothing has ever caused me to lose data, ever.

    The one negative to ZFS (if you can call it that) is that it makes you aware of inevitable failures (scrubs catch them). I'll lose about 1 or 2 files per year (out of many, many terabytes) just due to lousy luck, unless I store redundant high-level copies of data and/or metadata. Right now I use stripes over many sets of mirrored drives, but it's not enough when you read or write huge quantities of data. I've run the numbers and our losses are reasonable, but it's sobering to see the harsh reality that "good enough" efforts just aren't good enough for 100% at scale.

  6. Re:If you're successful, Larry will come a callin' by stoploss · · Score: 4, Funny

    Collecting money from open source companies? Darl McBride will turn in his grave if Larry is even stupid enough to try it...

    Eh? I don't think that the Mormons bury their living, no matter how ghoulish are the corporations that they helm.

    I'm afraid Darl McBride will be quite operational when your friends' commits arrive...

  7. Re:If you're successful, Larry will come a callin' by Bengie · · Score: 5, Informative

    Oracle released ZFS under a BSD-compatible license. Anyone is allowed to do whatever they like with the open source code. Going forward, Oracle has not opened any code after v28, which is the last open source version to be compatible with Oracle ZFS.

  8. Re:Cool, but.. by Bengie · · Score: 4, Insightful

    Everything else is already handled with LVM and software RAID.

    You have a great sense of humor, keep it up.

  9. Re:ZFS for Windows? by tlambert · · Score: 5, Informative

    It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the features of ZFS one could argue that a port to at least Windows Server would be great and it would garner quite a following from those who've had to put up with the way NTFS views disk volumes and storage.

    Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries. Part of this is an intentional legal defense against WINE and CrossOver Office, and part of it is just scale management by limiting the support community requirements to "serious developers".

    In addition, a lot of the installable filesystem and similar code, as well as a lot of the necessary VM internals (memory mapped files and paging/swapping from filesystems), are not adequately explained (i.e. they involve locking text regions with level 0 locks, which require a level 3 lock and then a level 0 lock, and you have to do this to get the offsets on the physical media for the blocks in question). This used to not work on removable media in NT as of 4.0.1; not sure if it's supported yet, but it was the reason you couldn't install it on Jaz drives or even regular hard drives in removable carriers.

    Having developed a filesystem for Windows95 IFSMgr, and reverse engineered all this crap, and having done it again for NT3.51, I would not look forward to having to repeat the process for Windows 7 or Windows 8, which are the only useful versions to target by the time the code ends up functional.

    So unless someone wanted to seriously underwrite the effort (read: it'd have to be done by Oracle, or by a startup who had a monetization strategy that Microsoft wouldn't preempt, like they did when my team, at a previous employer, ported UFS + Soft Updates to Windows 95, and they announced Longhorn-which-never-happened, and then put together a lawsuit about "deep reverse engineering" which would have precluded using it as a bootable FS)... no thanks.

  10. Re: Data integrity by MightyYar · · Score: 4, Informative

    Not sure what you mean. You certainly can set up a mirrored pair (or triplet or quadruplet), but you can also set up what's referred to as raidz, where it stripes the redundancy across multiple disks. You can configure how much redundancy... 1, 2, or more disks if you like. You can also tell ZFS to keep multiple copies of blocks, and it will spread those copies out among the disks. You can set that policy per sub-volume (file system in zfs-speak), so that if you decide that some of your data deserves more redundancy, you can set up a folder that will keep 2 copies of everything, but leave all the other folders at 1 copy. It's super geeky. I've had it detect (and correct) corruption in a failing disk, detect corruption because of a flaky disk controller that would otherwise pretend to work fine, and detect corruption when a SATA cable came loose. Combined with the ECC RAM in the server, I feel more comfortable about the integrity of my data than I ever have. I've lost family photos before to random drive corruption, so I'm sensitive to this stuff :)

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  11. Re: Data integrity by saleenS281 · · Score: 4, Informative

    One point to be extremely clear on however - when you set copies = 2 on a folder level, it does NOT guarantee those copies end up on different physical spindles. Early on there were many people who lost files because they skipped RAID thinking that copies=X would protect their data. It is NOT meant as a means to protect against hardware failures.

  12. Re:Advatages of ZFS over BTRFS? by batkiwi · · Score: 5, Insightful

    Nice FUD there. You picked the btrfs-progs, which are the userspace tools, not the actual btrfs filesystem driver.

    http://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/log/

  13. Re:Advatages of ZFS over BTRFS? by Vesvvi · · Score: 4, Interesting

    This is correct.

    It is statistically assured that you will lose some data with anything less than obscene redundancy. I've run the numbers and we've settled on what's acceptable to us: we have offline backups far more frequently than 2 times/year for everything, so dropping about 2 files/year that are completely unrecoverable without backups isn't a big deal.

    These systems are running a moderate number of very large static files, mixed with a very large number of very small files. The small files are SQLite-style records, and we churn through them very rapidly. I don't know exactly why, but it is always these small files that we lose: there is clearly a bias towards things that are written frequently. The analyst in me is quick to point out that implies failures in ZFS itself, beyond just the disks and "bit rot", but the accelerated failure isn't enough to worry about. So our non-failure rate is easily 6-nines or better per year on the live storage system, but it's still a bit uncomfortable to know that some data is going to be gone, despite that.
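
    A rough worked version of the "6-nines" arithmetic, as a sketch only. The comment gives about 2 lost files per year but no total file count, so the 2,000,000 figure below is a hypothetical chosen to show where the six-nines threshold sits, and counting per file rather than per byte is likewise an assumption.

```python
import math


def nines(lost, total):
    """Number of nines in the survival fraction 1 - lost/total."""
    return -math.log10(lost / total)


def files_needed(lost, target_nines):
    """Smallest population for which `lost` losses still meet `target_nines`."""
    return lost * 10 ** target_nines


lost_per_year = 2            # from the comment: about 2 files per year
hypothetical_total = 2_000_000  # assumption, not a figure from the comment

print(files_needed(lost_per_year, 6))
print(round(nines(lost_per_year, hypothetical_total), 3))
```

    Put another way, losing 2 files a year only counts as six nines or better if the system holds at least two million files, which is plausible for a workload described as "a very large number of very small files".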

    With a minimal amount of effort you can get hardware and software which is no longer the biggest threat to your data. I am personally the most likely source of a catastrophic failure: operator error is more likely than an obscure hardware failure. ZFS has allowed me to reduce that operator error (snapshots, piping filesystems, nested datasets with inheritance), and simultaneously it's outperforming other options on both speed and security. Overall, I'm extremely pleased.

  14. Re: Data integrity by greg1104 · · Score: 4, Interesting

    ECC RAM is an important part here, due to how scrubbing works in ZFS. The background disk scrubbing can check every block on the filesystem to see if it still matches its checksum, and it tries to repair issues found too. But if your memory is prone to flipping a bit, that can result in scrubbing actually destroying data that was perfectly fine until then. The worst case impact could even destroy the whole pool like that. It's a controversial issue; the odds of a massive pool failure and associated doom and gloom are seen as overblown by many people too. There's a quick summary of a community opinion survey at ZFS and ECC RAM, but sadly the mailing list links are broken and only lead to Oracle's crap now.

  15. Re: Data integrity by TheRaven64 · · Score: 4, Insightful

    ZFS doesn't have ECC, but it does checksum each block, so it can detect per-block errors. If you have valuable data, you can set the copies property to some value greater than 1 for that data set and it will ensure that each block is duplicated on the disk, so if one fails a checksum then the other will be used to recover. If you have three disks, you can use RAID-Z, which loses you 1/3 of the space (not 1/2) and allows any single-disk failure to be recovered. Running zpool scrub will make it validate all of the data, and when any read fails its checksum it recovers the data from the other two disks.
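
    The detect-then-repair idea can be sketched in a few lines. This is an illustration under stated assumptions, not ZFS's implementation: SHA-256 is just a convenient standard-library hash here, and the `scrub` helper and the in-memory list of copies are hypothetical stand-ins for redundant on-disk blocks.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Checksum stored separately from the block it describes."""
    return hashlib.sha256(data).digest()


def scrub(copies, expected):
    """Verify every copy against `expected`; rewrite bad copies from a good one.

    Returns the number of copies repaired. Raises if no copy is intact.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("unrecoverable: every copy fails its checksum")
    repaired = 0
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = bytes(good)
            repaired += 1
    return repaired


block = b"family photo bytes"
expected = checksum(block)

copies = [block, block]               # two redundant copies of one block
copies[0] = b"family photo bytez"     # simulate silent corruption of one copy

repaired = scrub(copies, expected)
print(repaired, copies[0] == block)
```

    The checksum only says which copy is wrong; the repair still needs an intact copy somewhere, which is why redundancy (copies, mirrors, or RAID-Z) matters.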

    The reason it doesn't use ECC is that ECC doesn't mesh well with the failure modes of disks. ECC is used in RAM because when it gets hot, hit by a solar ray, or whatever, it is common for a single bit to flip (in a single direction, which makes the error correction easier). In a disk, you typically have an entire block fail, not a single bit. Modern disks use multiple levels, so the smallest failure that is even theoretically possible might be a single byte (or nibble) in a block. And since the failure isn't biased, you'd need a fairly large amount of space. A better approach would be for the filesystem to generate something like Reed–Solomon code blocks for every n blocks that are written. This would allow single-block errors to be recovered, as long as the other blocks are okay. The down side of this approach is that the error correcting block would need to be rewritten whenever any of the other blocks is modified. This might be relatively easy to add to ZFS, as it uses a CoW structure, so block-overwrites are relatively rare (although erasing a lot of data would require a lot of checksums to be recalculated). This would mean that a single-block write would end up triggering a lot of reads and that would hurt performance. For ZFS, this might actually be easier to implement, as blocks are written out in transaction groups and so including an error correction block at the end might be a fairly simple modification.
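
    To make the "one correction block per n data blocks" idea concrete, here is a small sketch that uses plain XOR parity instead of Reed–Solomon. XOR is a simpler code that can rebuild exactly one lost block per group, and only when you already know which block is bad (which a per-block checksum would tell you). The group size and helper names are assumptions for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(blocks, parity, lost_index):
    """Rebuild the block at `lost_index` from the survivors plus the parity block."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


group = [b"AAAA", b"BBBB", b"CCCC"]   # n = 3 data blocks written together
parity = xor_blocks(group)            # one extra correction block for the group

rebuilt = recover(group, parity, lost_index=1)
print(rebuilt)
```

    The cost mentioned above shows up directly: changing any one data block means recomputing `parity`, which is why appending the correction block once per batch of writes is the attractive place to do it.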

    --
    I am TheRaven on Soylent News