OpenZFS Project Launches, Uniting ZFS Developers
Damek writes "The OpenZFS project launched today, the truly open source successor to the ZFS project. ZFS is an advanced filesystem in active development for over a decade. Recent development has continued in the open, and OpenZFS is the new formal name for this community of developers, users, and companies improving, using, and building on ZFS. Founded by members of the Linux, FreeBSD, Mac OS X, and illumos communities, including Matt Ahrens, one of the two original authors of ZFS, the OpenZFS community brings together over a hundred software developers from these platforms."
I love ZFS, if one can love a file system. Even for home use. It requires a little bit nicer hardware than a typical NAS, but the data integrity is worth it. I'm old enough to have been burned by random disk corruption, flaky disk controllers, and bad cables.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
If this gets us BP-rewrite, the holy grail of ZFS, I'll be a happy man.
For those who don't know what it is: BP-rewrite is block pointer rewrite, a feature that has been promised for many years but has never arrived. It's a lot like cold fusion in that it's always X years away.
BP-rewrite would allow implementation of the following features
- Defrag
- Shrinking vdevs
- Removing vdevs from pools
- Evacuating data from a vdev (say you wanted to destroy your old 10-disk vdev and add it back to the pool as a vdev with a different number of disks)
Oh well. I'd somehow hoped "truly open source" meant BSD license, or LGPL.
Not to rain on anybody's parade, but will the commercial holders of ZFS allow this? Or will they unleash some unholy patent suit to keep it from happening?
As long as Oracle's patents are valid, can anyone seriously believe this will go anywhere?
His fleet of boats isn't going to pay for itself.
http://lkml.org/lkml/2005/8/20/95
I'm sure I'll be corrected if I'm wrong, but does it offer any advantage over BTRFS? I'm not trying to start a flame war; I'm honestly asking.
Been using btrfs for several non-essential file systems. Working great so far, and have even done several successful bedup runs. Has worked great for minimizing disk usage on some Maven repositories with lots of duplicate files between Jenkins and Nexus. Maybe not tested enough for your server that you need to stay up all the time, but great for the home desktop (provided you're sane and are keeping backups, which you should be doing already anyway). The more testing it gets, the sooner it becomes "tested enough" for the needs-to-always-be-available server.
It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the features of ZFS one could argue that a port to at least Windows Server would be great and it would garnish quite a following from those who've had to put up with the way NTFS views disk volumes and storage. There are applications that run well on Windows, especially on the server side of things, so I wouldn't call it dead quite yet. Besides, with Server 2012 we now have Storage Spaces and ReFS, which bring some ZFS features to the table, but they're nowhere near as sophisticated as ZFS. There's already been one attempt at a port, but it doesn't appear to be actively maintained and it's read-only. Oracle has software for Windows Server that interfaces to the Sun ZFS Storage Server (SAN) that works at the VSS level. It's not exposing a ZFS filesystem to Windows either, but ZFS is configurable in the SAN. That's a hefty uplift if you're already in deep with EMC or NetApp.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
licensing or patent issues?
What you also forget is that Oracle was the leading proponent of BTRFS and yes it had to do with licensing and patents from Sun. Once they acquired Sun that all went out the window. If I were the CEO at Oracle I'd ask "Why two file systems that essentially do the same thing? One's mature and the other, not so much" That's why BTRFS still survives but now with less Oracle support. Wait, is that a bad thing?
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Everything else is already handled with LVM and software RAID.
You have a great sense of humor, keep it up.
That. Those who don't understand ZFS are condemned to reinvent it, poorly.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the Features of ZFS one could argue that a port to at least Windows Server would be great and it would garnish quite a following from those who've had to put up with the way NTFS views disk volumes and storage.
Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries. Part of this is an intentional legal defense against WINE and CrossOver Office, and part of it is just scale management by limiting the support community requirements to "serious developers".
In addition, a lot of the installable filesystem and similar code, as well as a lot of the necessary VM internals (memory mapped files and paging/swapping from filesystems), are not adequately explained (i.e. they involve locking text regions with level 0 locks, which require a level 3 lock then a level 0 lock, and you have to do this to get the offsets on the physical media for the blocks in question). This used to not work on removable media in NT as of 4.0.1; not sure if it's supported yet, but it was the reason you couldn't install it on Jaz drives or even regular hard drives in removable carriers.
Having developed a filesystem for the Windows 95 IFSMgr, and reverse engineered all this crap, and having done it again for NT 3.51, I would not look forward to having to repeat the process for Windows 7 or Windows 8, which are the only useful versions to target by the time the code ends up functional.
So unless someone wanted to seriously underwrite the effort (read: it'd have to be done by Oracle, or by a startup with a monetization strategy that Microsoft wouldn't preempt, like they did when my team, at a previous employer, ported UFS + Soft Updates to Windows 95, and they announced Longhorn-which-never-happened, and then put together a lawsuit about "deep reverse engineering" which would have precluded using it as a bootable FS)... no thanks.
Using a small, fast SSD as a cache for large, slow disks can be awesome for some workloads, mostly servers with many concurrent users.
To do that with ANY filesystem, bcache is now part of the mainline kernel. dm-cache does the same thing, and there is another one that Facebook uses.
Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries.
Well, the tools are free and there isn't a redistribution problem, never has been.
Now, you could argue that ZFS and Windows won't work unless MS does it because ZFS is the whole disk I/O stack rolled into one, and no driver is going to work with the kernel to allow the ZFS system to work in Windows, but that's another story entirely. There's no way to bypass the disk cache for instance, not in a way ZFS would be compatible with. ZFS must use its own cache, and directly access the raw devices, and provide the filesystem driver all rolled into one ... but spread all across the kernel, in order to get proper performance.
Could get pretty close with some good hacks though, such as FUSE.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
I wish they had encryption... *sigh*
No, I don't want workarounds, I want it to be built in to ZFS like in Solaris 11.
Not sure what you mean. You certainly can set up a mirrored pair (or triplet or quadruplet), but you can also set up what's referred to as raidz, where it stripes the redundancy across multiple disks. You can configure how much redundancy... 1, 2, or more disks if you like. You can also tell ZFS to keep multiple copies of blocks, and it will spread those copies out among the disks. You can set that policy per sub-volume (file system in zfs-speak), so that if you decide that some of your data deserves more redundancy, you can set up a folder that will keep 2 copies of everything, but leave all the other folders at 1 copy. It's super geeky. I've had it detect (and correct) corruption in a failing disk, detect corruption because of a flaky disk controller that would otherwise pretend to work fine, and detect corruption when a SATA cable came loose. Combined with the ECC RAM in the server, I feel more comfortable about the integrity of my data than I ever have. I've lost family photos before to random drive corruption, so I'm sensitive to this stuff :)
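To put rough numbers on the mirror vs. raidz trade-off described above, here is a minimal Python sketch. The function names are made up for illustration, and the figures are idealized: they assume equal-sized disks in a single vdev and ignore padding, metadata, and other allocation overhead, so a real pool will report somewhat less usable space.

```python
from fractions import Fraction


def raidz_usable_fraction(disks: int, parity: int) -> Fraction:
    """Idealized share of raw capacity left for data in one raidz vdev.

    `parity` is the number of disks' worth of redundancy (1, 2, ...).
    Hypothetical helper; ignores all real-world allocation overhead.
    """
    if parity < 1 or disks <= parity:
        raise ValueError("need at least one parity disk and more disks than parity")
    return Fraction(disks - parity, disks)


def mirror_usable_fraction(ways: int) -> Fraction:
    """Idealized share of raw capacity left for data in an n-way mirror."""
    if ways < 2:
        raise ValueError("a mirror needs at least two disks")
    return Fraction(1, ways)


if __name__ == "__main__":
    print("2-way mirror:", mirror_usable_fraction(2))
    print("3-disk raidz, 1 parity:", raidz_usable_fraction(3, 1))
    print("6-disk raidz, 2 parity:", raidz_usable_fraction(6, 2))
```

Under these assumptions a three-disk, single-parity raidz keeps two thirds of the raw space for data, while a two-way mirror keeps half.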
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
One point to be extremely clear on however - when you set copies = 2 on a folder level, it does NOT guarantee those copies end up on different physical spindles. Early on there were many people who lost files because they skipped RAID thinking that copies=X would protect their data. It is NOT meant as a means to protect against hardware failures.
ECC RAM is an important part here, due to how scrubbing works in ZFS. The background disk scrubbing can check every block on the filesystem to see if it still matches its checksum, and it tries to repair issues found too. But if your memory is prone to flipping a bit, that can result in scrubbing actually destroying data that was perfectly fine until then. The worst case impact could even destroy the whole pool like that. It's a controversial issue; the odds of a massive pool failure and associated doom and gloom are seen as overblown by many people too. There's a quick summary of a community opinion survey at ZFS and ECC RAM, but sadly the mailing list links are broken and only lead to Oracle's crap now.
That's what you have backups for.
Temporary files and swap aren't a problem...
Swap can and should be stored on a separate partition, and encrypted using a randomly generated key so it's completely lost after a reboot.
On a properly configured system, only a very small number of locations will be writable by the user, typically the user's home directory and a temporary area... The temporary area can be stored in RAM/swap since it doesn't matter if its contents are lost, and home can be encrypted.
It's trivial to add a hardware keylogger to virtually any system irrespective of how the software is configured; if someone untrusted has had unescorted physical access to the system then the system should be considered compromised anyway. A hardware keylogger is also OS-independent, whereas doing it in software requires the malicious party to know what OS you're using in advance in order to have a compatible keylogger, and also to work around any non-standard configuration you might have.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
ZFS doesn't have ECC, but it does checksum each block, so it can detect per-block errors. If you have valuable data, you can set the copies property to some value greater than 1 for that dataset and it will ensure that each block is duplicated on the disk, so if one copy fails its checksum then the other will be used to recover. If you have three disks, you can use RAID-Z, which loses you 1/3 of the space (not 1/2) and allows any single-disk failure to be recovered. Running zpool scrub will make it validate all of the data, and when any read fails its checksum it recovers the data from the other two disks.
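A toy Python sketch of the checksum-plus-copies idea described above, for anyone who wants to see the mechanics. This is not ZFS code: the `Block` class and the `heal` and `scrub` functions are hypothetical names, everything lives in memory, and SHA-256 is just a convenient stand-in for whatever checksum the filesystem actually uses.

```python
import hashlib


class Block:
    """One logical block kept as several copies plus a single checksum."""

    def __init__(self, data: bytes, copies: int = 2):
        self.checksum = hashlib.sha256(data).digest()
        self.copies = [bytearray(data) for _ in range(copies)]


def _matches(copy: bytearray, checksum: bytes) -> bool:
    return hashlib.sha256(copy).digest() == checksum


def heal(block: Block) -> int:
    """Rewrite every bad copy from a good one; return how many were repaired."""
    good = next((bytes(c) for c in block.copies if _matches(c, block.checksum)), None)
    if good is None:
        raise IOError("every copy failed its checksum; nothing to repair from")
    repaired = 0
    for i, copy in enumerate(block.copies):
        if not _matches(copy, block.checksum):
            block.copies[i] = bytearray(good)
            repaired += 1
    return repaired


def scrub(blocks) -> int:
    """Check every block, the way a background scrub would; return total repairs."""
    return sum(heal(b) for b in blocks)


if __name__ == "__main__":
    photo = Block(b"family photo", copies=2)
    photo.copies[0][0] ^= 0xFF          # simulate silent corruption of one copy
    print("repaired copies:", scrub([photo]))
    print("copy 0 now:", bytes(photo.copies[0]))
```

The point of the sketch is that a checksum alone only detects damage; repair needs a second good source, whether that is an extra copy, a mirror, or parity.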
The reason it doesn't use ECC is that ECC doesn't mesh well with the failure modes of disks. ECC is used in RAM because when it gets hot, hit by a solar ray, or whatever, it is common for a single bit to flip (in a single direction, which makes the error correction easier). In a disk, you typically have an entire block fail, not a single bit. Modern disks use multiple levels, so the smallest failure that is even theoretically possible might be a single byte (or nibble) in a block. And since the failure isn't biased, you'd need a fairly large amount of space. A better approach would be for the filesystem to generate something like Reed–Solomon code blocks for every n blocks that are written. This would allow single-block errors to be recovered, as long as the other blocks are okay. The down side of this approach is that the error correcting block would need to be rewritten whenever any of the other blocks is modified. This might be relatively easy to add to ZFS, as it uses a CoW structure, so block-overwrites are relatively rare (although erasing a lot of data would require a lot of checksums to be recalculated). This would mean that a single-block write would end up triggering a lot of reads and that would hurt performance. For ZFS, this might actually be easier to implement, as blocks are written out in transaction groups and so including an error correction block at the end might be a fairly simple modification.
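The "one error-correcting block per n data blocks" idea can be sketched in a few lines of Python. To keep it short this uses plain XOR parity rather than Reed–Solomon: XOR is the simplest code of this kind and can rebuild exactly one lost block per group, provided you already know which block is bad (which per-block checksums would tell you). The function names are made up for illustration and the blocks are assumed to be equal-sized.

```python
def xor_parity(blocks):
    """Return one parity block that is the byte-wise XOR of all given blocks."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks in a group must be the same size")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover(blocks, parity, lost_index):
    """Rebuild the block at lost_index from the surviving blocks and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    # XOR of the survivors and the parity cancels everything except the lost block.
    return xor_parity(survivors + [parity])


if __name__ == "__main__":
    group = [b"aaaa", b"bbbb", b"cccc"]   # n = 3 data blocks written together
    parity = xor_parity(group)
    print("rebuilt block 1:", recover(group, parity, 1))
```

It also makes the stated down side visible: the parity depends on every block in the group, so changing any one of them means the parity has to be recomputed and rewritten.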
I am TheRaven on Soylent News
(You seem to write well so you'll probably appreciate being reminded it's "garner" not "garnish")
Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries.
Well, the tools are free and there isn't a redistribution problem, never has been.
Not according to this document; the runtime components are not redistributable. This is an Anti-WINE license measure:
http://msdn.microsoft.com/en-us/library/ms235299(v=vs.90).aspx
Now, you could argue that ZFS and Windows won't work unless MS does it because ZFS is the whole disk I/O stack rolled into one, and no driver is going to work with the kernel to allow the ZFS system to work in Windows, but that's another story entirely. There's no way to bypass the disk cache for instance, not in a way ZFS would be compatible with. ZFS must use its own cache, and directly access the raw devices, and provide the filesystem driver all rolled into one ... but spread all across the kernel, in order to get proper performance.
Could get pretty close with some good hacks though, such as FUSE.
This is actually reverse-engineerable. FUSE isn't an option, since pages which get memory mapped and dirtied are not propagated up via invalidation events. This is the same problem the Heidemann stacking framework has if you stack FS A on top of FS B, and then expose both of them as visible in the mount hierarchy namespace. You can do some things, but you can't do really complicated things.
You don't have a multi-petabyte array with mission-critical data at home?
Watch this Heartland Institute video