The State of ZFS On Linux
An anonymous reader writes: Richard Yao, one of the most prolific contributors to the ZFSOnLinux project, has put up a post explaining why he thinks the filesystem is definitely production-ready. He says, "ZFS provides strong guarantees for the integrity of [data] from the moment that fsync() returns on a file, an operation on a synchronous file handle is returned or dirty writeback occurs (by default every 5 seconds). These guarantees are enabled by ZFS' disk format, which places all data into a Merkle tree that stores 256-bit checksums and is changed atomically via a two-stage transaction commit. ... Sharing a common code base with other Open ZFS platforms has given ZFS on Linux the opportunity to rapidly implement features available on other Open ZFS platforms. At present, Illumos is the reference platform in the Open ZFS community and despite its ZFS driver having hundreds of features, ZoL is only behind on about 18 of them."
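For readers unfamiliar with the Merkle tree mentioned in the summary, here is a minimal Python sketch of the idea, not ZFS's actual on-disk format. SHA-256 is used as a stand-in 256-bit checksum, and the names `checksum`, `build_tree` and `root` are made up for illustration. Each parent stores a checksum of its children's checksums, so damage to any data block changes the root.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """256-bit checksum; SHA-256 is an assumed stand-in, not necessarily what a pool uses."""
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return a list of levels: leaf checksums first, single root checksum last."""
    level = [checksum(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent holds the checksum of its (up to two) children's checksums.
        level = [checksum(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def root(blocks):
    """Root checksum covering every block."""
    return build_tree(blocks)[-1][0]


blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
trusted_root = root(blocks)

# Silently alter one data block: the recomputed root no longer matches.
damaged = list(blocks)
damaged[2] = b"block-X"

print(root(blocks) == trusted_root)
print(root(damaged) == trusted_root)
```

The point of the structure is that a checksum is never stored next to the data it protects, so a block that is corrupted or written to the wrong place cannot vouch for itself.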
Maybe wrong place for a bug report, but are there any known issues with rsync? Maybe on OpenZFS?
Apart from that, love the free ZFS and all the work done.
I'm still quite unfamiliar with the concepts of ZFS. How would it compare to an LVM'd RAID-5 with EXT4?
To quote Kevin Smith: Man... I don't know what the FUCK you just said little kid, but you're special man, you reached out and you touch a brother's heart.
It's unfortunate that the code is being ported to Linux, not rewritten. This means there will never be native Debian support for it. As a result, unsurprisingly, packages are only available for the Debian amd64 architecture.
I've been using ZFSonLinux for a year in production. No problems at all. It's my storage back end for Xen Virtual machines. Just make sure you use ECC RAM and a decent hard disk controller. Instant snapshots and ZFS send/receive functions are awesome, have reduced my backup times by an order of magnitude. I use a Debian Wheezy/Unstable hybrid.
I've been using this for a production fileserver for about a year and a half. Prior to that I was using ZFS on FUSE for about a year.
The only minor negative things I can say are that when you do have some odd kind of failure, ZFS (and this may be the case on BSD and Solaris) gives you some pretty scary messages like "Please recover from backup", but usually exporting and importing the FS brings it back, at least in a degraded state. My other caveat might just be my Linux distro, but I've often had problems with older versions of the libraries hanging around and causing the command line tools to break.
It's a killer file system. Once you've used it, you won't be able to leave it.
So how much space do the checksums take up? How much does all this behind-the-scenes work slow down data retrieval/writing?
Is this something that a normal consumer would use for their main storage?
Troll is not a replacement for I disagree.
For all the technobabble in that summary, I still don't know what ZFS offers me over other filesystems. Maybe the guys working on the system should do a little marketing course, or work on their 'elevator pitch'...
"Fix it? It has been disintegrated, by definition it cannot be fixed!" - Gru in Despicable Me.
https://pthree.org/2013/12/10/zfs-administration-appendix-c-why-you-should-use-ecc-ram/
It's really too bad laptops still don't offer ECC RAM. I'm willing to pay a little more for an ECC-capable motherboard.
I've been using ZFS on Linux for about a year. I can summarise my position on the experience with two words: it's magic.
It is still tricky to run one's root system off ZFS (at least on Debian). That, I think, is for those who are brave and have the time to deal with issues that might arise following updates. But for non-root filesystems, ZFS is, as I said, magic. It's fast, reliable, caches intelligently, adaptable to a large variety of mirror/striping/RAID configurations, snapshots with incredible efficiency, and simply works as advertised.
Someone once (before the port to other OSes) said that ZFS was Solaris' "killer app". Having used it in production for a year, I can understand why they said that.
I bought a new external 4TB drive and decided to use ZFS on it. I can't. 'zpool create' always freezes, or at least I always kill it after a day. I ran badblocks and didn't find any errors. OS: LMDE w/zfs-fuse
I've been hearing this forever now from sysadmins. I have also seen very horrible implementations on large transactional databases with horribly misconfigured ZFS settings causing slower performance and even data loss... despite the heavy push to use ZFS in this case. Use ZFS in the right place and I'm sure it is fine, stop trying to use it for everything... and stop assuming allocating memory to ZFS will outperform allocating that same memory to a database that knows exactly how to get to all the data and has metadata to help accomplish it efficiently.
How can it be production-ready if it still lacks SELinux support.. the ZOL FAQ suggests either permissive or disabling of it entirely.
Makes it a showstopper for us. Last time I checked, it was scheduled for 0.8.0, which does not even have a due date.
clearly doesn't know what "production ready" means.......
you have to add devices by-id because the /dev/sd** designation changes with every reboot
and don't try to do a dist-upgrade without exporting the pool and praying.... prayer isn't even going to work, ask me how i f****** know....
FreeBSD has had ZFS for what, over five years now? They are the reason it exists in any actual use (OpenSolaris/Illumos don't count) on any non-Sun/Oracle platform.
And Linux's wannabe ZFS competitor BTRFS (oooh, look at us) sucks so bad it can't get off the ground.
So what does Linux do.... import (steal) ZFS from OpenZFS/FreeBSD and start posting about how great all their work with ZFS is, and how Linux bloggers now say 'oh yeah, ZFS is actually solid, so we can use it'. As if they are the only/first ones to certify ZFS. Thing is, ZFS was always solid. When bashing ZFS Linux was really just babbling about ZFS's more open and free BSD License and their own failure of BTRFS.
Any more Linux is just a craptank of unreliable mashed up bloatware.
If you want an integrated system that just works, try FreeBSD.
> ZFS is a layer below LVM.
Typically you'd layer raid, then LVM, then the filesystem. ZFS tries to be all three. It's raid, and it's a volume manager, and it's a filesystem. There are some benefits to integration, and some drawbacks. With the raid>lvm>filesystem approach, it's trivial to add dm-cache, bcache, iscsi, or any other piece of storage technology. With ZFS, anything you want to add has to be specifically supported within ZFS.
The Unix tradition is small, single purpose tools that do one thing well. Witness sort, grep, wc, etc. Want to count the log entries that mention Slashdot? You don't need a special tool for that, just grep slashdot | wc -l . Tools like mdadm and lvm are building blocks that can be combined to suit your need, the Unix way. ZFS is a big monolithic package that does everything, much like Microsoft Word or Outlook. ZFS is more in the Microsoft tradition.
Hey, I'm the guy who got modded +5 funny for replying to the 8/10TB disk announcement with "of course they did, I ordered 6TB drives 2 hours ago". Well, I switched my home NAS over to ZFS last month. So, yay for me, for once I'm ahead in at least some minimal sense or other!
Seriously though, I have found ZFS to be a damned good solution so far. (FYI, CentOS, Core i5, 4GB, 6x4TB with 2-disk parity, 2 eSATA -> port multipliers...) I really don't think I will ever deploy hardware RAID again.
I think you're giving the wrong idea here. I have yet to find a format of storage capacity that zfs won't support, with one exception: you can't create a zvol on a zpool, then attach that zvol as back-end storage for the same zpool. That is specifically disallowed, and I'm guessing that you can't use a zvol from one zpool to back-end another zpool either. This is a very bizarre (also, probably dumb) thing to do, but even this can be overridden if you're really desperate. For more practical applications, everything else just works: at least in FreeBSD, you can "hide" the block devices behind all different kinds of abstractions to provide 4k writes, encryption, whatever, and zfs will consume those virtual block devices just fine.
ZFS on Linux would be cooler if they could port Time Slider to Linux from Open Solaris. http://java.dzone.com/news/kil...
iSCSI doesn't need to be baked into ZFS, in fact, even on Illumos it isn't. It's in a completely different subsystem and will happily work with any block device as its backend storage (be it a physical drive, a ZFS zvol, a loopback block device or anything else, really).
Anything that can be represented as a block device can be added to a zpool. This also includes files, which is handy when you're trying to understand complicated interactions: you can mock up a small zpool based on files instead of devices for testing.
On the other side of the abstraction, ZFS can also expose block devices called zvols that are backed by the zpool. So if you wanted to run a dm-crypted EXT4 filesystem backed by a zpool, you can certainly do that using a zvol and still get all the benefits of ZFS integrity protection and snapshotting.
Plenty of layering can be done with ZFS.
Have you confirmed using a zvol underneath a zpool, and if so was it a different zpool?
I've wanted to do that in the past, but it was specifically blocked. It's a pretty ugly thing to do, but it does give you a "new" block device that could be imported as a mirror on-demand. With enough drives in the zpool, that new device is nearly independent from its mirror, from a failure perspective.
But it still won't stem the flow of Linux refugees to *BSD due to the trash that is being put into the Linux ecosystem such as systemd.
Not really. Fundamentally, a filesystem's job is to store data in a structured manner on an unstructured array of blocks. For everything ZFS does, it still comes down to that.
There are a great many advantages to having that structure include duplicate blocks and checksums.
If you really prefer, you can reasonably build a non-redundant ZFS pool on top of a RAID volume though you will lose a few advantages that way.
Several others posting here, fans of ZFS, are saying on this very page that ZFS really needs to be accessing bare disks. It will allow you to use another block device, they say, but data corruption is highly likely.
# touch /export/lun/0
# sbdadm create-lu -s 10g /export/lun/0 # file backing
# sbdadm create-lu /dev/rdsk/c1t1d0s0 # raw disk backing
The Linux implementation lacks support for the Archive, Hidden, ReadOnly, and System file attributes; those are needed for MS-DOS support:-]
I was talking about iSCSI support, not how the ZFS pools themselves are built. Yes, ZFS *does* prefer raw disk backing (due to certain management simplifications), but it does not need it. In fact, it's quite possible to run ZFS on top of RAID arrays, and people frequently do.
You must have an enormous collection of Linux Distributions at home to need that much storage.
I was promised a flying car. Where is my flying car?
You know you can lay a zfs filesystem on files, right?
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
And was very impressed. It was a new 4-drive system I'd put together to operate as both a NAS/fileserver and a host for virtual machines. I had originally intended to use RAID 5, but decided to give ZFS a try after reading about it. My initial config had it booting Ubuntu (maybe Mint? I don't recall), with ZFS for Linux installed as the main non-boot filesystem with one-drive redundancy. I had all sorts of problems with drives dropping out of the array, which I eventually tracked down to the motherboard shipping with bad SATA cables. ZFS handled this admirably. At first I didn't notice one of the drives had dropped, and continued using the system for about a day. When I got the drive working again, as I understand it RAID 5 would have had to do a complete array rebuild because of the changed files. ZFS noticed most of my old data was on the "new" drive and simply validated the checksums as still accurate, then noticed I had written new files and automatically created new redundancy files for them on the "new" drive. The entire "rebuild" only took a little over an hour instead of the 20+ hours I was expecting (how long it takes me to backup the data over eSATA).
If you're wondering why ZFS trusts the checksums on the "new" drive instead of reading the entire file, it will read the entire file and compare it to the checksum every time you access it. Once a month by default, it runs a "scrub" where it reads every file and verifies they haven't suffered bit rot and still match the checksums. Apparently the strategy after a dropped drive is to get the redundant filesystem up and running again ASAP, then do the file integrity scrub afterwards at its leisure. (You can manually force this check at any time with zpool scrub.)
The other main advantage I'd say is that it's incredibly flexible when you're putting together redundant arrays. RAID 5 normally requires 3+ drives or partitions of the same size. ZFS lets you mix together drives, partitions, files (yes, one of your ZFS "drives" can be a file on another filesystem), other devices like SAS drives, etc. You can even put the 3+ "drives" needed for redundancy onto a single drive if you just want to play around with it for testing.
The only problem I ran into was with deduplication. Dedup was part of the reason I decided to try ZFS, and is one of the features frequently mentioned by ZFS advocates. While dedup does work, it is an incredible memory and performance hog. Writes to the ZFS array went from 65+ MB/s (bunch of mixed random files) down to about 8 MB/s with dedup turned on, and memory use climbed to where I ordered more RAM to bump the system up to 16 GB. In the end I decided the approx 2% disk space I was saving with dedup wasn't worth it and disabled it.
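To illustrate why dedup can cost so much for so little, here is a rough Python sketch of block-level deduplication, not the ZFS implementation. The table layout and the name `dedup_stats` are hypothetical. The thing to notice is that every distinct block needs its own table entry whether or not it is ever duplicated, so data with few repeats pays for a large table and saves almost nothing.

```python
import hashlib


def dedup_stats(blocks):
    """Count distinct blocks by checksum and report the fraction of blocks saved."""
    table = {}  # checksum -> reference count (hypothetical in-memory dedup table)
    for block in blocks:
        key = hashlib.sha256(block).digest()
        table[key] = table.get(key, 0) + 1
    total = len(blocks)
    unique = len(table)
    saved_fraction = (total - unique) / total if total else 0.0
    return unique, saved_fraction


# 100 blocks of 4 KiB each, of which only 2 repeat an earlier block.
blocks = [bytes([i]) * 4096 for i in range(98)]
blocks += [bytes([0]) * 4096, bytes([1]) * 4096]

unique, saved = dedup_stats(blocks)
print(f"table entries: {unique}, space saved: {saved:.0%}")
```

In this made-up data set the table holds 98 entries to save 2 blocks out of 100, which is the same kind of trade-off as the roughly 2% saving described above.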
I eventually switched to FreeNAS (based on FreeBSD, which has a native port of ZFS) because it was annoying having to reinstall ZFS for Linux after an Ubuntu/Mint update, and I couldn't see myself doing that after every new release because I wanted features which were added to the core OS. (And if you're wondering, dedup performance is just as bad under FreeNAS.)
The cost of this approach has always been performance. It is faster, for example, to use grep's -c switch than to pipe grep's output into wc -l (as is commonly done in poorly-written scripts).
When it comes to storage, the performance penalty of using separate layers, which aren't well-aware of each other, becomes big enough to justify integration...
In Soviet Washington the swamp drains you.
I've used ZFS on Linux for years now and it's been fantastic for my long-term storage. One 2 TB drive runs all the time and I power another one up periodically to auto-sync with the first one. That saves power and drive wear vs an always-on RAID1 setup.
Data corruption isn't likely. The reason to use partitioned disks is for performance so that ZFS can control the disk caches directly.
The cost of this approach has always been performance.
Uh, no.
Sure, in the specific case you cite, that's correct, because grep can easily count the number of lines it outputs. In the general case, however, you'll find a pipe is probably faster, because the two processes can run on different cores.
I used ZFS under FreeBSD. It was good for a few months until it got slow and I needed to defrag it. Oh no, ZFS is too good for a defrag tool, so I zapped it and installed Debian with XFS, which is much, much faster and comes with an online defrag tool.
Certain required features could not be done if they were separate features. In order to properly do certain things, the 3 layers must understand each other. I'm not talking about "fun" features, I'm talking about problems that have been plaguing data centers and there was no other way.
Do you have an example? The storage system I'm using provides every important feature I'm aware of in ZFS, and it keeps the layers separate. As ZFS has matured, it seems to be a way of getting all of those features out-of-the-box, without needing to think about how to put it together. LVM is one volume manager that provides most of the same features, though. Then put your choice of filesystem on top of LVM. Can you think of any feature that actually requires the volume manager to be stirred together with the filesystem?
To get all the features of data integrity and error correction you need to avoid hardware raid with ZFS.
If ZFS controls the drives and you have striping or mirroring and one of the drives has corrupt data, ZFS can log an error and fix the data corruption. If hardware RAID controls the drive, it may realize a copy of the data is bad, and pass back the good copy, but RAID won't log an error nor fix the bad block. ZFS won't fix it either because it won't know since RAID handled returning the correct data to ZFS.
ZFS is self healing if it is NOT on top of hardware RAID. To reap all of the benefits of ZFS you need lots of RAM, that RAM should be ECC and two or more disks without hardware RAID.
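The self-healing argument above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ZFS code: a "mirror" is just a list of byte strings, the expected checksum is assumed to be stored separately from the data, and `read_with_self_heal` is a made-up name. The filesystem can only repair a bad copy if it sees every copy itself, which is exactly what a hardware RAID controller hides from it.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """SHA-256 as an assumed stand-in for the pool's checksum function."""
    return hashlib.sha256(data).digest()


def read_with_self_heal(copies, expected):
    """Return verified data and the indexes of mirror copies that were repaired.

    copies   -- mutable list of bytes, one entry per mirror side (toy model)
    expected -- checksum stored apart from the data, e.g. in a parent block
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no copy matches the checksum; nothing to repair from")
    repaired = []
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # overwrite the bad side with the verified copy
            repaired.append(i)
    return good, repaired


data = b"important data"
expected = checksum(data)
mirror = [b"important dat\x00", data]  # side 0 has silently rotted

value, repaired = read_with_self_heal(mirror, expected)
print(value == data, repaired, mirror[0] == data)
```

With a single copy (or a RAID volume presented as one device) the same check can still detect the corruption, but the `good is None` branch is all that is left: there is nothing to repair from.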
vi +
"How can it be production-ready if it still lacks SELinux support.. the ZOL FAQ suggests either permissive or disabling of it entirely", Kahenraz
More than that, since you're effectively virtualizing your EXT4 filesystem, you can expand it pretty easily too. You're backed by a storage pool, which means you can expand that pool by adding or replacing drives, and then simply resize the EXT4 filesystem live. EXT4 need not know about the fact that you've added a new raid array to the storage pool.
You're assuming that the single process isn't multithreaded. ZFS is multithreaded.
Technically, it's the zpool that can't be, but what that also means is that if you contribute a device for ZFS use, you're stuck with it forever.
Want to count the log entries that mention Slashdot? You don't need a special tool for that, just grep slashdot | wc -l .
Not since journalctl took over.
You're not really understanding how ZFS does and can work. It already has hooks to provide 'features' such as you talk about. It does require crossing several traditional Unix boundaries, that's true, but it's an accepted trade-off to get the benefits that go with ZFS, and the hooks to include such features at the typical boundary points still exist in the ZFS code. Pretending that ZFS has to be totally and completely aware of what you hook in isn't really fair. What you hook in has to integrate with the API, which is well defined, and that really isn't any different than with the approach you seem to prefer.
And for reference: dm-cache and bcache are not needed with ZFS, L2ARC already covers them, and it does it better because it knows what's going on across all 3 layers. I seem to have no problem doing iSCSI sharing of ZFS storage space, nor do I seem to have any problem using iSCSI targets as part of vdevs. Hell, technically you can still use dm-cache and bcache with ZFS, if you're ignorant enough to do so. You can even run whatever file system you want on top of zvols. You'd be stupid to do it in most cases, but the ability is there if need be.
Since you want to use the word Unix, let's get a few things clear. Linux does not have and likely never will have a Unix certification. Sun, on the other hand, had two operating systems that were certified Unix, and they were doing it before Linus had a computer to start Linux on. Drop the 'my OS does it right' bullshit because your OS isn't what you're claiming it to be, and the system you are arguing against was written by people who did make something you're claiming it isn't.
I don't disagree with the Unix tradition in the least, compartmentalized code with strong boundaries and good interoperability wherever possible ... and occasionally you tear down the walls for specific reasons. Graphical performance is an example where your philosophy sucks, which is why Windows kicks the ever living shit out of Linux performance. Note: Linux, NOT Unix. SGI had a terrific graphics stack as an example, and Sun's wasn't too horrible.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
For me the big missing feature is the lack of a `dump` program. In the case of ZFS, I would imagine this could simply be a wrapper around `zfs send` so dumping the filesystem might need a command like `zfs send | xfsdump`, but to my knowledge no one has implemented this yet. `dump` is a very good tool for backups, keeping filesystem images with `rsync` can be handy, but I still want a proper implementation of `dump`.
Yeah, that's the thing about systemd- it's not Unixy. It might be great, but it's not designed according to Unix principles.
> Drop the 'my OS does it right' bullshit because your OS isn't what you're claiming it to be,
Where did I say one approach was right and the other wrong? In fact, I said each approach has its advantages and disadvantages. What I said is that ZFS is not designed according to the Unix tradition of "do one small thing, and do it right". Apparently you agree that's the case:
> don't disagree with the Unix tradition in the least, compartmentalized code with strong boundaries and good interoperability wherever possible
That's why some who appreciate the Unix approach hate systemd. It would be more at home on Windows.
Re Sun, if you look at an old Sun Solaris box, you'll find some of the was written by a guy named Ray Morris. Coincidentally, this post was also written by Ray Morris.
If you can do layering with ZFS, shouldn't ZFS be split into separate packages for each of the layers?
Yet the kernel does scheduling, memory management, user access control, filesystems, device management, TCP/IP, power management, etc.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
> ZFS is a layer below LVM.
Typically you'd layer raid, then LVM, then the filesystem. ZFS tries to be all three. It's raid, and it's a volume manager, and it's a filesystem. There are some benefits to integration, and some drawbacks. With the raid>lvm>filesystem approach, it's trivial to add dm-cache, bcache, iscsi, or any other piece of storage technology.
ZFS can have things added on top just as trivially. I've never done smb in ZFS, I used Samba. On solaris, I did NFS with ZFS, but on Linux, I've kept it in /etc/exports.
With RAID+LVM+filesystem+network share, I need to unshare, umount. Then resize LVM, resize the filesystem partition (if it is allowed), mount and export. With ZFS I do zfs set quota= zpool/directory. On the fly with no downtime. That alone is worth the difference.
ZFS does compression also. And you can toggle back & forth while running. zfs set compression=on zpool/directory and every new write is compressed. compression=off and new items are not compressed.
ZFS also does checksums to verify the data. If you have a dodgy sata cable or controller firmware that corrupts your data, it will be detected. If ZFS is doing RAID, it has a 2nd copy of that data and can self correct. If it doesn't have a 2nd copy, it will shut the filesystem down so nothing else gets corrupted. You can't do that with hardware RAID.
With ZFS, anything you want to add has to be specifically supported within ZFS.
Not true at all.
The Unix tradition is small, single purpose tools that do one thing well. Witness sort, grep, wc, etc. Want to count the log entries that mention Slashdot? You don't need a special tool for that, just grep slashdot | wc -l . Tools like mdadm and lvm are building blocks that can be combined to suit your need, the Unix way. ZFS is a big monolithic package that does everything, much like Microsoft Word or Outlook. ZFS is more in the Microsoft tradition.
Some of the things that ZFS does would be difficult if it wasn't the whole system. Newer systems like btrfs and ceph take a similar approach.
The zpool and zfs layers could be considered separate in the same way - especially with things like zfs send to move filesystem snapshots to other pools which are usually on other machines. The filesystem does not get influenced by the nature of the pool and vice versa, so long as it's big enough to fit. Global options on pools (eg. no atime) are really just passed down to the filesystems instead of it being a zpool operation.
It's really more like if LVM and ext4 were done by the same development team than a totally different and totally monolithic approach.
If I call fsync() I legitimately expect it to never return before the data is on disk, or else to return with an error. Any other behavior is just completely unacceptable and a rather severe fault.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
You listed some great examples, examples of the opposite of what you probably meant to show.
Take scheduling- there are what, six different interchangeable, removable kernel modules to do scheduling in different ways, including the option to not do it at all. The scheduler only does scheduling, and nothing else. The rest of the kernel doesn't know or care about the scheduling. You mentioned filesystems as well. Yep, you can choose from dozens of different filesystems. The rest of the kernel doesn't care which filesystem you're using, because those other modules do their job and nothing more. You can use any scheduler with any filesystem.
Enter zfs, a popular volume manager similar to LVM. It just manages volumes, so you choose whichever filesystem to lay on top. Er, no. If you want to use the ZFS volume manager, you probably need to use the ZFS filesystem. That's cool, it'll also provide an extra level of resiliency on top of that great hardware raid you have. Actually, not so much. It doesn't play nicely with most enterprise storage hardware. You need to use dumb hardware and use ZFS raid to avoid problems. Wait, what? ZFS, a filesystem, is telling you which hardware to use? That's not like the interchangeable kernel modules at all.
SELinux isn't production-ready either
Sorry, I'm not that familiar with OpenSolaris.
Don't the first and second commands create a zpool backed by a file? That's not what's at question here, I want to know if you can back a zpool with a zvol created on that same zpool.
A quick test showed that it does work on FreeBSD to create a zpool upon a zvol from a different zpool. The circular version has made it hang for a not-insignificant amount of time...
I think you misunderstood my reply. I was replying to the poster talking about iSCSI having to be implemented in ZFS. That's what I was addressing.
You talk about a different thing altogether - ZFS backing. The zvol-on-another-zpool solution should work, although performance will suck. The zvol-on-the-same-zpool solution can and will hang for obvious reasons.
ZFS is a big monolithic package that does everything, much like Microsoft Word or Outlook. ZFS is more in the Microsoft tradition.
Well, that is well within the Unix tradition. ZFS is a *kernel* module, not a userland application. Just because the CLI is comprised of 2 commands doesn't mean it's monolithic. It's as monolithic as ifconfig and other complex utilities.
And I'd take the zfs/zpool command format over the ugly lvm mess any day.
I tried to run backups to ZFS on a crypted USB disk. It worked for a while, but if something fails (like the backup disk going to sleep), the entire chain hangs. I can't disconnect the crypt device, and I can't disconnect the ZFS pool, zpool and zfs hangs. What I do with the USB cable and hardware no longer has any impact. I stopped doing that. (I didn't have better luck with btrfs.) Although I don't really blame ZFS that much other than it can't handle hanged devices. USB on Linux is still flaky.
The other problem I have is that after a while it happily uses up 30GB of the 32GB on the computer, and only extremely reluctantly gives it back. I can't seem to control how much ZFS will use. And the rest of the system isn't really happy with just 2GB to run programs in (several virtual machines of 8GB RAM each, for instance).
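For what it's worth, ZFS on Linux exposes a `zfs_arc_max` module parameter that caps the ARC size in bytes, usually set through an options line in a modprobe configuration file such as /etc/modprobe.d/zfs.conf. The small Python helper below is only a sketch that formats such a line. The function name `arc_max_option` and the 8 GiB figure are illustrative assumptions, and whether a cap actually fixes the behaviour described above depends on the version and workload.

```python
def arc_max_option(gib: int) -> str:
    """Format a modprobe options line capping the ZFS ARC at `gib` GiB.

    The zfs_arc_max parameter takes a size in bytes; the 1024**3 factor
    converts GiB to bytes. This only builds the text, it changes nothing.
    """
    if gib <= 0:
        raise ValueError("ARC cap must be a positive number of GiB")
    return f"options zfs zfs_arc_max={gib * 1024 ** 3}"


# Hypothetical example: leave most of a 32GB machine to the virtual machines.
print(arc_max_option(8))
```

The printed line is what would go into the modprobe configuration file. It takes effect the next time the zfs module is loaded.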
grep -c slashdot
The RAID, LVM, Filesystem approach is defunct in the modern world. Also, ZFS already incorporates multi-protocol support, ability to turn any host with local storage into a target (via the COMSTAR framework). Not sure how much of this is in the linux port, but I suspect that if it's close enough to Illumos, it should have these features.
ZFS is not in the microsoft tradition, it is a departure from 20th century storage design/architecture. The very idea that there has to be a RAID/LVM/FS is archaic and has been thoroughly disproven. In my previous shop we had petabytes of storage in ZFS pools and hardly ever lost data.
The Pool-based model that eliminates the layers of RAiD/LVM/FS results in better performance, easier supportability and superior diagnostics capabilities.
Do you realize that almost every major storage vendor first bashed ZFS and then about 3-4 years later started building architecture that was eerily like ZFS?
My shop was one of the early adopters of ZFS, back in 2007. There were a few bugs then, but over the years I have been absolutely impressed with the efficiency and stability of ZFS.
I only have experience with ZFS, so I don't know if this is a general feature of RAIDs. I've run tests on my home zpool; the benchmarks show that read speeds on the pool increase from 20-200% as more disks (parity or data) are added (the new forum layout crops the plot; blue/red/yellow = RAIDZ1/2/3). I presume this is because IO operations can be partially parallelized across the pool. For me the biggest selling point is that my file server is only dependent on my disks; if any part of the server dies, I am not locked in to anything. For that matter I can even have 3 out of the 11 drives die, and my data are still intact and can be moved to other hardware if needed.
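The "3 out of the 11 drives" figure follows from the RAIDZ parity level. Below is a back-of-the-envelope Python sketch. The function name `raidz_summary` and the 4 TB disk size are made up for illustration, and the capacity figure deliberately ignores padding, metadata and other overhead, so treat it as an upper bound rather than what a pool would report.

```python
def raidz_summary(num_disks: int, parity: int, disk_tb: float) -> dict:
    """Rough fault tolerance and capacity of a single RAIDZ vdev.

    parity  -- 1, 2 or 3 for RAIDZ1/2/3
    disk_tb -- size of each disk in TB (hypothetical value in the example)
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity level must be 1, 2 or 3")
    if num_disks <= parity:
        raise ValueError("need more disks than parity devices")
    return {
        "tolerated_failures": parity,
        # Simplification: one disk's worth of space per parity level is lost.
        "approx_usable_tb": (num_disks - parity) * disk_tb,
    }


# 11 disks in RAIDZ3, assuming 4 TB each.
print(raidz_summary(11, 3, 4.0))
```

Under these assumptions an 11-disk RAIDZ3 survives any three disk failures and offers roughly eight disks' worth of space.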
ZFS plays just fine, the problem is in order to fully benefit from ZFS, ZFS must manage its own redundancy. You can still use RAID5 on your SAN, but you'll still want RAID5 with ZFS, which is just that much more wasted space. You also get the disadvantage that when a drive dies in the SAN RAID, performance will take a bigger hit than it needs to be, because the hardware RAID has no idea how the file system works.
In most situations, you're better off having each layer completely independent, but in the case of ZFS, it seems that when you don't make the layers entirely generic, but make them specialized to each other, the end product is much greater than the sum of the parts.
> ZFS is not in the microsoft tradition
Balsa (Gnome email client): 2.5 MB, reads email. Optionally use libgtkhtml (315kb) to render HTML email.
Microsoft Outlook (Microsoft email client): Several GBs. Reads email, handles calendar, embedded mail server, task list, weather reports(?!?!), fax, RSS, HTML templates, _sharing_ calendars. Loads MS Word (several GB) to partially display HTML messages.
Let's break this down into three statements and see where we disagree:
1. It appears that the Microsoft tradition is big monolithic packages that do everything. Including weather reports embedded in their email client.
Do you disagree with that?
2. Do you disagree with the statement that the Unix tradition (from ed to grep to elm and balsa) is small, focused tools?
3. Do you disagree with the statement that ZFS is a volume manager, a filesystem, a raid-like redundancy system, and a few other things as well? In other words, that it's a big, monolithic package that does many things. Do you disagree with that?
You LIKE ZFS. I understand that. It does a lot of cool things. It does a lot of boring things. It does a lot of things. Just like Microsoft Office.
The poster I was responding to referred to "Unix tradition". The tradition started on single-CPU systems...
Even on modern multi-core computers, piping data from stdout to stdin is inefficient. Very convenient, but inefficient nonetheless. When the cost of developing (such as shell scripts written to either be one-offs or rarely executed) exceeds the costs of the inefficiency, it is justified.
But with storage, code that is used thousands (millions!) of times per day, it makes all the sense to invest in developing the subsystem.
Indeed, various OS-vendors (free and otherwise) all spend a lot of effort (and money) on improving their offerings. ZFS is just an example of something better than all (or most?) of the competition.
In Soviet Washington the swamp drains you.
The poor documentation. I'm a ZFS guru, been using it for several years and I know the ins and outs. But I'll be damned if the documentation doesn't seem to be more inaccurate than accurate. Just look at stuff like https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-to-a-Native-ZFS-Root-Filesystem. The package zfs-grub hasn't been released for Ubuntu in 2 or 3 release cycles, yet the "guide" still references it. What's worse is that it's been *documented* to be broken for months (check the "issues" page for yourself) and still isn't fixed. Somewhat disappointing to me and makes me question things. I tried to find someone that actually had good directions for something that works, and I see *lots* of people asking from all over the place for guides for ZFS on root for 14.04, Mint 17, and Debian. But nobody seems to have taken the time to provide a guide that actually works.
Then you look at feature flags. Someone (I forget who) tried to implement a feature flag earlier this year (read the whole ticket in github). First it was "delayed for a few weeks" then had to be fixed so it could be committed. Then it wasn't committed and the poor dev had to fix it two more times. It's still not committed. Sorry, but that's a poor use of developer resources to keep asking someone to fix their commit to work in the master branch, then not implement it when they fix it and change the master branch and expect the dev to fix it yet again.
I could find a dozen or so more problems that scare me. I'm wary of a project that is supposed to store and protect my data but has horribly inaccurate documentation. If their documentation can't even be maintained, how am I supposed to trust that the code is? From my perspective the code and the documentation for said code are written in mud. And it's raining outside. You can't tell me something is stable in those conditions.
On the flipside, I do have an acquaintance that uses ZFS on linux for his home desktop and he says it's worked fine for him. He's not a power user and did it because he wanted software RAID and didn't trust LVM. To me that doesn't make it "production ready" though.
What really bothers me is that if they are calling ZFS on Linux "production ready" now and still have these kinds of serious problems then either they are sticking a label on it hoping for wider distribution or they haven't figured out what the heck "production ready" means.
I do hope the project works out. I truly do. But store my data? No thanks. Not right now in the condition it's in.
Posting as Anon because I'm sure I'll be labeled as flame bait for disagreeing with the story.
Except that LVM is a PITA, mixing it with RAID makes it even more so, and the RAID is unaware of the actual used space, making RAID 5 or 6 very expensive. Not to mention it can't assist FS-level checksumming with restoring individual blocks; you need to fail the whole drive. Implementing network transparency at the block level is inefficient, but no other FS has ZFS connect functionality.
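The point about restoring individual blocks can be illustrated with a toy model. This is not ZFS code, just a minimal sketch of the idea: the checksum is stored apart from the block, so a reader can tell which redundant copy is good and repair only the bad one. SHA-256 is a stand-in for whatever checksum the pool uses, and `read_with_repair` is a hypothetical name.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum; the real pool may use a different algorithm.
    return hashlib.sha256(data).digest()


def read_with_repair(copies: list, expected: bytes) -> bytes:
    """Return a copy that matches `expected`, rewriting any copy that doesn't.

    `copies` models the redundant on-disk copies of one block (e.g. the two
    sides of a mirror) and is modified in place when a bad copy is repaired.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy fails its checksum; block is unrecoverable")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # repair just this block, not the whole drive
    return good


block = b"important data"
expected = checksum(block)            # stored separately from the block itself
copies = [b"importent data", block]   # first copy has silently rotted
data = read_with_repair(copies, expected)
print(data, copies[0] == copies[1])
```

A block-level RAID underneath a separate filesystem has no such checksum to consult, so it cannot tell which copy is the corrupt one, which is the limitation the comment describes.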
I know tobacco is bad for you, so I smoke weed with crack.
> 3. Do you disagree with the statement that ZFS is a volume manager, a filesystem, a raid-like redundancy system, and a few other things as well? In other words, that it's a big, monolithic package that does many things. Do you disagree with that?
I'm suggesting that concepts such as "volume manager", "filesystem" and "raid-like redundancy system" don't need to be separate entities. The concepts such as "filesystem" and "volume" etc exist to conform to a 20th century vocabulary. And it's not that revolutionary any more. Companies like EMC, Netapp etc went that route too...thereby simplifying things like HSM etc at a "pool of disks" level.
> > 3. Do you disagree with the statement that ZFS is a volume manager, a filesystem, a raid-like redundancy system, and a few other things as well? In other words, that it's a big, monolithic package that does many things. Do you disagree with that?
> I'm suggesting that concepts such as "volume manager", "filesystem" and "raid-like redundancy system" don't need to be separate entities.
Absolutely they don't NEED to be.
I'm suggesting that concepts such as "mail client, calendar, fax, RSS reader and weather reports" don't need to be separate entities.
In fact, if you smash them all together into one big entity, you can sell it for $109.99 and LOTS of people will buy it.
So we agree that because those things don't HAVE to be separate, systemd combines all of them together, into one package that does everything. We also agree that:
> 1. It appears that the Microsoft tradition is big monolithic packages that do everything.
Ergo, system fits the Microsoft tradition, the Microsoft way of doing things. That way certainly isn't impossible - Microsoft has made billions of dollars doing it that way. Unix traditionally does things a different way.
There is a difference between a low-level tool and a high-level product. You could say that it is not "UNIX-like" to put together a bunch of individual components into a single program (monolith?) - why doesn't everyone use awk, sed, grep etc to do their text processing? The reason why people build higher level products out of low level components is to make life easier.
I don't particularly see any problem with Word, Excel, Powerpoint etc. See, even google does it with Google Docs...so it must be right :)
I don't know if you have ever used ZFS -- if you have, you would know what I mean. Having had to deal with migrating thousands of LUNs from one Storage array to another (via host-side migration - Veritas Volume Manager), I can tell you that a product that is not "Disk and LVM and Filesystem" (aka ZFS, which is pool, vdev and volume) is a lot simpler and easier to use. I just have to replace disks at the pool level, and don't have to worry about timing the mirroring and detachment of the mirrors (of disks from 2 separate Storage arrays) such that it doesn't kill performance of my 20TB DWH running on the box. Or about wanting to accelerate things using the L2 ARC capabilities, etc.
Your last two posts seem to be getting at the idea that being monolithic isn't bad. I never said it was.
I said that monolithic packages are the way Microsoft and other Windows developers traditionally do things,
and that small, single purpose tools are the Unix tradition.
> I don't particularly see any problem with Word, Excel, Powerpoint etc.
I didn't say there was a problem with those. I said Microsoft builds software like that. Do you disagree?