Does Linux "Fail To Think Across Layers?"
John Siracusa writes a brief article at Ars Technica pointing out an exchange between Andrew Morton, a lead developer of the Linux kernel, and a ZFS developer. Morton accused ZFS of being a "rampant layering violation." Siracusa states that this attitude of refusing to think holistically ("across layers") is responsible for all of the current failings of Linux — desktop adoption, user-friendliness, consumer software, and gaming. ZFS is effective because it crosses the lines set by conventional wisdom. Siracusa ultimately believes that the ability to achieve such a break is more likely to emerge within an authoritative, top-down organization than from a grass-roots, fractious community such as Linux.
Alternative approaches to implementing subsystems of the Linux kernel are often developed in parallel, and there's a process you can compare to Darwinian evolution that decides (in most cases) which one of a given set of workalikes makes it into the mainline tree in the end. That's why the Linux kernel itself incorporates, or tries to adhere to, a UNIX-like philosophy: make a large system consist of small interchangeable parts that work well together and each do one task as close to perfectly as possible.
That's why there are so many generic solutions to crucial things, like "md", a subsystem providing RAID levels for any given block device, or LVM, providing volume management for any given block device. Once those parts are in place, you can easily combine their functions: md works very nicely on top of LVM, and vice versa, since every block device you "treat" with one of LVM's or md's features again results in a block device. You can format one of these block devices with a filesystem of your choice (even ZFS would be perfectly possible, I suppose), and then mount that filesystem wherever you happen to like.
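The "everything you stack is again a block device" idea can be sketched in a few lines. This is a toy model, not md or LVM code: `BlockDevice`, `RamDisk`, `Mirror` and `Concat` are hypothetical names, and the tiny 4-byte block size is an assumption chosen only for readability. Because every layer exposes the same interface, a mirror can sit under a concatenation or the other way round.

```python
from abc import ABC, abstractmethod

BLOCK_SIZE = 4  # assumption: tiny blocks keep the example readable


class BlockDevice(ABC):
    """Hypothetical interface: anything that reads/writes numbered fixed-size blocks."""

    @abstractmethod
    def num_blocks(self) -> int: ...

    @abstractmethod
    def read(self, n: int) -> bytes: ...

    @abstractmethod
    def write(self, n: int, data: bytes) -> None: ...


class RamDisk(BlockDevice):
    """Bottom layer standing in for a physical disk."""

    def __init__(self, blocks: int):
        self._data = [bytes(BLOCK_SIZE)] * blocks

    def num_blocks(self) -> int:
        return len(self._data)

    def read(self, n: int) -> bytes:
        return self._data[n]

    def write(self, n: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self._data[n] = data


class Mirror(BlockDevice):
    """md-like RAID-1 layer: every write goes to all members."""

    def __init__(self, *members: BlockDevice):
        self.members = members

    def num_blocks(self) -> int:
        return min(m.num_blocks() for m in self.members)

    def read(self, n: int) -> bytes:
        return self.members[0].read(n)

    def write(self, n: int, data: bytes) -> None:
        for m in self.members:
            m.write(n, data)


class Concat(BlockDevice):
    """LVM-like linear layer: joins member devices end to end."""

    def __init__(self, *members: BlockDevice):
        self.members = members

    def num_blocks(self) -> int:
        return sum(m.num_blocks() for m in self.members)

    def _locate(self, n: int):
        # Walk the members until block n falls inside one of them.
        for m in self.members:
            if n < m.num_blocks():
                return m, n
            n -= m.num_blocks()
        raise IndexError("block out of range")

    def read(self, n: int) -> bytes:
        dev, off = self._locate(n)
        return dev.read(off)

    def write(self, n: int, data: bytes) -> None:
        dev, off = self._locate(n)
        dev.write(off, data)
```

For example, `Concat(Mirror(RamDisk(2), RamDisk(2)), Mirror(RamDisk(3), RamDisk(3)))` is a 5-block volume built from two mirrors, and a filesystem on top would neither know nor care.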
There are other concepts deep down in the kernel's inner workings that closely resemble this pattern of adaptability, for example the VFS layer, which defines a set of requirements every filesystem has to comply with. This ensures a minimal set of viable functionality for any given filesystem, makes sure those crucial parts of the code are well-tested and optimized (since everyone _has_ to use them), and also makes it easier to implement new ideas (or filesystems, in this specific case).
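The point about shared, well-tested code can be illustrated with a drastically simplified sketch. The real Linux VFS defines far larger operation tables; here `FileSystem`, `vfs_open_read` and `DictFS` are hypothetical names. The path walk is written once in the generic layer, and each filesystem only has to supply a few primitive operations.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class FileSystem(ABC):
    """Hypothetical minimal contract every filesystem must satisfy."""

    @abstractmethod
    def root(self) -> int: ...

    @abstractmethod
    def lookup(self, dir_inode: int, name: str) -> int: ...  # raises FileNotFoundError

    @abstractmethod
    def read(self, inode: int) -> bytes: ...


def vfs_open_read(fs: FileSystem, path: str) -> bytes:
    """Generic path walk: written once, exercised by every filesystem."""
    inode = fs.root()
    for part in path.strip("/").split("/"):
        if part:
            inode = fs.lookup(inode, part)
    return fs.read(inode)


class DictFS(FileSystem):
    """Toy filesystem keeping directories as name -> inode dictionaries."""

    def __init__(self):
        self.dirs: Dict[int, Dict[str, int]] = {0: {}}
        self.files: Dict[int, bytes] = {}
        self.next_inode = 1

    def root(self) -> int:
        return 0

    def lookup(self, dir_inode: int, name: str) -> int:
        try:
            return self.dirs[dir_inode][name]
        except KeyError:
            raise FileNotFoundError(name) from None

    def read(self, inode: int) -> bytes:
        return self.files[inode]

    def add(self, dir_inode: int, name: str, data: Optional[bytes] = None) -> int:
        """Create a directory (data is None) or a regular file."""
        ino = self.next_inode
        self.next_inode += 1
        self.dirs[dir_inode][name] = ino
        if data is None:
            self.dirs[ino] = {}
        else:
            self.files[ino] = data
        return ino
```

A new filesystem only has to implement `root`, `lookup` and `read`; it inherits the path handling, and any bug fixed there is fixed for everyone.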
Now, ZFS provides at least two of those already existing and very well working facilities, namely md and LVM, completely on its own. That's what's called "code duplication" (or rather "feature duplication", which I suppose is more appropriate here), and it's generally known as a bad thing.
I do notice that ZFS happens to be very well engineered, but this somewhat monolithic architecture still carries a risk: suppose a crucial flaw is found somewhere deep down in the complex system that ZFS inevitably is; chances are you'll have to massively overhaul all of its interconnecting parts.
Suppose a filesystem is developed in the future that's even better than ZFS, or at least better suited to given tasks or workloads. Wouldn't it be a shame if it had to implement mirroring, striping and volume management again on its own?
Take an approach like md and LVM, and that's not even worth wasting a single thought on. The systems are already there, and they work fantastically (I've been an avid user of md and LVM for years now, and I frankly cannot imagine anything doing these jobs noticeably better). I'd say that this system of interchangeable functional equivalents, and the philosophy of "one tool doing one job", is absolutely ideal for a distributed development model like Linux's.
It seems to be working since the early nineties. There must be something right about it, I suppose.
ZFS is a file system developed by Sun over the past several years. But the important thing is, in this context, that the ZFS design philosophy (never mind the actual design, which isn't what this discussion is about) differs from that of ordinary file system design. Most file systems make strong assumptions about reliability of the underlying block storage facility: there's some gizmo down there, whether it be a disk (for itsy-bitsy systems), a RAID set (for not so bitsy systems), or a SAN, that reliably stores and retrieves blocks with reasonable performance. ZFS doesn't do this. It manages many details of the storage layers -- it does RAID its own way (to get around problems that conventional RAID doesn't solve), and does volume management itself as well.
From the point of view of a UNIX/Linux file system person, this seems very weird. However, these ideas are not really new or revolutionary (there are new things in ZFS, but this philosophy isn't one of them). It pretty much describes how network storage vendors (NetApp, EMC, etc) have been building things all along.
Do you have a copy of StarOffice from the mid-to-late 90's? Try running that in Linux now. Do you have a copy of MetroX from say, 1998? Try running that in Linux now. Are you still using the original Linux binaries for any games released in the late 90's?
I'm still using a copy of AutoCAD released in 1995 for the Windows 3.1 Win32S API, and it works fine in Windows 2000 and Windows XP except for that it's got the old 8.3 filename limitation. I am still using WordPerfect Suite 8, the current version is 13, I think. I know someone that is still using Corel Draw 7, the current version is 13. All these programs still work fine in XP/2000, and I think that is a splendid record for binaries that were unpatched between Windows updates.
The DirectX architecture has changed between the 9x and NT lines, but otherwise the legacy APIs are generally well preserved and allow very complex software to work without a patch.
You're right, that's why nobody is using Linux for real systems.</sarcasm>
The problem with a "traditional" layered model is that the file system has to assume that the underlying storage device is a single consistent unit of storage, where a single write either succeeds or fails (in which case the data you wrote may or may not have been written). This all sounds very good, and file systems like ext2 are written based on this assumption.
However, if the underlying storage system is RAID5 and there is a power loss during the write, the entire stripe can become corrupt, because the data blocks and the parity block are not updated atomically (read the Wikipedia article on the subject for more information). The file system can't solve this problem because it has no knowledge of the underlying storage structure.
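The failure mode described above (often called the RAID5 "write hole") can be shown with plain XOR parity. This is a minimal sketch: `xor_blocks`, `make_stripe` and `reconstruct` are hypothetical helpers, and a stripe is modelled as a list of data blocks followed by one parity block.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (this is RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(*data_blocks)]


def reconstruct(stripe, missing: int) -> bytes:
    """Rebuild the block at index `missing` (e.g. a failed disk) from the others."""
    others = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(*others)


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
torn = list(stripe)
torn[0] = b"ZZZZ"  # power fails after the data block is written, before the parity is
```

With the intact stripe, `reconstruct(stripe, 1)` returns `b"BBBB"`. With the torn stripe, rebuilding disk 1 yields garbage even though block 1 was never written, which is why the damage is not limited to the block that was being updated.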
ZFS solves this problem in two ways, both of which require the storage model to be part of the filesystem:
- Each physical write never overwrites "live" data on the disk. It writes the stripe to a new location, and once it's been completely committed to disk the old data is marked as free.
- ZFS uses variable stripe width, so that it does not have to write larger stripes than necessary. In other words, a large write can be translated directly into a write to a wide stripe on the storage system, and a smaller write can use a narrower stripe. This can improve performance, since it can reduce the amount of data written.
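Both points can be combined into one toy model. This is not ZFS code: `CowStripeStore` is a hypothetical class, it handles only writes that fit in a single row (real RAID-Z spreads larger blocks over several rows), and the "pointer switch" is a plain dictionary assignment rather than an atomic on-disk update. It does show the ordering that matters: the new stripe is written elsewhere first, the pointer moves second, and the old stripe is freed last, so a crash at any point leaves either the old or the new version intact.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks, i.e. the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class CowStripeStore:
    """Hypothetical toy: each write becomes a fresh stripe whose width is
    just enough for the data plus one parity block."""

    def __init__(self, ndisks: int, block_size: int = 4):
        self.ndisks = ndisks
        self.block_size = block_size
        self.stripes = {}   # stripe id -> [data blocks..., parity block]
        self.files = {}     # name -> (live stripe id, length in bytes)
        self.next_id = 0

    def write(self, name: str, data: bytes) -> int:
        bs = self.block_size
        chunks = [data[i:i + bs].ljust(bs, b"\0") for i in range(0, len(data), bs)]
        chunks = chunks or [bytes(bs)]
        if len(chunks) > self.ndisks - 1:
            raise ValueError("toy model handles at most one row per write")
        stripe = chunks + [xor_blocks(*chunks)]  # variable width: len(chunks) + 1
        sid, self.next_id = self.next_id, self.next_id + 1
        self.stripes[sid] = stripe                # 1. write the new stripe elsewhere
        old = self.files.get(name)
        self.files[name] = (sid, len(data))       # 2. then switch the pointer
        if old is not None:
            del self.stripes[old[0]]              # 3. only now free the old stripe
        return len(stripe)                        # stripe width actually used

    def read(self, name: str) -> bytes:
        sid, length = self.files[name]
        return b"".join(self.stripes[sid][:-1])[:length]
```

On a 5-disk toy pool, a 2-byte write uses a 2-block stripe (1 data + 1 parity), while a 10-byte write uses a 4-block stripe, and there is never a partially updated stripe whose parity disagrees with its data.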
There are plenty of other areas where this integration is needed, including snapshotting, but I hope the explanation above shows that the layered model is not always good.
I think you'll find that it is you who doesn't understand what a snapshot could be. Take a look at ZFS, try it, and see if you think of snapshots the same way again. In ZFS, a snapshot can be promoted to a clone, which is a writeable copy of the original filesystem, sharing unmodified blocks using a copy-on-write algorithm.
This is incredibly powerful and useful. For example, a single master 'image' volume can have customizations added for specific purposes. This is useful in desktop deployment, iSCSI or NFS network boot, etc.
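A rough sketch of why a clone is cheap: if physical blocks are never overwritten, a snapshot only needs a frozen copy of the block pointers, and a clone is a new writeable pointer map that starts from that snapshot. `BlockStore` and `Dataset` are hypothetical names, and freeing of unreferenced blocks is deliberately left out.

```python
from types import MappingProxyType


class BlockStore:
    """Append-only physical storage: blocks are written once, never modified."""

    def __init__(self):
        self.blocks = []

    def put(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1

    def get(self, pid: int) -> bytes:
        return self.blocks[pid]


class Dataset:
    """A filesystem reduced to a map of logical block -> physical block id."""

    def __init__(self, store: BlockStore, block_map=None):
        self.store = store
        self.block_map = dict(block_map or {})

    def write(self, lbn: int, data: bytes) -> None:
        # Copy-on-write: new data goes to a new physical block; only this map changes.
        self.block_map[lbn] = self.store.put(data)

    def read(self, lbn: int) -> bytes:
        return self.store.get(self.block_map[lbn])

    def snapshot(self):
        # A snapshot is a read-only copy of the pointers, not of the data.
        return MappingProxyType(dict(self.block_map))

    @classmethod
    def clone(cls, store: BlockStore, snap) -> "Dataset":
        # A clone is a first-class, writeable dataset seeded from a snapshot.
        return cls(store, snap)
```

In the master-image scenario, a customized desktop clone that changes one block shares every other block with the master, and the clone is just another dataset with its own name rather than a special device.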
Would you expect a 'first class' writeable clone to have a name like '/dev/mapper/snapshotted-hda' or '/dev/hda.1'? Which one makes more sense? Why would the original have a special name, when the clone is identical?
It's this kind of narrow 'snapshots are throwaway' thinking that causes artificial limitations in APIs and operating system design that serve no real purpose.
The direct quote is "I've long seen the Linux community's inability to design, plan, and act in a holistic manner as its greatest weakness."
You can see the meaning has been completely changed in the summary from one of positive criticism to one of arrogant condemnation.
Through this change, we can see the poster's true feelings, feelings that are shared by many in the Linux community: to respond immaturely and get all bent out of shape whenever somebody builds anything that doesn't follow the "Linux philosophy".
The truth: both Linux in general and ZFS are amazing and powerful tools. One of the best philosophies I've encountered is "use the right tool for the job".
Nobody is forcing Linux devs to port ZFS, or even use it, or even think about it. The only reason this is an issue is that many in the Linux community realize how powerful ZFS is, and they're subconsciously pissed off that they can't have it. So they respond like a 3rd-grade bully, attacking it in a self-defeating attempt to minimize its importance.
You don't seem to understand snapshots
If you say so :-)
A snapshot works by creating a copy of the device, with the contents it had when the snapshot was created. If you make a snapshot of /dev/hda at 12:15, then you'll get /dev/mapper/snapshotted-hda as it was at 12:15, while /dev/hda can still be modified... Why would you change anything over?
Because with the incumbent volume management strategy you may not continue to use /dev/hda directly once it is snapshotted. You must access /dev/hda through some other device, and that other device must be located in the /dev/mapper directory. No wonder you apparently mixed up what is a snapshot and what is being snapshotted: the way we currently do this in Linux is quite unnatural and is a wide-open invitation to such confusion, not to mention a pointless make-work project for system administrators.
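Why the origin has to be accessed through another device can be sketched with a copy-before-write model. This is a loose, hypothetical illustration in the spirit of device-mapper snapshots, not its actual implementation: `ToySnapshot` saves each origin block into an exception store before the first overwrite, which only works if every write to the origin goes through it.

```python
class ToySnapshot:
    """Hypothetical copy-before-write snapshot of an origin device."""

    def __init__(self, origin: list):
        self.origin = origin   # the raw device, e.g. /dev/hda, as a list of blocks
        self.cow = {}          # block number -> contents at snapshot time

    def origin_write(self, n: int, data: bytes) -> None:
        """The only safe way to modify the origin once a snapshot exists."""
        if n not in self.cow:
            self.cow[n] = self.origin[n]  # preserve the old block first
        self.origin[n] = data

    def snapshot_read(self, n: int) -> bytes:
        """Read the device as it was when the snapshot was taken."""
        return self.cow[n] if n in self.cow else self.origin[n]


disk = [b"boot", b"home"]
snap = ToySnapshot(disk)
snap.origin_write(0, b"BOOT")  # goes through the wrapper: snapshot stays correct
disk[1] = b"HOME"              # raw write to the device bypasses the wrapper
```

After the raw write, `snap.snapshot_read(1)` returns `b"HOME"`, so the "12:15" view is silently wrong. That is the practical reason the origin must be reached through a mapped device rather than the raw one.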
What astonishes me most is that almost everyone talks about 'virtualisation': VM, QEMU, Xen.
When it comes to filesystems, suddenly many seem to want to do everything themselves, on physical platters: partitioning, volumes/RAID, formatting. ZFS virtualizes the storage, so none of this is physically needed. There are some nice demos at http://www.opensolaris.org/os/community/zfs/demos
Of course, a filesystem should be a black box, an object, instead of the user having to do low-level work. ZFS provides this, and, more to the point, that is exactly why it needs to be cross-layered.
Snapshots ought to be available easily, at any moment in time, without taking much space. ZFS does this by storing only the changes and sharing the unmodified data. To do so, you need an abstraction of the hardware, that is, crossing layers. Not to mention writeable snapshots.
Adding new drives without partitioning, slicing or formatting: just add them to the existing pool, with striping adapted automagically. This needs a cross-layer interface, right?
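A toy version of pooled allocation shows why adding a disk needs no partitioning: filesystems draw blocks from the pool, and the allocator simply starts striping across the new device. `Pool` is a hypothetical class, and plain round-robin is a simplification; it is not how ZFS actually weights its allocations.

```python
class Pool:
    """Hypothetical toy storage pool shared by all filesystems."""

    def __init__(self):
        self.devices = {}   # device name -> free block count

    def add_device(self, name: str, free_blocks: int) -> None:
        # No partitioning or formatting: the device's space simply joins the pool.
        self.devices[name] = free_blocks

    def allocate(self, nblocks: int) -> list:
        """Spread blocks round-robin over every device that has free space."""
        if nblocks > sum(self.devices.values()):
            raise OSError("pool is full")
        placement = []
        while len(placement) < nblocks:
            for name, free in self.devices.items():
                if len(placement) == nblocks:
                    break
                if free > 0:
                    self.devices[name] = free - 1
                    placement.append(name)
        return placement
```

Before a second disk is added, every allocation lands on `disk0`; right after `add_device("disk1", ...)`, new writes alternate between both devices without any change to the filesystems using the pool.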
The transactional filesystem guarantees uncorrupted data at power failures and OS crashes. If you do this across a pool of physical platters, you need operations across layers.
There is an interesting blog post on using ZFS for home users. It contains some good arguments for why ZFS is useful for Linux's push onto the desktop. You can find it here: http://uadmin.blogspot.com/2006/05/why-zfs-for-ho
Last but not least, online checking of all your data ('scrubbing' and 'resilvering') is a valuable feature for Linux (and the home user) as well.
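The idea behind scrubbing can be sketched as follows: checksums are stored separately from the data they describe, so every copy can be verified, and a bad copy can be rewritten from a good one. `MirroredStore` is a hypothetical class; it uses SHA-256 from the standard library purely for convenience (ZFS offers several checksum algorithms) and models a two-way mirror as two Python lists.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MirroredStore:
    """Hypothetical two-way mirror with checksums kept apart from the data."""

    def __init__(self):
        self.copies = ([], [])   # two "disks"
        self.sums = []           # expected checksum per block, stored separately

    def write(self, data: bytes) -> int:
        for disk in self.copies:
            disk.append(data)
        self.sums.append(checksum(data))
        return len(self.sums) - 1

    def scrub(self) -> int:
        """Verify every copy of every block; repair bad copies from a good one."""
        repaired = 0
        for n, expected in enumerate(self.sums):
            good = [d[n] for d in self.copies if checksum(d[n]) == expected]
            if not good:
                raise OSError(f"block {n}: no valid copy left")
            for d in self.copies:
                if checksum(d[n]) != expected:
                    d[n] = good[0]
                    repaired += 1
        return repaired
```

If one disk silently flips a byte in a stored photo, a plain RAID-1 read might return either copy; a scrub like this one detects which copy is wrong and fixes it.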
To me it looks like, as of today, just about everyone likes the features of ZFS. But now that it requires breaking some old habits, suddenly we resist change and would rather stick to older concepts.
As if GPLv2 vs GPLv3 were not enough of a threat to Linux, are we now going to unashamedly let a new-from-the-bottom-up filesystem overtake us as well?
Technically, the subsystems in NT are user-mode processes, though they are (to my knowledge) the only user-mode processes that cause blue screens when they crash. To my knowledge, the only layers in the NT kernel are between the Executive and the drivers.
Think of subsystems as being like shells with system-specific behavior. For example, filenames are case-sensitive in the POSIX subsystem but not in the Win32 subsystem.
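A tiny illustration of the case-sensitivity difference, purely as a toy and not how NT implements it: `lookup` is a hypothetical helper that matches directory entries either exactly (POSIX-like) or case-insensitively (Win32-like).

```python
def lookup(names, wanted: str, case_sensitive: bool):
    """Return every directory entry that matches `wanted` under the given rule."""
    if case_sensitive:
        return [n for n in names if n == wanted]
    folded = wanted.casefold()
    return [n for n in names if n.casefold() == folded]


entries = ["Makefile", "makefile", "README"]
```

Under the POSIX rule, `makefile` and `Makefile` are two distinct files; under the Win32 rule, the same lookup matches both, which is why files created through one subsystem can be awkward to reach through the other.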
Honestly, I think that Windows has the *wrong* layers. The subsystem layer was intended to allow compatibility with software written for other operating systems, but to my knowledge only the Win32 subsystem has ever been consistently maintained (the POSIX subsystem is maintained at the moment, but only *after* Microsoft bought OpenNT). Windows doesn't need this functionality, but it really needs nice VFS and inode layers in its filesystem.
Finally, the grandparent's post about NT4 being a credible gaming platform is just laughable. I don't even know where to start. It seems to me that it is more likely to have been made to get additional performance out of CAD/CAM applications, which also use 3D acceleration. So you are right about the GP poster not knowing what he is writing about.