ZFS For Mac OS X Source Code Available
nezmar writes "Noel Dellofano, who is part of the ZFS development team at Apple, has a post on Mac OS Forge announcing a late Christmas gift: he is making available binaries and source code, plus instructions, of the ZFS filesystem for Mac OS X."
How stable is it, and how soon till I can get it on my Mac by default?
Reading their FAQ, it sounds like there are a lot of niggles to fix yet, including assumptions in other parts of Mac OS. All in all it sounds like ZFS isn't ready for general use on the Mac just yet. Maybe Mac OS X 10.6 will ship with this by default?
I installed this last week and got it working. It's still a very early beta; it managed to crash my machine half a dozen times before I decided to wait a little. Remember to do zpool exports before you eject external hard drives. But yes, very promising technology. OS X has gone from having a wonky RAID 1/0 implementation to having one of the better software RAID systems available. Back to scoping out four- and eight-drive USB SATA enclosures and cheap 500GB hard drives. ;-)
This reads like a nerd's unsubstantiated wet dream.
An absolutely, positively, amazing feature set. I can't wait until it's stable enough for production use. After 7 years of staying away from Apple products, I'm going back to the Mac.
Why are you letting these clowns ruin our country?
I wonder what Apple thinks of this.
Interesting, I wonder if this could help in the effort to port ZFS to linux? It'd depend on the license they release it under though.
It's a shame that I'm gunshy with new (to the OS) filesystems. ZFS has so much to offer, but every time I try out a new filesystem, I end up with data loss, even ones that are supposedly new and wonderful and robust. (Even when ext3 was new but stable, I lost stuff on it.) I can't wait to hear lots of positive feedback on its stability and performance, so I can get up the nerve to try it.
Love many, trust a few, do harm to none.
I have been using ZFS (on Solaris) for more than a year, both at work and at home, and I am following closely the latest developments. IMHO the best intro on ZFS is the official ZFS slides (36 pages): http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf
ZFS is designed to perform writes asynchronously. If the write should be able to complete, it returns success and then goes off to do it. It's a different way of thinking about a filesystem. You need to do a "zpool export" or something before you can unplug a detachable disk to avoid the panic when you unplug it. That's not a bug. It's by design.
No it isn't. You're just misunderstanding the semantics of ZFS.
No it isn't. It's just not a filesystem that's suitable for the masses. Average users cannot understand or manage an advanced storage pool system like ZFS. They're better off with filesystems that make sense to them, like HFS+, ext2 or NTFS.
Shame on all the geeks for telling everyone that ZFS will solve all their problems. ZFS is great under certain circumstances. It does what it does very well, but it isn't a filesystem for the masses.
Just plain not reporting errors is a bug. ZFS asynchronous write semantics is intentional, although counter-intuitive, behaviour.
I say all this because I know Apple stuff is pretty well refined and I know for a fact that ZFS beat all native Linux file systems according to some benchmarks on operations with large files exceeding 3.7GB. There were reports that ZFS could copy the entire Linux kernel source code in only 3 seconds! Amazing, but not good with software or hardware raid.
As I write this, I am reminded that there could be license issues with ZFS source code but hope none of this stuff prevents a gifted slashdotter from porting this ZFS bugger to Linux. I am eagerly waiting.
Now, if we can only get it to talk to important things like NTFS, and Ext3, and Reiser...
I know it may be unheard of to those reading /., but Noel is a girl.
Huh? Music, movies, Office, and porn. And lots of web browsing (goes with the porn).
Do I want ZFS or not?
a damn fine she at that
parturiunt montes, nascetur ridiculus mus
Hadn't thought of that...
Noel is a she. I met her last year soon after Apple hired her away from Sun.
Nothing short of a complete redesign could have rescued WinFS. The design as it was was flawed from nearly the beginning.
It's a windows family tradition.
It wasn't that easy to set up a RAID in Linux the last time I tried (admittedly long ago), but even in comparison, setting up a RAID-Z in ZFS is just a single line: "zpool create mypool raidz disk4s2 disk5s2 disk6s2"
mdadm --create /dev/md0 -l 5 -n 4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
If you like living dangerously, you don't need to do anything else except put LVM2 on it and add an fstab entry; md automatically goes hunting for raid partitions when it is loaded, and the md superblock contains all the info needed to assemble an array, no config needed. However, if you set up mdadm with information about the raid array, it'll behave more intelligently when things go wrong.
That said, ZFS is not "raid", and md is not a file system and volume manager. ZFS offers a lot you won't find anywhere else, but the basic lack of standard features found in ZFS compared to a large number of RAID and/or volume management systems means that ZFS has a ways to go: ZFS does not support increasing the number of drives in a pool, and you CANNOT migrate between any of the various vdevs. You cannot go from a single drive to a pair of mirrored drives, or from a single drive or mirror to RAIDZ. You cannot increase the number of drives in a RAIDZ set. Instead, they force you to add entire redundant vdevs to the pool. All of the aforementioned resilvering is stuff Linux RAID has been able to do for years (well, okay, the RAID 5 expansion stuff is a little new.)
I migrated a single drive to a mirrored array a couple months ago. Then I migrated that to 3-drive RAID5. Then I migrated that to a four-drive array. None of that would have been doable with ZFS.
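For anyone following the arithmetic on that migration path, here is a rough Python sketch of the usable space at each step. It assumes equal-sized drives and ignores metadata overhead; the function name `usable_capacity` and the 500 GB drive size are made up for illustration and are not from any md or ZFS tool.

```python
def usable_capacity(level, drives, drive_gb):
    """Approximate usable space in GB for equal-sized drives (no metadata overhead)."""
    if level == "single":
        if drives != 1:
            raise ValueError("a single-drive setup has exactly one drive")
        return drive_gb
    if level == "mirror":
        if drives < 2:
            raise ValueError("a mirror needs at least two drives")
        return drive_gb  # every drive holds the same data
    if level == "raid5":
        if drives < 2:
            raise ValueError("RAID5 needs at least two drives")
        return (drives - 1) * drive_gb  # one drive's worth of space goes to parity
    raise ValueError(f"unknown level: {level}")


# Hypothetical 500 GB drives, following the single -> mirror -> RAID5 path above.
steps = [("single", 1), ("mirror", 2), ("raid5", 3), ("raid5", 4)]
capacities = [usable_capacity(level, n, 500) for level, n in steps]

for (level, n), gb in zip(steps, capacities):
    print(f"{level:6} x{n}: {gb} GB usable")
```

Note that the step from single drive to mirror adds redundancy but no space; the capacity only starts growing once the array is reshaped to RAID5.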
However, from what I understand, Sun is working on the expansion stuff...and a defragment tool (thank god that, like SGI, they don't subscribe to the bullshit myth that modern filesystems don't get fragmented. It's not true with NTFS, it's not true with HFS+, and it sure as hell isn't true with ext2/3 OR reiserfs...I wish people would stop perpetuating that bullshit myth!)
Please help metamoderate.
http://video.google.com/videoplay?docid=8100808442979626078
Disclaimer: I'm assuming that ZFS on Mac does, in fact, cause kernel panics when an external drive is removed without unmounting; as the parent posts imply. I haven't tested it myself...
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Suppose I ported ZFS to Linux (not that I could, just suppose) as a native kernel module, and published the source code. If then I used ZFS on Linux, and some others also grabbed the 'Linux ZFS' code, built it and used it. What laws if any would I be breaking? Who and under what grounds could sue me / Linux ZFS users?
Unless some people who 1)have the technical knowledge and 2)don't care about licensing restrictions make a linux port of it anyway. Seriously, who is going to do anything about it? Is Sun going to sue people for increasing compatibility with their products? Or is Richard Stallman going to hunt them down and beat them to death with the Free Software club?
Frankly, no one is going to do anything about it, so I look forward to the day that common sense breaks out and we quit letting legal mumbo jumbo get in the way of progress.
then you need to mkfs, and if you run out of space you're screwed because you can't easily grow.
All of Linux's md raid modes are grow-able.
LVM2, XFS, and ext3 are all capable of not just expansion, but *online* expansion. With XFS, it's one command: xfs_growfs -d. It automatically senses the new block device size and presto, you've got a larger file system.
BTDT two weeks ago when I added a drive to my RAID5 array, expanded the LVM2 physical volume, grew the logical volume, and then grew the XFS volume (I make the choice to run LVM2 on top of the array- I could have just as easily put XFS directly on the array device itself.) The only caveat is that you won't see the extra space until the resilvering is done.
I'm not saying it's equal to ZFS, but Linux's filesystems and volume management are a lot more capable than you're claiming, and everyone needs to calm down and realize that RAID is not ZFS, ZFS is not RAID, etc.
Then, use mdadm to add another drive as a spare, and grow the raid device out (ie using -n to change the number of devices along with the grow command.)
One last note: I accidentally 'added' a drive straight to the array without changing the number of drives. It seemed to just mirror the array onto the third drive. I believe the important bit is to add it as a spare, and then grow with a new #-of-devices param (-n). You might be able to do the add & change-# at the same time, and I just forgot to give the -n option.
One good way to test all this: loopback devices :-) Just do it with a filesystem on that fake raid set, and a file on the filesystem for which you've calculated the checksum, etc.
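The checksum part of that test could look something like the Python sketch below. It is only an illustration: the temp file stands in for a file on the filesystem that sits on the loopback-backed array, and the helper name `file_sha256` is my own.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a test file on the fake raid set (assumption: in a real test
# this path would live on the filesystem mounted from the md device).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4096))
    test_path = tmp.name

before = file_sha256(test_path)
# ... add the spare and grow the array here ...
after = file_sha256(test_path)

print("intact" if before == after else "CORRUPTED")
os.remove(test_path)
```

Record the digest before touching the array, do the add and grow, then compare once the rebuild has finished.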
The design of ZFS is intended to ensure that the data on the disk is _always_ a valid file system. If a system panics when a ZFS file system is unexpectedly removed, that is a different issue.
Then, of course, checksumming everything does wonders to protect against bit rot and flaky cables.
Have you tried NTFS-3G? It really is very stable, no doubt due to the exhaustive testing regime on every release - see http://www.ntfs-3g.org/quality.html - and is used by default in most Linux distros. It's a different codebase to the older Linux-NTFS and Captive NTFS projects, and has reasonably good performance.
Since ZFS is new, I don't think your scenario applies, and it's not intended for DVD/CD use.
No it isn't. It's just not a filesystem that's suitable for the masses. Average users cannot understand or manage an advanced storage pool system like ZFS. They're better off with filesystems that make sense to them, like HFS+, ext2 or NTFS.
I beg to differ. Even with a single disk preformatted with ZFS from the factory, users can benefit from ZFS's paranoid checksumming.
Well, any Linux developer could sue you for "contributory infringement" for posting the Linux kernel module.
Whether they would succeed would depend on the judge. The FSF claims that you cannot circumvent the GPL by distributing proprietary code (or in this case, code that is free under an incompatible license) as separate modules, with the expectation that the user does the link. I haven't seen too many people who believe in the FSF interpretation.
Let me forestall any further confusion by assuring you that I have nothing against ZFS; to the contrary I am sure it is a fantastic and reliable filesystem.
From the FAQ: Downloading music via iTunes onto a ZFS target volume does not work yet. iTunes will complain it can't write to the volume. So there seems to be a link between iTunes and HFS+? Sounds like someone needs to do some reverse engineering...
Quisque verborum suorum optimus interpres...
I was thinking of playing around with ZFS in the upcoming FreeBSD 7.0 release. Can anyone tell me how it is for hosting samba shares, ftp and http, etc with a pool of about 2TB? From what I've heard, it's still shaky, but I don't think it'd be under too much stress in my situation. Any tips or insight? I'm still kinda new to FreeBSD.
o wait stupid licensing issues!
we need some rogue kernel devs that go crazy and just don't care about licenses!
Who is this Colonel Panic? Is he in the same division as General Failure?
Anyway, from a design point of view, ZFS should not cease to operate if a drive (removable or not) ceases to be available unexpectedly. Given that all writes are meant to be atomic, you should end up with the disk in a consistent state, but not necessarily the state you expect. If your application relies on non-atomic operations for referential integrity (e.g. an application depends on two separate writes to two separate files both completing successfully to keep the data consistent) you'll have problems.
It's actually a user interface issue - in return for all the benefits ZFS brings, the user commits to notifying it in advance of drive removals. If you don't want to do that, that is fine too - you just get a performance hit as no write caching takes place.
You shouldn't need to tell the OS via CLI or GUI that the drive is going away, though: a spin-down or sync button or equivalent should be provided on the drive itself.
Press button on removable device. Filesystem notified of pending removal. Filesystem flushes caches. Once safe, LED on device goes from Red to Green. Green LED - device can be removed.
Or design a connector that notifies the OS that it is being removed _and_ gives enough notice to allow cache flushing to succeed. That's a tall order given the size of write caches and the speed of USB connections. You might get 100 ms to write 8 Mebibytes of data, if you are lucky.
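To put numbers on why that is a tall order, here is a back-of-the-envelope Python sketch. The 8 MiB and 100 ms figures are the ones from the post; comparing against USB 2.0's 480 Mbit/s signalling rate is my own assumption about which bus is meant, and that rate is a theoretical ceiling before protocol overhead.

```python
MIB = 1024 * 1024


def required_throughput_mib_s(cache_bytes, window_ms):
    """Sustained write rate (MiB/s) needed to flush cache_bytes within window_ms."""
    return cache_bytes * 1000 / window_ms / MIB


# Figures from the post: 8 MiB of cached writes, roughly 100 ms of warning.
needed = required_throughput_mib_s(8 * MIB, 100)

# Assumption: USB 2.0 at its 480 Mbit/s signalling rate, ignoring overhead.
usb2_ceiling_mib_s = 480e6 / 8 / MIB

print(f"needed:  {needed:.1f} MiB/s")
print(f"USB 2.0: {usb2_ceiling_mib_s:.1f} MiB/s at best")
```

Under those assumptions the flush needs more bandwidth than the bus can offer even in theory, and real-world transfer rates are lower still.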
If that's the case I'm honestly sorry for the mistake. My apologies to Dellofano.
Even if you wrote the whole thing from scratch under GPL 2 for Linux, Sun could still be dicks and sue you for patent infringement.
That is of course unless you're in a country such as the UK which doesn't believe in software patents, then it's the user's problem, not yours.
ZFS is a wonderful technology breakthrough but it's not perfect. It has two horrendous flaws that many people won't care about at all, but others will consider to be dealbreakers.
1) Quota implementation is... really bizarre. And very inflexible. You basically have to have a separate filesystem per quotaed ID. Very 1970.
2) RAIDZ (and RAIDZ2), which are approximately like RAID5 and RAID6, are disastrously bad for performance in many cases. Small reads and writes are basically limited to the performance of a single spindle. You can solve this by using mirroring, but that's not very green (or cost-effective).
Contrast both of these problems with NetApp, which does *exactly* what you want in both cases.
ZFS is still wonderful, but it's not for everyone. Not even every bigtime storage admin.
You guys can check this http://oss.oracle.com/projects/btrfs/ and this http://www.zumastor.org/
Funny, same thing here.
Lost stuff on ReiserFS (when it was supposed to have been stable for a long time), and ext3 crapped out on one of my friend's servers so badly we had to ditch the whole thing about 1 1/2 years ago.
XFS was the only thing that I never had a problem with. How old is that thing? Has BSD ever changed their default filesystem?
From the http://trac.macosforge.org/projects/zfs/wiki/issues page: The trash does not empty on a ZFS volume. For now, you can workaround this by simply manually removing items from the .Trash directory
Regardless of the benefits, the FS needs to handle the basics before I'd consider trying it out.
It's not just checksumming either -- now that ZFS has ditto blocks (replicating active blocks into "free" disk space), it can now self-heal in some circumstances on a single drive configuration.
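A toy Python model of the idea, for anyone who hasn't seen it: keep a checksum apart from the data plus more than one copy of the block, and on read use whichever copy still matches, then rewrite the bad one. This is only a sketch of the concept; the class `ToyBlock` and its layout are invented here and have nothing to do with ZFS's real on-disk format or code.

```python
import hashlib


def checksum(data):
    """Checksum used to validate a block (SHA-256 chosen for the sketch)."""
    return hashlib.sha256(data).digest()


class ToyBlock:
    """Toy model: one checksum stored apart from the data, plus N copies of the data."""

    def __init__(self, data, copies=2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]

    def read(self):
        # Find the first copy whose contents still match the stored checksum.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all copies failed the checksum")
        # Self-heal: overwrite any damaged copy with the verified data.
        for index, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[index] = bytearray(good)
        return good


block = ToyBlock(b"important data")
block.copies[0][0] ^= 0xFF  # simulate bit rot in the first copy
recovered = block.read()
print(recovered)
```

With only one copy, the same checksum can still detect the damage but has nothing to repair it from, which is why the extra copies matter on a single drive.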
Hasn't ZFS always had ditto blocks? Anyway, yes, self-healing is cool. It's too bad that Sun is keeping it away from Linux.
There is an easy way to have a reliable, read/write ZFS on your Mac right now: get VMware Fusion and then install Solaris in a VM. Solaris is free. If you look, you can even find a VM image with Solaris already installed, so the installation is easier even than "trivial". The Solaris running in the VM can then export the ZFS files to Mac OS X using the internal virtual network.
I bring this up just so that maybe more people can get to see first hand how ZFS works. If you happen to already have Fusion then getting Solaris is just a download away.
Sun has been providing GNOME for some time directly.
As for other utilities, there are the SunFreeware repositories out there that have most stuff you need.
IANAL but write like a drunk one.
"We"? The FreeBSD folk (and others) have adopted ordered metadata updates as a higher-throughput alternative. FreeBSD 7 does include the "gjournal" plugin for GEOM (meaning that you can put the journal on anything that GEOM supports, including weird constructs like an encrypted RAID of network devices). In practice, though, everyone seems to use "softupdates".
Dewey, what part of this looks like authorities should be involved?
That might be what Stallman says, but Linus has explicitly stated that's not the interpretation used for the Kernel. They build APIs, you use them, you're good. If you wrote it directly into the kernel, then you would have an issue - probably more likely from SUN than from KernelDev who would most likely just roast you over your own burning sourcecode.
No, it's a bug. No software designer deliberately panics the kernel when a disk that is designed to be detachable is accidentally removed from the system while still mounted. If this behaviour is real (I haven't tried it myself), it's a safe bet that the Mac ZFS team consider it a bug.
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
UFS might not have all the bells and whistles of ZFS, but it's still been the most reliable and robust file system I've used in the past 25 years. It's got decades of work on making it stable and solid, and thanks to the tools available to work with it and the redundancy in the format I've even been able to recover data from UFS partitions that had been partially reformatted.
HFS? I've had HFS partitions get corrupted just by letting them get too full. That's just nuts.
ZFS? Sun says ZFS doesn't need file system check and repair tools, it can't fail. That's what DEC said about AdvFS, and then later on came up with salvage tools to pull data out of a damaged AdvFS file system. That's what the Linux folks used to say about Reiser FS, too. Even before the Hans Reiser incident it had become clear that it wasn't true, and I've got no reason to assume that ZFS will be any better, not over the long term.
The only journalled file system I've found that has come anywhere near that goal has been Network Appliance's, and they have complete control over the hardware and software and no third-party applications and drivers running on the hardware. And, of course, few places have very many NetApps (we certainly never had more than 4 at a time) so I can't say that the apparent stability of our boxes isn't due to the fact that we simply never had many of them...
Apple refreshed UFS for Panther, bringing in SoftUpdates to give it the performance advantages of journalling, then dropped it.
Apple has created layers that run over network file systems that implement almost all of the application-visible differences between HFS and remote CIFS and NFS shares, but you can't take full advantage of these for local UFS file systems. Why not? Don't ask me, ask Apple.
I blame corporate ADHD.
I started setting my system up with UFS because I was tired of the reliability problems of HFS+, but too many applications had too many problems with it, so I eventually gave up.
Mostly older applications, both Classic and Carbon-based, including Office.
What are you running, and what kind of adaptations have you had to make? I might give it another try.
On the other hand, it can't be much LESS reliable than HFS+. Just filling HFS+ up can corrupt its on-disk data structures to the point that you have to backup and restore or use a payware third-party application to fix it.
On the gripping hand, they *do* have some UFS support. And UFS is a well understood and by now extremely reliable FS.
UFS with SoftUpdates has extremely good performance deleting files.
And BSDL is GPL-compatible.
NTFS support would be really nice, since then I could use an OS that didn't suck to fix broken Windows file systems.
But what about the plethora of Windows partitioning schemes? Does OS X handle primary and secondary and extended partitions, and the new Dynamic disks?
Just what is the latest licensing scheme for Plan 9 anyway? I lost track.
The way the previous post was formed, it seemed to imply that it is all Linux's fault for not being compatible with Solaris.
It would be nice if Linus had picked a license that allowed more flexibility. On the other hand I really admire how far he's gone to keep the Linux kernel open to GPL-incompatible code (such as the nVidia drivers) despite pressure from people to take a hard line on the GPL.
I'm actually not sure that it would really be impossible to come up with a way to incorporate ZFS in the Linux kernel. It would require a lot of care, and cooperation from the Linux side, but stranger things have happened.
What is going to happen is that they will gradually take over from the 80x86 line. They will run old MS-DOS programs by interpreting the 80386 in software.
Yes, that started happening around 1992, and Intel even called it the "RISC Core" at the time.
These days a huge chunk of any 80x86-compatible CPU is effectively a hardware-accelerated just-in-time translator that converts 80x86 codes to a RISC or VLIW internal instruction set that the actual processor core really runs. One company, Transmeta, even tried doing it without the hardware acceleration, but that turned out to be a bit of a false economy.
I thought that was Network Appliance. :p
OMG! ZFS on OSX! I'm gonna wet myself!
"If it can't be forked, it's not open source".
I don't know how much point to home use of Plan 9 there is, in any case. I had a talk with Dennis Ritchie about that at Usenix a few years back, and that was his conclusion too... Plan 9 doesn't make a lot of sense without a pretty substantial computing facility. I've had a substantial (for the time) multi-user shared environment at home since the early '90s, starting with a used System V box, and I'm only now getting to the point where I've got multiple file and compute servers... and that's really for business.
Those are all impressive tests, but they're all testing things they know they're watching for, or they're testing things that are specific to ZFS. It doesn't represent real life. Only the last of those tests, or maybe the last two, is really applicable to a traditional file system, and only the last one really represents any kind of "real life" abuse.
Try having a customer running it for six months on a system with bad RAM that's randomly corrupting the file system, crashing, and then recovering and running FSCK using the same bad RAM, over and over again, until the customer finally notices and ships the corpse to you to recover.
I wasn't able to get everything back, but I can't imagine any "advanced" file system surviving that kind of abuse even vaguely intact. Now if they were randomly corrupting BOTH sides of the mirror and surviving...
Hi,
you can find it here: http://tinyurl.com/2qxtrv
I have never seen a discussion so full of flames. Enjoy! They go very deep into the hard drive and ZFS architecture to prove their points.
A.