Running ZFS Natively On Linux Slower Than Btrfs
An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module done by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but the Solaris file-system in most areas is not nearly as fast as EXT4, Btrfs, or XFS."
Using BTRFS :)
Jesus had a UNIX beard.
Who would have thought that a first-release beta kernel module would not run as fast or be as reliable as the stable implementations for other operating systems, or the stable file systems on Linux?
Thirty four characters live here.
ext2 is faster than ext3, simply because it does less. ZFS has many, many features most other FS don't have but they do come at a price.
I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.
- These characters were randomly selected.
On similar hardware of course.
It occurs to me that ZFS does a lot more than EXT4 and Btrfs too.
So, because ext3 implementations on other OSes are slow, that means ext3 is slow? Got it.
Try running ZFS on FreeBSD, or better yet, on the original OS: Solaris.
- oZ
// i am here.
OpenAFS, which still today provides features unavailable in any other production-ready network filesystem, is a nightmare to use in the real world because of its lack of integration with the mainline kernel. It's licensed under the "IPL", which like the CDDL is free-software/open source but not GPL compatible.
ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.
Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.
I was confused as to what versions of ZFS were available on which distros so I made a chart that lists the different distros and which version of ZFS they support:
http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html
Hope it's helpful...
- For the complete works of Shakespeare: cat
What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job.
If I ride a bicycle everywhere and have never seen nor heard of a car, I would not know what a car could do for me, would I? So if someone comes along and says, "Hey, cars are cool, they are just a little more expensive," I would ask something like: what features does a car have over a bicycle?
You are entitled to your own opinions, not your own facts.
You can save your stuff in /dev/null quite fast too!
I know! It is friggin crazy fast. I've been using it for backups for years. Even with terabytes of data I've never managed to fill it up or slow it down!
Get a web developer
Couldn't they name the file system something better than butterface?
He who knows best knows how little he knows. - Thomas Jefferson
ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks, in my opinion. I currently run ZFS on about 10 TB. I never worry about a corrupt file system, never have to fsck it. And snapshots are cheap and fast. I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems). Then I back up from the snapshot. In other areas of the disk I do hourly snapshotting. Indeed, snapshots are the killer feature of ZFS for me. LVM has snapshots, true, but they are not quick or convenient compared to ZFS. In LVM I can only snapshot to unallocated space in the volume group. With ZFS you can snapshot as long as you have free space anywhere in the pool. The integration of volume management and the file system may break a lot of people's ideas of clear separation between layers, but from the admin's point of view it is really nice.
We'll ditch ZFS and Solaris once BtrFS is ready. BtrFS is close, though; should work well for things like home servers, so try it out if you have a large MythTV system.
XFS
30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.
I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?
Which of the ZFS features most impact its performance?
Compression enabled by default can't help (available in btrfs).
Checksum for all blocks probably doesn't help, but definitely helps detect corrupt data/corruption (available in btrfs).
Forcing any file that requires more than a single block to use a tree of block pointers probably doesn't help. The dnode only has one block pointer and the block pointer can only point to a single block (no extents). On the plus side, the block size can vary between 512 bytes and 64 KiB per object, so slack space is kept low. If more than a single block is necessary it creates a tree of block pointers. Each block pointer is 128 bytes in size, so the tree can get deep fairly quickly.
Three copies of almost all file system structures (such as inodes, but called dnodes in ZFS) by default can't help (which are compressed of course).
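To put rough numbers on the block pointer tree described above, here is a back-of-the-envelope sketch. It takes the figures from this comment (one block pointer in the dnode, 128-byte pointers, blocks of 512 bytes to 64 KiB) and additionally assumes that indirect blocks are the same size as data blocks, which is my simplification and not something stated above. The function name is made up for illustration.

```python
def indirection_levels(file_size: int, block_size: int = 64 * 1024,
                       pointer_size: int = 128) -> int:
    """Estimate how many levels of indirect blocks a file would need.

    Assumption: every indirect block is `block_size` bytes and holds
    block_size // pointer_size pointers.
    """
    data_blocks = max(1, -(-file_size // block_size))  # ceiling division
    fanout = block_size // pointer_size
    levels = 0
    reachable = 1  # the dnode's single pointer reaches exactly one block
    while reachable < data_blocks:
        reachable *= fanout  # one more level multiplies reach by the fanout
        levels += 1
    return levels


# A file that fits in one block needs no indirection at all.
single_block = indirection_levels(64 * 1024)
# Larger files need more levels; small blocks make the tree much deeper.
one_gib = indirection_levels(1 << 30)
one_mib_small_blocks = indirection_levels(1 << 20, block_size=512)
```

Under these assumptions the depth is driven far more by the block size than by the file size, since a small block means both more data blocks and fewer pointers per indirect block.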
Snapshots.
And I don't just mean any snapshots.
Done right, like in ZFS, they are fast.
Faster than BSD's UFS snapshots, faster than using LVM's fs-agnostic snapshots. For people who need them, they're great.
Hmmm, well, the most obvious feature that ZFS has that ext4 does not is checksumming.
That feature is one reason why ZFS is better (it will tell you if your disk is going bad, and if you have a raid setup, it will go get the good data for you). However, this is also one reason why ZFS is slower... it spends time making sure your data is safe and that it always gives you the correct bits from your disk.
That single feature is why I run FreeBSD (looking forward to kFreeBSD/debian!) on my file server in a mirrored raid configuration. Yes, it is "slower", but I still pull data off that server at over 50MB/sec on my home gigabit LAN! The specs on that server aren't great either... 2GB RAM, and an old 1.6GHz single core Sempron.
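For anyone wondering what "go get the good data for you" means in practice, here is a toy sketch of the idea: keep a checksum with each block, verify it on every read, and if one side of a mirror fails the check, serve the other side and overwrite the bad copy. SHA-256 is just a convenient stand-in here, and the function names are made up; this is not how ZFS lays things out on disk.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum stored alongside the block pointer (SHA-256 as a stand-in)."""
    return hashlib.sha256(block).hexdigest()


def read_with_self_heal(copies: list, expected: str) -> bytes:
    """Return the first copy that verifies, repairing any copy that does not."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all copies failed checksum verification")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # rewrite the damaged copy from the good one
    return good


# Two-way mirror holding the same block.
mirror = [bytearray(b"important data"), bytearray(b"important data")]
stored_sum = checksum(bytes(mirror[0]))

mirror[0][0] ^= 0xFF  # simulate silent corruption on the first disk
data = read_with_self_heal(mirror, stored_sum)
```

The extra hashing on every read and write is exactly the kind of work a filesystem without checksums never has to do, which is where part of the speed difference comes from.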
Well, don't forget to use that magic rewinding tape that mysteriously never fills no matter how many backups you use it for. Better safe than sorry I always say.
It's still under development. But it's already pretty competitive, doing reasonably well in many tests.
And then there's this (on the last page) "Ending out our tests we had the PostMark test where the performance of the ZFS Linux kernel module done by KQ Infotech and the Lawrence Livermore National Laboratories was slaughtered. The disk transaction performance for ZFS on this native Linux kernel module was even worse than using ZFS-FUSE and was almost at half the speed of this test when run under the OpenSolaris-based OpenIndiana distribution."
Ok, maybe someone can disabuse me of a misconception that I have, but: There's no reason that ZFS in the kernel should be slower than a FUSE version. That means there's something wrong. If they figure out what's wrong and fix it, that could very likely affect the results in some or all of the other tests.
ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides. And performance might get somewhat better. This article is good news (though that final benchmark is distressing, especially when you look at the ZFS running on OpenSolaris).
It says: "When KQ Infotech releases these ZFS packages to the public in January and rebases them against a later version of ZFS/Zpool, we will publish more benchmarks."
and I'm looking forward to that new article.
Actually, I've run into this problem, not with ZFS (haven't used it), but with other filesystems, on Linux only. It seems not all filesystems are truly endian-aware, so moving a USB disk created on a big-endian system to a little-endian system results in a non-working filesystem. I had to actually go and use the original system to mount the disk.
Somewhat annoying if you want to pull a RAID array out of a Linux-running big-endian system in the hopes that you can recover the data... only to find out it was using XFS or another non-endian-friendly FS and basically not be able to get at the data...
Question about ZFS, say I have a bunch of ZFS filesystems on a bunch of physical drives or drive arrays on Solaris/OpenSolaris/OpenIndiana.
How do I figure out which physical drives/devices a particular ZFS filesystem depends on?
And if a physical drive is faulty, how would I know which actual physical drive it is? e.g. get its serial number or physical slot/bay/position or whatever.
zpool status
That's the command you are looking for. zfs-fuse lists disks by ID, which means if you go into /dev/disk/by-id/ and do an ls -al, you'll see which devices they are linked to.
It is done this way to make it easier on Linux; on BSD/Solaris the disks are listed by GPT name (well, they were for me), so this keeps it sane.
Hope it helps.
Maq
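If you would rather script the by-id lookup described above than eyeball the ls output, a minimal sketch along these lines should do it on Linux. The function name is made up, and it assumes the usual udev layout where each entry under /dev/disk/by-id is a symlink to the real device node.

```python
import os


def map_disk_ids(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map each by-id name (which usually embeds model and serial) to its device node."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            # Resolve the symlink to the device node it points at.
            mapping[name] = os.path.realpath(path)
    return mapping


# Only print on systems that actually have the directory.
if os.path.isdir("/dev/disk/by-id"):
    for disk_id, device in map_disk_ids().items():
        print(f"{disk_id} -> {device}")
```

Match the names that zpool status reports against the left-hand column and you have the serial number to look for on the drive label.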
L2ARC is a HUGE performance improvement for many workloads, it essentially allows you to use faster disks to cache the most frequently used data. If they had combined the SSD and the 7200 RPM SATA drive and benchmarked a real world workload the ZFS implementation would have probably stomped the others because it would have used the SSD for the 'hot' data, the best you can do with btrfs is to place the metadata on the SSD.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
You mean "> /dev/null"?
Picking on ZFS for being slow when ported to a different OS and running on atypical hardware is like criticizing Stephen Hawking for being a poor juggler. It's focussing on the wrong thing. The goals of ZFS are, in no particular order:
- Scalability to enormous numbers of devices
- Highly assured data integrity via checksumming
- Fault tolerance via redundancy
- Manageability/usability features (i.e., snapshots) that conventional file systems simply don't have
Oh, and if it's fast, well, that's gravy.
Am I part of the core demographic for Swedish Fish?
Thanks for replying like a jerk, that really helps us all out. Nobody is going to simply transition to a new way of doing things just because it's new, they need to know what they'll get from the new way that makes the transition worthwhile.
BTRFS can probably never be shipped with any other major OS other than linux
It's not BTRFS's fault that other operating systems use licenses with more restrictions than Linux.
Give me Classic Slashdot or give me death!
Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.
Glad to know LVM is faster though. However, as I stated before it's not convenient. With ZFS I do the following things:
- snapshot the works every night, and keep 7 days worth of snapshots.
- some directories are snapshotted every night, but I keep 365 snapshots (one year). For example the directories that our financial folk use.
- snapshot important directories every hour, keep 24 hours worth
You simply cannot do that with LVM. Sorry. How would I know how much free volume space to plan for? If I have a 10 TB disk, do I plan to use 6 TB of it and leave 4 TB for snapshots? Snapshots consume as much space as subsequent changes. For the 365 say snapshots, this could be a lot or very little depending on what has been touched.
It's very simple. LVM snapshots require free space in the volume group. If your volume group is 10 TB, then you must leave unallocated space in it for the snapshots to consume. On ZFS you don't need to do this. Any free space in the pool can be used for either files or snapshots; it's all the same pool. To do snapshots with LVM the way I do with ZFS would require me to set aside a lot of space. Very inefficient and wasteful.
As far as I can tell, BtrFS will work in a similar way to ZFS, bypassing the need for LVM. Which I'm totally okay with.
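For the curious, the retention side of the schedule described above (keep the newest 7 nightlies, 24 hourlies, 365 for the finance directories) boils down to something like the sketch below. The dataset and snapshot names are hypothetical, and the function only works out which snapshots have aged out; actually removing them would be a separate zfs destroy per name.

```python
from datetime import datetime, timedelta


def expired_snapshots(snapshots: dict, keep: int) -> list:
    """Given {snapshot name: creation time}, return the names beyond the newest `keep`."""
    newest_first = sorted(snapshots, key=snapshots.get, reverse=True)
    return newest_first[keep:]


# Ten hypothetical nightly snapshots, one per day going back from `now`.
now = datetime(2010, 11, 1)
nightly = {f"tank/home@nightly-{i}": now - timedelta(days=i) for i in range(10)}

# Keep a week's worth; everything older is a candidate for destruction.
to_destroy = expired_snapshots(nightly, keep=7)
```

The same function covers the hourly and yearly tiers by changing `keep`; the point is that nothing here needs a space budget decided up front.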
Try a RAID-10 array of /dev/null's - it's even faster.
Custom electronics and digital signage for your business: www.evcircuits.com
Um... WTF? Compression is a performance *improvement* and a massive one, at that. The trivial cost in CPU time is offset by the massive reduction in IO time, which is more expensive by far. This has been true since 2000 or even earlier. Modern multi-core CPUs just take the CPU penalty from negligible to nonexistent. Unless your CPU cores are all running at 100%, and possibly even if they are, compression will improve performance.
Note that this is true on a wide variety of filesystems; it's nothing special to these particular ones. Hell, NTFS has had built-in compression for a decade or more. You can improve performance on a Windows system by right-clicking the C: drive and selecting Properties -> Compress this drive. You can do it from the command line using
compact.exe /C /S:C:\ /A
This will compress all files in or under the root of the C drive, including hidden or system files (requires admin, of course) and marks all the directories so that any files written to them will also get compressed.
There's no place I could be, since I've found Serenity...
BREAKING NEWS! Journaling filesystems with write caching, including the ever-popular NTFS, are vulnerable to data loss in sudden power failures! Total noobs were left with no idea how to go about fixing the problem.
"If only there were some way to run a check on the file system and perform automatic repairs! OH GOD WHAT DO I DO!?!?!" one commented.
"When information is power, privacy is freedom" - Jah-Wren Ryel
A homage to Spinal tap:
Nigel Tufnel: My RAID arrays are all RAID-11. Look, right across the rack, RAID-11, RAID-11, RAID-11 and...
Marty DiBergi: Oh, I see. And most arrays go up to RAID-10?
Nigel Tufnel: Exactly.
Marty DiBergi: Does that mean it's faster? Is it any faster?
Nigel Tufnel: Well, it's one faster, isn't it? It's not RAID-10. You see, most blokes, you know, will be serving files at RAID-10. You're on RAID-10 here, all the way up, all the way up, all the way up, you're on RAID-10 on your database backup. Where can you go from there? Where?
Marty DiBergi: I don't know.
Nigel Tufnel: Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?
Marty DiBergi: Put it up to RAID-11.
Nigel Tufnel: RAID-11. Exactly. One faster.
Marty DiBergi: Why don't you just make RAID-10 faster and make RAID-10 be the top performer and make that a little faster?
Nigel Tufnel: [pause] These go to RAID-11.
Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.
On ZFS, if the system goes down uncleanly you should avoid data corruption so long as every part of the chain from ZFS to your hard drive's platters behaves as ZFS expects and writes data in the order it wants. If it doesn't, you can easily end up with filesystem corruption that can't be repaired without dumping the entire contents of the ZFS pool to external storage, erasing it, and recreating the filesystem from scratch. If you're even more unlucky, the corruption will tickle one of the bugs in ZFS and even trying to mount the FS will cause a kernel panic, though this was more of a problem in older versions.
Unless, of course, the files you're storing are already compressed, in that case it's just a pure loss. As with many things, what's "best" is strongly dependent on what you want to do with it.
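A quick way to see both sides of this argument is to run the same compressor over repetitive data and over data that looks already compressed. This sketch uses zlib from the Python standard library purely as an example compressor (it is not what any of these filesystems necessarily use), and seeded pseudo-random bytes stand in for an already-compressed file.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive, log-like data: the friendly case.
text_like = b"ZFS on Linux benchmark results\n" * 4096

# Pseudo-random bytes: a stand-in for JPEGs, video, or zip archives.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(len(text_like)))

text_ratio = compression_ratio(text_like)
random_ratio = compression_ratio(random_like)
```

The repetitive input shrinks to a tiny fraction of its size, so far fewer bytes hit the disk, while the random input does not shrink, so the CPU time spent on it buys nothing.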
Half of which's results will be one discussion forum or another where people who are not smug asses thoughtfully took a moment to answer a person's question.
You had time to post this self-important drivel, surely you have time to answer the question as well -- but you elected for the drivel. And you think that somehow says something about the people asking the question rather than about you?
What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.
I keep hearing this shit over and over, mostly on idiot infested Linux distribution and Solaris fanboy forums, and it's just getting unbearable to see.
You make it sound like you need an extra 10 terabytes to back up a 10 terabyte volume with LVM. You don't. It takes a snapshot, and the free space you need is for further changes to the volume. ZFS is the same, except it's more intelligent about how it can use any free space over multiple volumes for snapshots, and with things like deduplication it will get much better, but you still need free space to perform them. You make it sound like ZFS snapshots are completely free, as I see many ZFS proponents saying, and it's crap. The OP is also right about the time that ZFS snapshots can take. It's far too long.
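The "free space is for further changes" point can be shown with a toy copy-on-write model. This is only a sketch of the general idea, with a made-up class, and it does not reflect the on-disk behaviour of either LVM or ZFS: a snapshot costs nothing when taken and grows by one preserved block each time a live block is overwritten for the first time.

```python
class CowVolume:
    """Toy volume: live blocks plus snapshots holding preserved old blocks."""

    def __init__(self, blocks):
        self.live = dict(enumerate(blocks))
        self.snapshots = []  # each snapshot maps block index -> old contents

    def snapshot(self):
        # Taking a snapshot copies no data; it starts out empty.
        self.snapshots.append({})

    def write(self, index, data):
        for snap in self.snapshots:
            # Preserve the old block only the first time it changes.
            if index not in snap and index in self.live:
                snap[index] = self.live[index]
        self.live[index] = data

    def snapshot_blocks(self):
        """Total number of blocks currently held only for snapshots."""
        return sum(len(snap) for snap in self.snapshots)


vol = CowVolume([b"a", b"b", b"c", b"d"])
vol.snapshot()
before = vol.snapshot_blocks()

vol.write(1, b"B")
vol.write(1, b"BB")  # second write to the same block costs nothing extra
vol.write(3, b"D")
after = vol.snapshot_blocks()
```

Whether that preserved space comes out of a reserved area or out of a shared pool is the actual difference being argued about here; the amount consumed is driven by how much changes either way.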
This is a road Btrfs will have to travel because it also has to be *the* general purpose Linux filesystem and will have to solve problems and be in places where ZFS is not.
What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job.
Jeez, where to start? They're night and day. EXT4 has more in common with FAT32 or UFS than it does ZFS.
It's got a handful of core features, all of which are significant on their own:
* copy-on-write, so you know your data gets committed
* integral RAID-like functionality, integrated with the filesystem. This reduces overhead and eliminates the need for archaic RAID controllers (almost) entirely (complete with their shitty firmware and quirks, etc.) - just the controller, please.
* Due to the above two, eliminates the RAID5 write hole
* instant (like, a second or two) snapshotting of very large amounts of data.
* You can transparently 'piggyback' any filesystem on top of ZFS to provide said filesystem with ZFS's protection
* Integral iSCSI provider. Nice to have with the above feature!
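For anyone who hasn't met the RAID5 write hole mentioned in the list above, here is a minimal sketch with a two-data-disk stripe and XOR parity. It is a toy illustration with made-up names, not a model of any real controller or of ZFS itself: if a crash lands between the data write and the parity write, a later rebuild silently produces garbage.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks together, as RAID5 does to compute parity."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# A stripe with two data blocks and their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Normal case: the disk holding d1 dies and is rebuilt from d0 and parity.
rebuilt_ok = xor_blocks(d0, parity)

# Write hole: d0 is overwritten in place, then power fails before the
# matching parity update reaches the disk.
d0 = b"CCCC"
rebuilt_after_crash = xor_blocks(d0, parity)  # stale parity, wrong result
```

With copy-on-write, the new data and parity go to fresh locations and the old stripe stays intact until the whole update is committed, so there is no window in which data and parity disagree.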
Shortcomings might include:
* No fdisk. IMO it's a bit of a serious limitation, but "it's not needed". Still, it can't help you recover from something like...
* The potential loss of your zpool definition file. Unlike (say) mdraid on Linux, there are no block backups within the filesystem (as far as I know) so the pool definition can tenably be lost (if you have a backup file somewhere, it's easy enough to recover, but still..)
As for the original post's "not terribly fast" diss? Sorry, not buying it. They really needed to compare the performance against (say) other ZFS-based systems to show its utility - there are a lot of people 'forced' to use Solaris and/or FreeBSD because they have ZFS. Another significant thing to consider will be its maturity/stability and feature-completeness (e.g. FreeBSD is a good way behind Solaris/OS/Illumos in these departments).
Finally, this is still pretty beta code. The only 'significant' not-as-good performance failure is the Postmark benchmark, which may or may not be conclusive (I don't know what it does). If you compare it to this postmark benchmark for PCBSD, it doesn't look that bad (particularly when you consider the above linked article figures are 500 points or so higher across the board than the 'new' benchmarks) - and the new implementation appears better than XFS, which is still quite a decent filesystem.
Oh, yeah - consider it's still 'beta'. Notably, considerably more 'beta' than Butter. Consider me excited. I'm not going to jump until I get fairly certain news that it's at least as stable as the FreeBSD implementation (while requiring less 'tuning' - bah!); I can do without features if it's stable. CoW and the basic RAID-like implementation on their own are enough to jump ship for.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers