Slashdot Mirror


Running ZFS Natively On Linux Slower Than Btrfs

An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module done by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but the Solaris file-system in most areas is not nearly as fast as EXT4, Btrfs, or XFS."

40 of 235 comments

  1. First post! by halfaperson · · Score: 5, Funny

    Using BTRFS :)

    --
    Jesus had a UNIX beard.
  2. Using a first beta slower than stable? Wha?!?!? by tysonedwards · · Score: 4, Insightful

    Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?

    --
    Thirty four characters live here.
    1. Re:Using a first beta slower than stable? Wha?!?!? by chrb · · Score: 2, Informative

      Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?

      The full release is supposed to be coming out in the first week of January. Given the short time frame, it would seem like this is probably closer to the final release than the words "first beta" imply.

      Surprises:

      • Native ZFS beat XFS on several of the benchmarks - XFS is usually a good performer in these kinds of tests
      • Native ZFS does very well on the Threaded IO Test, where it ties for first place.
      • Btrfs is really bad on the SQLite test, taking 5 times longer than XFS on both 2.6.32 and 2.6.37 (bug?)
      • XFS IOzone write performance increased by 45% going from 2.6.32 to 2.6.37 (!) XFS increased on FS-Mark by 37%. I thought XFS would be pretty much at the point where there would be no such great improvements.
      • "Real" Solaris+ZFS gets absolutely slaughtered on the Threaded IO Test and the PostMark Test, with ext4 pushing almost 10x more transactions per second.
      • Tests were done on an SSD; apparently there was no difference in relative performance of the filesystems on SSD versus HD

      Notes:

      • "Real" Solaris+ZFS results are not shown for most tests
      • Would be nice to know how many replicates they did of each test
      • This is an interesting set of results that will get people talking/arguing :-) Thanks to Phoronix for starting the discussion.
  3. Re:They Why ZFS? by klingens · · Score: 5, Insightful

    ext2 is faster than ext3, simply because it does less. ZFS has many, many features most other FS don't have but they do come at a price.

  4. Re:They Why ZFS? by Rakshasa+Taisab · · Score: 4, Insightful

    I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

    --
    - These characters were randomly selected.
  5. how about versus ZFS on Solaris or FreeBSD? by Anonymous Coward · · Score: 2, Insightful

    On similar hardware of course.

    It occurs to me that ZFS does a lot more than EXT4 and Btrfs too.

  6. Re:They Why ZFS? by outZider · · Score: 3, Insightful

    So, because ext3 implementations on other OSes are slow, that means ext3 is slow? Got it.

    Try running ZFS on FreeBSD, or better yet, on the original OS: Solaris.

    --
    - oZ
    // i am here.
  7. Doomed to failure by license conflict by mattdm · · Score: 4, Interesting

    OpenAFS, which still today provides features unavailable in any other production-ready network filesystem, is a nightmare to use in the real world because of its lack of integration with the mainline kernel. It's licensed under the "IPL", which like the CDDL is free-software/open source but not GPL compatible.

    ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.

    Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

    1. Re:Doomed to failure by license conflict by mattdm · · Score: 2, Interesting

      Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs I know it's fashionable to knock Oracle every chance you get... but Look at the line:

      As I understand it, Chris Mason brought his btrfs work with him when he started at Oracle, or at least the ideas for it. A kernel hacker of his caliber probably started the job with an agreement of exactly how that was going to go.

      Oracle is a big organization; it's not surprising they act in apparently contradictory ways. They've done a reasonable amount of good open source work and made community contributions. But I stand by the statement that it's impossible to make a good prediction as to what Oracle is going to do with anything that comes from the Sun acquisition -- but you certainly don't need to take my word for it that most of the behavior so far seems to be aimed at short-term monetization rather than long-term community growth.

  8. Different ZFS distros by hoggoth · · Score: 3, Informative

    I was confused as to what versions of ZFS were available on which distros so I made a chart that lists the different distros and which version of ZFS they support:

    http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html

    Hope it's helpful...

    --
    - For the complete works of Shakespeare: cat /dev/random (may take some time)
  9. Re:They Why ZFS? by Cwix · · Score: 3, Insightful

    What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job.

    If I have a bicycle that I ride everywhere, and have never seen nor heard of a car, I would not know what a car could do for me, would I? So if someone comes along and says, "Hey, cars are cool, they are just a little more expensive," I would ask something like: what features does a car have over a bicycle?

    --
    You are entitled to your own opinions, not your own facts.
  10. Re:That's not a solaris filesystem by datapharmer · · Score: 3, Funny

    You can save your stuff in /dev/null quite fast too!

    I know! It is friggin crazy fast. I've been using it for backups for years. Even with terabytes of data I've never managed to fill it up or slow it down!

    --
    Get a web developer
  11. Btrfs naming convention by digitaldc · · Score: 3, Funny

    Couldn't they name the file system something better than butterface?

    --
    He who knows best knows how little he knows. - Thomas Jefferson
    1. Re:Btrfs naming convention by timeOday · · Score: 2, Funny

      What are you complaining about? I always thought it was "bitter farts."

    2. Re:Btrfs naming convention by Abstrackt · · Score: 2, Funny

      Unfortunately Gimp was already taken.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
  12. Re:They Why ZFS? by caseih · · Score: 5, Interesting

    ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks, in my opinion. I currently run ZFS on about 10 TB. I never worry about a corrupt file system, never have to fsck it. And snapshots are cheap and fast. I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems). Then I back up from the snapshot. In other areas of the disk I do hourly snapshotting. Indeed snapshots are the killer feature for me for ZFS. LVM has snapshots, true, but they are not quick or convenient compared to ZFS. In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space. The integration of volume management and the file system may break a lot of people's ideas of clear separation between layers, but from the admin's point of view it is really nice.

    We'll ditch ZFS and Solaris once BtrFS is ready. BtrFS is close, though; should work well for things like home servers, so try it out if you have a large MythTV system.

  13. Re:They Why ZFS? by Anonymous Coward · · Score: 2, Informative

    ZFS is...the only FS for large disks

    XFS

    I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems)...LVM has snapshots, true, but they are not quick or convenient compared to ZFS.

    30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

    In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space.

    I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

  14. Re:They Why ZFS? by daha · · Score: 5, Informative

    Which of the ZFS features most impact its performance?

    Compression enabled by default can't help (available in btrfs).

    Checksum for all blocks probably doesn't help, but definitely helps detect corrupt data/corruption (available in btrfs).

    Forcing any file that requires more than a single block to use a tree of block pointers probably doesn't help. The dnode only has one block pointer and the block pointer can only point to a single block (no extents). On the plus side, the block size can vary between 512 bytes and 64 KiB per object, so slack space is kept low. If more than a single block is necessary it creates a tree of block pointers. Each block pointer is 128 bytes in size, so the tree can get deep fairly quickly.
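
    To put rough numbers on "deep fairly quickly", here is a back-of-the-envelope sketch. The 128-byte block pointer is the figure quoted above; the 16 KiB indirect block size and the function name are assumptions made for illustration, not something stated in this thread.

```python
BLKPTR_SIZE = 128  # bytes per block pointer, the figure quoted in the comment


def indirect_levels(file_size, data_block_size, indirect_block_size=16 * 1024):
    """Count the levels of indirect blocks needed under one root block pointer.

    indirect_block_size is an assumed value used only for this estimate.
    """
    fanout = indirect_block_size // BLKPTR_SIZE  # pointers per indirect block
    data_blocks = max(1, -(-file_size // data_block_size))  # ceiling division
    levels = 0
    reachable = 1  # a lone block pointer reaches exactly one data block
    while reachable < data_blocks:
        reachable *= fanout
        levels += 1
    return levels


KIB, GIB, TIB = 1024, 1024 ** 3, 1024 ** 4
for label, size in [("4 KiB", 4 * KIB), ("1 GiB", GIB), ("1 TiB", TIB)]:
    print(label, "->", indirect_levels(size, 64 * KIB), "indirect level(s)")
```

    Under those assumptions a 1 GiB file in 64 KiB blocks needs two levels and a 1 TiB file needs four, so the depth grows with the logarithm of the file size rather than linearly.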

    Three copies of almost all file system structures (such as inodes, but called dnodes in ZFS) by default can't help (which are compressed of course).

  15. Re:They Why ZFS? by Anonymous Coward · · Score: 5, Insightful

    Snapshots.
    And I don't just mean any snapshots.
    Done right, like in ZFS, they are fast.
    Faster than BSD's UFS snapshots, faster than using LVM's fs-agnostic snapshots. For people who need them, they're great.

  16. Checksums - 1 feature ZFS has that Ext4 doesn't by yup2000 · · Score: 4, Informative

    hmmm, well the most obvious feature that ZFS has that Ext4 does not is checksumming.

    That feature is one reason why ZFS is better (it will tell you if your disk is going bad, and if you have a raid setup, it will go get the good data for you). However, this is also one reason why ZFS is slower... it spends time making sure your data is safe and that it always gives you the correct bits from your disk.
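
    The idea of "tell you the disk is going bad and go get the good data" can be sketched in a few lines. This is a toy model, not ZFS code: SHA-256 is just a convenient stand-in checksum, and `read_verified` is a hypothetical helper name.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in checksum for this sketch (an assumption, chosen for convenience)."""
    return hashlib.sha256(data).hexdigest()


def read_verified(copies, expected):
    """Return the first mirror copy that matches the stored checksum.

    Also returns the indexes of the copies that failed verification, so the
    caller knows which side of the mirror handed back bad bits.
    """
    bad = []
    for index, data in enumerate(copies):
        if checksum(data) == expected:
            return data, bad
        bad.append(index)
    raise IOError("all copies failed checksum verification")


good = b"important data"
stored_checksum = checksum(good)
mirror = [b"importent data", good]  # first copy silently corrupted
data, bad_copies = read_verified(mirror, stored_checksum)
print(data, "bad copies:", bad_copies)
```

    The extra hashing on every read is the time cost mentioned above; the payoff is that a silently corrupted copy is caught instead of being returned to the application.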

    That single feature is why I run FreeBSD (looking forward to kFreeBSD/debian!) on my file server in a mirrored raid configuration. Yes, it is "slower", but I still pull data off that server at over 50MB/sec on my home gigabit LAN! The specs on that server aren't great either... 2GB ram, and an old 1.6GHz single core sempron.

  17. Re:That's not a solaris filesystem by hedwards · · Score: 2, Insightful

    Well, don't forget to use that magic rewinding tape that mysteriously never fills no matter how many backups you use it for. Better safe than sorry I always say.

  18. Not bad news by wonkavader · · Score: 4, Interesting

    It's still under development. But it's already pretty competitive, doing reasonably well in many tests.

    And then there's this (on the last page) "Ending out our tests we had the PostMark test where the performance of the ZFS Linux kernel module done by KQ Infotech and the Lawrence Livermore National Laboratories was slaughtered. The disk transaction performance for ZFS on this native Linux kernel module was even worse than using ZFS-FUSE and was almost at half the speed of this test when run under the OpenSolaris-based OpenIndiana distribution."

    Ok, maybe someone can disabuse me of a misconception that I have, but: There's no reason that ZFS in the kernel should be slower than a FUSE version. That means there's something wrong. If they figure out what's wrong and fix it, that could very likely affect the results in some or all of the other tests.

    ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides. And performance might get somewhat better. This article is good news (though that final benchmark is distressing, especially when you look at the ZFS running on OpenSolaris).

    It says: "When KQ Infotech releases these ZFS packages to the public in January and rebases them against a later version of ZFS/Zpool, we will publish more benchmarks."

    and I'm looking forward to that new article.

  19. Re:They Why ZFS? by tlhIngan · · Score: 4, Interesting

    The main reason to use ZFS over the other ones, even in cases where the features are the same is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

    Actually, I've run into this problem, not with ZFS (haven't used it), but with other filesystems, on Linux only. It seems not all filesystems are truly endian-aware, so moving a USB disk created on a big-endian system to a little-endian system results in a non-working filesystem. Had to actually go and use that system to mount the disk.

    Somewhat annoying if you want to pull a RAID array out of a Linux-running big-endian system in the hopes that you can recover the data... only to find out it was using XFS or other non-endian-friendly FS and basically not be able to get at the data...

  20. Re:They Why ZFS? by TheLink · · Score: 2, Interesting

    Question about ZFS, say I have a bunch of ZFS filesystems on a bunch of physical drives or drive arrays on Solaris/OpenSolaris/OpenIndiana.

    How do I figure out which physical drives/devices a particular ZFS filesystem depends on?

    And if a physical drive is faulty, how would I know which actual physical drive it is? e.g. get its serial number or physical slot/bay/position or whatever.

  21. Re:They Why ZFS? by Maquis196 · · Score: 5, Informative

    zpool status

    That's the command you are looking for. zfs-fuse lists disks by id, which means if you go into /dev/disk/by-id/ and do an ls -al you'll see which devices they are linked to.

    It is done this way to make it easier in Linux, in BSD/Solaris the disks are by gpt name (well they were for me) so this keeps it sane.

    Hope it helps.
    Maq
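
    For anyone who would rather script the "ls -al" step, a minimal sketch follows. It assumes a Linux system where the persistent names live under /dev/disk/by-id as symlinks; `map_disk_ids` is a hypothetical helper name, and the sketch does not call zpool at all.

```python
import os


def map_disk_ids(by_id_dir="/dev/disk/by-id"):
    """Map each persistent disk ID to the device node its symlink resolves to."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping  # not Linux, or no udev-managed by-id directory
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            mapping[name] = os.path.realpath(path)
    return mapping


for disk_id, device in map_disk_ids().items():
    print(disk_id, "->", device)
```

    The IDs usually embed the drive model and serial number, which is what you want to match against the label on the physical drive.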

  22. Re:They Why ZFS? by afidel · · Score: 2, Interesting

    L2ARC is a HUGE performance improvement for many workloads, it essentially allows you to use faster disks to cache the most frequently used data. If they had combined the SSD and the 7200 RPM SATA drive and benchmarked a real world workload the ZFS implementation would have probably stomped the others because it would have used the SSD for the 'hot' data, the best you can do with btrfs is to place the metadata on the SSD.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  23. Re:They Why ZFS? by Anonymous Coward · · Score: 2, Funny

    I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

    You mean "> /dev/null"?

  24. For ZFS, speed is a secondary goal by pedantic+bore · · Score: 3, Insightful

    Picking on ZFS for being slow when ported to a different OS and running on atypical hardware is like criticizing Stephen Hawking for being a poor juggler. It's focussing on the wrong thing. The goals of ZFS are, in no particular order:
    - Scalability to enormous numbers of devices
    - Highly assured data integrity via checksumming
    - Fault tolerance via redundancy
    - Manageability/usability features (i.e., snapshots) that conventional file systems simply don't have
    Oh, and if it's fast, well, that's gravy.

    --
    Am I part of the core demographic for Swedish Fish?
  25. Re:They Why ZFS? by Jeff+DeMaagd · · Score: 3, Insightful

    Thanks for replying like a jerk, that really helps us all out. Nobody is going to simply transition to a new way of doing things just because it's new, they need to know what they'll get from the new way that makes the transition worthwhile.

  26. Re:I'm using btrfs on my home partition. by Hatta · · Score: 3, Insightful

    BTRFS can probably never be shipped with any other major OS other than linux

    It's not BTRFS's fault that other operating systems use licenses with more restrictions than Linux.

    --
    Give me Classic Slashdot or give me death!
  27. Re:They Why ZFS? by caseih · · Score: 2, Interesting

    XFS

    Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

    30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

    Glad to know LVM is faster though. However, as I stated before it's not convenient. With ZFS I do the following things:
    - snapshot the works every night, and keep 7 days worth of snapshots.
    - some directories are snapshotted every night, but I keep 365 snapshots (one year). For example the directories that our financial folk use.
    - snapshot important directories every hour, keep 24 hours worth
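
    The three retention rules above boil down to "keep the newest N, drop the rest". A minimal sketch of that pruning logic, with made-up policy names and no actual zfs commands involved:

```python
from datetime import datetime, timedelta

# Hypothetical policy names; the keep counts are the ones listed above.
RETENTION = {"nightly-everything": 7, "nightly-finance": 365, "hourly-important": 24}


def snapshots_to_prune(snapshot_times, keep):
    """Return the snapshots outside the newest `keep`, oldest first."""
    ordered = sorted(snapshot_times)
    return ordered[:max(len(ordered) - keep, 0)]


# Ten nightly snapshots exist, but the policy only keeps seven.
nightly = [datetime(2010, 11, 1) + timedelta(days=i) for i in range(10)]
for stale in snapshots_to_prune(nightly, RETENTION["nightly-everything"]):
    print("would destroy snapshot from", stale.date())
```

    A real job would feed this from the snapshot list and destroy the returned entries; the point here is only that the bookkeeping is trivial once snapshots are cheap.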

    You simply cannot do that with LVM. Sorry. How would I know how much free volume space to plan for? If I have a 10 TB disk, do I plan to use 6 TB of it and leave 4 TB for snapshots? Snapshots consume as much space as subsequent changes. For the 365-day snapshots, this could be a lot or very little depending on what has been touched.

    I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

    It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume. On ZFS you don't need to do this. Any free space on the file system can be used for either files or snapshots; it's all the same pool. To do snapshots with LVM the way I do with ZFS would require me to set aside a lot of space. Very inefficient and wasteful.
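
    A toy model of the shared-pool accounting, to make the point concrete. Everything here (the `CowPool` class, counting space in whole blocks) is invented for illustration and is far simpler than any real copy-on-write filesystem.

```python
class CowPool:
    """Toy copy-on-write pool: live data and snapshots draw on one block budget."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.live = {}         # block number -> version currently visible
        self.snapshots = []    # each snapshot is a frozen copy of the block map
        self._next_version = 0

    def used_blocks(self):
        """Count every block version still referenced by live data or a snapshot."""
        versions = set(self.live.values())
        for snap in self.snapshots:
            versions.update(snap.values())
        return len(versions)

    def snapshot(self):
        """Freeze the current block map; no data blocks are copied."""
        self.snapshots.append(dict(self.live))

    def write(self, block_no):
        """Write a new version of a block instead of overwriting it in place."""
        previous = self.live.get(block_no)
        self.live[block_no] = self._next_version
        self._next_version += 1
        if self.used_blocks() > self.capacity:
            # Undo the write: the shared pool has no room left.
            if previous is None:
                del self.live[block_no]
            else:
                self.live[block_no] = previous
            raise OSError("pool is out of space")


pool = CowPool(capacity_blocks=10)
for block in range(4):
    pool.write(block)
pool.snapshot()
print("right after snapshot:", pool.used_blocks())
pool.write(0)
pool.write(1)
print("after changing two blocks:", pool.used_blocks())
```

    The snapshot itself costs nothing up front, and each later change pins one old block. Nothing has to be reserved in advance, but the space is not free either: it comes out of the same budget the files use.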

    As far as I can tell, BtrFS will work in a similar way to ZFS, bypassing the need for LVM. Which I'm totally okay with.

  28. Re:They Why ZFS? by guruevi · · Score: 2, Funny

    Try a RAID-10 array of /dev/null's - it's even faster.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  29. Re:They Why ZFS? by cbhacking · · Score: 4, Informative

    Um... WTF? Compression is a performance *improvement* and a massive one, at that. The trivial cost in CPU time is offset by the massive reduction in IO time, which is more expensive by far. This has been true since 2000 or even earlier. Modern multi-core CPUs just take the CPU penalty from negligible to nonexistent. Unless your CPU cores are all running at 100%, and possibly even if they are, compression will improve performance.

    Note that this is true on a wide variety of filesystems; it's nothing special to these particular ones. Hell, NTFS has had built-in compression for a decade or more. You can improve performance on a Windows system by right-clicking the C: drive and selecting Properties -> Compress this drive. You can do it from the command line using

    compact.exe /C /S:C:\ /A

    This will compress all files in or under the root of the C drive, including hidden or system files (requires admin, of course) and marks all the directories so that any files written to them will also get compressed.

    --
    There's no place I could be, since I've found Serenity...
  30. Re:They Why ZFS? by GameboyRMH · · Score: 2, Insightful

    BREAKING NEWS! Journaling filesystems with write caching, including the ever-popular NTFS, are vulnerable to data loss in sudden power failures! Total noobs were left with no idea how to go about fixing the problem.

    "If only there were some way to run a check on the file system and perform automatic repairs! OH GOD WHAT DO I DO!?!?!" one commented.

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
  31. Re:They Why ZFS? by ebuck · · Score: 2, Funny

    A homage to Spinal tap:

    Nigel Tufnel: My RAID arrays are all RAID-11. Look, right across the rack, RAID-11, RAID-11, RAID-11 and...
    Marty DiBergi: Oh, I see. And most arrays go up to RAID-10?
    Nigel Tufnel: Exactly.
    Marty DiBergi: Does that mean it's faster? Is it any faster?
    Nigel Tufnel: Well, it's one faster, isn't it? It's not RAID-10. You see, most blokes, you know, will be serving files at RAID-10. You're on RAID-10 here, all the way up, all the way up, all the way up, you're on RAID-10 on your database backup. Where can you go from there? Where?
    Marty DiBergi: I don't know.
    Nigel Tufnel: Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?
    Marty DiBergi: Put it up to RAID-11.
    Nigel Tufnel: RAID-11. Exactly. One faster.
    Marty DiBergi: Why don't you just make RAID-10 faster and make RAID-10 be the top performer and make that a little faster?
    Nigel Tufnel: [pause] These go to RAID-11.

  32. Re:They Why ZFS? by makomk · · Score: 2, Informative

    Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

    On ZFS, if the system goes down uncleanly you should avoid data corruption so long as every part of the chain from ZFS to your hard drive's platters behaves as ZFS expects and writes data in the order it wants. If it doesn't, you can easily end up with filesystem corruption that can't be repaired without dumping the entire contents of the ZFS pool to external storage, erasing it, and recreating the filesystem from scratch. If you're even more unlucky, the corruption will tickle one of the bugs in ZFS and even trying to mount the FS will cause a kernel panic, though this was more of a problem in older versions.

  33. Re:They Why ZFS? by sjames · · Score: 3, Insightful

    Unless, of course, the files you're storing are already compressed, in that case it's just a pure loss. As with many things, what's "best" is strongly dependent on what you want to do with it.
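
    This is easy to check for yourself. The sketch below uses zlib from the Python standard library; the seeded pseudo-random bytes are only a stand-in for already-compressed files (an assumption for illustration), and the repeated log line stands in for text-like data.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means space saved)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive, text-like data.
text_like = b"ZFS snapshot of /tank/home taken at midnight\n" * 2000

# Pseudo-random bytes, standing in for data that is already compressed.
rng = random.Random(0)
already_compressed_like = bytes(rng.getrandbits(8) for _ in range(len(text_like)))

print("text-like ratio:          %.3f" % compression_ratio(text_like))
print("already-compressed ratio: %.3f" % compression_ratio(already_compressed_like))
```

    The text-like input shrinks to a small fraction of its size, while the random input does not shrink at all, so the CPU time spent on it buys no reduction in IO.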

  34. Re:They Why ZFS? by Dhalka226 · · Score: 3, Informative

    Half of which's results will be one discussion forum or another where people who are not smug asses thoughtfully took a moment to answer a person's question.

    You had time to post this self-important drivel, surely you have time to answer the question as well -- but you elected for the drivel. And you think that somehow says something about the people asking the question rather than about you?

  35. Re:They Why ZFS? by segedunum · · Score: 2, Informative

    Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens.

    What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.

    I keep hearing this shit over and over, mostly on idiot infested Linux distribution and Solaris fanboy forums, and it's just getting unbearable to see.

    It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume.

    You make it sound like you need an extra 10 terabytes to back up a 10 terabyte volume with LVM. You don't. It takes a snapshot and the free space you need is for further changes to the volume. ZFS is the same, except it's more intelligent about how it can use any free space over multiple volumes for snapshots and with things like deduplication it will get much better, but you still need free space to perform them. You make it sound like ZFS snapshots are completely free as I see many ZFS proponents saying, and it's crap. The OP is also right about the time that ZFS snapshots can take. It's far too long.

    This is a road Btrfs will have to travel because it also has to be *the* general purpose Linux filesystem and will have to solve problems and be in places where ZFS is not.

  36. Re:They Why ZFS? by CAIMLAS · · Score: 2, Informative

    What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job.

    Jeez, where to start? They're night and day. EXT4 has more in common with FAT32 or UFS than it does ZFS.

    It's got a handful of core features, all of which are significant on their own:

    * copy-on-write, so you know your data gets committed
    * integral RAID-like functionality, integrated with the filesystem. This reduces overhead and eliminates the need for archaic RAID controllers (almost) entirely (complete with their shitty firmware and quirks, etc.) - just the controller, please.
    * Due to the above two, eliminates the RAID5 write hole
    * instant (like, a second or two) snapshotting of very large amounts of data.
    * You can transparently 'piggyback' any filesystem on top of ZFS to provide said filesystem with ZFS's protection
    * Integral iSCSI provider. Nice to have with the above feature!

    Shortcomings might include:
    * No fdisk. IMO it's a bit of a serious limitation, but "it's not needed". Still, it can't help you recover from something like...
    * The potential loss of your zpool definition file. Unlike (say) mdraid on Linux, there are no block backups within the filesystem (as far as I know) so the pool definition can tenably be lost (if you have a backup file somewhere, it's easy enough to recover, but still..)

    As for the original post's "not terribly fast" diss? Sorry, not buying it. They really needed to compare the performance against (say) other ZFS-based systems to show its utility - there are a lot of people 'forced' to use Solaris and/or FreeBSD because it's got ZFS. Another significant thing to consider will be its maturity/stability and feature-completeness (eg. FreeBSD is a good way behind Solaris/OS/Illumos in these departments).

    Finally, this is still pretty beta code. The only 'significant' not-as-good performance failure is the Postmark benchmark, which may or may not be conclusive (I don't know what it does). If you compare it to this postmark benchmark for PCBSD, it doesn't look that bad (particularly when you consider the above linked article figures are 500 points or so higher across the board than the 'new' benchmarks) - and the new implementation appears better than XFS, which is still quite a decent filesystem.

    Oh, yeah - consider it's still 'beta'. Notably, considerably more 'beta' than Butter. Consider me excited. I'm not going to jump until I get fairly certain news that it's at least as stable as the FreeBSD implementation (while requiring less 'tuning' - bah!); I can do without features if it's stable. CoW and the basic RAID-like implementation on their own is enough to jump ship for.

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers