Slashdot Mirror


Running ZFS Natively On Linux Slower Than Btrfs

An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module developed by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but the Solaris file-system is, in most areas, not nearly as fast as EXT4, Btrfs, or XFS."

235 comments

  1. First post! by halfaperson · · Score: 5, Funny

    Using BTRFS :)

    --
    Jesus had a UNIX beard.
  2. They Why ZFS? by BoRegardless · · Score: 1

    If 3 other file systems are "faster", then is ZFS somehow "better"?

    1. Re:They Why ZFS? by klingens · · Score: 5, Insightful

      ext2 is faster than ext3, simply because it does less. ZFS has many, many features most other FS don't have but they do come at a price.

    2. Re:They Why ZFS? by Rakshasa+Taisab · · Score: 4, Insightful

      I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

      --
      - These characters were randomly selected.
    3. Re:They Why ZFS? by Anonymous Coward · · Score: 1, Insightful

      Sooo, are any of those features I'd particularly care about?
      Ext4 seems to do everything my simple needs (and those of my services) require.

    4. Re:They Why ZFS? by Anonymous Coward · · Score: 0, Troll

      Do you require random corruption?

      Because ext4 does come in a fast variant, but that means accepting the random, complete disk-erasing failure.

    5. Re:They Why ZFS? by HogGeek · · Score: 0, Flamebait

      I'm going to answer your question with a question.

        What exactly is it you care about?

      Since ext4 meets 'your' needs, why continue developing anything? Close up shop, we're done here...

    6. Re:They Why ZFS? by outZider · · Score: 3, Insightful

      So, because ext3 implementations on other OSes are slow, that means ext3 is slow? Got it.

      Try running ZFS on FreeBSD, or better yet, on the original OS: Solaris.

      --
      - oZ
      // i am here.
    7. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Ok, let me rephrase.
      Which of the ZFS features most impact its performance? The OP says features come at a price.

      And, are they shiny and cool? Some details at least...

    8. Re:They Why ZFS? by windcask · · Score: 1

      Does RAID-Z ring a bell?

    9. Re:They Why ZFS? by Cwix · · Score: 3, Insightful

      What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job.

      If I had a bicycle that I rode everywhere and had never seen nor heard of a car, I would not know what a car could do for me, would I? So if someone came along and said, "Hey, cars are cool, they are just a little more expensive," I would ask something like: what features does a car have over a bicycle?

      --
      You are entitled to your own opinions, not your own facts.
    10. Re:They Why ZFS? by caseih · · Score: 5, Interesting

      ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks, in my opinion. I currently run ZFS on about 10 TB. I never worry about a corrupt file system and never have to fsck it. And snapshots are cheap and fast. I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems). Then I back up from the snapshot. In other areas of the disk I do hourly snapshotting. Indeed, snapshots are the killer feature of ZFS for me. LVM has snapshots, true, but they are not quick or convenient compared to ZFS. In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space. The integration of volume management and the file system may break a lot of people's ideas of clean separation between layers, but from the admin's point of view it is really nice.

      We'll ditch ZFS and Solaris once BtrFS is ready. BtrFS is close, though; should work well for things like home servers, so try it out if you have a large MythTV system.

    11. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      ZFS has many, many features most other FS don't have

      Many of which are unnecessary within the filesystem itself on any well designed and modern OS which has proper separation of various functional layers. On Linux LVM2 and the md layers do the job of ZFS zones and RAIDZ, and they do it without jamming everything into a giant blob of code.

    12. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Huh. Referring to:

      http://en.wikipedia.org/wiki/Ext4#Delayed_allocation_and_potential_data_loss ?

      If so, that seems a rather biased and inaccurate way to describe it...
      "Other Linux file systems like XFS have never offered ext3-like behavior."

      And it certainly is not random.

    13. Re:They Why ZFS? by hedwards · · Score: 1

      Indeed. The main reason to use ZFS over the other ones, even in cases where the features are the same is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

      And it's hardly fair to compare a filesystem that's being run in such a convoluted way to one that's able to be much more tightly integrated, especially considering that it's a licensing issue not a technical one that mandates the approach.

      And yes, I've personally used ZFS on both FreeBSD and Solaris, and I haven't had any complaints about speed. Resource utilization yes, but that's been greatly improved.

      I'm sure that Hammer and Btrfs are both great filesystems, but like EXT4FS, they aren't particularly useful in cross-platform computing at present. Servers aren't going to be doing that much, but it is something to consider when you've got massive arrays of disks: if you can't take them across directly, you'll be stuck with some sort of really annoying migration process for the disks as well as everything else.

    14. Re:They Why ZFS? by Anonymous Coward · · Score: 2, Informative

      ZFS is...the only FS for large disks

      XFS

      I snapshot the entire 10 TB array in about 30 minutes (about 2000 file systems)...LVM has snapshots, true, but they are not quick or convenient compared to ZFS.

      30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

      In LVM I can only snapshot to unused space in the volume set. With ZFS you can snapshot as long as you have free space.

      I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

    15. Re:They Why ZFS? by daha · · Score: 5, Informative

      Which of the ZFS features most impact its performance?

      Compression enabled by default can't help (available in btrfs).

      Checksum for all blocks probably doesn't help, but definitely helps detect corrupt data/corruption (available in btrfs).
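      A minimal sketch of why a checksum on every block catches silent corruption, assuming a simplified model in which each block pointer carries the checksum of the block it points to, so the data never vouches for itself. SHA-256 from Python's hashlib is only a stand-in for whatever checksum algorithm the filesystem is configured with, and BlockPointer, write_block and read_block are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Hypothetical block pointer: where the block lives plus its expected checksum."""
    address: int
    checksum: bytes


class ChecksumError(Exception):
    pass


def write_block(disk: dict, address: int, data: bytes) -> BlockPointer:
    # The checksum is kept in the *parent* pointer, not next to the data,
    # so a block that was silently overwritten cannot vouch for itself.
    disk[address] = data
    return BlockPointer(address, hashlib.sha256(data).digest())


def read_block(disk: dict, ptr: BlockPointer) -> bytes:
    data = disk[ptr.address]
    if hashlib.sha256(data).digest() != ptr.checksum:
        raise ChecksumError(f"block {ptr.address} failed verification")
    return data


disk = {}
ptr = write_block(disk, 7, b"payroll records")
print(read_block(disk, ptr))

disk[7] = b"payroll recordz"  # simulate silent corruption on the medium
try:
    read_block(disk, ptr)
except ChecksumError as exc:
    print("detected:", exc)
```

      With redundancy (mirrors or RAID-Z), a failed check is the signal to fetch another copy instead of returning bad data.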

      Forcing any file that requires more than a single block to use a tree of block pointers probably doesn't help. The dnode only has one block pointer and the block pointer can only point to a single block (no extents). On the plus side, the block size can vary between 512 bytes and 64 KiB per object, so slack space is kept low. If more than a single block is necessary, it creates a tree of block pointers. Each block pointer is 128 bytes in size, so the tree can get deep fairly quickly.
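      The depth of that tree can be estimated with some back-of-the-envelope arithmetic. This sketch takes the comment's figures (one block pointer in the dnode, 128-byte block pointers) and treats the data block size and the indirect block size as parameters; the 64 KiB and 16 KiB values used below are illustrative assumptions, not claims about ZFS defaults, and indirection_levels is a hypothetical helper.

```python
def indirection_levels(file_size: int, data_block: int, indirect_block: int,
                       bp_size: int = 128) -> int:
    """Levels of indirect blocks needed under a dnode that holds one block pointer.

    Assumes every data block is `data_block` bytes and every indirect block is
    `indirect_block` bytes, i.e. holds indirect_block // bp_size pointers.
    """
    fanout = indirect_block // bp_size
    if fanout < 2:
        raise ValueError("indirect block must hold at least two pointers")
    blocks = max(1, -(-file_size // data_block))  # ceiling division, at least one block
    levels = 0
    reachable = 1  # with no indirection, the dnode's pointer reaches one block
    while reachable < blocks:
        reachable *= fanout  # each extra level multiplies the reach by the fan-out
        levels += 1
    return levels


for size in (10 * 1024, 1 << 30, 1 << 40):  # 10 KiB, 1 GiB, 1 TiB
    print(size, indirection_levels(size, data_block=64 * 1024, indirect_block=16 * 1024))
```

      Each level is one more block that may have to be read (if not cached) before the data itself.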

      Keeping three copies of almost all file system structures (such as inodes, which ZFS calls dnodes) by default can't help either, even though those copies are compressed.

    16. Re:They Why ZFS? by Anonymous Coward · · Score: 5, Insightful

      Snapshots.
      And I don't just mean any snapshots.
      Done right, like in ZFS, they are fast.
      Faster than BSD's UFS snapshots, faster than using LVM's fs-agnostic snapshots. For people who need them, they're great.

    17. Re:They Why ZFS? by icebraining · · Score: 1

      Yeah, especially since:

      The SPL packages provide the Solaris Porting Layer modules for emulating some Solaris primitives in the Linux kernel, as such, this ZFS implementation is not ported to purely take advantage of the Linux kernel design.

    18. Re:They Why ZFS? by tlhIngan · · Score: 4, Interesting

      The main reason to use ZFS over the other ones, even in cases where the features are the same is that ZFS is more widely available. Admittedly, it's far from universal, but right now it's officially supported in more than one OS. I'm not aware of a filesystem that provides similar functionality to ZFS which is more widely available.

      Actually, I've run into this problem, not with ZFS (haven't used it), but with other filesystems, on Linux only. It seems not all filesystems are truly endian-aware, so a USB disk created on a big-endian system and moved to a little-endian system ends up with a non-working filesystem. I had to actually go back and use the original system to mount the disk.

      Somewhat annoying if you want to pull a RAID array out of a Linux-running big-endian system in the hope that you can recover the data... only to find out it was using XFS or another non-endian-friendly FS and be unable to get at the data...
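      To illustrate what "endian-aware" means, here is a small sketch using Python's struct module: an on-disk header written big-endian is misread by a reader that assumes little-endian, while a reader that checks a magic number in both byte orders decodes it correctly. The two-field header layout and the MAGIC constant are invented for this example; real filesystems each have their own formats and strategies.

```python
import struct

MAGIC = 0x00BAB10C  # hypothetical magic number identifying our on-disk header


def write_header(block_count: int, byteorder: str) -> bytes:
    """Serialize (magic, block_count) in the writer's byte order ('<' or '>')."""
    return struct.pack(byteorder + "IQ", MAGIC, block_count)


def naive_read(raw: bytes) -> int:
    # A format that is not endian-aware simply assumes the reader's byte order.
    _magic, count = struct.unpack("<IQ", raw)
    return count


def endian_aware_read(raw: bytes) -> int:
    # Try both byte orders and trust the one in which the magic number matches.
    for order in ("<", ">"):
        magic, count = struct.unpack(order + "IQ", raw)
        if magic == MAGIC:
            return count
    raise ValueError("not a recognised header")


raw = write_header(1_000_000, ">")  # written on a big-endian machine
print(naive_read(raw))              # garbage when read as little-endian
print(endian_aware_read(raw))       # 1000000
```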

    19. Re:They Why ZFS? by TheLink · · Score: 2, Interesting

      Question about ZFS, say I have a bunch of ZFS filesystems on a bunch of physical drives or drive arrays on Solaris/OpenSolaris/OpenIndiana.

      How do I figure out which physical drives/devices a particular ZFS filesystem depends on?

      And if a physical drive is faulty, how would I know which actual physical drive it is? e.g. get its serial number or physical slot/bay/position or whatever.

      --
    20. Re:They Why ZFS? by LWATCDR · · Score: 1

      Well they tested on a single SSD.
      I have not used ZFS or Btrfs but I have read a lot about ZFS.
      This is not really the use case for ZFS. ZFS has many features for things like using an SSD as a cache for the HDDs, RAID-like functions, data compression and so on.
      The idea that a simpler, less full-featured file system is faster is no big shock.
      I would like to see tests with, say, two WAN servers, each with 12 HDDs and an SSD for caching. That is more the use case for ZFS than a workstation with a single SSD.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    21. Re:They Why ZFS? by Maquis196 · · Score: 5, Informative

      zpool status

      That's the command you are looking for. zfs-fuse lists disks by ID, which means that if you go into /dev/disk/by-id/ and do an ls -al, you'll see which devices they are linked to.

      It is done this way to make things easier on Linux; on BSD/Solaris the disks are listed by GPT name (well, they were for me), so this keeps it sane.

      Hope it helps.
      Maq
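      The ls -al step can also be scripted. A minimal sketch, assuming a Linux system with udev's /dev/disk/by-id directory; the by-id names usually encode the bus, model and serial number, which helps with the "which physical drive is it?" question above. by_id_map is a hypothetical helper, not part of any ZFS tooling.

```python
import os


def by_id_map(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map each persistent by-id name to the kernel device node it links to."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            # realpath follows relative links such as ../../sdb
            mapping[name] = os.path.realpath(path)
    return mapping


if __name__ == "__main__" and os.path.isdir("/dev/disk/by-id"):
    for name, dev in by_id_map().items():
        print(f"{name} -> {dev}")
```

      Matching the names shown by zpool status against this mapping gives the kernel device, and the serial number in the by-id name identifies the physical drive.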

    22. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      zpool status

    23. Re:They Why ZFS? by afidel · · Score: 2, Interesting

      L2ARC is a HUGE performance improvement for many workloads; it essentially allows you to use faster disks to cache the most frequently used data. If they had combined the SSD and the 7200 RPM SATA drive and benchmarked a real-world workload, the ZFS implementation would probably have stomped the others because it would have used the SSD for the 'hot' data; the best you can do with btrfs is to place the metadata on the SSD.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    24. Re:They Why ZFS? by Anonymous Coward · · Score: 1, Informative

      If you evenly split your 1TB disk into 2 LVM volume sets and volume1 is full, you can't make a snapshot of it. volume2 is sitting there empty, but the snapshot can't use it.

    25. Re:They Why ZFS? by ranulf · · Score: 1
      He's saying that LVM can only snapshot to unallocated space, whereas ZFS can snapshot to space that is allocated to a partition but isn't currently being used.

      This is simply because LVM works at a layer above the FS, whereas ZFS is the filesystem.

    26. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      I've never actually backed myself into this situation so I have no idea, but surely if you just continue to add physical volumes to the existing volume group you can create a logical volume snapshot anywhere within that volume group without having to worry about which physical disk the snapshot is on?

    27. Re:They Why ZFS? by Anonymous Coward · · Score: 2, Funny

      I can write the fastest file system around, assuming you don't put much weight on the whole 'being able to read the data back' thingie.

      You mean "> /dev/null"?

    28. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks, ...

      Do you even KNOW what a "large disk" is? 10 TB is child's play.

      Go use something like Sun/Oracle SAM/QFS or IBM's GPFS or SGI's XFS (as another poster noted - but only on "native" Irix because in Linux XFS is crappy crippleware).

      ZFS is stink-slow compared to those. Hell, IMO ZFS is stink-slow compared to just about any competent file system. Anybody using ZFS and having performance requirements has to toss way too much hardware at their problems. ZFS has some good features, but speed sure ain't one of them.

      And also note that there's no FOSS alternative to any of those. Sorry, folks, but if you want a file system that can take petabytes worth of 8 GB/sec fiber arrays and run them at line speed transferring data over actual shared file systems for hours on end, FOSS fails and fails badly.

    29. Re:They Why ZFS? by Anonymous Coward · · Score: 1, Informative

      L2ARC is a HUGE performance improvement for many workloads; it essentially allows you to use faster disks to cache the most frequently used data. If they had combined the SSD and the 7200 RPM SATA drive and benchmarked a real-world workload, the ZFS implementation would probably have stomped the others because it would have used the SSD for the 'hot' data; the best you can do with btrfs is to place the metadata on the SSD.

      L2ARC is just another cache. The ultimate IO limits of the filesystem are still set by limitations of the final backing store.

      So if you're moving lots and lots of data, the L2ARC is pretty useless.

      Set yourself up a ZFS file system, then start benchmarking it. If you're running on Solaris, run something like "iostat -sndxz 1" so you can see actual IO to your physical LUNs every second. Under heavy write load, you'll see ZFS go for extended periods without writing anything, then it'll hang your box badly as it flushes to disk. That's bad for two reasons - the relatively long periods of time ZFS isn't writing are IO opportunities lost, and the hanging of the box is horrible.

      ZFS's IO pattern gives away available bandwidth, and then ZFS hammers your system to its knees.

    30. Re:They Why ZFS? by bigredradio · · Score: 1

      ZFS is not only a filesystem but also contains volume management. It's a filesystem that could replace LVM.

    31. Re:They Why ZFS? by Jeff+DeMaagd · · Score: 3, Insightful

      Thanks for replying like a jerk, that really helps us all out. Nobody is going to simply transition to a new way of doing things just because it's new, they need to know what they'll get from the new way that makes the transition worthwhile.

    32. Re:They Why ZFS? by caseih · · Score: 2, Interesting

      XFS

      Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

      30 minutes? That's insane. An LVM2 snapshot would take seconds. I fail to see how that's not quick, and how "lvcreate -s" is less convenient.

      Glad to know LVM is faster though. However, as I stated before it's not convenient. With ZFS I do the following things:
      - snapshot the works every night, and keep 7 days worth of snapshots.
      - some directories are snapshotted every night, but I keep 365 snapshots (one year). For example the directories that our financial folk use.
      - snapshot important directories every hour, keep 24 hours worth

      You simply cannot do that with LVM. Sorry. How would I know how much free volume space to plan for? If I have a 10 TB disk, do I plan to use 6 TB of it and leave 4 TB for snapshots? Snapshots consume as much space as the subsequent changes. For the 365-day snapshots, this could be a lot or very little depending on what has been touched.
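      The rotation scheme listed above boils down to "keep the newest N snapshots per tier". A minimal sketch of that pruning logic, assuming snapshots are tracked as (tier, timestamp) pairs; the tier names are hypothetical labels for the three cases listed, and actually removing a snapshot (for example with zfs destroy pool/fs@name) is left out.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy mirroring the list above: tier -> how many to keep.
RETENTION = {"nightly": 7, "finance-nightly": 365, "hourly": 24}


def snapshots_to_destroy(snapshots, retention=RETENTION):
    """Given (tier, taken_at) pairs, return the ones that fall outside the policy.

    For each tier the newest `retention[tier]` snapshots are kept; everything
    older in that tier is returned for deletion.
    """
    by_tier = {}
    for tier, taken_at in snapshots:
        by_tier.setdefault(tier, []).append(taken_at)
    doomed = []
    for tier, times in by_tier.items():
        times.sort(reverse=True)  # newest first
        for taken_at in times[retention[tier]:]:
            doomed.append((tier, taken_at))
    return sorted(doomed)


now = datetime(2011, 1, 1)
hourly = [("hourly", now - timedelta(hours=h)) for h in range(48)]
print(len(snapshots_to_destroy(hourly)), "hourly snapshots past their 24-hour window")
```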

      I can't even make sense of these two sentences. What you're saying is, an LVM snapshot requires free space, and er, a ZFS snapshot requires free space?

      It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume. On ZFS you don't need to do this. Any free space on the file system can be used for either files or snapshots; it's all the same pool. To do snapshots with LVM the way I do with ZFS would require me to set aside a lot of space. Very inefficient and wasteful.

      As far as I can tell, BtrFS will work in a similar way to ZFS, bypassing the need for LVM. Which I'm totally okay with.
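      A toy copy-on-write model makes the space argument concrete. It is not how ZFS or LVM actually lay out data; it only assumes that writes never overwrite a block in place and that a snapshot is a frozen copy of the block map. It shows that a fresh snapshot costs no data blocks, that its cost grows with the blocks rewritten afterwards, and that those blocks come out of the same pool as the live data. Pool and its methods are hypothetical names.

```python
class Pool:
    """Toy copy-on-write pool: every write allocates a fresh physical block."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.next_free = 0    # next physical block ID to hand out
        self.live = {}        # logical block number -> physical block
        self.snapshots = {}   # snapshot name -> frozen copy of the block map

    def _allocated(self) -> set:
        blocks = set(self.live.values())
        for snap in self.snapshots.values():
            blocks |= set(snap.values())
        return blocks

    def write(self, logical: int) -> None:
        if len(self._allocated()) >= self.capacity:
            raise OSError("pool full")
        self.live[logical] = self.next_free  # never overwrite in place
        self.next_free += 1

    def snapshot(self, name: str) -> None:
        self.snapshots[name] = dict(self.live)  # copies pointers, not data

    def snapshot_only_blocks(self) -> int:
        """Blocks kept alive solely because a snapshot still references them."""
        return len(self._allocated() - set(self.live.values()))


pool = Pool(capacity_blocks=1000)
for lbn in range(600):
    pool.write(lbn)        # 600 blocks of data
pool.snapshot("nightly")
print(pool.snapshot_only_blocks())  # a fresh snapshot pins no extra blocks
for lbn in range(50):
    pool.write(lbn)        # rewrite 50 blocks after the snapshot
print(pool.snapshot_only_blocks())  # now the snapshot holds the 50 old versions
```

      Free space is still needed, as pointed out further down the thread; the difference is only that it does not have to be reserved in advance.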

    33. Re:They Why ZFS? by DJProtoss · · Score: 1

      Don't forget the intent log: being able to recover from power failures is great, but unless you use a separate flash ZIL device, it ain't quick (of course, that assumes they are using synced writes).
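      A toy illustration of why synchronous writes through an intent log are slow: each write is acknowledged only after its log record has been flushed and fsync'd to stable storage, so every sync write pays a full device round trip. This is a hypothetical write-ahead log in Python, not how the ZFS intent log is actually structured; a fast dedicated log device would shorten exactly the os.fsync wait marked below.

```python
import json
import os


class IntentLog:
    """Minimal write-ahead intent log: a synchronous write is acknowledged only
    after its record has been forced to stable storage with fsync."""

    def __init__(self, path: str):
        self.path = path
        self.f = open(path, "a", encoding="utf-8")

    def log_sync_write(self, key: str, value: str) -> None:
        self.f.write(json.dumps({"key": key, "value": value}) + "\n")
        self.f.flush()             # push Python's buffer to the OS
        os.fsync(self.f.fileno())  # wait until the OS reports it is on disk
        # only now would the caller's fsync()/O_SYNC write be allowed to return

    def close(self) -> None:
        self.f.close()

    @staticmethod
    def replay(path: str) -> dict:
        """After a crash, rebuild the acknowledged writes from the log."""
        state = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.endswith("\n"):  # ignore a torn final record
                        rec = json.loads(line)
                        state[rec["key"]] = rec["value"]
        return state
```

      Asynchronous writes skip this path entirely, which is why the cost only shows up for workloads that ask for synced writes.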

      --
      "Success is based on knowing how far to go in going too far"
    34. Re:They Why ZFS? by Bengie · · Score: 1

      People like my cousin, who runs a data center with 10,000+ hard drives and by requirement must have a file system that has been considered stable for at least 5 years. Any data loss is unacceptable. Unless God targets you with His wrath, you have no excuse for any data loss or corruption.

    35. Re:They Why ZFS? by caseih · · Score: 1

      In an enterprise you're typically dealing with a SAN. Just simply "adding physical volumes" isn't quite so simple. What if your disk array is full? Just tack a USB disk on the server? For us, all our SANs are hardware RAID (we don't use RAID-Z), so adding new volumes, as you suggest, involves buying at least 4 disks (RAID-6), sticking them in the chassis and creating a hardware volume set. It's quite an undertaking to expand storage. LVM can certainly accommodate our hardware, but it would certainly not make efficient use of our disks. LVM's need to have unallocated space for snapshots has always been a weak spot. LVM snapshots are actually writable, though (BtrFS and ZFS snapshots are read-only). ZFS snapshots may be slower, but they are easier and more flexible. Perhaps some of the speed issues with ZFS snapshots come from having thousands of sub-file-systems. There's some overhead there, but the flexibility is totally worth it.

    36. Re:They Why ZFS? by DJProtoss · · Score: 1

      ZFS snapshots are much more akin to a block-level version of rsnapshot. LVM snapshots are more like ZFS clones (although not quite, as even they are done copy-on-write (CoW)).

      --
      "Success is based on knowing how far to go in going too far"
    37. Re:They Why ZFS? by clang_jangle · · Score: 1

      ZFS's IO pattern gives away available bandwidth, and then ZFS hammers your system to its knees.

      That's where the tuning-fu comes in handy. But you're right, it takes a pretty high level of competence to successfully run ZFS, at least on FreeBSD.

      --
      Caveat Utilitor
    38. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Built-in ARC caching, too. Also, scaling data sets to infinity and beyond. There's no need to use a partition ever again. In a year or two, when solid state drives drop in price, the 'performance' difference between ZFS and other filesystems will not even be noticed. Deduplication. And more.

      ZFS > *

    39. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      XFS

      Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. ...

      How much of that historically has been due to running XFS on LVM? LVM ignored file system barriers until very recently - kernel 2.6.30 or so, IIRC.

    40. Re:They Why ZFS? by guruevi · · Score: 2, Funny

      Try a RAID-10 array of /dev/null's - it's even faster.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    41. Re:They Why ZFS? by afidel · · Score: 1

      Like I said, under most real world workloads the L2ARC will have significant impact. There will always be edge cases and artificial benchmarks that can swamp any cache, but I run a midsized enterprise on an array with only 8GB of cache and it absorbs 99.5% of the write workload and a fair percentage of the non-database read workload so a 64GB+ SSD would just be that much better. With L2ARC you can achieve a high 95-99% IOPS watermark with a small dollar investment, and because the cache is servicing most of the hot data it also means that the available backend IOPS are more available to service the hard cases.
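      The weighted-average arithmetic behind this argument, as a sketch: if a fraction h of reads is served from the cache, the average read costs h times the cache latency plus (1 - h) times the disk latency. The 0.1 ms and 8 ms figures below are illustrative assumptions for an SSD and a 7200 RPM disk, not measurements.

```python
def effective_latency_ms(hit_rate: float, cache_ms: float, disk_ms: float) -> float:
    """Average service time when a fraction `hit_rate` of reads is served by the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ms + (1.0 - hit_rate) * disk_ms


# Illustrative figures only: ~0.1 ms for an SSD read, ~8 ms for a 7200 RPM seek.
for h in (0.0, 0.5, 0.95, 0.99):
    lat = effective_latency_ms(h, cache_ms=0.1, disk_ms=8.0)
    print(f"hit rate {h:.0%}: {lat:.2f} ms per read, ~{1000 / lat:.0f} reads/s at queue depth 1")
```

      The same formula also captures the objections elsewhere in the thread: a streaming workload with a hit rate near zero gains nothing, and reads already satisfied from RAM never reach the L2ARC at all.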

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    42. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      In an enterprise you're typically dealing with SAN.

      Yes I am, you're right. However, the rest of your argument is total bunkum: if you are an enterprise user, you'd better hope to hell you monitor your disk usage and perform capacity planning. No one waits for their SAN to fill up and then goes "Oh shit, now I have to add more disk!" I'm really at a loss as to why people think having to have a bit of space available for an LVM snapshot is a big deal. It's basic capacity planning.

    43. Re:They Why ZFS? by cbhacking · · Score: 4, Informative

      Um... WTF? Compression is a performance *improvement* and a massive one, at that. The trivial cost in CPU time is offset by the massive reduction in IO time, which is more expensive by far. This has been true since 2000 or even earlier. Modern multi-core CPUs just take the CPU penalty from negligible to nonexistent. Unless your CPU cores are all running at 100%, and possibly even if they are, compression will improve performance.

      Note that this is true on a wide variety of filesystems; it's nothing special to these particular ones. Hell, NTFS has had built-in compression for a decade or more. You can improve performance on a Windows system by right-clicking the C: drive and selecting Properties -> Compress this drive. You can do it from the command line using

      compact.exe /C /S:C:\ /A

      This will compress all files in or under the root of the C drive, including hidden or system files (requires admin, of course) and marks all the directories so that any files written to them will also get compressed.
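      A rough way to see the I/O side of this trade-off is to compress a buffer and compare sizes. This sketch uses zlib from Python's standard library as a stand-in for the filesystem's own compression algorithm and a synthetic, log-like buffer as sample data; it measures bytes saved, not end-to-end speed, which also depends on the CPU and the disk.

```python
import zlib


def io_bytes_saved(data: bytes, level: int = 1) -> float:
    """Fraction of disk I/O avoided if `data` is stored zlib-compressed."""
    compressed = zlib.compress(data, level)
    return 1.0 - len(compressed) / len(data)


text_like = b"2011-01-01 12:00:00 INFO request served in 12 ms\n" * 20_000
print(f"log-like data: {io_bytes_saved(text_like):.1%} fewer bytes to read/write")
```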

      --
      There's no place I could be, since I've found Serenity...
    44. Re:They Why ZFS? by GameboyRMH · · Score: 2, Insightful

      BREAKING NEWS! Journaling filesystems with write caching, including the ever-popular NTFS, are vulnerable to data loss in sudden power failures! Total noobs were left with no idea how to go about fixing the problem.

      "If only there were some way to run a check on the file system and perform automatic repairs! OH GOD WHAT DO I DO!?!?!" one commented.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    45. Re:They Why ZFS? by GameboyRMH · · Score: 1

      Look it up. The fact that you compare ext4 with ZFS or BTRFS shows that you know little to nothing about them.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    46. Re:They Why ZFS? by ne0n · · Score: 1

      To desktop users ext3 is arguably slow. Creating a 4GB file on ext3? Watch the system grind to a halt for a few minutes. Try deleting any file over a few hundred MB, and wait for a few seconds to a few minutes until the OS becomes usable again. It's painful. Anything other than small file operations really put the irons to the ext3 user. Extents probably caused more upgrades to ext4 than any other feature.
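      Some rough arithmetic behind the extents point, assuming 4 KiB blocks: ext3 tracks a file with one block pointer per data block, while an ext4 extent can describe up to 32768 contiguous blocks (128 MiB). The extent figure below is a best case for a perfectly contiguous file; fragmented files need more extents, and indirect blocks are not counted on the ext3 side.

```python
BLOCK = 4096               # assumed 4 KiB filesystem block size
MAX_EXTENT_BLOCKS = 32768  # an ext4 extent covers at most 32768 blocks (128 MiB at 4 KiB)


def block_map_entries(file_size: int) -> int:
    """ext3-style: one pointer per data block (indirect blocks not counted)."""
    return -(-file_size // BLOCK)  # ceiling division


def min_extent_entries(file_size: int) -> int:
    """ext4-style best case: a fully contiguous file in maximum-length extents."""
    return -(-block_map_entries(file_size) // MAX_EXTENT_BLOCKS)


size = 4 * 1024**3  # the 4 GB file from the comment, taken as 4 GiB
print(block_map_entries(size), min_extent_entries(size))  # 1048576 32
```

      That is roughly a million pointers for ext3 to write on creation and walk on deletion, against a few dozen extent records in the best case for ext4.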

      --
      $ :(){ :|:& };:
    47. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      in that case you should hope you have network connectivity on your x-endian system so you can migrate your data over the network which will fix the endian-ness problem.

    48. Re:They Why ZFS? by ebuck · · Score: 2, Funny

      A homage to Spinal Tap:

      Nigel Tufnel: My RAID arrays are all RAID-11. Look, right across the rack, RAID-11, RAID-11, RAID-11, and...
      Marty DiBergi: Oh, I see. And most arrays go up to RAID-10?
      Nigel Tufnel: Exactly.
      Marty DiBergi: Does that mean it's faster? Is it any faster?
      Nigel Tufnel: Well, it's one faster, isn't it? It's not RAID-10. You see, most blokes, you know, will be serving files at RAID-10. You're on RAID-10 here, all the way up, all the way up, all the way up, you're on RAID-10 on your database backup. Where can you go from there? Where?
      Marty DiBergi: I don't know.
      Nigel Tufnel: Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?
      Marty DiBergi: Put it up to RAID-11.
      Nigel Tufnel: RAID-11. Exactly. One faster.
      Marty DiBergi: Why don't you just make RAID-10 faster and make RAID-10 be the top performer and make that a little faster?
      Nigel Tufnel: [pause] These go to RAID-11.

    49. Re:They Why ZFS? by ne0n · · Score: 1

      Here's hoping there's a push for clean integration into the user experience, a la Sun's TimeSlider integration in Nautilus. I was surprised when I used Windows 7 recently; it's got a crude TimeSlider now. I think OS X has something similar too.

      I'm surprised nobody in Canonical has put this forward yet but it's only a matter of time, once btrfs is declared stable.

      --
      $ :(){ :|:& };:
    50. Re:They Why ZFS? by Anonymous Coward · · Score: 1, Informative

      Don't do this for any files that regularly get random writes (like, say, database files). Compression uses bigger blocks (64K I think) so a write to a single block becomes a decompression of several blocks, an update and a recompression of the blocks. Which will kill performance.
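      A toy model of that read-modify-write cost, assuming independently compressed 64 KiB units as the comment suggests: patching 4 KiB inside a unit means inflating and re-deflating the whole 64 KiB. CompressedFile is a hypothetical class, zlib stands in for the real compression algorithm, and the sketch only handles writes that stay within one unit.

```python
import zlib

UNIT = 64 * 1024  # compression unit size, per the comment's "64K I think" (an assumption)


class CompressedFile:
    """Toy file stored as independently compressed UNIT-sized chunks."""

    def __init__(self, size: int):
        self.chunks = [zlib.compress(bytes(UNIT)) for _ in range(-(-size // UNIT))]
        self.bytes_processed = 0  # bytes inflated + deflated by writes so far

    def write(self, offset: int, data: bytes) -> None:
        idx, start = divmod(offset, UNIT)
        if start + len(data) > UNIT:
            raise ValueError("sketch handles single-unit writes only")
        plain = bytearray(zlib.decompress(self.chunks[idx]))  # inflate the whole unit
        plain[start:start + len(data)] = data                  # patch a few KiB
        self.chunks[idx] = zlib.compress(bytes(plain))         # deflate the whole unit again
        self.bytes_processed += 2 * UNIT

    def read(self, offset: int, length: int) -> bytes:
        idx, start = divmod(offset, UNIT)
        return zlib.decompress(self.chunks[idx])[start:start + length]


f = CompressedFile(1 << 20)
f.write(8192, b"x" * 4096)
print("bytes processed per byte written:", f.bytes_processed / 4096)
```

      Sequential writes that replace whole units avoid the inflate step, which is why the penalty mostly hits random-write workloads such as databases.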

    51. Re:They Why ZFS? by sirsnork · · Score: 1

      Not two tests, but a good overview of the benefits you can get from a ZFS system compared to some other storage options. http://www.anandtech.com/show/3963/zfs-building-testing-and-benchmarking

      --

      Normal people worry me!
    52. Re:They Why ZFS? by davester666 · · Score: 0

      Welcome to the Internet.

      Obviously, you just started using it, otherwise you would know that if you want to find out information about something, you type in what you want information about in the 'Search' field at the top of your browser window.

      --
      Sleep your way to a whiter smile...date a dentist!
    53. Re:They Why ZFS? by makomk · · Score: 2, Informative

      Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

      On ZFS, if the system goes down uncleanly you should avoid data corruption so long as every part of the chain from ZFS to your hard drive's platters behaves as ZFS expects and writes data in the order it wants. If it doesn't, you can easily end up with filesystem corruption that can't be repaired without dumping the entire contents of the ZFS pool to external storage, erasing it, and recreating the filesystem from scratch. If you're even more unlucky, the corruption will tickle one of the bugs in ZFS and even trying to mount the FS will cause a kernel panic, though this was more of a problem in older versions.

    54. Re:They Why ZFS? by sjames · · Score: 3, Insightful

      Unless, of course, the files you're storing are already compressed, in that case it's just a pure loss. As with many things, what's "best" is strongly dependent on what you want to do with it.
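      The same kind of experiment on random bytes, a reasonable stand-in for already-compressed media, shows the loss: the output comes out slightly larger than the input. The store function below is a hypothetical policy with an assumed 12.5% threshold, sketching how a filesystem could fall back to writing a block uncompressed when compression does not pay; zlib again stands in for the real algorithm.

```python
import os
import zlib

random_like = os.urandom(1 << 20)  # stands in for JPEG/MP3/video payloads
packed = zlib.compress(random_like, 6)
print("growth in bytes:", len(packed) - len(random_like))  # a small positive number


def store(block: bytes, min_saving: float = 0.125):
    """Hypothetical write policy: keep the compressed form only if it saves at
    least `min_saving` of the block's size; otherwise store the raw bytes."""
    packed_block = zlib.compress(block, 6)
    if len(packed_block) <= len(block) * (1 - min_saving):
        return "compressed", packed_block
    return "raw", block
```

      Even with such a fallback, the CPU time spent trying to compress incompressible data is wasted.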

    55. Re:They Why ZFS? by Dhalka226 · · Score: 3, Informative

      Half of whose results will be one discussion forum or another where people who are not smug asses thoughtfully took a moment to answer someone's question.

      You had time to post this self-important drivel, surely you have time to answer the question as well -- but you elected for the drivel. And you think that somehow says something about the people asking the question rather than about you?

    56. Re:They Why ZFS? by Galactic+Dominator · · Score: 1

      ZFS is both a filesystem and volume manager. I can't see how anyone would actually prefer the LVM management style to the All-in-One of ZFS, but whatever cocks their pistol.

      Also, it's absolutely shocking that Phoronix would have a benchmark in which a Linux component clearly outperforms a roughly equivalent component from another OS. That's not their MO or anything. I'm sure they took great pains to ensure equality as they always do.

      ZFS/RAIDZ is a great thing, but raw performance is not its strength.

      --
      brandelf -t FreeBSD /brain
    57. Re:They Why ZFS? by BronsCon · · Score: 1

      'mount /dev/null /' ?

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    58. Re:They Why ZFS? by davester666 · · Score: 1

      Give a man a fish, and he's fed for a day.

      Teach a man how to fish, and he's fed for life.

      --
      Sleep your way to a whiter smile...date a dentist!
    59. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Give a man a fire, and he will be warm for a day. Light a man on fire and he will be warm for the rest of his life...

    60. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Stupid fscking hard disks.

    61. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      L2ARC isn't going to have as much impact as the kernel's page cache, which is in RAM. With any decent amount of RAM, when you're using paged IO the page cache hit rate isn't going to be much lower than the L2ARC hit rate. About the only place L2ARC is going to have significant impact is in synchronous write operations.

    62. Re:They Why ZFS? by ChatHuant · · Score: 1

      Give a man a fish, and he's fed for a day.

      Teach a man how to fish, and he's fed for life.

      Give a man a fire and he's warm for a day.

      Set a man on fire and he's warm for the rest of his life.

      (with thanks to Sir Terry Pratchett, who's responsible for a lot of the hilarity in the Chathuant household.)

    63. Re:They Why ZFS? by segedunum · · Score: 2, Informative

      Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens.

      What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.

      I keep hearing this shit over and over, mostly on idiot-infested Linux distribution and Solaris fanboy forums, and it's just getting unbearable to see.

      It's very simple. LVM snapshots require free volume set space. If your volume group is 10 TB, then you must leave unallocated space on it for the snapshots to consume.

      You make it sound like you need an extra 10 terabytes to back up a 10 terabyte volume with LVM. You don't. It takes a snapshot, and the free space you need is for further changes to the volume. ZFS is the same, except it's more intelligent about how it can use any free space over multiple volumes for snapshots, and with things like deduplication it will get much better, but you still need free space to perform them. You make it sound like ZFS snapshots are completely free, as I see many ZFS proponents saying, and it's crap. The OP is also right about the time that ZFS snapshots can take. It's far too long.
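      A toy Python model of the snapshot-space point, assuming a volume is just a map of fixed-size blocks (the class and method names are made up for illustration, and real LVM or ZFS space accounting is more involved): taking the snapshot copies only block references, and extra space is consumed only for blocks overwritten afterwards, so the free space needed tracks the amount of change, not the size of the volume.

```python
from __future__ import annotations


class CowVolume:
    """Hypothetical toy volume with a single copy-on-write snapshot."""

    def __init__(self, num_blocks: int) -> None:
        # Live block map: block number -> block contents (a label here).
        self.live = {i: f"v0-{i}" for i in range(num_blocks)}
        self.snapshot: dict[int, str] | None = None
        # Blocks whose pre-snapshot copy now has to be kept separately.
        self.diverged: set[int] = set()

    def take_snapshot(self) -> None:
        # Near-instant: copies references to blocks, not block data.
        self.snapshot = dict(self.live)
        self.diverged.clear()

    def write(self, block: int, data: str) -> None:
        if self.snapshot is not None:
            # Only the first overwrite of a shared block costs extra space;
            # later overwrites of the same block cost nothing more.
            self.diverged.add(block)
        self.live[block] = data

    def snapshot_extra_blocks(self) -> int:
        """Extra space held by the snapshot, in blocks."""
        return len(self.diverged)


if __name__ == "__main__":
    vol = CowVolume(num_blocks=10_000)
    vol.take_snapshot()
    for b in range(100):
        vol.write(b, f"v1-{b}")
    vol.write(5, "v2-5")  # rewriting an already-diverged block
    print(vol.snapshot_extra_blocks())  # 100 of 10,000 blocks
    print(vol.snapshot[5], vol.live[5])  # v0-5 v2-5
```

      Rewriting 100 of 10,000 blocks (one of them twice) leaves the snapshot holding 100 extra blocks, regardless of how large the volume is.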

      This is a road Btrfs will have to travel because it also has to be *the* general purpose Linux filesystem and will have to solve problems and be in places where ZFS is not.

    64. Re:They Why ZFS? by fe105 · · Score: 1

      I did some benchmarking a while ago on FreeBSD 7, Solaris 10 and CentOS 4.6, all on the same hardware. At the time, Veritas came out quite well.

      On 8 disks, RAID-Z/RAID5 (read/write in MB/s, measured with dd):

      FreeBSD7:           144/73
      Solaris10:          150/92
      Fedora7 md raid:    164/130
      Solaris/Veritas:    169/35

      RAID0/stripe:

      FreeBSD7 (zfs):     236/155
      Solaris10 (zfs):    169/132
      Fedora7 (md0):      259/188
      CentOS4+Veritas:    236/270

      Solaris10 (zfs) random write was as fast as veritas and much faster than all the others (~300MB/s versus 150-ish).

      I'm now running btrfs and am quite happy with it, should probably run those benchmarks again and clean up the page a bit http://www.crystalconsulting.eu/bench/
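      For anyone rerunning numbers like these without dd, a rough Python equivalent of a sequential write/read pass might look like this (the function name and defaults are illustrative). It fsyncs before stopping the write timer, similar to dd's conv=fsync. Two caveats: the read pass can be served from the page cache (or the ZFS ARC) unless caches are dropped or the file is larger than RAM, and the payload reuses one random block, which avoids flattering compression but would skew the write figure on a filesystem with deduplication enabled.

```python
from __future__ import annotations

import os
import time


def sequential_mb_s(path: str, size_mb: int = 256, block_kb: int = 1024) -> tuple[float, float]:
    """Return (write MB/s, read MB/s) for one sequential pass over `path`."""
    block = os.urandom(block_kb * 1024)  # incompressible payload
    count = (size_mb * 1024) // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the flush to stable storage in the timing
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        # Likely served from cache, since the file was just written.
        while f.read(len(block)):
            pass
    read_s = time.perf_counter() - start

    written_mb = count * block_kb / 1024
    return written_mb / write_s, written_mb / read_s
```

      Pointed at a file on the filesystem under test, e.g. `sequential_mb_s("/tank/bench.tmp", size_mb=4096)` (the path is hypothetical), it returns the write and read figures for one pass; note the list above gives them in read/write order.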

    65. Re:They Why ZFS? by falconwolf · · Score: 1

      I've disagreed with you before but I agree here.

      Falcon

    66. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      I hadn't even caught that in the first post

      10TB

      being large disk. Hell, I've got that much in my home server, what with 5x 2TB drives. If he'd been talking 10 PB, I'd understand ZFS being of use there.

    67. Re:They Why ZFS? by suutar · · Score: 1

      I kinda like the looks of block-level data deduplication, but depending on what you're doing it may be useless to you.

    68. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Close up shop, we're done here...

      If by done, you mean done listening to you, ever again, then yes, we are indeed done.

      Twit.

    69. Re:They Why ZFS? by falconwolf · · Score: 1

      ZFS is, until BtrFS hits truly enterprise stable, the only FS for large disks

      What about HFS+? It can work with large drives, up to 8EB.

      Is it because it's an Apple format?

      Falcon

    70. Re:They Why ZFS? by jd · · Score: 1

      It's an interesting question, but not necessarily the right question. I'll explain what I mean. In some cases, a UDP connection with error-handling and retry mechanisms at each end will be faster than a TCP connection. They have the same feature set, but the results are different.

      In this case, the question is surely "what features does ZFS have that (some other fs) does not, what is the cost for each feature, and for those features duplicatable outside the FS, what would be the cost to gain those features by other means?"

      Without the additional information, you have an incomplete picture as you don't know the cost:benefit ratio of that particular implementation versus other implementations. It is only the complete picture that would let you get a good understanding for how the FS plays out in practice.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    71. Re:They Why ZFS? by ArsonSmith · · Score: 1

      Hmm, a mirrored set of mirrors. I don't think that's going to be fast at all. And it's going to waste a lot of space.

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    72. Re:They Why ZFS? by dannycim · · Score: 1

      XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

      [Citation Needed] as wikipedia would say. XFS is no more prone to data corruption than any other journalled filesystem in the event of unexpected halts.

      You should see the fireworks I got on Solaris 10 while I was running a script that did a bunch of zpool commands just as the power went out. Borked everything.

      I love ZFS, but I'm not deluded into thinking it's magic.

    73. Re:They Why ZFS? by jd · · Score: 1

      My understanding of daha's post is that since (some) other FS' also support things like compression, compression cannot be used as a definitive reason for using ZFS. Of course, this depends on the type of compression used (not all methods are equal) and on how good the different implementations are, but compression in and of itself is not a differentiator when both sides of the equation use it.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    74. Re:They Why ZFS? by ArsonSmith · · Score: 1

      I hate how LVM has to snapshot to unused space where ZFS has the advanced feature to snapshot to free space.

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    75. Re:They Why ZFS? by jd · · Score: 1

      Depends. Is the null device in userspace or kernelspace? If the latter, you've context switches to include.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    76. Re:They Why ZFS? by Frnknstn · · Score: 1

      As a user of ZFS for a database, I disagree with your post on several points.

      Firstly, ZFS compression works on 128k (uncompressed) blocks by default. If you wish to try and game the system, you can adjust the block size to match that of your database (Postgres, as an example, uses 8k blocks: no matter how small a row is, it will result in a read of at least 8k).

      Secondly, on a modern system there are so many levels of caching and IO-optimization, it is almost impossible to get a hard drive that only reads the 8k in question. The OS, your DBMS, or even the drive itself will probably read and cache the surrounding bytes, so there will probably be no additional IO latency.

      Thirdly, on rotational media such as hard drives, the rotational time required to read 8k or 128k is trivial in the purest sense. The seek time is by far and away the determining factor on these drives, *especially* if your load is mostly random-access.

      Fourthly, and on a related tack, random access writes are not as random as you think. Database internals, and database design best practices make this so. The often-changed parts of your database tend to end up clustered together. Further, chances are if your load is really that high, there will be multiple rows in a cached 128k block that need updating. If your load /isn't/ all that high, why are we having this conversation?

      Finally, to cover the bases, remember that in almost every database server, there is always processing time to spare. Once again random access databases illustrate this best, with access times almost exclusively being the determining factor in throughput.
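      To make the block-size point concrete, here is a small Python sketch that compresses a file in fixed-size records and counts how many uncompressed bytes must be inflated to serve one 8k database page. It uses zlib as a stand-in for ZFS's own compressors (lzjb/gzip), the helper names are made up, and it assumes pages never straddle a record boundary, which holds when the record size is a multiple of the page size. With 128k records every 8k page read inflates 128k; matching the record size to the page size (the recordsize property in ZFS terms) brings that down to 8k, usually at some cost in compression ratio.

```python
from __future__ import annotations

import zlib

KIB = 1024
PAGE = 8 * KIB  # PostgreSQL's default page size


def compress_records(data: bytes, recordsize: int) -> list[bytes]:
    """Compress `data` independently in `recordsize` chunks, like per-block FS compression."""
    return [zlib.compress(data[i:i + recordsize]) for i in range(0, len(data), recordsize)]


def read_page(records: list[bytes], recordsize: int, offset: int) -> tuple[bytes, int]:
    """Return (page bytes, uncompressed bytes that had to be inflated to get them)."""
    index = offset // recordsize
    record = zlib.decompress(records[index])  # the whole record, not just the page
    start = offset - index * recordsize
    return record[start:start + PAGE], len(record)


if __name__ == "__main__":
    rows = b"".join(b"row %08d,some,payload\n" % i for i in range(50_000))
    data = rows[:1024 * KIB]
    for rs in (128 * KIB, 8 * KIB):
        recs = compress_records(data, rs)
        page, inflated = read_page(recs, rs, offset=40 * PAGE)
        assert page == data[40 * PAGE:41 * PAGE]
        stored = sum(len(r) for r in recs)
        print(f"recordsize={rs // KIB}k: inflated {inflated // KIB}k for one page, stored {stored} bytes")
```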

      --
      If it's in you sig, it's in your post.
    77. Re:They Why ZFS? by CAIMLAS · · Score: 2, Informative

      What features does ZFS have that ext4 doesn't? It's a simple question, but you had to act like an ass. Good job.

      Jeez, where to start? They're night and day. EXT4 has more in common with FAT32 or UFS than it does ZFS.

      It's got a handful of core features, all of which are significant on their own:

      * copy-on-write, so you know your data gets committed
      * integral RAID-like functionality, integrated with the filesystem. This reduces overhead and eliminates the need for archaic RAID controllers (almost) entirely (complete with their shitty firmware and quirks, etc.) - just the controller, please.
      * Due to the above two, eliminates the RAID5 write hole
      * instant (like, a second or two) snapshotting of very large amounts of data.
      * You can transparently 'piggyback' any filesystem on top of ZFS to provide said filesystem with ZFS's protection
      * Integral iSCSI provider. Nice to have with the above feature!
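      The parity part of the RAID-like functionality comes down to XOR. A minimal Python sketch of single parity (the RAID5/RAID-Z1 idea), with made-up helper names: any one lost block in a stripe is the XOR of the surviving blocks. This only shows the math; as I understand it, RAID-Z closes the write hole because, being copy-on-write, it writes whole new stripes instead of updating data and parity in place, which this sketch does not model.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Data blocks followed by one parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[bytes], missing: int) -> bytes:
    """Recover stripe[missing] from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"disk0 data", b"disk1 data", b"disk2 data"])
    for lost in range(len(stripe)):
        assert rebuild(stripe, lost) == stripe[lost]
    print("every single-disk loss recovered")
```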

      Shortcomings might include:
      * No fsck. IMO it's a bit of a serious limitation, but "it's not needed". Still, it can't help you recover from something like...
      * The potential loss of your zpool definition file. Unlike (say) mdraid on Linux, there are no block backups within the filesystem (as far as I know), so the pool definition could conceivably be lost (if you have a backup file somewhere, it's easy enough to recover, but still...)

      As for the original post's "not terribly fast" diss? Sorry, not buying it. They really needed to compare the performance against (say) other ZFS-based systems to show its utility; there are a lot of people 'forced' to use Solaris and/or FreeBSD because it's got ZFS. Another significant thing to consider will be its maturity/stability and feature-completeness (e.g. FreeBSD is a good way behind Solaris/OS/Illumos in these departments).

      Finally, this is still pretty beta code. The only 'significant' not-as-good performance failure is the Postmark benchmark, which may or may not be conclusive (I don't know what it does). If you compare it to this postmark benchmark for PCBSD, it doesn't look that bad (particularly when you consider the above linked article figures are 500 points or so higher across the board than the 'new' benchmarks) - and the new implementation appears better than XFS, which is still quite a decent filesystem.

      Oh, yeah - consider it's still 'beta'. Notably, considerably more 'beta' than Butter. Consider me excited. I'm not going to jump until I get fairly certain news that it's at least as stable as the FreeBSD implementation (while requiring less 'tuning' - bah!); I can do without features if it's stable. CoW and the basic RAID-like implementation on their own are enough to jump ship for.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    78. Re:They Why ZFS? by CAIMLAS · · Score: 1

      OS friendliness is one thing; endian (hardware) friendliness is another.

      That said, "run on multiple architectures" is a bit of a misnomer, considering the difference between ZFS on FreeBSD and Solaris. It's night/day. There are very few, if any, filesystems which are truly portable in that fashion, that I'm aware of. (NTFS on Linux, maybe?)

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    79. Re:They Why ZFS? by Lennie · · Score: 1

      "The OS, your DBMS, or even the drive itself will probably read and cache the surrounding bytes, so there will probably be no additional IO latency."

      IF it is located nearby, it will help, but that obviously isn't always true. Maybe not even true many times.

      --
      New things are always on the horizon
    80. Re:They Why ZFS? by Lennie · · Score: 1

      "* Integral iSCSI provider. Nice to have with the above feature!"

      This is not a function of ZFS but of OpenSolaris; Linux can act as an iSCSI target as well.

      --
      New things are always on the horizon
    81. Re:They Why ZFS? by Lennie · · Score: 1

      They did actually benchmark OpenIndiana as well, which includes the latest version of ZFS available as open source.

      Which was faster in some tests and slower in some other tests. It was mostly slower in tests that did include openindiana, but we don't know how it performed in the others. So the test was kind of useless.

      Phoronix tested with a single HDD and did tests with a single SSD.

      I do however think that the important lesson here is that, from the few people who tested it, on large installations (a large number of SSDs) ZFS/FreeBSD is faster than btrfs/Linux. For example, from Aug 05 2010:

      SSDs  ZFS (MiByte/s)  BtrFS (MiByte/s)
      1     256             256
      2     505             504
      3     736             756
      4     952             916
      5     1226            986
      6     1450            978
      8     1653            932
      16    2750            919

      http://marc.info/?l=linux-btrfs&m=128101763830740&w=2
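      One way to read that table is per-device scaling efficiency: measured throughput divided by the SSD count times the single-SSD figure (256 MiByte/s). A short Python sketch, with the numbers copied from the table:

```python
# (SSD count, ZFS MiByte/s, BtrFS MiByte/s), copied from the table in the post
RESULTS = [
    (1, 256, 256), (2, 505, 504), (3, 736, 756), (4, 952, 916),
    (5, 1226, 986), (6, 1450, 978), (8, 1653, 932), (16, 2750, 919),
]
SINGLE_SSD = 256  # MiByte/s for one SSD, from the first row


def efficiency(n_ssds: int, mib_s: float) -> float:
    """Fraction of perfect linear scaling achieved."""
    return mib_s / (n_ssds * SINGLE_SSD)


if __name__ == "__main__":
    for n, zfs, btrfs in RESULTS:
        print(f"{n:2d} SSDs  ZFS {efficiency(n, zfs):6.1%}  BtrFS {efficiency(n, btrfs):6.1%}")
```

      At 16 SSDs this works out to roughly 67% for ZFS versus 22% for BtrFS; BtrFS throughput stops growing after about five devices in this run.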

      --
      New things are always on the horizon
    82. Re:They Why ZFS? by Lennie · · Score: 1

      How about 16 SSDs?

      SSDs  ZFS (MiByte/s)  BtrFS (MiByte/s)
      1     256             256
      2     505             504
      3     736             756
      4     952             916
      5     1226            986
      6     1450            978
      8     1653            932
      16    2750            919

      http://marc.info/?l=linux-btrfs&m=128101763830740&w=2

      --
      New things are always on the horizon
    83. Re:They Why ZFS? by Lennie · · Score: 1

      Data loss because God targets you ? That is what the off-site back up is for right ?

      --
      New things are always on the horizon
    84. Re:They Why ZFS? by grumbel · · Score: 1

      as wikipedia would say. XFS is no more prone to data corruption than any other journalled filesystem in the event of unexpected halts.

      Little anecdote: using reiserfs and ext3 on LVM, I had zero data corruption over the last five years. After switching to XFS, also on LVM, I had data corruption on the very first day of using it, and again and again in the following weeks. XFS basically killed files on every second crash (the crashes were caused by faulty OpenGL drivers). Switching back to ext3: zero data corruption since.

      I'll never touch XFS again and I really can't see the point of it. Why have journaling in the first place when you have to restore from a backup after a crash anyway?

    85. Re:They Why ZFS? by Lennie · · Score: 1

      OK, I should add, after some changes/tuning he got:

      Reference figures:
      16* single disk (theoretical limit): 4092 MiByte/s
      fio data layer tests (achievable limit): 3250 MiByte/s
      ZFS performance: 2505 MiByte/s

      BtrFS figures:
      IOzone on 2.6.32: 919 MiByte/s
      fio btrfs tests on 2.6.35: 1460 MiByte/s
      IOzone on 2.6.35 with crc32c: 1250 MiByte/s
      IOzone on 2.6.35 with crc32c_intel: 1629 MiByte/s
      IOzone on 2.6.35, using -o nodatasum: 1955 MiByte/s
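      Much of the spread between the crc32c, crc32c_intel and nodatasum rows presumably reflects the cost of Btrfs data checksums, which use CRC32C (Castagnoli); crc32c_intel is the kernel driver that uses the SSE4.2 CRC32 instruction instead of a lookup table, and nodatasum turns data checksumming off. For reference, a plain table-driven CRC32C in Python (function names are illustrative; this is the textbook reflected algorithm, not the kernel's code) shows the per-byte work that the hardware instruction replaces:

```python
from __future__ import annotations

POLY = 0x82F63B78  # CRC32C (Castagnoli) polynomial, bit-reflected


def _make_table() -> list[int]:
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC32C of `data`; pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)  # one table lookup per byte
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard check value
```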

      --
      New things are always on the horizon
    86. Re:They Why ZFS? by Lennie · · Score: 1

      OK, maybe I should add that after some tuning it got much better:

      Reference figures:
      16* single disk (theoretical limit): 4092 MiByte/s
      fio data layer tests (achievable limit): 3250 MiByte/s
      ZFS performance: 2505 MiByte/s

      BtrFS figures:
      IOzone on 2.6.32: 919 MiByte/s
      fio btrfs tests on 2.6.35: 1460 MiByte/s
      IOzone on 2.6.35 with crc32c: 1250 MiByte/s
      IOzone on 2.6.35 with crc32c_intel: 1629 MiByte/s
      IOzone on 2.6.35, using -o nodatasum: 1955 MiByte/s

      --
      New things are always on the horizon
    87. Re:They Why ZFS? by TheLink · · Score: 1

      Yes, but say I see "c7d0s0 ONLINE" how do I know which drive c7d0s0 really is (get its serial number or SCSI ID), and which file systems (including stuff like swap) are depending on it?

      --
    88. Re:They Why ZFS? by Yosho · · Score: 1

      I've got the opposite anecdote. I've been using XFS on several drives for years and never had a data corruption problem. I've had three drives with ext3 that had corrupted files.

      So... do you know of any statistics?

      --
      Karma: Terrifying (mostly affected by atrocities you've committed)
    89. Re:They Why ZFS? by TheLink · · Score: 1

      On a Linux box I can use stuff similar to "smartctl -a /dev/sda", and then from "/dev/sda" I can figure out which PV, then from PVs its LVs, then filesystems.

      On opensolaris the "smartctl" equivalent appears to be:
      kstat -p cmdkerror

      And I see:
      cmdkerror:0:cmdk0,error:class device_error
      cmdkerror:0:cmdk0,error:crtime 73.574017361
      cmdkerror:0:cmdk0,error:Device Not Ready 0
      cmdkerror:0:cmdk0,error:Hard Errors 0
      cmdkerror:0:cmdk0,error:Illegal Request 0
      cmdkerror:0:cmdk0,error:Media Error 0
      cmdkerror:0:cmdk0,error:Model WDC WD200BB-75A
      cmdkerror:0:cmdk0,error:No Device 0
      cmdkerror:0:cmdk0,error:Recoverable 0
      cmdkerror:0:cmdk0,error:Revision
      cmdkerror:0:cmdk0,error:Serial No WD-WMA6Y3097081
      cmdkerror:0:cmdk0,error:Size 20020396032
      cmdkerror:0:cmdk0,error:snaptime 2583.235683722
      cmdkerror:0:cmdk0,error:Soft Errors 0
      cmdkerror:0:cmdk0,error:Transport Errors 0

      And say there were some recoverable errors how would I know which zpool is on "cmdk0"?

      zpool status
        pool: rpool
       state: ONLINE
       scrub: none requested
      config:

              NAME        STATE     READ WRITE CKSUM
              rpool       ONLINE       0     0     0
                c7d0s0    ONLINE       0     0     0

      Or maybe kstat isn't the right tool for this? What would be the right tool then?
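      Not an answer to the kstat question, but the pool side of the mapping can at least be scripted. A rough Python sketch that parses `zpool status` text like the output above into a device-to-pool map; it assumes that layout (a `pool:` line, then a `config:` section that ends at `errors:` or the next pool), skips vdev grouping rows such as mirror-0 or raidz1-0, and the function names are made up. Linking cmdk0 to c7d0 still needs some other source.

```python
from __future__ import annotations

import re

# Rows in the config section that group devices rather than name one.
GROUPING = re.compile(r"^(mirror|raidz\d?|spare|spares|logs|cache)(-\d+)?$")


def pool_devices(status_text: str) -> dict[str, list[str]]:
    """Map each pool name to the leaf devices listed under its config section."""
    pools: dict[str, list[str]] = {}
    current = None
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            current, in_config = fields[1], False
            pools[current] = []
        elif fields[0] == "config:":
            in_config = True
        elif fields[0] == "errors:":
            in_config = False
        elif in_config and current is not None:
            name = fields[0]
            if name in ("NAME", current) or GROUPING.match(name):
                continue
            pools[current].append(name)
    return pools


def device_to_pool(status_text: str) -> dict[str, str]:
    """Invert pool_devices(): device name -> pool name."""
    return {dev: pool for pool, devs in pool_devices(status_text).items() for dev in devs}


SAMPLE = """\
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c7d0s0    ONLINE       0     0     0
"""

if __name__ == "__main__":
    print(device_to_pool(SAMPLE))  # {'c7d0s0': 'rpool'}
```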

      --
    90. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Somebody's going to have to learn to accept the unacceptable...

      Sounds like an old boss we had. Guy wanted guaranteed results and zero risk. Some people are out of touch with reality.

    91. Re:They Why ZFS? by hardwarefreak · · Score: 1

      Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens. A power failure on a large XFS volume is almost guaranteed to lead to truncated files and general lost data. Not so on ZFS.

      You are just full of misinformation. Sudden downing of any _busy_ system due to power loss or panic is going to lead to data loss due to the buffer cache in Linux, and other factors. Full stop. Data loss != data corruption. XFS is absolutely not prone to "data corruption" under any circumstances. If you pull the plug on any system running any filesystem, you _will_ get truncated files and loss of data. This is not a function of the filesystem per se, but of the Linux buffer cache, and any write caches on the RAID card and/or disk drives themselves. Write all of your applications to fsync and the truncation after power loss problem will be less severe, but your performance will suck horribly. Again, this is equal for _all_ filesystems, not just XFS.

      Maximum performance and maximum data integrity have always been mutually exclusive, and always will be. XFS kicks the crap out of all comers in write performance with delayed logging enabled. This means more data is held in RAM before flushing, more so than with the standard config, optimizing data transfer throughput and placement on disk decreasing fragmentation. Yes, this comes at a cost: If power drops, everything pending in the write buffer gets lost. This is why (real) datacenters are designed with many large and redundant UPSs and generators. However, again, data loss != data corruption. You claim above that XFS is prone to data corruption. That is horse shit.

      Don't spread FUD. You don't use XFS, or at least any remotely recent (2007 on) version of it, so you don't really have a clue. And you obviously don't have an understanding of the function of the Linux VFS system and the buffer cache. As I stated, _any_ filesystem will suffer truncation and data loss if a busy system loses power when many writes are pending. If your apps make heavy use of fsync, you won't suffer as many truncations, but you will still have them occur, with ZFS, EXTx, BTRFS, etc. It _will_ happen with all of these. The only difference will be the severity, depending on how many write transactions were in flight at the time the plug was pulled.

      I'm anxious to see you claim again, in your response to my points above, that ZFS is immune to truncation due to power loss. The only way this is possible is by disabling all caching, on the disk drives and controllers up through the entire software stack on the host, and using only applications that call fsync on every write operation. If this was actually done by anyone, disk write performance would drop to the point the system would be unusable. And this still wouldn't guarantee you wouldn't have a single file in the write buffer that gets truncated.
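      For applications that do care, the usual POSIX pattern is to write a temporary file, fsync it, rename it over the original, and fsync the directory. A Python sketch (the function name is made up; it assumes a POSIX system where a directory can be opened and fsynced, which Linux allows). It narrows the window described above: after a crash you should see either the old or the new contents rather than a truncated file, but a write whose fsync had not returned can still be lost, and drive or controller caches that ignore flushes can defeat it entirely.

```python
import contextlib
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # Python buffer -> kernel page cache
            os.fsync(f.fileno())   # page cache -> storage, as far as the drive honors it
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    # Persist the rename itself by syncing the directory entry.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

      Every call pays for at least two synchronous flushes, which is exactly the performance cost the post describes.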

    92. Re:They Why ZFS? by rodgerd · · Score: 1

      LARC is a huge benefit in the Solaris world, where the comparison is UFS. Solaris UFS is unbelievably primitive, which is why job 1 in many Solaris shops is to buy a third-party filesystem for any serious work.

      In the Linux world, where aggressive use of caching has been standard for over a decade it compares a lot less favourably.

      That said, btrfs management is a shit sandwich. ZFS's management tools actually let you, you know, use the advertised features of the filesystem.

    93. Re:They Why ZFS? by Frnknstn · · Score: 1

      I apologise, I wasn't clear at that point. Allow me to restate: in that point, I was highlighting that an 8k read will often result in more than 8k being read from the disk. The drive can read more, the OS IO subsystems could predict that the adjacent blocks will be needed and also request them, or your database can look at the structure and ask for surrounding data, too.

      --
      If it's in you sig, it's in your post.
    94. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      It must be opposites day. Here's a second anecdote to go with the post above. I have multiple medium to large (500GB to 3TB) XFS file systems and I have not suffered data loss on any of them. In fact I currently have two 3.1TB XFS filesystems which are acting as a backing store to a 3.1TB distributed replicating GlusterFS volume which is currently under heavy write loads as I sync 2.2TB of existing data on to it, and it's performed like a champ.

    95. Re:They Why ZFS? by grumbel · · Score: 1

      How often did your system crash? The trouble with XFS isn't when everything goes to plan, but when it doesn't and the system goes down unexpectedly for one reason or another.

    96. Re:They Why ZFS? by Cwix · · Score: 1

      I'm glad we could find something to agree upon. Any harsh words I had earlier, I apologize for overreacting.

      This would be a better world if everyone (especially myself) can remember that there is always some common ground.
      - Cwix

      --
      You are entitled to your own opinions, not your own facts.
    97. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      I'm the AC you replied to and I was responding to a comment about NTFS which does, I think, use 64K blocks.

      When running SQL Server or Oracle (and probably any other database) on Windows, the OS is definitely not doing any caching. Data is copied directly from the database process's address space to disk and vice versa. The drive may well be doing some caching, but that's it.

      But this is irrelevant anyway. Combined with a database, file system compression (at best) converts a single write to a single read and a single write. You've just doubled your I/O (at best).

      If compression was such a win for databases they would have been doing it themselves before file system support was available. And in some specialized cases (e.g. read-only column stores for data mining) they do. But in the general case, it really isn't a good idea.

    98. Re:They Why ZFS? by raynet · · Score: 1

      I've had plenty of crashes with XFS due to rtorrent triggering a kernel bug when using XFS. No problems with corruption so far, and since all the files come from torrents, I can always check them for problems (and easily repair them). Still, I prefer ZFS and use it on my fileservers; the reason I use XFS is that it is fast at deleting a gazillion files, and falloc on XFS is insanely fast, which is nice when you are allocating a dozen 11GB files from torrents.

      Also, I managed to "crash" my ZFS: I was running a scrub on the bigger array and one of the HDDs died in such a way that it would still read the data, but only after lots of *whiir-click* sounds, and after doing this for several hours the drive would end up in a weird mode, wouldn't accept commands, and Solaris would freeze. Solaris never marked the drive as faulty; I had to do that manually.

      --
      - Raynet --> .
    99. Re:They Why ZFS? by Maquis196 · · Score: 1

      From memory, if you run format in Solaris you'll get some more information on which disk is what.

      As for the Solaris naming convention, you might need to look inside your system and work out which drive it is. The d0 means it's the first disk on that controller; if you have more than one controller, you'll have more trouble working out which is which.

      prtdiag will give you more information about what's in the box, and you might be able to work out the controller from that and by delving into /dev/pci*/.

    100. Re:They Why ZFS? by LWATCDR · · Score: 1

      I did read that. That is why I said that I didn't think this test was using a good use case for ZFS.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    101. Re:They Why ZFS? by LWATCDR · · Score: 1

      But here are the questions:
      How much ARC was used?
      Did they use L2ARC? If so, that may not have been ideal when using SSDs.
      Did they use a ZIL drive?
      Does Btrfs support things like block deduplication, compression, and snapshot shipping?
      What about adaptive replacement cache (ARC), L2ARC and intent log drives?

      I am not dismissing Btrfs at all. It sounds like it is an excellent workstation file system and probably a very good server file system. I do not know if it has all the features that ZFS has and what it doesn't have it may gain.
      Truth is that things like L2ARC, ARC, ZIL, block deduplication, and compression are not all that useful on most workstations.
      But on a SAN or NAS server they can be really good to have.
      There is more to a file system than just speed.
      I will probably use btrfs on my next workstation. But I really want to use ZFS for my next NAS build.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    102. Re:They Why ZFS? by CAIMLAS · · Score: 1

      No, it actually is a filesystem feature (of sorts). The ability to export iSCSI 'from the filesystem' that works well, with the filesystem, is a nice added bonus.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    103. Re:They Why ZFS? by Just+Some+Guy · · Score: 1

      ZFS supports per-filesystem settings. I have "compression=off" on my MP3 collection.

      --
      Dewey, what part of this looks like authorities should be involved?
    104. Re:They Why ZFS? by TheLink · · Score: 1

      Basically I want to detect when a drive is having errors, and if it is, figure out what file systems are affected. And preferably automate that- after all an enterprise class OS should make something like that easy to automate right? But I can't seem to find info on this. Any ideas? What am I missing?

      I remember trying to make sense of the solaris naming convention and seems it's not really very consistent e.g. sometimes the target is omitted, so instead of c0t3d0s6 you get c0d0s6, and so on.

      Then after that there's the "cmdk0" naming convention (and the sd equiv?). I haven't figured out a way to link those to the relevant device.
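      The naming is at least regular enough to parse. A Python sketch, assuming the common cNtNdN form with an optional target and either a slice (sN) or an fdisk partition (pN); long WWN-style targets (as seen with multipathing) are deliberately not handled, and the function name is made up. Names without a target, like c7d0s0 above, are typical of the cmdk (ATA) driver.

```python
from __future__ import annotations

import re

DEVICE = re.compile(
    r"^c(?P<controller>\d+)"                       # controller
    r"(?:t(?P<target>\d+))?"                       # target, omitted by some drivers
    r"d(?P<disk>\d+)"                              # disk (LUN)
    r"(?:s(?P<slice>\d+)|p(?P<partition>\d+))?$"   # slice or fdisk partition
)


def parse_device(name: str) -> dict[str, int] | None:
    """Split a Solaris disk name into numeric parts, or return None if it doesn't fit."""
    m = DEVICE.match(name)
    if m is None:
        return None
    return {part: int(value) for part, value in m.groupdict().items() if value is not None}


if __name__ == "__main__":
    for name in ("c0t3d0s6", "c7d0s0", "c1t0d0p0", "cmdk0"):
        print(name, parse_device(name))
```

      Driver instance names such as cmdk0 don't fit the pattern and come back as None, so mapping them to a cNdN name still has to happen elsewhere.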

      --
    105. Re:They Why ZFS? by Frnknstn · · Score: 1

      Given that this article is about ZFS, and this thread was originally about ZFS filesystem compression, I assumed that the mention of NTFS was incidental.

      As I don't have access to the source code for SQL Server or Oracle, I find it difficult to comment on its internals. By default I would expect Windows to keep the data it writes in the page cache and perform read ahead caching, like it does for all the other IO, ever since MS SmartDrive for DOS. I would also have suspected that the DBMSs you mentioned kept some kind of cache for themselves, but I will defer to you on this as well.

      You may want to get hold of the Oracle people, because they are also misinformed.

      Regarding your claim of doubling the IO, I suggest you read my post again. You did not actually account for, consider, or address any point I made. You just repeated yourself. That's not how a discussion works. As an example, I will now address your point:

      In the most trivial cases, raw data blocks being written into already allocated space in file, you would be almost correct: only the bytes being written would have to be sent to the drive. That case does not apply to any database I have worked with (except maybe MS Access or SQLite, I am not sure). Any real world database has behaviour that is far more sophisticated than that, as I explained in my previous comment. Even in your hypothetical trivial database that directly modifies the data in place, an 'unnecessary' read is unlikely: Conceptually, an INSERT would likely be placed at the end of the file, where all the other recent inserts would have been, thus the data is likely to have been cached. An UPDATE would almost certainly have had to read the rows anyway to determine their eligibility for the update, so the read would also have been necessary.

      As for why the feature was not included in the database software in the past, remember that the 'cheapness' of this transparent compression is a result of the uneven evolution of storage devices and processors. Processor performance has improved far faster than storage access performance (the explosion of multi-core chips helped a lot, too). We are now able to trade processing power we didn't have before for increased IO performance.

      That notwithstanding, your assumption was also wrong. Databases have had compression for a long time. For example, IBM's DB2 has had support for tablespace-level compression since at least 1993.

      In fact, now that I think about it, your even more basic assumption is wrong: What's the most widely-used database in office environments? Here's a hint: It's made by Microsoft, and it's not SQL Server. Or Access.

      Answer: NTFS (or possibly FAT, all those USB sticks and SD cards...)

      A filesystem *IS* a (buzzword alert) no-sql database. To talk about filesystem compression as separate from database compression is misleading. Personal filesystem compression has been around since at least 1990. Didn't you use Stacker in DOS? With it you could see big benefits in system performance for exactly the same reasons as have been outlined in this topic; the difference there was that, since there was so much less processing power available, that would quickly become the application bottleneck. That is not true for a modern database server.

      --
      If it's in you sig, it's in your post.
    106. Re:They Why ZFS? by daha · · Score: 1

      From my observations, it appears that ZFS tests the benefit of compression before actually writing the data. Each block written for a file may or may not be compressed. The compression type is stored in each block pointer.

      I agree with your choice to turn compression off for an MP3 collection. It saves the effort of attempting to compress every block before writing it.
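      The per-block decision described above can be sketched as follows. This is a minimal illustration, not ZFS code: zlib stands in for ZFS's own compressors, the 1/8 minimum-savings threshold is an assumption of the sketch, and the names `write_block` and `read_block` are hypothetical. Random bytes stand in for an already-compressed MP3, which is why such data ends up stored raw.

```python
import os
import zlib

# Hypothetical record size; 128 KiB is the ZFS default recordsize.
BLOCK_SIZE = 128 * 1024


def write_block(data: bytes) -> tuple:
    """Return (compression tag, payload) for one block.

    zlib stands in for ZFS's own compressors. The block is stored
    compressed only if that saves at least 1/8 of its size (an assumed
    threshold for this sketch); otherwise it is stored raw.
    """
    compressed = zlib.compress(data)
    limit = len(data) - (len(data) >> 3)  # must fit in 7/8 of the original
    if len(compressed) <= limit:
        return ("zlib", compressed)
    return ("off", data)


def read_block(tag: str, payload: bytes) -> bytes:
    # The tag plays the role of the compression field in a block pointer.
    return zlib.decompress(payload) if tag == "zlib" else payload


if __name__ == "__main__":
    text = b"the quick brown fox " * (BLOCK_SIZE // 20)
    noise = os.urandom(BLOCK_SIZE)  # stand-in for already-compressed MP3 data
    for name, block in (("text", text), ("noise", noise)):
        tag, payload = write_block(block)
        assert read_block(tag, payload) == block
        print(name, tag, len(payload))
```

      Because the tag travels with each block, a single file can mix compressed and uncompressed blocks, and the reader never has to guess which is which. Turning compression off for incompressible data only saves the wasted compression attempt on every write.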

    107. Re:They Why ZFS? by daha · · Score: 1

      You are correct, that was part of my reasoning. That said, I generally view compression support in a file system as an unfavorable feature, for several reasons.

      One is performance in a workstation scenario. If all my cores are running at high utilization, I'd rather they work on the processes I've requested than also compress data for writing. Space is cheap at that scale.

      Admittedly, that is for my own personal usage. In a data center with large, rarely touched, and rapidly growing amounts of data, I would strongly consider compression.

      My other objection to file system compression is that it adds another layer of complexity to overall storage. If something goes wrong, compression does not make recovery any easier; at times it makes recovery impossible.

    108. Re:They Why ZFS? by daha · · Score: 1

      I personally find it interesting that even though file system compression has been around for a long time, not many people actually use it.

      ZFS is one of the first, if not the first, file systems that I've noticed enable it by default. It's interesting that MS doesn't enable it by default.

    109. Re:They Why ZFS? by jd · · Score: 1

      Actually, at the enterprise level you'd be best using an intermediate piece of hardware to handle compression - for much the same reason. Throughput.

      Since most enterprise-level systems use some form of SAN or NAS storage, it's not too bad to do something like this. A SAN/NAS device that compresses the data isn't a massive bottleneck over and above all the other bottlenecks created by this type of storage.

      Alternatively, I don't see why the disk controllers themselves can't handle compression. They're already fairly complex, it would allow the manufacturers to advertise the drives as larger without actually needing to change anything, and it would squeeze more data into the buffers giving them greater read performance.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    110. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      He explained how to compress an entire NTFS drive. I explained that this is often a bad idea. Which was hardly incidental.

      Windows doesn't cache all IO. Read this and ask yourself why this functionality exists and why SQL Server uses it.

      I'm well aware that databases run their own caches. I'm also aware that high-end databases bypass the OS cache. And I know that an UPDATE will most likely be accessing an in-memory page. I also know that reading 4K from a disk is faster than reading 64K or 128K. And that IO bandwidth isn't infinite.

      I know that a filesystem is a database. And I've already given an example of a database in which compression is useful. But that doesn't mean that compression is always a good idea for all workloads.

      If, as you claim, compression was a guaranteed win then IBM would surely make that very clear in their DB2 literature. But it's all "can improve performance", "may improve performance". Why do you think that is?

      Anyway, I don't know why I'm bothering. It's completely obvious (and my experience) that for some databases and workloads compression will increase the amount of IO and that will slow the system down. I assume compression works for you but it's not a panacea.
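      The disagreement in this exchange can be made concrete with a deliberately simplified cost model. It is a sketch under stated assumptions, not a measurement of NTFS, ZFS, or any database: uncompressed files are overwritten in place one page at a time, while compressed files are stored in fixed-size compression units that must be rewritten whole. The function names and parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    read_bytes: int
    write_bytes: int


def random_update_cost(page: int, unit: int, ratio: float,
                       compressed: bool, unit_cached: bool) -> IoCost:
    """Disk bytes moved to overwrite one `page` inside a file.

    Simplified model (an assumption, not any real filesystem's behaviour):
    uncompressed files are overwritten in place, one page at a time;
    compressed files are stored in `unit`-sized compression units that
    shrink by `ratio`, and a unit must be decompressed, patched,
    recompressed and rewritten as a whole.
    """
    if not compressed:
        return IoCost(read_bytes=0, write_bytes=page)
    on_disk = int(unit / ratio)
    return IoCost(read_bytes=0 if unit_cached else on_disk, write_bytes=on_disk)


def scan_read_bytes(table: int, ratio: float, compressed: bool) -> int:
    """Disk bytes read by a full sequential scan of a `table`-byte file."""
    return int(table / ratio) if compressed else table


if __name__ == "__main__":
    KiB, MiB = 1024, 1024 * 1024
    print(random_update_cost(4 * KiB, 64 * KiB, 2.0, compressed=False, unit_cached=False))
    print(random_update_cost(4 * KiB, 64 * KiB, 2.0, compressed=True, unit_cached=False))
    print(random_update_cost(4 * KiB, 64 * KiB, 2.0, compressed=True, unit_cached=True))
    print(scan_read_bytes(100 * MiB, 2.0, compressed=False),
          scan_read_bytes(100 * MiB, 2.0, compressed=True))
```

      Under these assumptions, with a 2:1 ratio and 64 KiB units, a single 4 KiB update writes 32 KiB instead of 4 KiB, and also reads 32 KiB if the unit is not cached. A full scan, however, reads half as many bytes. Which effect dominates depends on the workload's mix of random updates and large reads, which is the point both posters are circling.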

    111. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      How often did your system crash?

      One of the servers is a backup head node connected to three SANs giving it somewhere in the region of 45TB of storage, carved up with LVM2 and all running XFS. It's crashed a few times and it's never lost data.

    112. Re:They Why ZFS? by Lennie · · Score: 1

      No ARC and no L2ARC and no ZIL I think.

      But I do know that at that speed you probably don't want to add block deduplication: you need a lot of memory for it, and the CPU is probably already taxed. Just look at the btrfs thread on the mailing list; you will see that even checksumming has to be done the right way just to keep up.

      I think the -o nodatasum option actually disables the checksums, so that should tell you that the CPUs are probably already the bottleneck.

      I think it is just a matter of time before some of these things are implemented and btrfs is considered stable. The cool thing about btrfs is that most of its features are added to the Linux kernel itself; btrfs is just one of many users of that infrastructure, so other (future?) filesystems can make use of it as well.

      I read somewhere that the btrfs B-tree design is newer and thus possibly even more suitable for filesystems like this. If that is true, maybe btrfs could end up at the top.

      Here is some more info about a company that recently adopted ZFS for its needs:

      http://www.anandtech.com/show/3963/zfs-building-testing-and-benchmarking

      http://www.zfsbuild.com/

      --
      New things are always on the horizon
    113. Re:They Why ZFS? by quickOnTheUptake · · Score: 1

      what does COW have to do with knowing your data got committed?

      --
      Mod points: Guaranteed to remove your sense of humor.
      Side effects may include gullibility and temporary retardation
    114. Re:They Why ZFS? by caseih · · Score: 1

      I'm glad you are able to assume so much about my experience with XFS. I wouldn't have mentioned it all if I hadn't had at least some experience with it, recently (this year even). Give me at least that much credit please before patronizing me.

      Anyway, as it happens I ran XFS on a 6 TB array on RHEL 5 until about two months ago. I have no idea whether RHEL 5 has the latest kernel fixes for XFS that were mentioned in this thread. I lost data to crashes on two occasions in the last 2 years, but the bigger problem, and the one that has turned me right off XFS, is that *every* unintended reboot (crash) of my server left me having to manually run fsck on the console to fix a corrupted file system. This was not even a busy server: maybe two users accessing files over Samba. Yet the file system needed manual repair. I have never had this happen with Ext3 (with light usage). Fortunately such crashes were rare. Somehow the fsck built into the init system could never repair it during the boot process; as near as I could tell, this happened every time. The most recent crash was about a year ago. When it came time to replace the SAN, I just formatted the volume as Ext3 until I decide how best to proceed.

      Perhaps, as you say, my understanding of ZFS is faulty. But I understand that ZFS is designed to make file truncation much more improbable because blocks are never rewritten in place. Files are guaranteed to be, at any moment, intact and consistent; only changes to the files (new blocks, copy-on-write) are lost. Is this not true? My experience suggests that it is true in general, and Sun certainly claims it. And in a paper from UW-Madison (Zhang, Rajimwale, Arpaci-Dusseau, and Arpaci-Dusseau) they say:

      In our analysis, we find that ZFS is indeed robust to a wide range of disk corruptions, thus partially confirming that many of its design goals have been met. However, we also find that ZFS often fails to maintain data integrity in the face of memory corruption. In many cases, ZFS is either unable to detect the corruption, returns bad data to the user, or simply crashes. We further find that many of these cases could be avoided with simple techniques.

      So ZFS is certainly prone to corruption, but more likely from runtime corruption (bad RAM, etc.) than from corruption of the FS due to power failure, even during busy times. Sun themselves claim to have pulled the power on busy SANs on numerous occasions and never had file system corruption; as you say, changes in progress are lost, but I don't think open files were truncated.

      Given my experience, if I can run the same ZFS version on Linux that I do on Solaris, I'll take it over XFS any day. At least for my particular usage needs. Given that I really don't like Solaris, and I don't care about ZFS specifically, I'm anxiously awaiting BtrFS on Linux. Maybe then I'll be a fan boy you can accuse of not knowing what I'm talking about.

    115. Re:They Why ZFS? by Frnknstn · · Score: 1

      I understand your need to retreat to a more defensible position, but I ask you, once again, to actually read my posts, and possibly your own posts too.

      I never claim that compression is a cure-all for anyone's database woes. You, on the other hand, *do* make an absolute claim:

      Don't do this for any files that regularly get random writes (like, say, database files).

      You go so far as to say ALL databases will suffer decreased performance. My thesis is that using well-implemented filesystem-level compression can improve performance for most databases that heavily use random writes, for the reasons in my earlier posts. I cannot, as I stated, comment on the internals of SQL Server, Oracle or Windows, but I do explain theoretical reasons why they should see improved performance. I am still waiting for you to address those claims.

      --
      If it's in your sig, it's in your post.
    116. Re:They Why ZFS? by hardwarefreak · · Score: 1

      If your SAN array(s) are of any size (10TB+) with lots of drives (8 or more), you should take another look at XFS. The version of RHEL you were using is eons old now; it shipped with kernel 2.6.18 from 2006 and probably didn't have the 2007 fixes. Also, if you were running your / filesystem on XFS, that's not recommended, because most Linux distros don't have the right integration for doing so, as you learned. Run / on EXT2/3 and your data filesystems on XFS. / filesystems aren't write-transaction heavy and don't experience lots of random IO, merely log writes, so you get zero benefit from XFS, and you can experience integration headaches, depending on your distro.

      XFS is currently the most heavily developed FS on the planet, more than BTRFS. It also outperforms all other filesystems on almost every workload, even more so with the recent delaylog mount option. When configured and managed properly, it is also one of the most robust filesystems available. It is very widely deployed, especially on massive DB servers, due to its superior performance with O_DIRECT.

      If you had corruption issues, not merely trunc'd files, in the past two years, then you were running an OS without the 2007 patch.

    117. Re:They Why ZFS? by LWATCDR · · Score: 1

      I have read the AnandTech story.
      The thing is that I was posting a reply to the person that said if btrfs is faster why would anyone use ZFS.

      I am sure that btrfs will get a lot of the features of ZFS soon. To be honest the cost of storage and processing power is now getting so low it makes my head hurt. 1TB drives are less than $100 each and falling. Soon it will be common to have multi TB workstations. Throw in that you can get six core AMD CPUs for under $300 and it is just getting to the really silly stage.
      We really are at the point where PCs have many times the performance a Cray-1 had in 1975.
      At this rate we will need ZFS for cell phones in a few years.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    118. Re:They Why ZFS? by Anonymous Coward · · Score: 0

      Wrong answer. XFS is extremely prone to data corruption if the system goes down uncleanly for any reason. We may strive for nine nines, but stuff still happens.

      What? That's true of any filesystem, and especially ZFS as practical experience shows. The only way to reliably keep any filesystem going is to keep it on a UPS and talking about 'nine nines' in that context is just laughable.

      Wrong. In XFS a simple computer crash cost me half of my home directory (NULLed files and their FAQ states that's totally by design). Great. On the other hand, using e.g. ext4 or reiserfs with appropriate mount options, computer crashes have not done any such thing to me, ever. Now, I don't require the prime performance that XFS and other file systems provide so I am totally ok with frequent commits, data journaling or proper ordering of writes. Can XFS do this for me? No? Great, I'll use a different filesystem without a UPS.

    119. Re:They Why ZFS? by badkarmadayaccount · · Score: 1

      Did anyone else read that as c1@|_1$?

      --
      I know tobacco is bad for you, so I smoke weed with crack.
    120. Re:They Why ZFS? by phoenix_rizzen · · Score: 1

      You really should re-run the benchmarks. Depending on which version of FreeBSD 7.x, you were using either ZFSv6 (ancient and slow), ZFSv13 (faster), or ZFSv14 (faster still). Comparing ZFSv6 to anything is pointless to the extreme, except to show just how far ZFS speed has come since then. :)

    121. Re:They Why ZFS? by phoenix_rizzen · · Score: 1

      You make it sound like you need an extra 10 terabytes to backup a 10 terabyte volume with LVM. You don't. It takes a snapshot and the free space you need is for further changes to the volume.

      How would you configure a 10 TB disk array, using LVM, such that you can keep 365 daily snapshots for a single filesystem?

      With ZFS, it's simple: create the storage pool, create the filesystem, create a snapshot each night. Done.

      With LVM, it's a nightmare I don't want to even think about, involving creating volumes, creating filesystems, saving "unallocated" space for the snapshots, deciding how much space to reserve for each snapshot, etc, etc, etc

      LVM snapshots have one purpose: creating a device that won't change that you can use as a source of backups, that is easily destroyed after the backup run is finished. *VERY* different from ZFS snapshots which are meant to be persistent, with all kinds of nice extra features LVM lacks, like snapshot roll-back, snapshot cloning, promoting snapshots to filesystems, deleting individual snapshots no matter how old they are, etc.

      There's really no comparison between LVM's volume management and snapshots and ZFS's storage pool management and snapshots.
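      To illustrate why persistent copy-on-write snapshots need no space reserved up front, here is a toy model in which a snapshot is just a frozen map from logical to physical blocks. It is a hypothetical sketch, not how ZFS implements snapshots (ZFS shares tree nodes rather than copying whole maps), and the class and method names are invented for the example.

```python
class CowVolume:
    """Toy copy-on-write volume in which a snapshot is a frozen block map.

    Hypothetical illustration only: real COW filesystems share tree nodes
    between snapshots instead of copying a whole map.
    """

    def __init__(self):
        self.blocks = {}     # physical block id -> data
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen logical->physical map
        self._next_id = 0

    def write(self, lbn, data):
        # Never overwrite in place: every write lands in a fresh block.
        self.blocks[self._next_id] = data
        self.live[lbn] = self._next_id
        self._next_id += 1
        self._collect_garbage()

    def read(self, lbn):
        return self.blocks[self.live[lbn]]

    def snapshot(self, name):
        self.snapshots[name] = dict(self.live)

    def rollback(self, name):
        self.live = dict(self.snapshots[name])
        self._collect_garbage()

    def destroy_snapshot(self, name):
        # Any snapshot can be removed, regardless of age or order.
        del self.snapshots[name]
        self._collect_garbage()

    def used_blocks(self):
        return len(self.blocks)

    def _collect_garbage(self):
        # Free physical blocks referenced by neither the live map nor a snapshot.
        referenced = set(self.live.values())
        for block_map in self.snapshots.values():
            referenced.update(block_map.values())
        for pid in [p for p in self.blocks if p not in referenced]:
            del self.blocks[pid]


if __name__ == "__main__":
    vol = CowVolume()
    for lbn in range(10):
        vol.write(lbn, f"day0-{lbn}".encode())
    vol.snapshot("day0")
    vol.write(3, b"day1-3")                # only the changed block costs new space
    print(vol.used_blocks())               # 11
    vol.rollback("day0")
    print(vol.read(3), vol.used_blocks())  # b'day0-3' 10
```

      Each snapshot costs only the blocks that changed after it was taken, which is why keeping hundreds of daily snapshots is practical. Rollback and deleting an arbitrary old snapshot fall out of the same reference-tracking design.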

    122. Re:They Why ZFS? by phoenix_rizzen · · Score: 1

      Question about ZFS, say I have a bunch of ZFS filesystems on a bunch of physical drives or drive arrays on Solaris/OpenSolaris/OpenIndiana. How do I figure out which physical drives/devices a particular ZFS filesystem depends on?

      You don't. That's the whole point of pooled storage: you stop thinking in terms of "filesystem on partition on disk" and start thinking "filesystem on storage". ZFS filesystems are not "dependent on a single drive". You don't partition disks, you don't create filesystems on partitions, and you don't have to work out filesystem sizes ahead of time. You just aggregate all your storage into a single pool and create filesystems on top. If a filesystem needs space and it's available in the pool, it uses it. Simple as that.

      Hence you create redundant vdevs (think of them like RAID arrays) beneath the pool, so that if something does go wrong with a drive, you can easily replace it without losing data.
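      The pooled-storage idea can be reduced to a few lines: filesystems get no fixed size and simply draw on the pool's shared free space. This is a hypothetical toy (the `Pool` class and its methods are invented for illustration), and it assumes `vdev_sizes` are already usable capacities after redundancy, so mirroring and RAID-Z overhead are not modelled.

```python
import errno


class Pool:
    """Toy storage pool: filesystems draw on one shared pool of space.

    `vdev_sizes` are usable capacities after redundancy (an assumption
    of this sketch; mirroring/RAID-Z overhead is not modelled).
    """

    def __init__(self, vdev_sizes):
        self.capacity = sum(vdev_sizes)
        self.used = {}  # filesystem name -> bytes used

    def create_fs(self, name):
        # No size is chosen up front; a filesystem is just a namespace.
        self.used[name] = 0

    def free(self):
        return self.capacity - sum(self.used.values())

    def write(self, fs, nbytes):
        if nbytes > self.free():
            raise OSError(errno.ENOSPC, "pool is out of space")
        self.used[fs] += nbytes


if __name__ == "__main__":
    pool = Pool([1000, 1000])
    pool.create_fs("home")
    pool.create_fs("db")
    pool.write("home", 1500)  # more than either vdev alone could hold
    print(pool.free())        # 500
```

      The question "which drive is this filesystem on?" has no answer in this model, because space is only accounted for at the pool level.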

    123. Re:They Why ZFS? by phoenix_rizzen · · Score: 1

      c == controller
      d == disk
      s == slice (although you shouldn't partition drives, just use the whole thing)

      Thus, c0d0 would be the first disk on the first controller. c0d1 would be the second disk on the first controller, etc.

      Not sure what t stands for. But once you know the rest, it's easy to work out where things are.

      It's really not that hard.

      (One of the nice things about FreeBSD and the GEOM framework is all the labelling and layering you can do, which makes it a *lot* easier to manage large numbers of drives.)
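      A small parser makes the naming scheme above explicit. In Solaris disk names the t field is generally the target (a SCSI target number, or a WWN on SAS and Fibre Channel), and the t and s parts are optional. This is a sketch: the function and type names are hypothetical, and x86 pN partition suffixes are not handled.

```python
import re
from typing import NamedTuple, Optional


class DiskName(NamedTuple):
    controller: int
    target: Optional[str]  # SCSI target number or WWN; absent on some disks
    disk: int
    slice: Optional[int]


# c<controller>[t<target>]d<disk>[s<slice>]; x86 "p" partitions not handled.
_PATTERN = re.compile(r"^c(\d+)(?:t([0-9A-Fa-f]+))?d(\d+)(?:s(\d+))?$")


def parse_disk_name(name: str) -> DiskName:
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a cXtYdZsN-style name: {name!r}")
    c, t, d, s = m.groups()
    return DiskName(int(c), t, int(d), int(s) if s is not None else None)


if __name__ == "__main__":
    for name in ("c0t1d0s2", "c1d0", "c0t5000C500A1B2C3D4d0s0"):
        print(name, parse_disk_name(name))
```

      The regex relies on backtracking so that a hex WWN target (which may itself contain the letter d) still splits correctly at the final d followed by the disk number.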

    124. Re:They Why ZFS? by Anonymous Coward · · Score: 0
  3. Using a first beta slower than stable? Wha?!?!? by tysonedwards · · Score: 4, Insightful

    Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?

    --
    Thirty four characters live here.
    1. Re:Using a first beta slower than stable? Wha?!?!? by chrb · · Score: 2, Informative

      Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?

      The full release is supposed to come out in the first week of January. Given the short time frame, this is probably closer to the final release than the words "first beta" imply.

      Surprises:

      • Native ZFS beat XFS on several of the benchmarks - XFS is usually a good performer in these kinds of tests
      • Native ZFS does very well on the Threaded IO Test, where it ties for first place.
      • Btrfs is really bad on the SQLite test, taking 5 times longer than XFS on both 2.6.32 and 2.6.37 (bug?)
      • XFS IOzone write performance increased by 45% going from 2.6.32 to 2.6.37 (!) XFS increased on FS-Mark by 37%. I thought XFS would be pretty much at the point where there would be no such great improvements.
      • "Real" Solaris+ZFS gets absolutely slaughtered on the Threaded IO Test and the PostMark Test, with ext4 pushing almost 10x more transactions per second.
      • Tests were done on a SSD, apparently there was no difference in relative performance of the filesystems on SSD versus HD

      Notes:

      • "Real" Solaris+ZFS results are not shown for most tests
      • Would be nice to know how many replicates they did of each test
      • This is an interesting set of results that will get people talking/arguing :-) Thanks to Phoronix for starting the discussion.
    2. Re:Using a first beta slower than stable? Wha?!?!? by Anonymous Coward · · Score: 0

      Tests were done on a SSD, apparently there was no difference in relative performance of the filesystems on SSD versus HD

      Ah, but here's a nice trick: with ZFS you can use plain disks for raw storage, but slot in SSDs transparently so that you get performance boosts on reads and writes. You don't have to spend a lot of cash on pricey SSDs, just enough to cache your working set:

      http://blogs.sun.com/studler/entry/zfs_and_the_hybrid_storage
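      The working-set argument can be illustrated with a small cache in front of a large, slow store. This is a simplified stand-in: ZFS's ARC and L2ARC use an adaptive replacement policy, not the plain LRU shown here, and `CachedStore` is a hypothetical name.

```python
from collections import OrderedDict


class CachedStore:
    """Plain LRU cache in front of a slow backing store (simplified stand-in
    for an SSD cache tier; not ZFS's adaptive replacement policy)."""

    def __init__(self, backing, cache_blocks):
        self.backing = backing      # block id -> data (the "plain disks")
        self.cache = OrderedDict()  # block id -> data (the "SSD")
        self.capacity = cache_blocks
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = self.backing[block_id]
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data


if __name__ == "__main__":
    disk = {i: bytes([i % 256]) for i in range(1000)}
    store = CachedStore(disk, cache_blocks=16)
    for _ in range(5):          # a 10-block working set, read five times
        for block in range(10):
            store.read(block)
    print(store.hits, store.misses)
```

      Because the 10-block working set fits in the 16-block cache, only the first pass touches the backing store, even though the store holds 1000 blocks.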

    3. Re:Using a first beta slower than stable? Wha?!?!? by Anonymous Coward · · Score: 0

      A guy from LLNL gave a talk at Supercomputing 2010 last week about the status of ZFS on Linux; he said they have done zero performance tuning. Zero. As in, "don't expect ZFS on Linux to be fast."

      http://sc10.supercomputing.org/schedule/event_detail.php?evid=bof186

    4. Re:Using a first beta slower than stable? Wha?!?!? by diegocg · · Score: 1

      XFS recently implemented a new journaling subsystem that should speed up metadata-intensive operations. Once they turn it on, it will gain even more performance (and Ext4 is also getting many scalability improvements)

    5. Re:Using a first beta slower than stable? Wha?!?!? by Lennie · · Score: 1

      ZFS port for Linux is beta, btrfs is beta everywhere. I think it is kind of fair.

      --
      New things are always on the horizon
  4. how about versus ZFS on Solaris or FreeBSD? by Anonymous Coward · · Score: 2, Insightful

    On similar hardware of course.

    It occurs to me that ZFS does a lot more than EXT4 and Btrfs too.

    1. Re:how about versus ZFS on Solaris or FreeBSD? by mcelrath · · Score: 1

      If you had read TFA, you'd know they did benchmark on Solaris (OpenIndiana).

      --
      1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
    2. Re:how about versus ZFS on Solaris or FreeBSD? by Anonymous Coward · · Score: 0

      read TFA -- Are you crazy?

    3. Re:how about versus ZFS on Solaris or FreeBSD? by hedwards · · Score: 1

      But that's not particularly helpful. I don't believe that Btrfs is supported beyond Linux at the moment and neither FreeBSD nor Open Solaris support both. Meaning that you're comparing a filesystem that's been grafted onto Linux via fuse with one that can ultimately be integrated into the Linux kernel.

    4. Re:how about versus ZFS on Solaris or FreeBSD? by Anonymous Coward · · Score: 1, Informative

      If you read TFA (or perhaps even the slashdot submission text) you should know that both fuse and native ports for linux are being discussed.

    5. Re:how about versus ZFS on Solaris or FreeBSD? by Anonymous Coward · · Score: 0

      If you read the fucking article, they tested ZFS-FUSE on both a -32 and -36 Linux kernel, as well as a native Linux kernel module for ZFS, as well as OpenSolaris and OpenIndiana. Go read the fucking article before you start complaining that the benchmarks are crap. Phoronix generally does a good job at their benchmarking, I would expect no less if they want to be taken seriously.

  5. That's not a solaris filesystem by Anonymous Coward · · Score: 1, Insightful

    You can't call it "the Solaris file-system". You can say that the Linux native implementation of ZFS (a Linux file-system) is slower than BTRFS, though.

    And what does speed matter if it's not reliable? You can save your stuff in /dev/null quite fast too!

    http://www.spinics.net/lists/linux-fsdevel/msg35235.html

    1. Re:That's not a solaris filesystem by datapharmer · · Score: 3, Funny

      You can save your stuff in /dev/null quite fast too!

      I know! It is friggin crazy fast. I've been using it for backups for years. Even with terabytes of data I've never managed to fill it up or slow it down!

      --
      Get a web developer
    2. Re:That's not a solaris filesystem by hedwards · · Score: 2, Insightful

      Well, don't forget to use that magic rewinding tape that mysteriously never fills no matter how many backups you use it for. Better safe than sorry I always say.

    3. Re:That's not a solaris filesystem by nrosier · · Score: 1

      I do the same thing; I usually restore from /dev/random. If you restore long enough, at some point you'll get the data you wanted :-)

    4. Re:That's not a solaris filesystem by ins0m · · Score: 1

      Yeah, and we store that tape in the circular file.

      --
      Never attribute to Hanlon that which can be adequately attributed to Heinlein.
  6. Doomed to failure by license conflict by mattdm · · Score: 4, Interesting

    OpenAFS, which still today provides features unavailable in any other production-ready network filesystem, is a nightmare to use in the real world because of its lack of integration with the mainline kernel. It's licensed under the "IPL", which like the CDDL is free-software/open source but not GPL compatible.

    ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.

    Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

    1. Re:Doomed to failure by license conflict by wonkavader · · Score: 1

      "But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen."

      Well, in the short term, we know what's not going to happen.

    2. Re:Doomed to failure by license conflict by caseih · · Score: 1

      You mean like how the Nvidia GPU driver has failed because of licensing conflict? I see no reason why the ZFS module can't be distributed in a similar manner to the nvidia driver. I'm sure that rpmfusion could host binary RPMs without problem. They wouldn't be violating the GPL because it would be you the user who taints the kernel.

      Of course ZFS on Linux probably isn't aimed at normal users anyway. It's far more likely to be used by people with existing ZFS infrastructure (large fiber-channel arrays, etc). In my opinion, ZFS on linux gives a smoother migration path away from Oracle Solaris and ZFS.

    3. Re:Doomed to failure by license conflict by Anonymous Coward · · Score: 0

      Yet another troll complaining about Oracle/Sun. Neither original, true, nor worth listening to.

    4. Re:Doomed to failure by license conflict by QuantumRiff · · Score: 1

      ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.

      Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.

      Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs I know it's fashionable to knock Oracle every chance you get... but look at this line:

      Btrfs, when complete, is expected to offer a feature set comparable to ZFS.[16] btrfs was considered to be a competitor to ZFS. However, Oracle acquired ZFS as part of the Sun Microsystem's merger and this did not change their plans for developing btrfs.[17]

      --

      What are we going to do tonight Brain?
    5. Re:Doomed to failure by license conflict by mattdm · · Score: 1

      It differs from the Nvidia driver because the Nvidia module until recently was needed to make very common PC hardware work at all, and even with the new free software Nouveau drivers, still needed for game-level performance. ZFS has neat features, but you don't need it in order to have storage on Linux.

      There's clearly a niche market for out-of-tree ZFS modules, or else this wouldn't have gotten funding. But if you're not already committed, it adds significant overhead. As someone who was dependent on OpenAFS for years for legacy reasons, I strongly caution people that the overhead is unlikely to be worth it.

    6. Re:Doomed to failure by license conflict by mattdm · · Score: 2, Interesting

      Um, just who do you think is writing BTRFS? http://en.wikipedia.org/wiki/Btrfs I know its fashionable to knock Oracle every chance you get... but Look at the line:

      As I understand it, Chris Mason brought his btrfs work with him when he started at Oracle, or at least the ideas for it. A kernel hacker of his caliber probably started the job with an agreement of exactly how that was going to go.

      Oracle is a big organization; it's not surprising they act in apparently contradictory ways. They've done a reasonable amount of good open source work and made community contributions. But I stand by the statement that it's impossible to make a good prediction as to what Oracle is going to do with anything that comes from the Sun acquisition -- but you certainly don't need to take my word for it that most of the behavior so far seems to be aimed at short-term monetization rather than long-term community growth.

    7. Re:Doomed to failure by license conflict by jon3k · · Score: 1

      You're assuming they'll have anything to license when the NetApp lawsuit is over.

    8. Re:Doomed to failure by license conflict by Guspaz · · Score: 1

      The lawsuit ended in September; they settled.

  7. Different ZFS distros by hoggoth · · Score: 3, Informative

    I was confused as to what versions of ZFS were available on which distros so I made a chart that lists the different distros and which version of ZFS they support:

    http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html

    Hope it's helpful...

    --
    - For the complete works of Shakespeare: cat /dev/random (may take some time)
    1. Re:Different ZFS distros by phoenix_rizzen · · Score: 1

      FreeBSD 7.0: ZFSv6
      FreeBSD 7.1: don't recall if it was still ZFSv6 or ZFSv13
      FreeBSD 7.2: ZFSv13
      FreeBSD 7.3: ZFSv14
      FreeBSD 7.4: should still be ZFSv14 when 7.4 is released soon-ish

      FreeBSD 8.0: ZFSv14
      FreeBSD 8.1: ZFSv14
      FreeBSD 8.2: should have ZFSv15 when 8.2 is released soon-ish

      FreeBSD 9.0: should have ZFSv28 when the 9.0 release happens in 2011

  8. Btrfs naming convention by digitaldc · · Score: 3, Funny

    Couldn't they name the file system something better than butterface?

    --
    He who knows best knows how little he knows. - Thomas Jefferson
    1. Re:Btrfs naming convention by timeOday · · Score: 2, Funny

      What are you complaining about? I always thought it was "bitter farts."

    2. Re:Btrfs naming convention by Anonymous Coward · · Score: 0

      The 'r' is silent, it's actually buttface

    3. Re:Btrfs naming convention by spud603 · · Score: 1

      no, they could not.

    4. Re:Btrfs naming convention by Abstrackt · · Score: 2, Funny

      Unfortunately Gimp was already taken.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
    5. Re:Btrfs naming convention by Anonymous Coward · · Score: 0

      I always thought it was "bitter face", like the old Keystone Beer "bitter beer face" commercials.

    6. Re:Btrfs naming convention by Tehrasha · · Score: 1

      I keep reading that as 'Bit Torrent File System'...

    7. Re:Btrfs naming convention by tangent3 · · Score: 1

      Can't read my, can't read my, no he can't read my btfs

    8. Re:Btrfs naming convention by Anonymous Coward · · Score: 0

      nope .. it's 'bit rot filesystem' to me.

    9. Re:Btrfs naming convention by Matt+Perry · · Score: 1

      I think it's pronounced Better File System.

      --
      Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
  9. As an end user... by soupforare · · Score: 1

    I've been through a few filesystem war^Wdramas and stuck with ext?fs the whole time. I liked the addition of journaling but I'm not sure that I've noticed any of the other "backstage" improvements in day to day functioning.
    Is there really a reason to jump ship as a single-workstation user?

    --
    --- Do you believe in the day?
    1. Re:As an end user... by Etherized · · Score: 1

      Snapshotting is probably the most compelling feature of either FS for workstation use. Both BTRFS and ZFS are copy-on-write, and they both feature very low overhead, very straightforward snapshotting. That's a feature that almost anybody can utilize.

      Aside from that, ZFS features a lot of datacenter-centric goodies that might have some utility on a workstation as well. Realtime (low overhead) compression, realtime (high overhead) deduplication, realtime encryption, easy and fast creation/destruction of filesystems and virtual block devices, and a ton of other odds and ends.

    2. Re:As an end user... by hedwards · · Score: 1

      The ext?fs work well unless they don't. In my, admittedly limited, experience, I've lost more files on ext2fs than on all other filesystems I've dabbled in combined. Admittedly, I had backups, but any fs that depends on you having backups to that extent should not be trusted. And while I'm sure the newer ones are better, I'm not sure that I personally trust them, as ext2fs shouldn't have been that easy to corrupt. IIRC that was only a couple of years ago, and it should have been both robust and well understood by then.

    3. Re:As an end user... by icebraining · · Score: 1

      Probably not, especially considering they're still less tested. Ext3 + LVM already provide everything I need for now.

    4. Re:As an end user... by afidel · · Score: 1

      L2ARC! Use that small SSD to improve average performance for almost all your files.
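      Roughly speaking, L2ARC is a second cache tier. Blocks pushed out of the RAM cache get written to the SSD, and later reads check RAM, then the SSD, then the pool. The sketch below uses plain LRU for both tiers to keep it short. That is an assumption: the real ARC balances recency and frequency. All names are hypothetical.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy RAM + SSD read cache (hypothetical); plain LRU stands in for the real ARC."""

    def __init__(self, ram_slots, ssd_slots, backing_disk):
        self.ram = OrderedDict()
        self.ssd = OrderedDict()
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.disk = backing_disk           # dict standing in for the slow pool
        self.hits = {"ram": 0, "ssd": 0, "disk": 0}

    def _put_ram(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_slots:
            # Entries evicted from RAM spill to the SSD tier instead of being dropped.
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)

    def read(self, key):
        if key in self.ram:
            self.hits["ram"] += 1
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:
            self.hits["ssd"] += 1
            value = self.ssd.pop(key)      # promote back to RAM
        else:
            self.hits["disk"] += 1
            value = self.disk[key]
        self._put_ram(key, value)
        return value


disk = {n: f"block {n}" for n in range(10)}
cache = TwoTierCache(ram_slots=2, ssd_slots=4, backing_disk=disk)
for key in [0, 1, 2, 3, 0, 1]:
    cache.read(key)
print(cache.hits)   # {'ram': 0, 'ssd': 2, 'disk': 4}
```

      The second reads of blocks 0 and 1 come from the SSD tier instead of the pool. That is the effect a small SSD is meant to give a working set that no longer fits in RAM.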

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:As an end user... by mlts · · Score: 1

      For me, journaling was the reason to move from ext2 to ext3. However, for an end user, ZFS has a few cool features that are significant:

      1: Deduplication by blocks. For end users, it should save some disk space, not sure how much.
      2: Block checksums. This means file corruption is at least detected.
      3: RAID-Z. 'Nuff said. No worry about the LVM layer.
      4: Filesystem encryption.
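      On the RAID-Z point: single parity (RAID-Z1, like RAID-5) comes down to XOR, so any one missing block in a stripe can be rebuilt from the survivors. This is a minimal Python sketch that assumes fixed-size blocks. It ignores RAID-Z's variable stripe width, which is what lets ZFS avoid the RAID-5 write hole.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    # Single parity: the parity block is the XOR of all data blocks in the stripe.
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    # Any one missing block equals the XOR of all surviving blocks (data + parity).
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = write_stripe([b"ABCD", b"EFGH", b"IJKL"])
print(rebuild(stripe, 1))   # b'EFGH'
```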

    6. Re:As an end user... by Hatta · · Score: 1

      If you need the features provided by an advanced filesystem, you'll know. If you're not hitting your head on the limits of EXT4/LVM/RAID, then you don't really need ZFS or Btrfs.

      --
      Give me Classic Slashdot or give me death!
    7. Re:As an end user... by ArsonSmith · · Score: 1

      Only a complete and utter moron who likes to run his nuts through a coffee grinder would only ever need Ext3 + LVM. I mean ZFS has Zettapeal and electrolytes. It's what files crave!

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
  10. I'm using btrfs on my home partition. by Seth+Kriticos · · Score: 1

    It's OK and runs fairly stably, but it also locks up once in a while and does some aggressive disk I/O. I have no idea what exactly, probably housekeeping, but it's somewhat irksome and could use some more fine tuning.

    The main problem with btrfs right now is that it lacks fsck tools, so in case of havoc there is little chance to recover, which is not good for server-like systems.

    As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing. Unless that gets fixed (which probably won't happen), it is off limits, and Linux folks will do their own thing, like they always do.

    1. Re:I'm using btrfs on my home partition. by larkost · · Score: 1, Informative

      "As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing."

      Just to be clear: between CDDL (ZFS) and GPL (BTRFS), GPL is clearly the more restrictive license. BTRFS can probably never be shipped with any other major OS other than linux (at least not in kernel mode), while ZFS has already shipped with a few.

      The license restriction is one of Linux's making, not ZFS's. There are arguments for that restriction, but calling the problem one of the CDDL being restrictive is a completely distorted view.

    2. Re:I'm using btrfs on my home partition. by Hatta · · Score: 3, Insightful

      BTRFS can probably never be shipped with any other major OS other than linux

      It's not BTRFS's fault that other operating systems use licenses with more restrictions than Linux.

      --
      Give me Classic Slashdot or give me death!
    3. Re:I'm using btrfs on my home partition. by Anonymous Coward · · Score: 0

      Just to be clear: between CDDL (ZFS) and GPL (BTRFS), GPL is clearly the more restrictive license.

      We want freedom for the little guy (end users), not freedom for corporations.

    4. Re:I'm using btrfs on my home partition. by tao · · Score: 1

      Hmmm, so the btrfsck I have installed on my system is imaginary? On a Debian system the relevant package is btrfs-tools, but you might be running a different distro where it's called something else.

    5. Re:I'm using btrfs on my home partition. by Anonymous Coward · · Score: 0

      BTRFS can probably never be shipped with any other major OS other than linux

      It's not BTRFS's fault that other operating systems use licenses with more restrictions than Linux.

      You mean like the highly-restrictive (seriously, can we even call such an awfully restrictive license open-source?) BSD License?

    6. Re:I'm using btrfs on my home partition. by ducomputergeek · · Score: 1

      The license FreeBSD ships with is more restrictive than linux?

      --
      "The problem with socialism is eventually you run out of other people's money" - Thatcher.
    7. Re:I'm using btrfs on my home partition. by m50d · · Score: 0

      Nor is it ZFS's fault that linux uses a more restrictive license than solaris.

      --
      I am trolling
    8. Re:I'm using btrfs on my home partition. by Anonymous Coward · · Score: 0

      Nor is it ZFS's fault that linux uses a more restrictive license than solaris.

      The CDDL less restrictive than the GPL? Depends.

      A factoid for you: It was SUN's fault that ZFS was published under a license explicitly chosen in order to make it as difficult as possible to legally integrate (in a re-distributable form) ZFS into Linux, among other reasons. Care to disprove this factoid? Do try. You'll fail.

    9. Re:I'm using btrfs on my home partition. by Anonymous Coward · · Score: 0

      BTRFS can probably never be shipped with any other major OS other than linux (at least not in kernel mode)

      Don't tell anyone, but the GPL allows you to link GPL'd code with any other code that is normally shipped with the OS without having to release that other code under the GPL. It's right there in the license. That exception was originally put there to allow Emacs to link to Solaris' C library.

      OpenSolaris used that exception at some point to include Linux device drivers in their kernel.

    10. Re:I'm using btrfs on my home partition. by KiloByte · · Score: 1

      No, the CDDL was _designed_ to be GPL-incompatible, on purpose (ref: Danese Cooper's talk at DebConf 6).

      Thus, the blame lies entirely on Sun's side.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    11. Re:I'm using btrfs on my home partition. by jd · · Score: 1

      Why? Because the reference implementation is GPLed? Other implementations of the same standard can be licensed under whatever the author(s) damn well feel like. Therefore any other organization can put a re-implementation in any kernel they want.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    12. Re:I'm using btrfs on my home partition. by Burpmaster · · Score: 1

      Hmmm, so the btrfsck I have installed on my system is imaginary?

      Kind of. It will scan for and report errors, but there is absolutely no repair code in it. It's read-only.

    13. Re:I'm using btrfs on my home partition. by PastaLover · · Score: 1

      While it's true that the GP poster was probably wrong (I'm not that sure that CDDL is less restrictive, but probably), it's also true that Sun designed the CDDL specifically to be incompatible with the GPL.

      So yes, linux is under the GPL and thus can't accept ZFS into mainline, but let's not forget which kernel was first (or at least, licensed under an open source license first).

  11. Checksums - 1 feature ZFS has that Ext4 doesn't by yup2000 · · Score: 4, Informative

    Hmmm, well, the most obvious feature that ZFS has and Ext4 does not is checksumming.

    That feature is one reason why ZFS is better (it will tell you if your disk is going bad, and if you have a raid setup, it will go get the good data for you). However, this is also one reason why ZFS is slower... it spends time making sure your data is safe and that it always gives you the correct bits from your disk.
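    The following Python sketch shows why a checksum catches what the disk silently gets wrong. It implements a Fletcher-style checksum in the style of ZFS's fletcher4: four 64-bit running sums over 32-bit words. ZFS also offers fletcher2 and SHA-256. ZFS keeps each block's checksum in the parent block pointer rather than next to the data, while this sketch just holds it in a variable. Treat it as an illustration of the idea, not a verified bit-exact reimplementation.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes):
    """Fletcher-4 style checksum: four 64-bit sums over 32-bit little-endian words.

    Assumes len(data) is a multiple of 4, as filesystem blocks are.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


block = bytes(range(256)) * 16          # a 4 KiB "block"
stored = fletcher4(block)               # recorded when the block is written

damaged = bytearray(block)
damaged[1000] ^= 0x01                   # one silently flipped bit
print(fletcher4(block) == stored)            # True
print(fletcher4(bytes(damaged)) == stored)   # False
```

    A single flipped bit always changes the first sum, so it can never go unnoticed. The higher-order sums also make the checksum sensitive to the order of words, which a plain sum is not.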

    That single feature is why I run FreeBSD (looking forward to kFreeBSD/Debian!) on my file server in a mirrored RAID configuration. Yes, it is "slower", but I still pull data off that server at over 50MB/sec on my home gigabit LAN! The specs on that server aren't great either: 2GB RAM and an old 1.6GHz single-core Sempron.

    1. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by Anonymous Coward · · Score: 0

      Hm. daha, in a response just a little ways over in the comment tree, said btrfs has this as well.
      He also gave some other potential problem areas.

      I wonder if this btrfs feature was enabled in the test mentioned in the topic.

    2. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by Bengie · · Score: 1

      I want to build a file server with FreeBSD 9.0 and ZFS, but I want full gigabit speeds. Since I got my new Win7 machine, I can SMB-copy 114MB/sec between my computer and my wife's with only 1.5% CPU. I'm addicted to speed. 10GbE cards and switches have been coming down in price too, so I'm looking into those for a file server.

      By the time I get this going (probably in 2 years), DDR4 will be out, and so will 22nm quad-core low-power server CPUs. ZFS + SSD + lots of memory = ftw.

    3. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by Ash-Fox · · Score: 1

      it spends time making sure your data is safe and that it always gives you the correct bits from your disk.

      You forgot to mention that you need to set that up in advance, as it doesn't do that by default, and coming up with settings for any given system is horribly time consuming compared to just having a regular incremental backup running every X hours.

      --
      Change is certain; progress is not obligatory.
    4. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by jon3k · · Score: 0

      50MB/s is pretty poor for a mirrored raid configuration over 1GbE. Even a single 7200rpm disk should be able to pull off more than that, doing sequential reads anyway. Do you see excessively high CPU utilization during the copy? Are you using software RAID?

    5. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by ChrisMaple · · Score: 1

      Hard drives already do CRCs in hardware, so that they can detect errors themselves and reread if one is detected, or declare a failed read if repeated reads fail. How often is the extra complication of a software checksum going to help?

      --
      Contribute to civilization: ari.aynrand.org/donate
    6. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by segedunum · · Score: 1

      God, I'm sick of hearing about this 'checksumming' bullshit from ZFS proponents. What happens if a checksum says your data is corrupted? Yer, you need a mirror and if you don't have one then it really doesn't matter how many checksums you perform if you can't recover the data. It's the same for any filesystem you use. It's a nice-to-have, meaning you'll know you have problems sooner, but that is all it does, and in these days of cheap disk storage, cheap mirroring and backups, people are not going to like their filesystem being slower. If that's important to people, then that's why they spent boatloads on stuff like NetApp.

      Additionally, the reason why ZFS has caught so many corruption problems in Solaris that people think is so brilliant is because Solaris's disk controller device drivers, mostly IDE and SATA, are utter crap.

    7. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by yup2000 · · Score: 1

      I only have one computer connected to that server via gigabit, and I'm 90% sure that there is a problem with the gigabit chipset on that client computer. It's an older chipset that doesn't properly support jumbo frames. The hard disks/ZFS aren't the bottleneck.
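      For reference, the wire limit of gigabit Ethernet can be worked out from per-frame overhead. The sketch assumes plain IPv4 and TCP headers with no options. Under that assumption, standard 1500-byte frames already allow about 118 MB/s of payload, and jumbo frames add only a few percent. That is consistent with the 114 MB/s SMB figure quoted above, and it suggests a 50 MB/s ceiling is not explained by frame size alone.

```python
# Per-frame overhead on the wire in bytes: preamble+SFD 8, Ethernet header 14,
# FCS 4, inter-frame gap 12. IPv4 and TCP headers are 20 bytes each (no options).
WIRE_OVERHEAD = 8 + 14 + 4 + 12
IP_TCP_HEADERS = 20 + 20


def tcp_goodput_mb_s(link_bits_per_s, mtu):
    """Best-case TCP payload rate in MB/s (10**6 bytes/s) for a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + WIRE_OVERHEAD
    return link_bits_per_s / 8 * payload / on_wire / 1e6


for mtu in (1500, 9000):
    print(mtu, round(tcp_goodput_mb_s(1e9, mtu), 1))
# 1500 118.7
# 9000 123.9
```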

    8. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by Guspaz · · Score: 1

      What happens if a checksum says your data is corrupted? Yer, you need a mirror and if you don't have one then it really doesn't matter how many checksums you perform if you can't recover the data.

      ZFS revolves around RAID. It's expected that you've got some form of redundancy on all zpools. If corruption is detected, then the data is recovered from the raidz/raid1/etc redundancy.
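      Here is a minimal Python sketch of that self-healing read, using hypothetical names. Each copy is checked against the checksum recorded at write time. A copy can be a mirror side or an extra copy from copies=2. Any copy that fails is overwritten with one that passes. This version checks every copy for clarity. A real implementation would typically read one side and fall back to the others only on a mismatch.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 is one of the checksums ZFS can use; any strong hash works for the sketch.
    return hashlib.sha256(data).digest()


def self_healing_read(copies, expected):
    """Return verified data from redundant copies, repairing bad copies in place.

    `copies` is a list of bytearrays standing in for mirror sides (hypothetical);
    `expected` is the checksum recorded when the block was written.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all copies failed checksum: unrecoverable block")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good          # rewrite the damaged copy with the verified data
    return good


data = b"important payroll records"
expected = checksum(data)
mirror = [bytearray(data), bytearray(data)]
mirror[0][3] ^= 0x40                                # silent corruption on one side
print(self_healing_read(mirror, expected) == data)  # True
print(mirror[0] == mirror[1])                       # True: the bad side was repaired
```

      Without a second copy, the same check still detects the corruption, but it can only report it. That is the limitation raised in the comment above.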

      Also, the checksums enable a few other useful features. For example, block-level deduplication (regardless of compression, since the checksum is on the original data) becomes pretty easy.
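      The dedup idea can be sketched the same way. If blocks are keyed by a strong hash, an identical block is stored only once and simply gains another reference. This Python sketch uses SHA-256, the hash ZFS dedup relies on. It also assumes a fixed 4 KiB block size, which is a simplification because ZFS dedups per record. All names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size for the sketch


class DedupStore:
    """Toy block-level dedup (hypothetical): blocks are stored once, keyed by checksum."""

    def __init__(self):
        self.blocks = {}      # checksum -> block data (stored once)
        self.refcount = {}    # checksum -> number of references

    def write_file(self, data: bytes):
        """Split data into blocks and return the list of checksums ("block pointers")."""
        pointers = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            key = hashlib.sha256(block).digest()
            if key not in self.blocks:
                self.blocks[key] = block       # first occurrence: actually store it
            self.refcount[key] = self.refcount.get(key, 0) + 1
            pointers.append(key)
        return pointers

    def read_file(self, pointers):
        return b"".join(self.blocks[key] for key in pointers)


store = DedupStore()
image = b"\x00" * BLOCK_SIZE * 3 + b"kernel" * 1000
first = store.write_file(image)
second = store.write_file(image)     # an identical copy costs no new blocks
print(len(store.blocks))             # 3 unique blocks for 10 block references
```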

    9. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by Lennie · · Score: 1

      By that time FS-Cache (general "L2ARC" for Linux: http://lwn.net/Articles/312708/ ) will be in the kernel. And maybe btrfs will be considered stable.

      Some say the b-tree variant in btrfs is smarter than the one that was used in ZFS.

      I guess time will tell.

      --
      New things are always on the horizon
    10. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by phoenix_rizzen · · Score: 1

      You forgot to mention that you need to set that up in advance, as it doesn't do that by default, and coming up with settings for any given system is horribly time consuming compared to just having a regular incremental backup running every X hours.

      ZFS does checksumming by default. It's one of the main reasons-for-being for ZFS.

      You can manually turn it off, though.

    11. Re:Checksums - 1 feature ZFS has that Ext4 doesn't by phoenix_rizzen · · Score: 1

      On a single hard drive, you can set "copies=2" (or higher) and get multiple copies of data blocks stored in different parts of the disk. Thus, if a checksum shows corrupt data on a read, ZFS will read one of the extra copies and write the correct data over the bad data. (Btrfs has this feature as well.)

      Without checksumming, how will you know that data you wrote last month is still valid and correct? Especially if the disk reads it and passes it up the chain to the filesystem? A bit flip here and there can do wonders to corrupt large multi-MB files. And there's no fsck in the world that can help to fix that.

      It's all a trade-off between the features you find important. Some want raw throughput without caring about data corruption. Others want data safety above all else and can tolerate slow I/O. Others want a nice balance between them. Non-ZFS filesystems give you speed at the risk of data corruption with very few options for improving data safety. ZFS gives you options for increasing the raw I/O speed, without ever sacrificing data safety. AND, it gives you the options for removing data safety (non-redundant vdevs aka RAID0; disable checksums, disable scrubs, etc).

      ext*, xfs, jfs, ufs, ntfs, hfs, etc. are great, well-performing filesystems. But they are a pain to manage when you get over 8 physical disks and over 10 GB of storage, especially if you need snapshots, compression, or dedupe, which none of these FSes support, so you end up adding more and more layers to manage.

      For desktops, laptops, palmtops, etc, ZFS may be overkill.

      But for servers, there's really nothing like it available for OSS.

  12. Not bad news by wonkavader · · Score: 4, Interesting

    It's still under development. But it's already pretty competitive, doing reasonably well in many tests.

    And then there's this (on the last page) "Ending out our tests we had the PostMark test where the performance of the ZFS Linux kernel module done by KQ Infotech and the Lawrence Livermore National Laboratories was slaughtered. The disk transaction performance for ZFS on this native Linux kernel module was even worse than using ZFS-FUSE and was almost at half the speed of this test when run under the OpenSolaris-based OpenIndiana distribution."

    Ok, maybe someone can disabuse me of a misconception that I have, but: There's no reason that ZFS in the kernel should be slower than a FUSE version. That means there's something wrong. If they figure out what's wrong and fix it, that could very likely affect the results in some or all of the other tests.

    ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides. And performance might get somewhat better. This article is good news (though that final benchmark is distressing, especially when you look at the ZFS running on OpenSolaris).

    It says: "When KQ Infotech releases these ZFS packages to the public in January and rebases them against a later version of ZFS/Zpool, we will publish more benchmarks."

    and I'm looking forward to that new article.

    1. Re:Not bad news by Anonymous Coward · · Score: 0

      ZFS in the kernel ISN'T slower than the FUSE extension. RTFS.

    2. Re:Not bad news by Anonymous Coward · · Score: 0

      ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides

      Seeing the actual numbers they provide, that is exactly what I think. I'll take checksumming and fast snapshots any day for only doing 150MB/s instead of 180MB/s in the IOZone 4K write performance test. Hell, I usually get less with my ext4 anyway. Or how about the 64K write performance: 173 instead of 191 for ext4. They even beat ext4 in the read performance tests. Oh, and random write: 146 MB/s as opposed to 88.3 for ext4. I mean, wow!
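      For a quick sense of scale, here is how the figures quoted above work out. ext4 is the baseline, and the numbers are taken from the comment, not re-measured.

```python
def relative_change(zfs, ext4):
    """Percent difference of the ZFS figure relative to the ext4 baseline."""
    return (zfs / ext4 - 1) * 100


# (ZFS, ext4) throughput in MB/s, as quoted in the comment.
results = {
    "IOZone 4K write": (150, 180),
    "IOZone 64K write": (173, 191),
    "random write": (146, 88.3),
}

for name, (zfs, ext4) in results.items():
    print(f"{name}: ZFS is {relative_change(zfs, ext4):+.1f}% vs ext4")
# IOZone 4K write: ZFS is -16.7% vs ext4
# IOZone 64K write: ZFS is -9.4% vs ext4
# random write: ZFS is +65.3% vs ext4
```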

      For me btrfs doesn't count at this moment, as it is not a production ready, stable file system. Heck, I even just converted to ext4 from reiserfs, because I wanted to wait for them to iron out all the bugs. Sadly that's a reason for me not to jump ship to that particular ZFS implementation just yet and I don't have a spare machine for Solaris to be a file server.

  13. Why I use ZFS/Solaris in production for Postgres by Pengo · · Score: 1

    The throughput for large data sorts are just faster, period.

    A lot of it has to do with reading compressed data, the huge RAM buffer (the ARC) that ZFS uses, optional commit on writes, and block sizes that match the database pages.

    The system scans 3 megs of index data, but what it actually reads off disk to get that is, say, 1 meg, which it decompresses on the fly on one of the many cores the database server has. In the end the throughput destroys what I get running uncompressed volumes on EXT4 or XFS on Linux. For "MY" database, it runs nearly 2-3x faster than on the same hardware running Linux (RHEL5 is what I ran the DB on for a long time).
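    The arithmetic behind that is simple. Assume the disk delivers compressed bytes and spare cores keep up with decompression. Then logical throughput is the disk rate times the compression ratio, so 3 MB of index read as 1 MB off disk is a 3:1 gain. In the sketch below, the 100 MB/s disk rate and the sample page are made-up assumptions. zlib implements the same DEFLATE algorithm that ZFS's gzip-N compression levels use.

```python
import zlib


def effective_read_mb_s(disk_mb_s, compression_ratio):
    """Logical (uncompressed) MB/s when the disk streams compressed data.

    Assumes decompression on spare cores keeps up, so the disk is the only bottleneck.
    """
    return disk_mb_s * compression_ratio


# The comment's example: 3 MB of index read as ~1 MB from disk is a 3:1 ratio.
print(effective_read_mb_s(100, 3.0))   # 300.0 (the 100 MB/s disk rate is made up)

# Measuring a ratio on hypothetical, highly repetitive sample data:
page = b"customer_id=000123;status=active;" * 250
ratio = len(page) / len(zlib.compress(page, 6))
print(f"sample page compresses about {ratio:.0f}:1")
```

    Real table and index pages compress far less than this artificial sample, so it is worth measuring the ratio on actual data before counting on a gain like the one described above.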

    I have not been able to get Linux/Postgres to run even partially as fast as I have been able to get Solaris/ZFS running Postgres 8.3.

    Btrfs isn't even near production state yet, but I am really hoping that it will give me an option to get off of Solaris.

    On that note, one thing i haven't tried yet with our DB is Solid State Drives. The sheer throughput might more than make up for the benefits i get on compressed ZFS volumes.

    I for one am VERY VERY hopeful that BTRFS can get stable, and fast. Oracle's fiasco has me and a few other people at our small business very nervous. I'm not planning on replacing our Sol10 (free) distribution, and couldn't care less about the support Oracle offers. I'm playing with Solaris 11 Express now, but I'm not sure I want to pay the $1k a year for production use, though if it offers me the performance gains over Linux that I'm currently seeing, it will probably be worth it for our database system alone.

    Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS ? We're not a huge shop, 8 people running large data-warehouse type applications. We run on a shoestring and don't have a bunch of money to throw at the problem and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.

  14. Apples vs Oranges by Anonymous Coward · · Score: 0

    Holy crap, they only tested on a single SSD. This is akin to running DOS on a 64 core system.

    ZFS is a full volume management/file system with the ability to partition filesystems (on the fly, I might add), assign different attributes (compression, deduping, max size, caching, data replication, etc.), add space on the fly, set storage/redundancy parameters (RAID 0, 1, 10 and its own raidz1, raidz2), assign spares, replace disks live, etc., etc.

    Try throwing 48 x 3TB disks into a chassis and configuring them (use SSDs for the ZIL/ARC, and set up the zpools correctly). Try doing the same thing with BTRFS. Actually, you can't.

    I love the fact that a native Linux driver is coming, but this review is completely useless.

  15. Re:Why I use ZFS/Solaris in production for Postgre by jimicus · · Score: 1

    Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS ? We're not a huge shop, 8 people running large data-warehouse type applications. We run on a shoestring and don't have a bunch of money to throw at the problem and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.

    What have you done so far and how are you using Postgres? Mostly reads, mostly writes or some combination of the two? Postgres as it ships is notorious for slow configuration, and many Linux distributions are consistently one major version behind the curve (which is a little annoying as much of the focus of the Postgres people for some time has been improving performance).

  16. Re:Why I use ZFS/Solaris in production for Postgre by Anonymous Coward · · Score: 0

    So, you're not a fan of Oracle I take it? Have a look at who's developing btrfs.

  17. It's just not yet there by Artem+S.+Tashkinov · · Score: 1

    When XFS was introduced in Linux it also sucked performance-wise, so I think there's certainly room for improvement for ZFS on Linux.

    And even at this early stage ZFS shows very remarkable results, so let's just wait and see.

  18. For ZFS, speed is a secondary goal by pedantic+bore · · Score: 3, Insightful

    Picking on ZFS for being slow when ported to a different OS and running on atypical hardware is like criticizing Stephen Hawking for being a poor juggler. It's focussing on the wrong thing. The goals of ZFS are, in no particular order:
    - Scalability to enormous numbers of devices
    - Highly assured data integrity via checksumming
    - Fault tolerance via redundancy
    - Manageability/usability features (i.e., snapshots) that conventional file systems simply don't have
    Oh, and if it's fast, well, that's gravy.

    --
    Am I part of the core demographic for Swedish Fish?
    1. Re:For ZFS, speed is a secondary goal by guruevi · · Score: 1

      Speed can be achieved with more/better hardware. A filesystem shouldn't have 'fast' or 'faster than ye' as its primary focus anyway. If it's very fast but not 100% trustworthy, it's not a good file system (e.g. ReiserFS).

      Some features that make ZFS a bit slower were thought up by people with years of experience in large SANs and other storage solutions. Writing metadata multiple times over different spindles might seem like overkill to most, until you lose N+1 spindles (or just get r/w errors on the N+1th spindle while the others are recovering). In a typical setup that means the whole file system is hosed, but ZFS can sometimes recover a lot of it and will be able to tell you which files it could not fix, which is nice when your system has many TBs and takes days to restore from a full backup.

      ZFS makes up the speed in other ways. It lets frequently accessed data live in memory or on faster disks (like SSDs), has small sync writes hit an intent log, and optimizes async writes before putting them on disk. Give ZFS more memory than your average desktop has and you'll be a lot faster in reads; give it a small SLC SSD and see it hit 10k stable write IOPS.
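      Here is a greatly simplified model of the intent-log idea, written in Python with hypothetical names. Sync writes are recorded on the fast log before returning. Every write is also batched in memory (a transaction group, in ZFS terms). Batches land on the main pool periodically, and after a crash only the log is replayed.

```python
class ToyPool:
    """Toy model of an intent log plus batched commits (hypothetical names)."""

    def __init__(self):
        self.main_pool = {}    # durable, slow storage, written in batches
        self.intent_log = []   # durable, fast log device (e.g. an SLC SSD)
        self.pending = {}      # in-memory batch, lost on a crash

    def write(self, key, value, sync=False):
        self.pending[key] = value
        if sync:
            # A sync write must be durable before returning, so it goes to the
            # fast log immediately instead of waiting for the next batch.
            self.intent_log.append((key, value))

    def commit_batch(self):
        # Periodic flush: all pending writes land on the main pool together,
        # after which the log records they cover are no longer needed.
        self.main_pool.update(self.pending)
        self.pending.clear()
        self.intent_log.clear()

    def crash_and_recover(self):
        self.pending.clear()                   # RAM contents are gone
        for key, value in self.intent_log:     # replay what sync writers were promised
            self.main_pool[key] = value
        self.intent_log.clear()


pool = ToyPool()
pool.write("db/wal-1", "COMMIT 42", sync=True)
pool.write("tmp/cache", "scratch")             # async: acceptable to lose on a crash
pool.crash_and_recover()
print(pool.main_pool)                          # {'db/wal-1': 'COMMIT 42'}
```

      The log is only read during recovery. In normal operation it just absorbs the latency of sync writes, which is why a small, fast SSD helps so much there.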

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    2. Re:For ZFS, speed is a secondary goal by Ash-Fox · · Score: 1

      Picking on ZFS for being slow when ported to a different OS and running on atypical hardware

      How is he picking? He's just measuring the file system performance compared to others on a specific OS.

      It's focussing on the wrong thing.

      I don't think it is, this person wanted to measure performance on Linux, not compare features and he got what he was testing. I would imagine there are plenty of people who want to know how well it performs - regardless of features - in comparison to other filesystems.

      --
      Change is certain; progress is not obligatory.
    3. Re:For ZFS, speed is a secondary goal by jd · · Score: 1

      Damn. I was going to ask Stephen Hawking if he wanted to join a juggler's association.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    4. Re:For ZFS, speed is a secondary goal by pedantic+bore · · Score: 1

      Damn. I was going to ask Stephen Hawking if he wanted to join a juggler's association.

      I didn't say that he doesn't enjoy it.

      --
      Am I part of the core demographic for Swedish Fish?
    5. Re:For ZFS, speed is a secondary goal by tyen · · Score: 1

      I would imagine there are plenty of people who want to know how well it performs - regardless of features - in comparison to other filesystems.

      These people are arguably not the population ZFS' requirements are intended to satisfy. Looking to filesystem performance to solve large data set performance issues unnecessarily narrows the scope of the architecture design problem space. Anyone with small data sets would likely be using ext[34], anyone with large data sets, frequently spread across multiple servers or server clusters, would likely be treating filesystem performance as one among a multitude of design points.

      Until btrfs is production-grade in business-critical environments, there is literally no other extant filesystem that can fit the same requirements footprint as ZFS. IMHO IBM (especially the storage business unit) missed a golden opportunity by letting Sun go to Oracle, and their services and DB2 arms are really going to regret it in 3-5 years.

    6. Re:For ZFS, speed is a secondary goal by Ash-Fox · · Score: 1

      These people are arguably not the population ZFS' requirements are intended to satisfy.

      I'm not arguing that. I still feel it's relatively useful knowledge for people who are considering ZFS for reasons other than its specific features. Sometimes you use a specific filesystem because the rest of your organisation's systems use it. I personally find that benchmarking for a specific task, to determine whether or not you should introduce more filesystem homogeneity into your environment, can be beneficial if it reduces maintenance/costs long term. Or you can do the reverse and try to determine whether it's worth switching to filesystem X for some particular feature that others don't support, while still requiring a certain level of performance on some tasks.

      --
      Change is certain; progress is not obligatory.
  19. Let me put it this way. by jotaeleemeese · · Score: 1, Flamebait

    If you need to ask most likely you would not care about them.

    Mentioning ZFS and Ext4 in the same breath says it all about your expertise in this field, really.

    --
    IANAL but write like a drunk one.
    1. Re:Let me put it this way. by Anonymous Coward · · Score: 0

      You managed to be an arrogant prick, yet add absolutely no information to the topic at hand. Congratulations, you are promoted to Troll Level 2.

  20. Benchmarking ZFS on a single disk is misleading by Guy+Smiley · · Score: 1

    Since ZFS is doing metadata replication, running the tests on a single disk is going to punish ZFS performance much more than other filesystems. It would be much more interesting to run a benchmark with an array of 6 or 8 disks with RAID-Z2, with ZFS managing the disks directly, and XFS/btrfs/ext4 running on MD RAID-6 + LVM. Next, run a test that creates a snapshot in the middle of running some long benchmark and see what the performance difference is before/after.

    1. Re:Benchmarking ZFS on a single disk is misleading by Mongo · · Score: 1

      I also don't see any mention of an ARC; I don't know if they are even looking at porting that part. But a single-disk setup is absolutely not ZFS's strong point.

      With a $5000 Super Micro JBOD, here are some quick numbers from a Solaris 11 Express system I set up this week to test the performance.

      We are testing it for COMSTAR and as a Fibre Channel target. With a $10K head and a $5K JBOD it runs rings around $100K commercial arrays.

      I will have better stats on its speed from a client perspective, but I only have one 4GB HBA in the initiator system, and it fills that pipe quite well on non-sync writes.

      Writes

      root@uszfs002:/tank/test# dd if=/dev/zero of=/tank/test/testfile bs=4k count=20000k
      20480000+0 records in
      20480000+0 records out
      83886080000 bytes (84 GB) copied, 126.938 s, 661 MB/s

      Reads.

      This is using 22 1TB SATA drives in 11 striped mirrors.
      root@uszfs002:/tank/test# dd of=/dev/zero if=/tank/test/testfile bs=4k count=20000k
      20480000+0 records in
      20480000+0 records out
      83886080000 bytes (84 GB) copied, 83.5461 s, 1.0 GB/s

      /usr/benchmarks/iozone/iozone -Rab /root/20diskraid10.wks

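      The top row is record sizes and the left column is file sizes, both in KB; values are throughput in KB/s (a 0 means iozone's auto mode skipped that combination).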
      Writer Report
      4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384
      64 336889 610350 614542 1066042 2052169
      128 240124 424133 805138 1111981 1487977 3759450
      256 424604 765970 1084800 1790149 2170027 2911402 4332998
      512 602243 986529 1216149 1910902 1718254 5227492 4784885 5115422
      1024 632467 1132697 1762539 2353657 2547723 5950309 5169641 5120336 5630486
      2048 742829 1311279 1775951 2560486 3074635 5593112 5119743 5704543 5884299 5832361
      4096 784977 1364044 2008682 2592481 3105653 5587302 5603704 5713661 5447345 5996870 6005255
      8192 786033 1370125 2009834 2604824 3062418 6145757 6077276 6136976 5935548 6072979 6054786 5748846
      16384 799251 1392949 2034757 2540197 3108380 6243929 6030242 5966368 6127020 6030242 6136321 5853544 4599757
      32768 0 0 0 0 2399347 6305173 6054631 6063446 6132982 6177918 6015674 6097339 4718317
      65536 0 0 0 0 2447387 6050207 6173312 6126531 6169294 6171649 6122028 6067569 4736595
      131072 0 0 0 0 2428112 6348527 6187891 6162297 5912382 6102376 6179475 6079970 4795212
      262144 0 0 0 0 2421742 6297144 6170555 6108302 6157079 6174610 6182491 6154357 4832682
      524288 0 0 0 0 1721533 6321524 6185705 6201142 6184905 6189866 6174416 6130026 4948176
      Re-writer Report
      4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384
      64 842003 780776 1124074 1599680 1734015
      128 876086 1346196 1504659 2035099 2511022 2558895
      256 861885 1437779 1907837 3705040 3879045 3285566 3823789
      512 958781 1500891 2178404 2975154 3849877 4690818 4610256 4784885
      1024 972414 1662264 2681328 3483896 4356809 5219904 5120336 5472650 5417427
      2048 963342 1639674 2752473 3799501 4720248 4839929 5489457 5345969 5549749 5447681
      4096 993181 1706585 2773270 3821310 5069594 5711761 5664678 5634950 5535098 5519094 5506711
      8192 1003414 1794447 2905979 4059129 5000831 5806161 5721087 5661696 5713476 5572627 5821902 5826839
      16384 1003060 1814193 2967571 4046414 5278477 6104162 5997088 6003375 5771445 5923681 5981949 6054683 4720131
      32768 0
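      As a sanity check on the dd figures, the rates follow directly from the record count and the elapsed time. dd reports decimal units, where 1 MB = 10^6 bytes. iozone's reports default to kilobytes per second, so a Writer Report entry around 6,000,000 corresponds to roughly 6 GB/s.

```python
def dd_rate(bytes_copied, seconds):
    """Bytes per second, which dd then prints in decimal units (1 MB = 10**6 bytes)."""
    return bytes_copied / seconds


total = 20480000 * 4096                           # records x block size (bs=4k)
print(total == 83886080000)                       # True
print(round(dd_rate(total, 126.938) / 1e6))       # 661 (MB/s, write run)
print(round(dd_rate(total, 83.5461) / 1e9, 1))    # 1.0 (GB/s, read run)
```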

    2. Re:Benchmarking ZFS on a single disk is misleading by Guy+Smiley · · Score: 1

      Any chance you could run the same tests on this hardware under Linux with ext4 and XFS on software MD RAID? An added bonus would be to compare the Solaris ZFS performance with the Linux ZFS performance.

    3. Re:Benchmarking ZFS on a single disk is misleading by phoenix_rizzen · · Score: 1

      Exactly. Benchmarking ZFS on a single disk is (almost) pointless. While there are those who use ZFS on laptops and desktops for access to the extra data protection and snapshotting features, that's not the target audience.

      Let's see some real, enterprisey benchmarks of ZFS managing multiple disks, vs Linux MD + LVM + Ext4/XFS/JFS/Btrfs. With and without L2ARC/FSCache, with and without ZIL, with snapshots being taken in the middle of the benchmark run, with snapshots being deleted in the middle of the benchmark run, with drives being pulled in the middle of the benchmark run, etc.

      There's a *lot* more to a filesystem than raw throughput. So let's test/benchmark those things.

  21. Re:Why I use ZFS/Solaris in production for Postgre by Ash-Fox · · Score: 1

    Have a look at who's developing btrfs.

    For the curious, it's a single person called Chris Mason who happened to work also on ReiserFS (the killer filesystem).

    --
    Change is certain; progress is not obligatory.
  22. Bad Name by Anonymous Coward · · Score: 0

    btrfs naturally rolls off the tongue as "bit rot filesystem".

  23. Speed? by yoshi_mon · · Score: 1

    I don't want speed from ZFS; I will get that via hardware.

    I want the tech in ZFS to give me everything that it does.

    Why judge a NASCAR car on its performance when it runs on a rally track? (I am a bit of a car geek, so I think that is a pretty good /. car analogy! ;)

    --

    Really, I know what I'm doing...Ohhhh, look at the shiny buttons!
  24. Re:Why I use ZFS/Solaris in production for Postgre by Pengo · · Score: 1

    It's a good blend of both reads and writes.

    We have tables with as many as 100M records. Where Solaris/ZFS seemed to help massively was the big reads for reporting. We have indexed pretty aggressively, even going so far as to index statements, and have managed to get amazing performance considering the concurrency we see, from a free database. (Which, for the record, has never given us any problems... Postgres has been rock-solid.)

    For the most part it was running "ok" on Linux, but the bump we got when testing Solaris with ZFS on identical hardware and similar configs was nothing short of amazing.

    One of the big differences between the two configs: we disabled the RAID controller (a Dell PERC 6/i) to run JBOD instead of RAID 1+0. I've not tried a similar striped configuration on Linux, even without compression. To be fair to Linux, I really need to set up and test a similar config to make sure my results were not hardware-related.

    A friend told me that where Solaris and ZFS really give a big performance bump is that they don't have to read every byte from the disk: ZFS reads a compressed block and decompresses it on the fly, which, if you have CPU cycles to spare, makes the I/O transfers a lot quicker (at times 2-3x faster than a raw read of uncompressed data).
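    The friend's argument can be sketched with a simple throughput model. This is an illustrative assumption, not a measurement of ZFS: it assumes disk reads and decompression overlap perfectly, so the application sees whichever is slower, the disk bandwidth multiplied by the compression ratio or the CPU's decompression rate. The function name and the numbers in the example are hypothetical.

```python
def effective_read_throughput(disk_mb_s: float, compress_ratio: float,
                              decompress_mb_s: float) -> float:
    """Logical (uncompressed) MB/s delivered to the application.

    Hypothetical model: assumes I/O and decompression are fully pipelined.
    """
    # Each physical MB read off the disk expands to compress_ratio logical MB.
    io_limited = disk_mb_s * compress_ratio
    # The CPU must decompress at least that fast, or it becomes the bottleneck.
    return min(io_limited, decompress_mb_s)


# Hypothetical disk at 100 MB/s, data compressing 2.5:1, CPU decompressing 1000 MB/s.
print(effective_read_throughput(100, 2.5, 1000))  # → 250.0
```

    Under this model the speedup only materializes when the data compresses well and the CPU has headroom; with incompressible data (ratio 1.0) or a slow CPU, the benefit disappears.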

    I'm guessing that we could probably get similar results with Linux on XFS or ext4 using solid state drives, which are now a little more affordable than they were years ago.

    Again, we're not a large shop with lots of money to throw around at the project, we're a startup just trying to get by in a brutal economy. :)

    You're right, though, about the default configuration. I've gone through and tuned the work memory and index cache, and tuned the memory settings to match my hardware (currently 32 GB, an array of 8 disks, on an 8-core Xeon server)...

  25. Different consistency guarantees... by KonoWatakushi · · Score: 1

    The consistency guarantees provided by the tested filesystems differ significantly. Most (all?) aside from ZFS only journal metadata by default. All data and metadata written to ZFS is always consistent on disk. You won't notice the difference until you crash, and even then you still might not, but it will certainly show up in the benchmarks.

    ZFS is not a lightweight filesystem; that is a fact. The 128-bit addresses, 256-bit checksums, compression, and two- or three-way replicated metadata don't come for free. Another thing that may be working against ZFS on a flash-based SSD is the page size. By default, ZFS uses a minimum block size of 512 bytes for data, whereas most other filesystems use 4k, which matches the SSD page size. It would be interesting to create the ZFS pool with a 4k allocation size (ashift=12) and see how that affects the results.

    The benchmarks aside, it is the feature set that really sells it. The performance is good, the administrative interface is excellent, and it does a fine job of returning your data in an error-free state. At the end of the day, that is what really matters.

    Even so, I look forward to more numbers when stable releases can be compared. It would also be nice to include DragonFly BSD's HAMMER filesystem, to round out the modern set.
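    Why a 512-byte write can hurt on a device with 4k pages can be sketched as a write-amplification calculation. This is a simplified, hypothetical model: it assumes a fixed 4096-byte flash page that must be rewritten whole for any partial update, and it ignores the SSD's flash translation layer, write caching, and erase-block behavior.

```python
PAGE = 4096  # assumed flash page size in bytes (real devices vary)


def pages_touched(offset: int, length: int, page: int = PAGE) -> int:
    """Number of whole pages a write of `length` bytes at `offset` spans."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // page
    last = (offset + length - 1) // page
    return last - first + 1


def write_amplification(offset: int, length: int, page: int = PAGE) -> float:
    """Bytes physically programmed divided by bytes the filesystem asked to write."""
    return pages_touched(offset, length, page) * page / length


print(write_amplification(0, 512))     # → 8.0  (512-byte block fills part of one page)
print(write_amplification(0, 4096))    # → 1.0  (aligned 4k block)
print(write_amplification(512, 4096))  # → 2.0  (misaligned 4k block straddles two pages)
```

    The misaligned case is why allocation size and alignment both matter: matching the block size to the page size only helps if the blocks also start on page boundaries.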

  26. Then How ZFS (was: Then Why ZFS?) by davecb · · Score: 1

    If the licenses are incompatible, then why even port it? Academic interest?

    --dave

    --
    davecb@spamcop.net
    1. Re:Then How ZFS (was: Then Why ZFS?) by Lennie · · Score: 1

      You can use it just fine. It just can't be included in the kernel.

      If I understand it correctly, it is the same situation we had with qmail before it became public domain.

      When you are installing a new server (or have a central repository), you would have to compile it from source, because no one can bundle the two for distribution.

      But I'm not a lawyer.

      --
      New things are always on the horizon
  27. Re:Why I use ZFS/Solaris in production for Postgre by Guspaz · · Score: 1

    In that Oracle is developing btrfs and Chris Mason is the one that Oracle has doing the work, yes.

  28. Re:Why I use ZFS/Solaris in production for Postgre by Anonymous Coward · · Score: 0

    I believe one of the FreeBSD kernel folks runs Postgres 9 on FreeBSD 8 with ZFS and has had similar performance gains.

    Or perhaps it was Ivan Voras (see his Arrow of Time blog).

  29. Re:Why I use ZFS/Solaris in production for Postgre by Anonymous Coward · · Score: 0

    ...who happened to work also on ReiserFS (the killer filesystem).

    I see what you did there.

  30. Re:Why I use ZFS/Solaris in production for Postgre by Ash-Fox · · Score: 1

    In that Oracle is developing

    Honestly, I didn't think "Oracle" was a suitable response, since this was a "who" question rather than a "which company" question.

    --
    Change is certain; progress is not obligatory.
  31. Misleading... by CondeZer0 · · Score: 1

    > the Solaris file-system in most areas is not nearly as fast as EXT4, Btrfs, or XFS."

    If you look at the benchmarks, it is not just "not nearly as fast"; it is a few orders of magnitude slower in most benchmarks!

    ZFS is more evidence that complexity and mountains of hacks don't make anything better or faster, no matter how smart the developers are (or claim to be) and how many buzzwords they manage to hit.

    --
    "When in doubt, use brute force." Ken Thompson
  32. Re:Why I use ZFS/Solaris in production for Postgre by Guspaz · · Score: 1

    I don't agree. While he might be the person doing the work, he's still doing the work in an official capacity. Oracle has the final say on everything he does, and can decide it wants things done differently, or assign him to something else entirely. Ultimately, the entity responsible for the development is not Chris Mason, but Oracle.

  33. ZFS performance across operating systems by Xenophon+Fenderson, · · Score: 1

    I'd like to know how ZFS performs on Solaris, OpenSolaris, FreeBSD, Linux, FUSE, etc. Comparing the Linux and FUSE ports to one another is pretty useless for estimating the progress of these ports compared to the "native" versions.

    --
    I'm proud of my Northern Tibetian Heritage
  34. Re:Why I use ZFS/Solaris in production for Postgre by phoenix_rizzen · · Score: 1

    Have you considered moving to FreeBSD, which already supports ZFS natively? At the very least, it would be a useful stepping-stone/stop-over to get you off Solaris, and save you some licensing fees.