Slashdot Mirror


Making ZFS and DTrace Work On Ubuntu Linux

New submitter Liberum Vir writes "Many of the people that I talk with who use Solaris-like systems mention ZFS and DTrace as the reasons they simply cannot move to Linux. So, I set out to discover how to make these two technologies work on the latest LTS release of Ubuntu. It turned out to be much easier than I expected. The ports of these technologies have come a long way. If you or someone you know is addicted to a Solaris-like system because of ZFS and DTrace, please, inquire within."

38 of 137 comments

  1. ZFS on Linux by dnaumov · · Score: 3, Interesting

    So what am I supposed to do about all the kernel panics and absurdly slow IO and transfer speeds?

    1. Re:ZFS on Linux by SolitaryMan · · Score: 3, Insightful

      I used to have very high expectations of OpenSolaris after Ian Murdock became the head of the project... But then Oracle came and destroyed all my hopes.

      BTW, is ZFS SSD-aware?

      --
      May Peace Prevail On Earth
    2. Re:ZFS on Linux by stox · · Score: 4, Informative

      DTrace and ZFS are quite mature running under FreeBSD.

      --
      "To those who are overly cautious, everything is impossible. "
    3. Re:ZFS on Linux by Liberum Vir · · Score: 5, Informative

      I haven't done any performance testing so far. My objective with this was just as a proof of concept, if you will. I'm sure, if you are having kernel panics and absurdly slow IO/transfer speeds, the developers would welcome your input to make it better. Personally, I prefer LVM and ext4 for most uses. Again, this was more just to prove that it could be done.

    4. Re:ZFS on Linux by frodo from middle ea · · Score: 4, Interesting

      +1. After having used Linux daily for more than 14 years, I have recently ventured into BSD land, and I like it a lot.

      If you've been playing in Linux land, and never bothered with any of the *BSD, do yourself a favor and install one of the BSDs in a VM. You'll not be disappointed.

      --
      for the last time people, I am "frodo from middle eaRTH", not "middle eaST".
    5. Re:ZFS on Linux by greg1104 · · Score: 2

      Yes, no one would suggest ZFS as the filesystem for your phone. You might have to do things like disable checksums if you have an older or otherwise underpowered CPU, and it's tuned by default to use memory quite heavily. That anecdote isn't very relevant for today's desktop or server environments though.

    6. Re:ZFS on Linux by catmistake · · Score: 4, Informative

      I used to have very high expectations of OpenSolaris after Ian Murdock became the head of the project... But then Oracle came and destroyed all my hopes.

      Good news! Your high expectations and hopes are alive and well at the [open] crossroads of America. They're also welcome at freenode on #openindiana.

    7. Re:ZFS on Linux by MightyYar · · Score: 4, Informative

      I have to second this... Debian was always my preference but I tried FreeBSD to get ZFS. For dependencies, ports does some things... differently than APT, but they are similar enough that it won't completely shock your system.

      And just like Debian, it is easy to start with an extremely minimal system and only add what you need, so stability and boot speeds are excellent.

      I think that Debian is still faster at certain things, though that is subjective.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    8. Re:ZFS on Linux by MightyYar · · Score: 2

      This article uses the kernel module, not the userland FUSE stuff.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    9. Re:ZFS on Linux by Damouze · · Score: 2

      I am terribly disappointed with ext4. If you want a good and stable combination of volume management and a filesystem, I'd recommend LVM (which of all the software solutions I've worked with is by far the best), combined with XFS, which is a stable and reliable journalling filesystem. Where ext4 fails just as terribly as its predecessor, ext3, as far as disk performance goes (kjournald, big kernel lock and the like), XFS just ploughs on and on.

      As for Solaris and ZFS: at the company I work for we still use UFS for everything. ZFS has its niceties, but it does one thing many applications, especially DBMSes, won't like: it does not support direct I/O (http://blogs.sybase.com/database/2009/07/directio-file-system-devices/). Even if you go for the zpool, zvol-as-raw-disk-device option you'll end up terribly disappointed, because although it claims to support direct I/O, you will not bypass the primary cache. Also, using ZFS on a SAN LUN does not really make any sense. It wants to do things with your disk devices that you don't want it to do with them. So, while as a concept ZFS is marvelous, quite brilliant actually, its practical application is not well-suited for a high-performance, high-load data warehouse. Why? Because like another software company whose name I will not mention, Sun designed ZFS to be a little too smart for its own good.

      --
      And on the Eighth Day, Man created God.
    10. Re:ZFS on Linux by unixisc · · Score: 2

      One funny thing about OpenIndiana - it's fully supported on x86/x64, but not on Sun's own SPARC servers. So ironically, if one wants ZFS and DTrace support on SPARC, one is probably better off going w/ FreeBSD 9, rather than OpenIndiana.

    11. Re:ZFS on Linux by TheRaven64 · · Score: 3, Informative

      If you want to try FreeBSD with ZFS, I recommend that you use the PC-BSD installer. This can set up a complete FreeBSD environment (with or without the extra PC-BSD stuff - I think the 'server' install is vanilla FreeBSD) on a ZFS root. Doing the same with the current version of the FreeBSD installer requires some manual intervention, which is not really fun for people who aren't experienced with FreeBSD. Or for anyone else, for that matter.

      --
      I am TheRaven on Soylent News
    12. Re:ZFS on Linux by bheading · · Score: 3, Insightful

      I agree that ZFS on a SAN doesn't make sense, but that seems to me to have been the intention; ZFS wasn't aiming to work with your SAN, it was aiming to replace it, and I'm sure that had the guys at Sun remained in control, SAN features would have been added to it. That's why NetApp took them to court.

    13. Re:ZFS on Linux by Lennie · · Score: 2

      Interestingly enough, btrfs works really well on a phone.

      --
      New things are always on the horizon
    14. Re:ZFS on Linux by hawguy · · Score: 2

      Dude, you're doing all that...on an Atom? Doesn't it drag ass? if it were me I'd replace that sucky Atom with a cheap Phenom X4e, those support ECC and can be had for $62. Figure in $30 for an AM2+ board and $20 for a 2Gb RAM stick and for less than $115 you'd have a machine that would be a HELL of a lot faster than an Atom at multitasking.

      It runs surprisingly well, I get around 15MB/second write speeds (and over 30MB/second read) which is more than I need for what I use it for. About the only time I notice it being slow is after I've ripped a movie from DVD and am copying it over to the fileserver. Most of the time I access it via Wifi so the disk is faster than the network. It's used only as a headless fileserver, no windowing system is installed so I don't need to worry about interactive performance.

      I thought adding the webcams and zoneminder would push it over the edge, but even with doing motion detection on the 3 cams, the CPU hovers around 30% utilization, so I really have no complaints with the performance.

      Not too bad for 35 watts of power (including the UPS). The TDP of the Phenom is 95W, and the motherboard is probably not all that power efficient either, so I'd probably at least double my power consumption if I went with a faster CPU. The Atom costs me around $50/year in electricity to run, so if I doubled the power consumption, it'd cost me around $100/year.
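
      As a rough check of the power arithmetic above, here is a minimal sketch. It assumes the box draws a constant 35 W around the clock; the electricity rate is not stated in the comment, so it is derived from the quoted $50/year figure, and the function names are purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_kwh(watts: float) -> float:
    """Energy used in a year by a device drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000.0


def annual_cost(watts: float, rate_per_kwh: float) -> float:
    """Yearly electricity cost at a flat per-kWh rate."""
    return annual_kwh(watts) * rate_per_kwh


atom_watts = 35.0   # figure quoted in the comment (includes the UPS)
atom_cost = 50.0    # dollars per year, as quoted

# Assumption: back out the per-kWh rate from the two quoted figures.
implied_rate = atom_cost / annual_kwh(atom_watts)

# Doubling the draw doubles the cost at the same rate.
doubled_cost = annual_cost(2 * atom_watts, implied_rate)

print(f"energy per year at 35 W: {annual_kwh(atom_watts):.1f} kWh")
print(f"implied rate: ${implied_rate:.3f}/kWh")
print(f"cost at 70 W: ${doubled_cost:.2f}/year")
```

      Under those assumptions the quoted numbers hang together: the implied rate lands at roughly 16 cents per kWh, and doubling the draw doubles the bill.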

  2. Issue is not implementation by Anonymous Coward · · Score: 2, Informative

    The issue with ZFS and Linux has always been more about copyright than implementation.

    1. Re:Issue is not implementation by Liberum Vir · · Score: 5, Interesting

      The whole GPL/CDDL issue is still around. However, since the CDDL code is not added to the Linux kernel but is instead a loadable kernel module distributed separately, it is possible to satisfy both the GPL of the Linux kernel and the CDDL of ZFS and DTrace. Because of the incompatibility of the CDDL with the GPL, you could not distribute a complete system consisting of Linux, ZFS, and DTrace. You can, however, distribute packages to allow people to build it themselves. This is what the authors of these projects have done.

  3. Huh? by Score Whore · · Score: 4, Interesting

    So there's a list of 10 steps to install zfs and that's it? Didn't do anything? zfs/zpool upgrade -v? zvols? zfs send/receive? snapshots? rollback? Scrub? Performance tests? Compression? Encryption? Can I export my pool from my Solaris 11 SPARC system and import it into linux, make some changes and then move it? L2ARC support? Separate ZIL support? Case sensitivity?

    I know this isn't exactly a great comment, but is it at all possible that someone make a judgement as to the value and truth of a submission before putting it up?

    1. Re:Huh? by ZorinLynx · · Score: 3, Interesting

      Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes. It performs fairly well in my testing so far. Yes. Yes. Yes, if the pool version is below the currently supported Linux port's version (28). Yes. Yes.

      Granted, we haven't been using it long, but so far it's been fairly stable and capable.

      http://zfsonlinux.org/

    2. Re:Huh? by Zero__Kelvin · · Score: 2

      The submission was titled "Making ZFS and DTrace Work On Ubuntu Linux". You're thinking of the article you are going to write called "Now that I got ZFS and DTrace working on Ubuntu Linux Thanks to Some Other Guy Being Nice Enough To Help My Lazy Ass, Here's What I Did With It".

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  4. Re:OK Howto article, but missing key points by Darik · · Score: 5, Informative

    I just looked at this article as my employer uses Debian and Ubuntu heavily and I've been pushing for ZFS on our file servers. There is no mention of ZFS version, the feature set available, or even a link to the source material.

    ZoL is based on ZFS version 28 from the last OpenSolaris release, and is currently integrating Illumos as its upstream.

    There isn't much mention of how to use ZFS. I happen to know most commands, but I think this article would be difficult for a beginner even though it seems to be targeted at that demographic.

    It looks like the Slashdot editors are doing this blogger a favor by linking to a mostly empty article.

    At a minimum, this article should link to the ZoL home page, the ZoL Launchpad page for packages, and maybe the ZFS introduction or another tutorial.

  5. Re:OK Howto article, but missing key points by Liberum Vir · · Score: 2

    The zfsonlinux.org site is what I used to set this up. The pool version is reported as 28 and filesystem version as 5. My apologies there. My objective with this article was to create a simple to follow tutorial on how to get it set up. My objective never was on how to use ZFS or DTrace. I'm intermediately familiar with ZFS, and very much a beginner when it comes to DTrace. Again, a "how-to use" tutorial was not my objective. There are far better sources for documentation on use. My article is just to get people an environment set up in which to test the platform. I had been toying around with the idea of ZFS and DTrace on Linux for a while, however, there was nothing like my article to help people in setting up a testbed system. So, that's what I wrote once I figured out how to make it all work together.

  6. Now that it's been Oracled... by Just Some Guy · · Score: 3, Interesting

    I've been running ZFS on FreeBSD for a few years and it's lived up to its promises, but I think I'll be migrating off of it. The problem is that I trusted Sun. They did some goofy things, but you knew where you stood with them. They release ZFS under an Open Source license? You could take them at face value and know that you were allowed to use it. But now that Oracle holds the reins, I have no desire to depend on any Sun-borne projects anymore. Yes, ZFS is Open Source. So was Java, and Google just spent roughly a bazillion dollars defending themselves for using something that looked like it. I can't afford to take on a case like that.

    Other than the Oracle-owned btrfs, what ZFS alternatives are available and ready for use today?

    --
    Dewey, what part of this looks like authorities should be involved?
    1. Re:Now that it's been Oracled... by bmo · · Score: 5, Insightful

      So was Java, and Google just spent roughly a bazillion dollars defending themselves for using something that looked like it. I can't afford to take on a case like that.

      So you take the Oracle vs. Google case as Oracle eventually going after individual users of legitimately licensed code?

      Nonsense.

      As much as I think Larry Ellison is a douchebag, he is motivated by profit. The results of this last case were less than optimal for him: he walked away from the case with bupkis and a bunch of fees from BS&F. Alsup also established the fact that independent implementations of APIs are not copyright violations, ever, under current law, which had not been proven until now, which is a big win for everyone including Google, and a stupendous loss for Oracle.

      Larry Ellison learned an expensive (David Boies doesn't come cheap) lesson here, that even his bluster and hubris don't win court cases.

      Google was not the loser here.

      ZFS and btrfs have free licenses and it's tough to put the worms back in the can once something is under a free license. Forks happen. Look at what happened to OpenOffice and LibreOffice. Sure, Oracle can close off future code, but Very Useful Stuff like this gets forked by the community. There are enough smart people poking around in the guts of ZFS and btrfs who *do not* work for Oracle, and the projects will continue on in the community even if only to give Oracle the finger.

      Your fears are overblown.

      --
      BMO

    2. Re:Now that it's been Oracled... by Anonymous Coward · · Score: 3, Informative

      Check out Dragonfly BSD's Hammer FS. It has offline dedup that doesn't require much mem at all (unlike ZFS's online dedup), and some other features to ensure data integrity. It is not the life, the universe, and everything approach of zfs and btrfs, but it has much of what would traditionally be thought of as the fs part of the functionality.

  7. Licensing? by ducomputergeek · · Score: 3, Informative

    I always thought the hold up on ZFS and DTrace on linux was the fact the CDDL and GPL didn't play nicely with each other. It was never a technical reason.

    I've been running both on FreeBSD for a couple years now. Still don't have any production machines with ZFS yet, but I've found DTrace to be a life saver on more than a few occasions.

    --
    "The problem with socialism is eventually you run out of other people's money" - Thatcher.
  8. Used both on Linux: ZFS is great, Dtrace unstable by GrumpyOldMan · · Score: 5, Interesting

    I use ZFS on Ubuntu 11.10 in "production" for my main workstation and fileserver with a 3x3TB raidz pool with an L2 ARC. I/O is blindingly fast, and it has been rock solid. It serves about 10 machines, and feels an order of magnitude faster than the md/lvm based xfs array it replaced.

    I write 10GbE drivers for Linux, MacOSX, FreeBSD and Solaris. I make heavy use of Dtrace for both debugging and performance analysis. I feel naked without Dtrace, and I've used the linux dtrace a few times for debugging. Unfortunately, I've never had dtrace run on linux for more than a few minutes without crashing a machine. This is not necessarily bad, and often just a few seconds is all I need. But I would never run linux Dtrace on any production machine, whereas I use it all the time under Solaris / FreeBSD and MacOSX and often have customers run Dtrace probes on those OSes to diagnose issues.

  9. Low bar for entry by outZider · · Score: 4, Insightful

    So an article lacking knowledge of the technologies, any sort of testing, anything beyond "make install" or "apt-get install", will make it to the Slashdot homepage? This person openly admits that they didn't test ZFS beyond creating a zpool, and they don't know enough about DTrace to try... anything.

    As an aside, why was Linux capitalized, but Solaris was not?

    --
    - oZ
    // i am here.
  10. Wrong about license by tlambert · · Score: 3, Informative

    The DTrace integration is via a kernel module, so the license on DTrace is irrelevant.

    There are a couple of interfaces in Linux that should be externalized for getting stack tracebacks into user space in a standard manner without caring about binary architecture (they are currently static). I've personally used a modified Linux with DTrace mods and these functions externalized, and it's rather stable and usable. Speculative tracing is also a lot better for finding the origin of some random errno in the kernel, or who in user space is calling gettimeofday() a bazillion times in order to time stamp X events.

    Obligatory disclosure: I was on the team that did the DTrace port to Mac OS X.

    -- Terry

  11. Just one complaint by Guspaz · · Score: 2

    I've been using zfsonlinux for a few months now, ever since I migrated my file server from OpenSolaris to Ubuntu Server, and I've generally been pretty happy. It's been stable and fast (faster than osol was, anyhow). My only complaint is that mounting filesystems on boot seems eternally broken.

    In previous versions, there was a config file option for a workaround, and the filesystems usually (but not always) got mounted on boot. Then that solution was removed in favour of an updated mountall package; unfortunately, this new solution never works. I'll boot the system, no filesystems mounted, but running mountall from the command prompt gets everything mounted OK... Sigh.

  12. Re:Used both on Linux: ZFS is great, Dtrace unstab by jovius · · Score: 2

    I've also been using ZFS on Linux for a while on my Ubuntu based backup/media box. No problems so far, and the average transfer rate of a 100 GB disk image has been 50 MB/s from internal drive A to internal drive B (non RAIDed, Asus E35M1-I DELUXE Mini ITX with 8GB of mem and 2*Western Digital Caviar Green 2TB SATAIII 64MB). The CPU usage hits maximum while transferring, and ZFS also uses most of the RAM quite efficiently.

  13. not a serious OS by rubycodez · · Score: 3, Insightful

    OpenIndiana has only made three "development releases" since 2010; it is not a production grade system. Just a hobbyist system.

  14. So what about performance ? by Lennie · · Score: 2

    I did a quick test with 2 identical VMs on my desktop with an Intel SSD. On one I installed ubuntu-zfs as in the article, and on the other I installed btrfs-tools.

    The VMs have 4 CPUs and 4GB of memory, 3 virtual disks.

    The btrfs setup has RAID1 data and metadata, the ZFS setup used RAIDZ as in the article:

    mkfs.btrfs -m raid1 -d raid1 /dev/vdb1 /dev/vdc1

    (I needed to create the partitions, for some reason the ZFS version didn't want to work without it)

    My quick stupid test, create a large file:

    ZFS:
    500+0 records in
    500+0 records out
    524288000 bytes (524 MB) copied, 16.8489 s, 31.1 MB/s

    real 0m16.853s
    user 0m0.000s
    sys 0m0.480s

    btrfs:
    500+0 records in
    500+0 records out
    524288000 bytes (524 MB) copied, 15.232 s, 34.4 MB/s

    real 0m15.234s
    user 0m0.000s
    sys 0m0.640s
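
    The reported rates can be re-derived from the byte counts and timings above. This is only a sketch of the arithmetic: it assumes dd's "MB" means 10^6 bytes, and the function name is made up for illustration. The exact dd command line is not shown in the comment, so nothing here depends on it.

```python
def throughput_mb_s(nbytes: int, seconds: float) -> float:
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6


NBYTES = 524288000  # bytes copied in both runs, as reported above

zfs = throughput_mb_s(NBYTES, 16.8489)
btrfs = throughput_mb_s(NBYTES, 15.232)

# 500 records totalling 524288000 bytes works out to 1 MiB per record.
record_size = NBYTES // 500

print(f"ZFS:   {zfs:.1f} MB/s")
print(f"btrfs: {btrfs:.1f} MB/s")
print(f"btrfs / ZFS ratio: {btrfs / zfs:.3f}")
print(f"record size: {record_size} bytes")
```

    The gap between the two runs is on the order of ten percent, which for a single 500 MB write inside a VM is small enough that it may not mean much.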

    --
    New things are always on the horizon
  15. Good Article by Thumper_SVX · · Score: 3

    Congrats... this is a good summary on getting these working under Ubuntu. I did the ZFS install "naked" (without a summary as good as this) with a 10.04 box about a year ago and it has run great guns. Now, having said that it's good for what I use it for which is a temporary location to dump my SQL backups to from a large email archive using dedupe prior to running it off to tape... and another zpool mounted as an archive VMFS volume through NFS to our VMware farm so we can archive decommissioned virtual machines for 30 days prior to deletion per our policy. I am not 100% convinced I would use it for anything production though; supportability is still an issue with this and as such I remain a little dubious whereas with most of our system I can call a vendor and have them fix it. As the storage admin I find this a great way to keep up with the demands for storage while having a relatively transparent way (for my admins) to put stuff into a place where it doesn't take up so much space.

    Now having said that, there are some caveats; as the zpool gets really large the ability to delete files becomes slower and slower when it's deduped. This is because a lot of database transactions take place to remove the files, particularly when there are a lot of deduplicated blocks... and this problem is a lot worse under Ubuntu than it was under OpenSolaris (which is where I first played with ZFS). There are times also that when reading the SQL backups to dump them to tape it can make both storage pools unresponsive enough that VMware drops the NFS datastore and I have to manually remount them. Far less than perfect... but good enough for what we use it for.

    I have recently taken a decommissioned physical server (a DL380 G5 with two processors and 16GB of RAM) and put OpenIndiana on it to play with ZFS some more and it is working fantastically well. In my tests, although it still has the slowdown issues, high utilization in one pool won't cause the other pool to grind to a halt when both are deduped. Also, it's been nice to (at least in test) create a ZVOL on my ZFS and present it through Fibre Channel to my VMware hosts as a potential replacement for the NFS volume on Linux (I have only Emulex cards, and I have yet to see a properly working Emulex target mode under Linux). So far my testing has gone marvelously and I have found dedupe rates to be about the same as the NFS mounted volume... though slightly lower. I suspect that's probably because the data isn't really block aligned all that well but it still saves me a bunch of storage when we have 30 almost identical virtual machines being archived! On the bright side there I have not yet seen utilization get so high on the OI box that it causes any significant issues or dropouts that cause VMware to complain; so far it's been rock solid. I may migrate my ZFS stuff to the OI box and get it off my Ubuntu box... but at the moment they're both working great and I have no complaints.

    1. Re:Good Article by Thumper_SVX · · Score: 2

      Oh yeah, and for bonus points if you have an OCD admin who is allergic to the command line, you can easily put Napp-It on your OpenIndiana server to allow them some visibility into the stats or even create their own filesystems. Simplifies my job as I can give them a zpool to play with and let them build ZFS shares or ZVOLs to their heart's content. Now whether they actually understand the statistics... well that's a different matter but at least you can tell your boss you're giving them the tools they need :)

  16. Re:I don't see Solaris users migrating by mysidia · · Score: 2

    You don't need to partition, you can use /dev/sdb and /dev/sdc instead. My question to you is, is Linux doing something that Solaris doesn't support? ie: Capable of running ZFS in a partition instead of using whole disk?

    Solaris is capable of running ZFS on a disk slice, and it's something I will rarely use, but I will use it for the boot media, when running the ZFS boot volume off a CF card, I will mirror it to a small slice on a pair of hard drives, and have another slice on those two drives mirrored for providing the "System log" / scratch disk partition.

    What you must never do is utilize an SSD slice for the ZIL. It "works", but can cause problems and has serious consequences regarding data integrity in case of a system crash. Using disk slices is not supported for L2ARC or ZIL; even though nothing technically prevents it, you don't do that.

    You can use a disk slice as a VDEV member for a ZFS pool on Solaris, but you lose a number of the benefits of ZFS by doing so, for example: ZFS will no longer be able to turn on writeback buffering for the disk and manage the write cache for the device intelligently, because that means you could have non-ZFS partitions on another slice.

    It only makes sense to do that on cryogenically cold partitions, such as system logging/scratch disks, never on mission critical production data datastores.

    Define what will make it 'production quality' please.

    Production quality means I can safely load mission-critical applications on the datastore, in well-understood hardware configurations (e.g. disk JBODs attached to a SAS controller), where the application requires 99.9% or better uptime, without risk of getting fired because of that decision. There are no serious, poorly understood doubts about data integrity, availability, or performance in the filesystem implementation, or in its integration with the OS and certified hardware. All the availability features you expect are there and reliable, such as fault management, IP multipathing, storage multipathing, etc. The Solaris ZFS implementation has been used in production for well over 7 years; the ZFSOnLinux implementation is brand new.

    In the case of Solaris, the ZFS implementation has proven stability, performance, and track record for critical applications. I can also implement ZFS in an HA cluster on Solaris with shared storage, with proper design, there are no serious questions about integrity of my data and availability, that don't have good answers.

    ZFS on Solaris might not work right with all choices of hardware, but basically, my data is safe, unless I use unreliable components, or I experience hardware faults that exceed the amount of redundancy in the ZFS Pool (e.g. 2 drives fail in a 2-way mirror, 3 drives fail in a raidz2 pool).
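
    The redundancy examples above follow from how many device failures each vdev type survives: an n-way mirror survives n-1, and raidz1/raidz2/raidz3 survive 1, 2 and 3 respectively. A minimal sketch of that rule, with illustrative function names and a hypothetical 6-disk raidz2 (the comment does not give a disk count):

```python
# Parity level of each raidz flavour, i.e. how many failed disks it survives.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def failures_tolerated(vdev_type: str, disks: int) -> int:
    """Number of simultaneous disk failures a single vdev can survive."""
    if vdev_type == "mirror":
        return disks - 1  # one surviving side is enough
    return PARITY[vdev_type]


def data_lost(vdev_type: str, disks: int, failed: int) -> bool:
    """True when failures exceed the vdev's redundancy."""
    return failed > failures_tolerated(vdev_type, disks)


print(data_lost("mirror", 2, 2))   # both sides of a 2-way mirror gone
print(data_lost("raidz2", 6, 3))   # three failures in a raidz2 vdev
print(data_lost("raidz2", 6, 2))   # two failures in a raidz2 vdev
```

    This models a single vdev only; a pool built from several vdevs is lost as soon as any one of them exceeds its tolerance.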

    I definitely agree NFS on Linux still needs work. Hell, it's still possible to lock up a server if you don't treat NFS just right. But, to be honest, I'd rather see the CIFS support extended more to be honest, mostly better security models offered on it

    Solaris + ZFS has a better CIFS implementation than is available on Linux, but honestly when clients are running Unix, NFS is the better choice. When CIFS is needed, for Windows clients, the only real game in town is Windows 2008 server, for most applications. It's fine for simple file sharing, as long as your requirements are simple -- but when you start running third party database applications, support will blindly blame the CIFS server for its woes and refuse to support the application until it's moved: "It's not supported, the share needs to be on a Windows server"

    CIFS leaves much to be desired in the security department, whereas NFSv4 can utilize Kerberos authentication, and RPCSEC_GSS is potentially available; CIFS traffic cannot be encrypted without tunnelling or IPsec.

  17. Re:I don't see Solaris users migrating by obrith · · Score: 2

    What you must never do is utilize an SSD slice for the ZIL.

    I've been looking for a valid reference for this statement for a while. I've had a couple people tell me this and simply insist with no valid reasoning.

    I understand that ZFS can't send a cache flush to a slice, but if I'm using an SSD with a supercap (say Intel 320) I have never heard any valid argument or reference. I have a LOT to gain from not allowing ZFS to 'use' the whole disk: my 300GB SSD can write something in the neighborhood of 600-800TB if I hand the whole thing to ZFS before I run out the media wear indicator. With the same disk vastly under-provisioned (giving ZFS a 15GB slice), my media wear indicator will last about 4.2PB (yes, 4200TB). Under my VMware all-sync NFS load this drive might last me 6 months if I give the whole thing to ZFS, where I should get years slicing it. On top of this, the drive performs approximately twice as fast vastly under-provisioned.
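
    The "6 months versus years" estimate can be sanity-checked from the endurance figures alone. This sketch assumes a constant write load and takes the quoted numbers (600-800TB, 4.2PB, 6 months) at face value; the write rate itself is not stated, so it is backed out from them, and the function names are illustrative.

```python
def implied_write_rate(endurance_tb: float, lifetime_months: float) -> float:
    """Write load in TB/month that would exhaust `endurance_tb` in the given time."""
    return endurance_tb / lifetime_months


def lifetime_years(endurance_tb: float, rate_tb_per_month: float) -> float:
    """Years until the endurance budget is used up at a constant write rate."""
    return endurance_tb / rate_tb_per_month / 12


# Whole-disk case: 600-800 TB of endurance, gone in roughly 6 months.
rate_low = implied_write_rate(600, 6)    # TB per month
rate_high = implied_write_rate(800, 6)

# Sliced case: about 4.2 PB (4200 TB) of endurance at the same write load.
best_case = lifetime_years(4200, rate_low)
worst_case = lifetime_years(4200, rate_high)

print(f"implied load: {rate_low:.0f}-{rate_high:.0f} TB/month")
print(f"sliced lifetime: {worst_case:.2f}-{best_case:.2f} years")
```

    On those assumptions the sliced drive lasts somewhere between two and a half and three and a half years, which is consistent with "years" in the comment.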

    Can you give me a valid reason I shouldn't do this... or maybe some references?

  18. What about Dtrace replacements by INowRegretThesePosts · · Score: 2

    Have you tried one of the following replacements?
    http://sourceware.org/systemtap/wiki/SystemtapDtraceComparison