Running ZFS Natively On Linux Slower Than Btrfs
An anonymous reader writes "It's been known that ZFS is coming to Linux in the form of a native kernel module done by the Lawrence Livermore National Laboratory and KQ Infotech. The ZFS module is still in closed testing on KQ Infotech's side (but LLNL's ZFS code is publicly available), and now Phoronix has tried out the ZFS file-system on Linux and carried out some tests. ZFS on Linux via this native module is much faster than using ZFS-FUSE, but the Solaris file-system in most areas is not nearly as fast as EXT4, Btrfs, or XFS."
Using BTRFS :)
Jesus had a UNIX beard.
If 3 other file systems are "faster", then is ZFS somehow "better"?
Who would have thought that a first-release Beta kernel module would not run as fast or be as reliable as the stable implementation for other operating systems, or the stables on Linux?
Thirty four characters live here.
On similar hardware of course.
It occurs to me that ZFS does a lot more than EXT4 and Btrfs too.
You can't call it "the Solaris file-system". You can say that the Linux native implementation of ZFS (a Linux file-system) is slower than BTRFS, though.
And what does it matter that it's fast if it's not reliable? You can save your stuff in /dev/null quite fast too!
http://www.spinics.net/lists/linux-fsdevel/msg35235.html
OpenAFS, which still today provides features unavailable in any other production-ready network filesystem, is a nightmare to use in the real world because of its lack of integration with the mainline kernel. It's licensed under the "IPL", which like the CDDL is free-software/open source but not GPL compatible.
ZFS is very cool, but this approach is doomed to fail. It's much better to devote resources to getting our native filesystems up to speed -- or, ha, into convincing Oracle to relicense.
Personally, I was pretty sure Sun was going to go with relicensing under the GPLv3, which gives strong patent protection and would have put them in the hilarious position of being more-FSF free software than Linux. But with Oracle trying to squeeze the monetary blood from every last shred of good that came from Sun, who knows what's gonna happen.
I was confused as to what versions of ZFS were available on which distros so I made a chart that lists the different distros and which version of ZFS they support:
http://petertheobald.blogspot.com/2010/11/101-zfs-capable-operating-systems.html
Hope it's helpful...
- For the complete works of Shakespeare: cat
Couldn't they name the file system something better than butterface?
He who knows best knows how little he knows. - Thomas Jefferson
I've been through a few filesystem war^Wdramas and stuck with ext?fs the whole time. I liked the addition of journaling but I'm not sure that I've noticed any of the other "backstage" improvements in day to day functioning.
Is there really a reason to jump ship as a single-workstation user?
--- Do you believe in the day?
It's OK, runs fairly stable, but it also locks up once in a while and does some aggressive disk I/O. No idea what exactly, probably housekeeping, but it's somewhat irksome, could use some more fine tuning.
The main problem with btrfs right now is that it lacks fsck tools, so in case of havoc there is little chance of recovery, which is not good for server-like systems.
As for ZFS, it's not the tech that's keeping it from Linux but the restrictive licensing. Unless that gets fixed (probably won't happen), it is off limits, and Linux folks will do their own thing, like they always do.
Hmmm, well, the most obvious feature that ZFS has that Ext4 does not is checksumming.
That feature is one reason why ZFS is better (it will tell you if your disk is going bad, and if you have a raid setup, it will go get the good data for you). However, this is also one reason why ZFS is slower... it spends time making sure your data is safe and that it always gives you the correct bits from your disk.
That single feature is why I run FreeBSD (looking forward to kFreeBSD/debian!) on my file server in a mirrored raid configuration. Yes, it is "slower", but I still pull data off that server at over 50MB/sec on my home gigabit lan! The specs on that server aren't great either... 2GB ram, and an old 1.6GHZ single core sempron.
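To make the checksum-and-repair idea above concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not ZFS's actual code: the function names are hypothetical, the "mirror" is just two in-memory buffers, and SHA-256 is used purely as a stand-in checksum.

```python
import hashlib
from typing import List


def checksum(block: bytes) -> bytes:
    # SHA-256 is a stand-in here; the real filesystem picks its own checksum.
    return hashlib.sha256(block).digest()


def read_with_self_heal(copies: List[bytearray], expected: bytes) -> bytes:
    """Return the first copy whose checksum matches, then repair the others."""
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        # No redundant copy verified, so the error can only be reported.
        raise IOError("every copy failed checksum verification")
    for copy in copies:
        if bytes(copy) != good:
            copy[:] = good  # overwrite the damaged copy with the verified data
    return good


# Hypothetical two-way mirror where the first copy has silently gone bad.
data = b"important bits"
expected = checksum(data)
mirror = [bytearray(b"important bitz"), bytearray(data)]
result = read_with_self_heal(mirror, expected)
```

The extra hashing on every read is the cost being described: each block is verified before it is handed back, which is time a filesystem without data checksums never spends.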
It's still under development. But it's already pretty competitive, doing reasonably well in many tests.
And then there's this (on the last page) "Ending out our tests we had the PostMark test where the performance of the ZFS Linux kernel module done by KQ Infotech and the Lawrence Livermore National Laboratories was slaughtered. The disk transaction performance for ZFS on this native Linux kernel module was even worse than using ZFS-FUSE and was almost at half the speed of this test when run under the OpenSolaris-based OpenIndiana distribution."
Ok, maybe someone can disabuse me of a misconception that I have, but: There's no reason that ZFS in the kernel should be slower than a FUSE version. That means there's something wrong. If they figure out what's wrong and fix it, that could very likely affect the results in some or all of the other tests.
ZFS isn't done yet, and it already looks like it might be worth the trade-off for the features ZFS provides. And performance might get somewhat better. This article is good news (though that final benchmark is distressing, especially when you look at the ZFS running on OpenSolaris).
It says: "When KQ Infotech releases these ZFS packages to the public in January and rebases them against a later version of ZFS/Zpool, we will publish more benchmarks."
and I'm looking forward to that new article.
The throughput for large data sorts is just faster, period.
A lot of it has to do with the reading of compressed data, and the huge ram-buffer that ZFS uses on the OS, optional commit on writes, block sizes that match the database pages.
The system scans 3 megs of index data, what it's actually reading to get that off is say 1 meg, which it decompresses on the fly on one of the many cores the database server has. In the end throughput destroys what i get running non-compressed volumes on EXT4 or XFS on Linux. For "MY" database, it runs nearly 2-3x faster than the same hardware running on Linux. (RHEL5 is what I ran the db on for a long time).
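The arithmetic behind that claim can be sketched roughly as follows. This is a back-of-the-envelope model, not a measurement: the function name and all of the numbers (disk speed, decompression speed) are assumptions chosen to mirror the "3 megs of data read as 1 meg" example.

```python
def effective_read_mb_s(disk_mb_s, compression_ratio, decompress_mb_s):
    """Logical MB/s seen by the application when reading compressed blocks.

    The disk delivers compressed bytes, so logical throughput scales with the
    compression ratio until the CPU's decompression speed becomes the limit.
    """
    return min(disk_mb_s * compression_ratio, decompress_mb_s)


# Assumed figures: 100 MB/s disk, 3:1 compression, plenty of spare CPU.
fast_cpu = effective_read_mb_s(100.0, 3.0, 500.0)

# Same disk and ratio, but decompression tops out at 200 MB/s.
slow_cpu = effective_read_mb_s(100.0, 3.0, 200.0)

print(fast_cpu, slow_cpu)
```

Under these assumptions the gain only holds while there are CPU cycles to spare; once decompression is the bottleneck, a faster disk stops helping.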
I have not been able to get Linux/Postgres to run even partially as fast as I have been able to get Solaris/ZFS running Postgres 8.3.
Btrfs isn't even near production states yet, but i am really hoping that it will give me an option to get off of Solaris.
On that note, one thing i haven't tried yet with our DB is Solid State Drives. The sheer throughput might more than make up for the benefits i get on compressed ZFS volumes.
I for one am VERY VERY hopeful that BTRFS can get stable, and fast. Oracle's fiasco has me and a few other people at our small business very nervous. I'm not planning on replacing our Sol10 (free) distribution, and couldn't care less about the support Oracle offers. I'm playing with Solaris Express 11 now, but not sure I want to pay the $1k a year for production use, though if it offers me the performance gains over Linux that I'm currently seeing, it will probably be worth it for our database system alone.
Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS ? We're not a huge shop, 8 people running large data-warehouse type applications. We run on a shoestring and don't have a bunch of money to throw at the problem and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.
Holy crap, they only tested on a single SSD. This is akin to running DOS on a 64 core system.
ZFS is a full volume manager/file system with the ability to partition filesystems (on the fly, I might add), assign different attributes (compression, deduping, max size, caching, data replication, etc.), add space on the fly, set storage/redundancy parameters (RAID 0, 1, 10 and their own raidz1, raidz2), spare assignment, live disk replacement, etc., etc.
Try throwing 48 x 3TB disks into a chassis and configuring them (use SSDs for the ZIL/ARC, and set up the zpools correctly). Try doing the same thing with BTRFS. Actually, you can't.
I love the fact that a native Linux driver is coming, but this review is completely useless.
Has anyone here had experience tuning Postgres on Linux versus Solaris/ZFS ? We're not a huge shop, 8 people running large data-warehouse type applications. We run on a shoestring and don't have a bunch of money to throw at the problem and would be very grateful for any ideas on how to make our database run with comparable performance on Linux. I'm hoping that I'm missing something obvious.
What have you done so far and how are you using Postgres? Mostly reads, mostly writes or some combination of the two? Postgres as it ships is notorious for slow configuration, and many Linux distributions are consistently one major version behind the curve (which is a little annoying as much of the focus of the Postgres people for some time has been improving performance).
So, you're not a fan of Oracle I take it? Have a look at who's developing btrfs.
When XFS was introduced in Linux it also sucked performance wise, so, I think for ZFS on Linux there's certainly a room for improvement.
And even in this early age ZFS shows very remarkable results, so let's just wait and see.
Picking on ZFS for being slow when ported to a different OS and running on atypical hardware is like criticizing Stephen Hawking for being a poor juggler. It's focussing on the wrong thing. The goals of ZFS are, in no particular order:
- Scalability to enormous numbers of devices
- Highly assured data integrity via checksumming
- Fault tolerance via redundancy
- Manageability/usability features (i.e., snapshots) that conventional file systems simply don't have
Oh, and if it's fast, well, that's gravy.
Am I part of the core demographic for Swedish Fish?
If you need to ask most likely you would not care about them.
Mentioning ZFS and Ext4 in the same breath says it all about your expertise in this field, really.
IANAL but write like a drunk one.
Since ZFS is doing metadata replication, running the tests on a single disk is going to punish ZFS performance much more than other filesystems. It would be much more interesting to run a benchmark with an array of 6 or 8 disks with RAID-Z2, with ZFS managing the disks directly, and XFS/btrfs/ext4 running on MD RAID-6 + LVM. Next, run a test that creates a snapshot in the middle of running some long benchmark and see what the performance difference is before/after.
For the curious, it's a single person called Chris Mason who happened to work also on ReiserFS (the killer filesystem).
Change is certain; progress is not obligatory.
btrfs naturally rolls off the tongue as bit rot filesystem..
I don't want speed from ZFS, I will do that via hardware.
I want the tech from ZFS to give me everything that it does.
Why judge a Nascar on its performance when it runs on a Rally car track? (I am a bit of a car geek so I think that is a pretty good /. car analogy! ;)
Really, I know what I'm doing...Ohhhh, look at the shiny buttons!
It's a good blend of both reads and writes.
We have tables that have as many as 100m records. Where Solaris/ZFS seemed to help massively was the big reads for reporting. We have indexed it pretty aggressively, even going so far as to index statements, and managed to pull amazing performance, considering the concurrency we see from a free database. (Which, for the record, has never given us any problems... Postgres has been rock-solid.)
For the most part it was running "ok" on Linux, but the bump we got from the testing on Solaris with ZFS with identical hardware and similar configs was nothing short of amazing.
One of the big differences between the 2 configs: we disabled the RAID controller (a Dell PERC 6/i) to run JBOD instead of RAID 1+0. I've not tried to do a stripe configuration on Linux with a similar configuration, even without compression. To be fair to the Linux performance, I really need to set up and test with a similar config to make sure my results were not hardware related.
A friend had told me where solaris and ZFS really gives the big bump on the performance is how it's not having to read each byte from the disk, it's reading a compressed block and decompressing it on the fly, which if you have the CPU cycles to spare causes the io transfers to be a lot quicker. (at times 2-3x faster than a raw read with uncompressed data)
I'm guessing that we could probably get similar results with Linux on XFS or ext4 using solid state drives, which are now a little more affordable than they were years ago.
Again, we're not a large shop with lots of money to throw around at the project, we're a startup just trying to get by in a brutal economy. :)
You're right though about the default configuration. I've gone through and tuned the work memory, index cache, tuned the memory to match my hardware. (Currently 32 gigs on an array of 8 disks on a 8 core Xeon server)...
The consistency guarantees provided by the tested filesystems differ significantly. Most (all?) aside from ZFS only journal metadata by default. All data and metadata written to ZFS is always consistent on disk. You won't notice the difference until you crash, and even then you still might not, but it will certainly show up in the benchmarks.
ZFS is not a lightweight filesystem, that is a fact. The 128-bit addresses, 256-bit checksums, compression, and two or three way replicated metadata don't come for free. Also, another thing that may be working against ZFS on a Flash based SSD is the page size. By default, ZFS uses a minimum of 512 byte blocks for data, whereas most other filesystems use 4k which matches the SSD page size. It would be interesting to create the ZFS pool with a 4k asize and see how that affects the results.
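The block-size point can be illustrated with a little arithmetic. This is a deliberately simplified sketch: it assumes a 4096-byte flash page and assumes every touched page has to be rewritten in full, which ignores the buffering and remapping a real SSD controller does. The function names are hypothetical.

```python
def pages_touched(offset, length, page_size=4096):
    """Number of flash pages a write of `length` bytes at `offset` overlaps."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def amplification(offset, length, page_size=4096):
    """Bytes of flash rewritten per byte requested, in this simplified model."""
    return pages_touched(offset, length, page_size) * page_size / length


small_block = amplification(0, 512)     # one 512-byte block inside a 4k page
aligned_4k = amplification(0, 4096)     # one page-aligned 4k block
misaligned_4k = amplification(512, 4096)  # a 4k block straddling two pages

print(small_block, aligned_4k, misaligned_4k)
```

In this model a lone 512-byte write costs a whole page, and a 4k write that is not page-aligned costs two, which is why creating the pool with 4k-aligned blocks would be an interesting variable to test.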
The benchmarks aside, it is the feature set which really sells it. The performance is good, the administrative interface is excellent, and it does a fine job of returning your data in an error-free state. At the end of the day, that is what really matters.
Even so, I look forward to more numbers when stable releases can be compared. It would also be nice to include DragonFlyBSD's HAMMER filesystem, to round out the modern set.
If the licenses are incompatible, then why even port it? Academic interest?
--dave
davecb@spamcop.net
In that Oracle is developing btrfs and Chris Mason is the one that Oracle has doing the work, yes.
I believe one of the FreeBSD kernel folks runs Postgres 9 on FreeBSD 8 with ZFS and has had similar performance gains.
Or perhaps it may have been Ivan Voras (see his Arrow of Time blog)
...who happened to work also on ReiserFS (the killer filesystem).
I see what you did there.
I didn't think "Oracle" was a suitable response, since this was a "who" rather than a "which company" question honestly.
Change is certain; progress is not obligatory.
> the Solaris file-system in most areas is not nearly as fast as EXT4, Btrfs, or XFS."
If you look at the benchmarks, it is not just "not nearly as fast", but it is a few orders of magnitude slower in most benchmarks!
ZFS is more evidence that complexity and mountains of hacks don't make anything better or faster, no matter how smart the developers (claim) to be and how many buzzwords they manage to hit.
"When in doubt, use brute force." Ken Thompson
I don't agree. While he might be the person doing the work, he's still doing the work in an official capacity. Oracle has the final say on everything he does, and can decide it wants things done differently, or assign him to something else entirely. Ultimately, the entity responsible for the development is not Chris Mason, but Oracle.
I'd like to know how ZFS performs on Solaris, OpenSolaris, FreeBSD, Linux, FUSE, etc. Comparing the Linux and FUSE ports to one another is pretty useless for estimating the progress of these ports compared to the "native" versions.
I'm proud of my Northern Tibetian Heritage
Have you considered moving to FreeBSD, which already supports ZFS natively? At the very least, it would be a useful stepping-stone/stop-over to get you off Solaris, and save you some licensing fees.