Benchmarking Linux Filesystems Part II
Anonymous Coward writes "Linux Gazette has a new filesystem benchmarking article, this time using the 2.6 kernel and showing ReiserFS v4. The second round of benchmarks include both the metrics from the first filesystem benchmark and the second in two matrices." From the article: "Instead of a Western Digital 250GB and Promise ATA/100 controller, I am now using a Seagate 400GB and Maxtor ATA/133 Promise controller. The physical machine remains the same, there is an additional 664MB of swap and I am now running Debian Etch. In the previous article, I was running Slackware 9.1 with custom compiled filesystem utilities. I've added a small section in the beginning that shows the filesystem creation and mount time, I've also added a graph showing these new benchmarks." We reported on the original benchmarks in the first half of last year.
An interesting analysis in every aspect, and it's fine and dandy for the person who uses 400 GB drives and a ATA controller on a 500MHz computer but I'd like to see how the filesystems compare on a bigass RAID system run by a Power5 server, or a few Itaniums that usually have with a few hundred connected users. Something a bit more "entreprise" - where the choice of a filesystem is a bit more critical than a small server or a home PC.
One thing this does show is that you need to be very careful to match the filesystem type to the main tasks the PC is going to be used for. Personally, there's no real clear winner as all have major gains or deficiencies in some areas. One very interesting point was the vast difference in the amount of available space after a partition and format between the different filesystems.
Conor "You're not married,you haven't got a girlfriend and you've never seen Star Trek? Good Lord!" - Patrick Stewart
It is widely known that Reiser filesystems are heavy on CPU usage 4 more than 3. These benchmarks seem to show a CPU bound IO situation as opposed to an IO bound IO situation. As an earlier comment pointed out, the hardware used in this test was a 500mhz CPU. My slowest computer is a 1000mhz system, which is usually IO limited, not CPU limited. I'd be interested to see these same benchmarks run on real hardware, or some more complex benchmarks (random RW, DB load, etc.). The hardware used for this test would be suitable for a fileserver, but not much else. In that situation, E2, E3 or XFS are probably the right choices as it points out. What about desktop loads, enterprise loads, or something more interesting?
--Brandon
Here's what's missing. They forgot to tell you how well the drive performed after being used for 1 year, and having constantly moved data from one place to another, and constantly deleting and creating new data. It would have been a better test if the drive was about 75% full, with data from 2 years of use, and then the same tests were performed.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Remember, fastest!=best. Some filesystems cannot shrink. Some cannot change size at all. If you're doing anything with LVM or RAID, generally ext3 is the way to go. If you're just formatting a disk and using it without anything on top of it, these FS's may be for you. Then again, ext3 looks damn good in the tests as stands. XFS looks like the clear loser.
Since when has this country used intellectual elite as a pejorative term?
I love the CPU utilization graph for "touch 10,000 files".
A quick glance shows ReiserV4 as much more CPU intensive, you have to look at the scale to realize it only used 0.3% more CPU.
His benchmark data is ruined by using a gross unrealtistic piece of hardware - modern fast hard disks coupled with a cpu which is absurdly slower than anything you can buy.
Am I reading this "benchmark" correctly? Did he base his results on a sample size of 1?
At the very least, you run multiple times and average the results to give statistically meaningful numbers. I can't think of ANY time where a sample size of 1 was meaningful for anything.
What would be really interesting is to come up with a reasonable UCL and LCL for each test, and then calculate out a cpK for each test. It's one thing to say "I got these results one time", it's something much more impressive to say "I can achieve this result +-10%".
Of course, if a particular benchmark can't even hit a cpK of 1, then maybe there is room for improvement in the coding of the driver.
For those of you who haven't done much with statistics, cpK is a measure of "capability" in a machine or process. It shows how repeatable the measured process is. A higher number indicates that you have a highly targeted, low deviation process whereas a low number (1 or less) indicates that your process is incapable of repeatability and/or accuracy.
Ron Gage - Westland, MI
IIRC NCQ isn't 100% fully-baked on Linux yet, so even NCQ-capable controllers and drives won't take advantage of it yet. I just upgraded my home file server with NCQ-capable gear and I don't think it's using it yet, even though I'm running the latest kernel.
There are patches for libATA that enable NCQ, but they're not in the mainline yet.
The only thing worse than testing without the new technologies would be testing with half-baked implementations of them. Let's wait until NCQ is done before we try testing with it.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
There were some current (recent 2.6 kernel with XFS, JFS, possibly Reiser4, etc) benchmarks done on highend servers (or at least something with drives a few steps up from the CompUSA weekly special), especially if anyone wants to see Linux succeed in the enterprise.
Based on the geometric mean of all the benchmark times for each filesystem, which effectively weights all benchmarks equally:
JFS won
EXT2 and EXT3 took 17% longer than JFS
XFS took 29% longer than JFS
Reiser3 took 38% longer than JFS
Reiser4 took 52% longer than JFS
Now, 1.52 seconds is not a whole lot longer to wait than 1 second. With any luck we'll see a post from Hans explaining why Reiser4 took longer, or what sacrifices were made to make the others faster, if there are any.
Reiser is not designed for slow CPUs. AFAIK, a key part of the design was the Hans Reiser realised that CPUs were vastly underused. IO resources were maxed out and CPUs were sitting idle. So he found ways to use the CPU to make more efficient use of the IO resources. So this benchmark on a 500Mhz machine will of course show Reiser in a bad light, and moving lower down to a 266Mhz will make it even worse.
For a decent benchmark of how filesystems work on modern hardware: use modern hardware.
Please help publicise swpat.org - the software patents wiki
It would be interesting to see the results of the same tests running against a SCSI drive system where there is less IO overhead to see if the results differ.
There are other considerations here as well. What about the I/O elevator's tuning options.
Yes, I'd much rather see this test occur against a SCSI drive or better yet against a RAM drive for pure software performance.
Cheers fellow slashdoters!
-Joe Baker
Anyway, how is the average user supposed to be concerned by these results?
In my daily work I manage hundreds of GB's of data and have hardly seen a significative difference between XFS, JFS and ReiserFS v.3 on relatively modern hardware (Tyan S2882 Pro motherboard, two Opteron 244 processors, 4 GB RAM and two 250-GB SATA HD's) running OpenSuSE 10. I put the most important data on a XFS partition but also have a small ReiserFS partition which can be read from Windows.
-- Help us to save our cousins the great apes, do not use cell phones.
The total free space graph is poor statistical representation
It starts at 345GB and goes to 375GB on the y scale. This makes the difference between 355 and 370 look like a 50% difference rather than that 5.7% increase.
He does it again in make 10,000 directories 99.5% is not double the cpu use of 97%
What am I saying? I want to know how efficent these filesystems are in packing the data on the HD.
I'm glad that you get more data out of Reiser v4, JFS, and XFS at formatting time, but my feeling is that Reiser v4 (once profiled, tweaked and refined for speed and space) will pack data tighter than anyone else. Meanwhile, I'm looking for something like ext3 that packs better.
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
I'm *sick* of reading filesystem benchmarks of people who doesn't even care about even reading the documentation of the filesystems they compare
OK, so ext3 is not the fastest filesystem on earth. But it has some default options which makes it suck even more than it usually do, and those options are *documented* in Documentation/filesystem/ext3.txt
* Ext3 does a sync() every 5 seconds. This is because ext3 developers are paranoid about your data and prefers to care about your data than win on benchmarks. Syncing every 5 seconds ensures you don't lose more than 5 seconds of work but it hurts on benchmarks. Other filesystems don't do it, if you are doing a FAIR comparison override the default with the "commit" mount option
* ext3's default journaling mode is slower than those from XFS, JFS or reiserfs, because it's safer. When ext3 is going to write some metadata to the journal, it takes care of writting to the disk the data associated to that metadata. XFS and JFS journaling modes do *not* care about this, neither they should, journaling was designed to keep filesystem integrity intact, not data, ext3 does it as an "extra", and it's slower because of that. But if you want to do a fair comparison, you should use the "data=writeback" mount option, which makes ext3 behave like xfs and jfs WRT to journaling. Reiserfs default journaling mode is like XFS/JFS, but you can make it behave like the ext3 default option with "data=ordered"
ext3 is not going to beat the other by using those mount options, but it won't suck so much, and the comparison will be more fair. And remember: ext3 tradeoffs data integrity for speed. There's nothing wrong with XFS and JFS, but _I_ use ext3.
What's with the Microsoft Excel style graphs? They're not very precise or professional-looking.
You would have thought the author would use something better like gnuplot?
The author's opinion "Personally, I still choose XFS for filesystem performance and scalability."
is largely irrelevant here and sounds like bias, although the author acknowledges this.
There is no discussion of the results. The text between the graphs only mentions superficially
what is obvious to anyone looking at the graphs.
Seems a far cry from the very nicely done BSD and Linux benchmark at http://bulk.fefe.de/scalability/
Of course JFS won, since it was designed to be as simple as possible ... it's originating from OS/2, afterall. On such a machine as used in this test, this is a huge advantage.
If someone does not know that filesystem benchmarks that take less than a tenth of a second are meaningless, it makes you wonder if they made errors in other aspects as well. These results are not consistent with the results that we have had. I bet he did not make an effort to ensure that you had to read the disk for these benchmarks, that he did not copy his file set from the same fs as he was measuring (makes a HUGE difference to performance and it is the mistake every beginner makes), etc. You'll note that the way he makes his graphs makes 1% differences look huge, etc.
XFS is a nice filesystem, I like it. Not enough to use in production, but I like it. Personally I use reiserfs3.6 on many production servers, and have never seen a problem. I am experimenting with 4 at home.
I have a strong warning if you are considering XFS. If you don't have a GOOD power backup (UPS), then don't use it. XFS caches very agressively for writes in RAM. You lose power, you lose that data.
XFS was designed with datacenters with good power backups in place, not home users. So chose carefully.
-- Note: If you don't agree with me, don't bother replying. I won't read it.
I would rather see these benchmarks on a computer less than 5 years old. I would also appreciate an open source version of the tests so they could be reproduced. For ease of reading, I think the article should be on a separate page on the site as well.
/usr/bin/touch
/dev/zero stuff is completely bogus. No indication of the blocksize that was used.
I've got a screaming Dell 1.6 GHz P4 to test with and here are my results for a couple of tests it only has ext3 and a whatever cheap harddrive came with the box. I'm not sure if dma is enabled or if I've done any hdparam tunings, but I'm not sure of their test system either:
my touch 10,000 files: 24.314 seconds theirs 48.25
I used a shell script that called
Now if I use a Perl open() call, I get 8.887 seconds
Now with a cheesy C that uses fopen() and fclose() I get 4.639 seconds
my make 10,000 directories: 56.832 seconds theirs 49.87
that is a shell script
If I user perl, I get 35.171 seconds
The
The copy kernel stuff to and from a different slower disk with an unknown filesystem on it is useless.
The split tests are not indicative of anything in real life, and they took on order of between 60 seconds and 130 seconds to perform on their 500MHz system with most being in the 130 second range. I got 16.547 seconds.
I do not see how any relevant information can be obtained from this article. I'm disappointed in the Linux Gazette and Slashdot for printing this information.
<blink> Test is flawed! </blink>
/w and w/o softupdate benched.
Checkout the CPU utilizations; reiserfs is pegged at 100% cpu utilization for ~8 tests. For a FS which describes itself as willing to use more CPU in order to achieve better I/O than the competition, running the benches on an antiquated 700 mhz machine is simply not fair.
OTOH, Untarring and tarring are notably NOT cpu limited, and still pretty lackluster for Reisers case. Disappointing, very disappointing. I was extremely impressed in the ext's; I simply had no idea how consistently well performing they were.
I'd also like to see FreeBSD's UFS
Myren
Wow. They must have a super slow file system if they are just now getting the results of the tests and the first half of the article ran in the first half of last year! :) This is supposed to be funny by the way.
Charles Wyble System Engineer