Benchmarking Linux Filesystems Part II

Posted by Zonk on Friday January 6, 2006 @05:34AM from the some-of-this-content-may-be-inappropriate-for-young-readers dept.

Anonymous Coward writes "Linux Gazette has a new filesystem benchmarking article, this time using the 2.6 kernel and showing ReiserFS v4. The second round of benchmarks includes both the metrics from the first filesystem benchmark and the second in two matrices." From the article: "Instead of a Western Digital 250GB and Promise ATA/100 controller, I am now using a Seagate 400GB and Maxtor ATA/133 Promise controller. The physical machine remains the same, there is an additional 664MB of swap and I am now running Debian Etch. In the previous article, I was running Slackware 9.1 with custom compiled filesystem utilities. I've added a small section in the beginning that shows the filesystem creation and mount time, I've also added a graph showing these new benchmarks." We reported on the original benchmarks in the first half of last year.

Very interesting article... by toofast · 2006-01-06 05:36 · Score: 4, Interesting

An interesting analysis in every aspect, and it's fine and dandy for the person who uses 400 GB drives and an ATA controller on a 500MHz computer, but I'd like to see how the filesystems compare on a bigass RAID system run by a Power5 server, or a few Itaniums that usually have a few hundred connected users. Something a bit more "enterprise" - where the choice of a filesystem is a bit more critical than a small server or a home PC.
1. Re:Very interesting article... by Captain Segfault · 2006-01-06 06:20 · Score: 3, Interesting
  
  It is completely absurd for a filesystem to kill a disk. If you were getting those errors (with the "drive ready" and "seek complete" bits being set being most common) it *strongly* suggests that either your disk is broken or it is improperly powered.
  
  If you're actually using that disk, still, have a look at it with smartctl. In particular, run "smartctl -t long" on it, and have a look at the results. If it doesn't pass that, don't even think of trusting it with your data.
Need to be careful... by Conor Turton · 2006-01-06 05:42 · Score: 3, Insightful

One thing this does show is that you need to be very careful to match the filesystem type to the main tasks the PC is going to be used for. Personally, there's no real clear winner as all have major gains or deficiencies in some areas. One very interesting point was the vast difference in the amount of available space after a partition and format between the different filesystems.

--
Conor "You're not married, you haven't got a girlfriend and you've never seen Star Trek? Good Lord!" - Patrick Stewart
1. Re:Need to be careful... by Raphael · 2006-01-06 06:19 · Score: 4, Insightful
  
  One very interesting point was the vast difference in the amount of available space after a partition and format between the different filesystems.
  
  Unfortunately, that graph is rather misleading. The ext2 and ext3 filesystems keep some percentage of the disk space as "reserved" and only root can write to this reserved area. This is useful if the disk contains /var or other directories containing log files, mail queues and other stuff. Even if a normal user has filled the disk to 100%, it is still possible for some processes owned by root to store some files until an administrator can fix the problem. On the other hand, if your filesystem contains only /home or other directories in which users are not competing for disk space with processes owned by root, then it does not make much sense to have a lot of disk space reserved for root. That is why you should think about how the filesystem is going to be used when you create it, and set the amount of reserved space accordingly.
  
  The default behavior for both ext2 and ext3 is to reserve 5% of the disk space for root. You can see it in the section Creating the Filesystems from the article:
  4883860 blocks (5.00%) reserved for the super user
  You can change this behavior with the -m option, specifying the percentage of the disk space that is reserved. The article did not mention how the filesystem was supposed to be used if it had been used in production. However, I would guess that the option -m 0 or maybe -m 1 could have been used in this case. This would have provided a fair comparison and suddenly you would have seen all filesystems in the same range (close to 373GB available), except maybe for Reiser3.
  
  --
  -Raphaël
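
To put rough numbers on the reserved-space point above, here is a minimal sketch. It assumes 4 KiB blocks (the block size is not quoted in the comment) and treats the quoted 4883860 reserved blocks as exactly 5% of the filesystem. The function name is made up for illustration.

```python
BLOCK_SIZE = 4096                     # assumption: 4 KiB blocks, not stated in the comment
RESERVED_BLOCKS = 4883860             # "4883860 blocks (5.00%) reserved for the super user"
TOTAL_BLOCKS = RESERVED_BLOCKS * 20   # assumption: the 5.00% figure is exact


def reserved_gib(percent, total_blocks=TOTAL_BLOCKS, block_size=BLOCK_SIZE):
    """Space held back for root, in GiB, for a given -m percentage."""
    return total_blocks * percent / 100 * block_size / 2**30


if __name__ == "__main__":
    for percent in (5, 1, 0):
        print(f"-m {percent}: {reserved_gib(percent):.2f} GiB reserved for root")
```

Under those assumptions the default 5% works out to roughly 18.6 GiB on this drive, so the choice of -m alone can account for a visible gap in an "available space" graph.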
Hardware mismatch by lostlogic · 2006-01-06 05:52 · Score: 5, Interesting

It is widely known that Reiser filesystems are heavy on CPU usage, 4 more so than 3. These benchmarks seem to show a CPU-bound IO situation as opposed to an IO-bound IO situation. As an earlier comment pointed out, the hardware used in this test was a 500MHz CPU. My slowest computer is a 1000MHz system, which is usually IO limited, not CPU limited. I'd be interested to see these same benchmarks run on real hardware, or some more complex benchmarks (random RW, DB load, etc.). The hardware used for this test would be suitable for a fileserver, but not much else. In that situation, E2, E3 or XFS are probably the right choices, as it points out. What about desktop loads, enterprise loads, or something more interesting?

--
--Brandon
1. Re:Hardware mismatch by Hextreme · 2006-01-06 06:00 · Score: 3, Informative
  
  This was definitely an issue in testing here. The wide range of "winning" filesystems for the different tests clearly indicates the bottleneck is somewhere other than the disk. In most modern systems, this isn't an issue.
  
  From TFA: ReiserFS takes a VERY long time to mount the filesystem. I included this test because I found it actually takes minutes to hours mounting a ReiserFS filesystem on a large RAID volume.
  
  Looks like this guy makes a habit out of using systems with 500MHz CPUs... my dual 3GHz Xeon box mounts a 1.2TB RAID5 array formatted with ReiserFS in about 33 seconds, give or take a couple of seconds.
Here's what's missing by CastrTroy · 2006-01-06 05:54 · Score: 5, Interesting

Here's what's missing. They forgot to tell you how well the drive performed after being used for 1 year, and having constantly moved data from one place to another, and constantly deleting and creating new data. It would have been a better test if the drive was about 75% full, with data from 2 years of use, and then the same tests were performed.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
how to lie with statistics by Clover_Kicker · 2006-01-06 06:00 · Score: 4, Insightful

I love the CPU utilization graph for "touch 10,000 files".

A quick glance shows ReiserV4 as much more CPU intensive; you have to look at the scale to realize it only used 0.3% more CPU.
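
A small sketch of why a truncated axis exaggerates a gap like this. The CPU percentages and the axis start below are hypothetical, chosen only to be 0.3 points apart; they are not read off the article's graph.

```python
def drawn_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis begins at axis_start."""
    return (a - axis_start) / (b - axis_start)


if __name__ == "__main__":
    reiser4, other = 92.8, 92.5   # hypothetical CPU percentages, 0.3 points apart
    print(f"axis from 0:    {drawn_ratio(reiser4, other):.3f}x")
    print(f"axis from 92.4: {drawn_ratio(reiser4, other, 92.4):.1f}x")
```

With the axis starting at zero the two bars are nearly the same length; start it just below the smaller value and the same 0.3-point gap is drawn as a bar about four times as long.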
somewhat worthless by aachrisg · 2006-01-06 06:06 · Score: 5, Insightful

His benchmark data is ruined by using a grossly unrealistic piece of hardware - modern fast hard disks coupled with a CPU which is absurdly slower than anything you can buy.
It would be nice if... by bhirsch · 2006-01-06 06:15 · Score: 4, Insightful

There were some current (recent 2.6 kernel with XFS, JFS, possibly Reiser4, etc.) benchmarks done on high-end servers (or at least something with drives a few steps up from the CompUSA weekly special), especially if anyone wants to see Linux succeed in the enterprise.
Normalized results by dtfinch · 2006-01-06 06:15 · Score: 3, Informative

Based on the geometric mean of all the benchmark times for each filesystem, which effectively weights all benchmarks equally:
JFS won
EXT2 and EXT3 took 17% longer than JFS
XFS took 29% longer than JFS
Reiser3 took 38% longer than JFS
Reiser4 took 52% longer than JFS

Now, 1.52 seconds is not a whole lot longer to wait than 1 second. With any luck we'll see a post from Hans explaining why Reiser4 took longer, or what sacrifices were made to make the others faster, if there are any.
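
The normalization described above can be sketched as follows. This is an illustration with made-up timings, not the article's data, and `relative_to_best` is a hypothetical helper name. It relies on `statistics.geometric_mean`, which is available from Python 3.8.

```python
from statistics import geometric_mean


def relative_to_best(times):
    """Map each filesystem to its geometric-mean time divided by the best (lowest) one."""
    means = {fs: geometric_mean(runs) for fs, runs in times.items()}
    best = min(means.values())
    return {fs: mean / best for fs, mean in means.items()}


if __name__ == "__main__":
    # Hypothetical timings in seconds, one entry per benchmark; not the article's data.
    sample = {"fs_a": [1.0, 40.0, 9.0], "fs_b": [2.0, 40.0, 18.0]}
    for fs, score in sorted(relative_to_best(sample).items(), key=lambda kv: kv[1]):
        print(f"{fs}: {score - 1:.0%} longer than the best")
```

The geometric mean weights benchmarks equally in the sense that doubling the time of any single benchmark scales the result by the same factor, whether that benchmark took one second or one hundred.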
1. Re:Normalized results by phoenix.bam! · 2006-01-06 06:44 · Score: 5, Insightful
  
  Reiser uses much more CPU for file system tasks. ReiserFS is a modern filesystem meant to run on modern machines. This machine is only 500MHz and therefore Reiser performs poorly. Had this machine been a 2GHz (standard now, 4x faster than the test machine), or even a 1GHz (outdated and 2x as fast) machine, Reiser would have performed much better.
  
  If you want to use parts from 1997 to build a computer, Reiser is not for you. 500MHz is at least 8-year-old technology if I remember correctly.
2. Re:Normalized results by Westley · 2006-01-06 07:27 · Score: 3, Insightful
  
  It's one thing to say "Let's use more CPU because we can."
  
  It's another to say "Let's use more CPU (which is usually relatively idle) in order to improve the normal bottleneck, which is IO."
  
  I don't see what's wrong with that at all. Of course, it's no good if you've got a machine which doesn't represent the "normal" current situation, any more than using a graphics card for "acceleration" makes sense if the graphics card in question is 10 years old but you're using a fast new CPU.
  
  Jon
Re:I would agree by lawpoop · 2006-01-06 06:16 · Score: 4, Interesting

I'm no expert by any means, but I think the idea behind the ReiserFS is breaking down the FS paradigm from the file level to the line level.

There is the classic example from the Reiser website. If your password file gets hacked, you have to ditch the whole file if you're using traditional file systems. You only know whether or not the file's been changed. However, with the Reiser system, it can tell you *what line*, and thus which user/password, was changed.

That's just a taste of where you can go with the ReiserFS. There are other things coming down the pipe; check out the Reiser website for a better idea of the new features that ReiserFS promises.

--
Computers are useless. They can only give you answers.
-- Pablo Picasso
I think trying on a P2 266 is a bad idea by H4x0r Jim Duggan · 2006-01-06 06:18 · Score: 5, Interesting

Reiser is not designed for slow CPUs. AFAIK, a key part of the design was that Hans Reiser realised that CPUs were vastly underused. IO resources were maxed out and CPUs were sitting idle. So he found ways to use the CPU to make more efficient use of the IO resources. So this benchmark on a 500MHz machine will of course show Reiser in a bad light, and moving lower down to a 266MHz will make it even worse.

For a decent benchmark of how filesystems work on modern hardware: use modern hardware.

--
Please help publicise swpat.org - the software patents wiki
1. Re:I think trying on a P2 266 is a bad idea by captain_craptacular · 2006-01-06 08:12 · Score: 3, Insightful
  
  So this benchmark on a 500MHz machine will of course show Reiser in a bad light, and moving lower down to a 266MHz will make it even worse.
  
  If you look at the charts, the "editing" doesn't help either. For example, one CPU usage chart showed a range starting @ 92% and ending @ 94%. The Reiser4 bar was 3x as long as the next bar, but guess what, it was using something like .7% (i.e. 93.7% as opposed to 93%) more CPU. If the scale hadn't been jacked up you wouldn't have been able to spot the difference at all, but the way they chose to present the data, it looked like a total smackdown.
  
  --
  They who would give up an essential liberty for temporary security, deserve neither liberty nor security
IDE Drives Cause other Overheads by j0ebaker · 2006-01-06 06:20 · Score: 4, Insightful

It would be interesting to see the results of the same tests running against a SCSI drive system where there is less IO overhead to see if the results differ.
There are other considerations here as well. What about the I/O elevator's tuning options?
Yes, I'd much rather see this test occur against a SCSI drive or better yet against a RAM drive for pure software performance.

Cheers, fellow Slashdotters!
-Joe Baker
Re:I would agree by Anonymous Coward · 2006-01-06 06:26 · Score: 5, Insightful

Ext2/Ext3: Mediocre at almost everything. Distros like Fedora that mandate the initial install ONLY use Ext3 are being stupid. The best fall-back filing systems if you can't find anything better for what you want the partition to do, but should never be used in specialized contexts.

Huh? Sorry, did you read the same graphs or are you just trolling?

This article shows that ext2 and ext3 are close to the top performer in most tests and do not have many "worst-case scenarios" (unlike, e.g. Reiser3 and Reiser4).

If there is anything that you can conclude after reading this study, it is that ext3 is a reasonably good default choice for a filesystem.
Outdated hardware... by tetabiate · 2006-01-06 06:39 · Score: 3, Informative

Anyway, how is the average user supposed to be concerned by these results?
In my daily work I manage hundreds of GB of data and have hardly seen a significant difference between XFS, JFS and ReiserFS v.3 on relatively modern hardware (Tyan S2882 Pro motherboard, two Opteron 244 processors, 4 GB RAM and two 250-GB SATA HDs) running OpenSuSE 10. I put the most important data on an XFS partition but also have a small ReiserFS partition which can be read from Windows.

-- Help us to save our cousins the great apes, do not use cell phones.
Nice stats... but wrong... by strredwolf · 2006-01-06 06:56 · Score: 3, Interesting
You know, I was looking at all these stats from this roundup... and while I'm glad they have one nice stat (how much the FS itself takes, the rest for space), I'm not happy that there is no "We've loaded it up, let's see how much is left" statistic.

What am I saying? I want to know how efficient these filesystems are in packing the data on the HD.
- I know Reiser v3 has "tail packing" to take small files and ends of files that stick out past a block boundary, and pack them inside "sub-blocks" to save space. ext2/3 is stuck at the block boundary (even though you can adjust the size of these blocks).
- I don't know if ext2/3 has been enhanced to pack small files in inode data.
- JFS and XFS do not have a tail-packing feature, and are also stuck at (adjustable) block boundaries.
I'm glad that you get more data out of Reiser v4, JFS, and XFS at formatting time, but my feeling is that Reiser v4 (once profiled, tweaked and refined for speed and space) will pack data tighter than anyone else. Meanwhile, I'm looking for something like ext3 that packs better.
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
Re:Warning by drinkypoo · 2006-01-06 07:01 · Score: 3, Insightful

XFS does things that ext? and Reiser can't do. Reiser does things other FSes don't do as well. It's a true 64-bit filesystem and it supports insanely large filesystems, up to 9 million terabytes in 64 bit mode (with a 64 bit kernel.) It even provides realtime support, although I guess that's still beta in linux? It can be defragged and even dumped while live. It has insanely quick crash recovery. And of course, it does other stuff too; check the project page. XFS may not be the fastest filesystem - it may even be the slowest - but it's got features no other filesystem has. If you need them, XFS is the winner. Hell, if you just trust XFS more than you trust other filesystems, it's the winner. (Sorry, but I wasn't sleeping when reiser was eating everyone's data, and ext3 handles corruption much more poorly than any of the other Journaled options.)

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Interesting? How about a DECENT one? by diegocgteleline.es · 2006-01-06 07:01 · Score: 5, Interesting

I'm *sick* of reading filesystem benchmarks by people who don't even bother to read the documentation of the filesystems they compare.

OK, so ext3 is not the fastest filesystem on earth. But it has some default options which make it suck even more than it usually does, and those options are *documented* in Documentation/filesystems/ext3.txt

* Ext3 does a sync() every 5 seconds. This is because ext3 developers are paranoid about your data and prefer to care about your data than win on benchmarks. Syncing every 5 seconds ensures you don't lose more than 5 seconds of work, but it hurts on benchmarks. Other filesystems don't do it; if you are doing a FAIR comparison, override the default with the "commit" mount option.

* ext3's default journaling mode is slower than those from XFS, JFS or reiserfs, because it's safer. When ext3 is going to write some metadata to the journal, it takes care of writing to the disk the data associated with that metadata. XFS and JFS journaling modes do *not* care about this, nor should they: journaling was designed to keep filesystem integrity intact, not data. ext3 does it as an "extra", and it's slower because of that. But if you want to do a fair comparison, you should use the "data=writeback" mount option, which makes ext3 behave like XFS and JFS WRT journaling. Reiserfs's default journaling mode is like XFS/JFS, but you can make it behave like the ext3 default option with "data=ordered".

ext3 is not going to beat the others by using those mount options, but it won't suck so much, and the comparison will be more fair. And remember: ext3 trades speed for data integrity. There's nothing wrong with XFS and JFS, but _I_ use ext3.
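
As a rough illustration of the two mount options named above, the sketch below only builds a command line and does not run it. The device, mount point and 30-second commit interval are placeholders rather than values from the article, and `ext3_mount_command` is a hypothetical helper.

```python
def ext3_mount_command(device, mountpoint, data_mode="writeback", commit_seconds=None):
    """Build (but do not run) a mount command line using the ext3 options named above."""
    options = [f"data={data_mode}"]
    if commit_seconds is not None:
        # "commit" overrides the default 5-second interval mentioned in the comment.
        options.append(f"commit={commit_seconds}")
    return ["mount", "-t", "ext3", "-o", ",".join(options), device, mountpoint]


if __name__ == "__main__":
    # /dev/sdX1, /mnt/test and the 30-second interval are placeholders.
    print(" ".join(ext3_mount_command("/dev/sdX1", "/mnt/test", commit_seconds=30)))
```

A benchmark that wants both views could run once with the defaults and once with these options, and report the two sets of numbers side by side.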
benchmarks that take less than 1/10 of a second by hansreiser · 2006-01-06 07:40 · Score: 4, Insightful

If someone does not know that filesystem benchmarks that take less than a tenth of a second are meaningless, it makes you wonder if they made errors in other aspects as well. These results are not consistent with the results that we have had. I bet he did not make an effort to ensure that you had to read the disk for these benchmarks, that he did not copy his file set from the same fs as he was measuring (makes a HUGE difference to performance and it is the mistake every beginner makes), etc. You'll note that the way he makes his graphs makes 1% differences look huge, etc.
Re:Very interesting article... NOT! by hackstraw · 2006-01-06 08:06 · Score: 5, Insightful

I would rather see these benchmarks on a computer less than 5 years old. I would also appreciate an open source version of the tests so they could be reproduced. For ease of reading, I think the article should be on a separate page on the site as well.

I've got a screaming Dell 1.6 GHz P4 to test with, and here are my results for a couple of tests. It only has ext3 and whatever cheap hard drive came with the box. I'm not sure if DMA is enabled or if I've done any hdparm tunings, but I'm not sure of their test system either:

my touch 10,000 files: 24.314 seconds, theirs 48.25

I used a shell script that called /usr/bin/touch

Now if I use a Perl open() call, I get 8.887 seconds
Now with a cheesy C program that uses fopen() and fclose() I get 4.639 seconds

my make 10,000 directories: 56.832 seconds, theirs 49.87

that is a shell script

If I use Perl, I get 35.171 seconds
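
For anyone who wants to repeat the two tests above without a shell loop, here is a minimal sketch in Python. It is not the article's benchmark script (that was not published), and the function names are invented here. Creating the files in-process avoids starting /usr/bin/touch 10,000 times, which is presumably where much of the shell-script time goes.

```python
import os
import tempfile
import time


def time_create_files(root, count):
    """Create `count` empty files under root and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(root, f"file{i}"), "w"):
            pass
    return time.perf_counter() - start


def time_make_dirs(root, count):
    """Create `count` empty directories under root and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        os.mkdir(os.path.join(root, f"dir{i}"))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass dir=... to TemporaryDirectory to target the filesystem under test;
    # the default temp location may be a RAM-backed filesystem.
    with tempfile.TemporaryDirectory() as root:
        print(f"touch 10,000 files: {time_create_files(root, 10_000):.3f} s")
        print(f"make 10,000 dirs:   {time_make_dirs(root, 10_000):.3f} s")
```

Numbers from a run like this mostly land in the page cache, so they say more about CPU and metadata handling than about the disk.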

The /dev/zero stuff is completely bogus. No indication of the blocksize that was used.

The copy kernel stuff to and from a different slower disk with an unknown filesystem on it is useless.

The split tests are not indicative of anything in real life, and they took on the order of between 60 seconds and 130 seconds to perform on their 500MHz system, with most being in the 130 second range. I got 16.547 seconds.

I do not see how any relevant information can be obtained from this article. I'm disappointed in the Linux Gazette and Slashdot for printing this information.