Slashdot Mirror


Linux File System Shootout

IpSo_ writes "Finally an extensive, human readable Linux file system benchmark has been unleashed upon us. Originally posted on the Linux Kernel mailing list, using two of the most popular benchmarking tools available, it compares all the major file systems, including their different mount options. The results are surprising."

41 of 437 comments

  1. /.ed already? by tqft · · Score: 1, Informative

    Can someone who has it open mirror it, please? It will not open for me. All I can get is a 685-byte file.

    --
    The Singularity is closer than you think
    Quant
  2. Re:The only one that matters by maharg · · Score: 1, Informative

    Samba is not a file system, and as such is not in the benchmarks. RTFA and see http://samba.org

    --

    $ strings FTP.EXE | grep Copyright
    @(#) Copyright (c) 1983 The Regents of the University of California.
  3. Short summary by mst76 · · Score: 5, Informative
    iozone benchmark
    best: jfs
    worst: ext3_journal

    bonnie++ benchmark
    best: ext2
    worst: reiser4/reiser4_extents, ext3_ordered/ext3_journal

    1. Re:Short summary by tanveer1979 · · Score: 1, Informative
      Note, fellas: though bonnie++ says the reiser4 series is worst, reiserfs is one of the good ones, staying above average on all counts!

      I'm a bit confused here: aren't newer file systems (like ext3, etc.) supposed to be better than the older ones?!

      --
      My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
      FB : https://www.facebook.com/TanveersPhotography
    2. Re:Short summary by Anonymous Coward · · Score: 2, Informative

      Reiser4 is now better in some areas, mainly speed. It is not faring well because it is still unfinished. The Reiser4 team is now focusing on two things before releasing a more final version: increased stability and reduced processor usage (which is what currently kills Reiser4 in benchmarks).

      From: Comment 7167683

      We allocate a "jnode" per unformatted node in the filesystem. The traversing of these jnodes consumes more CPU than performing the memcpy from user space to kernel space when doing large writes. I don't yet really understand on an intuitive level why this is so, which is a reflection on my ignorance as it is consistent with stories I have heard from other implementors of filesystems who found that eliminating per page structures was an important part of optimizing large writes. We will fix this by creating a new structure called an extent-node that will exist on a per extent basis, and this will probably cure the problem. This will greatly simplify parts of our code for reasons I won't go into, and it will also take us 6 weeks to do it. I don't think users should wait for it, and so we will ship without it.
      . . .

      Our dbench performance was poor, has improved due to coding changes, and we need to test and analyze again. Perhaps more fixes will be needed, we can't say yet.

      * Our fsync performance is poor. We will pay attention to this next year, frankly, after we have fully implemented the transactions API. At that point we will say something like, if you care about fsync performance you should be using the transactions API and/or sponsoring us to tune for NVRAM, users will say back "but our legacy apps on hardware without NVRAM matter!", and we will grudgingly but effectively tune for this because we care about real users too.;-)

      Benchmarks can be found at www.namesys.com/benchmarks.html

    3. Re:Short summary by JanneM · · Score: 4, Informative

      Actually, at least in our case we thread the app, with one thread handling disk IO and other threads handling other aspects (such as CPU intensive stuff), precisely to squeeze out a bit more performance and so disk accesses do not interfere with and stall other stuff. You get this as soon as you try to do something in soft realtime (such as video applications). On one hand, you want to stream video to/from a drive as quickly and efficiently as possible; on the other, you want to do some CPU-intensive operations (filtering, resizing) on the video stream at the same time.

      I'm not saying that trading CPU for filesystem speed is a bad idea; it isn't. What I'm saying is that it's not a simple "more is better" function, and that the cutoff for when it no longer makes sense does depend a lot on the application you intend it for. Again, to take an extreme, you would not want to have a system where the filesystem eats so much CPU the rest of the system essentially blocks, starved for CPU time, when the disk is used.

      To take an even more extreme way of doing the tradeoff: you could compress and uncompress all data on the fly. That way you would increase transfer speed (and increase it quite a bit in the case of text files and similar) as well as decrease disk usage. It is not often done, though, because the tradeoff is not worth it in general.
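
      As a rough illustration of that tradeoff, here is a minimal Python sketch. It assumes a deliberately pessimistic model in which compression and disk transfer happen one after the other. The function name and all the numbers (file size, compression ratio, disk and CPU throughput) are made up for illustration and are not taken from the benchmark.

```python
def transfer_seconds(size_mb, disk_mb_s, ratio=1.0, cpu_mb_s=None):
    """Estimate the time to write size_mb of data.

    ratio is compressed size / original size (1.0 means no compression).
    cpu_mb_s is how fast the CPU can compress; None means no compression step.
    Assumption: compression and disk I/O run serially, not overlapped.
    """
    cpu_time = 0.0 if cpu_mb_s is None else size_mb / cpu_mb_s
    disk_time = (size_mb * ratio) / disk_mb_s
    return cpu_time + disk_time


# Hypothetical numbers: 100 MB of text that compresses to a quarter of its size,
# written to a disk that sustains 20 MB/s.
plain = transfer_seconds(100, disk_mb_s=20)
fast_cpu = transfer_seconds(100, disk_mb_s=20, ratio=0.25, cpu_mb_s=100)
slow_cpu = transfer_seconds(100, disk_mb_s=20, ratio=0.25, cpu_mb_s=10)

print(f"uncompressed:             {plain:.2f} s")
print(f"compressed, fast CPU:     {fast_cpu:.2f} s")
print(f"compressed, slow CPU:     {slow_cpu:.2f} s")
```

      With a fast CPU the compressed write wins; with a slow one the CPU cost swamps the disk savings, which is the cutoff point being described.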

      For us, and our app, Reiser is on the wrong side of that cutoff point (and Reiser4 is not even on the horizon yet).

      --
      Trust the Computer. The Computer is your friend.
  4. Re:Huh? by matticus · · Score: 5, Informative
    Well, here's IBM's page about it.


    From what I've seen poking around Usenet, JFS seems to have the "too little, too late" problem. I've never seen it pwn a benchmark like it did today, though.
    I'm a little confused: I have been told XFS is the best designed, highest performing file system, and I would hate to think SGI is getting into a lot of this crap with SCO for a relatively slow journaling file system...

  5. histogram, please! by glenkim · · Score: 2, Informative

    Hey, if somebody could organize this data into histograms, it'd be a lot easier to interpret the results.

  6. Re:Huh? by Frodo420024 · · Score: 5, Informative
    I'm a little confused-I have been told XFS is the best designed, highest performing file system, and I would hate to think SGI is getting into a lot of this crap with SCO for a relatively slow journaling file system...

    IIRC, XFS is more about guaranteed performance under various stressful conditions than about getting the absolute peak speed in calm conditions.

    --
    I'm in a Unix state of mind.
  7. Re:Huh? by Anonymous Coward · · Score: 2, Informative

    Time to rethink things. The generally accepted opinions between the two are that JFS is faster for small files and XFS has a bit of an edge with larger files. Both perform very very well.

    I don't know how JFS falls in the "too little, too late" category. Both file systems have been available for a long time on Linux; however, very few Linux distributions embrace them during installs, so they have gone unknown to a great deal of the non-storage geeks out there. Mandrake, much to their credit, has for a long time included these file systems as an install option for your root filesystem, which has always made me appreciate what Mandrake's doing.

    I will say that even though JFS is probably the best high-performance choice for user workstations due to smaller file sizes on average, I prefer XFS because it has a much more robust toolset and you can get a little more hands-on with the filesystem thanks to the tools provided. But most users just don't care that much for low-level tools, so give JFS a twirl.

  8. Reliability? by hofer · · Score: 2, Informative

    We did a lot of testing with various file systems for a product earlier this year. After a couple of terabytes of intensive reads/writes (and a couple of days...) the JFS kernel processes randomly locked up and blocked all disk I/O operations (1.1.0 and 1.1.1 versions). JFS was indeed the fastest of the file systems we tested, but we had to drop it for being unreliable.

    I wonder if anyone has some experience with the reliability of the current version?

    --
    Score:1, Unread
  9. Summary by samj · · Score: 4, Informative

    Use XFS unless you want to do lots of deletes (as they are slow and expensive) in which case ext2 is probably a better bet since the files are probably temporary (Squid caches for example).

    1. Re:Summary by samj · · Score: 2, Informative

      And if you want to use 2.4 kernels without compiling your own then you probably want to consider the 'all rounder', JFS, as 1.1.0 (or thereabouts) has been included since 2.4.20. I have a feeling XFS modifies things which weren't to be touched until 2.6.0, so you'll need a custom kernel for it. While some vendors ship 2.4 kernels with XFS support, I only really care about Debian and it only ships the patch.

  10. "linux reiserfs" by bani · · Score: 5, Informative

    type "linux reiserfs" when booting the installer, and you will have access to reiserfs during redhat install.

    i've been using this method for ~2 years now.

    1. Re:"linux reiserfs" by Mark+Wilkinson · · Score: 2, Informative

      Although it can leave you with a system that the Red Hat installer won't upgrade. If your root partition is actually an LVM logical volume or a RAID device, and it's formatted with reiserfs, the installer won't find your existing system and won't offer the upgrade option.

  11. Re:Other's don't do journaling? by altamira · · Score: 2, Informative

    There is a difference between journaling DATA and METADATA. Don't get confused by that...

  12. Re:Throughput benchmarks only... by zurab · · Score: 4, Informative

    Have a look at Hans' benchmarks at namesys.com, although he only compares Reiser4 to ext3 and may not be an objective party. But I'm surprised how well JFS performed anyway and that Reiser4 is unusually CPU-intensive.

  13. Re:Results question by NickFortune · · Score: 4, Informative
    Is it fair to throw in a non-journaling FS in a benchmark against a bunch of journaling ones?
    Yes. Of course. The ext2 numbers provide a baseline for the comparison. Any journaled FS that could match it would have to be very good indeed. This isn't explicitly stated anywhere, but this was posted to the kernel list. They can reasonably be expected to know the difference between ext2 and the rest. It's all data. Data is good.

    I know we're used to seeing "benchmarks" used as corporate propaganda, but let's not forget what they're supposed to be used for.

    --
    Don't let THEM immanentize the Eschaton!
  14. Ext3 by rf0 · · Score: 3, Informative

    Well, ext3 might suck, but when you've only got a rescue system that can read ext2 it can really save your neck. I would be interested to see what is best in terms of stability, though.

    Rus

    1. Re:Ext3 by srussell · · Score: 2, Informative
      Augh. I can't stand it any more.

      I had ext3 on my wife's laptop for a while, and it failed twice. By "fail", I mean that, due to Linux crashes, the filesystem had errors that had to be recovered by hand. By "fail", I mean actual, significant data loss.

      When I got my new laptop (from QliLinuxPC), they formatted the HD into one big partition (well, one for /boot and one for /), and formatted those as ext3. I didn't switch to ReiserFS, because QliLinuxPC said they'd had good luck with ext3. In the past year, I've had three separate filesystem corruptions of / on that machine.

      On my older laptop (now functioning as a print server), I have had ReiserFS for the past three years, and ReiserFS again on a desktop system for the past 5. I've only had one problem with ReiserFS, and that was three or four years ago, and I don't remember what it was -- although I remember it being a real pain to recover, and I think it involved LVM.

      Considering I've tried it on two systems with entirely different hardware components, I'm faulting ext3. My conclusion is that ext3 sucks, but my opinion is based on the fact that, for a journalled filesystem, ext3 seems to be terrible at surviving sudden power failures, and has given me as many filesystem failures as ext2 ever did, if not more.

    2. Re:Ext3 by Anime_Fan · · Score: 2, Informative

      While I have experienced most of what you say, I've also had my share of Reiserfs corrupting filesystems. Granted, it was not root partitions, but I had two separate IBM Deathstars (120GB) that both failed within a week (during normal usage, no power failure). I couldn't access certain directories, and a rebuild-tree saved some more files.

      That said, I've not had ANY problems with Reiserfs on good hardware (Maxtor 160GB 8MB cache/Western Digital 120GB 8MB cache).

      I would have used XFS if it wasn't for the fact that the kernel wouldn't mount the damn things ^^. But I'm now stuck with 2x 120GB ext3 partitions.

  15. Irrelevant numbers by krorvik · · Score: 2, Informative

    These benchmarks were performed on relatively old hardware, with a slow CPU and a disk only running UDMA2. And, as others have already pointed out, the data are not really statistically reliable.

    Myself, I'd be much more interested in seeing numbers made on a setup more like my own.

    Static benchmarks are never good for deciding "which is best".

  16. Re:human readable ? by Tet · · Score: 2, Informative
    EXT3 "surprise, surprise" sucks rocks.

    Really? You must be looking at a different set of benchmarks to me, because as I see it, ext3 is running a close race with XFS to take second place behind JFS. Remember, ext3's journalled mode is journalling data as well, and hence it isn't fair to compare it to other filesystems directly as it's doing much more work (equally, ext2 comes out on top for a number of things because it's doing far less work). Others like reiserfs, XFS and JFS are journalling metadata only (c.f., ext3's ordered mode).

    I tend to run ext3 on all of my servers, because while it's not necessarily the absolute fastest, it's fast enough, and more importantly, it's rock solid in terms of stability. I wouldn't touch reiserfs with a 10 foot bargepole for any of my machines, mostly because I don't trust it (or Hans). Now it seems even the touted performance benefits aren't really there either. I've been considering JFS for a while, and have had a test JFS filesystem running for the last few months. Maybe I'll switch, but even if I don't, ext3 is more than adequate.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  17. Re:there is more to a filesystem that speed. by LNX+Flocki · · Score: 2, Informative

    Remember that this was a "performance test", not a "which FS is better" test. Benchmarks only show you which one is faster at specific tasks; they don't necessarily tell you which one's better.

  18. Worth Noting by MajroMax · · Score: 3, Informative
    It's worth noting here that the benchmarks were all run on files >= 1GB, if I'm reading the table correctly; this stresses the raw IO of the system, and doesn't really take into account the differences in tree-structure between the filesystems.

    As for complaints about Reiser's performance -- last I heard, it was more optimized for many small files -- precisely the domain that this thing didn't test.

    --
    "Evil company X is threatening to restrict our rights! Let's all get together to stop--OOOH! SHINEY!!!" -- AC
  19. Re:Can't wait for Novell Storage System on Linux by Anonymous Coward · · Score: 3, Informative

    You forgot one thing. As an enterprise filesystem Novell was absolutely bulletproof long before RAID systems were in vogue. It took me awhile to even figure out why we needed one after years of running Novell as our main storage controller (flawlessly).

    Novell could give *nix systems Windows-like (don't bash if you don't know) fine granularity over user access at the enterprise level along with true enterprise scaling. Again, if you have never worked in a cross-enterprise environment, don't start bashing, because you really can't appreciate some of Novell's strengths until you need the features.

  20. Re:human readable ? by Psiren · · Score: 3, Informative

    Since there was no legend explaining what the colors meant, I couldn't figure out anything from looking at them. Is the high number good? As in did the most work? Or is the high number bad? As in took the longest amount of time to do something?

    Depends on the column. For K/sec, higher is better, so red cell shows lowest, and green shows highest. For %CPU, lower is better, so green shows lowest and red shows highest. It's not that complicated really if you take a few minutes to look at it. What you get from the data depends on what you were looking for in the first place.
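
    To make the two directions concrete, here is a small Python sketch of that reading rule. The column names follow the table as described above; the function name and the sample values are invented for illustration and are not the published results.

```python
# Direction of "better" for each kind of column, as described above.
HIGHER_IS_BETTER = {"K/sec": True, "%CPU": False}


def best_and_worst(column, results):
    """Return (best, worst) filesystem names for one column.

    results maps filesystem name -> measured value.
    The best entry is the one that would be coloured green, the worst red.
    """
    higher = HIGHER_IS_BETTER[column]
    ranked = sorted(results, key=results.get, reverse=higher)
    return ranked[0], ranked[-1]


# Made-up sample values, purely to show the two directions.
throughput = {"ext2": 9000, "jfs": 9500, "reiser4": 7000}
cpu_load = {"ext2": 12, "jfs": 15, "reiser4": 60}

print(best_and_worst("K/sec", throughput))  # highest throughput is best
print(best_and_worst("%CPU", cpu_load))     # lowest CPU usage is best
```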

  21. bonnie++ comparison between reiser4 and ext3 by nikitad · · Score: 2, Informative

    I should probably add that I am getting quite different bonnie++ results for reiser4 vs. ext3.

    They are available at reiser4 benchmarking page along with
    hardware specifications.

    http://www.namesys.com/benchmarks.html#bonnie++.2003.09.30

  22. Re:Cheaters! by kasperd · · Score: 2, Informative

    Like I said, who cares about NTFS or FAT performance?

    FAT performs well as long as you just do sequential access to large files. But don't access too many files at the same time, because there are only eight entries in the fat_cache. If you run over this limit or do random access, FAT is going to be the world's slowest filesystem. And that is so bad it has really caused me trouble sometimes.

    The reality is minix is faster, cleaner, simpler, and more flexible than FAT. Just take a look at the source: minix 47KB, fat 131KB. And keep in mind that minix uses a fast tree structure to locate blocks while FAT uses an extremely slow linked list.
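
    Here is a rough Python sketch of why the linked list hurts on random access. It is a simplified model, not the kernel code: it ignores the fat_cache entirely, and the minix-style layout (7 direct pointers, 512 pointers per indirect block) is an assumption chosen for illustration. All function names are hypothetical.

```python
def fat_lookup(fat, start_cluster, n):
    """Find the n-th cluster of a file by walking the FAT chain.

    fat maps each cluster to the next cluster of the same file.
    Returns (cluster, links_followed); the cost grows linearly with n.
    """
    cluster = start_cluster
    for _ in range(n):
        cluster = fat[cluster]
    return cluster, n


def indexed_levels(n, direct=7, per_block=512):
    """Count the indirect blocks read to find block n in a minix-style inode.

    Assumed layout: `direct` pointers in the inode, then one single-indirect
    and one double-indirect block, each holding `per_block` pointers.
    """
    if n < direct:
        return 0
    if n < direct + per_block:
        return 1
    if n < direct + per_block + per_block * per_block:
        return 2
    raise ValueError("block index beyond double-indirect range")


# A contiguous 1000-cluster file starting at cluster 2 (hypothetical layout).
fat = {c: c + 1 for c in range(2, 1001)}
cluster, links = fat_lookup(fat, 2, 999)

print(f"FAT chain: {links} links followed to reach cluster {cluster}")
print(f"tree:      {indexed_levels(999)} indirect blocks read for block 999")
```

    Reaching the last block costs hundreds of chain steps in the FAT model but at most two extra block reads in the tree model, however large the file gets within that range.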

    --

    Do you care about the security of your wireless mouse?
  23. Re:DeFacto Standard by warpSpeed · · Score: 2, Informative
    Um, why would you want to put squid on a journaled file system?

    For the same reason you would want to have email, or a file server, on a journaled system, recovery speed.

    I have some clients with servers (that run squid), and when they take a power hit long enough to drain their UPS, the last thing I want to have to do is deal with a call saying "how come the server did not come back up..." Meanwhile the fsck is still running and they are hitting the power switch trying to "reboot" the problem away before the ext2 fsck can finish checking through the cache partition...

    That's just one reason.

  24. Re:Sort of on topic... by Jameth · · Score: 2, Informative

    If you can afford an extra partition, try reiserFS. Although it is not natively a part of Windows, there are tools which let you use it. This means you cannot install Windows on it, but you could get read-write access with it.

    Back when I still dual-booted, I had this layout:

    5 gig NTFS WindowsXP partition
    5 gig XFS Slackware partition
    1 gig swap partition
    45 gig reiserFS shared storage partition

    This also made me feel a lot safer in using the systems: Neither ever mounted the other system's Root directory, so all that was actually shared was what I used as my home directory. No matter what I did on one system, even if data corruption happened, it would never be so bad that I couldn't boot it. (Note: I never had data corruption, it was just a mental comfort issue)

  25. Re:Sort of on topic... by angle_mark · · Score: 4, Informative

    There are some free and some commercial products which can offer full read/write + journalling access for ext3 partitions from Windows. I'd definitely recommend you pick ext3 over fat32.

    Some examples..

    Free: Explore2fs allows you to read ext2 and ext3. Limited write support is available.

    Commercial: Ext2FS Anywhere. Don't let the name put you off, as it has full read/write support for ext2 and ext3, and I think reiserFS is supported now too.

  26. Some suggestions for future tests by cluge · · Score: 2, Informative

    These numbers are great, but only tell us a little about reliability or "real world" performance. When I did testing on these file systems I used all the benchmarks here, plus a benchmark called postmark. This benchmark utility was released into the public domain by NetApp and has to be one of the better "real world" benchmark suites.

    The problem that we had with JFS during testing is that we had kernel panics with very large files. Thus we chose XFS, which has done an excellent job. I'm sure glad that the XFS file system has been merged into the 2.6 kernel: no more patching the 2.4's!

    For more benchmarks on other file systems using postmark check out This

    --
    "Science is about ego as much as it is about discovery and truth " - I said it, so sue me.
  27. this benchmark was performed using a 200Mhz CPU by hansreiser · · Score: 4, Informative

    which makes the whole thing pretty questionable in my view, especially when you consider that Nikita got completely different results on his more modern hardware (see www.namesys.com/benchmarks.html)

    I don't really target 200Mhz CPUs in my performance tuning....;-)

    Hans

    1. Re:this benchmark was performed using a 200Mhz CPU by Deagol · · Score: 4, Informative
      How true a difference the hardware makes.

      I took an old PII-350 w/ 128MB RAM and benchmarked ext2, ext3, jfs, reiserfs, and xfs on an old 5GB IDE drive. ext2 was the winner by a margin (raw throughput).

      Now I'm beating up various hardware and software RAID configs on a dual Athlon MP 2200+ system w/ 2GB RAM and dual 3ware 8-port 7500 controllers w/ 180GB WD drives. JFS rises above the rest in terms of throughput (I didn't test XFS on this new machine), and, of course, reiserfs simply spanks everything in terms of file creation/deletions. The thing I noticed was that JFS had much lower CPU utilization for file creations/deletions and was twice as fast at it as the ext2/3 filesystems (it still got spanked by reiserfs, though).

      If anyone's interested, the "best" overall was reiser w/ the mount options noatime,notail,nodiratimeall. Also, if anyone cares, on this machine, the Linux software RAID code delivered no less than twice the performance numbers of the 3ware hardware RAID. Running RH9 with all RH updates applied.

  28. Re:human readable ? by TheCrazyFinn · · Score: 3, Informative

    Never heard of Read-only filesystems?

    Mount static filesystems read-only, and make them EXT2 for performance. Use a journalling FS for dynamic filesystems. Reap the benefits of both.

    --
    "You've got an invalid haircut" -Warren Zevon - Life'll Kill Ya
  29. Re:DeFacto Standard by hackstraw · · Score: 2, Informative

    Um, why would you want to put squid on a journaled file system?

    If you're looking to restart quickly after a power failure you can always set a partition to ignore file system checks at startup, "0 0" options in /etc/fstab. /var/spool/squid (or whatever) is on its own partition, right? Perhaps on its own disk?

    You have never waited over an hour to fsck 3 harddisks while over 100 people have no "internet".

  30. Different results when using RAID devices by Anonymous Coward · · Score: 1, Informative

    I did some filesystem benchmarks myself and found that JFS performed well at raid levels 1 and 10, but XFS totally dominated on RAID5. At least when using dt as a benchmark program. I also ran IOzone, but do not have the results in a form that I can easily compare them.

  31. Re:DeFacto Standard by drinkypoo · · Score: 3, Informative
    he installed a hard drive, formatted it to Reiser, and moved the proxy cache to the reiser disk. I couldn't believe it. Just changing the filesystem caused an increase that was noticeable across our network. At no cost!

    He installed a hard drive. He didn't just format to reiser. The hard drive costs money.

    If the proxy cache was formerly on a disk that was also doing other things, it would have sped up no matter what filesystem he used.

    You will have to give us more information if you want your claims to have any merit.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  32. Re:The only one that matters by Newander · · Score: 2, Informative

    If you read carefully, you'll notice that the name of the filesystem is SMB. Samba is software that interfaces with the SMB filesystem. Of course, SMB isn't really a filesystem either. When you want to share something you don't have to create and format a partition as SMB.

    --

    Jesus saves and takes half damage.

  33. Re:these are narrow tests, not comprehensive tests by hansreiser · · Score: 2, Informative

    Think of different benchmarks as being like x-ray vs. infrared photographs. Each of them gives a different insight into the subject of the photo.

    In this case, I think that this 200Mhz CPU benchmark is not highly worth optimizing for, but generally more views of a design are interesting.

    One of the things reiser4 needs to do is not have a structure per unformatted node for large files, and you can see the need for that if you look at our CPU consumption when writing a large file using dd. We'll probably adjust that aspect of the design sometime in the next two months, and have a structure per extent instead of per unformatted node. If I hadn't been running benchmarks, I never would have understood that misdesign decision of mine so clearly. The nice thing is that the code will get simpler as a result of the change.