EXT4, Btrfs, NILFS2 Performance Compared
An anonymous reader writes "Phoronix has published Linux filesystem benchmarks comparing XFS, EXT3, EXT4, Btrfs and NILFS2 filesystems. This is the first time that the new EXT4 and Btrfs and NILFS2 filesystems have been directly compared when it comes to their disk performance though the results may surprise. For the most part, EXT4 came out on top."
you folks are killing me
The version of Btrfs that they used was before their performance optimizations - 0.18. But they now have 0.19 which is supposedly a lot faster and will be in the next kernel release. There's about 5 months of development work between them:
# v0.19 Released (June 2009) For 2.6.31-rc
# v0.18 Released (Jan 2009) For 2.6.29-rc2
NILFS2 and Btrfs are both TRIM file systems optimized for SSD media. Comparing them to other file systems on a SATA drive is borderline stupidity, because you would never use them on a SATA drive. Comparing NILFS2 or Btrfs to ext3 on an SSD would be just as pointless.
It's like comparing the performance of motor oil and sewing machine oil to lubricate an engine or a sewing machine. They're not the same thing just because they are both "oil".
Kinda disappointed the article didn't discuss JFS. After running into the fragility of XFS, I tried it out, and it's highly robust, fast, and easy on the CPU.
All of the file systems are designed for specific tasks/circumstances. I'm too lazy to dig up what's special about each, but they are most useful in specific niches. Not that you _can't_ generalize, but calling ext4 the best of the bunch misses the whole point of the other file systems.
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
The first benchmark on page 2 is 'Parallel BZIP2 Compression'. They are testing the speed of running bzip2, a CPU-intensive program, and drawing conclusions about the filesystem? Sure, there will be some time taken to read and write the large file from disk, but it is dwarfed by the computation time. They then say which filesystems are fastest, but 'these margins were small'. Well, not really surprising. Are the results statistically significant or was it just luck? (They mention running the tests several times, but don't give variance etc.)
All benchmarks are flawed, but I think these really could be improved. Surely a good filesystem benchmark is one that exercises the filesystem and the disk, but little else - unless you believe in the possibility of some magic side-effect whereby the processor is slowed down because you're using a different filesystem. (It's just about possible, e.g. if the filesystem gobbles lots of memory and causes your machine to thrash, but in the real world it's a waste of time running these things.)
-- Ed Avis ed@membled.com
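The question above about variance can be made concrete. This is a minimal sketch, assuming you have the raw timings from repeated runs; the numbers below are made up for illustration and are not from the Phoronix article. It reports the mean and sample standard deviation per filesystem and checks whether the one-standard-deviation ranges overlap, which is a rough indicator only and not a proper significance test.

```python
import statistics

def summarize(times):
    """Return (mean, sample standard deviation) for a list of run times in seconds."""
    return statistics.mean(times), statistics.stdev(times)

def ranges_overlap(a, b):
    """True if the mean +/- one standard deviation ranges of the two samples overlap."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return (mean_a - sd_a) <= (mean_b + sd_b) and (mean_b - sd_b) <= (mean_a + sd_a)

# Hypothetical timings (seconds) for the same bzip2 job on two filesystems.
fs_a = [41.2, 40.8, 41.5, 41.0]
fs_b = [41.1, 41.6, 40.9, 41.3]

print(summarize(fs_a), summarize(fs_b), ranges_overlap(fs_a, fs_b))
```

If the ranges overlap like this, calling one filesystem "faster" on that test says very little.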
Btrfs includes support for TRIM on SSD, but that's a secondary addition. The main purpose of Btrfs is to compete against Sun's ZFS in the area of robust fault tolerance. If you look at the original announcement, you can see SSD support wasn't on the radar at all; that's strictly been an afterthought in the design. Btrfs is absolutely designed to work on SATA drives and to compete head to head against ext3/ext4.
...Does it run Linux?
When these filesystems have actually matured enough to NOT have at least a dozen bugfix changesets in each revision of the kernel changelog. Even ext3 has received a few rather interesting corner-case fixes this year, so maybe ext4 will be reliable in 5 years or so.
I think AC makes a good point if you think about it. ... couldn't you have thought of a better name?
BUTTerfs guys
Your fans will be at a huge disadvantage in flamewars
You speak London? I speak London very best.
Yeah, I know I'm behind the times, but when did striping become stripping?
I am literally 3000 tokens away from the chaotic crossbow --Stephen
Does ext4 outperform reiserfs or not?
what?
What are the default mount options?
Are the Ubuntu default options sane*? I remember Linus ranting about stupid defaults for ext4, but couldn't find it anymore.
*sane being defined as: power outage doesn't leave you with a corrupt fs
NILFS2 (http://www.nilfs.org/en/) is actually a pretty interesting filesystem. It's a log-structured filesystem, meaning that it treats your disk as a big circular logging device.
Log structured filesystems were originally developed by the research community (e.g. see the paper on Sprite LFS here, which is the first example that I'm aware of: http://www.citeulike.org/user/Wombat/article/208320) to improve disk performance. The original assumption behind Sprite LFS was that you'll have lots of memory, so you'll be able to mostly service data reads from your cache rather than needing to go to disk; however, writes to files are still awkward as you typically need to seek around to the right locations on the disk. Sprite LFS took the approach of buffering writes in memory for a time and then squirting a big batch of them onto the disk sequentially at once, in the form of a "log" - doing a big sequential write of all the changes onto the same part of the disk maximised the available write bandwidth. This approach implies that data was not being altered in place, so it was also necessary to write - also into the log - new copies of the inodes whose contents were altered. The new inode would point to the original blocks for unmodified areas of the file and include pointers to the new blocks for any parts of the file that got altered. You can find out the most recent state of a file by finding the inode for that file that has most recently been written to the log.
This design has a load of nice properties, such as:
* You get good write bandwidth, even when modifying small files, since you don't have to keep seeking the disk head to make in-place changes.
* The filesystem doesn't need a lengthy fsck to recover from a crash (although it's not "journaled" like other filesystems, effectively the whole filesystem *is* one big journal and that gives you similar properties)
* Because you're not repeatedly modifying the same bit of disk it could potentially perform better and cause less wear on an appropriately-chosen flash device (don't know how much it helps on an SSD that's doing its own block remapping / wear levelling...). One of the existing flash filesystems for Linux (JFFS2, I *think*) is log structured.
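The write path described above can be sketched as a toy in-memory model. This is only an illustration of the idea, not how Sprite LFS or NILFS2 is actually implemented; the `LogFS` class and its method names are made up for the example. Writes are buffered, then appended to the log in one batch, followed by a fresh inode copy for each modified file. The newest inode in the log wins.

```python
class LogFS:
    """Toy log-structured store: nothing is overwritten, everything is appended."""

    def __init__(self):
        self.log = []       # the "disk": an append-only list of records
        self.pending = []   # writes buffered in memory until the next flush

    def write(self, name, block_no, data):
        """Buffer a block write; the log is not touched yet."""
        self.pending.append((name, block_no, data))

    def flush(self):
        """Append all buffered blocks, then one new inode per modified file."""
        touched = {}
        for name, block_no, data in self.pending:
            self.log.append(("block", data))
            touched.setdefault(name, {})[block_no] = len(self.log) - 1
        for name, new_blocks in touched.items():
            inode = dict(self._latest_inode(name))  # keep pointers to unmodified blocks
            inode.update(new_blocks)                # point at the newly written blocks
            self.log.append(("inode", name, inode))
        self.pending = []

    def _latest_inode(self, name):
        """Scan backwards for the most recently written inode of this file."""
        for record in reversed(self.log):
            if record[0] == "inode" and record[1] == name:
                return record[2]
        return {}

    def read(self, name):
        """Return the file's blocks in order, following the newest inode."""
        inode = self._latest_inode(name)
        return [self.log[pos][1] for _, pos in sorted(inode.items())]


fs = LogFS()
fs.write("a", 0, "hello")
fs.write("a", 1, "world")
fs.flush()
fs.write("a", 1, "there")   # modify block 1 only
fs.flush()
print(fs.read("a"))
```

Note that the old "world" block is still sitting in the log after the second flush. That leftover data is what makes cheap snapshots possible, and it is also why a cleaner is needed eventually.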
In the case of NILFS2 they've exploited the fact that inodes are rewritten when their contents are modified to give you historical snapshots that should be essentially "free" as part of the filesystem's normal operation. They have the filesystem frequently make automatic checkpoints of the entire filesystem's state. These will normally be deleted after a time but you have the option of making any of them permanent. Obviously if you just keep logging all changes to a disk it'll get filled up, so there's typically a garbage collector daemon of some kind that "repacks" old data, deletes stuff that's no longer needed, frees disk space and potentially optimises file layout. This is necessary for long term operation of a log structured filesystem, though not necessary if running read-only.
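The checkpoint, snapshot, and garbage collection behaviour can be sketched the same way. This is a toy model under my own assumptions, not the NILFS2 implementation or its on-disk format; `CheckpointLog` and its methods are hypothetical names. Every commit records a checkpoint, a checkpoint can be promoted to a permanent snapshot, and the collector drops old plain checkpoints and frees blocks that nothing refers to any more.

```python
class CheckpointLog:
    """Toy model of checkpoints, snapshots and a cleaner (illustration only)."""

    def __init__(self):
        self.blocks = {}       # block id -> data, stands in for log segments
        self.checkpoints = {}  # checkpoint number -> {"files": {...}, "snapshot": bool}
        self.next_block = 0
        self.next_cno = 0

    def commit(self, files):
        """Append new blocks for `files` (name -> data) and record a checkpoint."""
        if self.checkpoints:
            current = dict(self.checkpoints[max(self.checkpoints)]["files"])
        else:
            current = {}
        for name, data in files.items():
            self.blocks[self.next_block] = data   # never overwrite, always append
            current[name] = self.next_block
            self.next_block += 1
        cno = self.next_cno
        self.checkpoints[cno] = {"files": current, "snapshot": False}
        self.next_cno += 1
        return cno

    def make_snapshot(self, cno):
        """Protect a checkpoint from the garbage collector."""
        self.checkpoints[cno]["snapshot"] = True

    def read(self, cno, name):
        """Read a file as it looked at checkpoint `cno`."""
        return self.blocks[self.checkpoints[cno]["files"][name]]

    def collect_garbage(self, keep_recent=1):
        """Drop old plain checkpoints (keep_recent must be >= 1), then free dead blocks."""
        recent = sorted(self.checkpoints)[-keep_recent:]
        for cno in list(self.checkpoints):
            if cno not in recent and not self.checkpoints[cno]["snapshot"]:
                del self.checkpoints[cno]
        live = {bid for cp in self.checkpoints.values() for bid in cp["files"].values()}
        for bid in list(self.blocks):
            if bid not in live:
                del self.blocks[bid]


log = CheckpointLog()
c0 = log.commit({"a": "v1"})
c1 = log.commit({"a": "v2"})
c2 = log.commit({"a": "v3"})
log.make_snapshot(c0)      # keep the oldest state permanently
log.collect_garbage()      # c1 is neither recent nor a snapshot, so it goes
print(sorted(log.checkpoints), log.read(c0, "a"), log.read(c2, "a"))
```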
Another modern log structured FS is DragonflyBSD's HAMMER (http://www.dragonflybsd.org/hammer/), which is being ported to Linux as a SoC project, I think (http://hammerfs-ftw.blogspot.com/)
I suspect their test methodology isn't very good, in particular the SQLite tests. SQLite performance is largely based on when commits happen, as at that point fsync is called at least twice and sometimes more (the database, journals and containing directory need to be consistent). The disk has to rotate to the relevant point and write outstanding data to the platters before returning. This takes a considerable amount of time relative to normal disk writing, which is cached and write-behind. If you don't use the same partition for testing then the differing number of sectors per physical track will affect performance. Similarly a drive that lies about data being on the platters will seem to be faster, but is not safe should there be a power failure or similar abrupt stop.
Someone did file a ticket at SQLite but from the comments in there you can see that what Phoronix did is not reproducible.
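The commit cost described above is easy to see for yourself. Here is a minimal sketch using Python's standard sqlite3 module; the table name, row count and file names are arbitrary choices for the example, and this is not the test Phoronix ran. It inserts the same rows twice, once committing after every row and once committing a single time at the end.

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(path, rows, commit_each):
    """Insert `rows` rows; commit after every row, or only once at the end."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (n INTEGER)")
    start = time.perf_counter()
    for i in range(rows):
        conn.execute("INSERT INTO t VALUES (?)", (i,))
        if commit_each:
            conn.commit()   # each commit forces data out to disk
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count

with tempfile.TemporaryDirectory() as d:
    per_row, n1 = timed_inserts(os.path.join(d, "a.db"), 200, commit_each=True)
    batched, n2 = timed_inserts(os.path.join(d, "b.db"), 200, commit_each=False)
    print(f"commit per row: {per_row:.3f}s, single commit: {batched:.3f}s")
```

How big the gap is depends on the filesystem, the mount options, and whether the drive honours flush requests, which is exactly why a benchmark built on it needs careful controls.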
So what - when I was still using Linux, a working backup (incl. ACLs, xattrs, etc.) was the most important criterion, and XFS came out on top. xfsdump / xfsrestore has saved the day more than once.
Skip TFA - the conclusion is that these benchmarks are invalid.
At least they've improved since last time - they no longer benchmark filesystems using a Quake 3 timedemo.
Here's a trivial example from dmesg:
[ 0.682950] Linux agpgart interface v0.103
Ok, I've been wondering this for a long time. IBM contributed JFS to Linux years ago, but no one ever seems to give any thought to using it. I used it on my computer for a while, and I can't say that I had any complaints (of course, one person's experience doesn't necessarily mean anything). When I looked into the technical features, it seemed to support lots of great things like journaling, Unicode filenames, large files, large volumes (although, granted, some of the newer filesystems *are* supporting larger files/volumes).
Don't get me wrong - some of the newer filesystems (ZFS, Btrfs, NILFS2) do have interesting features that aren't in JFS, and which are great reasons to use the newer systems, but still, it always seems like JFS is left out in the cold. Are there technical reasons people have found it lacking or something? Maybe it's just a case of, "it's a fine filesystem, but didn't really bring any compelling new features or performance gains to the table, so why bother"?
Personally I'm holding out for the initial release of the MILFS2 filesystem. XD
Are YOU using the TOOL, or is the TOOL using YOU? Think about it!
Talk about optimization, or the lack of it. Take a look at the SQLite test: EXT3 is something like 80 times faster than EXT4 or Btrfs.
What the heck is going on? PostgreSQL does not seem to show the same difference.
Really, this is an insanely different score, to the point that if it's real, no one in their right mind would run SQL on anything but EXT3.
Something must be wrong with this test.
Some drink at the fountain of knowledge. Others just gargle.
It doesn't matter how fast it is, if it isn't correct! We as IT professionals should focus more on the CORRECTNESS of the terabytes of data we store, not on how many IO/s we get, as long as it does the job we need. Ensuring correctness should be job #1. Right now in production for me safe means ZFS. When Linux delivers a comparable stable tested filesystem I'll be all over it. Right now it still seems like the 1980s, where 99% of people are obsessed over how FAST they can make things. I cringe every time I watch an admin start "tuning" a filesystem to make it faster by flipping off sync and other safety features.
does pussyfs have TRIM support?
Do you even lift?
These aren't the 'roids you're looking for.
Phoronix - conflation of "phoenix" and "moron". I.e., a moron that rises from the ashes, refusing to die.
Almost all of their tests involve working sets smaller than RAM (the installed RAM size is 4GB, but the working sets are 2GB). Are they testing the filesystems or the buffer cache? I don't see any indication that any of these filesystems are mounted with the "sync" flag.
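To illustrate the buffer cache point, here is a rough sketch that times the same write twice: once returning as soon as the data is handed to the OS, and once forcing it to stable storage with os.fsync. The 8 MiB size and the file names are arbitrary choices for the example, and the timings will vary a lot by machine.

```python
import os
import tempfile
import time

def timed_write(path, data, sync):
    """Write `data` to `path`; if `sync`, force it to stable storage before returning."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()               # push Python's buffer to the OS
            os.fsync(f.fileno())    # ask the OS to push its cache to the device
    return time.perf_counter() - start

data = os.urandom(8 * 1024 * 1024)  # 8 MiB, far smaller than RAM
with tempfile.TemporaryDirectory() as d:
    cached = timed_write(os.path.join(d, "cached.bin"), data, sync=False)
    synced = timed_write(os.path.join(d, "synced.bin"), data, sync=True)
    sizes = [os.path.getsize(os.path.join(d, n)) for n in ("cached.bin", "synced.bin")]
    print(f"cached: {cached:.4f}s, fsynced: {synced:.4f}s")
```

A benchmark that only ever exercises the first case with a working set that fits in RAM is mostly measuring memory, not the filesystem.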
At least according to some rough microbenchmarking I've done myself. My workload is to write raw CSV to disk as fast as possible. In testing, NILFS2 was nearly 20% faster than ext3 on a spinning disk.
It was also smoother. Under very heavy load ext3 seemingly batched up writes then flushed them all at once, causing my server process to drop from 99% to 70% utilisation. NILFS seemed to consume a roughly constant percentage of CPU the whole time, which is much more in line with what I want.
NILFS2 is not for everyone or for every purpose. But it suits my purpose. As usual, you should do the engineering thing: consider your needs, test the alternatives.
Classical Liberalism: All your base are belong to you.
As far as I can see from the comparison of these filesystems, Btrfs is a promising file system for Linux and is still under development. Some say that it will be the ZFS of Linux, or even better. I think time will tell.
Others say that, now that Oracle owns Sun, Oracle can change the license of ZFS from CDDL to GPLv2 and port it to Linux. But porting ZFS to Linux is another story...
Until the skies turn blue...
Until the air of freedom strikes us...
I have conclusively proven that btrfs is actually a blatant repackaging of reiser4 in a cover up to avoid the political disaster of supporting the code of a convicted murderer. btrfs is 81.56% similar to reiser4. Here are the steps to reproduce. Please spread the word. http://pastebin.com/ff42272d http://pastebin.com/f27912488