Benchmarking XFS, ext2, ReiserFS, FAT32
blakestah writes: "Well, it looks like someone on the LKML has taken upon himself to do some benchmarking of ReiserFS, ext2, and XFS using the 2.4 kernel series. It is not a real benchmark test, but kind of interesting nonetheless. See the results (in Spanish) at this LUG in Mallorca. Simple runs of dd, tar, and rm are shown, and for most of the tests XFS is pretty dern fast, beating all the others. The exception is removal of a large source tree (the kernel source), for which XFS is the slowest by a fair amount. See this kernel post for the translations of important words. It will be nice to see more such open benchmarking posted, because benchmarks provide developers goals." The contrast between FAT32 and XFS is particularly interesting to see.
It is amazing how fast ReiserFS can remove a source tree to the Linux kernel. Almost as soon as you hit the "return" key, it is finished. Blows me away. It is so fast it could be dangerous to the careless. It works so fast that by the time you realize your mistake *everything* would be gone--in the blink of an eye.
My guess here is that FAT32 just marks the root source directory as deleted and then moves along. (This is how it works under DOS/Windoze at any rate.) If this were so, FAT32 is hardly fast here -- it takes 6.7 seconds to do just one write to the FAT table!
To be honest though, I'm not sure how the various Linux FSes work, so maybe they're doing it the same way, though I doubt one FS entry would take 10 - 20 seconds.
Benchmarks... bah. I don't think the ones linked to on that Spanish site are worthy of the name. For those that can't see it because it's /.ed, it makes out XFS to be quite a bit faster.
Some relevant comments from the LKML... [the last one showing that XFS is not always faster]
Alan Cox:
"reiserfs seems to handle large amounts of small files well, up to a point but it also seems to degrade over time. ext3 isnt generally available for 2.4 but is proving very solid on 2.2 and has good fsck tools. Ext3 does not add anything over ext2 in terms of large directories of files and other ext2 performance limits.
XFS is very fast most of the time (deleting a file is sooooo slow its like using old BSD systems). Im not familiar enough with its behaviour under Linux yet."
Andi Kleen:
"On one not very scientific test: unpacking and deleting a cache hot 40MB/230MB gzipped/unzipped tar on ext2 and xfs on a IDE drive on a lowend SMP box.
XFS (very recent 2.4.4 CVS, filesystem created with mkxfs defaults)
> time tar xzf ~ak/src.tgz
real 1m58.125s
user 0m16.410s
sys 0m44.350s
> time rm -rf src/
real 0m50.344s
user 0m0.190s
sys 0m13.950s
ext2 (on same kernel as above)
> time tar xzf ~ak/src.tgz
real 1m26.126s
user 0m16.100s
sys 0m36.080s
> time rm -rf src/
real 0m1.085s
user 0m0.160s
sys 0m0.930s
ext2 seems to be faster and the difference on deletion is dramatic, so at least here it looks like Alan's statement is true.
The test did not involve very large files, the biggest files in the
tar are a few hundred K with most of them being much smaller.
The values stay similar over multiple runs. I did not do any comparisons recently with reiserfs, but at least in the past reiserfs usually came out ahead of ext2 for similar tests (especially being much faster for deletion)"
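To put a number on "dramatic", here is a minimal sketch that converts the `real` times quoted above into seconds and compares XFS against ext2. The helper names (`parse_time`, `xfs_to_ext2_ratio`) are made up for this illustration; only the timing strings come from the post.

```python
import re

def parse_time(text):
    """Convert a shell `time` value such as '1m58.125s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# "real" (wall-clock) times copied from the quoted post.
REAL_TIMES = {
    "xfs":  {"tar xzf": "1m58.125s", "rm -rf": "0m50.344s"},
    "ext2": {"tar xzf": "1m26.126s", "rm -rf": "0m1.085s"},
}

def xfs_to_ext2_ratio(command):
    """How many times longer XFS took than ext2 for the given command."""
    xfs = parse_time(REAL_TIMES["xfs"][command])
    ext2 = parse_time(REAL_TIMES["ext2"][command])
    return xfs / ext2

for command in ("tar xzf", "rm -rf"):
    print(f"{command}: XFS took {xfs_to_ext2_ratio(command):.1f}x as long as ext2")
```

On these figures the unpack is slower by a bit over a third, while the delete is slower by a factor of roughly 46. This is a single machine and a single tarball, so treat it as an illustration rather than a general result.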
I would imagine they didn't test against NTFS because the kernel drivers are very unstable for it. Sure, they could have installed Win2K and tested that way, but then OS differences would affect the results. A pure filesystem test should use the same kernel setup with only the fs drivers changing.
http://kernelnewbies.org/~phillips/
This guy Dave (I forget his last name now), from sgi gave a presentation to the DC-LUG back in 1999 and talked about XFS and how sgi wanted to release it as GPL to become a core component of Linux. He also talked about the history of XFS and how they had to invent a new size prefix to describe how large a filesystem XFS could accommodate ("exo-byte" = 1024 Gb). XFS has been used by sgi for their MIPS and Cray machines ever since 1984, and now that sgi has donated it to the Linux community, I think we'd be remiss if we didn't welcome it with open arms.
But that's just MHO. ;)
"It is quite surprising to see the write time be so slow for linux, as quite frankly, FAT32 is so simple (no transaction) it *should* be only slightly slower than optimal in medium to large file size cases."
I would imagine that the reason FAT32 isn't highly optimised on Linux is that it's only there as a compatibility tool. Nobody running Linux would put data on a FAT32 partition when performance was an issue. Being able to get at the filesystem at all is enough.
--
XFS and MOSIX -might- work, with sufficient application of brute-force. Neither work with 2.4.4-ac6, and I believe there's a lot of problems with FreeSWAN's IPSEC, too.
The only patches I've found to be relatively unstressful are the International Patch, the POSIX Timer patch, and IBM's Next Generation PThread code.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
The closest to a printable response was some sneer to the effect that the writers weren't interested in supporting Alan Cox's "private OS".
(Curiously, the patches tend to break with every release of the official kernel. Hmmm. Maybe if these coders got off their high horses long enough to see how code migrates between the official kernel and the AC series, they'd have an easier time keeping up.)
As of right now, though, you're going to have to do some of the merge by hand. And it's likely to prove difficult.
Look on the bright side, though! If you =DO= get a merged patch, it'll be YOUR patch that's used. Who wants second-best?
Size of Linux-2.4.4 with XFS (as taken from SGI's CVS tree): 3,612,541 lines (same algorithm) and 145,367,040 bytes.
Seriously, XFS looks interesting. I did some stats on source size, and posted them to K5. :) Dropped in score so rapidly, it'll reach the Earth's core sometime in the next half hour.
Essentially, XFS is 10% of the entire kernel size, making it perhaps the single-most sophisticated driver available. On the flip-side, I can't help but feel that that much code is going to have -some- impact on the rest of the system.
Talking of benchmarking, how does IBM's "Next Generation PThread" code stand up? And how on Earth are you supposed to install it? It clashes with glibc, making an RPM install, ummm, of questionable safety. And once you start with RPM (or tarballs, or any other system), it's unwise to mix-and-match. Either you keep track of where things are, or the system does. Half-and-half is NOT good.
Lastly, anyone found a way to get XFS, JFS and AFS into the same kernel? (Without using a sledgehammer, preferably.)
It's true. I ran bonnie++ on a test machine under ext2 and xfs. xfs had REALLY slow deletes, but fast throughput. XFS did 36 deletes per second, while ext2 did 802 deletes per second. As far as throughput, XFS did 538K/sec on block reads, while ext2 only got 436. Also, in general, xfs used less CPU resources. Also, ext2 kicked xfs's butt in all areas when there was only one bonnie++ process running. The benchmark data I've given was running three simultaneously on the filesystem. So, if the server only has one process, I'd NOT go with xfs. If the server has multiple processes, xfs probably beats most things, especially on block read performance.
Note, also, that my test machine was only an AMD K6-2 266 w/28M RAM
Engineering and the Ultimate
Well, "slashdot" is Barrapunto when translated in to Spanish, so the verb would be "barrapuntar". E.g. ellos barrapuntaron el pagina (They slashdotted the page).
- Sam
The secret to enjoying Slashdot is to realize that it should not be taken too seriously.
First benchmark:
Writing, reading, and deleting a fairly large file (256MB). The commands used are as follows (omitting the redundant 'time' command used in all cases):
- dd if=/dev/zero of=./prova bs=1M count=256
- dd if=./prova of=/dev/null bs=1M count=256
- rm -f ./prova
And the time these commands took (in seconds):
Filesystem  Write  Read   Delete
ReiserFS    18.5   23.41  0.4
Ext2FS      20.3   21.38  0.57
XFS         16.32  19.42  0.26
FAT32       43.65  27.98  1.59
Second benchmark:
This benchmark was done with the source code of Linux 2.4.4. The .tar.gz file was first uncompressed, so that all the work was done on the tar file (which is larger than 100MB). The commands used on the uncompressed .tar file are as follows (with the time command omitted again):
- cp linux-2.4.4.tar prova.tar
- tar xf prova.tar
- rm -f prova.tar
- rm -rf linux
The times these commands took (in seconds):
FS        Copy   Extract  rm file.tar  rm -rf dir
ReiserFS  38.48  58.44    0.45         10.09
Ext2FS    21.31  59.19    2.88         11.12
XFS       16.21  35.44    0.18         21.96
FAT32     39.76  134.19   1.2          6.7
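For anyone who wants to check the "XFS wins everything except rm -rf" summary against the numbers, here is a small sketch that ranks the filesystems per column. The figures are copied from the two tables above; the dictionary layout and function names are just one way to hold them, not anything from the original site.

```python
# Times in seconds, copied from the two benchmark tables above.
RESULTS = {
    "write":       {"ReiserFS": 18.5,  "Ext2FS": 20.3,  "XFS": 16.32, "FAT32": 43.65},
    "read":        {"ReiserFS": 23.41, "Ext2FS": 21.38, "XFS": 19.42, "FAT32": 27.98},
    "delete":      {"ReiserFS": 0.4,   "Ext2FS": 0.57,  "XFS": 0.26,  "FAT32": 1.59},
    "copy":        {"ReiserFS": 38.48, "Ext2FS": 21.31, "XFS": 16.21, "FAT32": 39.76},
    "extract":     {"ReiserFS": 58.44, "Ext2FS": 59.19, "XFS": 35.44, "FAT32": 134.19},
    "rm file.tar": {"ReiserFS": 0.45,  "Ext2FS": 2.88,  "XFS": 0.18,  "FAT32": 1.2},
    "rm -rf dir":  {"ReiserFS": 10.09, "Ext2FS": 11.12, "XFS": 21.96, "FAT32": 6.7},
}

def fastest(column):
    """Filesystem with the lowest time in the given column."""
    times = RESULTS[column]
    return min(times, key=times.get)

def slowest(column):
    """Filesystem with the highest time in the given column."""
    times = RESULTS[column]
    return max(times, key=times.get)

for column in RESULTS:
    print(f"{column:12s} fastest: {fastest(column):9s} slowest: {slowest(column)}")
```

These are single reported times per test, so small gaps (ReiserFS vs. ext2 on extraction, say) probably should not be read as meaningful.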
What the gut got:
$ dd if=/dev/zero of=./prova bs=1M count=256
$ dd if=./prova of=/dev/null bs=1M count=256
$ rm -f ./prova
FS        Write  Read   rm
ReiserFS  18.5   23.41  0.4
Ext2FS    20.3   21.38  0.57
XFS       16.32  19.42  0.26
FAT32     43.65  27.98  1.59
$ cp linux-2.4.4.tar prova.tar
$ tar xf prova.tar
$ rm -f prova.tar
$ rm -rf linux
FS        cp     extract  rm    rmdir
ReiserFS  38.48  58.44    0.45  10.09
Ext2FS    21.31  59.19    2.88  11.12
XFS       16.21  35.44    0.18  21.96
FAT32     39.76  134.19   1.2   6.7
[1] ReiserFS has a high (reproducible) variance when copying the kernel tarball.
All tests were run three times, with the exception of the ReiserFS kernel tarball copy, which was run six times. The machine was running in single-user mode. Results were timed with time(1).
Also, one more tidbit, the CVS tree also contains the kdb code, so that's a (small) part of the increase.
Just realized, CVS has a kernel/ dir and a cmds/ dir.
Any chance your tallies include the whole cmds/ tree? That's all userspace stuff...
I can't help but feel that that much code is going to have -some- impact on the rest of the system.
Although XFS is big, it doesn't stomp on the kernel that much.
If you're looking at it from a pure "volume of code" point of view, here's some info:
The XFS patches are split into 2 parts, one which contains kernel changes, and one which is the filesystem itself.
The "core-kernel" patch is about 190k, while the actual filesystem code patch is about 4.3M.
Bear in mind that a 190k patch to the kernel does not mean 190k of new code, either, since you have to take out the context lines, the headers, and the delete lines.
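To make that point concrete, here is a rough sketch of how one might count what a unified diff actually adds, as opposed to its raw size. The function name and the tiny sample patch are invented for illustration; this is not the XFS patch, and the header detection is deliberately simplistic (a removed line whose own text begins with "--" would be mistaken for a file header).

```python
def count_patch_lines(patch_text):
    """Return (added, removed, context) line counts for a unified diff."""
    added = removed = context = 0
    for line in patch_text.splitlines():
        # Skip file headers, hunk headers and git-style metadata lines.
        if line.startswith(("+++", "---", "@@", "diff ", "index ")):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
        elif line.startswith(" "):
            context += 1
    return added, removed, context

# A made-up five-line hunk, only to exercise the function.
SAMPLE_PATCH = """\
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -1,3 +1,4 @@
 line one
-old line
+new line
+another new line
 line three
"""

added, removed, context = count_patch_lines(SAMPLE_PATCH)
print(f"added={added} removed={removed} context={context}")
```

In this toy patch only two of the eight lines are new code; the rest is headers, context, and a deletion.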
Overall, the impact on the kernel isn't as big as you might think, looking at the overall size.
Now, whether or not executing code in the filesystem slows down the rest of the system, I don't have any real data on that, although I have not noticed any detrimental effect on my system.
I installed xfs a week ago. Yesterday I wrote about it to my ex-office mate, saying I couldn't wait for a power failure. Got my wish this morning at about 4:00 a.m. My machine took 15 seconds to reboot. My colleague's machine with ext2 took nearly 3 minutes. Cool messages in the logs:
May 10 03:55:11 musuko kernel: Start mounting filesystem: ide0(3,5)
May 10 03:55:11 musuko kernel: XFS: WARNING: recovery required on readonly filesystem.
May 10 03:55:11 musuko kernel: XFS: write access will be enabled during mount.
May 10 03:55:11 musuko kernel: Starting XFS recovery on filesystem: ide0(3,5) (dev: 3/5)
May 10 03:55:11 musuko kernel: Ending XFS recovery on filesystem: ide0(3,5) (dev: 3/5)
Thank God I'm an atheist!
The differences between FAT32 and XFS may be interesting, but keep perspective. What you're seeing is the difference between the Linux driver for XFS and the Linux driver for FAT32, and not necessarily the inherent properties of either filesystem.
Don't get me wrong, I'm not comparing FAT32 to XFS by a long shot! But FAT32 is a fs that not a lot of hackers care about enough to improve the performance under Linux. Personally, I've always found that FAT32 access under Linux has been abysmal compared to access to the same filesystem under Windows.
I used ext3 about a year or two ago. It worked a lot better than it should have. Or so I thought.
It claimed that it'll still have fsck run when you crash, but actually nothing was getting run... I'm pretty sure that it was just not repairing itself after crashes. I'm really surprised I never lost any data until, well, one day I lost the whole damn partition.
I'm using ReiserFS now and I'm really happy with performance/crash recovery. I have a lot of cat-related outages...
--
In the land of the blind, the one-eyed man is kinky.
This isn't specifically for bytes, but the list of SI prefixes is here
The new prefixes for binary units (which nobody uses (the prefixes, not binary units)) are here
Maybe because this was a test of filesystems that work under Linux, not filesystems in general. To have a reliable test of NTFS, you'd have to use it under NT or Win2k, unless you wanted to do some shady benchmarking and use the developing NTFS driver for Linux. Also, the point was to use the exact same commands for each filesystem (dd, rm, etc.), which you couldn't do in Windows.
FAT32 was probably included because it's so well supported and easy to do.
Little time or patience required, I can explain the delete behavior easily. (Not that I would label myself an FS hacker... distributed, maybe.)
Modern inode-based filesystems scatter both files and metadata across the disk, to reduce average seek time and fragmentation problems (free blocks are always nearby when using cylinder groups, unless the disk is very very full). So deletion can involve visiting many areas of the disk, and also traversing the allocation tree for each file and adding its blocks back to the free list (or vector).
MS-DOS FAT filesystems have most of the metadata tightly clustered in the two file allocation tables... only subdirectories containing filenames, some metadata, and pointers into the FAT are stored outside. (32 bytes each.) So for FAT you have to seek, read, and write once (usually) for each subdirectory, but for all the normal files you just make one pass on each FAT, which probably involves no seeking, since they fit entirely within a single cylinder on modern disks.
DOS just made this process seem excruciatingly slow because you were typically using it without caching and on floppies. FAT-based filesystems are a bit of a naive approach, and their average-case performance, especially under load, can be quite poor if you can't cache the entire FAT in memory. But they aren't all that awful.
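As a toy illustration of the FAT side of this, the sketch below frees a file by walking its cluster chain in a single in-memory table. The list-based table, the `FREE` and `EOC` marker values, and the function name are all assumptions made for the example; a real FAT32 table uses 32-bit entries with reserved end-of-chain values and is mirrored in a second copy.

```python
FREE = 0    # toy marker: cluster is unallocated
EOC = -1    # toy marker: last cluster of a file (real FAT uses reserved values)

def free_chain(fat, first_cluster):
    """Walk a file's cluster chain and mark every cluster free.

    fat[i] holds the number of the cluster that follows cluster i.
    Returns the number of clusters released.
    """
    released = 0
    cluster = first_cluster
    while cluster != EOC:
        next_cluster = fat[cluster]
        fat[cluster] = FREE
        released += 1
        cluster = next_cluster
    return released

# Toy table: one file occupies clusters 2 -> 3 -> 5, another 6 -> 7.
fat = [FREE, FREE, 3, 5, FREE, EOC, 7, EOC]
print(free_chain(fat, 2), fat)
```

Every write lands in the same small table, which is the locality being described; the directory entry that pointed at cluster 2 still has to be marked deleted separately.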
Java: the COBOL of the new millenium.
Hmm... I don't know exactly why metadata operations should be slow on XFS, but my understanding is that XFS is primarily useful for:
Just my thoughts....
fat32 is an interesting control, but in an ideal benchmark, ntfs would have been used, as it is designed closer to other filesystems, as opposed to fat32, which is more like Baby's First Filesystem(tm).
Though I will grant that NTFS would have been hard or impossible to benchmark in this test, given the lack of robust drivers for it.
-Lx?
FAT32 has almost zero relevance here because if we were all using Win2k or NT we would be using NTFS just like Windows servers do. Why on god's green earth this test didn't test against NTFS is completely beyond me other than to have a weaker MS filesystem to poke fun at. Real objective, guys.
Have a Happy.
Maybe they should benchmark their DB system.
;-)
-sid
It's not about how *much* space, but about how often. Space gets allocated much more often than it gets freed, and usually at more time-critical moments. Take, for example, a media system. If you're capturing a video (what else do you do on an XFS system?), the file will get enlarged several times during its lifetime. It will get deleted only once. Moreover, you care how quickly you can allocate space, since it affects your capture quality, but don't really care if deleting files is slower (within reason, of course). Or take log files. Log files grow many times and are deleted only once. Also, as another poster pointed out, you usually delete log files when the system is lightly loaded, doing internal maintenance. There is almost no application that I can think of (save artificial benchmarks) where XFS's deletion performance would be more important than its file creation/enlargement performance.
A deep unwavering belief is a sure sign you're missing something...
Yes, XFS does kick that much ass. Still, the deletion performance surprises me. Of course, the other speed things make up for it, but it's still a puzzle. If you don't know, rm -rf has historically been a slow operation on XFS.
Here's my theory. Ext2 uses a bitmap to track free blocks, and I'm pretty sure ReiserFS does as well. Free block runs on XFS are managed by two B+trees, one keyed by address, one keyed by size. Thus, allocating space is very fast on XFS, and it is easy to keep things contiguous. However, inserting runs of blocks into both trees is a slower operation than simply clearing the appropriate bits. This would explain the difference between the file creation speed (extraction test) and file deletion speed (rm -rf test). If this is the case, I think it is quite a good tradeoff, given that space is allocated much more often.
Of course, IANAXE (I am not an XFS engineer) so this is just my theory. I'd appreciate it if someone more informed about XFS could tell me the reason for the performance delta.
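A rough sketch of the two bookkeeping styles in that theory, for anyone who wants to see the difference in work per free. This is not XFS or ext2 code: the class names are invented, and two sorted Python lists kept in order with `bisect` stand in for the two B+trees (real trees would make inserts and removals logarithmic rather than linear).

```python
import bisect

class BitmapFreeSpace:
    """Bitmap style: freeing a run just clears one flag per block."""
    def __init__(self, nblocks):
        self.used = [True] * nblocks   # toy setup: start fully allocated

    def free_extent(self, start, length):
        for block in range(start, start + length):
            self.used[block] = False

class ExtentFreeSpace:
    """Extent style: free runs indexed twice, by address and by size."""
    def __init__(self):
        self.by_start = []   # sorted (start, length); stand-in for tree 1
        self.by_size = []    # sorted (length, start); stand-in for tree 2

    def _insert(self, start, length):
        bisect.insort(self.by_start, (start, length))
        bisect.insort(self.by_size, (length, start))

    def _remove(self, start, length):
        self.by_start.remove((start, length))
        self.by_size.remove((length, start))

    def free_extent(self, start, length):
        i = bisect.bisect_left(self.by_start, (start, 0))
        # Merge with the free run that begins right after this one, if any.
        if i < len(self.by_start) and self.by_start[i][0] == start + length:
            nxt_start, nxt_len = self.by_start[i]
            self._remove(nxt_start, nxt_len)
            length += nxt_len
        # Merge with the free run that ends right before this one, if any.
        if i > 0:
            prev_start, prev_len = self.by_start[i - 1]
            if prev_start + prev_len == start:
                self._remove(prev_start, prev_len)
                start, length = prev_start, prev_len + length
        self._insert(start, length)

    def allocate(self, length):
        """Best fit: take the smallest free run that is large enough."""
        j = bisect.bisect_left(self.by_size, (length, 0))
        if j == len(self.by_size):
            return None
        run_len, run_start = self.by_size[j]
        self._remove(run_start, run_len)
        if run_len > length:
            self._insert(run_start + length, run_len - length)
        return run_start

extents = ExtentFreeSpace()
for start in (10, 20, 15):          # the third free bridges the first two
    extents.free_extent(start, 5)
print(extents.by_start)
```

In the extent version a single free can mean up to two removals and one insertion in each index, while allocation by size is a single lookup. That matches the shape of the tradeoff described above, though whether it accounts for the measured rm -rf times is still just the theory.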
Did you look through the mailing list for ReiserFS?
If you had, you'd have seen that Reiserfs is journaling (Well, actually log structured)
It reserves 32mb for the log... Ergo, 32mb of missing space. If I remember right, you can decrease this when formatting it.
1 kilobyte = 1024 bytes
1 megabyte = 1024 kilobytes
1 gigabyte = 1024 megabytes
1 terabyte = 1024 gigabytes
1 petabyte = 1024 terabytes
1 exabyte = 1024 petabytes
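The ladder above is just successive powers of 1024, which a few lines can spell out. The function name and prefix list are made up for this sketch, and it uses the binary (1024-based) meaning from the list rather than the SI powers of ten.

```python
# Binary prefixes in the order given in the list above.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]

def bytes_in(unit):
    """Number of bytes in one binary '<prefix>byte', e.g. 'exabyte'."""
    if not unit.endswith("byte"):
        raise ValueError(f"expected a name ending in 'byte': {unit!r}")
    prefix = unit[: -len("byte")]
    power = PREFIXES.index(prefix) + 1   # kilo -> 1024**1, exa -> 1024**6
    return 1024 ** power

for prefix in PREFIXES:
    unit = prefix + "byte"
    print(f"1 {unit} = {bytes_in(unit)} bytes")

# How many gigabytes fit in an exabyte on this scale.
print(bytes_in("exabyte") // bytes_in("gigabyte"))
```

So on this scale an exabyte is 1024 cubed gigabytes, not 1024 of them.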
Data Powers of Ten
a 'Yottafied' filesystem.
"Hmmmmmm,large it is. Strong is the file system in this one, hmmmmm."
KFG
XFS, ext2, ReiserFS, FAT32
GeForce3
Kernel Benchmarks
"My mother never saw the irony in calling me a son-of-a-bitch." - Jack Nicholson
--
Friends don't let friends use multiple inheritance.
1. The maxclients in php.ini doesn't work.
2. We want MySQL !!!
Right now, the author is doing more extensive tests. We will put them in a static page.
Hope the server is still alive when I arrive home.
Cheers, assassins ;-)
--ricardo
sgis ddo ekil t'nod i
Sorry if it annoyed you. But don't complain to a guy who likes just to play around with Linux.
--ricardo
You should tell me before you try to slashdot us ;-) So, I could have time to increase the PG backend.
Hope we can keep it running now... (poor PIII 500Mhz)
--ricardo
Possibly off-topic, but what's the story on the Tux2 file system (http://slashdot.org/features/00/10/13/2117258.shtml)? It sounded like a great idea, now it seems even the links are broken.
That their webserver can not keep up with their filesystem anyway...
Needless to say I've got a pair of 61GB IDE's screaming for XFS once the 2.4.4 patches are out. Can't wait.
--
Top Most Bizarre/Disturbing Error Messages
Mirror here.
No idea why they're including something as outdated as FAT...
Do you like German cars?
Running something like that on ext2fs would (apart from the agony of fsck'ing 800GB of storage) be completely hopeless.
When it comes to setups with a few large files, though, the advantage isn't that great, and the numbers in the article make reasonable sense.
Reiserfs should be your filesystem of choice if: a) you want to be able to put gazillions of files in a single directory when there is no logical hierarchy to the data (our tests indicate that Reiserfs handles shallow directory hierarchies with many files per directory faster than the opposite), and b) you want to be able to efficiently store a 100 byte file when grouping your data logically would give you file sizes of that magnitude, instead of grouping many pieces of data together.
We've been running Reiserfs for well over a year now, and it works great. It's important that you keep up to date on bugfixes, though, and that you're very careful about your recovery procedures - reiserfsck is really a last resort, and you should always ensure you copy out as much as possible of any data on a damaged volume, and preferably take a raw copy of the entire volume, before running it. That said, I wouldn't hesitate to recommend Reiserfs to anyone with the specific needs I mentioned above.
I have not seen the FAT32 source for Linux, but I have for Windows and NT (as I wrote it). I can say that (as another poster has speculated) the quality of the source algorithms has a huge impact on the performance of the filesystem. It is quite surprising to see the write time be so slow for Linux, as quite frankly, FAT32 is so simple (no transaction) it *should* be only slightly slower than optimal in medium to large file size cases. NTFS should certainly show slower on writes of larger files, and if it doesn't, you know you have a huge bug somewhere in the FAT32 driver. I don't know enough about the other FSs to comment, but I do suppose they should not be significantly faster.
XFS is optimised for dealing with streaming media, and so deals well with high IO and large files.
JFS has been around for years under AIX. It's a well proven general purpose journalling filesystem.
ReiserFS is the best established of the Linux journalling filesystems. It has several fairly innovative features and is more efficient than ext2 in terms of space utilisation. People are using it as their primary filesystem now, although it's still in development.
EXT3 is (unsurprisingly) a development of EXT2. It lacks most of the pretty features of the other journalled filesystems, but has the significant advantage that you can turn EXT2 partitions into EXT3 (and vice versa) without any trouble at all.
What many Linux users forget (or don't know) is that FreeBSD (and the other BSD's) default to synchronous writes with drive write caching turned off. This is done to prevent losing large chunks of data in the event of a sudden system shutdown. Linux defaults to asynchronous writes with write caching turned on. The other piece of information missing from the post is that /usr/ports on FreeBSD is 110MB with over 17,000 directories and over 62,000 files. If you plan on removing that many directories and files on a regular basis (not common on most servers) then just turn on asynchronous writes.