Merits Of The Different Journaling Filesystems?
a2800276 asks: "The story that XFS has gone beta raised some questions in my mind. There are now four journaling filesystems available under various OSS licenses and being actively developed for Linux, there being (in estimated order of maturity): SuSE/Namesys's reiserfs, SGI's XFS, IBM's JFS and Tweedie/Redhat's ext3fs. Avoiding the obvious question of why can't the effort going into four different projects be channeled into one, I think a discussion of the particular merits of the different fs's would be interesting."
Haven't we had this discussion umpteen times before? Such as... two days ago? There's even a link to a discussion on the four competing filesystems. Sheesh. Flogging the dead horse.
I was ASKED to install a Linux system at work last week! (I've been
preaching Linux for 3 years - patience pays off!) They gave me an
old P100 with 71MB of usable RAM and two HDs.
I decided to use SuSE 6.4 BECAUSE it had ReiserFS.
The graphic install really impressed the Win techies standing
around watching because it was easy enough that even they
could do it, and is pretty eye candy. KDE really impressed them
too.
Thirty minutes later the second HD, a 4.3 BigFoot, died.
I had /home on it, and since I was logged in as root, it didn't matter.
The dead drive was smoothly disconnected from the system.
Since I needed to power down to replace the HD I decided
to test out the ReiserFS. I reached over and pressed the reset
button. A collective "gasp!" rose from the assembled techies.
Thirty seconds later I had the KDE graphical login prompt.
No corruption, no losses. It's like having a UPS attached.
I didn't notice any increase in speed of file accessing, but it
was fast at rebooting. It's been up 18 days now, which is
also impressing the techies in our M$ shop. They are still
afraid of Linux though. I think it is because they may feel
that they may have to retrain, losing any employment
advantages they may have accumulated. They are right.
Running with Linux for over 20 years!
McKusick's Soft Updates also has a nice feature: unlike the journaling file systems, it does not have the burden of writing blocks to a logging device. So a soft-updates enabled kernel runs at the speed of traditional asynchronous file systems (i.e., default ext2) while providing a very good level of reliability (it is not a synchronous file system, so it runs at a very enjoyable speed).
You can boot a Soft Updates file system without fscking it; the file system will be in a functional state. The only problem is that you might start to lose free blocks that are believed to be busy. So every 100 or 200 crashes you might want to run fsck to free those 100 blocks.
I agree with you regarding the ext2 file system when running in async mode: when there is a lot of activity on the disk, and a lot of changes to the file system, crashing an ext2 file system will lose a considerable amount of data. ext2 fsck will not be able to recover your file system properly (it has happened to me a couple of times already).
For non-SoftUpdates kernels and non-Journaling kernels, if you are running a system with sensitive information, I suggest turning on synchronous access for the file system (add the `sync` option to it).
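The `sync` option above applies to a whole mounted file system. As a rough per-file analogue, a program can ask for the same behaviour on a single file. The sketch below is only an illustration under that assumption: the helper name `write_synchronously` is made up, and `O_SYNC` is looked up defensively because not every platform defines it.

```python
import os
import tempfile


def write_synchronously(path, data):
    """Write data so that it has reached the device before returning."""
    # O_SYNC asks the kernel to complete each write before returning.
    # It is missing on some platforms, so fall back to 0 and rely on fsync.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # flush anything still held in the kernel's cache
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "important.dat")
    write_synchronously(demo_path, b"sensitive record\n")
    with open(demo_path, "rb") as fh:
        demo_contents = fh.read()
```

The trade-off is the one described in the comment: every write waits for the disk, so throughput drops in exchange for a smaller window of data loss.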
The sad part here is that the BSDs have traditionally been optimized for the synchronous case, so they run at acceptable speeds. Linux ext2fs has never been optimized for this case so in practice it is very slow.
I am using ReiserFS on my laptop, but on a server, if I had to choose, I would run SoftUpdates for BSD kernels and ReiserFS for Linux kernels.
Miguel.
When I was at the Ottawa Linux Symposium, there were talks on XFS, JFS, and ext3fs. It seemed clear that XFS was near beta, so the recent announcement was no surprise. Ext3fs also sounded near beta. Ext3 takes the simple approach of adding journaling to ext2 in such a way that as long as you unmount cleanly (so there's no need to play the log back), you can take an ext3 partition and mount it as an ext2 partition. From the talk, it sounded pretty much ready.
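To make the "play the log back" idea concrete, here is a toy model of journaling in general. It is not ext3's actual on-disk format; the class `JournaledDisk` and its record layout are invented for illustration. Updates are recorded in a journal first, and after a crash only transactions that reached their commit record are applied.

```python
class JournaledDisk:
    """Toy model: a dict of blocks plus an append-only journal."""

    def __init__(self):
        self.blocks = {}    # "on-disk" state: block number -> bytes
        self.journal = []   # records of the form (txid, kind, payload)

    def log_transaction(self, txid, updates, commit=True):
        # Record the intended updates before touching the real blocks.
        for blockno, data in updates.items():
            self.journal.append((txid, "update", (blockno, data)))
        # Passing commit=False simulates a crash before the commit record.
        if commit:
            self.journal.append((txid, "commit", None))

    def replay(self):
        # Apply only the transactions that have a commit record;
        # half-written transactions are discarded.
        committed = {txid for txid, kind, _ in self.journal if kind == "commit"}
        for txid, kind, payload in self.journal:
            if kind == "update" and txid in committed:
                blockno, data = payload
                self.blocks[blockno] = data
        self.journal.clear()


disk = JournaledDisk()
disk.log_transaction(1, {10: b"inode A", 11: b"dir entry A"})
disk.log_transaction(2, {12: b"inode B"}, commit=False)  # crash mid-transaction
disk.replay()
```

After `replay()` the journal is empty, which mirrors the point made above: once there is nothing left to play back, what remains is an ordinary set of blocks.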
JFS was another story. My take on the talk was that people who attended it learned one important thing: JFS is the journaling file system to ignore. The Linux port comes from OS/2, instead of directly from AIX. It lacks such things as support for mixed case filenames. The answers to most of the questions were, "We hadn't thought of that," or, "We'll have to look into that." If JFS didn't have the "me-too" ego of IBM behind it, the developers would have realized that they were better off working on one of the other file systems.
I tried to ask this question a few months ago, but with no luck getting it posted I did some research on my own. I wanted to make a 60GB file server that would give me some insurance on my data. I was close to using the IBM JFS, but kept hearing about ReiserFS and gave it a try. (Heck, SourceForge uses ReiserFS on their servers, so it's good enough for mine.) Anyway, after a little more reading, I realized that ReiserFS doesn't just add journaling to a partition, it also restructures the filesystem into B-trees, which can enhance access speeds, and it uses a hashing algorithm to sort the files (this helps lookups; it is not encryption).
In my opinion, you just get more. I also found the installation and recompile fairly easy to do. I've been using ReiserFS for the past 3 months with absolutely no problems.
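A small sketch of what "using a hash to sort the files" can look like. This is an illustration only: `zlib.crc32` stands in for whatever hash ReiserFS really uses, a sorted list searched with `bisect` stands in for the B-tree, and `HashedDirectory` is a made-up name. The names themselves are stored unchanged; the hash only decides where each entry sits.

```python
import bisect
import zlib


class HashedDirectory:
    """Directory entries kept sorted by (hash of name, name)."""

    def __init__(self):
        self._keys = []    # sorted list of (hash, name)
        self._inodes = []  # inode numbers, parallel to self._keys

    @staticmethod
    def _key(name):
        # The name is kept in the key so hash collisions stay distinct.
        return (zlib.crc32(name.encode("utf-8")), name)

    def add(self, name, inode):
        key = self._key(name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._inodes[i] = inode  # existing entry: replace it
        else:
            self._keys.insert(i, key)
            self._inodes.insert(i, inode)

    def lookup(self, name):
        # Binary search: O(log n) comparisons instead of a linear scan.
        key = self._key(name)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._inodes[i]
        return None


directory = HashedDirectory()
for number, filename in enumerate(["README", "Makefile", "main.c", "fs.h"], start=100):
    directory.add(filename, number)
```

Here `directory.lookup("main.c")` finds the entry by binary search, and a name that was never added returns `None`.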
Sometimes I doubt your commitment to Sparkle Motion.
tux2fs probably *will* take more memory (substantially more?) than ext2 or a journaling filesystem, but with the amount of memory that most systems have available for file cache, I doubt that is a problem.
I've analyzed that question and I think tux2 will only use a little more cache memory, not a lot more, and it could even be less - see below. Tux2 uses per-block copy-on-write, and when the old version of a block won't be used any more (the normal case) that means you can just change the disk block number in the buffer - no extra memory used at all. The only time extra memory is used is when a file block is written over and over again, every 10th of a second or so - then you will sometimes get two copies of it in memory at the same time. The first copy will disappear as soon as it finishes being transferred to disk. This kind of writing pattern is rare with normal data but is common with metadata. Fortunately metadata is about .1% of the total in a filesystem. My guess is you won't notice any extra load on the buffer cache.
In fact, I think Tux2 will take a load off the buffer/page cache because it doesn't let dirty data hang around a long time - it starts writing to disk a fraction of a second after you start writing to a file. My plan is to have Tux2 shorten its phase length under heavy memory pressure, so the space needed for dirty buffers will drop down to just 100-200K, and you'll still get good performance.
Cache memory for reading under Tux2 is the same as Ext2 and most other filesystems.
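A toy model of the per-block copy-on-write idea, for readers who want to see it in code. This is a sketch under my own assumptions, not Tux2's implementation: `CowStore`, its two block maps and the `crash()` method are invented names. A write never overwrites a block that the last completed phase still refers to; ending a phase switches to the new map in one step.

```python
class CowStore:
    """Logical blocks mapped to physical blocks, updated copy-on-write."""

    def __init__(self):
        self.disk = {}           # physical block number -> data
        self.committed_map = {}  # logical -> physical, as of the last phase
        self.working_map = {}    # logical -> physical, current phase
        self.next_free = 0

    def _drop_unreferenced(self):
        live = set(self.committed_map.values())
        for physical in list(self.disk):
            if physical not in live:
                del self.disk[physical]

    def write(self, logical, data):
        # Always write to a fresh physical block; the old one stays intact.
        physical = self.next_free
        self.next_free += 1
        self.disk[physical] = data
        self.working_map[logical] = physical

    def read(self, logical):
        return self.disk[self.working_map[logical]]

    def end_phase(self):
        # One switch makes the whole phase visible; old copies become free.
        self.committed_map = dict(self.working_map)
        self._drop_unreferenced()

    def crash(self):
        # After a crash only the last completed phase exists.
        self.working_map = dict(self.committed_map)
        self._drop_unreferenced()


store = CowStore()
store.write(0, b"version 1")
store.end_phase()
store.write(0, b"version 2")   # old and new copies briefly coexist
copies_during_phase = len(store.disk)
store.crash()                  # the unfinished phase is simply forgotten
```

The brief moment where `copies_during_phase` is 2 corresponds to the "two copies at the same time" case described above; once a phase ends or is abandoned, only one copy remains.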
--
Have you got your LWN subscription yet?
XFS is optimised for dealing with streaming media, and so deals well with high IO and large files.
JFS has been around for years under AIX. It's a well proven general purpose journalling filesystem.
ReiserFS is the best established of the Linux journalling filesystems. It has several fairly innovative features and is more efficient than ext2 in terms of space utilisation. People are using it as their primary filesystem now, although it's still in development.
EXT3 is (unsurprisingly) a development of EXT2. It lacks most of the pretty features of the other journalled filesystems, but has the significant advantage that you can turn EXT2 partitions into EXT3 (and vice versa) without any trouble at all.
For similar crash protection, you might want to try out McKusick's "Soft Updates" that appear in *BSD systems. Essentially, they are ordered disk writes that make sure data gets on the disk before metadata is altered. They go through the buffering system, so performance isn't bad.
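The "ordered disk writes" idea can be sketched as a write cache that remembers which blocks must reach the disk before others. This is a simplified illustration, not McKusick's algorithm: the class `OrderedWriteCache` and the block names are hypothetical, and the sketch gives up on circular dependencies rather than resolving them.

```python
class OrderedWriteCache:
    """Dirty blocks are flushed only after the blocks they depend on."""

    def __init__(self):
        self.dirty = {}       # block name -> data waiting in memory
        self.depends_on = {}  # block name -> blocks that must hit disk first
        self.disk = {}        # what has actually been written
        self.write_log = []   # order in which blocks reached the disk

    def write(self, block, data, after=()):
        # The write is only buffered here; nothing touches the disk yet.
        self.dirty[block] = data
        self.depends_on.setdefault(block, set()).update(after)

    def flush_one(self):
        """Write one block whose prerequisites are already on disk."""
        ready = None
        for block in self.dirty:
            if not any(dep in self.dirty for dep in self.depends_on[block]):
                ready = block
                break
        if ready is None:
            if self.dirty:
                raise RuntimeError("circular dependency; not handled in this sketch")
            return None
        self.disk[ready] = self.dirty.pop(ready)
        self.write_log.append(ready)
        return ready


cache = OrderedWriteCache()
# The directory entry must not reach disk before the inode it points at.
cache.write("dirent", b"name -> inode 7", after=["inode 7"])
cache.write("inode 7", b"initialised inode")
while cache.flush_one() is not None:
    pass
```

A crash between any two flushes leaves the disk without dangling references, which is why the fsck after a power failure has only minor cleanup to do.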
As an experiment, I pulled the plug towards the end of 5 FreeBSD kernel compiles (SMP `make -j 4`). In all cases, the fsck upon restart was minor, just freeing inodes. In four of the cases, `make` just picked up where it left off, and finished the kernel compile, losing only ~40 seconds work. In one case, a `make clean` had to be done because something was incomplete.
Don't try this on Linux! The ext2 fsck is horrible after a powerfail, and I've lost superblocks and had to re-install :(
It is also proof that open source software does not just 'chase tail lights' - the work is substantially innovative.
Phillips is also implementing tailmerging (a feature from ReiserFS to efficiently store small files) for ext2/ext3/tux2.
For more details, check his web pages here, and the linux-fsdevel mailing list.
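A back-of-the-envelope way to see why tail merging helps with small files. The numbers are assumptions for illustration: a 4096-byte block size, and tails that pack perfectly into shared blocks, so the second figure is a best case rather than what any real implementation achieves.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def blocks_without_tail_merging(sizes, block_size=BLOCK_SIZE):
    # Every file rounds up to whole blocks (ceiling division).
    return sum(-(-size // block_size) for size in sizes)


def blocks_with_tail_merging(sizes, block_size=BLOCK_SIZE):
    # Full blocks stay as they are; the leftover tails share blocks.
    full_blocks = sum(size // block_size for size in sizes)
    tail_bytes = sum(size % block_size for size in sizes)
    return full_blocks + -(-tail_bytes // block_size)


example_sizes = [100, 300, 5000, 4096]  # file sizes in bytes
plain = blocks_without_tail_merging(example_sizes)
merged = blocks_with_tail_merging(example_sizes)
```

For these four files, `plain` is 5 blocks and `merged` is 3, because the 100-, 300- and 904-byte tails fit together in a single block.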