Slashdot Mirror


Merits Of The Different Journaling Filesystems?

a2800276 asks: "The story that XFS has gone beta raised some questions in my mind. There are now four journaling filesystems available under various OSS licenses and being actively developed for Linux, there being (in estimated order of maturity): SuSE/Namesys's reiserfs, SGI's XFS, IBM's JFS and Tweedie/Redhat's ext3fs. Avoiding the obvious question of why can't the effort going into four different projects be channeled into one, I think a discussion of the particular merits of the different fs's would be interesting."

19 of 192 comments

  1. Combine efforts... why by Anonymous Coward · · Score: 3

    Everyone loves to ask those really stupid questions about combining open source projects. Being an open source developer, I find these very irritating. The following reasons should be enough to either make you understand, or prove you can't understand...

    - Competition is key to all things being good...
    - Each FS has an almost completely different technical design, and each has a chance of being the "best" general purpose FS.
    - Each FS might in fact be the "best" at some things
    - Each developer is working on their respective project because they think it is the right thing to do. You tell them "No, stop that work on your project and help this one" and they will. They will see that the community thinks of their project as meaningless wheel-reinvention and quit.

    There are many more reasons that others will point out as well. Look at other industries. Why are there different radio stations? Different cars (even made by the same company)?

    Hey, if I stop one more person from telling my friends and myself that we should join this other project or that, then this note is worth it.

    Got a problem with my view? I'm easy to find.

    Jamie B

  2. Re:Do they remove the 2G file size limits? by Christopher B. Brown · · Score: 3
    No, that is not a limitation imposed by the filesystem, so they can't.

    As far as I can tell, all of these filesystems allow files vastly bigger than 2GB, but the interface between VFS and LIBC still nicely enforces the 2GB limit for most purposes.

    This is the wrong thread on which to try to find resolution to that issue; take it up with the folks that defined ISO and ANSI C...

    --
    If you're not part of the solution, you're part of the precipitate.
  3. Hmm.. by Adnans · · Score: 3

    Avoiding the obvious question of why can't the effort going into four different projects be channeled into one

    That's like asking all the *BSDs to work together with Linux. Diversity is a very good thing. Personally I think SGI's XFS is going to kick some serious butt, since the SGI folks are all for high performance and huge I/O throughput. They've also shown they can do it (on Irix). Not that the other parties will stay behind. I just think SGI has a better chance at dominating the Journalling FS landscape in Linux...

    -adnans

    --
    "In short: just say NO TO DRUGS, and maybe you won't end up like the Hurd people." --Linus Torvalds
  4. Only Ext3 works with new NFS v3 code ... by BitMan · · Score: 3

    I have production systems now using Ext3, which is little more than Ext2 with full data journaling (and completely reversible). This is NOT an endorsement of Ext3, but the fact is that it is the only one usable at this point (largely because it is just a slight evolution from Ext2, unlike the others).

    I am in the middle of a ~30 page HOWTO on NFS+Journaling. Contact me directly if you are interested in a copy. Again, I have production servers and workstations running with Ext3 and sharing data out via NFS v2/3.

    -- Bryan "TheBS" Smith

    --
    -- Bryan "TheBS" Smith
    Independent Author, Consultant and Trainer
  5. Obligatory BeOS plug. by be-fan · · Score: 3

    Okay, today, I'm actually going to preach the benefits of a BeOS technology.
    With all this talk of journaling file systems, I'm surprised that BFS got ignored. BFS has several things going for it:

    1) It's fast. While it is a journaling file system, on Bonnie it is about 20% faster than my ReiserFS partition (which is closer to the outside of the disk too) on straight reads and writes. It is also a good bit faster on the per-char tests. Best of all, the CPU utilization is 30% lower than Linux in the sequential tests, and 50% lower in the per-char tests (where Linux pegs the CPU). However, on the rewrite tests it is significantly slower, something I think has more to do with the BeOS VM and disk cache than the file system.

    2) It is ever so solid. I regularly (read: three times a week) shut down my BeOS machine with my power button. Not yet have I gotten a lost file, block, or data corruption. Linux regularly needs reinstalling if I turn it off in the middle of something important, and even NT bugs out on me for not shutting down. (I just hosed the system two days ago.)

    3) It has had database capabilities for years, while ReiserFS still has them in planning. That might be a "gee-whiz" feature, but nothing beats having your MP3s automatically entered in a database based on ID3 info. (Or emails, or pictures, or whatever.)

    4) It has a flexible system of attributes. No more .stupid-extension because file-type is stored within attributes. The user can edit attributes all they want, and custom file info (like gamma-info for a GIF) can be embedded into a file. For example, if you've got a special program that can display a GIF with variable gamma settings, it can embed those gamma settings into an attribute. Those attributes are ignored by other programs, and stripped out when sent to another OS (unless you use .zip compression.) However, when displayed in a program that recognizes that attribute, it can be used.

    --
    A deep unwavering belief is a sure sign you're missing something...
  6. Re:Do they remove the 2G file size limits? by Inoshiro · · Score: 3

    Uhm, no.

    Linux 2.4 has a set of 64-bit file calls which work natively on 64-bit architectures, and work also on 32-bit architectures by using double word operations. You take a performance hit on 32-bit systems, but it works fine. The glibc 2.x has it, as does the kernel. You just have to ensure that the libc was compiled with support for it.
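
    To put rough numbers on the limit being discussed: a signed 32-bit file offset tops out just under 2 GiB, while a signed 64-bit offset does not. The snippet below is only an arithmetic sketch using Python's struct module; `fits_in_offset` is a made-up helper name for illustration, not a kernel or glibc interface.

```python
import struct

def fits_in_offset(offset: int, bits: int) -> bool:
    """Return True if `offset` fits in a signed integer of the given width."""
    fmt = {32: "<i", 64: "<q"}[bits]  # signed 32-bit / signed 64-bit
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        # struct refuses values outside the representable range
        return False

MAX_OFFSET_32 = 2**31 - 1  # largest signed 32-bit offset, just under 2 GiB
MAX_OFFSET_64 = 2**63 - 1  # largest signed 64-bit offset

three_gib = 3 * 1024**3
print("3 GiB fits in 32 bits:", fits_in_offset(three_gib, 32))
print("3 GiB fits in 64 bits:", fits_in_offset(three_gib, 64))
```

    The point is only that the ceiling comes from the width of the offset type passed between the layers, which is why widening the calls lifts it.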

    Remember, this is open source. We've patched the libc and vfs layer with little trouble because of it. Now it just needs testing :)
    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  7. Some do different things by Tony Hammitt · · Score: 3

    You have to remember that XFS and JFS journal all data and metadata changes whereas ReiserFS and EXT3 only journal metadata. Thus, [XJ]FS protect your data from becoming corrupted where the others just help you boot faster after a crash (with respect to their journalling).
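
    For anyone unsure what the distinction means in practice, here is a toy, in-memory sketch of a write-ahead journal that logs either metadata only or metadata plus data. Everything in it (`ToyDisk`, `journaled_write`, `replay`) is a hypothetical name invented for illustration; it does not model the on-disk format of any of the four filesystems, and which filesystem falls on which side is the claim made above, not something the sketch demonstrates.

```python
class ToyDisk:
    """In-memory stand-in for a disk: a journal area plus the main area."""

    def __init__(self):
        self.journal = []   # log records that have been committed
        self.metadata = {}  # filename -> recorded size
        self.data = {}      # filename -> file contents


def journaled_write(disk, name, payload, journal_data, crash_before_apply=False):
    """Log the change first, then apply it to the main area."""
    record = {"name": name, "size": len(payload)}
    if journal_data:
        record["payload"] = payload  # full data journaling logs the contents too
    disk.journal.append(record)
    if crash_before_apply:
        return  # simulate losing power after the log commit
    disk.metadata[name] = len(payload)
    disk.data[name] = payload


def replay(disk):
    """After a crash, redo whatever the journal recorded."""
    for record in disk.journal:
        disk.metadata[record["name"]] = record["size"]
        if "payload" in record:
            disk.data[record["name"]] = record["payload"]
    disk.journal.clear()


for mode in (False, True):
    disk = ToyDisk()
    journaled_write(disk, "report.txt", b"hello", journal_data=mode,
                    crash_before_apply=True)
    replay(disk)
    print("journal_data =", mode, "| metadata:", disk.metadata, "| data:", disk.data)
```

    In the metadata-only run the structure is consistent after replay but the contents are gone; in the full-data run both come back.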

    Ext3 can be used (once it's stable) on a preexisting Ext2 filesystem. The others cannot directly migrate from anything 'official'.

    I like the ideas of all four. It may be that you want to have a combination of all of them in your system, but that would be pushing it.

    I've used JFS a lot and it is really bulletproof if you set it up properly. It's heavily tied to the LVM in AIX, so I wouldn't expect much progress without LVM for Linux being a stable API. So, call it post 2.4.0 at least.

  8. ReiserFS is more than just fine, it's great! by Chyeburashka · · Score: 3
    I've been using ReiserFS for about six months now, and have been very happy with it. On my home system, the electrical power is much more likely to glitch, and the journalling system always comes up without any problems, and very quickly. You might say "get a UPS" but a UPS is bulky, expensive, etc.

    At work I have five workstations and two servers running ReiserFS. These have performed flawlessly over the past several months, as they have been eased into production.

    The ReiserFS folks have been really good about finding and fixing bugs. Recently, a bug was discovered that crashed ReiserFS-3.6 systems if you saved a file from Star Office on top of itself on an NFS server. That bug was eliminated in ReiserFS-3.6.17, just a few days after being reported.

    Since ReiserFS isn't merged into the official kernel tree, when you want to try out the latest kernel, you have to patch ReiserFS into the system yourself, but that is quite easy.

    I look forward to the day when ReiserFS and these others are merged into the kernel. By the way, the 2.4 kernel is quite nice. Try copying a file several times your memory size from one disk to another (a 600 MB ISO image should do the trick) on both 2.2.x and 2.4.0-test9pre-whatever. And try to do something with your system during the copy. You'll become addicted to 2.4.0, I promise you. It's wonderful.

  9. Re:Journalling is dead. Long live phase trees! by Spoing · · Score: 3
    ...'Tux2'...provides the same guarantees of a consistent filesystem as data+metadata journalling, without the performance hit.

    I'm curious...does anyone know how much more RAM/CPU Tux2/ReiserFS/... need over and above Ext2? Journaling is an impressive feature, yet some of the machines I monitor aren't cutting edge; Pentium 200/64MB, Celeron 300/96MB. I've already tweaked these systems in other ways (no extra consoles, MTRR settings, ...).

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  10. Um, it's deja vu all over again... by Holgate · · Score: 4

    Haven't we had this discussion umpteen times before? Such as... two days ago? There's even a link to a discussion on the four competing filesystems. Sheesh. Flogging the dead horse.

  11. Re:ReiserFS by Jerry · · Score: 4

    I was ASKED to install a Linux system at work last week! (I've been
    preaching Linux for 3 years - patience pays off!) They gave me an
    old P100 with 71MB of usable RAM and two HDs.
    I decided to use SuSE 6.4 BECAUSE it had ReiserFS.
    The graphic install really impressed the Win techies standing
    around watching because it was easy enough that even they
    could do it, and is pretty eye candy. KDE really impressed them
    too.
    Thirty minutes later the second HD, a 4.3 BigFoot, died.
    I had /home on it, and since I was logged in as root, it didn't matter.
    The dead drive was smoothly disconnected from the system.
    Since I needed to power down to replace the HD I decided
    to test out the ReiserFS. I reached over and pressed the reset
    button. A collective "gasp!" rose from the assembled techies.
    Thirty seconds later I had the KDE graphical login prompt.
    No corruption, no losses. It's like having a UPS attached.
    I didn't notice any increase in speed of file accessing, but it
    was fast at rebooting. It's been up 18 days now, which is
    also impressing the techies in our M$ shop. They are still
    afraid of Linux though. I think it is because they may feel
    that they may have to retrain, losing any employment
    advantages they may have accumulated. They are right.

    --

    Running with Linux for over 20 years!

  12. Re:Try *BSD Soft Updates by miguel · · Score: 4

    McKusick's Soft Updates also has a nice feature: unlike the journaling file systems, it does not have the burden of writing blocks to a logging device. So a soft-updates enabled kernel runs at the speed of traditional asynchronous file systems (i.e., default ext2) while providing a very good level of reliability (it is not a synchronous file system, so it runs at a very enjoyable speed).

    You can boot a Soft Updates file system without fscking it; the file system will be in a functional state. The only problem is that you might start to lose free blocks that are believed to be busy. So every 100 or 200 crashes you might want to run fsck to free those 100 blocks.

    I agree with you regarding the ext2 file system when running in async mode: when there is a lot of activity on the disk, and a lot of changes to the file system, crashing an ext2 file system will lose a considerable amount of data. ext2 fsck will not be able to recover your file system properly (it has happened to me a couple of times already).

    For non-SoftUpdates kernels and non-Journaling kernels, if you are running a system with sensitive information, I suggest turning synchronous access on the file system (add option sync to it).
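
    The sync mount option applies to the whole file system. As a rough per-file analogue, an application can ask the OS to flush one file before carrying on. The sketch below uses only Python's standard os.fsync call; `write_and_sync` is a hypothetical helper name, and this is not a substitute for the mount option, just an illustration of what "synchronous" buys you.

```python
import os
import tempfile

def write_and_sync(path: str, payload: bytes) -> None:
    """Write `payload` to `path` and block until the OS reports it flushed."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push its buffers to the device

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "important.dat")
        write_and_sync(target, b"sensitive record\n")
        print("bytes on disk:", os.path.getsize(target))
```

    The cost is the same one described here for synchronous mounts: every such write waits for the disk.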

    The sad part here is that the BSDs have traditionally been optimized for the synchronous case, so they run at acceptable speeds. Linux ext2fs has never been optimized for this case so in practice it is very slow.

    I am using ReiserFS on my laptop, but on a server, if I had to choose, I would run SoftUpdates for BSD kernels and ReiserFS for Linux kernels.

    Miguel.

  13. ext3 more real than JFS by crow · · Score: 4

    When I was at the Ottawa Linux Symposium, there were talks on XFS, JFS, and ext3fs. It seemed clear that XFS was near beta, so the recent announcement was no surprise. Ext3fs also sounded near beta. Ext3 takes the simple approach of adding journaling to ext2 in such a way that as long as you unmount cleanly (so there's no need to play the log back), you can take an ext3 partition and mount it as an ext2 partition. From the talk, it sounded pretty much ready.

    JFS was another story. My take on the talk was that people who attended it learned one important thing: JFS is the journaling file system to ignore. The Linux port comes from OS/2, instead of directly from AIX. It lacks such things as support for mixed case filenames. The answers to most of the questions were, "We hadn't thought of that," or, "We'll have to look into that." If JFS didn't have the "me-too" ego of IBM behind it, the developers would have realized that they were better off working on one of the other file systems.

  14. ReiserFS is just fine by me by da3dAlus · · Score: 4

    I tried to ask this question a few months ago, but with no luck getting it posted I did some research on my own. I wanted to make a 60GB file server that would give me some insurance on my data. I was close to using the IBM JFS, but kept hearing about ReiserFS and gave it a try. (Heck, sourceforge uses ReiserFS on their servers, so it's good enough for mine.) Anyway, after a little more reading, I realized that ReiserFS doesn't just add journaling to a partition, it also restructures the filesystem into B-trees which can enhance access speeds, and it also adds a bit of encryption to the filesystem since it uses a hashing algorithm to sort the files.
    In my opinion, you just get more. I also found the installation and recompile fairly easy to do. I've been using ReiserFS for the past 3 months with absolutely no problems.

    --

    Sometimes I doubt your commitment to Sparkle Motion.
    1. Re:ReiserFS is just fine by me by Inoshiro · · Score: 4

      I think you're confused about what a hash is. A hash is a one-way function. A cryptographic hash is a one-way function with an incredibly low probability of collision. Example: passwords are stored as cryptographic hashes, allowing the system to verify your password without having a copy of it handy :)

      Encryption is a two-way function requiring a key (3DES, blowfish, IDEA), or a pair of matched one-way functions (RSA, DSA). Don't confuse this with a cryptographic hash which is strictly one way. And don't confuse hashes with any form of encryption.
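
      The password case can be sketched in a few lines. This is a minimal illustration using Python's standard hashlib, with PBKDF2-SHA256 picked arbitrarily as the hash; the helper names and the iteration count are assumptions for the example, not what any particular system mentioned in this thread does.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-size digest; there is no function that turns it back."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the candidate and compare digests in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored)

salt = os.urandom(16)
stored = hash_password("correct horse", salt)  # only this digest is kept

print("right password accepted:", verify_password("correct horse", salt, stored))
print("wrong password accepted:", verify_password("battery staple", salt, stored))
print("digest length in bytes:", len(stored))
```

      Note that nothing here needs a key and nothing decrypts: verification works by hashing again, which is exactly why a hash used to sort files does not encrypt them.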
      --
      Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  15. Re:Journalling is dead. Long live phase trees! by Daniel Phillips · · Score: 4

    tux2fs probably *will* take more memory (substantially more?) than ext2 or a journaling filesystem, but with the amount of memory that most systems have available for file cache, I doubt that is a problem.

    I've analyzed that question and I think tux2 will only use a little more cache memory, not a lot more, and it could even be less - see below. Tux2 uses per-block copy-on-write, and when the old version of a block won't be used any more (the normal case) that means you can just change the disk block number in the buffer - no extra memory used at all. The only time extra memory is used is when a file block is written over and over again, every 10th of a second or so - then you will sometimes get two copies of it in memory at the same time. The first copy will disappear as soon as it finishes being transferred to disk. This kind of writing pattern is rare with normal data but is common with metadata. Fortunately metadata is about .1% of the total in a filesystem. My guess is you won't notice any extra load on the buffer cache.
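
    For readers who haven't met copy-on-write before, the general idea can be sketched as a toy in-memory store: changed blocks go to fresh locations, and a phase becomes visible only when a single root pointer is swapped. `CowStore` and its methods are hypothetical names for illustration; this is not Tux2's code or on-disk layout, and it ignores freeing old blocks entirely.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}     # block number -> contents (stands in for the disk)
        self.next_free = 0   # next unused block number
        self.root = {}       # committed tree: filename -> block number
        self.pending = None  # tree for the phase currently being built

    def _alloc(self, contents):
        number = self.next_free
        self.next_free += 1
        self.blocks[number] = contents
        return number

    def write(self, name, contents):
        if self.pending is None:
            self.pending = dict(self.root)  # branch off the committed tree
        # The new version gets a new block; the old block stays untouched.
        self.pending[name] = self._alloc(contents)

    def commit(self):
        if self.pending is not None:
            self.root = self.pending  # the single pointer swap ends the phase
            self.pending = None

    def crash(self):
        self.pending = None  # an unfinished phase is simply forgotten

    def read(self, name):
        return self.blocks[self.root[name]]


store = CowStore()
store.write("notes.txt", b"version 1")
store.commit()
store.write("notes.txt", b"version 2")
store.crash()  # power lost before the phase was committed
print(store.read("notes.txt"))
```

    After the simulated crash the committed tree still points at the old block, so the store is consistent without replaying any log.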

    In fact, I think Tux2 will take a load off the buffer/page cache because it doesn't let dirty data hang around a long time - it starts writing to disk a fraction of a second after you start writing to a file. My plan is to have Tux2 shorten its phase length under heavy memory pressure, so the space needed for dirty buffers will drop down to just 100-200K, and you'll still get good performance.

    Cache memory for reading under Tux2 is the same as Ext2 and most other filesystems.
    --
    Have you got your LWN subscription yet?
  16. A brief summary by Fluffy the Cat · · Score: 5

    XFS is optimised for dealing with streaming media, and so deals well with high IO and large files.

    JFS has been around for years under AIX. It's a well proven general purpose journalling filesystem.

    ReiserFS is the best established of the Linux journalling filesystems. It has several fairly innovative features and is more efficient than ext2 in terms of space utilisation. People are using it as their primary filesystem now, although it's still in development.

    EXT3 is (unsurprisingly) a development of EXT2. It lacks most of the pretty features of the other journalled filesystems, but has the significant advantage that you can turn EXT2 partitions into EXT3 (and vice versa) without any trouble at all.

  17. Try *BSD Soft Updates by redelm · · Score: 5

    For similar crash protection, you might want to try out McKusick's "Soft Updates" that appear in *BSD systems. Essentially, they are ordered disk writes that make sure data gets on the disk before metadata is altered. They go through the buffering system, so performance isn't bad.
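
    The ordering idea can be sketched as a dependency sort over buffered writes. This is only a toy under stated assumptions: `order_writes` and the write labels are made up for the example, and the real Soft Updates code tracks dependencies at a much finer grain than whole writes.

```python
from collections import deque

def order_writes(writes, depends_on):
    """Return `writes` reordered so each write follows everything it depends on.

    writes:     list of write labels, in the order they were buffered
    depends_on: dict mapping a write to the writes that must reach disk first
    """
    remaining = {w: set(depends_on.get(w, ())) for w in writes}
    ready = deque(w for w in writes if not remaining[w])
    ordered = []
    while ready:
        done = ready.popleft()
        ordered.append(done)
        for other in writes:
            deps = remaining[other]
            if done in deps:
                deps.remove(done)
                if not deps:
                    ready.append(other)  # last dependency satisfied
    if len(ordered) != len(writes):
        raise ValueError("some buffered writes have unsatisfiable dependencies")
    return ordered


buffered = ["inode: set size of log.txt", "data: block 7 of log.txt"]
deps = {"inode: set size of log.txt": ["data: block 7 of log.txt"]}
for step in order_writes(buffered, deps):
    print(step)
```

    Even though the inode update was buffered first, the data block is flushed ahead of it, so a crash between the two leaves metadata that never points at unwritten data.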

    As an experiment, I pulled the plug towards the end of 5 FreeBSD kernel compiles (SMP `make -j 4`). In all cases, the fsck upon restart was minor, just freeing inodes. In four of the cases, `make` just picked up where it left off, and finished the kernel compile, losing only ~40 seconds work. In one case, a `make clean` had to be done because something was incomplete.

    Don't try this on Linux! The ext2 fsck is horrible after a powerfail, and I've lost superblocks and had to re-install :( .

  18. Journalling is dead. Long live phase trees! by teraflop user · · Score: 5
    OK, that overstates the case a little, journalling is still better when transactions have to complete as early as possible. But for most purposes Daniel Phillips' 'Tux2' phase-tree filesystem looks as though it may well be superior to journalling - it provides the same guarantees of a consistent filesystem as data+metadata journalling, without the performance hit.

    It is also proof that open source software does not just 'chase tail lights' - the work is substantially innovative.

    Phillips is also implementing tailmerging (a feature from ReiserFS to efficiently store small files) for ext2/ext3/tux2.

    For more details, check his web pages here, and the linux-fsdevel mailing list.