Slashdot Mirror


XFS Merged into Linux 2.4

Alphix writes "As noted on KernelTrap Marcelo has merged XFS into 2.4 after a code review by Christoph Hellwig. The mail from Marcelo on LKML is here. Apparently it touched very little VFS code so people not using XFS shouldn't see any ill effects from this (it's even supposed to fix some VFS bugs). XFS is described by SGI as '...a journalling filesystem developed by SGI and used in SGI's IRIX operating system. It is now also available under GPL for linux. It is extremely scalable, using btrees extensively to support large and/or sparse files, and extremely large directories. The journalling capability means no more waiting for fsck's or worrying about meta-data corruption.' Let the stability vs. new-features flamewar begin."

40 of 265 comments

  1. Re:but NTFS by triptolemeus · · Score: 1, Informative
    --
    The site where: "I'm right, as long as you ignore the things that prove me wrong", became a valid method of debate.
  2. Stability has been there for a long time. by Anonymous Coward · · Score: 4, Informative

    Let the stability vs. new-features flamewar begin.

    It's already been stable for years, since VERY early in the 2.4.x cycle. It's just a detail in the naming that makes it merged as part of 2.4.x itself.

  3. Careful with LILO by slashnik · · Score: 4, Informative

    Be careful those of you who still use lilo

    Q: Does LILO work with XFS?
    This depends on where you install LILO. For MBR installation: Yes. For root partitions: No, because the XFS superblock goes where LILO would be installed. This is to maintain compatibility with the Irix on-disk format. This will not be changed. Putting the Superblock on the swap partition is reported to work but not guaranteed.

    1. Re:Careful with LILO by Jeffrey+Baker · · Score: 4, Informative
      I guess you've never run into LILO's "timestamp mismatch" error, which is undocumented and has nothing to do with timestamps. It prevents machines with large numbers of SCSI devices from booting. This is also precisely the market XFS serves.

      GRUB is good. Boots anything. Wish we had OF.

    2. Re:Careful with LILO by Anonym0us+Cow+Herd · · Score: 2, Informative
      Initrd is something for the good new days, not good old days. The idea is that during boot, the boot loader loads two things: (1) the kernel, and (2) the initrd. The initrd is a ramdisk image of a smallish filesystem. The kernel is started with the smallish filesystem. The kernel runs a program from this filesystem prior to completing the boot process and running /sbin/init. The program that gets run prior to boot completion could do anything you needed to do prior to starting /sbin/init. Some useful ideas:
      • Load kernel modules for additional hardware that was not compiled into the kernel. This allows you to use a more generic stripped down kernel and focus on customizing the initrd which is much more flexible.
      • Load kernel modules for filesystems not compiled into the kernel. In fact, you could compile a kernel with none or only one filesystem. This leads to fewer generic kernels, and you focus on editing the initrd script to load which modules you need.
      In fact, in past versions of SuSE, the entire installation process was actually run from initrd! You boot from the CD-ROM. The kernel loads and mounts the initrd ramdisk, then runs the initrd program from the ramdisk. This is an extremely large, complex program that not only loads the additional kernel modules needed, but then goes through a GUI installation process (framebuffer device, no X). After the hard disk has been partitioned, filesystems written, and jillions or maybe even gazillions of files copied from multiple CDs into those newly created filesystems, the initrd program ends, and the kernel "finishes" booting into the hard disk system that did not even exist when you started the kernel booting up.

      You could conceivably have a server running for several years whose kernel was first loaded from the CD during installation onto an empty hard disk. Compare to number of reboots during Windows install.
      --
      The price of freedom is eternal litigation.
    3. Re:Careful with LILO by akedia · · Score: 2, Informative

      Make sure you enable "Initial ramdisk support" in your kernel config and once you have your bzImage built and installed in /boot and your kernel modules built and installed, you can then

      mkinitrd -o /boot/initrd.img /lib/modules/[kernel-version]

      That will build a nice ramdisk for your kernel which will preload any necessary modules. Then you just need to add a line to /etc/lilo.conf under your kernel section:

      initrd=/boot/initrd.img

      And of course reinstall lilo to the MBR by running /sbin/lilo.

  4. An Overview by Gudlyf · · Score: 5, Informative

    SGI has an overview on the XFS filesystem, just briefly pointing out some highlights. I also recall reading somewhere that it was possible (moreso than ext* filesystems) to undelete files on an XFS filesystem, although I'm skeptical.

    --
    Trolls lurk everywhere. Mod them down.
    1. Re:An Overview by Fruit · · Score: 3, Informative

      No. That is wrong. It's usually *harder* to retrieve a deleted file from an XFS filesystem than from ext2/ext3, not least because the on-disk structures are more complicated.

  5. Re:ext3vs XFS? by Trigun · · Score: 5, Informative

    Here ya go.

  6. Re:Benchmarks by martinde · · Score: 5, Informative

    Don't forget IBM's JFS, it's in 2.4 AFAIK, and the last time that there were benchmarks linked from slashdot, it actually seemed the best overall, even over the highly anticipated reiser4.

  7. Re:ext3vs XFS? by Gudlyf · · Score: 3, Informative

    You can always look back at this old Slashdot article.

    --
    Trolls lurk everywhere. Mod them down.
  8. Comparison by Alphix · · Score: 5, Informative

    For all those that are looking for a filesystem comparison, I found this story to be quite interesting...or go here for the test details and results.

  9. XFS Rocks by fmlug.org · · Score: 5, Informative

    I use XFS on several different servers, mainly because I believe it performs better than ext3 or any other fs. Also, a lot of the servers I run are Samba servers, and ACL support is built natively into XFS. And last I looked, ACL support was still not quite stable in ext2/3; it has been a while, so it could be stable by now.

    1. Re:XFS Rocks by oracleofbargth · · Score: 2, Informative

      And last I looked, ACL support was still not quite stable in ext2/3; it has been a while, so it could be stable by now.

      As of the patch for kernels 2.4.19+, acl support is very stable for ext[23]. In fact, I've been using it in production for over 2 years now. (I did help write some of the ext3-xattr+acl code, though, so maybe that means I'm a little bit more trusting of the code.)

      The only big issues I've ever had have been when using them in conjunction with quotas, but even when stress testing the filesystem, I haven't been able to coerce the code to crash since the 2.4.19 patch.

      There is a performance hit to NFS when using the ACL-over-NFS code, but you never see it unless you're reading or writing the ACLs themselves (i.e. only when you're using `star --acl` to back up or restore files with their ACLs, or using a recursive (get|set)facl on an NFS mount). Of course, the NFS code is less tested, so if you try it and have problems, please submit bugs to acl-devel@bestbits.at

  10. Re:Benchmarks by AlxRogan · · Score: 2, Informative

    http://www.newsforge.com/os/03/10/07/196222.shtml?tid=2

  11. YEAH, WOO HOOO - ALRIGHT! by cluge · · Score: 4, Informative

    After patching every single kernel that's come out since the early 2.4s, I now have a kernel that I don't need to patch. WOW, about darn time!! Perhaps I'll even get lucky enough that RedHat and others that do not support XFS yet will build it into their kernels. That will make MY life easier, and updates go faster.

    We chose XFS after lots of serious testing. It beat all comers at the time and we've been using it ever since. The only downside to XFS is that file deletion times are a bit long, especially compared to Reiser, but when you have a server that is under HEAVY load (databases, mail servers) and with LARGE files (log server), nothing beats XFS.

    Thanks guys, this is one of those merges that has made me ecstatic!

    Angry People Rule

    --
    "Science is about ego as much as it is about discovery and truth " - I said it, so sue me.
  12. No Complaints by LightForce3 · · Score: 5, Informative

    Mandrake has offered XFS since at least 9.0, my first Linux distro. I've been using XFS (at the suggestion of my friend who helped with the install) for at least 6 months now, with only one instance of a problem (not sure if it was a fault in the filesystem itself): I lost or corrupted an inode or two, which was fixed very easily once I knew what to do.

    It works with both GRUB and LILO, is reasonably speedy, and has enormous partition and file size limits.

    Count me a happy customer.

    ~~LF

  13. Re:Vendor pressure by fishnuts · · Score: 2, Informative

    This "big merge" has nothing to do with vendor pressure. The XFS patches have been available and well tested throughout most of the 2.4 kernel's life cycle, and since XFS has already proven stable and plays nicely with the rest of the kernel, it's quite appropriate to do a merge so late in the 2.4 tree's life cycle. The team at SGI that handles merging the XFS code into the kernel has done a very good job of keeping up with bug reports and changes in the kernel VFS code.

    Marcelo probably shares my opinion in that the current XFS code has been around long enough, demonstrated stability, and successfully merged with every recent 2.4 kernel back to at least 2.4.1x, that it's more than suitable for inclusion in the main kernel source without risk of introducing instability.

    The only clashes I've ever seen between XFS and other code were with other 3rd-party patches, such as the ACL support in grsecurity. Those are now "switchable", anyway.

  14. Re:Vendor pressure by knuffie · · Score: 2, Informative

    That is not true. The biggest holdup during the past 3 years has been the fact that the VFS layer needed a number of alterations, and so far Marcelo had not merged XFS because of this.

    It wasn't until Christoph OK'd the VFS changes that Marcelo merged the XFS core.

    SGI as a vendor has had nothing to do with it. Buy an Altix 3000 and they would happily maintain any special patch you would need for that (IA64) machine.

    I think I know what I'm talking about since it's my name on the XFS FAQ. And no I don't work for SGI.

    --
    Where's the light switch. Seth
  15. Re:I love XFS by kill-1 · · Score: 3, Informative

    XFS has been in 2.6 for a long time. It was merged early during the 2.5 development cycle.

  16. Re:ext3vs XFS? by _|()|\| · · Score: 5, Informative
    can someone offer a nice comparison of ext3 versus XFS?

    Ext3 can grow or shrink an unmounted file system. XFS can grow a mounted file system.

    Ext3 and XFS both have dump utilities, which many sys admins prefer for backup.

    Ext3 supports three modes of journaling: writeback (risky metadata only), ordered (metadata only), and journal (all data). I believe XFS is comparable to ordered ext3.

    Ext3 has been widely deployed on Linux, and it trivially reverts to ext2. The XFS design is mature, but its implementation on Linux is less proven.

  17. Re:I say what....? by fishnuts · · Score: 4, Informative

    Extended ACLs, btree filesystem structures to facilitate huge files, fast sparse files, large directories, fast deletes, and a couple other niceties that would have required huge functional changes to ext2/ext3 to implement. It's also completely 64-bit clean, as it has been since its conception.

    The btree-based storage structure is already employed by reiserfs in a similar manner, but XFS' implementation has been stable (used in IRIX) for quite a bit longer.
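
    To make "sparse files" concrete, here is a minimal Python sketch, not anything XFS-specific: it creates a file whose logical size is much larger than the single byte actually written and reports both figures. The helper name make_sparse_file and the 64 MB size are illustrative assumptions, and the allocated figure depends entirely on whether the underlying filesystem supports holes.

```python
import os
import tempfile

def make_sparse_file(path, logical_size):
    """Create a file of logical_size bytes that contains one written byte."""
    with open(path, "wb") as f:
        f.seek(logical_size - 1)   # skip ahead without writing anything
        f.write(b"\0")             # one real byte at the very end
    st = os.stat(path)
    # st_blocks counts 512-byte units actually allocated (not on every platform)
    blocks = getattr(st, "st_blocks", None)
    allocated = None if blocks is None else blocks * 512
    return st.st_size, allocated

with tempfile.TemporaryDirectory() as d:
    size, on_disk = make_sparse_file(os.path.join(d, "sparse.bin"),
                                     64 * 1024 * 1024)
    print("logical size:", size, "bytes; allocated:", on_disk, "bytes")
```

    On a filesystem that supports holes, the allocated figure should come out far smaller than the logical size; on one that does not, the two will be roughly equal.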

  18. Re:ext3vs XFS? by _|()|\| · · Score: 4, Informative
    Does linux have an XFS dump/restore ported to it?

    Yes.

  19. Re:NTFS not GPL, FAT not free by Merlin42 · · Score: 5, Informative

    Actually IMHO journalling on flash would be a bad idea. Most flash memories give you only about 100k write cycles before giving up the ghost. For mp3 players or digicams this is just fine. But, the point of the journal is that it is flushed to disk immediately on a write operation, so depending on usage you could wear out the memory cells that contain the journal file an order of magnitude faster, killing your flash memory REAL FAST.
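
    The arithmetic behind that worry can be sketched in a few lines of Python. The 100k cycle figure is the one quoted above; the commit rate of one journal write per second is purely an assumption for illustration, as is the premise that every commit rewrites the same cell with no remapping.

```python
def seconds_until_worn_out(write_cycles, writes_per_second):
    """Time until a cell that absorbs every write reaches its cycle limit."""
    return write_cycles / writes_per_second

CYCLES = 100_000          # figure quoted in the comment above
COMMITS_PER_SECOND = 1.0  # assumed journal commit rate, purely illustrative

lifetime_hours = seconds_until_worn_out(CYCLES, COMMITS_PER_SECOND) / 3600
print(f"{lifetime_hours:.1f} hours")   # 27.8 hours
```

    Under those assumptions the journal cells would hit their limit in a little over a day; a lower commit rate stretches that out proportionally.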

  20. Re:ext3vs XFS? by Fruit · · Score: 3, Informative

    ext2dump is unsupported; in particular I recall a quote from Linus to the effect that anyone who uses ext2dump might just as well not make backups at all.

    xfsdump on the other hand will work correctly.

  21. Benchmarks by kompiluj · · Score: 4, Informative
    You can find the benchmarks on:
    http://epoxy.mrs.umn.edu/~minerg/fstests/results.html, or a copy at: ReiserFS homepage.
    Of course your mileage may vary but I generally got results consistent with those cited.
    My own experiences (I have used both reiserfs and xfs with the 2.4.20 kernel):
    • reiserfs is a little bit faster than xfs
    • xfs uses about twice as much CPU as reiserfs
    • both are still much better than jfs
    • the reliability of both xfs and reiserfs is satisfactory
    • the results are still an order of magnitude worse than those I get with UFS2 with softupdates on FreeBSD 5.1
    --
    You can defy gravity... for a short time
    1. Re:Benchmarks by Pow.R+Toc.H · · Score: 2, Informative
      These benchmarks have been run with Reiser 4 which, AFAIK, is not shipped by default with any Linux distribution. Even the Mandrake guys, who are fond of experimental software in their distributions, haven't included Reiser 4 in Mdk9.2. Most distributions include 3.6.28, IINM.

      OTOH, I've been using XFS to store and edit 36-bit film scans (40+ MB file sizes) and XFS has been serving me extremely well, without data corruption of any kind, unlike Reiser 3, which needs a reiserfsck every time I boot Win2k (not that this happens very often).

      Finally, according to Oracle, even ext3 blows ReiserFS 3 out of the water:

      http://otn.oracle.com/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
      --------
      Outgoing mail is certified Virus Free - after all, I use Mandrake Linux!
      Get my public PGP signature at http://www.paulo.fessel.nom.br

      --

      --------
      Fighting the herd since 1985.
    2. Re:Benchmarks by kompiluj · · Score: 2, Informative
      OK, real-world performance can be VERY different from the test results. For instance, mount options are really important: I have heard about ext3 running much faster when mounted in the (theoretically) slowest mode, data=journal. For applications that write to the same location many times (e.g. databases), it means that the data are kept in memory and don't get written to disk (remember, the journal resides in memory before it gets synced). The same goes for mail servers like postfix. In these cases ext3 data=journal can be a good choice; however, it has some limitations.
      On the other hand, when comparing reiserfs and xfs you must remember that reiserfs and reiser4 are generally systems that are tuned with many little files in mind and xfs is made for big transfers. So reiserfs shines when you have many little files (e.g. my ext3 broke having to deal with about 500 thousand files in a directory tree of average depth 10 - moving them to reiserfs helped), and xfs is good when you have hardware capable of big transfers - like the SGI(tm) architectures, or at least some ultra-mega-fast-SCSI.
      However, I would still stick to the following points:
      • FreeBSD 5.1 with Berkeley FFS (a.k.a. ufs2 - some authors mistake ufs for s5fs - no longer in use) with soft-updates is an order of magnitude better than any system on Linux 2.4 - don't know how it is with 2.6
      • ext3 is the least CPU intensive
      • xfs is the most CPU intensive
      • reiserfs can be nasty in the case of a real crash (when you need to reiserfsck --rebuild-tree)
      --
      You can defy gravity... for a short time
  22. Re:ext3vs XFS? by Nothinman · · Score: 4, Informative

    dump is not recommended with ext2 or ext3 because it opens the block device directly, which bypasses the page cache and can give you corrupt data if there are dirty pages that haven't been flushed to disk.

    I'm not sure if xfsdump is any smarter about it because of the DMAPI stuff available, but I'd be careful.
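
    The stale-read problem can be shown with a toy model in Python. This is only a sketch of the idea, not how the kernel page cache is implemented; the ToyDisk class and its method names are hypothetical and exist purely for illustration.

```python
class ToyDisk:
    """A block device with a write-back cache in front of it (toy model)."""

    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks   # what is physically on disk
        self.cache = {}                 # dirty pages not yet written back

    def write(self, n, data):
        self.cache[n] = data            # normal writes land in the cache first

    def read_via_filesystem(self, n):
        return self.cache.get(n, self.blocks[n])   # sees dirty pages

    def read_raw_device(self, n):
        return self.blocks[n]           # what a raw-device reader sees

    def sync(self):
        for n, data in self.cache.items():
            self.blocks[n] = data       # write every dirty page back
        self.cache.clear()

disk = ToyDisk(4)
disk.write(0, b"new contents")
stale = disk.read_raw_device(0)        # the write is still only in the cache
fresh = disk.read_via_filesystem(0)
disk.sync()
after_sync = disk.read_raw_device(0)
print(stale, fresh, after_sync)
```

    Until sync() runs, the raw-device reader gets the old block while the filesystem path gets the new one, which is the mismatch a block-level dump of a live filesystem can run into.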

  23. Re:Why so much fuss over JFS? by virtual_mps · · Score: 3, Informative

    Well, one of your mistakes is assuming that the non-journalling fs will be faster. XFS will wipe the floor with ext2 on certain workloads. The other is assuming that it takes a number of crashes to make fscking a problem. A single fsck on a large filesystem could take upwards of an hour.

  24. A few other nice XFS features by isoga · · Score: 5, Informative
    ...no one has mentioned yet:
    (from http://www.sgi.com/software/xfs/overview.html)

    Guaranteed Rate I/O
    XFS is the only file system available that provides a guaranteed rate I/O system, which allows applications to reserve specific bandwidth to or from the file system. The file system can determine the available bandwidth and guarantee that a requested level of performance is met for a given time. This functionality is critical for media delivery systems such as video-on-demand or data acquisition.

    Expanded Dump Capabilities
    Unlike traditional file systems, which must be dismounted to guarantee a consistent dump image, you can dump an XFS file system while it is being used. The XFS dump utility, XFSdump, can dump an entire filesystem, a directory tree, or specific files. XFSdump is restartable, which allows a large dump to be spread over an extended period of time or to be resumed after a system restart.

    -->tech stuff

  25. Re:NTFS not GPL, FAT not free by Anonymous Coward · · Score: 5, Informative

    Actually, this is not quite true. Most modern flash file systems are built upon a wear-leveling structure, so that rewrites to a particular sector are remapped uniformly over the remaining free space. This prevents a single location in the flash from receiving too many rewrites. In practice, this makes the device last virtually forever. (Knowing the wear-leveling pattern, you could probably force the worst case, but in practice this will not occur.)
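
    A rough Python sketch of why remapping helps: it compares the worst-case wear on a single block when every rewrite hits the same place against a simple round-robin rotation over a pool of blocks. Round-robin is an assumed stand-in here; real flash file systems and controllers use more elaborate policies, and the block count of 1,000 is made up for the example.

```python
def max_wear_fixed(num_writes):
    """Every rewrite hits the same physical block."""
    return num_writes

def max_wear_leveled(num_writes, num_blocks):
    """Rewrites are rotated round-robin over num_blocks physical blocks."""
    wear = [0] * num_blocks
    for i in range(num_writes):
        wear[i % num_blocks] += 1
    return max(wear)

WRITES = 1_000_000
BLOCKS = 1_000          # assumed number of physical blocks available for remapping

fixed = max_wear_fixed(WRITES)
leveled = max_wear_leveled(WRITES, BLOCKS)
print(fixed, leveled)   # 1000000 1000
```

    With these numbers the most-worn block sees 1,000 writes instead of a million, so the same cycle limit lasts a thousand times longer.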

  26. SCO by einhverfr · · Score: 4, Informative

    Plus it's sure to piss SCO off :)

    That is not the half of it. You see, Hellwig is a former SCO employee who, when he worked there, worked closely with IBM on their port of JFS to Linux. He was also heavily involved in the SMP development process. Just do a search for his name and SCO and Caldera on your favorite search engine. I think it will be hard for him to avoid a deposition ;-)

    Now he works for SGI.

    --

    LedgerSMB: Open source Accounting/ERP
  27. Re:Why so much fuss over JFS? by GooberToo · · Score: 2, Informative

    I remember about 7 years ago, I was working on a project that had a VAX cluster for our Sybase DB. It crashed. He had to wait almost 4 hours for it to finish fscking the disks.

  28. no GRIO on Linux by Booker · · Score: 3, Informative

    GRIO is not available on Linux, because it requires a lot of other support in the kernel proper, in the various I/O subsystems etc.

    However, the realtime subvolume, which is a component of GRIO, is available for use on Linux.

  29. Re:ext3vs XFS? by MSG · · Score: 2, Informative

    xfsdump is definitely smarter because of DMAPI, and is safe to use on live filesystems.

  30. Re:XFS for Win32? by drinkypoo · · Score: 2, Informative

    The final option (the one I use) is to put your personal data on a cheap *NIX box (mine's a FreeBSD box with a UFS disk) and SMB mount it for remote access under Windows and NFS or SMB mount it under *NIX.

    I guess you missed the article on Using the Real ntfs.sys Driver Under Linux, eh?

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  31. Re:NTFS not GPL, FAT not free by ozzee · · Score: 4, Informative
    JFFS addresses the flash write frequency concern. It would be good if someone was to create a tiny JFFS auto loader that would load off the flash filesystem into windows automagically. That way you can make it seamless.

    This is not an endorsement of JFFS, it's just an example of a flash friendly journalling filesystem. (I have not used it - it may be the best filesystem ever, I don't know).

  32. Re:Changes to the stable kernel? by fishnuts · · Score: 2, Informative

    Now that the VFS layer has been stabilized and supports FS drivers such as XFS, very little needs to be changed to add XFS support. It's almost completely "additive", rather than modifying the existing code.

  33. Re:XFS for Win32? by lpq · · Score: 3, Informative

    One of the XFS authors said anyone who wants to undertake such a port -- "go for it".

    Considering the difficulty in ensuring data integrity and support for B-tree arranged data, Microsoft would not look kindly upon XFS being ported to NT, since their next generation OS is supposed to include database like features to speed up indexing and accessing data like XFS already has built-in. It would really rain on their parade. Also, benchmarking shows NTFS is considerably slower than XFS (or FAT32 for that matter) for large files and NTFS has no support for Real-time I/O partitions or journals being located on separate disks.

    NTFS also requires (according to ad-copy) constant defragmentation due to its primitive block allocation scheme, while XFS does quite well even without the XFS FSR (File System Reorganizer). XFS's FSR was created for 1 specific customer who had a particular application that generated excessively fragmented disks. Before that, an FSR (/defragmenter) wasn't considered necessary because XFS is intelligent about how it lays out files when they are written and how it stores free space (with free space also stored in ordered B-trees by powers-of-two size of the free space blocks).

    The only benchmark I've seen where XFS runs noticeably slower on Linux is deleting large numbers of small files -- something one doesn't notice on IRIX, since the space deallocation happens in the background on IRIX, and only the inodes need be marked deleted before the user prompt returns. I seem to remember on Linux the space had to be deallocated synchronously for some reason or another.

    Makes sense given the way free space is managed -- when files are deleted, free blocks are recursively combined with adjacent free blocks to create the largest possible 'free block size' (I think up to 128k blocks, default=4k block size) (my numbers may be a bit rusty). Free space blocks were combined asynchronously, under IRIX (as I understand it), in a system thread after the last reference to an inode was released. Linux, if I remember correctly, didn't support the facilities for such a background thread -- thus the block combining happens synchronously, explaining the performance hit for file tests that delete lots of small files: there are many small free blocks that are candidates for being merged with adjacent free space.
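
    The combining step described above can be sketched in Python. This is an illustration of free-extent coalescing in general, not XFS code: it keeps free space as a sorted list of (start, length) extents rather than the B-trees XFS uses, assumes the list is already coalesced, and ignores any maximum extent size. The function name free_extent is hypothetical.

```python
def free_extent(free, start, length):
    """Add (start, length) to a sorted free list, merging with neighbours."""
    merged = []
    new_start, new_end = start, start + length
    for s, l in free:
        e = s + l
        if e < new_start or s > new_end:
            merged.append((s, l))          # not adjacent: keep as-is
        else:
            new_start = min(new_start, s)  # adjacent or overlapping: absorb
            new_end = max(new_end, e)
    merged.append((new_start, new_end - new_start))
    merged.sort()
    return merged

free = [(0, 4), (12, 4)]
free = free_extent(free, 8, 4)    # touches (12, 4) -> [(0, 4), (8, 8)]
free = free_extent(free, 4, 4)    # bridges both    -> [(0, 16)]
print(free)                       # [(0, 16)]
```

    Each delete does this merge work on the spot, which is cheap for one big file but adds up when thousands of small files are removed in a row.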

    I'm not entirely sure why a special "XFS_del" process couldn't be started at system run time whose sole purpose was taking unreferenced inodes and doing the space combining in the background, allowing foreground programs to continue asynchronously after simply marking the inode as unusable and enqueuing it to the XFS "free space" combining process. It is quite possible some of this has been implemented and my information is dated. But free space combining on cleanup is one of the main reasons why, historically, XFS file systems didn't need to have _continually_ running programs like Executive Software's _Diskeeper_ running full time in the background: because XFS had its own built-in defragmentation every time a user did a file delete.

    For the degenerate case -- *one* customer was not getting sufficient speed for real-time, uncompressed video recording to disk (back in the early to mid 1990's when disks were much slower). The swat team, assigned to the problem, found that the customer's particular use kept many small files around while deleting some files in a way that prevented automatic space consolidation. This odd usage was just enough to slow down direct-to-disk video recording (something quite difficult on systems in the early to mid 90's when disks were not so fast and SCSI-2 was still state of the art). To solve this problem for *one* customer, the "xfs_fsr" util was written.

    To make the most of the efforts spent on the one customer, SGI incorporated xfs_fsr into the general OS to be run occasionally to stave off multi-month/year buildup of possible, similar degenerate cases. I.e. XFS customers considered fragmentation such an unlikely / non-issue, that the X