XFS Merged into Linux 2.4
Alphix writes "As noted on KernelTrap, Marcelo has merged XFS into 2.4 after a code review by Christoph Hellwig. The mail from Marcelo on LKML is here. Apparently it touched very little VFS code, so people not using XFS shouldn't see any ill effects from this (it's even supposed to fix some VFS bugs).
XFS is described by SGI as '...a journalling filesystem developed by SGI and used in SGI's IRIX operating system. It is now also available under GPL for linux. It is extremely scalable, using btrees extensively to support large and/or sparse files, and extremely large directories. The journalling capability means no more waiting for fsck's or worrying about meta-data corruption.' Let the stability vs. new-features flamewar begin."
Not.
Original ntfs.sys
Let the stability vs. new-features flamewar begin.
It's already been stable for years, since VERY early in the 2.4.x cycle. It's only a naming detail that it now counts as merged into 2.4.x itself.
Be careful, those of you who still use LILO:
Q: Does LILO work with XFS?
This depends on where you install LILO. For MBR installation: yes. For root partitions: no, because the XFS superblock goes where LILO would be installed. This is to maintain compatibility with the IRIX on-disk format and will not be changed. Putting the superblock on the swap partition is reported to work but is not guaranteed.
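To see why the root-partition case fails: the XFS superblock starts at byte 0 of the filesystem with the magic bytes `XFSB`, which is exactly the sector LILO would overwrite. Below is a hypothetical pre-install check (the function names are illustrative, not part of LILO or xfsprogs) that tests a partition image for that magic; in-memory byte streams stand in for a real device node.

```python
import io

# The XFS superblock starts at byte 0 of the filesystem with the
# magic number 0x58465342, i.e. the ASCII bytes "XFSB".
XFS_SB_MAGIC = b"XFSB"


def has_xfs_superblock(dev) -> bool:
    """Return True if a binary stream begins with the XFS superblock magic."""
    dev.seek(0)
    return dev.read(len(XFS_SB_MAGIC)) == XFS_SB_MAGIC


def safe_for_boot_sector_loader(dev) -> bool:
    """Hypothetical check: writing a boot loader to sector 0 would clobber XFS."""
    return not has_xfs_superblock(dev)


if __name__ == "__main__":
    # In-memory stand-ins for partition images; a real tool would open the
    # partition's device node read-only in binary mode instead.
    xfs_partition = io.BytesIO(XFS_SB_MAGIC + bytes(508))
    blank_partition = io.BytesIO(bytes(512))
    print(safe_for_boot_sector_loader(xfs_partition))    # False
    print(safe_for_boot_sector_loader(blank_partition))  # True
```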
SGI has an overview of the XFS filesystem, briefly pointing out some highlights. I also recall reading somewhere that it was possible (more so than on ext* filesystems) to undelete files on an XFS filesystem, although I'm skeptical.
Trolls lurk everywhere. Mod them down.
Here ya go.
Don't forget IBM's JFS; it's in 2.4 AFAIK, and the last time there were benchmarks linked from Slashdot, it actually seemed the best overall, even over the highly anticipated Reiser4.
You can always look back at this old Slashdot article.
For all those looking for a filesystem comparison, I found this story quite interesting, or go here for the test details and results.
I use XFS on several different servers, mainly because I believe it performs better than ext3 or any other filesystem, and also because a lot of the servers I run are Samba servers and ACL support is built natively into XFS. Last I looked, ACL support was still not quite stable in ext2/3; it has been a while, so it could be stable by now.
http://www.newsforge.com/os/03/10/07/196222.shtml?tid=2
After patching every single kernel that's come out since the early 2.4s, I now have a kernel that I don't need to patch. WOW, about darn time!! Perhaps I'll even get lucky enough that Red Hat and others that don't support XFS yet will build it into their kernels. That will make MY life easier and updates go faster.
We chose XFS after lots of serious testing. It beat all comers at the time, and we've been using it ever since. The only downside to XFS is that file deletion times are a bit long, especially compared to Reiser, but when you have a server under HEAVY load (databases, mail servers) and with LARGE files (log server), nothing beats XFS.
Thanks guys, this is one of those merges that has made me ecstatic!
Mandrake has offered XFS since at least 9.0, my first Linux distro. I've been using XFS (at the suggestion of my friend who helped with the install) for at least 6 months now, with only one instance of a problem (not sure if it was a fault in the filesystem itself): I lost or corrupted an inode or two, which was fixed very easily once I knew what to do.
It works with both GRUB and LILO, is reasonably speedy, and has enormous partition and file size limits.
Count me a happy customer.
~~LF
This "big merge" has nothing to do with vendor pressure. The XFS patches have been available and well tested throughout most of the 2.4 kernel's life cycle, and since XFS has already proven that it plays nicely with the rest of the kernel, it's quite appropriate to do a merge this late in the 2.4 tree's life cycle. The team at SGI that handles merging the XFS code into the kernel has done a very good job of keeping up with bug reports and changes in the kernel VFS code.
Marcelo probably shares my opinion that the current XFS code has been around long enough, has demonstrated stability, and has merged successfully with every recent 2.4 kernel back to at least 2.4.1x, so it's more than suitable for inclusion in the main kernel source without risk of introducing instability.
The only clashes I've ever seen with XFS and other code was with other 3rd-party patches, such as the ACL support in grsecurity. Those are now "switchable", anyway.
That is not true. The biggest holdback during the past 3 years has been that the VFS layer needed a number of alterations, and until now Marcelo did not merge XFS because of this.
It wasn't until Christoph OK'd the VFS changes that Marcelo merged the XFS core.
SGI as a vendor has had nothing to do with it. Buy an Altix 3000 and they would happily maintain any special patch you need for that (IA64) machine.
I think I know what I'm talking about since it's my name on the XFS FAQ. And no I don't work for SGI.
Where's the light switch. Seth
XFS has been in 2.6 for a long time. It was merged early during the 2.5 development cycle.
Ext3 can grow or shrink an unmounted file system. XFS can grow a mounted file system.
Ext3 and XFS both have dump utilities, which many sys admins prefer for backup.
Ext3 supports three modes of journaling: writeback (risky metadata only), ordered (metadata only), and journal (all data). I believe XFS is comparable to ordered ext3.
Ext3 has been widely deployed on Linux, and it trivially reverts to ext2. The XFS design is mature, but its implementation on Linux is less proven.
Extended ACLs, B-tree filesystem structures to facilitate huge files, fast sparse files, large directories, fast deletes, and a couple of other niceties that would have required huge functional changes to ext2/ext3 to implement. It's also completely 64-bit clean, as it has been since its conception.
The btree-based storage structure is already employed by reiserfs in a similar manner, but XFS' implementation has been stable (used in IRIX) for quite a bit longer.
Yes.
Actually IMHO journalling on flash would be a bad idea. Most flash memories give you only about 100k write cycles before giving up the ghost. For mp3 players or digicams this is just fine. But, the point of the journal is that it is flushed to disk immediately on a write operation, so depending on usage you could wear out the memory cells that contain the journal file an order of magnitude faster, killing your flash memory REAL FAST.
ext2dump is unsupported; in particular, I recall a quote from Linus to the effect that anyone who uses ext2dump might just as well not make backups at all.
xfsdump on the other hand will work correctly.
http://epoxy.mrs.umn.edu/~minerg/fstests/results.
Of course your mileage may vary but I generally got results consistent with those cited.
My own experiences (I have used both ReiserFS and XFS with the 2.4.20 kernel):
dump is not recommended with ext2 or ext3 because it opens the block device directly, which bypasses the page cache and can give you corrupt data if there are dirty pages that haven't been flushed to disk.
I'm not sure if xfsdump is any smarter about it because of the DMAPI stuff available, but I'd be careful.
Well, one of your mistakes is assuming that the non-journalling fs will be faster. XFS will wipe the floor with ext2 on certain workloads. The other is assuming that it takes a number of crashes to make fscking a problem. A single fsck on a large filesystem could take upwards of an hour.
(from http://www.sgi.com/software/xfs/overview.html)
Guaranteed Rate I/O
XFS is the only file system available that provides a guaranteed rate I/O system, which allows applications to reserve specific bandwidth to or from the file system. The file system can determine the available bandwidth and guarantee that a requested level of performance is met for a given time. This functionality is critical for media delivery systems such as video-on-demand or data acquisition.
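The reservation idea can be pictured as simple admission control: the filesystem tracks how much bandwidth it has already promised and refuses any request that would over-commit the device. This is a toy sketch under that assumption; the class, method names, and MB/s figures are hypothetical, and it is not SGI's GRIO interface.

```python
class BandwidthReserver:
    """Toy admission control for guaranteed-rate I/O (hypothetical, not SGI's GRIO API)."""

    def __init__(self, capacity_mb_s: int):
        self.capacity = capacity_mb_s
        self.reserved = {}  # application name -> reserved MB/s

    def available(self) -> int:
        return self.capacity - sum(self.reserved.values())

    def reserve(self, app: str, rate_mb_s: int) -> bool:
        """Grant the request only if it fits in the remaining bandwidth."""
        if rate_mb_s <= 0 or rate_mb_s > self.available():
            return False  # refuse rather than over-commit the device
        self.reserved[app] = self.reserved.get(app, 0) + rate_mb_s
        return True

    def release(self, app: str) -> None:
        """Return an application's whole reservation to the pool."""
        self.reserved.pop(app, None)


if __name__ == "__main__":
    disk = BandwidthReserver(40)
    print(disk.reserve("video-1", 25))  # True
    print(disk.reserve("video-2", 20))  # False: only 15 MB/s left
    print(disk.reserve("video-2", 15))  # True
    disk.release("video-1")
    print(disk.available())             # 25
```

Refusing up front, rather than degrading every stream a little, is what lets a media server keep each promise it has already made.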
Expanded Dump Capabilities
Unlike traditional file systems, which must be dismounted to guarantee a consistent dump image, you can dump an XFS file system while it is being used. The XFS dump utility, XFSdump, can dump an entire filesystem, a directory tree, or specific files. XFSdump is restartable, which allows a large dump to be spread over an extended period of time or to be resumed after a system restart.
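A restartable dump boils down to recording progress as you go and skipping finished work on the next run. The sketch below assumes a hypothetical `dump_files` helper that checkpoints after every file to a JSON list of completed paths; it does not reproduce xfsdump's media format or its real checkpoint mechanism.

```python
import io
import json
import os
import tempfile


def dump_files(paths, out, checkpoint_path):
    """Append each file's bytes to `out`, skipping files a previous run already dumped.

    Hypothetical sketch of a restartable dump; it does not reproduce xfsdump's
    media format or its real checkpointing.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))
    for path in paths:
        if path in done:
            continue  # finished in an earlier run
        with open(path, "rb") as src:
            out.write(src.read())
        done.add(path)
        # Record progress after every file, so an interrupted run
        # repeats at most the one file it was working on.
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return done


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    paths = []
    for name, data in [("a", b"AAA"), ("b", b"BBB"), ("c", b"CCC")]:
        path = os.path.join(workdir, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    checkpoint = os.path.join(workdir, "dump.ckpt")

    first = io.BytesIO()
    dump_files(paths[:2], first, checkpoint)     # pretend the run stopped here
    resumed = io.BytesIO()
    dump_files(paths, resumed, checkpoint)       # resume: only "c" remains
    print(first.getvalue(), resumed.getvalue())  # b'AAABBB' b'CCC'
```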
Actually, this is not quite true. Most modern flash file systems are built on a wear-leveling structure, so rewrites to a particular sector are remapped uniformly over the remaining free space. This prevents any single location in the flash from receiving too many rewrites, and in practice it makes the device last virtually forever. (Knowing the wear-leveling pattern, you could probably force the worst case, but in practice this will not occur.)
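A toy model makes the difference concrete: without remapping, a journal sector rewritten 1000 times wears one physical block 1000 times; with wear leveling, each rewrite goes to the least-worn free block, so the wear spreads out. This sketch assumes one sector per block, counts one erase per write, and ignores garbage collection; the class is hypothetical and far simpler than a real flash filesystem such as JFFS2.

```python
class WearLevelingFlash:
    """Toy wear leveling: every write of a logical sector lands on the least-worn free block.

    Simplifications (assumptions): one sector per block, one erase counted per
    write, and no garbage collection. Real flash layers are far more involved.
    """

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks
        self.mapping = {}  # logical sector -> physical block currently holding it

    def write(self, sector: int) -> int:
        in_use = set(self.mapping.values())
        in_use.discard(self.mapping.get(sector))  # the stale copy becomes free
        free = [b for b in range(len(self.erase_counts)) if b not in in_use]
        if not free:
            raise RuntimeError("no free blocks")
        target = min(free, key=lambda b: self.erase_counts[b])
        self.erase_counts[target] += 1
        self.mapping[sector] = target
        return target


if __name__ == "__main__":
    flash = WearLevelingFlash(num_blocks=4)
    for _ in range(1000):      # a "journal" sector rewritten over and over
        flash.write(0)
    print(flash.erase_counts)  # [250, 250, 250, 250]
```

With four blocks the hottest block sees 250 erases instead of 1000; on a real device with thousands of spare blocks the reduction is correspondingly larger.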
Plus it's sure to piss SCO off :)
;-)
That's not the half of it. You see, Hellwig is a former SCO employee who, when he worked there, worked closely with IBM on their port of JFS to Linux. He was also heavily involved in the SMP development process. Just search for his name together with SCO and Caldera on your favorite search engine. I think it will be hard for him to avoid a deposition.
Now he works for SGI.
I remember that about 7 years ago I was working on a project that had a VAX cluster for our Sybase DB. It crashed. He had to wait almost 4 hours for it to finish fscking the disks.
GRIO is not available on Linux, because it requires a lot of other support in the kernel proper, in the various I/O subsystems, etc.
However, the realtime subvolume, which is a component of GRIO, is available for use on Linux.
xfsdump is definitely smarter because of DMAPI, and is safe to use on live filesystems.
I guess you missed the article on Using the Real ntfs.sys Driver Under Linux, eh?
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
This is not an endorsement of JFFS, it's just an example of a flash friendly journalling filesystem. (I have not used it - it may be the best filesystem ever, I don't know).
Now that the VFS layer has been stabilized and made to support FS drivers such as XFS, very little needs to be changed to add XFS support. It's almost completely "additive" rather than a modification of existing code.
One of the XFS authors said of anyone who wants to undertake such a port: "go for it".
Considering the difficulty of ensuring data integrity and supporting B-tree-arranged data, Microsoft would not look kindly on XFS being ported to NT, since their next-generation OS is supposed to include database-like features to speed up indexing and accessing data, which XFS already has built in. It would really rain on their parade. Also, benchmarks show NTFS is considerably slower than XFS (or FAT32, for that matter) for large files, and NTFS has no support for real-time I/O partitions or for journals located on separate disks.
NTFS also requires (according to ad copy) constant defragmentation due to its primitive block allocation scheme, while XFS does quite well even without the XFS FSR (File System Reorganizer). XFS's FSR was created for one specific customer who had a particular application that generated excessively fragmented disks. Before that, an FSR (i.e. a defragmenter) wasn't considered necessary, because XFS is intelligent about how it lays out files when they are written and how it stores free space (free space is also stored in ordered B-trees, by powers-of-two size of the free space blocks).
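Indexing free space by size is what lets an allocator find a well-fitting extent quickly instead of scanning the disk. In this sketch a sorted Python list of `(length, start)` pairs stands in for a by-size B-tree, and the best-fit policy is an illustrative assumption, not XFS's actual allocator.

```python
import bisect


class FreeSpaceBySize:
    """Free extents ordered by (length, start), standing in for a by-size B-tree.

    Sketch only: a sorted list replaces the on-disk B-tree, and the best-fit
    policy is an illustrative assumption, not XFS's actual allocator.
    """

    def __init__(self, extents):
        # extents: iterable of (start_block, length_in_blocks)
        self.by_size = sorted((length, start) for start, length in extents)

    def allocate(self, want: int):
        """Return the start of a best-fit run of `want` blocks, or None if none fits."""
        i = bisect.bisect_left(self.by_size, (want, -1))  # smallest extent >= want
        if i == len(self.by_size):
            return None
        length, start = self.by_size.pop(i)
        if length > want:
            # Return the unused tail of the extent to the index.
            bisect.insort(self.by_size, (length - want, start + want))
        return start


if __name__ == "__main__":
    space = FreeSpaceBySize([(0, 8), (100, 3), (200, 16)])
    print(space.allocate(3))   # 100: the exact-size extent wins
    print(space.allocate(5))   # 0: carved from the 8-block extent
    print(space.allocate(20))  # None: nothing that large is free
```

Taking the smallest extent that fits leaves the big extents intact for big files, which is one way an allocator can avoid fragmenting them.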
The only benchmark on which I've seen XFS run noticeably slower on Linux is deleting large numbers of small files. One doesn't notice this on IRIX, since space deallocation happens in the background there, and only the inodes need to be marked deleted before the user prompt returns. I seem to remember that on Linux the space had to be deallocated synchronously for some reason or another.
That makes sense given the way free space is managed: when files are deleted, free blocks are recursively combined with adjacent free blocks to create the largest possible free block size (I think up to 128k blocks, with a default 4k block size; my numbers may be a bit rusty). Under IRIX, as I understand it, free space blocks were combined asynchronously in a system thread after the last reference to an inode was released. Linux, if I remember correctly, didn't support the facilities for such a background thread, so the block combining happens synchronously. That explains the performance hit in tests that delete lots of small files: there are many small free blocks that are candidates for merging with adjacent free space.
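The combining step described above amounts to merging a freed extent with its left and right neighbours whenever they touch. This generic sketch keeps free extents as a sorted list of `(start, length)` pairs, standing in for a by-block-number index; the `free_extent` helper is hypothetical and not XFS's actual code.

```python
import bisect


def free_extent(free_list, start, length):
    """Return an extent to a sorted list of (start, length) free extents, merging neighbours.

    Generic sketch of coalescing on delete; the list stands in for a
    by-block-number index and is not XFS's actual data structure.
    """
    i = bisect.bisect_left(free_list, (start, 0))
    # Merge with the preceding extent if it ends exactly where this one begins.
    if i > 0:
        prev_start, prev_len = free_list[i - 1]
        if prev_start + prev_len == start:
            start, length = prev_start, prev_len + length
            free_list.pop(i - 1)
            i -= 1
    # Merge with the following extent if this one ends exactly where it begins.
    if i < len(free_list):
        next_start, next_len = free_list[i]
        if start + length == next_start:
            length += next_len
            free_list.pop(i)
    free_list.insert(i, (start, length))
    return free_list


if __name__ == "__main__":
    free = [(0, 4), (8, 4)]
    free_extent(free, 4, 4)  # freeing blocks 4..7 bridges the two extents
    print(free)              # [(0, 12)]
```

Each delete does a lookup plus up to two merges; done synchronously for thousands of tiny files, that per-file work is the kind of cost the benchmark above would expose.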
I'm not entirely sure why a special "XFS_del" process couldn't be started at system startup whose sole purpose would be to take unreferenced inodes and do the space combining in the background, letting foreground programs continue after simply marking the inode as unusable and enqueuing it for the XFS free-space combining process. It is quite possible that some of this has been implemented and my information is dated. But free space combining on cleanup is one of the main reasons why, historically, XFS file systems didn't need continually running programs like Executive Software's _DiskKeeper_ running full time in the background: XFS had its own built-in defragmentation every time a user deleted a file.
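The proposed design is a classic producer/consumer split: `delete` marks the inode and queues it, while a single worker thread performs the slow reclamation later. This is a user-space sketch of that idea only; `DeferredFreer` and its `reclaim` callback are hypothetical, and kernel code would look nothing like this.

```python
import queue
import threading
import time


class DeferredFreer:
    """Delete marks the inode and returns; a worker thread does the slow space cleanup.

    Hypothetical sketch of the proposed "XFS_del" daemon, not kernel code.
    """

    def __init__(self, reclaim):
        self._q = queue.Queue()
        self._reclaim = reclaim  # slow callback, e.g. coalescing free extents
        self.deleted = set()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def delete(self, inode: int) -> None:
        self.deleted.add(inode)  # fast path: inode is marked unusable immediately
        self._q.put(inode)       # space reclamation is deferred to the worker

    def _run(self) -> None:
        while True:
            inode = self._q.get()
            try:
                self._reclaim(inode)
            finally:
                self._q.task_done()

    def drain(self) -> None:
        """Block until every queued inode has been reclaimed."""
        self._q.join()


if __name__ == "__main__":
    def slow_reclaim(inode):
        time.sleep(0.01)  # stand-in for synchronous free-space merging

    freer = DeferredFreer(slow_reclaim)
    for inode in range(100):
        freer.delete(inode)  # each call returns without waiting for reclamation
    freer.drain()
    print(len(freer.deleted))  # 100
```

Using a single worker keeps reclamation in FIFO order and avoids two threads merging the same neighbouring extents at once.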
For the degenerate case: *one* customer was not getting sufficient speed for real-time, uncompressed video recording to disk (back in the early to mid 1990s, when disks were much slower). The SWAT team assigned to the problem found that the customer's particular usage kept many small files around while deleting other files in a way that prevented automatic space consolidation. This odd usage was just enough to slow down direct-to-disk video recording (something quite difficult on systems of the early to mid 90s, when disks were not so fast and SCSI-2 was still state of the art). To solve this problem for *one* customer, the "xfs_fsr" utility was written.
To make the most of the effort spent on that one customer, SGI incorporated xfs_fsr into the general OS, to be run occasionally to stave off a multi-month or multi-year buildup of similar degenerate cases. I.e., XFS customers considered fragmentation such an unlikely / non-issue that the X