XFS merged in Linux 2.5


Posted by CmdrTaco on Tuesday September 17, 2002 @03:27AM from the who-says-journals-are-for-girls dept.

joib writes "According to this notice, the XFS journaling file system has been merged into Linus's BitKeeper tree, to show up in 2.5.36." Ya just know someone out there wants to have every journaling file system on one drive just 'cuz.

XFS FAQ by semaj · 2002-09-17 03:35 · Score: 5, Informative

There's an XFS FAQ and a load more information about it on SGI's site - which points out that several large distributions have had XFS support for a while by default.

Still, it's noteworthy that Linus has finally accepted it into his tree...

--
Meep meep
Re:Comparison? by rindeee · 2002-09-17 03:40 · Score: 5, Informative

http://aurora.zemris.fer.hr/filesystems/
Re:Silly question by MasterD · 2002-09-17 03:49 · Score: 5, Informative

XFS supports ACLs (access control lists), which are much more fine-grained than standard UNIX permissions.

XFS is an extent-based filesystem, which means that you don't end up wasting tons of space having to allocate a 4K block for every small file. And you don't need to jump through tons of indirect blocks to get at large files.
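To illustrate the large-file half of that point, here is a minimal sketch of how a per-block map collapses into a handful of extents. It is a toy model, not XFS's on-disk format; the names `blocks_to_extents` and `lookup` and the block numbers are made up for the example.

```python
from typing import List, Tuple

# One extent: (first block within the file, first block on disk, run length).
Extent = Tuple[int, int, int]


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse a per-block map (file block i -> disk block) into extents."""
    extents: List[Extent] = []
    for offset, disk_block in enumerate(blocks):
        if extents:
            file_off, start, length = extents[-1]
            if disk_block == start + length:
                # Contiguous with the previous run: just grow it.
                extents[-1] = (file_off, start, length + 1)
                continue
        extents.append((offset, disk_block, 1))
    return extents


def lookup(extents: List[Extent], file_block: int) -> int:
    """Translate a block number within the file to a disk block number."""
    for file_off, start, length in extents:
        if file_off <= file_block < file_off + length:
            return start + (file_block - file_off)
    raise IndexError(file_block)


# A hypothetical 1024-block file stored as two contiguous runs on disk.
block_map = list(range(1000, 1512)) + list(range(9000, 9512))
extents = blocks_to_extents(block_map)
print(len(block_map), "block pointers vs", len(extents), "extents")
```

A block-pointer scheme needs one pointer per block (and indirect blocks to hold them once the inode fills up), while the same file here is described by two extent records.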

XFS allocates inodes on the fly, so it grows with whatever data you put on there. Once again, not wasting space up front. And it sticks the inode near the file itself so the head does not have to move far on the hard drive.

XFS supports extended attributes which can be used for all kinds of extensions later on.

XFS has been around since 1994 and is the most mature of the journalling filesystems.

And there are many other reasons that I cannot think of right now.
Re:Comparison? by auferstehung · 2002-09-17 04:00 · Score: 5, Informative

You could check out Daniel Robbins' "Advanced filesystem implementor's guide" over on IBM's developerWorks. He covers reiserfs, ext3, and XFS, and I believe there is a link to articles on JFS in the Resources section at the bottom of the page.

--
Logic is not Divine.
Re:My experience with XFS by josh+crawley · 2002-09-17 05:40 · Score: 5, Informative

---"- Recoveries after a crash are really fast. Almost immediate, better than ext3 and reiserfs."

Hmmm.. I'd assume that ext3 wouldn't be as good.. A fix on a fix usually sucks. And then I've heard about Reiser's file truncation problems. I use Reiser and have had no big problems.

---"- _BUT_ there's something strange. Basically during disk I/O, the whole system is unresponsive. While I'm compiling something, KDE becomes slow, playing videos is not smooth at all, etc. Just as if it didn't scale at all for concurrent disk access. So I finally switched back to ReiserFS just because of this. Maybe the 2.5.x series of kernels behaves differently."

I've had the same problems on 2.2.X when I didn't tweak my HDs to use DMA and 32-bit I/O. Try doing a:

hdparm /dev/(drive linux is on)
hdparm -tT /dev/(drive linux is on)

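If you don't like those settings, drop into single-user mode with / mounted read-only and run this command (per the hdparm man page, -X66 selects UltraDMA mode 2; use the -X value your drive and controller actually support):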

hdparm -X66 -d1 -u1 -m16 -c3 /dev/hda

Now manually run fsck on that partition. If you get errors, it's a bad mode. But if it works, rerun hdparm with the -tT option (it's a benchmark) and compare.

Be aware that 2.4 does most of this for you, but it can sometimes pick too conservative a setting (so your performance sucks). Then again, you could have an unsupported IDE device.

All the best..
Re:Journalling filesystems... by psamuels · 2002-09-17 07:30 · Score: 5, Informative
What exactly is 'journalling'?

Here's the basic theory. Think about what happens when you make a change on a filesystem - say you add a file to a directory. The system has to:
- add a filename entry to the directory itself
- allocate the initial blocks for the file, from the pool of free space in your filesystem
- create the inode, which is a block of information about the file. The inode includes file modification times, owner, permissions, file type (regular file? directory? etc), and the location of its actual data blocks
- if there are too many data blocks, allocate one or more "indirect blocks", which are extensions to the inode so it can hold more data blocks - inodes usually have a fixed size. Initialise these with the correct block numbers as well.
- actually write the file contents to the data blocks you have allocated
If you don't do these things in the correct order, there will be times when the on-disk structure is not consistent. For example, you may have modified the directory to include an entry for the new file, but the entry points at an inode which hasn't been filled in yet. Or the inode may be filled in, but the free space pool hasn't been updated to correspond with the data block allocations in the inode. Throw in other modifications like deleting files or making them larger or smaller, and it gets pretty complicated. If the machine happens to crash at such a time - or the power goes out and you don't have a UPS - the disk will be in an inconsistent state. This has two major consequences:
1. the filesystem checker, or fsck (the equivalent Windows utility is scandisk) will have to run next time you boot, and go over the whole structure of your filesystem, which can take minutes or even hours on a large enough disk (80 GB takes a long time unless your disks are very fast). Nobody wants to sit around for 15 minutes waiting for the server to finish rebooting.
2. depending on exactly what was written to disk in what order, the fsck utility may not even be able to restore your filesystem to a consistent state at all, or it may lose important files or directories in the process of doing so.
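A rough sketch of that failure mode, assuming a toy in-memory "disk" with just a directory, an inode table, and a free block pool. Everything here (`create_file`, `check`, the block numbers, the `crash_after` knob) is hypothetical and only meant to show how an interrupted sequence of in-place updates leaves things for fsck to find.

```python
ALL_BLOCKS = frozenset({10, 11, 12, 13})  # every data block on the toy disk


class Crash(Exception):
    """Stands in for a power failure partway through an update."""


def new_disk():
    return {"directory": {}, "inodes": {}, "free_blocks": set(ALL_BLOCKS)}


def create_file(disk, name, inode_no, nblocks, crash_after=None):
    """Update the structures in place; optionally 'crash' after a given step."""
    def step(n):
        if crash_after == n:
            raise Crash(f"power lost after step {n}")

    disk["directory"][name] = inode_no              # 1. add the directory entry
    step(1)
    blocks = sorted(disk["free_blocks"])[:nblocks]
    disk["free_blocks"] -= set(blocks)              # 2. take blocks from the free pool
    step(2)
    disk["inodes"][inode_no] = {"blocks": blocks}   # 3. fill in the inode
    step(3)


def check(disk):
    """A very small fsck: report structures that disagree with each other."""
    problems = []
    for name, ino in disk["directory"].items():
        if ino not in disk["inodes"]:
            problems.append(f"{name!r} points at missing inode {ino}")
    used = {b for inode in disk["inodes"].values() for b in inode["blocks"]}
    leaked = ALL_BLOCKS - used - disk["free_blocks"]
    if leaked:
        problems.append(f"blocks {sorted(leaked)} are neither free nor owned")
    return problems


disk = new_disk()
try:
    create_file(disk, "notes.txt", inode_no=7, nblocks=2, crash_after=2)
except Crash as exc:
    print(exc)

problems = check(disk)
for p in problems:
    print("fsck would find:", p)
```

Note that `check` has to walk every directory entry and every inode to find the damage, which is why a real fsck gets slower as the disk gets bigger.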
Journalling prevents both problems (barring bugs in your OS or hardware, of course) by writing transactions to your filesystem. Instead of making changes directly to your directories, inodes, free block maps, etc, the filesystem batches up such changes by spooling them to a separate area on disk, the journal. Then, when it has written enough such changes to account for an entire, self-consistent transaction, it puts a marker in the journal indicating "transaction complete" and starts copying these changes to their usual locations on disk. Meanwhile, the next transaction can be spooled onto the end of the journal area, and it will get its own "transaction complete" marker when it is done. A journal can hold a lot of transactions - only limited by the journal size, which is usually configurable. When a transaction has been fully copied out of the journal to its final locations, it is re-labeled "journal free space" in the journal.

How does this help? Imagine that the machine goes down while a transaction is still incomplete in the journal. Next time you boot, the OS "replays" the journal: it looks for all the completed transactions and commits each part of a transaction to its correct permanent location. It ignores journal free space, and any incomplete transactions - essentially rewinding the filesystem state to the end of the last completed transaction. There is never any danger of "partially updated" filesystem state, since each transaction starts and ends with a known-consistent state.

(Ah, but what happens if the OS goes down again while replaying a journal? No big deal: next time it boots, it just replays the same journal again, which produces the same result as it would have done the first time.)
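The replay logic can be sketched in a few lines. This is a deliberately simplified model, assuming a journal that is just a list of records and a "disk" that is a dict; the `Journal` class, the `replay` function, and the record layout are inventions for illustration, not how any particular filesystem stores its log.

```python
class Journal:
    """A toy journal: an append-only list standing in for the on-disk log area."""

    def __init__(self):
        self.records = []

    def log(self, txid, location, value):
        # Spool a pending change instead of touching its final location.
        self.records.append(("write", txid, location, value))

    def commit(self, txid):
        # The "transaction complete" marker.
        self.records.append(("commit", txid))


def replay(journal, disk):
    """Apply every write that belongs to a committed transaction, in log order."""
    committed = {rec[1] for rec in journal.records if rec[0] == "commit"}
    for rec in journal.records:
        if rec[0] == "write" and rec[1] in committed:
            _, _, location, value = rec
            disk[location] = value
    return disk


journal = Journal()

# Transaction 1 is fully logged and committed.
journal.log(1, "dir:notes.txt", 7)
journal.log(1, "inode:7", [10, 11])
journal.commit(1)

# Transaction 2 is cut short by a crash: no commit marker is ever written.
journal.log(2, "dir:todo.txt", 8)

disk = replay(journal, {})
print(disk)

# Crashing during replay is harmless: replaying again gives the same state.
disk_again = replay(journal, dict(disk))
print(disk_again == disk)
```

Because each logged write carries the final value for its location rather than a delta, applying it twice is the same as applying it once, which is what makes a repeated replay safe.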

Some simplifications, obviously, but that's the basic idea. Did it help?

The different levels of journalling have to do with whether all filesystem data is journalled or only some of it. You usually only journal metadata, which is the filesystem structure: directories, inodes, free block maps, etc. That's because copying all your file contents twice (first into the journal, then into its permanent location in the filesystem) is quite slow. The main purpose of a journal is not to guarantee pristine file contents in the event of partially written files, but to ensure a consistent view of the filesystem as a whole - so you can avoid that long fsck and avoid ever ending up with a partially or fully scrambled filesystem (modulo hardware failure, of course).

HTH..
--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README