XFS merged in Linux 2.5

Not just journaling by Anonymous Coward · 2002-09-17 03:34 · Score: 5, Interesting

As I understand it, XFS also offers things like extended attributes. However, I have been told that the Linux VFS does not offer any way to read or write the attribute information.

Is this correct? Will the VFS also be extended so that you can make use of extended attributes in XFS?

Re:Not just journaling by publius · 2002-09-17 03:57 · Score: 5, Interesting

I read them, write them and delete them all the time using the attr family of commands. Values are currently limited to 64K, but that's not so bad, and in the future the limit will (I think) be the 512K that IRIX has. When you begin to think of all the cool things you can do with that, it becomes very interesting...

Re:Not just journaling by IamTheRealMike · 2002-09-17 04:18 · Score: 5, Interesting

Is this correct? Will the VFS also be extended so that you can make use of extended attributes in XFS?
Cooler, if I read the tea leaves right. I believe some time ago there was a thread on lkml about whether it'd be possible to have files that are also directories (and vice versa). The reasoning behind this was simple: we want flexible filing system attributes, but not at the expense of API bloat. You want ACLs? That'll be another API then. Extended attributes? Another API. What, you want hierarchical extended attributes too? Well, you've just created another version of the filing system API, haven't you?
The theory goes (and Hans Reiser, top guy, explains it much better than I can) that by altering one of the rules of the filing system, we can get lots more power and expressiveness without having to invent lots of new APIs. Let's say you want to find out the owner of file foo. You can just read /home/user/foo/owner. You can edit ACLs by doing similar operations. Now you can have something more powerful than extended attributes, but you can also manipulate that data using the standard command line tools too! Coupled with a more powerful version of locate, you can have very interesting searching and indexing facilities.
This has implications beyond just string attributes. Now throw in plugins, so that, for instance, the FS layer interprets JPEGs and adds extra attributes. Now you can read the colour depth of an image by doing "cat photo.jpg/colour_depth" or whatever. You can get the raw, uncompressed version of the file by doing "cat photo.jpg/raw > photo.raw". Noticed something yet? You no longer need a new API for reading JPEG data, because you are reusing the filing system API.
But the FS is not a powerful enough concept, I hear you cry! Have no fear, for with new storage mechanisms comes new syntax too, allowing BeFS-style live queries. If you want more info, you should really read up on this stuff at Reiser's site.
That's why ReiserFS is so good at small files as well as large files. Have you ever wondered why that is? It's not just a quirk of its design, it was very deliberate. One day, Hans wants to see us store as much information as possible in a souped up version of the filing system, so reducing interfaces and increasing interconnectedness. Or something. It sounds cool anyway :) That's one thing that RFS has that the other *FSs don't - the ReiserFS team has vision.
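The "file as a directory" idea can be made concrete with a toy in-memory model. This is not ReiserFS code or a real kernel interface; the `ToyFS` class and its methods are hypothetical, and the attribute names simply mirror the `owner` and `colour_depth` examples above. A path that names an existing file returns its contents; otherwise the last path component is looked up as an attribute of the parent file, so a single `read` call covers both data and metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """A file's contents plus named attributes (hypothetical model)."""
    data: bytes
    attrs: dict[str, bytes] = field(default_factory=dict)


class ToyFS:
    """Hypothetical namespace in which 'file/attr' reads attribute 'attr' of 'file'."""

    def __init__(self) -> None:
        self.files: dict[str, ToyFile] = {}

    def create(self, path: str, data: bytes, **attrs: str) -> None:
        self.files[path] = ToyFile(data, {k: v.encode() for k, v in attrs.items()})

    def read(self, path: str) -> bytes:
        if path in self.files:                  # plain file: return its contents
            return self.files[path].data
        parent, _, name = path.rpartition("/")  # otherwise try "<file>/<attribute>"
        f = self.files.get(parent)
        if f is not None and name in f.attrs:
            return f.attrs[name]
        raise FileNotFoundError(path)


fs = ToyFS()
fs.create("/home/user/foo", b"hello", owner="user")
fs.create("/home/user/photo.jpg", b"(jpeg bytes)", colour_depth="24")
print(fs.read("/home/user/foo/owner"))               # b'user'
print(fs.read("/home/user/photo.jpg/colour_depth"))  # b'24'
```

A real design would also need writes, directory listings, and a rule for attribute names that clash with real child names; the sketch only shows why no separate attribute-reading API is required.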

XFS FAQ by semaj · 2002-09-17 03:35 · Score: 5, Informative

There's an XFS FAQ and a load more information about it on SGI's site - which points out that several large distributions have had XFS support for a while by default.

Still, it's noteworthy that Linus has finally accepted it into his tree...

--
Meep meep

Re:Comparison? by rindeee · 2002-09-17 03:40 · Score: 5, Informative

http://aurora.zemris.fer.hr/filesystems/

Silly question by Mr_Silver · 2002-09-17 03:41 · Score: 5, Interesting

This is a silly question but ...

When I install Linux, and it comes to anything to do with filesystems, I just go with whatever default it gives me.

I suspect I'm not exactly alone.

So ... what compelling reason is there for me to use any other filesystem? Being more stable or better at avoiding data loss is nice, but since I've only ever had this problem once, that doesn't mean I'll leap up and down going "oo oo! got to have blahFS!" any time soon.

To give you an example, the reason to go from FAT16 to FAT32 was that you could have larger partitions. The reason to go from FAT32 to NTFS was permissions and security.

But from whatever we have now (can't remember, I barely look) to XFS? What *compelling*, absolutely-must-have reason do I have to change from whatever my installer suggests putting on for me?

Or should I just stick with what the installer suggests from now until eternity?

--
Avantslash - View Slashdot cleanly on your mobile phone.

Re:Silly question by MasterD · 2002-09-17 03:49 · Score: 5, Informative

XFS supports ACLs (access control lists), which are much better than standard UNIX permissions.

XFS is an extent-based filesystem, which means that you don't end up wasting tons of space having to allocate a 4K block for every small file. And you don't need to jump through tons of indirect blocks to get at large files.
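Here is a rough illustration of what "extent-based" means. It is not XFS's on-disk format (real XFS extent records also store the logical offset within the file); the helper `to_extents` is hypothetical. It collapses a file's list of physical block numbers into (start, length) runs, which is the kind of description an extent-based filesystem keeps instead of one pointer per block.

```python
from __future__ import annotations


def to_extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Collapse a list of physical block numbers into (start, length) extents."""
    extents: list[tuple[int, int]] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # b continues the current run
        else:
            extents.append((b, 1))             # gap: start a new run
    return extents


# A 1,000-block file laid out contiguously apart from one break.
blocks = list(range(5000, 5600)) + list(range(9000, 9400))
print(len(blocks), to_extents(blocks))  # 1000 [(5000, 600), (9000, 400)]
```

For this mostly contiguous file, a per-block map needs 1,000 pointers, while the extent list needs two entries.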

XFS allocates inodes on the fly, so the number of inodes grows with the data you put on there. Once again, no space is wasted up front. And it places the inode near the file's data, so the disk head does not have to move far.

XFS supports extended attributes which can be used for all kinds of extensions later on.

XFS has been around since 1994 and is the most mature of the journalling filesystems.

And there are many other reasons that I cannot think of right now.

Re:Silly question by rseuhs · 2002-09-17 04:20 · Score: 5, Insightful

XFS supports ACLs (access control lists), which are much better than standard UNIX permissions.
Actually I think ACLs are the reason why everybody is running as Administrator in Windows. They are just too damn complicated.
Unix permissions are simple. You can understand the concept of user-group-other in a few minutes, and there are only 2 commands to remember (chmod, chown).
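The user-group-other model fits in a few lines. This is a simplified Python model, not the kernel's code: `may_access` is a hypothetical helper, and it ignores root, the setuid/setgid/sticky bits, and ACLs. One detail worth knowing is that only the first matching class is consulted, so an owner or group member is never rescued by more generous bits further down.

```python
from __future__ import annotations

R, W, X = 4, 2, 1  # permission bits within one octal digit


def may_access(mode: int, file_uid: int, file_gid: int,
               uid: int, gids: set[int], want: int) -> bool:
    """Classic Unix check: use owner, group, or other bits, first match wins."""
    if uid == file_uid:
        bits = (mode >> 6) & 0o7   # owner class
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7   # group class
    else:
        bits = mode & 0o7          # other class
    return (bits & want) == want


# chmod 640: owner rw-, group r--, other ---
print(may_access(0o640, 1000, 50, 1000, {100}, R | W))  # True  (owner)
print(may_access(0o640, 1000, 50, 1001, {50}, W))       # False (group is read-only)
print(may_access(0o640, 1000, 50, 1002, {100}, R))      # False (other has nothing)
# chmod 604: a group member is denied even though "other" may read.
print(may_access(0o604, 1000, 50, 1001, {50}, R))       # False
```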
Also, Unix permissions have so far fit everything I needed, and in the rare case you really need something special, there is also sudo.
IMO, ACLs are only useful for a tiny minority. I certainly don't need them.

Re:Silly question by Jeremy+Allison+-+Sam · 2002-09-17 06:36 · Score: 5, Interesting

POSIX ACLs aren't much more complex than standard UNIX permissions, and they let you handle the 2 common cases:

1) Group finance has access, plus user Jill.
2) Group finance has access, but not user fred.

But then again, I wrote the Samba POSIX ACL code, so I'm biased :-).
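The two cases above can be modelled with a simplified version of the access-check algorithm documented in the Linux acl(5) man page: the owner entry first, then named-user entries, then the owning group and named groups (both filtered through the mask), then other. This Python sketch is an illustration, not Samba's code or any real library; the `PosixACL` class, its fields, and the numeric IDs for finance, Jill, and fred are hypothetical, and root and capability overrides are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field

R, W, X = 4, 2, 1


@dataclass
class PosixACL:
    """Simplified POSIX access ACL; every permission value is an rwx bit mask."""
    owner_uid: int
    owning_gid: int
    user_obj: int
    group_obj: int
    other: int
    named_users: dict[int, int] = field(default_factory=dict)   # uid -> perms
    named_groups: dict[int, int] = field(default_factory=dict)  # gid -> perms
    mask: int | None = None  # required whenever named entries exist

    def _masked(self, perms: int) -> int:
        return perms if self.mask is None else perms & self.mask

    def check(self, uid: int, gids: set[int], want: int) -> bool:
        if uid == self.owner_uid:              # 1. owner entry, mask not applied
            return (self.user_obj & want) == want
        if uid in self.named_users:            # 2. named user entry, masked
            return (self._masked(self.named_users[uid]) & want) == want
        matching = []                          # 3. owning group and named groups
        if self.owning_gid in gids:
            matching.append(self.group_obj)
        matching += [p for g, p in self.named_groups.items() if g in gids]
        if matching:                           # any matching entry may grant it
            return any((self._masked(p) & want) == want for p in matching)
        return (self.other & want) == want     # 4. everyone else


OWNER, JILL, FRED, ALICE, OUTSIDER, FINANCE = 1000, 1001, 1002, 1003, 2000, 500

# Case 1: group finance has access, plus user Jill.
acl1 = PosixACL(OWNER, FINANCE, R | W, R | W, 0, named_users={JILL: R | W}, mask=R | W)
# Case 2: group finance has access, but not user fred.
acl2 = PosixACL(OWNER, FINANCE, R | W, R | W, 0, named_users={FRED: 0}, mask=R | W)

print(acl1.check(JILL, {100}, R))       # True: named entry for Jill
print(acl1.check(OUTSIDER, {100}, R))   # False: falls through to "other"
print(acl2.check(ALICE, {FINANCE}, R))  # True: finance member
print(acl2.check(FRED, {FINANCE}, R))   # False: fred's entry wins over his group
```

Case 2 works because a matching named-user entry is decided before any group entry is looked at, the same first-match idea as classic permissions.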

Windows ACLs are a complete *nightmare* in comparison. I still don't understand why Sun added an incompatible variant of Windows ACLs to NFSv4 (i.e. it's close to, but not the same as, the real Windows ACLs). The problem is they based the spec on the Microsoft documentation of how the ACLs work. Big mistake... :-).

Regards,

Jeremy Allison,
Samba Team.

My experience with XFS by chrysalis · 2002-09-17 03:58 · Score: 5, Interesting

I've been running Gentoo Linux with XFS for some time. Here's my experience with this filesystem:

- It's extremely reliable. My filesystems never got corrupted, even after a lot of ugly reboots.

- Recoveries after a crash are really fast. Almost immediate, better than ext3 and reiserfs.

- Every needed tool is available to resize filesystems, check filesystems, analyze filesystems and backup/restore filesystems.

- _BUT_ there's something strange. Basically during disk I/O, the whole system is unresponsive. While I'm compiling something, KDE becomes slow, playing videos is not smooth at all, etc. Just as if it didn't scale at all for concurrent disk access. So I finally switched back to ReiserFS just because of this. Maybe the 2.5.x series of kernel behaves differently.

Re:My experience with XFS by josh+crawley · 2002-09-17 05:40 · Score: 5, Informative

---"- Recoveries after a crash are really fast. Almost immediate, better than ext3 and reiserfs."

Hmm. I'd assume that ext3 wouldn't be as good; a fix on a fix usually sucks. And then I've heard about ReiserFS's file truncation problems, though I use ReiserFS and have had no big problems.

---"- _BUT_ there's something strange. Basically during disk I/O, the whole system is unresponsive. While I'm compiling something, KDE becomes slow, playing videos is not smooth at all, etc. Just as if it didn't scale at all for concurrent disk access. So I finally switched back to ReiserFS just because of this. Maybe the 2.5.x series of kernel behaves differently."

I had the same problems on 2.2.x when I hadn't tuned my HDs to DMA66 and 32-bit I/O. Try running:

hdparm /dev/(drive linux is on)
hdparm -tT /dev/(drive linux is on)

If you don't like those settings, drop into single-user mode with / mounted read-only and run this command:

hdparm -X66 -d1 -u1 -m16 -c3 /dev/hda

Now manually run fsck on that partition. If you get errors, it's a bad mode. But if it works, rerun hdparm with the -tT option (it's a benchmark).

Be aware that 2.4 does most of this for you, but it sometimes picks too low a setting (so your performance sucks). Then again, you could have an unsupported IDE device.

All the best..

Re:Comparison? by auferstehung · 2002-09-17 04:00 · Score: 5, Informative

You could check out Daniel Robbins' "Advanced filesystem implementor's guide" over on IBM's developerWorks. He covers ReiserFS, ext3, and XFS, and I believe there is a link to articles on JFS in the Resources section at the bottom of the page.

--
Logic is not Divine.

Re:Journalling filesytems... by psamuels · 2002-09-17 07:30 · Score: 5, Informative

What exactly is 'journalling'?

Here's the basic theory. Think about what happens when you make a change on a filesystem - say you add a file to a directory. The system has to:

- add a filename entry to the directory itself
- allocate the initial blocks for the file from the pool of free space in your filesystem
- create the inode, which is a block of information about the file. The inode includes file modification times, owner, permissions, file type (regular file? directory? etc.), and the location of its actual data blocks
- if there are too many data blocks, allocate one or more "indirect blocks", which are extensions to the inode so it can point at more data blocks, since inodes usually have a fixed size. Initialise these with the correct block numbers as well.
- actually write the file contents to the data blocks you have allocated
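The indirect-block step can be put into numbers. The sketch assumes a classic ext2-style block map, which the comment doesn't specify: 12 direct pointers in the inode, then single, double, and triple indirect blocks, 4 KiB blocks, and 4-byte block numbers (so 1,024 pointers per indirect block). `indirect_blocks_needed` is a hypothetical helper that counts the extra indirect blocks a file of a given size requires.

```python
def cdiv(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def indirect_blocks_needed(size: int, block_size: int = 4096,
                           direct: int = 12, ptr_size: int = 4) -> int:
    """Count indirect blocks for a file in an (assumed) ext2-style block map."""
    ptrs = block_size // ptr_size              # block numbers per indirect block
    remaining = cdiv(size, block_size) - direct
    if remaining <= 0:
        return 0                               # fits in the direct pointers
    total = 0
    for level in (1, 2, 3):                    # single, double, triple indirect
        used = min(remaining, ptrs ** level)   # data blocks reached via this level
        # A level-n tree needs cdiv(used, ptrs**k) blocks at each depth k = 1..n.
        total += sum(cdiv(used, ptrs ** k) for k in range(1, level + 1))
        remaining -= used
        if remaining <= 0:
            return total
    raise ValueError("file too large for this layout")


print(indirect_blocks_needed(40 * 1024))    # 0
print(indirect_blocks_needed(4 * 1024**2))  # 1
print(indirect_blocks_needed(1024**3))      # 257
```

Under these assumptions a 1 GiB file needs 257 indirect blocks, each of which must be allocated, initialised, and kept consistent with the inode and the free-space map.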

If you don't do these things in the correct order, there will be times when the on-disk structure is not consistent. For example, you may have modified the directory to include an entry for the new file, but the entry points at an inode which hasn't been filled in yet. Or the inode may be filled in, but the free space pool hasn't been updated to correspond with the data block allocations in the inode. Throw in other modifications like deleting files or making them larger or smaller, and it gets pretty complicated. If the machine happens to crash at such a time - or the power goes out and you don't have a UPS - the disk will be in an inconsistent state. This has two major consequences:

- the filesystem checker, or fsck (the equivalent Windows utility is scandisk), will have to run the next time you boot and go over the whole structure of your filesystem, which can take minutes or even hours on a large enough disk (80 GB takes a long time unless your disks are very fast). Nobody wants to sit around for 15 minutes waiting for the server to finish rebooting.
- depending on exactly what was written to disk in what order, the fsck utility may not even be able to restore your filesystem to a consistent state at all, or it may lose important files or directories in the process of doing so.
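What the checker looks for can be sketched on a toy model. The structures here (a directory as a name-to-inode map, inodes as lists of block numbers, a set of blocks marked in use) are hypothetical simplifications, nothing like a real on-disk format, and `check_fs` is an illustrative name. It reports the kinds of inconsistency described above: a directory entry pointing at an inode that was never written, and blocks whose allocation state disagrees with the inodes.

```python
from __future__ import annotations


def check_fs(directory: dict[str, int],
             inodes: dict[int, list[int]],
             used_blocks: set[int]) -> list[str]:
    """Toy fsck: cross-check directory entries, inodes, and the free-space map."""
    problems = []
    for name, ino in sorted(directory.items()):
        if ino not in inodes:
            problems.append(f"entry {name!r} points at unwritten inode {ino}")
    referenced = {b for blocks in inodes.values() for b in blocks}
    for b in sorted(referenced - used_blocks):
        problems.append(f"block {b} in use by an inode but marked free")
    for b in sorted(used_blocks - referenced):
        problems.append(f"block {b} marked used but owned by no inode")
    return problems


# Crash mid-update: the entry for draft.txt exists but inode 13 was never
# written, and blocks 101-102 of inode 12 never reached the free-space map.
directory = {"notes.txt": 12, "draft.txt": 13}
inodes = {12: [100, 101, 102]}
used_blocks = {100}
for problem in check_fs(directory, inodes, used_blocks):
    print(problem)
```

A real fsck has to make checks like these across every directory, inode, and bitmap on the disk, which is why it takes so long on a large filesystem.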

Journalling prevents both problems (barring bugs in your OS or hardware, of course) by writing transactions to your filesystem. Instead of making changes directly to your directories, inodes, free block maps, etc, the filesystem batches up such changes by spooling them to a separate area on disk, the journal. Then, when it has written enough such changes to account for an entire, self-consistent transaction, it puts a marker in the journal indicating "transaction complete" and starts copying these changes to their usual locations on disk. Meanwhile, the next transaction can be spooled onto the end of the journal area, and it will get its own "transaction complete" marker when it is done. A journal can hold a lot of transactions - only limited by the journal size, which is usually configurable. When a transaction has been fully copied out of the journal to its final locations, it is re-labeled "journal free space" in the journal.

How does this help? Imagine that the machine goes down while a transaction is still incomplete in the journal. Next time you boot, the OS "replays" the journal: it looks for all the completed transactions and commits each part of a transaction to its correct permanent location. It ignores journal free space, and any incomplete transactions - essentially rewinding the filesystem state to the end of the last completed transaction. There is never any danger of "partially updated" filesystem state, since each transaction starts and ends with a known-consistent state.

(Ah, but what happens if the OS goes down again while replaying a journal? No big deal: next time it boots, it just replays the same journal again, which produces the same result as it would have done the first time.)
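The transaction, commit-marker, and replay steps above can be sketched as a toy in Python. It is deliberately simplified and is not how ext3, ReiserFS, or XFS lay out their journals: the "disk" is a dict from hypothetical location names to values, `Journal`, `log_transaction`, and `replay` are illustrative names, and journal space is never reclaimed. Each journal record stores the new value of a location, so replaying the same journal twice leaves the same result, which is the property the parenthetical above relies on.

```python
from __future__ import annotations


class Journal:
    """Toy write-ahead journal: records are appended in order, never edited."""

    def __init__(self) -> None:
        self.records: list[tuple] = []
        self.next_tx = 1

    def log_transaction(self, writes: dict[str, str], complete: bool = True) -> int:
        """Spool one transaction's writes, then its "transaction complete" marker."""
        tx = self.next_tx
        self.next_tx += 1
        for location, value in writes.items():
            self.records.append(("write", tx, location, value))
        if complete:  # complete=False simulates a crash before the marker is written
            self.records.append(("commit", tx))
        return tx

    def replay(self, disk: dict[str, str]) -> None:
        """Copy every completed transaction to its home location; skip the rest."""
        pending: dict[int, list[tuple[str, str]]] = {}
        for record in self.records:
            if record[0] == "write":
                _, tx, location, value = record
                pending.setdefault(tx, []).append((location, value))
            else:  # commit marker: the buffered writes form a consistent unit
                for location, value in pending.pop(record[1], []):
                    disk[location] = value  # plain overwrite, so replay is repeatable
        # Whatever is still in `pending` never got its marker and is ignored.


disk = {"dir:/home": "a.txt"}
journal = Journal()
journal.log_transaction({"dir:/home": "a.txt,b.txt",
                         "inode:8": "b.txt, blocks 40-41",
                         "bitmap:40-41": "used"})
journal.log_transaction({"dir:/home": "a.txt,b.txt,c.txt", "inode:9": "c.txt"},
                        complete=False)  # machine dies mid-transaction

journal.replay(disk)  # first boot after the crash
snapshot = dict(disk)
journal.replay(disk)  # crashed again during replay? Just replay again.
print(disk == snapshot, disk["dir:/home"])  # True a.txt,b.txt
```

The incomplete second transaction leaves no trace: the directory, inode, and bitmap all reflect the end of the first transaction.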

Some simplifications, obviously, but that's the basic idea. Did it help?

The different levels of journalling have to do with whether all filesystem data is journalled or only some of it. You usually only journal metadata, which is the filesystem structure: directories, inodes, free block maps, etc. That's because copying all your file contents twice (first into the journal, then into its permanent location in the filesystem) is quite slow. The main purpose of a journal is not to guarantee pristine file contents in the event of partially written files, but to ensure a consistent view of the filesystem as a whole - so you can avoid that long fsck and avoid ever ending up with a partially or fully scrambled filesystem (modulo hardware failure, of course).
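The cost argument is easy to put into numbers. Under the simplifying assumption that every journalled byte is written exactly twice (once to the journal, once to its home location) and everything else once, this sketch compares metadata-only journalling with full data journalling; `bytes_written` and the example sizes are hypothetical. ext3, for instance, exposes this choice through its data=journal, data=ordered, and data=writeback mount options.

```python
def bytes_written(data: int, metadata: int, journal_data: bool) -> int:
    """Total bytes written for one update, assuming journalled bytes go out twice."""
    journalled = metadata + (data if journal_data else 0)
    return data + metadata + journalled  # home locations plus journal copies


# Appending 1 MiB of file data that dirties 8 KiB of metadata (illustrative sizes).
data, meta = 1024 * 1024, 8 * 1024
print(bytes_written(data, meta, journal_data=False))  # 1064960
print(bytes_written(data, meta, journal_data=True))   # 2113536
```

With these numbers, metadata-only journalling adds 8 KiB on top of roughly 1 MiB of writes, while full data journalling roughly doubles the total.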

HTH..

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

13 of 271 comments