Slashdot Mirror


XFS merged in Linux 2.5

joib writes "According to this notice, the XFS journaling file system has been merged into Linus' BitKeeper tree, to show up in 2.5.36." Ya just know someone out there wants to have every journaling file system on one drive just 'cuz.

45 of 271 comments

  1. New file system by Gabrill · · Score: 4, Funny

    The round file gets all my bills. The manila one gets all my pay stubs. It works out ok.

    --
    Always going forward, 'cause we can't find reverse.
  2. Comparison? by FyRE666 · · Score: 3, Interesting

    Does anyone have a link to any comparisons of all these journaling filesystems, showing their strengths and weaknesses? Why shouldn't I just stick with ext3 for everything?

    1. Re:Comparison? by Wee · · Score: 3, Informative
      Does anyone have a link to any comparisons of all these journaling filesystems, showing their strengths and weaknesses?

      Google is always your friend.

      -B

      --

      Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.

    2. Re:Comparison? by rindeee · · Score: 5, Informative

      http://aurora.zemris.fer.hr/filesystems/

    3. Re:Comparison? by auferstehung · · Score: 5, Informative

      You could check out Daniel Robbins' "Advanced filesystem implementor's guide" over on IBM's developerworks. He covers reiserfs, ext3, and XFS and I believe there is a link to articles on JFS in the Resources section at the bottom of the page.

      --
      Logic is not Divine.
  3. Not just journaling by Anonymous Coward · · Score: 5, Interesting

    As I understand it, XFS also offers things like extended attributes. However, I have been told that the Linux VFS does not offer any way to read or write the attribute information?

    Is this correct? Will the VFS also be extended so that you can make use of extended attributes in XFS?

    1. Re:Not just journaling by publius · · Score: 5, Interesting

      I read them, write them and delete them all the time using the attr family of commands. 64K limitation on the current value size but that's not so bad, and in the future it will be the (I think) 512K that Irix has. When you begin to think of all the cool things you can do with that, it becomes very interesting...

    2. Re:Not just journaling by IamTheRealMike · · Score: 5, Interesting
      Is this correct? Will the VFS also be extended so that you can make use of extended attributes in XFS?

      Cooler, if I read the tea leaves right. I believe some time ago now there was a thread on lkml about whether it'd be possible to have files also be directories (and vice-versa). The reasoning behind this was simple: we want flexible filing system attributes, but not at the expense of API bloat. You want ACLs? That'll be another API then. Extended Attributes? Another API. What, you want hierarchical extended attributes too? Well you've just created another version of the filing system API haven't you.

      The theory goes (and Hans Reiser, top guy, explains it much better than I can) that by altering one of the rules of the filing system, we can get lots more power and expressiveness without having to invent lots of new APIs. Let's say you want to find out the owner of file foo. You can just read /home/user/foo/owner. You can edit ACLs by doing similar operations. Now you can have something more powerful than extended attributes, but you can also manipulate that data using the standard command line tools too! Coupled with a more powerful version of locate, you can have very interesting searching and indexing facilities.

      This has implications beyond just string attributes. Now throw in plugins, so for instance the FS layer interprets JPEGs and adds extra attributes. Now you can read the colour depth of an image by doing "cat photo.jpg/colour_depth" or whatever. You can get the raw, uncompressed version of the file by doing "cat photo.jpg/raw > photo.raw". Noticed something yet? You no longer need a new API for reading JPEG data, because you are reusing the filing system API.

      But the FS is not a powerful enough concept, I hear you cry! Have no fear, for with new storage mechanisms comes new syntax too, to allow for BeFS style live queries. If you want more info, you should really read up on this stuff at Reiser's site.

      That's why ReiserFS is so good at small files as well as large files. Have you ever wondered why that is? It's not just a quirk of its design, it was very deliberate. One day, Hans wants to see us store as much information as possible in a souped up version of the filing system, so reducing interfaces and increasing interconnectedness. Or something. It sounds cool anyway :) That's one thing that RFS has that the other *FSs don't - the ReiserFS team has vision.

    3. Re:Not just journaling by jbolden · · Score: 3, Informative

      Apple is designing versions of the tools that support complex attributes for use with the HFS+ filesystem. While the specific issues are slightly different, their code is open source, so there's no reason it couldn't move over to Linux.

    4. Re:Not just journaling by 1010011010 · · Score: 3, Interesting


      How do I use these named streams for a directory? To re-use your example, can I:

      $ cat $HOME/owner

      and get my username? Or will it be looking for a file named "owner" in $HOME?

      --
      Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
  4. XFS FAQ by semaj · · Score: 5, Informative

    There's an XFS FAQ and a load more information about it on SGI's site - which points out that several large distributions have had XFS support for a while by default.

    Still, it's noteworthy that Linus has finally accepted it into his tree...

    --
    Meep meep
  5. Silly question by Mr_Silver · · Score: 5, Interesting
    This is a silly question but ...

    When I install Linux, and it comes to anything to do with filesystems, I just go with whatever default it gives me.

    I suspect I'm not exactly alone.

    So ... what compelling reason is there for me to use any other filesystem? Being more stable or better with data loss is nice, but considering I've only ever had this problem once, doesn't mean that i'll leap up and down going "oo oo! got to have blahFS!" any time soon.

    To give you an example, FAT16 to FAT32 was the fact you could have larger partitions. FAT32 to NTFS was because of permissions and security.

    But whatever we have now (can't remember, i barely look) to XFS? What *compelling* absolutely-must-have reason do I have to go change from whatever my installer suggests putting on for me?

    Or should I just stick with what the installer suggests from now until eternity?

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
    1. Re:Silly question by MasterD · · Score: 5, Informative

      XFS supports ACL's (or access control lists) which are much better than standard UNIX permissions.

      XFS is an extent based filesystem which means that you don't end up wasting tons of space having to allocate a 4K block for every small file. And you don't need to jump through tons of indirect blocks to get large files.
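To put a rough number on the indirect-block point, here is a minimal sketch comparing a block-mapped layout with an extent-based one. The constants (12 direct pointers, 32-bit block numbers) are assumptions modelled on an ext2-style inode, not figures from the comment, and the function names are purely illustrative.

```python
BLOCK_SIZE = 4096        # 4K blocks, as in the comment above
POINTER_SIZE = 4         # assumption: 32-bit block numbers
DIRECT_POINTERS = 12     # assumption: ext2-style inode with 12 direct slots


def indirect_blocks_needed(file_size: int) -> int:
    """Count indirect blocks for a block-mapped file (up to double indirection)."""
    per_block = BLOCK_SIZE // POINTER_SIZE       # pointers held by one indirect block
    data_blocks = -(-file_size // BLOCK_SIZE)    # ceiling division
    remaining = max(0, data_blocks - DIRECT_POINTERS)
    if remaining == 0:
        return 0                                 # fits in the direct pointers
    count = 1                                    # the single indirect block
    remaining = max(0, remaining - per_block)
    if remaining == 0:
        return count
    leaves = -(-remaining // per_block)          # second-level blocks under the double indirect
    if leaves > per_block:
        raise ValueError("file would need triple indirection; not modelled here")
    return count + 1 + leaves                    # +1 for the double indirect block itself


def extents_needed(file_size: int, contiguous_runs: int = 1) -> int:
    """An extent is (start block, length), so one record covers one contiguous run."""
    return contiguous_runs if file_size > 0 else 0


one_gib = 1 << 30
print(indirect_blocks_needed(one_gib), "indirect blocks vs", extents_needed(one_gib), "extent")
```

Under these assumptions a fully contiguous 1 GiB file needs a few hundred indirect blocks in the block-mapped scheme, but a single extent record. A fragmented file needs one extent per run, so the gap narrows as fragmentation grows.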

      XFS allocates inodes on the fly so it grows with what data you put on there. Once again, not wasting space up front. And it sticks the inode near the file itself so the head does not have to move far on the hard drive.

      XFS supports extended attributes which can be used for all kinds of extensions later on.

      XFS has been around since 1994 and is the most mature of the journalling filesystems.

      And there are many other reasons that I cannot think of right now.

    2. Re:Silly question by fruey · · Score: 3, Informative
      Performance. Different systems are going to take more or less overhead depending on the task. Some daemons might write a lot of data to logs, you want this to be done asynchronously, you may not need the data so badly, you don't need journalling perhaps. (so use ext2??)

      Or you have a proxy, you don't care if suddenly your cached data is lost, it will soon be refilled, it's not important data, you want performance without too much security (reiserfs)?

      In fact each filesystem has inherent limits on inodes, filenames, permissions, etc... so you go with any that has a minimum for each thing you need. Journalling you don't really need unless you want to be able to step backwards or repair your filesystem in more interesting ways...

      --
      Conversion Rate Optimisation French / English consultant
    3. Re:Silly question by blakestah · · Score: 3, Informative

      1) Backup strategies. Versions of dump are available for ext2/ext3 and xfs, but not for ReiserFS (I don't know about JFS). (I don't mean to start a page cache/buffer cache debate).

      2) Journalled file systems mean fast re-boots on power outages

      3) Speed. This depends on your usage. A huge mail spool machine may use ReiserFS on the mail spool. For most people it is a wash.

      4) Ext3 can be remounted as ext2, and really good file system checking tools exist for ext2/3.

      Mostly, though, you CAN just stick with whatever the default suggests.

    4. Re:Silly question by rseuhs · · Score: 5, Insightful
      XFS supports ACL's (or access control lists) which are much better than standard UNIX permissions.

      Actually I think ACLs are the reason why everybody is running as Administrator in Windows. They are just too damn complicated.

      The Unix-permissions are simple. You can understand the concept of user-group-all in a few minutes and there are only 2 commands to remember (chmod, chown).

      Also, Unix-permissions have so far fit with everything I needed and in the rare case you really need something special, there is also sudo.

      I think ACLs are only useful for a tiny minority, IMO. I certainly don't need it.

    5. Re:Silly question by Jeremy+Allison+-+Sam · · Score: 5, Interesting

      POSIX ACLs aren't much more complex than
      standard UNIX permissions and allow you to do
      the 2 common cases :

      1). Group finance has access + user Jill
      2). Group finance has access but not user fred.

      But then again I wrote the Samba POSIX ACL
      code so I'm biased :-).

      Windows ACLs are a complete *nightmare* in
      comparison. I still don't understand why Sun
      added an incompatible variant of Windows ACLs
      to NFSv4 (i.e. it's close, but not the same as
      the real Windows ACLs). The problem is they based
      the spec. on the Microsoft documentation of how
      the ACLs work. Big mistake.... :-).

      Regards,

      Jeremy Allison,
      Samba Team.
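The two common cases above can be modelled with a toy access check. This is only a sketch of the usual POSIX ACL evaluation order (owner, then named users, then groups, then other), with the ACL mask entry deliberately left out. The `Acl` class and `allowed` function are hypothetical names for illustration, not part of any real ACL library.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    owner: str
    owner_perms: str
    named_users: dict = field(default_factory=dict)   # user name -> perms string
    groups: dict = field(default_factory=dict)        # group name -> perms string
    other_perms: str = ""


def allowed(acl: Acl, user: str, user_groups: set, want: str) -> bool:
    """Simplified POSIX-style check; the mask entry is ignored (assumption)."""
    if user == acl.owner:
        return want in acl.owner_perms
    if user in acl.named_users:
        # A named-user entry is consulted before any group entry,
        # so an empty entry here shuts the user out.
        return want in acl.named_users[user]
    matching = [perms for group, perms in acl.groups.items() if group in user_groups]
    if matching:
        return any(want in perms for perms in matching)
    return want in acl.other_perms


# Case 1: group finance has access, plus user jill (who is not in finance).
case1 = Acl("root", "rw", named_users={"jill": "r"}, groups={"finance": "r"})
# Case 2: group finance has access, but not user fred (who is in finance).
case2 = Acl("root", "rw", named_users={"fred": ""}, groups={"finance": "r"})

print(allowed(case1, "jill", set(), "r"))
print(allowed(case2, "fred", {"finance"}, "r"))
```

The second case works because the named-user entry for fred is matched before the group entries are looked at, which is something plain user/group/other bits cannot express without creating an extra group.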

  6. Questions... by pubjames · · Score: 3, Interesting


    When is Linux 2.6 likely to be released? I know that there is no fixed date, but what are the criteria?

    My second question... Does it really matter when the 'official' release comes out, when distribution makers "roll-their-own" anyway?

    Sorry if these sound like dumb questions to some of you, but I'd be interested to find out.

    1. Re:Questions... by bsharitt · · Score: 3, Funny

      Most distributions should have 2.6 a couple months after it is released, and Debian will have it by 2012.

    2. Re:Questions... by psamuels · · Score: 3, Informative
      The stable kernel is usually released a couple of months after the feature freeze (bugs permitting).

      +1, Funny. I think you mean after the code freeze, which usually happens a month later, well, two, three, ok, six months later. You also forgot to mention that Linus usually has multiple freezes, and the one on 31 Oct is only the first. With each successive freeze he puts on a more threatening tone, crying woe unto them who would dare tempt him to thaw the kernel again. Eventually the first code freeze happens, then maybe one or two more of those....

      Even odds we get a 2.6.0 by June.

      --
      "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
  7. My understanding by 0x0d0a · · Score: 3, Informative

    ...is that the breakdown goes something like this:

    ext3:
    * can be told to journal everything, including data (not just metadata) -- most theoretical reliability.
    * is backwards compatible with ext2

    xfs:
    * tweaked for streaming large files to/from disk -- probably best at sequential reads/writes.

    reiserfs:
    * best performance with many, many files in a single directory.
    * Can save space on very small files with -tail option

    jfs:
    * really don't know. :-)

    1. Re:My understanding by 4of12 · · Score: 4, Interesting

      xfs:
      * tweaked for streaming large files to/from disk
      -- probably best at sequential reads/writes.

      Hm...would that imply that XFS would be say a really good candidate FS for building video streaming devices?

      Seems like it might fit well from the perspective of:

      1. high speed read write (good enough for 1080i?)
      2. quick reboots due to journaling (essential for consumer electronics devices)
      3. don't have a cow if there are a few bit errors in the stream
      --
      "Provided by the management for your protection."
    2. Re:My understanding by jgarzik · · Score: 3, Informative
      If reiserfs was inode-less, it would not work with Linux.

      Even NTFS has inodes, they simply call them "MFT records."

  8. Yes! by zentec · · Score: 3, Informative


    Despite being a little more resource intensive than ext3, XFS has to be one of the better file systems available. I've used it (obviously) on SGI's and it's been outstanding, and opted to use it before ext3, JFS and Reiserfs (although I believe Reiserfs is just as nifty).

    Having it accepted into the kernel makes upgrades a world easier, and hopefully I'll be able to move away from SGI's modified Red Hat installation. Although, I doubt Red Hat will support it out of the box.

    The other issue that needs fixing with XFS is the lack of an emergency boot disk. XFS enabled kernels are huge, and that creates a slight problem when booting from floppy.

  9. Re:New file system-attribute. by Anonymous Coward · · Score: 3, Funny

    "The round file gets all my bills. The manila one gets all my pay stubs. It works out ok. " ...and the IRS gets everything else. Time to use that 'hidden' attribute.

  10. here's an interesting read by someonehasmyname · · Score: 4, Informative

    this pdf compares journaling file systems to non-journaling approaches like ffs or freebsd's soft updates.

    --
    Common sense is not so common.
  11. 2.6 kernel goodies by 0x0d0a · · Score: 4, Interesting

    2.6 has got me more excited than recent minor releases. Some of the things that look cool:

    * ALSA support. ALSA is a pain to keep patching your kernel with every redownload. ALSA is a Good Thing, if a pain in the butt to configure. My guess is that there will be decent front ends on top of the thing when distros start shipping 2.6.
    * Batch priority/boosted effect of nice levels. I've always felt that "nicing" something didn't have enough effect -- nicing something by one level is almost unnoticeable. 2.6 boosts this change. It also introduces batch priority, where a process gets *no* CPU time if there is *any* non-batch process in the runnable queue. Very sexy.
    * Low, low latency. Just as 2.4 emphasized good multiproc support, 2.6 is emphasizing low latency. Preemptive kernel, lots of disabled-interrupt time being reduced (especially the godawful framebuffer console), etc, etc. This is top-notch for both I/O performance and multimedia. Linux kernel 2.6 is supposed to beat any current release of Windows in audio latency when released.

    The only thing that I really wish Linux had was a prioritized disk scheduler. Linux can prioritize network traffic. It can prioritize processes. It just can't do the same with disk I/O. This is a shame, since I want my MP3 player not to skip when reading MP3s/paging, followed by X getting next highest priority when paging (so that the UI doesn't freeze up for long when paging something back in), and Linux just doesn't yet have the functionality. Currently, you can have a nice 20 process that's busy untarring a large tarball...and all your paged out processes will be blocked, waiting for this stupid tarball to finish.

    1. Re:2.6 kernel goodies by paulbd · · Score: 3, Informative

      the skipping in your mp3 player has nothing to do with disk i/o. it has to do with scheduling latency. that is, unless your mp3 player has been poorly designed, which many of them have been.

      also, 2.5/2.6 is still missing the better patches for low latency (from andrew morton), and so its performance is still not as good as it could be.

      2.6 doesn't beat windows at audio latency when using WDM drivers for windows. it (along with 2.2 and 2.4) beat windows with MME drivers. the WDM audio driver model is very fast, and windows has always done a better job of handling scheduling latency than linux (other than with andrew's patches). in 2.4 there are still places in a mainstream kernel that will stall the entire box for up to 1/10 second.

    2. Re:2.6 kernel goodies by 0x0d0a · · Score: 3, Interesting

      The skipping in your mp3 player has nothing to do with disk i/o. It has to do with scheduling latency.

      Not true. I've done quite a bit of poking around this issue. I have plenty of spare CPU time, and I'm not using a sound server or similar. The problem comes when reading an mp3 from disk (and no, this is not a "DMA/unmasked interrupts is not on" issue) and other *heavy* sequential disk i/o is being done by another piece of software (because of the amount of data, tar xzvf is frequently the culprit). Linux heavily weights disk scheduling towards overall performance, not fairness. Besides, this isn't mp3-specific -- other software does it too. Try cat /dev/zero > foo and then trying to ls a directory. Extremely long delay. Heck, try doing said operation when playing an mp3 and you'll see the skipping I'm talking about. Seriously, try it -- it takes about ten seconds to try.

      I remember seeing benchmarks of various Windows audio latencies and Linux latencies, and at least the low-latency people had Linux at least a couple of ms below Windows. I wasn't aware that only some of these patches were going in, though, so that could be the difference between what we're talking about.

    3. Re:2.6 kernel goodies by Adnans · · Score: 3, Interesting

      The problem comes when reading an mp3 from disk (and no, this is not a "DMA/unmasked interrupts is not on" issue) and other *heavy* sequential disk i/o is being done by another piece of software (because of the amount of data, tar xzvf is frequently the culprit).

      The skipping is caused by scheduling latency, as Paul suggests. I have written an mp3 player for Linux (see URL) and it only really skips when the audio output thread is not scheduled in time to satisfy the soundcard's needs. I.e. the Linux scheduler needs to make sure that whenever the audio thread wants to fill the soundcard buffers it must get the highest priority to do so. For example if you are using a soundcard buffer that is split into 2 fragments of 1024 bytes each that means that the audio thread needs to be scheduled every 6ms, 3ms for 512 byte fragments (44.1 kHz stereo, 16-bit output). Even when your soundcard buffer size is 50 or 100ms deep you can very easily cause skipping if your audio thread is not scheduled for 100ms or longer. And this is pretty normal on a vanilla kernel for non-realtime scheduled processes. Think about it, your "cat /dev/zero > foo" has the same priority as your audio thread so they have equal rights to the CPU, however the audio thread has much stricter scheduling needs since you will get audio skips whenever it is scheduled too late (i.e. the soundcard buffers get depleted).
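The 6ms and 3ms figures follow directly from the PCM data rate. A quick sketch of the arithmetic, assuming CD-style output at 44,100 samples per second with 2 channels and 2 bytes per sample (the function names are illustrative only):

```python
def fragment_duration_ms(fragment_bytes: int,
                         sample_rate_hz: int = 44_100,
                         channels: int = 2,
                         bytes_per_sample: int = 2) -> float:
    """How long one soundcard fragment lasts before it must be refilled."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample  # 176,400 B/s
    return 1000.0 * fragment_bytes / bytes_per_second


def buffer_bytes_for(duration_ms: int,
                     sample_rate_hz: int = 44_100,
                     channels: int = 2,
                     bytes_per_sample: int = 2) -> int:
    """Bytes of PCM data needed to cover a given playback time."""
    return sample_rate_hz * channels * bytes_per_sample * duration_ms // 1000


for size in (1024, 512):
    print(f"{size} byte fragment lasts {fragment_duration_ms(size):.1f} ms")
print("100 ms of audio is", buffer_bytes_for(100), "bytes")
```

So a 1024 byte fragment drains in a little under 6 ms and a 512 byte fragment in a little under 3 ms, while even a 100 ms buffer is under 18 KB of data, which is why a single missed scheduling slot is enough to produce an audible gap.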

      In short, the soundcard will be starved of ready to play PCM data long before the decoder will be starved of MP3 encoded data (from disk). In the end it doesn't really matter because your music still skips, but it is important to identify exactly why it's skipping.

      -adnans

      --
      "In short: just say NO TO DRUGS, and maybe you won't end up like the Hurd people." --Linus Torvalds
  12. My experience with XFS by chrysalis · · Score: 5, Interesting

    I've been running Gentoo Linux for some time with XFS. Here's my experience with this filesystem :

    - It's extremely reliable. Filesystems never got corrupted, even after a lot of ugly reboots.

    - Recoveries after a crash are really fast. Almost immediate, better than ext3 and reiserfs.

    - Every needed tool is available to resize filesystems, check filesystems, analyze filesystems and backup/restore filesystems.

    - _BUT_ there's something strange. Basically during disk I/O, the whole system is unresponsive. While I'm compiling something, KDE becomes slow, playing videos is not smooth at all, etc. Just as if it didn't scale at all for concurrent disk access. So I finally switched back to ReiserFS just because of this. Maybe the 2.5.x series of kernel behaves differently.

    --
    {{.sig}}
    1. Re:My experience with XFS by red_dragon · · Score: 3, Informative

      Just wondering, are you using the custom kernel from Gentoo? If so, have you compiled your kernel with either/both of the low latency patch and/or the preemptible kernel patch? What are your experiences with either of those two options when running XFS? I'd expect the use of either of those two to improve a system's responsiveness to user interaction when doing a lot of disk I/O, but if those don't help when using XFS, I wonder what kind of black magic is going on inside that code.

      --
      In Soviet Russia, Jesus asks: "What Would You Do?"
    2. Re:My experience with XFS by josh+crawley · · Score: 5, Informative

      ---"- Recoveries after a crash are really fast. Almost immediate, better than ext3 and reiserfs."

      Hmmm.. I'd assume that ext3 wouldn't be as good.. A fix on a fix usually sucks. And then I've heard about Reiser's file truncation problems. I use Reiser and no big problems.

      ---"- _BUT_ there's something strange. Basically during disk I/O, the whole system is unresponsive. While I'm compiling something, KDE becomes slow, playing videos is not smooth at all, etc. Just as if it didn't scale at all for concurrent disk access. So I finally switched back to ReiserFS just because of this. Maybe the 2.5.x series of kernel behaves differently."

      I've had the same problems on 2.2.X when I didn't tweak my HD's to dma66 32 bit. Try doing a:

      hdparm /dev/(drive linux is on)
      hdparm -tT /dev/(drive linux is on)

      If you don't like those settings, drop into single user mode, with / read only, and do this command:

      hdparm -X66 -d1 -u1 -m16 -c3 /dev/hda

      Now manually do a fsck on that partition. If you have errors, it's a bad mode. But if it works, then redo the -tT option (it's a benchmark).

      Be aware that 2.4 does most of this for you, but sometimes it can pick too low a setting (so your performance sucks). Then again, you could have an unsupported IDE device.

      All the best..

  13. Red Hat DOES NOT have XFS... by Booker · · Score: 4, Informative

    This isn't correct... if it were correct, I would not have spent so much time working on a custom Red Hat installer for XFS. :)

    There is some XFS-aware code in the Red Hat Linux installer, but there is no kernel support or userspace tools available, so what you propose simply can't work.

    However, SuSE, Mandrake, Gentoo, Slackware, and Debian (to some extent) do have XFS support.

  14. But where is e2compr by Kynde · · Score: 3, Insightful

    There are systems where we simply don't and won't have enough disk space and where speed is not of the essence. We have them now, and we will continue to have them in the future.

    Being a linux developer for embedded production boxes and given the current increasing interest over linux in embedded along with embedded boxes typically running _WITHOUT_ hard disks (mostly just flash chips of some sort, due to their better life-time), I cannot help wondering why the kernel mailing list shows little or no interest towards ext2 (or ext3) compression.

    JFFS and JFFS2 don't come into question in most cases as they tear through the fs layers and cannot be used with IDE flash chips for example.

    Alcatel even released it two weeks ago for 2.4.17... loads of people, like me, must have ported it to 2.4.19 by now. But to get ext2 compression to 2.5.XX, forget it... but why?

    This is a little like the lack of interest in underclocking, even though once you've overclocked your main computer to the max, you will start looking for a more silent option, if not for the desktop computer, then for the closet firewall. Even if you don't have the interest now, you will, once you shack up with a gal.

    --
    1 Earth is warming, 2 It's us, 3 it's royally bad, 4 we need to take action NOW
  15. Re:Cool by ShawnX · · Score: 3, Informative

    Try my patches at http://xfs.sh0n.net/2.4. They merge in XFS with 2.4.20-pre7 (current) and rmap =)

    Shawn.

    --
    Everyone wants a Tux in their life.
  16. Re:My personal experience by kubrick · · Score: 3, Informative

    # man tune2fs

    (you can turn fscks off, change the number of mounts or make it time-dependent, etc.)

    --
    deus does not exist but if he does
  17. Re:My personal experience by psamuels · · Score: 3, Informative
    Every month or so, I had to sit through the following:
    "Warning: drive has been mounted more than 30 times, check forced" on the ext3 partition

    This is a safety feature. Filesystem corruption can be caused by hardware funnies as well as software bugs. Your memory could be flaky, your hard drive could be on its way out, your IDE cable could be too long, your SCSI chain could be improperly terminated, your motherboard might be iffy, your CPU could be running too hot. There might be software bugs in the generic kernel, the block / scsi drivers, the ext3 code, or even some random driver that has nothing to do with filesystems or memory management.

    Because of this, ext2 and ext3 have tunable parameters for how often to force an fsck, overriding the fact that the fs is supposed to be in a known clean state. Apparently reiserfs does not have this safety feature - or does it? (I don't know.)

    If this annoys you, turn it off. 'man tune2fs', or specifically,

    tune2fs -c0 -i0 /dev/your/filesystem

    HTH..

    --
    "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
  18. An interesting thing about XFS... by Scooby+Snacks · · Score: 3, Informative
    I hear that it's the only Linux filesystem that is endian-safe. IOW, you can move it from a system of one endian type to a system of the other type and it will still work. No other filesystem for Linux currently is able to make that claim.

    I find that very cool, for some reason. I guess one practical application is if you have a box that is the only one of that type (either big-endian or little-endian) that dies and you need to recover the data.
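A small sketch of what "endian-safe" means in practice: if the on-disk format fixes a byte order (XFS metadata is stored big-endian), every host decodes the same bytes to the same value no matter what its CPU prefers. The example uses the XFS superblock magic number, which spells "XFSB"; the helper name `read_magic` is made up for illustration and this is not real filesystem code.

```python
import struct

XFS_SB_MAGIC = 0x58465342   # the bytes "XFSB" read as a big-endian 32-bit integer


def read_magic(raw: bytes) -> int:
    """Decode the first 4 bytes with an explicit big-endian format."""
    # ">I" pins the byte order, so the result does not depend on the host CPU.
    return struct.unpack(">I", raw[:4])[0]


on_disk = struct.pack(">I", XFS_SB_MAGIC)          # what a writer puts on the disk
print(on_disk)                                      # the four bytes as stored
print(hex(read_magic(on_disk)))                     # decoded with a fixed byte order
print(hex(struct.unpack("<I", on_disk)[0]))         # what a naive little-endian read sees
```

A filesystem that simply dumped structures in host byte order would hit the last case when the disk moved between machines of opposite endianness. Pinning the order in the format, and converting on every read and write, is what avoids it.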

    --
    Runnin' around, robbin' banks all whacked on the Scooby Snacks...
  19. Why is kernel-image so big? by Thagg · · Score: 3, Interesting

    I recently installed Linux-XFS on one of my computers here, as I was having problems with the kjournald process under ext3 taking extremely unreasonable amounts of time -- and I had had wonderful experiences with XFS on our SGIs -- it's always been solid and fast. Various reviewers of ext3 had complained about the existence of kjournald -- disputing the need for a user-code daemon.

    Several places it is mentioned, though, that the kernel image of XFS is very large, so much that you can't really fit it onto a floppy (although people over-format their floppies to get 1.8 MB or so onto them, and then the kernel might just barely fit.)

    I can't understand why any filesystem should be so big -- it seems that the code to run the filesystem is almost as big as the rest of Linux put together. How can this be? Is it really all code? What could that code possibly be doing?

    I studied XFS fairly extensively after I had to repair a disk that had 1 of its 23 heads fail. From the remaining 22/23rd of the disk I managed to recover almost every file and directory, by writing my own XFS filesystem interpretation code. The on-disk organization of the filesystem is fairly simple and straightforward, I can't imagine where the hundreds of K of code is going.

    I won't be shocked if the answer does lie in that kjournald daemon -- that XFS is bigger than ext3 because ext3 puts most of the bloat into a user-mode daemon instead of the kernel.

    thad

    --
    I love Mondays. On a Monday, anything is possible.
  20. Related question by Quixote · · Score: 3, Interesting

    XFS has a file size limit of 32TB (or so, I think), with a _filesystem_ limit in the EBs. But, I've heard that the Linux VFS layer has a max file size limit of 1TB. Is it possible to create files > 1TB on a Linux+XFS box ? Unfortunately, I don't have the resources to try it out just yet... :-)

    1. Re:Related question by foobar104 · · Score: 3, Informative
      Just FYI, XFS on IRIX can support files up to 9 million terabytes (9 EB) and filesystems up to 18 million terabytes (18 EB).

      It's more complex under Linux. Here's the Linux-specific answer to this question from the FAQ:
      Q: Does XFS support large files (bigger than 2GB)?

      Yes, XFS supports files larger than 2GB. The large file support (LFS) is largely dependent on the C library of your computer. Glibc 2.2 and higher has full LFS support. If your C lib does not support it you will get errors that the value is too large for the defined data type.

      Userland software needs to be compiled against the LFS compliant C lib in order to work. You will be able to create 2GB+ files on non LFS systems but the tools will not be able to stat them.

      Distributions based on Glibc 2.2.x and higher will function normally. Note that some userspace programs like tcsh do not correctly behave even if they are compiled against glibc 2.2.x

      You may need to contact your vendor/developer if this is the case.

      Here is a snippet of email conversation with Steve Lord on the topic of the maximum filesize of XFS under linux.

      I would challenge any filesystem running on Linux on an ia32, and using the page cache to get past the practical limit of 16 Tbytes using buffered I/O. At this point you run out of space to address pages in the cache since the core kernel code uses a 32 bit number as the index number of a page in the cache.

      As for XFS itself, this is a constant definition from the code:

      #define XFS_MAX_FILE_OFFSET ((long long)((1ULL<<63)-1ULL))

      So 2^63 bytes is theoretically possible.

      All of this is ignoring the current limitation of 2 Tbytes of address space for block devices (including logical volumes). The only way to get a file bigger than this of course is to have large holes in it. And to get past 16 Tbytes you have to use direct I/O.

      Which would mean a theoretical 8388608TB file size. Large enough?
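The three limits quoted above can be reproduced with a few lines of arithmetic. The 4 KB page size and the 512 byte sector size are assumptions (typical for ia32 and for block devices of that era), and reading the 2 Tbyte block device limit as a 32-bit sector count is likewise an assumption. Only the 32-bit page index and the 2^63 offset constant come from the quoted text.

```python
TIB = 1 << 40                      # binary terabyte, which is what the figures above use

PAGE_SIZE = 4096                   # assumption: ia32 page size
SECTOR_SIZE = 512                  # assumption: classic disk sector size
INDEX_BITS = 32                    # 32-bit page index, per the quoted email

# Buffered I/O: a 32-bit page index can address 2^32 pages in the page cache.
page_cache_limit = (1 << INDEX_BITS) * PAGE_SIZE

# Block devices: assuming a 32-bit sector number, 2^32 sectors are addressable.
block_device_limit = (1 << 32) * SECTOR_SIZE

# XFS itself: XFS_MAX_FILE_OFFSET is (2^63 - 1), so offsets span 2^63 bytes.
xfs_max_file_offset = (1 << 63) - 1
xfs_limit = xfs_max_file_offset + 1

print("page cache limit  :", page_cache_limit // TIB, "TB")
print("block device limit:", block_device_limit // TIB, "TB")
print("XFS offset limit  :", xfs_limit // TIB, "TB")
```

The results line up with the 16 Tbyte, 2 Tbyte and 8388608TB figures in the comment, which suggests the practical ceiling is set by the kernel's page and block layers rather than by XFS.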
  21. Re:My personal experience by Leto2 · · Score: 3, Funny

    And why do you reboot every day?

    --
    <grub> Reading /. at -1 is like driving through Cracktown in a convertible that is stuck in 1st
  22. Re:Journalling filesytems... by psamuels · · Score: 5, Informative
    What exactly is 'journalling'?

    Here's the basic theory. Think about what happens when you make a change on a filesystem - say you add a file to a directory. The system has to:

    • add a filename entry to the directory itself
    • allocate the initial blocks for the file, from the pool of free space in your filesystem
    • create the inode, which is a block of information about the file. The inode includes file modification times, owner, permissions, file type (regular file? directory? etc), and the location of its actual data blocks
    • if there are too many data blocks, allocate one or more "indirect blocks", which are extensions to the inode so it can hold more data blocks - inodes usually have a fixed size. Initialise these with the correct block numbers as well.
    • actually write the file contents to the data blocks you have allocated

    If you don't do these things in the correct order, there will be times when the on-disk structure is not consistent. For example, you may have modified the directory to include an entry for the new file, but the entry points at an inode which hasn't been filled in yet. Or the inode may be filled in, but the free space pool hasn't been updated to correspond with the data block allocations in the inode. Throw in other modifications like deleting files or making them larger or smaller, and it gets pretty complicated. If the machine happens to crash at such a time - or the power goes out and you don't have a UPS - the disk will be in an inconsistent state. This has two major consequences:

    1. the filesystem checker, or fsck (the equivalent Windows utility is scandisk) will have to run next time you boot, and go over the whole structure of your filesystem, which can take minutes or even hours on a large enough disk (80 GB takes a long time unless your disks are very fast). Nobody wants to sit around for 15 minutes waiting for the server to finish rebooting.
    2. depending on exactly what was written to disk in what order, the fsck utility may not even be able to restore your filesystem to a consistent state at all, or it may lose important files or directories in the process of doing so.
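
    To make the "inconsistent state" concrete, here is a toy model, nothing like a real on-disk layout. ToyFS, its crash_after parameter and the check method are all invented for illustration: create updates the structures in place one step at a time, and check plays the role of a very small fsck.

```python
class ToyFS:
    """In-memory stand-in for a non-journalled filesystem (illustrative only)."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.directory = {}                      # filename -> inode number
        self.inodes = {}                         # inode number -> data blocks
        self.free_blocks = set(range(nblocks))   # free space pool
        self.next_ino = 1

    def create(self, name, nblocks, crash_after=None):
        """Create a file in place; crash_after=N simulates a crash after step N."""
        ino = self.next_ino
        self.next_ino += 1
        # Step 1: add the filename entry to the directory.
        self.directory[name] = ino
        if crash_after == 1:
            return
        # Step 2: take blocks out of the free space pool.
        blocks = sorted(self.free_blocks)[:nblocks]
        self.free_blocks -= set(blocks)
        if crash_after == 2:
            return
        # Step 3: fill in the inode with the block locations.
        self.inodes[ino] = blocks

    def check(self):
        """Toy fsck: walk everything and return a list of problems found."""
        problems = []
        for name, ino in self.directory.items():
            if ino not in self.inodes:
                problems.append(f"{name}: entry points at missing inode {ino}")
        used = {b for blocks in self.inodes.values() for b in blocks}
        leaked = set(range(self.nblocks)) - used - self.free_blocks
        if leaked:
            problems.append(f"blocks neither free nor owned: {sorted(leaked)}")
        return problems


fs = ToyFS(8)
fs.create("ok.txt", 2)
print(fs.check())            # []

crashed = ToyFS(8)
crashed.create("half.txt", 2, crash_after=2)
for problem in crashed.check():
    print(problem)
```

    Note that check has to look at every directory entry and every block to find the damage, which is why the real thing takes so long on a big disk.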

    Journalling prevents both problems (barring bugs in your OS or hardware, of course) by writing transactions to your filesystem. Instead of making changes directly to your directories, inodes, free block maps, etc, the filesystem batches up such changes by spooling them to a separate area on disk, the journal. Then, when it has written enough such changes to account for an entire, self-consistent transaction, it puts a marker in the journal indicating "transaction complete" and starts copying these changes to their usual locations on disk. Meanwhile, the next transaction can be spooled onto the end of the journal area, and it will get its own "transaction complete" marker when it is done. A journal can hold a lot of transactions - only limited by the journal size, which is usually configurable. When a transaction has been fully copied out of the journal to its final locations, it is re-labeled "journal free space" in the journal.

    How does this help? Imagine that the machine goes down while a transaction is still incomplete in the journal. Next time you boot, the OS "replays" the journal: it looks for all the completed transactions and commits each part of a transaction to its correct permanent location. It ignores journal free space, and any incomplete transactions - essentially rewinding the filesystem state to the end of the last completed transaction. There is never any danger of "partially updated" filesystem state, since each transaction starts and ends with a known-consistent state.

    (Ah, but what happens if the OS goes down again while replaying a journal? No big deal: next time it boots, it just replays the same journal again, which produces the same result as it would have done the first time.)
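
    A minimal sketch of the replay idea, with heavy simplifications: the journal is a plain Python list, the "disk" is a dict, and every record is assumed to hold the complete new contents of one location (which is what makes replaying twice harmless). The names replay and COMMIT are made up here and are not taken from any real filesystem.

```python
COMMIT = "commit"   # stand-in for the "transaction complete" marker


def replay(journal, disk):
    """Copy every fully committed transaction from the journal to the disk."""
    pending = []
    for record in journal:
        if record == COMMIT:
            # Transaction is complete: write its changes to their final homes.
            for location, value in pending:
                disk[location] = value
            pending = []
        else:
            pending.append(record)
    # Whatever is still pending never got its commit marker, so it is ignored.
    return disk


journal = [
    ("dir:/home", "a.txt -> inode 7"),
    ("inode:7", "blocks [40, 41]"),
    ("freemap", "40, 41 in use"),
    COMMIT,
    ("dir:/home", "a.txt, b.txt"),   # machine went down before the commit
]

disk = {"dir:/home": "", "freemap": "all free"}
replay(journal, disk)
once = dict(disk)

replay(journal, disk)                # crash during replay? just do it again
print(disk == once)                  # True
print(disk["dir:/home"])             # a.txt -> inode 7
```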

    Some simplifications, obviously, but that's the basic idea. Did it help?

    The different levels of journalling have to do with whether all filesystem data is journalled or only some of it. You usually only journal metadata, which is the filesystem structure: directories, inodes, free block maps, etc. That's because copying all your file contents twice (first into the journal, then into its permanent location in the filesystem) is quite slow. The main purpose of a journal is not to guarantee pristine file contents in the event of partially written files, but to ensure a consistent view of the filesystem as a whole - so you can avoid that long fsck and avoid ever ending up with a partially or fully scrambled filesystem (modulo hardware failure, of course).
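
    Back-of-the-envelope version of the "copying twice" cost. The function bytes_written and the example sizes (a 100 MiB file with 64 KiB of metadata updates) are invented for illustration, and the model ignores batching, caching and everything else a real filesystem does.

```python
def bytes_written(data_bytes, metadata_bytes, journal_data):
    """Total bytes hitting the disk for one update in this simplified model."""
    # Metadata always goes through the journal; file data only if asked.
    to_journal = metadata_bytes + (data_bytes if journal_data else 0)
    # Everything ends up at its permanent location eventually.
    in_place = metadata_bytes + data_bytes
    return to_journal + in_place


data = 100 * 2**20       # hypothetical 100 MiB of file contents
metadata = 64 * 2**10    # hypothetical 64 KiB of metadata changes

metadata_only = bytes_written(data, metadata, journal_data=False)
full = bytes_written(data, metadata, journal_data=True)
print(metadata_only, full, round(full / metadata_only, 2))
```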

    HTH..

    --
    "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
  23. More on inodes (was Re:My understanding) by jgarzik · · Score: 3, Informative
    AFAIK in ReiserFS inodes are not used the way they're in traditional FS'. You certainly need to present the inode layer to the OS, but. They use Balanced trees for block allocation. AFAIK you do not end up with a fixed number of "inodes" after ReiserFS is created.

    You're mixing filesystem features up. To clear things up a bit,

    • Individual inode records need not be of a fixed size.
    • The inode table (total number of inodes) need not be a fixed size, and it can even be moved around, and spread across, various physical locations on the disk.
    • The inode table can either have a special-cased storage method (ext2/3), or simply be stored using the filesystem's own block allocation methods -- in effect treating the inode table as a "normal file" (jfs, ntfs, several others). This second method has the property of being very flexible: just as it is trivial to extend the length of a normal file [i.e. append], it is trivial to add new inodes to an inode table that the filesystem treats internally as a "normal file."
    There are wild and varied ways to store inodes. But ReiserFS definitely has them. :)
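
    A toy contrast between the two storage methods, not modelled on any real filesystem's format. The record layout INODE_FMT and both class names are hypothetical, and the fixed table is a simplification of a table whose size is chosen once when the filesystem is created.

```python
import struct

INODE_FMT = "<IIQ"                        # hypothetical record: mode, uid, size
INODE_SIZE = struct.calcsize(INODE_FMT)   # 16 bytes with this layout


class FixedInodeTable:
    """Table sized once at creation time; it can run out of inodes."""

    def __init__(self, count):
        self.slots = [None] * count

    def allocate(self, mode, uid, size):
        for ino, slot in enumerate(self.slots):
            if slot is None:
                self.slots[ino] = (mode, uid, size)
                return ino
        raise OSError("out of inodes")


class FileBackedInodeTable:
    """Table kept in a 'normal file' (a bytearray here) that grows by appending."""

    def __init__(self):
        self.file = bytearray()

    def allocate(self, mode, uid, size):
        ino = len(self.file) // INODE_SIZE
        self.file += struct.pack(INODE_FMT, mode, uid, size)  # plain append
        return ino

    def read(self, ino):
        return struct.unpack_from(INODE_FMT, self.file, ino * INODE_SIZE)


fixed = FixedInodeTable(2)
fixed.allocate(0o100644, 1000, 10)
fixed.allocate(0o100644, 1000, 20)
try:
    fixed.allocate(0o100644, 1000, 30)
except OSError as err:
    print(err)                            # out of inodes

growable = FileBackedInodeTable()
for size in (10, 20, 30):
    growable.allocate(0o100644, 1000, size)
print(growable.read(2))                   # (33188, 1000, 30)
```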

    Regards,

    Jeff