Slashdot Mirror


File System Design part 1, XFS

rchapman writes "Generally, file systems are not considered "sexy." When a young programmer wants to do something really cool, his or her first thought is generally not "Dude, two words... File System." However, I am what is politely termed "different." I find file systems very interesting and they have seldom been more so than they are right now. Hans Reiser is working on getting Reiser4 integrated into the Linux kernel, the BSD's are working on getting a journaled file system together, and Sun Microsystems just recently released a beta of ZFS into OpenSolaris. "

16 of 57 comments

  1. Oh, snap. by cbiffle · · Score: 3, Interesting
    the BSD's are working on getting a journaled file system together


    Oh, snap. Somebody's not running Soft Updates. :-)

    (Yes, I understand that Soft Updates is not technically metadata journalling as practiced by the Linux people. No, I don't believe there are a significant number of practical situations where the results will differ.)
    1. Re:Oh, snap. by Anonymous Coward · · Score: 3, Informative

      The main difference is, there is no fsck in XFS. None whatsoever.

      What the fuck?

      Have you read this, or even used XFS before, for that matter?

  2. File system design by Bogtha · · Score: 5, Informative

    If you're interested in this, you'll probably also be interested in Practical File System Design with the Be File System (PDF), by Dominic Giampaolo, the designer of the Be file system. There's also a Slashdot review of this book.

    --
    Bogtha Bogtha Bogtha
  3. Blatant error by lostlogic · · Score: 5, Interesting

    Sector size on hard disks is 512 bytes, not 512kbytes. WTF, don't act like an authority and be a dumbass. Imagine the data waste if we actually had 512k physical sectors on disks.

    Also the scaling numbers are completely hokey.
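
    To put a rough number on that waste, here is a minimal sketch. It assumes a file always occupies a whole number of units and nothing smarter (tail packing, etc.) is going on; `wasted_bytes` is just an illustrative name, not anything from the article.

```python
def wasted_bytes(file_size, unit_size):
    """Bytes left unused when a file is rounded up to whole units.

    Assumption: every file occupies a whole number of units of
    unit_size bytes, with no tail packing or similar tricks.
    """
    if file_size == 0:
        return 0
    units = -(-file_size // unit_size)  # ceiling division on integers
    return units * unit_size - file_size


if __name__ == "__main__":
    size = 1000  # a hypothetical 1000-byte file
    print(wasted_bytes(size, 512))         # real 512-byte sectors
    print(wasted_bytes(size, 512 * 1024))  # the article's "512k" sectors
```

    With 512-byte units the 1000-byte file wastes 24 bytes; with 512k units it wastes over half a megabyte.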

    --
    --Brandon
    1. Re:Blatant error by Anonymous Coward · · Score: 2, Insightful

      There are lots of other errors as well. For instance, he asserts that the inode contains the filename (it doesn't). Other things are unclear. He refers to UFS and says it scales to disks of around 1TB, but does not define what he means by UFS (as opposed to FFS). He shows a considerable bias toward PC hardware by referring to MBRs. He seems to think that taking something out of a B+-tree is faster than removing something from the front of a linked list. I have no idea why he thinks that Unix at Berkeley was "stillborn".
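
      On the linked list point, a minimal sketch of why removal from the front is hard to beat: it is one pointer update no matter how long the list is, whereas a B+-tree removal has to walk from the root down to a leaf first. The `Node` and `LinkedList` classes below are made up for illustration and are not taken from any real file system.

```python
class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Illustrative singly linked list; only front operations are shown."""

    def __init__(self):
        self.head = None

    def push_front(self, value):
        # The new node points at the old head and becomes the head.
        self.head = Node(value, self.head)

    def pop_front(self):
        # Constant work: unlink the head, regardless of list length.
        if self.head is None:
            raise IndexError("pop from empty list")
        node = self.head
        self.head = node.next
        return node.value


if __name__ == "__main__":
    free_list = LinkedList()
    for block in (30, 20, 10):
        free_list.push_front(block)
    print(free_list.pop_front())
```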

  4. Filesystems not sexy? by codergeek42 · · Score: 2, Funny

    You've not played much with Ext3, then, have you? =)

  5. Times must be changing... by __aaclcg7560 · · Score: 2, Funny

    So constructing a compiler from scratch is no longer sexy?

    1. Re:Times must be changing... by Doug Merritt · · Score: 2, Funny
      So constructing a compiler from scratch is no longer sexy?

      It's just as much of a chick magnet as it ever was!

      But don't let that stop you. It's fun.

      --
      Professional Wild-Eyed Visionary
  6. division by newr00tic · · Score: 2, Interesting

    With everyone and their parrot talking about RAID these days, it would've been fun if some sort of dual array could work as ONE filesystem, where one (or more) redundant set took care of the balancing/tree'ing, etc. (separately), and the other(s) kept the actual files. If there was _yet_ another set (a third), holding the relevant META-information belonging to the files, you could imagine it being a step forward from what we have now. Well, I can, anyway..

    --
    A horse can't be sick, you know, even if he wants to.
  7. obligatory by DrSkwid · · Score: 4, Insightful

    If you like on-disk file systems you should read Venti: a new approach to archival storage.

    Plan9's primary on-disk storage is Fossil, which runs in user mode. (Plan9 doesn't have a super user)

    You can run arbitrary programs in Plan9 that present a file/folder directory structure by using the common 9P protocol. All devices look like files and folders and can be manipulated like any other, even at the permission level.

    For instance, I have an image mounter that takes a tga file and presents 1 folder containing 4 files, red, green, blue and alpha.
    I can then use any tool I like to manipulate those files using the file semantics we are all familiar with. I even have a flag that mounts the files as textual rather than binary, i.e.:
    00 00 ff ff
    00 00 ff ff
    ff ff 00 00
    ff ff 00 00

    and I can do image processing with awk!
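
    The same idea works with any text tool. As a rough sketch, here it is in Python rather than awk, assuming the textual mount gives one row of space-separated two-digit hex bytes per line, as in the sample above. `invert_channel` is a hypothetical helper, not part of the mounter.

```python
def invert_channel(text):
    """Invert every byte of a channel dumped as rows of hex text.

    Assumption: each line is one image row of space-separated
    two-digit hex values, e.g. "00 00 ff ff".
    """
    out_lines = []
    for line in text.splitlines():
        inverted = [format(255 - int(tok, 16), "02x") for tok in line.split()]
        out_lines.append(" ".join(inverted))
    return "\n".join(out_lines)


if __name__ == "__main__":
    sample = "00 00 ff ff\n00 00 ff ff\nff ff 00 00\nff ff 00 00"
    print(invert_channel(sample))
```

    Write the result back to the channel file and the change shows up in the image.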

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
    1. Re:obligatory by rplacd · · Score: 3, Informative

      The good news is, you don't need to install plan 9 to use venti. You can do it with plan9port on a Linux/FreeBSD/Mac OS X/etc box today.

  8. I concur, Mod Parent Up by NixLuver · · Score: 5, Insightful

    from TFA:
    "There is a minimum size you can write to or read from the disc. This minimum size is called a "sector," and is usually around 512k. So, unless you really like 512k files, it is very likely that you will end up either wasting space or cutting off the end of the file if your file system doesn't deal with this."

    This is clearly not a typo - which is what I was certain I would find when I did RTFA. This guy has a basic, fundamental flaw in his understanding of the very thing he's writing an article about. This is a non-starter, IMO. Combine that with poor sentence structure and bad scansion ... I mean:

    "Note: My ibook has a "30 gig" drive. This is bullshit and I'll tell you why: Drives are defined by the binary definition of mega, kilo and giga. For example, a kilobyte is not 1000 bytes, but actually 1024 bytes. However, your HD manufacturer uses the metric definitions, even up to gigabytes. Now I can see you thinking..."But Wait Mr. Mad Penguin Person...Thats patently ridiculous and means they are lying on the box." Yah... "

    If I'd written something like that, I'd delete it right away and start from scratch.
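
    For what it's worth, the arithmetic he was groping for fits in a few lines. A minimal sketch, assuming the box says 30 decimal gigabytes and the OS reports in binary units; `decimal_gb_to_gib` is just an illustrative name.

```python
def decimal_gb_to_gib(gigabytes):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    total_bytes = gigabytes * 10**9
    return total_bytes / 2**30


if __name__ == "__main__":
    # Assumption: a drive sold as "30 GB" holds 30 * 10**9 bytes.
    print(round(decimal_gb_to_gib(30), 2))  # about 27.94
```

    So a "30 gig" drive shows up as roughly 27.94 in binary units, before any file system overhead.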

  9. NSS for Linux by marquis111 · · Score: 2, Interesting

    NSS has been ported to Linux too. That's another modern industrial-strength filesystem with features sorely needed by Linux.

  10. Author doesn't mention his newbie status by Anonymous Coward · · Score: 2, Insightful

    From the article:
    Small difference there. It is also a very fast file system, allowing reads of up to 7 GB/sec.

    An assumption which could only be made by a newbie. Maximum throughput of a filesystem is not filesystem architecture dependent, but hardware dependent.
    I could give you 7GB/sec out of a FAT drive, given the proper hardware.
    Several other quotes suggest a bit of 'newbieness' like "B+trees are insanely complex".
    The concept was designed by a human, therefore it is clearly understandable by a human. It's not, say, some potentially impossible-for-humans-to-really-comprehend law of nature.
    The author should just acknowledge that he or she does not know enough about B+trees, or does not know them well enough to illuminate the subject sufficiently.
    There's nothing wrong with that, but trying to scare people off of them just because he or she doesn't understand them well enough is damaging to potential readers.

    I want to postscript this with an evaluation that the article is not bad. If those things are changed, I would even say it is good.

    -fooburger

  11. Doesn't Live Up To Its Billing by JoshDanziger · · Score: 3, Informative

    Sorry, this article didn't really teach me anything interesting about filesystems. In general, the article was poorly written. For example, taking two sentences to say: "B+Trees are complex. Let me rephrase that. B+Trees are very, very complex." Readers of all types value their time and don't want to have to waste it.

    You were lost at points between trying to sound like an expert and trying to sound like a grandfather explaining the grand old days of filesystem development. Are you a storyteller or a teacher? Pick one.

    Content-wise, there wasn't really much there for me. You spent a lot of time explaining the problems of a binary tree, but I think that your target audience already understands the time complexity of a binary tree. Then, you gloss over the B+ tree because it's complicated.

    Sorry if I sound harsh. I hope that this comes off as constructive criticism.

  12. less reliability? by newr00tic · · Score: 2, Insightful

    No, I was thinking more along the lines of when(/if) META-data becomes big, and you'd get further throughput by having it on its own drives, so as to speed things up.

    By all the three examples I provided, I tried to "account" for both speed and reliability, even though it's only a vague theory..

    --No wonder (_real_) things keep standing still for fscking 10 years at a time, and only Disney features are implemented; people turn down theories just as snappily as they turn down webdesigns (50ms, or whatever)..

    --
    A horse can't be sick, you know, even if he wants to.