Slashdot Mirror


Advanced Filesystem Implementors Guide Continues

Tom writes: "This is part six of the Advanced filesystem implementor's guide. I've been following an outstanding series of articles about implementing the advanced filesystems that are available with Linux 2.4. The author really knows his stuff and has done a great job with explaining Reiserfs, XFS, GFS, and the other file systems that are available." The series gets into greater depth as it goes on; you may want to start with Part One and work on from there.

8 of 60 comments

  1. Re:NTFS ? by pete-classic · · Score: 3

    My dad called me the other day because Win2k wouldn't let him delete a particular file. I found this article Q246026.

    Notice that the resolution is "back up, format, restore, because we are too lame to write a filesystem integrity checker that actually works." (or words to that effect.)

    If that's the state of the art, I'll keep the tried and true, thanks.

    As for the Linux driver never going "stable", don't you think it might have something to do with the facts that (1) NTFS is a moving target and (2) NT and NTFS have bugs that "cooperate", making it very difficult for anyone else to write a compatible driver?

    -Peter

  2. File systems obsolete? by Peaker · · Score: 3, Interesting

    It seems to me, the more I think about it, that file systems should be buried in the past, as the idea of mapping a hierarchy of string identifiers to serialized objects is not quite the way to do it.

    Firstly, a much better user interface to objects would be a relational database the user can query anything on.

    As for a system interface to objects, why force the objects to be serialized? Use orthogonal persistency. This method is more efficient, and easier for the applications. It makes persistency transparent, except for critical applications that need to persist something right now; those can use a journalling interface.

    In summary:
    - Replace file system persistency with orthogonal persistency.
    - Replace the hierarchic-string user interface with a relational database.
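
    To make the "relational database the user can query anything on" idea concrete, here is a minimal Python sketch using the standard sqlite3 module. The objects table, its columns, and the helper names build_catalog and find are assumptions made up for illustration; nothing in the thread specifies a schema.

```python
import sqlite3

def build_catalog(rows):
    """Create an in-memory table of object metadata (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (name TEXT, kind TEXT, owner TEXT, size INTEGER)")
    db.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)
    return db

def find(db, kind, min_size):
    """Query by attributes instead of walking a directory hierarchy."""
    cur = db.execute(
        "SELECT name FROM objects WHERE kind = ? AND size >= ? ORDER BY name",
        (kind, min_size),
    )
    return [name for (name,) in cur]

# Illustrative data only.
catalog = build_catalog([
    ("thesis", "document", "peaker", 120_000),
    ("notes", "document", "peaker", 900),
    ("holiday", "image", "ian", 2_400_000),
])
big_documents = find(catalog, "document", 1000)
```

    The point of the sketch is only that objects are located by what they are, not by where they sit in a tree of names.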

    1. Re:File systems obsolete? by Ian+Bicking · · Score: 3, Informative
      I dunno about the database idea. Is the filesystem going to be replaced by one specific set of tables with conventional fields? I can see how that would be done, but I honestly can't see why. You just got another heap of data with a different set of metadata.

      The other option is a database with dynamic tables, that would somehow fit the data. I don't know how you are going to manage that, though... can any application make tables that make sense for its problem space? How are those tables partitioned off so that you have some degree of safety, that one application doesn't step on another? How are they then integrated, so information from one application can be used in another?

      A non-relational database might make more sense, I believe they are often called Object Databases (not to be confused with an OO RDBMS). That's really just a way of saying "orthogonal persistence", except maybe that they aren't completely orthogonal (they require some extra programming to use).

      The problem with orthogonal persistence, that I see, is all the junk that can collect. Having used Squeak, which offers a certain sort of persistence in its images, transient objects can pile up fairly easily and lead to a sort of faux-memory-leak in the system. It's a convenient system, but not stable.

      Serialization provides a certain discipline -- it's like you have a checkpoint in the application when everything gets consolidated into something well-defined and granular.

      Now, you don't have to serialize to apply this sort of discipline. But orthogonal persistence just makes it so damn easy to be undisciplined. I feel like there's some major work to be done to find a way to manage such a large collection of interrelated objects with indefinite lifespans.
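
      The "checkpoint" discipline that explicit serialization imposes might look like the following Python sketch, which uses the standard pickle module. The document layout and the render_cache field are hypothetical; the idea is only that saving forces a decision about what is durable and what is transient junk.

```python
import pickle

def save(document):
    """Persist only the well-defined state; transient scratch data is dropped."""
    durable = {"title": document["title"], "body": document["body"]}
    return pickle.dumps(durable)

def load(blob):
    """Rebuild the document, starting again with an empty transient cache."""
    document = pickle.loads(blob)
    document["render_cache"] = {}
    return document
```

      Under orthogonal persistence nothing forces this step, so the cache would simply persist along with everything else unless the programmer cleans it up.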

    2. Re:File systems obsolete? by Peaker · · Score: 4, Informative
      Persistency: Data's "survival" throughout time, power breaks, etc. Persistent memory is non-volatile memory (disk, for example).
      Persistency in an operating system is usually achieved by writing things to disk, in order to persist them.

      ... could you explain what you mean when you say objects in a filesystem are forced to be serialized?

      Not all data in a file system can be stored as it is in memory, because pointers and other information must be converted to a persistent form. Often objects are laid out in memory in ways that are very difficult to write to disk (spread across many small linked objects, for example). This means you must serialize the data onto the disk, by converting it to a stream of 1's and 0's that allows the objects' structure to be reconstructed. This requires a lot of work from every application and object implementor, as they have to create methods to serialize and de-serialize the objects, from their normal representation to a persistent streamed representation.
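
      As an illustration of that work, here is a minimal Python sketch that flattens a linked list into bytes and rebuilds it. The Node class and the on-disk layout (a 32-bit count followed by 32-bit little-endian values) are assumptions chosen for the example; the in-memory references cannot be written out as they are, so they are replaced by position in the stream.

```python
import struct

class Node:
    """A linked-list node; `next` is an in-memory reference (a pointer)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def serialize(head):
    """Flatten the list into bytes: a count, then each value as a 32-bit int."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return struct.pack(f"<I{len(values)}i", len(values), *values)

def deserialize(data):
    """Rebuild the linked structure from the flat byte stream."""
    (count,) = struct.unpack_from("<I", data, 0)
    values = struct.unpack_from(f"<{count}i", data, 4)
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head
```

      Every object type needs a pair of routines like this, which is the per-application cost being described.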

      And what orthogonal persistency is. It sure sounds good, but I would really like to know what it means.

      Orthogonal persistency is persistency implemented by the underlying operating system, rather than every application writer.
      The entire system state is saved to disk every once in a while, in a checkpoint.
      Mechanisms are used to ensure there's always a stable/reliable checkpoint to go back to. Some schemes even let you roll back to any checkpoint in the past. Typically, checkpoints are done every 5 minutes.
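
      The requirement that there is always a stable checkpoint to go back to can be imitated at user level. The Python sketch below is only an analogy, under the assumption that the whole state fits in one JSON-serializable value; write_checkpoint and load_checkpoint are hypothetical names, and a real orthogonally persistent system would do the equivalent on memory pages, below the application.

```python
import json
import os
import tempfile

def write_checkpoint(state, path):
    """Write `state` to a temporary file, then atomically swap it into place.

    A crash before os.replace leaves the previous checkpoint untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the data is on disk before the swap
    os.replace(tmp_path, path)

def load_checkpoint(path):
    """Return the most recent complete checkpoint."""
    with open(path) as f:
        return json.load(f)
```

      The new checkpoint only becomes visible once it is complete, so a reader never sees a half-written state.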

      Orthogonal persistency is totally transparent to applications. They seem to 'live forever', and do not need to explicitly persist or serialize their information. They can keep it represented as objects, or whatever representation they choose for their own simplicity.

      Orthogonal persistency treats RAM as a cache to the disk, and thus achieves two purposes.

      Simplicity: There is only non-volatile memory, rather than volatile and non-volatile memory that are allocated and managed separately.

      Performance: It is much easier to optimize this system, as there are no separate file caches or swap areas on disk. Instead, you treat the entire RAM as a cache to the disk, allowing simpler and more powerful page caching algorithms that do not have to guarantee things such as quick disk writes for files, as file systems do.

      An amazing advantage of orthogonally persistent systems is that, because the entire set of dirty pages is copied from memory to disk at once, the disk heads can sweep sequentially across the disk to update all the necessary areas. This process is called migration, and it is a far more efficient way of updating the disk from the volatile state than the explicit updates used by current file systems.
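
      A toy Python model of why a single sweep is cheaper: it treats disk positions as plain integers and counts how far the head travels. The functions migration_order and head_travel and the sample page numbers are illustrative assumptions, not how any particular system implements migration.

```python
def migration_order(dirty_pages):
    """Return dirty page positions sorted so the head makes one sweep."""
    return sorted(dirty_pages)

def head_travel(order, start=0):
    """Total distance the head moves when visiting positions in `order`."""
    travel, position = 0, start
    for page in order:
        travel += abs(page - position)
        position = page
    return travel

# Pages in the order applications happened to dirty them.
dirty = [900, 10, 450, 20, 880]
explicit_cost = head_travel(dirty)                    # written one by one
migrated_cost = head_travel(migration_order(dirty))   # written in one sweep
```

      With these sample positions the one-by-one order moves the head 3520 units, while the sorted sweep moves it 900.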

      Yet another advantage is that, because the entire system state is preserved as a whole, more powerful security schemes can be used. The whole load-from-file process can be avoided, and with it the security problems of identifying who has access to what file, and why.

    3. Re:File systems obsolete? by Ian+Bicking · · Score: 3, Informative
      "Persistence" is what a filesystem provides, and RAM does not -- the object can remain in existence indefinitely.

      "Serialization" means you take your object and turn it into a stream of bytes of some sort. Some more introspective languages, like Python, Smalltalk, and Java, allow very easy serialization, but in something like C you spend a lot of time figuring out how to do it. Even if it is indirect, most files somehow represent an object that was in memory and can be put back into memory at a later time.

      "Orthogonal" means that something is separate from something else -- or more specifically, that while two aspects of a thing are related, you can work with one without affecting the other. Kind of -- it's a subtle (though very useful) notion.

      "Orthogonal Persistence" means that all objects persist indefinitely with no effort from the programmer. "Orthogonal" refers to the fact that the persistence happens without any relation to other aspects of the program -- everything just persists by default. While it may involve serialization, this is hidden from the programmer, as is any other technique that supplies the persistence.

      In such a system there wouldn't be any distinction between objects in RAM or on a disk -- often that is then expanded to objects that are also remote (similar to CORBA, but again, the network access is orthogonal and invisible). Anyway, the system moves things to disk as it needs to, and pulls them off as needed.

      I brought up the cleanliness issue before, but the other issue is scaling. Garbage collection in particular is difficult: you can't just do a mark-and-sweep every so often, because anything on the entire disk could contain a reference.
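
      For reference, a minimal mark-and-sweep over an object graph can be sketched in Python as below. The heap here is a hypothetical dict mapping each object id to the ids it references. The mark phase has to follow every reference reachable from the roots, which is the part that gets expensive when the "heap" is the whole disk.

```python
def mark(roots, references):
    """Return every object id reachable from `roots`."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(references.get(obj, ()))
    return marked

def sweep(references, marked):
    """Return the ids that are unreachable and would be reclaimed."""
    return set(references) - marked

# "c" and "d" reference each other but nothing reachable references them.
heap = {"root": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live = mark(["root"], heap)
garbage = sweep(heap, live)
```

      Here the cycle between "c" and "d" is collected because neither is reachable from the root.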

      EROS has this, Smalltalks have generally had this (you might wish to look at Squeak), and the old Lisp machines also tended to have orthogonal persistence.

  3. Re:Maybe it's time for convergence by chabotc · · Score: 3, Insightful

    Not completely true. One file system might become the 'default installation choice' in most distributions, but each of the 3 journaled FS's has its own set of features and target markets.

    ReiserFS is a top-tech journaling file system which can be _very_ fast in some situations (large directories, etc.), but as Hans Reiser pointed out, his purpose is not to make a stable FS but to keep development going, inventing new and cool techniques. So it is not the #1 production choice for some.

    XFS is known for its high throughput and parallelism. At its roots it was tuned for streaming video and audio, and to work well with _many_ CPUs (think >> 32).

    JFS has more of a mainframe background: stable (slower?) and secure.

    Of course each day they grow a little closer together (each wants all the advantages), but until one of them reaches the status of 'ultimate FS', I think there is plenty of room for multiple visions and implementations.

  4. Encrypted loopback root example. by Adam+J.+Richter · · Score: 4, Informative

    I am posting this from a notebook computer that has all partitions encrypted except for a boot partition at the front of the disk. The kernel boots an initial ramdisk with an /sbin/init script that does essentially the following, using cryptoapi, the successor to the linux "kerneli" patches.

    modprobe cryptoapi
    modprobe cryptoloop
    modprobe cipher-aes
    # losetup prompts for the passphrase ("Password:") at this point
    losetup -e AES /dev/discs/disc0/part6 /dev/loop/0
    mount -t ext2 /dev/loop/0 /newroot
    cd /newroot
    exec ./bin/chroot . ./sbin/init "$@"

    This should work with any disk file system, not just ext2.

    I have been using this arrangement for several months now on a couple of computers, the slowest of which is a Kapok 1100M that uses a 233MHz Pentium II processor and, I believe, PC-66 SDRAM. On that computer, the change in interactive responsiveness is hard to notice, but it is noticeable for disk-intensive activities. I have not timed it, but I think that big rsync runs are at least a factor of two slower.

    I do not run swapping on these computers, as I've seen claims that there are more potential deadlocks when attempting to swap to an encrypted partition than when attempting to swap to an unencrypted partition.

    I hope this information is helpful.

  5. Re:Maybe it's time for convergence by blakestah · · Score: 3, Insightful

    Why does it seem like all the responses to this are full of hot air?

    Some journalling filesystems exist because there are UNIX companies with expertise in them that support them, like XFS and JFS.

    Some journalling filesystems are a natural migration for most linux users - like ext3.

    And some people, like Hans Reiser, want to re-invent filesystems in toto, and a good journalled filesystem is just the first stop.

    More than one is just "value added". They all work. They are all secure and stable. Some are faster than others - but XFS, ReiserFS and ext3 are all "fast enough" for almost any uses.

    The parent echoes a common complaint about Free Software - that developer resources are not dedicated appropriately. Well, developers work on what they want, or what they are paid to work on. This often leads to multiple efforts that accomplish similar goals - like window managers, desktop environments, word processors, journalled filesystems, VM management etc. But ultimately competition is good if intelligent test results are publicized.

    Look at the Mindcraft web server benchmark results about 18 months ago. Now, linux blows the doors off IIS in the exact same test. The same is becoming true of filesystems. Test results show ext2/3 is slow with lots of small files - so a developer named Daniel Phillips added a directory hash that fixes this shortcoming.
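
    Why a directory hash helps with lots of small files can be shown with a toy comparison in Python. This is not the actual ext2 patch; LinearDirectory and HashedDirectory are hypothetical classes, and zlib.crc32 stands in for whatever hash a real filesystem would use. Each lookup also returns the number of name comparisons it needed.

```python
import zlib

class LinearDirectory:
    """Directory as a flat list: lookup scans entries one by one."""
    def __init__(self):
        self.entries = []  # list of (name, inode) pairs

    def add(self, name, inode):
        self.entries.append((name, inode))

    def lookup(self, name):
        comparisons = 0
        for entry_name, inode in self.entries:
            comparisons += 1
            if entry_name == name:
                return inode, comparisons
        return None, comparisons

class HashedDirectory:
    """Directory with a hash index: lookup scans only one bucket."""
    def __init__(self, bucket_count=64):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, name):
        return self.buckets[zlib.crc32(name.encode()) % len(self.buckets)]

    def add(self, name, inode):
        self._bucket(name).append((name, inode))

    def lookup(self, name):
        comparisons = 0
        for entry_name, inode in self._bucket(name):
            comparisons += 1
            if entry_name == name:
                return inode, comparisons
        return None, comparisons

linear, hashed = LinearDirectory(), HashedDirectory()
for i in range(1000):
    linear.add(f"file{i}", i)
    hashed.add(f"file{i}", i)
```

    Looking up the last file added costs 1000 comparisons in the linear directory, while the hashed one only scans the entries that share its bucket.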