Slashdot Mirror


Ext4 Advances As Interim Step To Btrfs

Heise.de's Kernel Log has a look at the ext4 filesystem as Linus Torvalds has integrated a large collection of patches for it into the kernel main branch. "This signals that with the next kernel version 2.6.28, the successor to ext3 will finally leave behind its 'hot' development phase." The article notes that ext4 developer Theodore Ts'o (tytso) is in favor of ultimately moving Linux to a modern, "next-generation" file system. His preferred choice is btrfs, and Heise notes an email Ts'o sent to the Linux Kernel Mailing List a week back positioning ext4 as a bridge to btrfs.

94 of 510 comments

  1. BTRFS? REALLY? by erroneus · · Score: 4, Interesting

    Couldn't they come up with a better name than "BuTteR FaSe?" I know I can't be the only one who read it like that. Call it anything but that.

  2. BTRFS? by Anonymous Coward · · Score: 5, Funny

    So it incorporates compression by vowel omission? Brllnt!

  3. Why not ZFS? by mlts · · Score: 5, Interesting

    Unless ZFS has patent issues, why not just work on having ZFS as Linux's standard FS, after ext3?

    ZFS offers a lot of capabilities, from no need to worry about a LVM layer, to snapshotting, to excellent error detection, even encryption and compression hooks.

    1. Re:Why not ZFS? by PhrostyMcByte · · Score: 5, Informative
      I am not aware of the differences, but from Theodore Ts'o:

      people who really like reiser4 might want to take a look at btrfs; it has a number of the same design ideas that reiser3/4 had --- except (a) the filesystem format has support for some advanced features that are designed to leapfrog ZFS, (b) the maintainer is not a crazy man and works well with other LKML developers (free hint: if your code needs to be reviewed to get in, and reviewers are scarce; don't insult and abuse the volunteer reviewers as Hans did --- Not a good plan!).

    2. Re:Why not ZFS? by Anonymous Coward · · Score: 5, Informative

      The ZFS developers specifically wanted the open sourced code to be under a GPL incompatible license, hence it has been released under CDDL (there was an interview with the Sun open source rep; can someone provide info/links about this?). So ZFS cannot be part of the kernel, but there is a FUSE port of ZFS and according to http://en.wikipedia.org/wiki/ZFS#Linux Sun is investigating a Linux port, so there may be something good coming.

    3. Re:Why not ZFS? by Mad+Merlin · · Score: 4, Insightful

      ZFS offers a lot of capabilities, from no need to worry about a LVM layer, to snapshotting, to excellent error detection, even encryption and compression hooks.

      ...and that's its biggest problem. ZFS duplicates a lot of functionality that belongs outside of a filesystem. All of the above can already be done using any Linux filesystem, so why keep around a second copy of all that code that implements those features for just a single filesystem?

      ReiserFS was (is) in a similar situation, where it also duplicates a lot of functionality that doesn't belong in the filesystem. Not only does this make it harder to maintain, but it makes a lot of features filesystem specific that shouldn't be.

    4. Re:Why not ZFS? by volsung · · Score: 4, Informative

      I don't know about the patents, but the current major obstacle is the license. ZFS, as part of the OpenSolaris kernel, is available under the CDDL. The CDDL is incompatible with the GPL, ruling out ZFS inclusion directly in the Linux kernel. Sun has hinted that they could dual license the Solaris kernel under CDDL and GPL, but that hasn't happened yet. Small parts of the ZFS filesystem code have been GPLed so they could be added to grub to support booting ZFS root filesystems.

      There is a userspace port of the ZFS code and utilities which avoids the license problem by using FUSE to separate the filesystem code into a separate process: ZFS-FUSE.

      If Sun were to ever dual-license ZFS, the ZFS-FUSE codebase would be a good place to start for porting the code to direct kernel inclusion. (Note: Sun, via their subsidiary, Cluster File Systems, now employs the author of ZFS-FUSE to use his port as an optional backend for the Lustre file system.)

    5. Re:Why not ZFS? by Wonko · · Score: 5, Informative

      ZFS duplicates a lot of functionality that belongs outside of a filesystem. All of the above can already be done using any Linux filesystem, so why keep around a second copy of all that code that implements those features for just a single filesystem?

      It wouldn't be possible to duplicate RAID-Z with LVM. Other features of ZFS are very handy, but RAID-Z is by far my favorite. Same storage density as RAID 5 but without the horrible write performance. RAID-Z uses copy-on-write to avoid RAID 5's required read for every non-cached write.

      Being able to create filesystems just as easily as creating directories is quite handy as well, though. IIRC, the filesystem sizes in ZFS are controlled by a quota style system. That is much simpler than shrinking an LV (if your filesystem supports shrinking), then adding a new LV, and then creating a filesystem. I don't know about you, but I am always a bit nervous when I have to resize an LV.

    6. Re:Why not ZFS? by 42forty-two42 · · Score: 3, Informative

      Sun has some patents on ZFS; the CDDL grants a license to these patents if you're deriving from the original ZFS source, but then you can't link it to linux.

      FWIW, I doubt ZFS-FUSE would be a good place to start - FUSE is totally different from Linux's actual vfs layer, after all.

    7. Re:Why not ZFS? by mritunjai · · Score: 4, Informative

      The ZFS developers specifically wanted the open sourced code to be under a GPL incompatible license, hence it has been released under CDDL (there was an interview with the Sun open source rep; can someone provide info/links about this?). So ZFS cannot be part of the kernel, but there is a FUSE port of ZFS and according to http://en.wikipedia.org/wiki/ZFS#Linux Sun is investigating a Linux port, so there may be something good coming.

      Rather, GPL is incompatible with anything else that can't be re-licensed as GPL, and that includes GPL v2 and v3, which can't even be mixed among themselves. Maybe first we clear up that mess, right?

      ZFS is present in both Mac OSX and FreeBSD, thank you! They have no license issues whatsoever.

      --
      - mritunjai
    8. Re:Why not ZFS? by setagllib · · Score: 2, Insightful

      A FUSE ZFS guarantees it will never be the "default" filesystem anyway. BTRFS has a good shot at being your / in a couple of years.

      --
      Sam ty sig.
    9. Re:Why not ZFS? by clarkkent09 · · Score: 5, Insightful

      (b) the maintainer is not a crazy man and works well with other LKML developers

      Also important, he might be more focused due to not being in prison for first degree murder

      --
      Negative moral value of force outweighs the positive value of good intentions.
    10. Re:Why not ZFS? by GrievousMistake · · Score: 5, Interesting

      Huh. One of the interesting things about Reiser4 from an end-user perspective was Hans Reiser's plans for file metadata. From what I can find about btrfs, it currently doesn't even support normal extended attributes. There was also talk about making it easy for developers to extend the filesystem with plugins that could add e.g. compression schemes.
      I can't really recognize anything from Hans Reiser's ramblings in the btrfs documentation that isn't standard file system improvements already seen in e.g. ZFS. Does anyone have any specific examples of the ZFS-leapfrogging features referred to?

      --
      In a fair world, refrigerators would make electricity.
    11. Re:Why not ZFS? by Ivlis · · Score: 3, Informative

      Parts of ZFS are patented, but the license allows running it in userspace using FUSE.

      I'm confused: if we ask people why not run ZFS using FUSE, they reply because it's slow (I'm assuming it's possible to load ZFS at boot time using an initrd). And if we ask people which is better, monolithic or microkernel, they reply microkernel. But ZFS using FUSE would be like a microkernel driver, so which is it?

    12. Re:Why not ZFS? by Anonymous Coward · · Score: 5, Funny

      Huh. One of the interesting things about Reiser4 from an end-user perspective was Hans Reiser's plans for file metadata.

      No, the most interesting feature of ReiserFS is this one (look to the far right).

      --
      ReiserFS: It puts the "stab" in "/etc/fstab".

    13. Re:Why not ZFS? by Xaria · · Score: 3, Informative

      No, it wouldn't. A microkernel loads modules into the kernel space. You're talking about running in user space. So when an application makes a system call, the kernel has to translate it to the FUSE layer into user space. So there's an extra layer consuming time. On top of that, kernel space isn't generally swapped out, but user space can be. Obviously it should never happen, but wouldn't it suck if your disk driver was swapped out?

      See the diagram at the bottom of this page: http://fuse.sourceforge.net/

      Also, ZFS (like ReiserFS) handles its metadata differently from ext3, so you have to translate the differences between the virtual file system and ZFS. This is why writes are significantly slower. Reads are not so bad. The NFS penalty would be huge. See http://www.linux.com/feature/138452

    14. Re:Why not ZFS? by deniable · · Score: 5, Funny

      Yep, BeaTeR FS is a kinder, gentler alternative to Reiser FS.

    15. Re:Why not ZFS? by mml · · Score: 5, Informative

      > Rather, GPL is incompatible with anything else that can't be re-licensed as GPL, and
      > that includes GPL v2 and v3, which can't even be mixed among themselves.

      Saying that GPLv2 and GPLv3 "can't even be mixed among themselves" is wrong and
      misleading.

      Section 9 of GPLv2 specifically deals with the problem of later versions of the
      licence and sets out the options. A copyright holder can choose to allow work to be used
      with later versions, such as GPLv3, or can choose not to. There are also more
      complex options. The licence itself doesn't force the choice one way or the other.

      Matt

    16. Re:Why not ZFS? by QuoteMstr · · Score: 2, Informative

      You can clone as many copies of some big DB as you like - instantaneous and space-saving - then destroy as you like.

      I've been able to do this forever with LVM snapshots under Linux.

    17. Re:Why not ZFS? by mvdwege · · Score: 3, Interesting

      Come back when ZFS has decent filesystem maintenance tools.

      And don't give me that 'ZFS doesn't need a fsck' crap. SGI tried to pull that with XFS, and it didn't work. Filesystem (at least metadata) corruption will happen, and once it does, ZFS doesn't have the tools to fix it.

      Mart

      --
      "I know I will be modded down for this": where's the option '-1, Asking for it'?
    18. Re:Why not ZFS? by adrianwn · · Score: 5, Interesting

      A microkernel loads modules into the kernel space.

      No, that's the opposite of a microkernel. A microkernel loads its modules (then often called "servers") into user space. If the kernel and its drivers etc. run in the same address space (as is the case with, e.g., Linux), then we're talking about a monolithic kernel, even if it can dynamically load modules.

    19. Re:Why not ZFS? by BrokenHalo · · Score: 4, Interesting

      not to belittle ext3 and ext2 for that matter, but their time is beginning to pass, and something new needs to replace it.

      I'm not sure that I see why, unless you're simply bored with the older filesystems. Something as critical as this should not be driven by what is trendy at any given moment. If one has no need for particular advanced bells or whistles, there is no need to use them.

      For instance, since for historical and security reasons I keep /boot on its own separate partition which is mounted readonly, it makes sense here to not have anything trying to write to a journal, so ext2 is still a very good choice here. As the partition is tiny (only 20MB) it takes a fraction of a second to run e2fsck over it when or as required, so there is nothing to be gained by journalling it anyway.

      I still use ReiserFS3 on most of my other partitions, since I don't have any intention of changing the filesystem until I change the drives. ReiserFS is still a good choice for my purposes anyway.

    20. Re:Why not ZFS? by BrentH · · Score: 4, Insightful

      The things you think belong outside of a filesystem only 'belong' there because that's what years of narrow-minded development have taught you. Look at it this way: /everything/ related to filestorage is managed by ZFS. What could be more convenient than that? Because of this, ZFS can do things much faster and much more reliably than any combo of LVM with a filesystem. Why chain together tools yourself, and manually think about things you really shouldn't be thinking about, when you can have a good filesystem take care of it for you.

      ZFS is easier to maintain, from a user's perspective (and that's the job of development, to make usage easier, not ever the other way round).

    21. Re:Why not ZFS? by Kjella · · Score: 4, Informative

      Rather, GPL is incompatible with anything else that can't be re-licensed as GPL, and that includes GPL v2 and v3, which can't even be mixed among themselves. Maybe first we clear up that mess, right?

      With a copyleft license, you intend to secure certain rights to the end user to the work as a whole. It is at the very essence of what the GPL tries to do compared to non-copyleft open source licenses or the LGPL that only covers the parts consisting of LGPL code, not any sort of "flaw" or "mess". Licenses work so that you must simultaneously fulfill all of them, so the GPL denies using GPL code with code that denies end users the four freedoms the FSF professes. That is the intention by design, but then there is some collateral damage as well-intended licenses are rendered GPL-incompatible due to details since the GPL (or any copyleft license) couldn't allow open-ended arbitrary restrictions without losing all meaning. The GPLv2 was particularly flawed in this area since it was made fairly long ago with this not much in mind, and in the GPLv3 they did a lot of work to improve compatibility leading to section 7 that among other things says:

      Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:
      a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or
      b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or
      c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or
      d) Limiting the use for publicity purposes of names of licensors or authors of the material; or
      e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or
      f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors.

      That vastly improves compatibility with the licenses the GPL wants to be compatible with so collateral damage is reduced to a minimum. It's still very easy to write a license, even a free software license, that isn't GPL compatible though. If you look at the reason the CDDL and GPL are incompatible it's that the CDDL's copyleft conditions and the GPL's copyleft conditions clash because they both try to do the same thing. It's almost impossible to write two copyleft licenses where one (or both) doesn't see the other as adding "additional restrictions" on the end user. Even the GPL can't escape that as it tries to improve the GPL unless you have the "and later" clause. Then again, there's no reason such a license should have to be revised often - it took 16 years before releasing version three and it'll probably be longer until next time it's needed.

      --
      Live today, because you never know what tomorrow brings
    22. Re:Why not ZFS? by tkinnun0 · · Score: 3, Insightful

      It's not wrong or misleading. If you have GPLv3 and GPLv2 code, you can mix them if the GPLv2 code's copyright holder gives you the permission. Likewise, if you have BSD and GPLv2 code and wish to retain the BSD licence. The mechanisms may be different but the end result is the same: GPLv2 in itself doesn't give you the permission, you need permission from the copyright holder.

    23. Re:Why not ZFS? by mike_sucks · · Score: 2, Insightful

      The GPL is restrictive as it is /because/ it ensures freedom for users. It is the /developers/ that the GPL bugs.

      Bring on the GPL, I say! Boo to Sun for being anti-users.

      /Mike

      --
      -- "So, what's the deal with Auntie Gerschwitz et all?"
    24. Re:Why not ZFS? by Daniel+Phillips · · Score: 2, Informative

      I don't know about that either. There are consistent reports that ZFS is slower than Ext3 on many common workloads. Also reports of instability.

      While I do respect some of the engineering achievements in ZFS, I do not consider it to be the last word in filesystem design, or even the best filesystem for many applications. I also have doubts about the wisdom of some of the design decisions, such as inhaling the LVM into the filesystem, using 128 byte block pointers, and making a distinction between filesystem snapshots and clones.

      --
      Have you got your LWN subscription yet?
    25. Re:Why not ZFS? by gbjbaanb · · Score: 3, Insightful

      Why chain together tools yourself, and manually think about things you really shouldn't be thinking about, when you can have a good filesystem take care of it for you.

      Because that's the Unix way - build small components (applications) and chain them together to create something out of the parts. I mean, why have ls and grep when you can have lsgrepsortfind? Really, the point is to have small, easily maintained apps that do 1 thing well rather than 1 app that does everything possibly well, but more usually poorly as it's difficult to maintain and ensure it works properly. Not to mention the bloat when it replicates functionality already provided.

      This may not be the best model for a critical component like a filesystem, but on the other hand, reliability of a filesystem is paramount, so keeping it as small as possible is probably a good idea.

    26. Re:Why not ZFS? by Wonko · · Score: 3, Interesting

      I often hear that claim but never see any support of that claim.

      The closest thing to RAID-Z in the Linux kernel is the RAID 5 DM. If you want to write a 4k block to some random location that isn't currently fully cached the DM has to read 1 stripe from each disk in the array, make the 4k change, recompute the checksums, and then flush that stripe back to each disk. The default stripe size is 64k. That means if you have 4 drives you would be performing a 256k read and a 256k write just to change a single 4k block. Of course, that is worst case. Best case is you have to overwrite the entire stripe with a fresh 256k block of data.
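
      The arithmetic above can be reproduced with a short sketch. It assumes the worst-case model described in this comment (every chunk in the stripe is read and written back); the function name and parameters are made up for illustration, and real RAID 5 implementations may touch fewer chunks than this.

```python
def raid5_full_stripe_rmw(drives: int, chunk_kib: int, change_kib: int) -> dict:
    """I/O cost of one uncached small write under the worst-case model above.

    Assumption: the array reads one chunk from every drive, patches the data,
    recomputes parity, and writes every chunk back.
    """
    stripe_kib = drives * chunk_kib  # KiB touched across the whole array
    return {
        "read_kib": stripe_kib,
        "write_kib": stripe_kib,
        # total I/O performed divided by the size of the change requested
        "amplification": (2 * stripe_kib) / change_kib,
    }


cost = raid5_full_stripe_rmw(drives=4, chunk_kib=64, change_kib=4)
print(cost)
```

      With 4 drives and 64k chunks this gives the 256k read and 256k write quoted above for a single 4k change.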

      ZFS and RAID-Z get around that problem by just writing the changed blocks to an unused part of the disk. Once the write is complete it just moves the pointer to the new block location. This is copy-on-write, and this is where the performance boost comes in over RAID 5. With RAID-Z you should never be required to read the whole stripe to do a write.

      RAID-Z also allows for dynamic stripe sizing. That helps get more optimal efficiency on small files and large files.

      The dynamic stripes aren't terribly important, but if you could figure out a way to do the copy-on-write without the filesystem having very fine-grained control and knowledge of the underlying array we would all love to hear about it :).
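
      The copy-on-write idea described above (write the new data to an unused location, then move the pointer) can be sketched as a toy in-memory block store. This is only an illustration of the principle, not how ZFS or RAID-Z is implemented; the class and its fields are hypothetical.

```python
class CowStore:
    """Toy copy-on-write store: logical block numbers map to physical slots."""

    def __init__(self, slots: int):
        self.disk = [None] * slots       # physical slots
        self.free = list(range(slots))   # unused slot numbers
        self.pointers = {}               # logical block -> physical slot

    def write(self, block: int, data: bytes) -> None:
        new_slot = self.free.pop(0)      # always write to an unused slot
        self.disk[new_slot] = data       # the old data is never read or patched
        old_slot = self.pointers.get(block)
        self.pointers[block] = new_slot  # the write takes effect when the pointer moves
        if old_slot is not None:
            self.disk[old_slot] = None
            self.free.append(old_slot)   # the old slot becomes reusable

    def read(self, block: int) -> bytes:
        return self.disk[self.pointers[block]]


store = CowStore(slots=4)
store.write(0, b"v1")
first_slot = store.pointers[0]
store.write(0, b"v2")
print(store.read(0), first_slot, store.pointers[0])
```

      Note that the second write never reads the old contents of block 0; it lands in a different slot and only the pointer changes.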

    27. Re:Why not ZFS? by Znork · · Score: 2, Insightful

      RAID-Z uses copy-on-write to avoid RAID 5's required read for every non-cached write.

      Of course, the very same copy-on-write will also result in massive file fragmentation, carefully smearing your dbf files over the entire platters, making your SAN caches useless. Over time resulting in horrible read performance.

      ZFS is certainly a huge improvement for anyone used to ufs and disksuite, but I have to say that using it in the real world it's not all it's cracked up to be. A more layered approach would have made it easier to switch in and out features that turned out to be misfeatures in certain situations.

      Mixing together the features of various layers is, imo, no matter how tempting, simply the wrong approach. Proceed further along that road and you get to record based filesystems or even more special-purpose variants. I mean, there are even more optimizations that you can do if you know the _contents_ of the files. But once you go down that road complexity will grow with every possible different situation you need to handle and you end up either with something far too complex or something unsuitable for many cases. Better then to do the best you can without extra knowledge, code special layers for special features, assume nothing, and let the possibly competent admin add appropriate layers for appropriate data.

    28. Re:Why not ZFS? by Wonko · · Score: 3, Interesting

      Of course, the very same copy-on-write will also result in massive file fragmentation, carefully smearing your dbf files over the entire platters, making your SAN caches useless. Over time resulting in horrible read performance.

      If you want good database performance you probably want as little file system overhead as possible between your database and the disk. I wouldn't have expected ZFS to be the most efficient place to store a database.

      I would have to imagine your SAN is just doing uninformed readaheads. That would be a very good way to fill up a cache with useless data if you are reading from a fragmented file system. :)

      This issue with copy-on-write will be entirely sidestepped in a few years by flash storage's lightning fast seek times and smarter caching. IIRC, isn't the reason that zfs-fuse uses so damn much ram because ZFS has its own caching logic built in? If the file system knows where all the blocks in a file are it can do readaheads on its own.

      ZFS is certainly a huge improvement for anyone used to ufs and disksuite, but I have to say that using it in the real world it's not all it's cracked up to be.

      I don't have enough of my own real world experience with ZFS to argue with your experience. In fact, what I know of how ZFS works makes me believe that it can cause exactly the problems of which you speak.

      However, I don't think that means that there aren't a ton of workloads that wouldn't be impacted by these problems. I also believe that a large percentage of those workloads could benefit greatly from some of the features ZFS brings to the table.

      RAID-Z is nice when you need write performance but can't afford the drives for RAID 10. I can think of plenty of times when it would have been nice to have a writable snapshot to chroot into.

      Hell, I would even love to have ZFS on my laptop for snapshotting and cloning. It also seems like ZFS send/recv would make for much more efficient backups of my laptop than rsync buys me.

      Mixing together the features of various layers is, imo, no matter how tempting, simply the wrong approach. Proceed further along that road and you get to record based filesystems or even more special-purpose variants. I mean, there are even more optimizations that you can do if you know the _contents_ of the files.

      I think we are getting some pretty neat new features out of our file systems by blurring the lines between the layers. I wouldn't be surprised if we stumble upon a few more neat ideas before we're through.

      There is still quite a bit of improvement to make even before we have to make the file system aware of what is inside our files. :)

    29. Re:Why not ZFS? by QuoteMstr · · Score: 3, Informative

      I'm definitely in the layered-design-is-good, ZFS-is-an-abomination camp. But I do have to point out that mlockall would keep a userspace filesystem server from being swapped out, and with realtime priority, the process could even have some guaranteed CPU time. Userspace isn't that bad.

    30. Re:Why not ZFS? by diegocgteleline.es · · Score: 4, Informative

      One of the differences I can find between btrfs and ZFS is that ZFS explicitly avoided a fsck utility, and btrfs is explicitly designed with features designed to make fsck even more powerful than it is on usual filesystems like ext3. In btrfs, data structures have "back references", and the fsck can be used while the filesystem is mounted.

      IMO, this is a btrfs advantage. ZFS has checksums and will find errors, but will only be able to self-heal the errors in a redundant configuration. On a single disk, ZFS will find the error thanks to checksums but will not be able to recover your data. Since ZFS was mainly designed for systems that will use redundant configurations, it may make sense there, but desktops are never going to do such things. IMO the ZFS people were a bit elitist here - "let's build a filesystem so good that we won't need a fsck". But in the real world you _are_ going to need a fsck util. Only in exceptional and very rare cases, but you're going to need it.

      Of course that doesn't make ZFS a bad filesystem, but it's an advantage for btrfs and linux.
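
      The difference between detecting and healing can be sketched roughly as follows. This is a simplified model, not ZFS code: SHA-256 is used only because it is in the standard library, and read_block is a hypothetical helper.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_block(copies: list, expected: str) -> bytes:
    """Return the first copy whose checksum matches; raise if none does."""
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("checksum mismatch on every copy: detected, but no good copy left")


good = b"important data"
expected = checksum(good)        # stored separately from the data itself
bad = b"important dat\x00"       # simulated corruption

# Redundant setup: the damaged copy is detected and a healthy one is returned.
healed = read_block([bad, good], expected)

# Single disk: the damage is detected, but there is nothing to repair it from.
try:
    read_block([bad], expected)
    single_disk_ok = True
except IOError:
    single_disk_ok = False

print(healed, single_disk_ok)
```

      In both cases the checksum catches the corruption; only the redundant case has a second copy to return.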

    31. Re:Why not ZFS? by makomk · · Score: 3, Informative

      Yeah, and if you get any sort of metadata corruption, you're apparently fscked. See, for example, this thread in alt.sysadmin.recovery. Several of the posters say they basically had to manually fix the filesystem after it got screwed up - how very 1970s.

    32. Re:Why not ZFS? by segedunum · · Score: 2, Interesting

      ZFS has checksums and will find errors, but will only be able to self-heal the errors in a redundant configuration. On a single disk, ZFS will find the error thanks to checksums but will not be able to recover your data. Since ZFS was mainly designed for systems that will use redundant configurations, it may make sense there, but desktops are never going to do such things.

      I find this checksumming and self-healing interesting, but the real question is what do you actually do to really solve it? With ZFS, an awful lot of people over at OpenSolaris get excited about detecting 'bit rot', but answers are a bit thin on the ground when you ask what can be done about it or what some of the errors actually mean. Yer, you're a bit less likely to get data loss, but you can only really avoid that if you have redundancy. Also, most of the problems ZFS has detected that I have seen have, at a best guess, probably been caused by a Solaris device driver doing something no one had known about. The filesystem can't help you there, no matter how advanced it is.

      The problem is our current storage technology, and more needs to be done where the problems occur - within disk drives themselves. I'm hoping SSDs will end up giving us a better fundamental starting point when it comes to storage.

    33. Re:Why not ZFS? by jhol13 · · Score: 3, Insightful

      If a filesystem detects errors it is helping me (at least) there. No matter what creates them.

      I do not think SSDs will solve storage problems: there will be flaky adapters and other IF chips/firmware, etc.

    34. Re:Why not ZFS? by GleeBot · · Score: 2, Informative

      How do those "back references" recover your data in case of a corrupted sector? Honest question, I do not know btrfs.

      AFAIK ZFS has no fsck because there is no failure case where it would really help.

      Back references could help you reconstruct the file system tree during fsck, but if random data is getting corrupted, you're not going to get it back without redundancy (or forward error correction, I suppose, but that amounts to the same thing).

      I can't think of many scenarios where the only kind of data corruption I'm worried about is corruption to file system metadata (which is incidentally all journaling is supposed to protect you from), but who knows.

    35. Re:Why not ZFS? by BobNET · · Score: 2, Informative

      Linus specifically struck the "or any later version" clause out of the copy of the gpl2 he used to license Linux.

      That piece of text isn't part of the license itself, it's part of a separate standard notice that states that the software is copyrighted and gives permission to redistribute or modify it under the terms of the GPL. It could just as easily have said "either version 2 of the License, or (at your option) any license you want in exchange for buying Linus a beer" and still be under version 2 of the GPL.

    36. Re:Why not ZFS? by compro01 · · Score: 2, Informative

      Which is why it got edited out. Note the "oldid" bit in the URL.

      --
      upon the advice of my lawyer, i have no sig at this time
    37. Re:Why not ZFS? by harry666t · · Score: 2, Interesting

      What about kernels written in type-safe languages? (Singularity, all the Java OSs)

      In these systems, ALL the programs are run in one address space. Does it make the whole OS (not just the kernel) monolithic or what?...

    38. Re:Why not ZFS? by adrianwn · · Score: 2, Interesting

      Obviously the common definition of "microkernel" does not apply to SAS (Single Address Space) systems. The difference between Singularity and Linux is that in Linux all the modules logically belong to the kernel, while they are logically separated in Singularity: in Linux all data structures can potentially be accessed by every module; this is not the case in Singularity. Hence you can call Singularity a microkernel system, even though everything runs in the same address space.

  4. What I'd like by grasshoppa · · Score: 4, Interesting

    I would like transparent, administrator controlled, versioning. Modified a Word document and saved it in place? root can go back and get the old version (and, alternatively, the user can; root could disable this functionality).

    The pieces are in place, it's doable, just someone needs to program it.

    --
    Mod me down with all of your hatred and your journey towards the dark side will be complete!
    1. Re:What I'd like by corsec67 · · Score: 4, Interesting

      So, you want a Versioning file system? Just make sure you never let that run on /var.

      OSS is like capitalism: If you see a need, then make it and distribute it.

      --
      If I have nothing to hide, don't search me
    2. Re:What I'd like by bendodge · · Score: 4, Interesting

      That leads to space-bloat.

      What I'd like are files with expiration dates. When I make up some twiddly chart or download some funny video, I keep it because I'll probably want it tomorrow or next week, but then I tend to forget to delete it later. It would be really cool if creating a user data file prompted you with a simple dialog specifying how long you want it. Common options like 1 Week, 1 Month, 6 Months, 2 Years, Forever would do most of the time, and an option to choose a custom date would cover the rest. When a file expired, it would be placed in some kind of pseudo-Trash Bin that could be reviewed and emptied when you want more space.

      I'd also love something tag-based instead of hierarchy-based. For example, I store photos by Year > Month > Event, but sometimes I want to make another category for photos of a specific person. This means I either make duplicates or have to dig around to find things. If I could tag them with dates (that should actually be auto-generated from the EXIF), event, place, and people I could then just browse for files with a particular tag.
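
      A rough illustration of the tag idea, as a toy in-memory index rather than anything a real filesystem does. The class name `TagIndex`, its methods, and the sample paths are all made up for this sketch; the point is only that one photo can sit under several tags without being duplicated.

```python
from collections import defaultdict


class TagIndex:
    """Toy index mapping each tag to the set of file paths carrying it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, *tags):
        # One file can carry any number of tags; no copies are made.
        for t in tags:
            self._by_tag[t].add(path)

    def find(self, *tags):
        """Return the paths that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self._by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("IMG_0001.jpg", "2008", "october", "birthday", "alice")
index.tag("IMG_0002.jpg", "2008", "october", "birthday", "bob")
index.tag("IMG_0003.jpg", "2007", "june", "hiking", "alice")
alice_2008 = index.find("2008", "alice")
```

      Browsing "photos of a specific person" then becomes a set intersection instead of a second directory tree.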

      Come to think of it, these ideas are both somewhat akin to how a human brain stores stuff.

      --
      The government can't save you.
    3. Re:What I'd like by Anonymous Coward · · Score: 2, Informative

      wayback, copyfs, and ext3cow are all fairly stable versioning filesystems for linux. I'm not sure if they let you stop non-root users from getting old versions, but I don't see why you'd want people to have to ask an admin to get old versions of their files?

    4. Re:What I'd like by fuzzyfuzzyfungus · · Score: 4, Funny

      Wouldn't the world be so, so, so much nicer if users understood that the actual importance of a document is reflected in how carefully they stored it, not how angry they get when you can't get it back?

    5. Re:What I'd like by Tubal-Cain · · Score: 3, Insightful

      It sounds useful, but I think it would turn out to be about as annoying as UAC. Better to keep your files organized and prune occasionally.

    6. Re:What I'd like by EvanED · · Score: 2, Interesting

      How does the filesystem know when to create a new version? Should every byte ever written to the file be construed as a new version? If so, how does the admin figure out which precise version, out of the literally billions that would be created, is the right one?

      True, you may not be able to get it perfect, but you can get way more useful than nothing.

      For instance, many programs that work on small files (the kind you'd most want to version) don't keep the file open, and instead open the file, write to it, and close it each time you save. (Some will move the file to a backup name (eg file~), create a new file, and write to that. This is in part so you at least have the previous version in the ~ file, and in part to compensate for non-ACID file systems because there will always be at least one copy of either the old or new data at any given time.) So creating a version when a process calls fclose() is a reasonable thing to do.

      Sure, it won't work for programs that keep the file open and update it by seeking around and writing, but it will work for the vast, vast majority of the cases that at least I personally would want.
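
      For illustration, the "move the old file to a backup name, then write a new one" save pattern could look roughly like this in user space. This is a sketch, not what any particular editor does; the function name `save_with_backup` is invented here, and a versioning filesystem would do the equivalent work below the application.

```python
import os
import tempfile


def save_with_backup(path: str, data: bytes) -> None:
    """Replace `path` with `data`, keeping the previous contents in `path~`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        if os.path.exists(path):
            # Keep the old version around under the backup name.
            os.replace(path, path + "~")
        # Move the fully written new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

      At every step either the old data or the fully written new data exists on disk, which is the property the backup-name trick is after.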

      And how do you reasonably prune that wasted space?

      What you see as wasted space I see as space going to a pretty darn good use.

      As for pruning, you'd have to be fairly clever. But you could create policies that specify how long to keep old versions, how many versions to keep in a certain time period, etc. You could also pay attention to how often a file is opened, how often old versions of files are opened, etc. There's a paper on a file system called Elephant written for FreeBSD where they discuss some ideas on how to do this.

      There's also a hypothesis that at least I would agree with that things recently saved are much more likely to be useful. If you remember the "last lecture" guy Randy Pausch, he did another talk about time management in which I think he told a story about an experiment they did where the goal was to clean up the lab. People were too hesitant to throw things out because "I might need it later," so they set up a rotation of the trash bins. Things you throw out would stick around for a week, which meant that you could still safely retrieve it. But if you didn't need it within a week, you almost certainly wouldn't need it, so it was still basically safe to do. It helped a lot with cleanliness since people actually threw things away. (He said the biggest problems were when the janitors emptied trash bins at the wrong time.)

      Finally, you could restrict the versioning by the file size, so for instance it would only store past versions for files under a certain size, etc. If you set it to 200K or something that would cover almost all of the files that I would really like versioning on, and yet keep the extra space relatively low.
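
      A minimal sketch of what such a policy might look like, assuming just three knobs: a maximum age, a maximum number of versions, and a size cutoff. The names `Version`, `should_version` and `select_prunable` are hypothetical, and this is not taken from the Elephant paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    saved_at: float  # seconds since the epoch
    size: int        # bytes


SIZE_LIMIT = 200 * 1024  # the "200K or something" cutoff mentioned above


def should_version(size: int, limit: int = SIZE_LIMIT) -> bool:
    """Only keep past versions for files under the size limit."""
    return size < limit


def select_prunable(versions, now, max_age, max_keep):
    """Return the versions to delete, newest first.

    A version survives only if it is at most `max_age` seconds old and is
    among the `max_keep` newest versions that meet that age limit.
    """
    newest_first = sorted(versions, key=lambda v: v.saved_at, reverse=True)
    keep = [v for v in newest_first if now - v.saved_at <= max_age][:max_keep]
    return [v for v in newest_first if v not in keep]


history = [Version(0, 100), Version(50, 110), Version(90, 120), Version(95, 130)]
doomed = select_prunable(history, now=100, max_age=60, max_keep=2)
```

      A smarter policy could also weigh how often old versions are actually opened, as suggested above; this sketch ignores that.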

      No, what you really want is version control software.

      That may be what you want, but it's not what I want.

      At the very least, it ensures that each commit was deliberate, and represents a valid state.

      This is also a downside: it means you can't see anything but valid states.

      Personally, I would like it if things like text editors and word processors saved the entire edit history of documents, persistently. You could use a scroll bar to go through the history, saves would be marked with small tick marks, and deliberate commits would be marked with larger tick marks.

  5. I can't believe... by arrenlex · · Score: 5, Funny

    Butter FS? Are you kidding me?

    Here is your first official list of jokes. Please contribute.

    1. You're still running ext4? I can't believe it's not ButterFS!
    2. But will it run on toast?
    3. Will fsck be renamed to butterknife?
    4. If your system overheats will your filesystem melt?
    5. If you use ButterFS too much, will it turn into FAT?
    6. If you leave ButterFS on your volume too long, will your hard drive start to reek?
    7. Will the next version of ButterFS be called GoatButterFS, just like the next version of Leopard is Snow Leopard?
    8. "Tough" notebooks will never have their hard drives formatted with ButterFS, because if you dropped them, they would always land hard drive down.
    9. When you submit your dead ButterFS hard drive to a data recovery centre, will they have an intern lick it to get the data off instead of putting it under a read head?

    These are getting kind of desperate -- your turn now.

    Honestly, what is it with FOSS and crappy names? (looking at you, gimp)

    1. Re:I can't believe... by Anonymous Coward · · Score: 2, Funny

      Honestly, what is it with FOSS and crappy names? (looking at you, gimp)

      All the good ones are trademarked. And it's The Gimp, to you, mister!

    2. Re:I can't believe... by penguinchris · · Score: 3, Funny

      When your hard drive fails and you hear those awful noises, you can say it's churning butter.

    3. Re:I can't believe... by Anonymous Coward · · Score: 5, Funny

      These are getting kind of desperate -- your turn now.

      Yeah, you're spreading yourself a bit thin.

      • I hear some of the features in btrfs have been refined from ext3cow.
      • I touch'd a file on a btrfs disk, and now it's sticky!
      • I hear the standard block size of btrfs is 8 oz.
      • How can I make a business case for btrfs? I'm all for introducing new tech, but my boss only cares about how it will affect our margarins.
      • Will btrfs keep my servers from grinding? I'm a bit worried that if they churn too much, my files will separate!
      • And most importantly, In an emergency, can I use btrfs for a smoother fsck?
    4. Re:I can't believe... by RuBLed · · Score: 2, Funny

      ButterCupFS - Just when you thought it had everything built up, it will then turn you down and mess things around.

      you said for yourself that this was getting desperate

  6. Butters' FS! by russlar · · Score: 3, Funny

    Great for playing "Hello Kitty! Adventures"

    --
    Anybody want my mod points?
  7. Re:BTRFS? REALLY? by Anonymous Coward · · Score: 4, Funny

    Butter Fase probably intended as Butter Face.

    Sounds like "But Her Face" as in: She has a great body, but her face...

  8. Whoa! by aevans · · Score: 5, Funny

    A Linux article on Slashdot!?

    1. Re:Whoa! by icydog · · Score: 3, Funny

      You must be... old here.

  9. Re:BTRFS? REALLY? by initialE · · Score: 5, Insightful

    Why not? It's a good analogy for FOSS after all. Great software, robust and all, but her face...

    --
    Starbucks, Harbuckle of Breath.
  10. what's a "next generation" file system? by seanadams.com · · Score: 2, Interesting

    Something like ZFS immediately comes to mind... but is there some generally accepted definition of what makes a file system "next generation"? TFA doesn't say, and I hate to diminish anyone's efforts here, but the new features in ext4 (according to wikipedia) aren't much to write home about: higher precision time stamps, larger volumes, larger directories, faster fscking. These may be worthy accomplishments but they are incremental improvements, not anything new. Or did I miss something?

  11. Re:BTRFS? REALLY? by hampton · · Score: 5, Funny

    You're right. BTRFS is really silly. I recommend that the shortened form be ButtFS.

  12. Re:BTRFS? REALLY? by blahplusplus · · Score: 5, Insightful

    "Couldn't they come up with a better name than "BuTteR FaSe?" I know I can't be the only one who read it like that. Call it anything but that."

    I read it as:

    BeTteR FileSystem

    I guess we'll have to part ways :P

  13. Re:BTRFS? REALLY? by spazdor · · Score: 5, Funny

    Good, strong file-bearing hips!

    --
    DRM: Terminator crops for your mind!
  14. You're both right. by SanityInAnarchy · · Score: 5, Interesting

    ZFS duplicates a lot of functionality that belongs outside of a filesystem.

    Very true.

    It wouldn't be possible to duplicate RAID-Z with LVM.

    Also true.

    And the features which could be duplicated, couldn't be done nearly as well without a little more knowledge of the filesystem.

    The real problem here is that we're finding out that generic block devices aren't enough to do everything we want to do outside the filesystem itself. Or, if they are, it's incredibly clumsy. Trivial example: If I want a copy-on-write snapshot, I have to set aside (ahead of time) some fixed amount of space that it can expand into. If I guess high, I waste space. If I guess low, I have to either expand it (somehow, if that's even possible) or lose my snapshot.

    A filesystem which natively implemented COW could also trivially implement snapshots which take up exactly as much space as there are differences between the increments. But because of the way the Linux VFS is structured, this kind of functionality would have to be in a single filesystem, and would be duplicated across all filesystems. Best case, it'd be like ext3's JBD, as a kind of shared library.

    A humble proposal: We need another layer, between the block layer and the filesystem layer -- call it an extent layer -- which is simply concerned with allocating some amount of space, and (perhaps) assigning it a unique ID. Filesystems could sit above this layer and implement whatever crazy optimizations or semantics they want -- linear vs btree vs whatever for directories, POSIX vs SQL, whatever.

    The extent layer itself would only be concerned with allocating extents of some requested size, and actually storing the data. But this would be enough information to effectively handle mirroring, striping, snapshotting, copy-on-write, etc.
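
    To make the snapshot point concrete, here is a toy in-memory model of copy-on-write, where a snapshot only holds the blocks that were overwritten after it was taken. It is purely illustrative: `CowVolume` and its methods are invented names, and this is not how LVM, ZFS or btrfs actually implement snapshots.

```python
class CowVolume:
    """Toy copy-on-write volume: snapshots store only overwritten blocks."""

    def __init__(self):
        self.blocks = {}     # block number -> current data
        self.snapshots = []  # per snapshot: block number -> data at snapshot time

    def snapshot(self) -> int:
        # Taking a snapshot costs nothing up front; it starts out empty.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        # Before overwriting, preserve the old contents in every snapshot
        # that has not already saved this block.
        for snap in self.snapshots:
            if block_no not in snap:
                snap[block_no] = self.blocks.get(block_no)
        self.blocks[block_no] = data

    def read(self, block_no, snapshot_id=None):
        if snapshot_id is not None:
            snap = self.snapshots[snapshot_id]
            if block_no in snap:
                return snap[block_no]
        # Unchanged blocks are shared with the live volume.
        return self.blocks.get(block_no)

    def snapshot_cost(self, snapshot_id) -> int:
        """Number of old blocks this snapshot is holding on to."""
        return sum(1 for old in self.snapshots[snapshot_id].values() if old is not None)


vol = CowVolume()
vol.write(0, b"a")
vol.write(1, b"b")
snap = vol.snapshot()
vol.write(0, b"A")
```

    Nothing has to be reserved ahead of time: the snapshot grows by one block per block that actually changes.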

    It wouldn't be universal -- I've said nothing about the on-disk format, and, indeed, some filesystems exist on Linux solely for that purpose -- vfat, ntfs, udf, etc. Those filesystems could be done pretty much exactly the way they're done now. After all, the existence of a block layer in no way implies that every filesystem must be tied to a block device (see proc, sys, fuse, etc.)

    But I think it would work very well for filesystems which did choose to implement it. I think it would provide the best of ZFS and LVM.

    I haven't actually been seriously following filesystem development for years, so maybe this is already done. Or maybe it's a bad idea. If not, hopefully some kernel developers are reading this.

    --
    Don't thank God, thank a doctor!
    1. Re:You're both right. by Wonko · · Score: 2, Interesting

      Trivial example: If I want a copy-on-write snapshot, I have to set aside (ahead of time) some fixed amount of space that it can expand into. If I guess high, I waste space. If I guess low, I have to either expand it (somehow, if that's even possible) or lose my snapshot.

      That still only covers one deficiency of LVM snapshots. LVM snapshots are read only and intended to be temporary. I'm also pretty sure you can't snapshot a snapshot, which wouldn't be at all helpful with a read only snapshot anyway.

      A humble proposal: We need another layer, between the block layer and the filesystem layer -- call it an extent layer -- which is simply concerned with allocating some amount of space, and (perhaps) assigning it a unique ID. Filesystems could sit above this layer and implement whatever crazy optimizations or semantics they want -- linear vs btree vs whatever for directories, POSIX vs SQL, whatever.

      We'd never be able to get it right and it would probably be more likely to get in the way. We seem to be learning that we can do much niftier things by tightly coupling what used to be very separate layers.

      I haven't actually been seriously following filesystem development for years, so maybe this is already done. Or maybe it's a bad idea. If not, hopefully some kernel developers are reading this.

      I don't really believe it is a bad idea. I do think it would have to be too heavy of a layer, though. It would have to track which file systems own each extent, and if you want to come close to matching RAID-Z you are going to need to be able to return very small extents (LVM defaults to 4MB, IIRC). If a file system is going to be requesting 4k extents you're going to have a lot of overhead in storing the extent ownership and size information. You're also going to have a lot of overhead in checking who owns each extent on any given read or write. I can think of ways to optimize that a bit, but I imagine it'll still have a significant space+performance impact.
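
      A back-of-the-envelope version of that overhead argument, assuming a hypothetical 16-byte ownership record per extent and a 1 TiB volume. Both numbers are picked for illustration only; a real design would differ.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2
KIB = 1024

RECORD_BYTES = 16  # hypothetical size of one extent-ownership record


def extent_count(volume_bytes: int, extent_bytes: int) -> int:
    """How many extents of the given size fit in the volume."""
    return volume_bytes // extent_bytes


def metadata_bytes(volume_bytes: int, extent_bytes: int,
                   record_bytes: int = RECORD_BYTES) -> int:
    """Space needed to keep one ownership record per extent."""
    return extent_count(volume_bytes, extent_bytes) * record_bytes


coarse = metadata_bytes(TIB, 4 * MIB)  # LVM-sized extents
fine = metadata_bytes(TIB, 4 * KIB)    # filesystem-block-sized extents
```

      Under those assumptions, 4 MiB extents need about 4 MiB of bookkeeping per TiB, while 4 KiB extents need about 4 GiB, a factor of 1024 more before counting the lookup cost on each read or write.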

  15. If you want a blazingly fast file system.... by FlyingGuy · · Score: 2, Informative

    Then look no further than NSS ( Novell Storage Services ).

    It is Open Source, you get the full source if you download SLES.

    It has more of the desired features than anything else on the block right now.

    This should be the default file system for Linux. It has years of very heavy duty R&D behind it, it is pretty much completely de-bugged and ready to rock.

    --
    Hey KID! Yeah you, get the fuck off my lawn!
    1. Re:If you want a blazingly fast file system.... by moosesocks · · Score: 4, Interesting

      Max Volume Size: 8 TiB.

      That's not enough. Given that 1TB storage devices are on the market now, that could become outdated quite quickly. You'd be foolish to adopt that sort of filesystem, unless you were absolutely positive that you'd never upgrade (unlikely).

      Honestly, ZFS seems like it's the holy grail of filesystems. There are a few small issues that might need to be worked out, though it seems as close to "ideal" as you'd ever be able to get.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    2. Re:If you want a blazingly fast file system.... by Kent+Recal · · Score: 3, Interesting

      Well, it looks interesting feature-wise but they seem to be explicitly targeting SuSE - which is a no-go for most people.
      From a glance at the docs (hey, at least they have docs, that's a plus) it also seems like it's tied to specific versions of EVMS and other parts of the kernel, thus if you don't run a "blessed, certified" SuSE kernel with all the nasty patches then you're on your own.

      Just google for "debian|gentoo|redhat|... novell nss filesystem". Apparently nobody even tried to run NSS on another distro, or at least didn't write about it.

      I, for one, would only touch this on a blackbox, vendor-supported appliance but never consider it for a production server of my own (none of which run SuSE).
      If they worked towards integrating it into the mainline kernel, now that would be nice.

  16. Re:B-tree based Filesystem by AmberBlackCat · · Score: 3, Funny

    That's exactly what they're doing. The plan is to limit every directory to exactly two files or subdirectories that will be kept in alphabetical order. That way, you can find any file on your drive in log(n) time. Future updates are planned for people who have more than two songs by the same artist.

  17. Re:BTRFS? REALLY? by deniable · · Score: 4, Funny

    I read it as BeaterFS and wondered if it was too soon for ReiserFS jokes.

  18. when ext4 is feature complete it will be the #3 fs by ZeekWatson · · Score: 4, Interesting

    I'd like to know why Ted Ts'o and others are working on ext4? Even when ext4 is feature complete it will be the #3 filesystem in Linux in terms of features and scalability, behind xfs and jfs. I'd like to know what grudge Ted Ts'o and the others hold against xfs and jfs, because they basically won't even acknowledge those filesystems.

    btrfs does have some nice looking features, it's basically a GPL rewrite of ZFS.

    The weakness with Linux is in the LVM or EVMS layer. They both suck in that they are not enterprise ready (ie multi TB filesystems, 100+ MB/s sustained read/write) in that they cause unexplained IO hiccups, lockups and kernel panics. LVM/EVMS certainly work fine for Joe Blow's HTPC, or a paltry 100GB database, but they fall down when under serious load.

    This is the problem with open source. Certain areas, like filesystem development, attract all the developers, and other areas like LVM/EVMS are seen as busting rocks and nobody wants to work on them. The result is we get a plethora of second rate filesystems (ie ext4) and a buggy LVM/EVMS layer that nobody wants to work on.

  19. Re:B-tree based Filesystem by hitchhacker · · Score: 2, Interesting
    B-Tree:

    Not to be confused with binary tree.

    -metric

  20. Re:when ext4 is feature complete it will be the #3 by Dionysus · · Score: 2, Insightful

    I'd like to know why Ted Ts'o and others are working on ext4? Even when ext4 is feature complete it will be the #3 filesystem in Linux in terms of features and scalability, behind xfs and jfs. I'd like to know what grudge Ted Ts'o and the others hold against xfs and jfs, because they basically won't even acknowledge those filesystems.

    NIH

    --
    Je ne parle pas francais.
  21. Re:B-tree based Filesystem by Anonymous Coward · · Score: 2, Informative

    A B-Tree can have N children per node, where N is determined by the number of child links you can fit in one block. You are thinking of a binary-tree.
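
    A quick sketch of why that fanout matters, assuming 4096-byte blocks with 8-byte keys and 8-byte child pointers and ignoring any per-node header. Those sizes and the function names are assumptions for illustration, not figures from any particular filesystem.

```python
def max_children(block_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Largest n such that n pointers and n - 1 keys fit in one block."""
    # n * pointer_bytes + (n - 1) * key_bytes <= block_bytes
    return (block_bytes + key_bytes) // (key_bytes + pointer_bytes)


def levels_needed(items: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= items."""
    levels, reach = 0, 1
    while reach < items:
        reach *= fanout
        levels += 1
    return levels


fanout = max_children(4096, 8, 8)
btree_levels = levels_needed(1_000_000, fanout)
binary_levels = levels_needed(1_000_000, 2)
```

    With those numbers a node has 256 children, so a million entries are reachable in 3 levels, where a binary tree would need 20. Each level is potentially one disk read, which is the whole point of sizing nodes to a block.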

  22. Re:Ring 1 and 2? by Anonymous Coward · · Score: 3, Interesting

    Yes, IIRC Windows NT uses rings 0 and 3. However, the problem would not be made better by having more rings; the performance cost is in the transition between rings, and there is nothing special about the rings themselves. E.g. progressing from ring 10 to ring 9 is as expensive as going from ring 0 to 1, or from ring 0 to ring 100.

  23. Re:BTRFS? REALLY? by Ragzouken · · Score: 5, Funny

    This is the internet, it's never too soon.

  24. Re:Why all the fragmentation? by atraintocry · · Score: 2, Funny

    So you're saying someone should run a defrag on these filesystem projects?

  25. Reiser has time and no need to work by r00t · · Score: 3, Funny

    They feed him. They put a roof over his head.
    They even bathe him.

    He might as well devote himself to filesystems.

    1. Re:Reiser has time and no need to work by standbypowerguy · · Score: 2, Interesting

      Jail is supposed to be punitive & reflective, not fun or interesting. There are plenty of worthwhile jobs in prison... laundry, cook, librarian, janitor, license plate stamper, etc.

      --
      This isn't the sig you're looking for... Move along.
  26. What about Tux3 by obi · · Score: 2, Interesting

    While btrfs looks quite cool, I'm even more interested to see whether http://tux3.org/ will go anywhere. Let's hope both will materialise and mature soon.

  27. Re:when ext4 is feature complete it will be the #3 by Jah-Wren+Ryel · · Score: 5, Interesting

    The weakness with Linux is in the LVM or EVMS layer. They both suck in that they are not enterprise ready (ie multi TB filesystems, 100+ MB/s sustained read/write) in that they cause unexplained IO hiccups, lockups and kernel panics. LVM/EVMS certainly work fine for Joe Blow's HTPC, or a paltry 100GB database, but they fall down when under serious load.

    LVM has been rock-solid for me with a ~7TB and 2 2TB ext3 filesystems (24 500GB disks) over the course of a year and a half. No problems migrating extents all over the place when I needed to swap disks in and out. Almost identical to HPUX in functionality, but without the sizing constraints.

    But when I tried xfs for kicks I found out that a 7TB filesystem means you need 7GB of RAM to fsck it, which is impossible on a 32-bit system. I also had a week where it all went in the shitter because I ran free space to zero and started getting OS panics and data corruption.

    I'm definitely considering jfs for the next generation, my main complaint with ext3 has been ridiculously slow deletes and fsck's. Problems I have read don't exist with jfs.

    --
    When information is power, privacy is freedom.
  28. Re:Back when there was only fat16, ntfs, ext2 used by vadim_t · · Score: 5, Informative

    I hope you're joking.

    ext2 is nice and simple, but it's neither fast nor reliable. It uses a linear search to find directory entries, which means it's very slow on large directories, like Maildir mailboxes. It doesn't do tail packing, which means it wastes space and is slower with small files. It's not reliable because without a journal it needs a fsck after a bad shutdown, which takes ages on a modern disk and recovers worse than a journal would.
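
    To illustrate the directory-lookup point only, here is a toy comparison between scanning a list of entries and using an index. It models neither ext2's on-disk layout nor any real filesystem's directory index; `linear_lookup` and `build_index` are made-up names.

```python
def linear_lookup(entries, name):
    """Scan (name, inode) pairs in order; return (inode, comparisons made)."""
    comparisons = 0
    for entry_name, inode in entries:
        comparisons += 1
        if entry_name == name:
            return inode, comparisons
    return None, comparisons


def build_index(entries):
    """Build a hash index from name to inode for near-constant-time lookup."""
    return {entry_name: inode for entry_name, inode in entries}


# A Maildir-like directory with 10,000 messages.
entries = [(f"msg{i}", i) for i in range(10_000)]
inode, comparisons = linear_lookup(entries, "msg9999")
index = build_index(entries)
```

    Finding the last entry by scanning takes 10,000 comparisons, and every lookup in the directory pays a cost that grows with its size, while the indexed lookup does not.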

    Just search for benchmarks, something like reiserfs beats ext2 by huge margins when it comes to important workloads such as a mail server.

    There are very good reasons why distributions generally go with ext3, or one of the other filesystems. I haven't seen ext2 as the default option for the root FS in a very long time.

  29. buttfsck!! by Zaiff+Urgulbunger · · Score: 5, Funny

    You think that's bad? The file system check command is buttfsck!

  30. Re:Back when there was only fat16, ntfs, ext2 used by jez9999 · · Score: 4, Funny

    Just search for benchmarks, something like reiserfs beats ext2 by huge margins when it comes to important workloads such as a mail server.

    Hell, it probably beats it to death.

  31. Re:Back when there was only fat16, ntfs, ext2 used by IceCreamGuy · · Score: 4, Insightful

    Yeah, I remember they used to talk about this in the Gentoo handbook; use ext2 for /boot, but ext3 for everything that you actually care about.

  32. Re:Back when there was only fat16, ntfs, ext2 used by Chemisor · · Score: 4, Interesting

    > Just search for benchmarks, something like reiserfs beats ext2 by huge margins

    You mean like these ones where ext2 beats reiserfs in most cases and is at least as fast in the others?

    > I hope you're joking. ext2 is nice and simple, but it's neither fast nor reliable.
    > It uses a linear search to find directory entries, which means it's very slow on
    > large directories, like Maildir mailboxes.

    Believe it or not, the world does not revolve around huge mail servers. Some of us actually run Linux on a desktop, and so don't really care about how well an fs handles a million maildir mailboxes. Latency is the most important criteria, and reiserfs is just too complicated to deliver it, as well as being a largely fringe fs. Especially now with Hans gone, it would become even more fringe.

    > It doesn't do tail packing which means it wastes space and is slower with small files.

    Yup, I'd like to have efficient small file handling. But really, it is better to avoid having many small files in the first place. Use compressed archives to store such things; it's quite a bit more efficient, and does not require exotic file systems which most normal people (i.e. your customers) will not use.

    > It's not reliable because without a journal it needs a fsck after a bad shutdown

    I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.

  33. ButterFS by Keeper+Of+Keys · · Score: 2, Funny

    I can't believe it's not better.

  34. Re:BTRFS? REALLY? by not+already+in+use · · Score: 2, Funny

    Exactly. I couldn't even imagine where Linux would be right now if it weren't driven by a bunch of egotistical nerds clamoring for their own implementation of something rather than incorporating someone else's extremely capable and far more mature existing implementation.

    --
    Similes are like metaphors
  35. Re:Back when there was only fat16, ntfs, ext2 used by MBGMorden · · Score: 4, Insightful

    so I think that journalling will become obsolete in some near future.

    I bet in 1992 you were still thinking color TV's wouldn't last either . . .

    Look, a UPS is a great thing. I run one myself. Heck with more and more people switching to laptops a lot of people are running a "UPS" without even realizing it. The simple fact though is that modern processors and disks are so fast that the minimal speed impact of journaling is barely noticeable. It's certainly not worth giving up over some marginal speed gains.

    I mean we're talking about a world where people will give up tons of speed in their computer just to make the WINDOWS WOBBLE when you move them, or to make teddy bears wave at them from the system tray. Do you honestly believe that they're going to risk having their files corrupt on an unexpected power outage for a fraction of a percent increase in meaningful speed?

    --
    "People who think they know everything are very annoying to those of us who do."-Mark Twain
  36. Re:Back when there was only fat16, ntfs, ext2 used by vadim_t · · Score: 4, Insightful

    You mean like these ones where ext2 beats reiserfs in most cases and is at least as fast in the others?

    Look at the bottom of the page. That's from 2003. Of kernel 2.6.0. A lot of code changed since then.

    Believe it or not, the world does not revolve around huge mail servers. Some of us actually run Linux on a desktop, and so don't really care about how well an fs handles a million maildir mailboxes. Latency is the most important criteria, and reiserfs is just too complicated to deliver it, as well as being a largely fringe fs. Especially now with Hans gone, it would become even more fringe.

    I'm not sure what exactly you mean by this. Latency is mostly influenced by the hard disk. And on a desktop the disk shouldn't be a bottleneck anyway.

    Yup, I'd like to have efficient small file handling. But really, it is better to avoid having many small files in the first place. Use compressed archives to store such things; it's quite a bit more efficient, and does not require exotic file systems which most normal people (i.e. your customers) will not use.

    Except there's lots and lots of those files in a modern Linux system. Config files, icon files, and small libraries for instance. Additionally many files are searched in different paths, making a fast directory search important.

    I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.

    Just as a RAID is not a backup, a UPS isn't a disk journal. One of these days you'll get a long outage, or the power cable will turn out to fit badly into the power supply, or you'll have a kernel panic, or the UPS won't switch to battery fast enough, etc. And then after several minutes of fsck something important might end up broken.

    If the journal causes you a noticeable slowdown you probably aren't a typical user. In typical usage the disk should be mostly idle after boot.

    I don't see a point in going forward insanely fast without brakes. I'll take the safety. I have an UPS on every computer, and still have a journalled FS, because there were times when the UPS was of no help. Like yesterday, when I upgraded my laptop's RAM, booted it, and found that with more than 2GB RAM, the BIOS maps the video RAM above 4GB. The video card showed its displeasure with that state of affairs by corrupting the display and locking up. Had no choice but to powercycle the box.

  37. Re:Back when there was only fat16, ntfs, ext2 used by illumin8 · · Score: 5, Insightful

    I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.

    Yeah, because systems never kernel panic, or crash for any other reason than power outages... Wake me up after you've been waiting for fsck to finish on your 1TB drive and it's been running for the last 72 hours.

    Whether or not you've had a system shutdown uncleanly in the past, you certainly will at some time in the future, so why not just use ext3 and save yourself the headache of a 3 day long fsck?

    It's also painfully obvious that you've never worked as a sysadmin before. You try explaining to your manager that the reason why your company's server will take 3 days to come back online is that you wanted to save a few microseconds of latency when users were accessing files...

    --
    "When the president does it, that means it's not illegal." - Richard M. Nixon
  38. All hardware can fail, including UPSes. by Medievalist · · Score: 5, Insightful

    I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.

    Our industrial UPS (which is orders of magnitude more reliable than any APC product ever made) recently exploded, burnt, and shorted out the entire building's power. It spiked thousands of volts through the protected equipment and destroyed a half-dozen servers. The fire was fierce enough to cause our fm200 system (halon equivalent) to dump, which put out the fire before the main battery bank was breached.

    This was the first time I've ever seen an UPS bigger than a Chrysler fail, but I've seen dozens of failures from those crappy little APC units. At one time I had a stack of burnt-out ones in my basement (I used to salvage the batteries for cash).

    If your disaster survivability plan depends on any single piece of hardware never failing, it's no good. Offsite backup is your friend.

  39. Re:Back when there was only fat16, ntfs, ext2 used by mortonda · · Score: 2, Insightful

    A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.

    While a UPS is certainly a must, it does not protect you from hardware faults completely. Ever have a cap burn out on your motherboard, or lightning strike through your network?

    Or the most irritating one of all, get a static shock through the keyboard that resets the system?

  40. Re:Back when there was only fat16, ntfs, ext2 used by RAMMS+EIN · · Score: 2, Interesting

    ``Believe it or not, the world does not revolve around huge mail servers. Some of us actually run Linux on a desktop, and so don't really care about how well an fs handles a million maildir mailboxes.''

    What if I have large Maildir mailboxes on my desktop system? Or anything else that puts many files in a single directory? Just because _you_ don't need that case to be fast doesn't mean it isn't a good idea to have it be fast, anyway.

    ``Latency is the most important criteria, and reiserfs is just too complicated to deliver it''

    Excuse me? Do you have any numbers to back up that claim? Because I'm having a hard time taking it at face value.

    ``as well as being a largely fringe fs''

    A filesystem that has been included in the mainline Linux kernel for several years, is offered as a prominent choice during installation of various distros, used to be the default fs on some distros, and is widely used by people who make conscious and informed choices about which filesystem to use. But yes, if you want to call it a "fringe fs", go right ahead.

    ``Especially now with Hans gone, it would become even more fringe.''

    This, unfortunately, is all too true. ReiserFS still is a great filesystem in terms of reliability and performance, from tiny files to huge ones, under a wide range of scenarios. Reiser4 was going to be even better: faster and more flexible and extensible, with fast arbitrary attributes and a lot of other goodness. But it never made it into the mainline kernel, and, with Hans Reiser in jail, the future doesn't seem bright for Reiser4. On the other hand, there are various new contenders: ZFS, btrfs, and ext4, just to name a few. None of them seem to be quite there yet, but hey, neither was Reiser4.

    ``Yup, I'd like to have efficient small file handling. But really, it is better to avoid having many small files in the first place. Use compressed archives to store such things; it's quite a bit more efficient''

    Kindly point me at this compressed archive format that lets me fetch files (small and large) by name and other attributes more efficiently than Reiser4 or even ReiserFS. Then please point out how I can use this as I would a filesystem: so that the good old Unix software can access the files. And remember: I need random access to the file contents, and I need to be able to add, remove, write, etc. files. And if any operation is interrupted suddenly and unexpectedly, the integrity of my tree needs to be preserved. Bonus points for full data integrity preservation.

    ``The performance hit from journalling is simply too high to tolerate.''

    Performance hit from journalling? And you're using ext2 to avoid it? Your usage patterns must be very different from mine. True, ext2 running in async mode (i.e. no consistency guarantee at all) is slower than ext3 with journalling which guarantees consistency. On the other hand, with ReiserFS, I can have journalling, guaranteed consistency of at least the filesystem structure, and better performance. Plus, for some strange reason, ext3 seems to lose a lot of files on my systems (although they can be recovered by running fsck) during normal operation. Among the 3, ReiserFS is the clear winner for me. I am not disputing that you may be seeing other data, but let's at least conclude that ext2 is _not_ faster than all journalled filesystems for everyone, and that the performance hit of journalling, if any, is not "too high to tolerate" for everyone.
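
    For anyone wondering what the journal actually buys in exchange for its overhead, here is a minimal sketch of write-ahead journalling. It is a toy in-memory model, not how ext3 or ReiserFS implement it: the class `JournaledStore` and its methods are hypothetical names made up for illustration, and Python lists and dicts stand in for the on-disk journal and the main filesystem structures.

```python
import json


class JournaledStore:
    """Toy key-value store with a write-ahead journal (hypothetical names)."""

    def __init__(self):
        self.journal = []  # stands in for the on-disk journal area
        self.data = {}     # stands in for the main on-disk structures

    def begin(self, txid, updates):
        # Step 1: record the intended updates in the journal first.
        self.journal.append(json.dumps({"tx": txid, "updates": updates}))

    def commit(self, txid):
        # Step 2: a commit record marks the transaction as complete.
        self.journal.append(json.dumps({"tx": txid, "commit": True}))

    def recover(self):
        # After a crash: replay committed transactions, drop the rest.
        pending = {}
        for line in self.journal:
            rec = json.loads(line)
            if rec.get("commit"):
                self.data.update(pending.pop(rec["tx"], {}))
            else:
                pending[rec["tx"]] = rec["updates"]
        self.journal.clear()
        return sorted(pending)  # ids of transactions that were discarded


store = JournaledStore()
store.begin(1, {"inode": 7, "dirent": "a.txt"})
store.commit(1)
store.data["inode"] = 7  # only half of transaction 1 reached the main area
store.begin(2, {"inode": 8, "dirent": "b.txt"})  # crash before the commit
discarded = store.recover()
print(store.data, discarded)
```

    In this run, recovery completes transaction 1 from the journal and throws away transaction 2, so the structure is consistent either way. The cost is that every update is written twice, which is where the overhead being argued about comes from.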

    ``With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.''

    I think smart people realize that having a UPS is no guarantee that your system will never fail in the middle of a write. So a method to bring the system back to a consistent state is needed in any case. Let's also realize that journalling isn't only for recovery. It is one way to implement transactions, and transactions are useful for more than recovery alone; for example, they can be used to ensure consistency of da

    --
    Please correct me if I got my facts wrong.
  41. Re:Ring 1 and 2? by DamnStupidElf · · Score: 3, Interesting

    Not exactly. To effectively change the actual permissions that the permission rings allow, stacks, segment registers, I/O permission bitmaps, and page tables (among other things) have to be changed. Generally this means reading values from memory into caches, which is slow. Probably the slowest of them all is the cache of page-table translations (the TLB). Invalidating the entire TLB is godawful slow, and is necessary if each separate user-space has a truly private address space and not simply a chunk out of the entire virtual address space. Even for operating systems that partition the virtual address space into regions for each user process, the local descriptor (or equivalent) table for segment access needs to be reloaded. This has to happen for every cross-privilege-level call. It is *much* faster to simply call another kernel mode function (push some stuff on the stack, change the instruction pointer, and you're done) without messing with caches.

    In fact, it would be even faster to not separate the kernel and user-space processes at all, and instead use formal verification or a virtual machine (which really just means a smaller instruction set that's easier to verify) to prove that no user process could ever mess with the kernel or other processes. Virtual machines for languages are essentially at this stage today; they implement what would constitute a kernel as the run-time level portions of the virtual machine, running the virtualized software in the same address space. There have been some attacks based on virtual machine weaknesses or memory corruption that break the protection model by changing data structures so that they violate the security model. This can happen in OSes that use hardware protection as well; there are just fewer places in memory where random changes can cause problems (just the page tables and other security paraphernalia), making it less likely.
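
    To make the "smaller instruction set that's easier to verify" idea concrete, here is a rough sketch. Everything in it is invented for illustration: the four-instruction set, `MEM_SIZE`, and the `verify` and `run` functions do not come from any real virtual machine. The point is only that a static check done once, before the program runs, lets the interpreter skip per-access checks while the guest shares its address space.

```python
MEM_SIZE = 8  # size of the sandbox memory handed to the guest (assumed)

# Number of operands each instruction takes in this made-up instruction set.
ARITY = {"push": 1, "add": 0, "load": 1, "store": 1}


def verify(program):
    """Statically check a straight-line program of non-empty tuples."""
    depth = 0  # simulated operand-stack depth
    for op, *args in program:
        if op not in ARITY or len(args) != ARITY[op]:
            return False  # unknown or malformed instruction
        if op in ("load", "store"):
            addr = args[0]
            if not isinstance(addr, int) or not 0 <= addr < MEM_SIZE:
                return False  # would touch memory outside the sandbox
        if op == "add" and depth < 2:
            return False  # stack underflow
        if op == "store" and depth < 1:
            return False  # stack underflow
        depth += {"push": 1, "load": 1, "add": -1, "store": -1}[op]
    return True


def run(program, memory):
    """Interpreter with no runtime checks; only safe for verified programs."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "load":
            stack.append(memory[args[0]])
        elif op == "store":
            memory[args[0]] = stack.pop()
    return memory


good = [("push", 2), ("push", 3), ("add",), ("store", 0)]
bad = [("push", 1), ("store", 99)]  # tries to write outside the sandbox

if verify(good):
    print(run(good, [0] * MEM_SIZE))
print(verify(bad))
```

    The verifier rejects the second program before a single instruction executes, which is the software analogue of what the ring and page-table machinery enforces in hardware. As noted above, the guarantee only holds as long as nothing corrupts the verifier's own data structures.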