Slashdot Mirror


EXT4 Is Coming

ah admin writes "A series of patches was proposed earlier on the Linux kernel mailing list by a team of engineers from Red Hat, ClusterFS, IBM and Bull to extend the Ext3 filesystem to add support for very large filesystems. After a long-winded discussion, the developers came forward with a plan to roll these changes into a new version: Ext4."

27 of 182 comments

  1. Sounds like a good idea. by Ant P. · · Score: 5, Funny

    This'll fill the gap between now and when Reiser4 is declared stable - some time after Duke Nukem Forever gets released.

    1. Re:Sounds like a good idea. by CRCulver · · Score: 4, Interesting

      This'll fill the gap between now and when Reiser4 is declared stable

      Reiser4 will never be declared stable in the Linux kernel because Hans Reiser refuses to make his code conformant to kernel coding standards. There has been long and wearying discussion of this on the LKML.

    2. Re:Sounds like a good idea. by raxx7 · · Score: 3, Interesting

      There are or were a few quirks.

      First off the bat: you can't install the bootloader in an XFS partition since XFS uses the first 512 byte block on the partition. Of course, most people install the bootloader in the MBR but for some it's an issue.

      GRUB had a bug with XFS. When you tried to use an XFS partition as /boot, you could corrupt XFS.

      For a considerable period of time, ext3's code was more stable than XFS.

      ext3 has an ordered data mode (which is the default). Other journaled file systems only support writeback mode. In general, ordered data mode doesn't provide any better guarantee of consistency than writeback mode, but it does make an important difference in a few special cases, which can matter a great deal to a desktop user.

      Typical annoying case:
      - You're editing a file on your favorite text editor and you save it.
      - The editor opens the file in overwrite mode, meaning the file is actually deleted and a new one is created (under Linux's default settings, the OS will commit the changes to the metadata in 5 seconds or less and the changes to the data in 30 seconds or less).
      - The changes to the metadata are committed to disk.
      - The system crashes!
      When the system comes back up, the new file is there but it's full of garbage.

      With ext3's ordered data mode, the contents of the file would have been committed to disk before the associated changes to metadata. It's probable (but not assured!!) that after a crash you'll have either the old version or the new version of the file.
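
An application can also protect itself from the crash scenario above instead of relying on the filesystem's journaling mode. A minimal sketch, assuming a POSIX-like system: write the new contents to a temporary file, force them to disk, then rename over the old file. The function name `atomic_save` and the temporary-file prefix are made up for this example, and real editors may do this differently.

```python
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Best effort: also flush the directory entry (not possible everywhere).
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
    except OSError:
        return
    try:
        os.fsync(dir_fd)
    except OSError:
        pass
    finally:
        os.close(dir_fd)
```

The point of the ordering is the same as in ordered data mode: the data reaches the disk before the metadata change (the rename) that makes it visible.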

    3. Re:Sounds like a good idea. by hansreiser · · Score: 4, Informative

      What are you talking about? I said I didn't like the coding standards. I then had us change the code to conform to them.

  2. Yes but by Anonymous Coward · · Score: 5, Interesting
    Yes, but will it be enough if you had energy to boil all the oceans?

    Interesting bit from wiki/ZFS:
    ZFS is a 128-bit file system, which means it can provide 16 billion billion times the capacity of current 64-bit systems. The limitations of ZFS are designed to be so large that they will never be encountered in any practical operation. When contemplating the capacity of this system, Bonwick stated "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."

    In reply to a question about filling up the ZFS without boiling the ocean, Jeff Bonwick, an engineer at Sun Microsystems who led the team in developing ZFS for Solaris, offered this answer:

    "Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 1051 operations per second on at most 1031 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2128 blocks (nibbles) = 2137 bytes = 2140 bits; therefore the minimum mass required to hold the bits would be (2140 bits) / (1031 bits/kg) = 136 billion kg.

    To operate at the 1031 bits/kg limit, however, the entire mass of the computer must be in the form of pure energy. By E=mc2, the rest energy of 136 billion kg is 1.2x1028 J. The mass of the oceans is about 1.4x1021 kg. It takes about 4,000 J to raise the temperature of 1 kg of water by 1 degree Celsius, and thus about 400,000 J to heat 1 kg of water from freezing to boiling. The latent heat of vaporization adds another 2 million J/kg. Thus the energy required to boil the oceans is about 2.4x106 J/kg * 1.4x1021 kg = 3.4x1027 J. Thus, fully populating a 128-bit storage pool would, literally, require more energy than boiling the oceans."
    1. Re:Yes but by Anonymous Coward · · Score: 4, Informative

      That post makes more sense if you realize that there should be ^ marks to show exponentiation, such as 10^51 and 2^140. Otherwise it just looks like gibberish numbers that someone made up and stuck in the wiki for shits and giggles.
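
With the carets restored (10^51, 10^31, 2^128, 2^137, 2^140, 1.2x10^28, and so on), the arithmetic in the quoted passage can be checked directly. A rough sketch, using only the figures quoted above plus a rounded speed of light, which is an assumption since the post doesn't state one:

```python
# Figures as quoted in the post, with the exponents restored.
BITS_IN_POOL = 2 ** 140            # 2^128 blocks = 2^137 bytes = 2^140 bits
BITS_PER_KG = 10 ** 31             # quoted limit on bits per kilogram
SPEED_OF_LIGHT = 3.0e8             # m/s, rounded (assumption, not in the post)
OCEAN_MASS_KG = 1.4e21
HEAT_TO_BOIL_J_PER_KG = 4.0e5      # about 4,000 J per kg per degree, 100 degrees
LATENT_HEAT_J_PER_KG = 2.0e6

# Minimum mass needed to hold the bits at the quoted density limit.
min_mass_kg = BITS_IN_POOL / BITS_PER_KG
# Rest energy of that mass, E = m * c^2.
rest_energy_j = min_mass_kg * SPEED_OF_LIGHT ** 2
# Energy to heat the oceans from freezing to boiling and then vaporize them.
boil_energy_j = (HEAT_TO_BOIL_J_PER_KG + LATENT_HEAT_J_PER_KG) * OCEAN_MASS_KG

print(f"minimum mass:  {min_mass_kg:.3g} kg")
print(f"rest energy:   {rest_energy_j:.3g} J")
print(f"boil oceans:   {boil_energy_j:.3g} J")
print(f"ratio:         {rest_energy_j / boil_energy_j:.2f}")
```

The mass comes out on the order of 10^11 kg, a little above the 136 billion kg in the quote, and the rest energy exceeds the ocean-boiling figure by a small multiple, which is the claim being made.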

  3. LWN article on ext4 by ElMiguel · · Score: 5, Informative

    LWN had an interesting article on ext4 not long ago.

  4. ClusterFS by schon · · Score: 5, Funny

    engineers from Red Hat, ClusterFS, IBM

    OK, hands up - who wants to run ClusterFS so that they can say they needed to do a "clusterfsck"?

  5. Re:Modularizable filesystem by Bogtha · · Score: 5, Informative
    --
    Bogtha Bogtha Bogtha
  6. LKML Message by Anonymous Coward · · Score: 3, Informative

    The kernel mailing list message:

    Subject: Proposal and plan for ext2/3 future development work
    From: "Theodore Ts'o"
    Date: Wed, 28 Jun 2006 19:55:39 -0400

    Given the recent discussion on LKML two weeks ago, it is clear that many
    people feel they have a stake in the future development plans of the
    ext2/ext3 filesystem, as it is one of the most popular and commonly used
    filesystems, particularly amongst the kernel development community. For
    this reason, the stakes are higher than they would be for other
    filesystems. The concerns that were expressed can be summarized in the
    following points:

    * Stability. There is a concern that while we are adding new
    features, bugs might cause developers to lose work.
    This is particularly a concern given that 2.6 is a
    "stable" kernel series, but traditionally ext2/3
    developers have been very careful even during
    development series since kernel developers tend to get
    cranky when all of their filesystems get trashed.

    * Compatibility confusion. While the ext2/3 superblock does
    have a very flexible and powerful system for
    indicating forwards and backwards compatibility, the
    possibility of user confusion has caused concern by
    some, to the point where there has been one proposal
    to deliberately break forwards compatibility in order
    to remove possible confusion about backwards
    compatibility. This seems to be going too far,
    although we do need to warn against kernel and
    distribution-level code from blindly upgrading users'
    filesystems and removing the ability for those
    filesystems to be mounted on older systems without an
    explicit user approval step, preferably with tools
    that allow for easy upgrading and downgrading.

    * Code complexity. There is a concern that unless the code is
    properly factored, that it may become difficult to
    read due to a lot of conditionals to support older
    filesystem formats.

    Unfortunately, these various concerns were sometimes mixed together in
    the discussion two months ago, and so it was hard to make progress.
    Linus's concern seems to have been primarily the first point, with
    perhaps a minor consideration of the 3rd. Others dwelled very heavily
    on the second point.

    To address these issues, after discussing the matter amongst ourselves,
    the ext2/3 developers would like to propose the following path forward.

    1) The creation of a new filesystem codebase in the 2.6 kernel tree in /usr/src/linux/fs/ext4 that will initially register itself as the
    "ext3dev" filesystem. This will be explicitly marked as an
    CONFIG_EXPERIMENTAL filesystem, and will in effect be a "development
    f
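
The "Compatibility confusion" point in the message above refers to the feature sets recorded in the ext2/3 superblock: compatible features (unknown ones can be ignored), read-only compatible features (unknown ones still allow a read-only mount), and incompatible features (unknown ones mean the filesystem must not be mounted). A simplified sketch of that decision, not the kernel's actual code; the bit values and names below are invented for illustration:

```python
from enum import Enum


class MountDecision(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"
    REFUSE = "refuse"


# Hypothetical bit assignments, for illustration only.
RO_COMPAT_KNOWN = 0x1
RO_COMPAT_NEWER = 0x2
INCOMPAT_KNOWN = 0x1
INCOMPAT_NEWER = 0x2


def mount_decision(ro_compat: int, incompat: int,
                   supported_ro_compat: int,
                   supported_incompat: int) -> MountDecision:
    """Decide how an older driver may mount a filesystem with these flags."""
    # Any incompatible feature the driver does not know: do not mount at all.
    if incompat & ~supported_incompat:
        return MountDecision.REFUSE
    # Unknown read-only-compatible feature: reading is safe, writing is not.
    if ro_compat & ~supported_ro_compat:
        return MountDecision.READ_ONLY
    # Unknown "compatible" features are ignored, so they play no role here.
    return MountDecision.READ_WRITE
```

For example, `mount_decision(RO_COMPAT_KNOWN | RO_COMPAT_NEWER, INCOMPAT_KNOWN, RO_COMPAT_KNOWN, INCOMPAT_KNOWN)` gives `MountDecision.READ_ONLY`: the older driver can still read the filesystem, which is the kind of backwards compatibility the message is worried about silently losing.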

  7. Re:define very large by Kjella · · Score: 5, Insightful

    Let me put it this way, it's a little past the average slashdot porn collection:

    ext3: 8TB total, 4TB files
    ext4: 32 zettabyte (1024*1024*1024 TB), 1 exabyte files (1024*1024 TB)

    Beyond that, it doesn't seem to actually change much.

    --
    Live today, because you never know what tomorrow brings
  8. Re:How does it compare to zfs? by Ignominious Cow Herd · · Score: 3, Insightful

    Ummm...zfs exists, ext4 doesn't. Yet.

    --
    Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
  9. Why EXT4 ? by Anonymous Coward · · Score: 4, Interesting

    Ext4 is an extension of ext3, much like ext3 is an extension of ext2. The plan is to ensure backwards compatibility and sanity for when things break, and with filesystems.. things break.

    There are many factors that influence filesystems, not just "how fast it can write", but rather.. how it breaks when it does.

    While the fanboys of XFS, JFS, ZFS may promise that their filesystems are faster, have no problems, are secure and will not eat your data, they simply are not as proven as ext2 and ext3.

    Scream fanboys scream, someone will listen, but the problem is that these filesystems are not proven in the field, or in some circumstances even in the kernel itself.

    1. Re:Why EXT4 ? by Frumious Wombat · · Score: 4, Informative

      Actually, XFS (SGI), JFS (IBM), and ZFS (Sun) are very well proven in the field, on their respective native operating systems. Given the situations they're used in (financial sector, pharmaceutical research data, supercomputing), they're far more proven than EXT(anything). Now, whether the average Linux user knows how to install, tune, and use them is a different issue, but if I were worried about scalable, mission-critical, filesystems, those three would be on the top of my list. (and my personal history says that while XFS never gave me any trouble, JFS would be my first choice. Nobody ever let me have a budget large enough to buy a machine that would justify ZFS).

      With IBM's know-how in the mix, EXT4 may be able to join the above three, but it would seem to be time better spent fixing XFS/JFS support in Linux first, rather than worrying about backwards compatibility with EXT2.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    2. Re:Why EXT4 ? by Znork · · Score: 3, Informative

      "ZFS (Sun) are very well proven in the field"

      Um, I have yet to see a production installation of ZFS in an enterprise environment, and it hasn't been out as an actual release for even a year yet. You probably mean UFS. HTH.

  10. Re:Why only 48 bits? by r00t · · Score: 4, Interesting

    With a block size of 32 kB (64 kB is expected to be supported soonish) the 48-bit numbers will take you 1 byte over the maximum file size that apps can support. There is no UNIX-like OS that lets an app handle files bigger than 2**63.

    We'll need to adjust other things if filesystems ever get so huge. The whole design probably needs a rethink, but we can't do it now. We don't know what the future holds in terms of seek times, transfer rates, sector sizes, etc.
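
The "1 byte over" remark follows from simple arithmetic on block numbers and block sizes. A small sketch; the 4 kB rows are an added assumption for comparison, since the comment itself only mentions 32 kB and 64 kB blocks:

```python
KIB = 2 ** 10
TIB = 2 ** 40
EIB = 2 ** 60

# Largest file offset an application can express with a signed 64-bit off_t.
OFF_T_MAX = 2 ** 63 - 1


def max_bytes(block_number_bits: int, block_size: int) -> int:
    """Bytes addressable with this many block-number bits and this block size."""
    return (2 ** block_number_bits) * block_size


limit_48bit_32k = max_bytes(48, 32 * KIB)   # 2^48 blocks * 2^15 bytes = 2^63
limit_48bit_4k = max_bytes(48, 4 * KIB)     # 2^48 blocks * 2^12 bytes = 2^60
limit_32bit_4k = max_bytes(32, 4 * KIB)     # 2^32 blocks * 2^12 bytes = 2^44

print("48-bit, 32 kB blocks:", limit_48bit_32k, "bytes")
print("  bytes past OFF_T_MAX:", limit_48bit_32k - OFF_T_MAX)
print("48-bit, 4 kB blocks: ", limit_48bit_4k // EIB, "EiB")
print("32-bit, 4 kB blocks: ", limit_32bit_4k // TIB, "TiB")
```

So with 32 kB blocks, 48-bit block numbers already reach 2**63 bytes, exactly one more than the largest offset a signed 64-bit value can hold.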

  11. Re:Modularizable filesystem by Bogtha · · Score: 4, Insightful

    the premise that Reiser is more stable than ext3 "because it has been out longer"

    It's dishonest to put something in quotes when it's not a direct quote. The exact quote is:

    "We don't touch the V3 code except to fix a bug, and as a result we don't get bug reports for the current mainstream kernel version. It shipped before the other journaling filesystems for Linux, and is the most stable of them as a result of having been out the longest. We must caution that just as Linux 2.6 is not yet as stable as Linux 2.4, it will also be some substantial time before V4 is as stable as V3."

    There's a substantial difference between saying that something is more stable "as a result" of something and more stable "because" of something. He's not claiming that being out longer intrinsically makes it more stable as your misquote suggests, he's claiming that it led to reiserfs becoming more stable - because of the practices he mentioned.

    In short - something being out longer == more stable? No. Something being exposed to lots of real-world use and receiving only bugfixes == more stable? Yes.

    the quote from Adam Smith

    He didn't quote Adam Smith, he drew an analogy between what he was saying and the network effect. It's an entirely reasonable analogy.

    the ridicule of the unix approach of everything as a file

    What ridicule? He's actually supporting that approach. For example:

    Can we do everything that can be done with {files, directories, attributes, streams} using just {files, directories}? I say yes--if we make files and directories more powerful and flexible. I hope that by the end of reading this you will agree.

    Would you care to point out where you thought he was ridiculing the UNIX approach?

    all the naked people covered in newsprint

    Yeah, they look dumb, don't they?

    Anyone have a "more technical" link

    I can only assume you mean something other than "technical".

    without dancing trees

    Dancing trees are a fundamental part of the design. How are you meant to understand the filesystem without understanding dancing trees?

    and with a bit about how to recover your filesystem when something goes weird with the hardware even if the filesystem is perfect?

    Ah, you don't mean technical at all, you mean practical for somebody who is entirely uninterested in the way the filesystem works. Perhaps Reiser4 Transaction Design Document is what you are after, but I doubt it.

    --
    Bogtha Bogtha Bogtha
  12. Pattern by Eudial · · Score: 4, Funny

    Ext2...Ext3...Ext4

    Wait... I think I can detect a pattern. The next number has to be Ext7½!

    --
    GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
  13. Re:define very large by runep · · Score: 3, Funny
    Let me put it this way, it's a little past the average slashdot porn collection:
    I think you underestimate the combination of lonely geeks, OCD, unemployment, broadband and wget.
  14. fsck quality by r00t · · Score: 5, Informative

    Nobody has a fsck that can compare to e2fsck (ext2/ext3/etc.) for quality.

    The e2fsck program has a huge test suite that it must pass before a release. A set of corrupted filesystems must be correctly repaired to be bit-for-bit identical to the desired result.

    A typical fsck has a good chance of crashing (SIGSEGV, the "segmentation violation") when the going gets tough.

    While FreeBSD's UFS developers were messing around with sync writes to avoid testing a fsck that would often crash, the ext2 developers ran full async and wrote a damn fine fsck to put things back in order. Now you can choose from three different levels of journalling, and you still get the ass-kicking fsck program.

    There basically is no fsck for XFS, Reiserfs, or Reiser4. JFS doesn't have much AFAIK, and ZFS is a newborn.

    What are you going to do when your fancy filesystem gets trashed? I hope you keep excellent backups, very recent and tested to be readable.
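
The kind of regression test described above (repair a known-corrupted image, then compare the result byte for byte with a known-good one) can be sketched in a few lines. This is a toy illustration only, not how the e2fsck test suite is actually implemented: the "image" format, `checksum`, `toy_repair` and `repair_matches_golden` are all hypothetical.

```python
from typing import Callable


def checksum(payload: bytes) -> int:
    """Toy checksum: sum of the payload bytes modulo 256."""
    return sum(payload) % 256


def toy_repair(image: bytes) -> bytes:
    """Hypothetical repair tool: recompute the trailing checksum byte."""
    payload = image[:-1]
    return payload + bytes([checksum(payload)])


def repair_matches_golden(repair: Callable[[bytes], bytes],
                          corrupted: bytes, golden: bytes) -> bool:
    """True if repairing `corrupted` yields exactly the expected image."""
    return repair(corrupted) == golden


# A toy "filesystem image": payload followed by one checksum byte.
payload = b"superblock+inodes"
golden = payload + bytes([checksum(payload)])
# Corrupt the image by storing a wrong checksum.
corrupted = payload + bytes([(checksum(payload) + 1) % 256])

print(repair_matches_golden(toy_repair, corrupted, golden))
```

A real suite would keep many such corrupted images, one per failure mode, and run all of them before every release.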

  15. Re:define very large by glwtta · · Score: 3, Insightful

    ext3: 8TB total, 4TB files
    ext4: 32 zettabyte (1024*1024*1024 TB), 1 exabyte files (1024*1024 TB)


    Are they just going to work on improving the 8TB paper limitation, or are they actually trying to improve on ext3 scalability? Which currently tends to suck the big one, especially on a significant number of disks (eg: http://scalability.gelato.org/DiskScalability/Results).

    I also seem to keep coming up against a pretty hard 2TB block device limit in Linux (eg LVM2 lv size, LUN size for fibre attached SAN, etc). I don't really know what the reasons for it are, anyone know what technologies allow for larger single partitions?

    Anyway, I've long ago settled on reiserfs (3) for speedy random access to small files, and XFS for file server type applications; though I still wonder why RedHat doesn't include any "enterprise" filesystems by default in their "enterprise" products (I know, I know, you can enable it - I did say "by default").

    --
    sic transit gloria mundi
  16. Linux and other Unix FSes by digitalhermit · · Score: 3, Insightful

    I'm as big a Linux fan as anyone, but one glaring thing that it needs is some better filesystem tools. Don't get me wrong -- they've come a long way in the last couple years -- but compared to something like AIX it still has a little ways to go. Here's one feature that causes a challenge: Linux filesystems and the underlying logical volume layer are largely decoupled. You have an immense amount of flexibility but as a consequence, the filesystem and volume layers don't always communicate as well. For example, the AIX JFS2 tools allow you to dynamically grow/shrink filesystems. This functionality exists in Linux for some filesystems (EXT3, ReiserFS) but the procedure varies depending on how the filesystem is constructed. And at this point, I'm not fully convinced of its stability as I've recently (three months ago) lost an entire disk after a dynamic resize on an LVM backed EXT3 partition. I have yet to reproduce the failure but it occurred with a 95% full /home and a kernel compile going full tilt.

    But I'm amazed at how quickly these features are being integrated. There's functionality in Linux that allows me to easily create file-backed volumes, remote volumes, SAN LUNs, etc.. The "resize in a single command" is not fully there yet, but within 6 months I'd expect it to be.

    1. Re:Linux and other Unix FSes by Homology · · Score: 3, Insightful

      >I'm as big a Linux fan as anyone, but one glaring thing that it needs is some better filesystem tools.

      I'm pretty certain that Linux would have better filesystem tools if the developers could resist adding a new filesystem every few months.

  17. Re:define very large by Kjella · · Score: 3, Informative

    From what I understood the sector index will be configurable as either 32 or 64 bit, so pick it if you need it... Since there's no reason to use it unless the disk is that big, I imagine this can be set automatically. Also, the whole reason this will be ext4 is that they'll change the way it stores the sectors (ranges instead of singles) which will be better for big files, and since one sector is 4kB almost any file is "big".

    --
    Live today, because you never know what tomorrow brings
  18. Re:Well, how does a Honda Civic ... by DavidS · · Score: 3, Informative

    This is simply not true. ZFS is not just for big iron. Its strongest feature is perhaps the melding of the volume manager and RAID into one single unit, which greatly simplifies administration. Not to mention other nice features, either new or greatly simplified from their past versions, such as pooling, dynamic striping, CoW, instant snapshots and cloning, fault tolerance, etc.

    I'd suggest reading through these links before spreading more mis-information:

    http://unixconsult.org/zfs_vs_lvm.html - ZFS vs. Linux Raid vs. Linux LVM vs. Linux LVM + Raid

    http://uadmin.blogspot.com/2006/05/why-zfs-for-home.html - Why ZFS for home

    dks

  19. Re:Well, how does a Honda Civic ... by DavidS · · Score: 4, Informative

    This is true, but let's look at the case of 1-2 drives:

    Assuming we still want mirroring or volume management on our two drives:
    The overhead is still greater for SVM or for linux md and sistina lvm. Both require more administration knowledge, time, and commands to accomplish the same tasks that ZFS can do in a couple commands. (Yes, I'm aware that mdadm helps the process a *bit*, but it's still obtuse.) Anyone who has set up either knows how annoying anything is with either choice. (having to micromanage partitions, etc.)

    The biggest thing for ZFS in a ``small'' 1-2 drive usage case is, in my opinion, the pooling: ZFS doesn't require one to set volume sizes in advance. Since everything pulls out of a common pool, the size of volumes can grow or shrink accordingly. (Affected by free pool space or volume quotas.) So, that means that one can just create their volumes, and not have to worry about making them the wrong size.

    I'd also argue that fault tolerance is important anywhere, large or small.

    Another thing is on-disk, low overhead, compression that can be enabled just by toggling one filesystem parameter, live. For a lot of things that people store, this compression would save a lot of space.

    They really put a lot of thought in ZFS. It scales amazingly well, from small to large. I'm not really doing it justice explaining it here, so I'd encourage you to look at the documentation with an open mind before just writing it off as an ``enterprise only'' thing.

    dks
    (I have no affiliation with Sun in any way.)

  20. Re:My take on current filesystems by waferhead · · Score: 3, Insightful

    "I consider it to be about as stable as XFS."

    I have had my /video and /home partitions on XFS for... WAY too long, several years, same drives.
    (I just keep adding on)

    I lose power a lot where I live (glitches) and XFS has been utterly bullet proof.

    (This filesystem has been thru 3 motherboards, several linux distros (1 mb dead/2 upgrades), 2 cases, and so on)

    If Reiser4 is about as stable as XFS, I'll gladly switch everything over tomorrow on my MythTV box.