EXT4 Is Coming

Yes but by Anonymous Coward · 2006-07-01 01:40 · Score: 5, Interesting

Yes, but would it be enough if you had the energy to boil all the oceans?

Interesting bit from wiki/ZFS:

ZFS is a 128-bit file system, which means it can provide 16 billion billion times the capacity of current 64-bit systems. The limitations of ZFS are designed to be so large that they will never be encountered in any practical operation. When contemplating the capacity of this system, Bonwick stated "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."

In reply to a question about filling up ZFS without boiling the oceans, Jeff Bonwick, an engineer at Sun Microsystems who led the team developing ZFS for Solaris, offered this answer:

"Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10^51 operations per second on at most 10^31 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2^128 blocks (nibbles) = 2^137 bytes = 2^140 bits; therefore the minimum mass required to hold the bits would be (2^140 bits) / (10^31 bits/kg) = 136 billion kg.

To operate at the 10^31 bits/kg limit, however, the entire mass of the computer must be in the form of pure energy. By E=mc^2, the rest energy of 136 billion kg is 1.2x10^28 J. The mass of the oceans is about 1.4x10^21 kg. It takes about 4,000 J to raise the temperature of 1 kg of water by 1 degree Celsius, and thus about 400,000 J to heat 1 kg of water from freezing to boiling. The latent heat of vaporization adds another 2 million J/kg. Thus the energy required to boil the oceans is about 2.4x10^6 J/kg * 1.4x10^21 kg = 3.4x10^27 J. Thus, fully populating a 128-bit storage pool would, literally, require more energy than boiling the oceans."
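
For anyone who wants to check the arithmetic in the quote, here is a rough Python sketch. Every input is taken from the quote itself except the speed of light, which is rounded to 3x10^8 m/s as an assumption. With these rounded inputs the mass comes out near 1.4x10^11 kg, close to the quoted 136 billion kg, and the conclusion is unchanged.

```python
# Re-derive the figures in the quoted estimate. Inputs come from the quote,
# except C, which is a rounded value assumed here for illustration.
BITS = 2 ** 140                  # bits in a fully populated pool, per the quote
BITS_PER_KG = 1e31               # storage limit cited from Lloyd (2000)
C = 3.0e8                        # speed of light in m/s (rounded assumption)

mass_kg = BITS / BITS_PER_KG                 # minimum mass to hold the bits
rest_energy_j = mass_kg * C ** 2             # E = m * c^2

OCEAN_MASS_KG = 1.4e21
HEAT_TO_BOIL_J_PER_KG = 4.0e5    # about 4,000 J per kg per degree, over 100 degrees
LATENT_HEAT_J_PER_KG = 2.0e6     # latent heat of vaporization, per the quote
boil_energy_j = (HEAT_TO_BOIL_J_PER_KG + LATENT_HEAT_J_PER_KG) * OCEAN_MASS_KG

print(f"mass needed:        {mass_kg:.3g} kg")
print(f"rest energy:        {rest_energy_j:.3g} J")
print(f"energy to boil sea: {boil_energy_j:.3g} J")
print("pool needs more energy than boiling:", rest_energy_j > boil_energy_j)
```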

Modularizable filesystem by Square+Snow+Man · 2006-07-01 01:48 · Score: 2, Interesting

What about a modularizable filesystem, which can be upgraded with modules for compression, encryption, larger file support, etc.? Is this impossible, or is it an unknown area for the Linux developers?

Re:Sounds like a good idea. by Anonymous Coward · 2006-07-01 02:00 · Score: 2, Interesting

It's BS that people think it should be considered stable. Other than XFS under very heavy writes, nothing has given me more corruption than Reiser4. It needs at least another year. ext3 on its own, though not awesome in all areas, hasn't lost me any data yet.

Why EXT4 ? by Anonymous Coward · 2006-07-01 02:36 · Score: 4, Interesting

Ext4 is an extension of ext3, much like ext3 is an extension of ext2. The plan is to ensure backwards compatibility and sanity for when things break, and with filesystems, things break.

There are many factors that influence filesystems, not just "how fast it can write", but rather.. how it breaks when it does.

While the fanboys of XFS, JFS, and ZFS may promise that their filesystems are faster, have no problems, are secure, and will not eat your data, they simply are not as proven as ext2 and ext3.

Scream fanboys scream, someone will listen, but the problem is that these filesystems are not proven in the field, or in some circumstances even in the kernel itself.

Re:Why EXT4 ? by Carewolf · 2006-07-01 03:53 · Score: 2, Interesting

In enterprise.. Exactly!

Note that servers with extensive mirroring and other hardware error-handling rarely need error recovery from the filesystem. Filesystem errors happen on ordinary people's hard drives when they grow old, and ext* has a million times more experience in handling those than any enterprise FS.

Re:Why only 48 bits? by r00t · 2006-07-01 02:59 · Score: 4, Interesting

With a block size of 32 kB (64 kB is expected to be supported soonish) the 48-bit numbers will take you 1 byte over the maximum file size that apps can support. There is no UNIX-like OS that lets an app handle files bigger than 2**63.
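
The "1 byte over" claim can be checked with a few lines of Python. This is only a sketch of the arithmetic in the comment; the assumption (not stated in the comment) is that file offsets are signed 64-bit integers, so the largest offset an app can use is 2**63 - 1.

```python
# 48-bit block numbers with 32 kB blocks, as described in the comment.
BLOCK_NUMBER_BITS = 48
BLOCK_SIZE = 32 * 1024                      # 32 kB = 2**15 bytes

max_fs_bytes = (2 ** BLOCK_NUMBER_BITS) * BLOCK_SIZE

# Assumption: file offsets are signed 64-bit integers.
max_signed_offset = 2 ** 63 - 1

print(max_fs_bytes == 2 ** 63)
print("bytes beyond the offset limit:", max_fs_bytes - max_signed_offset)
```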

We'll need to adjust other things if filesystems ever get so huge. The whole design probably needs a rethink, but we can't do it now. We don't know what the future holds in terms of seek times, transfer rates, sector sizes, etc.

Re:Sounds like a good idea. by CRCulver · 2006-07-01 03:14 · Score: 4, Interesting

This'll fill the gap between now and when Reiser4 is declared stable

Reiser4 will never be declared stable in the Linux kernel because Hans Reiser refuses to make his code conformant to kernel coding standards. There has been long and wearying discussion of this on the LKML.

Re:define very large by zlogic · 2006-07-01 03:50 · Score: 2, Interesting

Though this may be needed in some rare applications, I don't see ext4 as something needed in the near future. As I understand it, the larger the maximum partition and file size, the more space the indexes will need (not to mention that speed will probably drop).
For example, if we have 20-bit indexes (2^20 clusters max) and use 4-kilobyte clusters, then to increase the maximum space we'll either have to add one bit to the indexes to double it, or increase the cluster size and have problems storing small files (remember the FAT16->FAT32 transition?).
ext4's limits are thousands of times larger than ext3's, which will probably mean that indexes will need a lot more space, which will be bad for 8TB volumes (and besides, no one would notice any benefits!)
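
The trade-off in the 20-bit example can be made concrete with a small Python sketch. The function names are made up for illustration, and the model is deliberately simplistic: one fixed-width index per cluster, with every file rounded up to whole clusters.

```python
def max_volume_bytes(index_bits: int, cluster_bytes: int) -> int:
    """Largest volume addressable with fixed-width cluster indexes."""
    return (2 ** index_bits) * cluster_bytes

def space_used(file_bytes: int, cluster_bytes: int) -> int:
    """Disk space a file occupies when rounded up to whole clusters."""
    clusters = -(-file_bytes // cluster_bytes)   # ceiling division
    return clusters * cluster_bytes

GIB = 2 ** 30
print(max_volume_bytes(20, 4096) // GIB)   # baseline: 20-bit indexes, 4 KB clusters
print(max_volume_bytes(21, 4096) // GIB)   # one more index bit doubles capacity
print(max_volume_bytes(20, 8192) // GIB)   # so does doubling the cluster size...
print(space_used(100, 4096), space_used(100, 8192))  # ...but small files waste more
```

Either route doubles the capacity; the first costs wider indexes, the second costs slack space on every small file.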

What they really should do is by Anonymous Coward · 2006-07-01 05:10 · Score: 1, Interesting

In ext4 they should get rid of some legacy stuff to foster development and usage of new technologies. The users of legacy technologies could still use ext3 and it would be very nice for ext4 users. I'm talking mostly about dropping support for the old-style octal file access permissions system, making the ACL system the default, and enabling the metadata features by default.

The fact that nothing ever pressures distribution builders into using anything new has led to a major slowdown in Linux development.

Re:Sounds like a good idea. by mnmn · 2006-07-01 09:05 · Score: 2, Interesting

Who cares? Linux has more than its fair share of filesystems, including XFS. I'm still wondering why XFS isn't used universally on desktop and server Linux installations everywhere. Is ext2/3 just 'traditional'?

--
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky

Re:Well, how does a Honda Civic ... by Anonymous Coward · 2006-07-01 09:31 · Score: 1, Interesting

"ZFS is an exotic beast with a totally ridiculous maximum capacity and tons of advanced features that do not exist in any other Unix filesystem, but are only useful for Big Iron."

Actually, except for its highly advanced algorithms, ZFS code is very small and simple, and on top of that, ZFS is really nice in small desktop deployments, where its "big iron" features give it the ability to detect and automatically correct garbage being delivered by that cheap SATA drive.

In fact, having been ported (compiles, doesn't yet run) to Linux and in the process of being ported to OS X and FreeBSD, ZFS is on a pretty good track to becoming ubiquitous... which would be the exact opposite of exotic.

Re:Sounds like a good idea. by raxx7 · 2006-07-01 11:08 · Score: 3, Interesting

There are or were a few quirks.

First off: you can't install the bootloader in an XFS partition, since XFS uses the first 512-byte block on the partition. Of course, most people install the bootloader in the MBR, but for some it's an issue.

GRUB had a bug with XFS. When you tried to use an XFS partition as /boot, you could corrupt XFS.

For a considerable period of time, ext3's code was more stable than XFS.

ext3 has an ordered data mode (which is the default). Other journaled file systems only support writeback mode. In general, ordered data mode doesn't provide any better guarantee of consistency than writeback mode, but it does matter in a few special cases, and those can make a substantial difference to a desktop user.

Typical annoying case:
- You're editing a file in your favorite text editor and you save it.
- The editor opens the file in overwrite mode, meaning the file is actually deleted and a new one is created (under Linux's default settings, the OS will commit the changes to the metadata in 5 seconds or less and the changes to the data in 30 seconds or less).
- The changes to the metadata are committed to disk.
- The system crashes!
When the system comes back up, the new file is there, but it's full of garbage.

With ext3's ordered data mode, the contents of the file would have been committed to disk before the associated changes to metadata. It's probable (but not assured!!) that after a crash you'll have either the old version or the new version of the file.
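
An application can also protect itself from this failure mode regardless of the filesystem's journaling mode. The Python sketch below is not something the comment describes; it is the common "write a temporary file, fsync it, then rename over the original" pattern, and the function name safe_save is made up for illustration. It forces the data to disk before the rename makes the new file visible, which is the same ordering that ordered data mode provides.

```python
import os
import tempfile

def safe_save(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the data to disk first
        os.replace(tmp_path, path)      # then atomically swap the name
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    target = os.path.join(demo_dir, "notes.txt")
    safe_save(target, b"first version\n")
    safe_save(target, b"second version\n")
    with open(target, "rb") as f:
        print(f.read())
```

A stricter version would also fsync the containing directory after the rename; that step is left out here to keep the sketch short.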

Re:Sounds like a good idea. by szap · 2006-07-01 16:29 · Score: 2, Interesting

Just a quick chime in, take it with a grain of salt. Some rambling thoughts.

I've just converted my main partition (non-/boot) on a notebook from XFS to reiser3, mainly because I work with huge svn working copies and svn loves to keep small files around, as well as create lots of small files (lock files, etc.) during routine svn work. xfs is just considerably slower than reiserfs for svn status, update, commit, cleanup. Besides, reiser3's tail feature means svn's penchant for small files uses less space overall on my tiny notebook hard drive. Not sure if performance of reiser3 will degrade over time (I've been on xfs on this partition for longer than a year), but we'll see.

BTW, http://www.debian-administration.org/articles/388 My observations differ from theirs (operations on file tree). I do have a significantly larger number of files, and many of those are smaller than the default block size, so that might affect things.

On the server side, XFS, on multiple concurrent large, random, writes (postgresql) just creams reiser3 and ext3. (IIRC, battery backed SCSI raid controller, tested with both RAID1+0 and RAID5, Linux 2.6.x, 6 x 15000RPM 132(?)GB HDD) Read operations and single thread seq/random writes are too similar in performance for the various filesystems.

Another feature of XFS I used a lot (before converting to reiser3) is xfs_fsr, which defrags a mounted xfs filesystem. Oddly buggy though, as after some runs, some inodes tend to have max_extents corrupted (endian problem?). I'd recommend an xfs_repair after an xfs_fsr, which effectively makes xfs_fsr a utility for defragging *UN*mounted filesystems. So yeah, xfs is a tad unstable. I've only had one real corruption, though, and that's from killing the notebook power during some writes. Not sure if that's from the fs, or the hard disk misbehaving.

A real O/S filesystem needs defrag! by ArtStone · 2006-07-02 00:41 · Score: 2, Interesting

The main described change / advantage in this proposed ext4 is that a file's allocation is tracked via "extents" (a specified number of contiguous 2k blocks) rather than a chain of inode pointers (with up to 3 levels of indirection).
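
To illustrate the difference between the two bookkeeping schemes, here is a toy Python model. It is not ext4's on-disk format; the Extent type and both function names are invented for this sketch. A per-block pointer list stores one entry per block, while an extent list stores one entry per contiguous run.

```python
from typing import List, NamedTuple, Optional

class Extent(NamedTuple):
    logical_start: int    # first block number within the file
    physical_start: int   # first block number on disk
    length: int           # number of contiguous blocks

def to_extents(block_map: List[int]) -> List[Extent]:
    """Collapse a per-block pointer list into runs of contiguous disk blocks."""
    extents: List[Extent] = []
    for logical, physical in enumerate(block_map):
        last = extents[-1] if extents else None
        if last is not None and last.physical_start + last.length == physical:
            extents[-1] = last._replace(length=last.length + 1)
        else:
            extents.append(Extent(logical, physical, 1))
    return extents

def lookup(extents: List[Extent], logical_block: int) -> Optional[int]:
    """Translate a file block number to a disk block number."""
    for e in extents:
        if e.logical_start <= logical_block < e.logical_start + e.length:
            return e.physical_start + (logical_block - e.logical_start)
    return None

# A 7-block file stored in three contiguous runs on disk.
pointers = [100, 101, 102, 103, 500, 501, 900]
extents = to_extents(pointers)
print(len(pointers), "pointers vs", len(extents), "extents")
print(lookup(extents, 5))
```

The more fragmented the file, the closer the extent list gets to one entry per block, which is the point the comment goes on to make about defragmentation.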

This is based not only on the need for a larger maximum file system, but a recognition that there is significant performance advantage to reducing read/write head movement and initiating large reads from consecutive blocks that can take advantage of the high transfer rates of today's drives. (this assumes that the OS filesystem doesn't attempt/require that the entire disk drive be cached in RAM to get decent performance)

Except for "write once" files, over time this will cause files to become physically spread over the disk and the performance benefit is reduced, unless a process periodically consolidates the blocks back into a contiguous series of blocks (ignoring for the moment that on today's disk drives, blocks may be "spared" into place that are not really physically consecutive, but just logically appear to be)...

One of the "proofs" that *nix is superior to other O/Ss has been the absence of a need to "Defrag" the file system.

A commenter on the article also raises the question of why the "right" solution isn't to increase the 2k block size limit rather than rework the internals of the block pointers, and got the response that since the Linux kernel manages memory in 2k blocks, it is a nightmare in the kernel to support larger I/O transfers (although others here seem to indicate this is one of the solutions people have implemented).

Isn't "extents" a concept contained in NTFS? Has anyone looked into the patent implications of these proposed changes?

--
Final 2006 "Proof of Global Warming" US Hurricane Count -> 0

Re:fsck quality by hansreiser · 2006-07-02 05:29 · Score: 2, Interesting

ext2fsck has a history of plenty of problems, just like everyone. I get reports from users swearing they will never again use ext*. Ted Tso goes walking around FUD'ing everyone else's fsck. He does this because ext* performance is poor, so there is not much else to do but FUD. Some users suspect that high performance is a little sinful, so this works on some.

All of the major filesystems have a decent fsck, and all of them are by now stable to the point that you should worry about your hardware and backups failing, not your FS. The only qualifier on that is that ZFS is new, and I hope no one will view that as my FUDing.

Re:Sounds like a good idea. by fbjon · 2006-07-02 19:32 · Score: 2, Interesting

But if the code's already been changed, why hasn't it been included yet?

--
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
