Ext4 Advances As Interim Step To Btrfs

Posted by kdawson on Sunday October 19, 2008 @03:57PM from the butter-is-better dept.

Heise.de's Kernel Log has a look at the ext4 filesystem as Linus Torvalds has integrated a large collection of patches for it into the kernel main branch. "This signals that with the next kernel version 2.6.28, the successor to ext3 will finally leave behind its 'hot' development phase." The article notes that ext4 developer Theodore Ts'o (tytso) is in favor of ultimately moving Linux to a modern, "next-generation" file system. His preferred choice is btrfs, and Heise notes an email Ts'o sent to the Linux Kernel Mailing List a week back positioning ext4 as a bridge to btrfs.

13 of 510 comments

BTRFS? REALLY? by erroneus · 2008-10-19 16:00 · Score: 4, Interesting

Couldn't they come up with a better name than "BuTteR FaSe?" I know I can't be the only one who read it like that. Call it anything but that.
Why not ZFS? by mlts · 2008-10-19 16:06 · Score: 5, Interesting

Unless ZFS has patent issues, why not just work on having ZFS as Linux's standard FS, after ext3?
ZFS offers a lot of capabilities, from no need to worry about an LVM layer, to snapshotting, to excellent error detection, even encryption and compression hooks.
1. Re:Why not ZFS? by GrievousMistake · 2008-10-19 17:49 · Score: 5, Interesting
  
  Huh. One of the interesting things about Reiser4 from an end-user perspective was Hans Reiser's plans for file metadata. From what I can find about btrfs, it currently doesn't even support normal extended attributes. There was also talk about making it easy for developers to extend the filesystem with plugins that could add e.g. compression schemes.
  I can't really recognize anything from Hans Reiser's ramblings in the btrfs documentation that isn't standard file system improvements already seen in e.g. ZFS. Does anyone have any specific examples of the ZFS-leapfrogging features referred to?
  
  --
  In a fair world, refrigerators would make electricity.
2. Re:Why not ZFS? by adrianwn · 2008-10-19 18:53 · Score: 5, Interesting
  
  > A microkernel loads modules into the kernel space.
  No, that's the opposite of a microkernel. A microkernel loads its modules (then often called "servers") into user space. If the kernel and its drivers etc. run in the same address space (as is the case with, e.g., Linux), then we're talking about a monolithic kernel, even if it can dynamically load modules.
3. Re:Why not ZFS? by BrokenHalo · 2008-10-19 19:38 · Score: 4, Interesting
  
  > not to belittle ext3 and ext2 for that matter, but their time is beginning to pass, and something new needs to replace it.
  
  I'm not sure that I see why, unless you're simply bored with the older filesystems. Something as critical as this should not be driven by what is trendy at any given moment. If one has no need for particular advanced bells or whistles, there is no need to use them.
  
  For instance, since for historical and security reasons I keep /boot on its own separate partition which is mounted readonly, it makes sense here to not have anything trying to write to a journal, so ext2 is still a very good choice here. As the partition is tiny (only 20MB) it takes a fraction of a second to run e2fsck over it when or as required, so there is nothing to be gained by journalling it anyway.
  
  I still use ReiserFS3 on most of my other partitions, since I don't have any intention of changing the filesystem until I change the drives. ReiserFS is still a good choice for my purposes anyway.
What I'd like by grasshoppa · 2008-10-19 16:09 · Score: 4, Interesting

I would like transparent, administrator-controlled versioning. Modified a Word document and saved it in place? root can go back and get the old version (and, alternatively, so can the user; root could disable this functionality).
The pieces are in place and it's doable; someone just needs to program it.
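A user-space approximation of this idea is short to sketch. The following is only an illustration under stated assumptions, not a filesystem feature: the `.versions` directory name and the `save_versioned` and `list_versions` helpers are made up for the example, and a real versioning filesystem would do this work below the VFS rather than in application code.

```python
import shutil
import tempfile
from pathlib import Path

VERSIONS_DIR = ".versions"  # hypothetical name, chosen for this sketch


def save_versioned(path, new_content):
    """Write new_content to path, keeping the previous contents as a numbered copy."""
    if path.exists():
        store = path.parent / VERSIONS_DIR / path.name
        store.mkdir(parents=True, exist_ok=True)
        next_id = len(list(store.iterdir())) + 1
        # Zero-padded names keep lexical order equal to save order.
        shutil.copy2(path, store / f"{next_id:06d}")
    path.write_bytes(new_content)


def list_versions(path):
    """Return the saved older versions of path, oldest first."""
    store = path.parent / VERSIONS_DIR / path.name
    return sorted(store.iterdir()) if store.is_dir() else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "report.txt"
        save_versioned(doc, b"first draft")
        save_versioned(doc, b"second draft")
        save_versioned(doc, b"final")
        # Prints the two superseded versions: [b'first draft', b'second draft']
        print([v.read_bytes() for v in list_versions(doc)])
```

This sketch copies the whole file on every save, so its space cost grows with file size times number of saves; a copy-on-write filesystem could keep only the changed blocks.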

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!
1. Re:What I'd like by corsec67 · 2008-10-19 16:20 · Score: 4, Interesting
  
  So, you want a Versioning file system? Just make sure you never let that run on /var.
  OSS is like capitalism: If you see a need, then make it and distribute it.
  
  --
  If I have nothing to hide, don't search me
2. Re:What I'd like by bendodge · 2008-10-19 16:33 · Score: 4, Interesting
  
  That leads to space-bloat.
  What I'd like are files with expiration dates. When I make up some twiddly chart or download some funny video, I keep it because I'll probably want it tomorrow or next week, but then I tend to forget to delete it later. It would be really cool if creating a user data file prompted you with a simple dialog specifying how long you want it. Common options like 1 Week, 1 Month, 6 Months, 2 Years, Forever would do most of the time, and an option to choose a custom date would cover the rest. When a file expired, it would be placed in some kind of pseudo-Trash Bin that could be reviewed and emptied when you want more space.
  I'd also love something tag-based instead of hierarchy-based. For example, I store photos by Year > Month > Event, but sometimes I want to make another category for photos of a specific person. This means I either make duplicates or have to dig around to find things. If I could tag them with dates (that should actually be auto-generated from the EXIF), event, place, and people I could then just browse for files with a particular tag.
  Come to think of it, these ideas are both somewhat akin to how a human brain stores stuff.
  
  --
  The government can't save you.
You're both right. by SanityInAnarchy · 2008-10-19 17:30 · Score: 5, Interesting

> ZFS duplicates a lot of functionality that belongs outside of a filesystem.
Very true.

> It wouldn't be possible to duplicate RAID-Z with LVM.
Also true.
And the features which could be duplicated couldn't be done nearly as well without a little more knowledge of the filesystem.
The real problem here is that we're finding out that generic block devices aren't enough to do everything we want to do outside the filesystem itself. Or, if they are, it's incredibly clumsy. Trivial example: If I want a copy-on-write snapshot, I have to set aside (ahead of time) some fixed amount of space that it can expand into. If I guess high, I waste space. If I guess low, I have to either expand it (somehow, if that's even possible) or lose my snapshot.
A filesystem which natively implemented COW could also trivially implement snapshots which take up exactly as much space as there are differences between the increments. But because of the way the Linux VFS is structured, this kind of functionality would have to live inside each individual filesystem, and so would be duplicated across every filesystem that wants it. Best case, it'd be like ext3's JBD, as a kind of shared library.
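To make the space argument concrete, here is a toy copy-on-write block store. It is a sketch under assumptions, not how any real filesystem lays out data: `CowVolume` and its methods are hypothetical names, blocks live in a Python dict, and nothing is ever garbage-collected. The point it illustrates is that a snapshot copies only the block map, so the extra space it pins equals the number of blocks overwritten since it was taken.

```python
class CowVolume:
    """Toy copy-on-write block store; a block is never modified once written."""

    def __init__(self):
        self._store = {}     # block id -> bytes, shared by live volume and snapshots
        self._next_id = 0
        self.live = {}       # logical block number -> block id
        self.snapshots = {}  # snapshot name -> frozen copy of the block map

    def write(self, lbn, data):
        # Never overwrite in place: allocate a fresh block and repoint the map.
        self._store[self._next_id] = data
        self.live[lbn] = self._next_id
        self._next_id += 1

    def read(self, lbn, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self._store[block_map[lbn]]

    def snapshot(self, name):
        # Copies only the pointers, not the data blocks.
        self.snapshots[name] = dict(self.live)

    def blocks_unique_to(self, name):
        """Blocks the snapshot still references that the live volume no longer does."""
        return set(self.snapshots[name].values()) - set(self.live.values())


if __name__ == "__main__":
    vol = CowVolume()
    for lbn in range(4):
        vol.write(lbn, b"old")
    vol.snapshot("monday")
    print(len(vol.blocks_unique_to("monday")))  # 0: nothing has diverged yet
    vol.write(2, b"new")
    print(len(vol.blocks_unique_to("monday")))  # 1: exactly the overwritten block
    print(vol.read(2), vol.read(2, "monday"))   # b'new' b'old'
```

No space has to be reserved ahead of time here, which is the contrast with the fixed-size snapshot area described for block-level snapshots.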
A humble proposal: We need another layer, between the block layer and the filesystem layer -- call it an extent layer -- which is simply concerned with allocating some amount of space, and (perhaps) assigning it a unique ID. Filesystems could sit above this layer and implement whatever crazy optimizations or semantics they want -- linear vs btree vs whatever for directories, POSIX vs SQL, whatever.
The extent layer itself would only be concerned with allocating extents of some requested size, and actually storing the data. But this would be enough information to effectively handle mirroring, striping, snapshotting, copy-on-write, etc.
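One way to picture the proposed layer is the sketch below. Everything in it is an assumption made for illustration: the `ExtentLayer` class, its method names, the bump allocator, and the in-memory "devices" are invented here and do not correspond to any existing kernel interface. It shows only that a layer which knows about extents and their IDs has enough information to mirror them without the filesystem above being involved.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    device: int  # index of the backing device
    offset: int  # byte offset on that device
    length: int  # size in bytes


class ExtentLayer:
    """Toy extent allocator over in-memory 'devices', with optional mirroring."""

    def __init__(self, device_sizes, mirror=False):
        if mirror and len(device_sizes) < 2:
            raise ValueError("mirroring needs at least two devices")
        self._devices = [bytearray(size) for size in device_sizes]
        self._next_free = [0] * len(device_sizes)  # bump allocator per device
        self._copies = {}                          # extent id -> list of Extent
        self._ids = itertools.count(1)
        self._mirror = mirror

    def _free(self, device):
        return len(self._devices[device]) - self._next_free[device]

    def allocate(self, length):
        """Reserve `length` bytes and return an opaque extent id."""
        wanted = 2 if self._mirror else 1
        # Prefer the devices with the most free space.
        chosen = sorted(range(len(self._devices)), key=self._free, reverse=True)[:wanted]
        if any(self._free(d) < length for d in chosen):
            raise MemoryError("not enough free space")
        copies = []
        for d in chosen:
            copies.append(Extent(d, self._next_free[d], length))
            self._next_free[d] += length
        extent_id = next(self._ids)
        self._copies[extent_id] = copies
        return extent_id

    def write(self, extent_id, data):
        """Store `data` at the start of every copy of the extent."""
        copies = self._copies[extent_id]
        if len(data) > copies[0].length:
            raise ValueError("data larger than extent")
        for ext in copies:
            self._devices[ext.device][ext.offset:ext.offset + len(data)] = data

    def read(self, extent_id, failed_devices=()):
        """Return the extent's bytes from the first copy on a healthy device."""
        for ext in self._copies[extent_id]:
            if ext.device not in failed_devices:
                start, end = ext.offset, ext.offset + ext.length
                return bytes(self._devices[ext.device][start:end])
        raise IOError("no surviving copy")


if __name__ == "__main__":
    layer = ExtentLayer([64, 64], mirror=True)
    eid = layer.allocate(5)
    layer.write(eid, b"hello")
    print(layer.read(eid, failed_devices={0}))  # b'hello', served from the mirror
```

A filesystem sitting on top would hold only extent IDs; striping or snapshotting could be added behind the same three calls.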
It wouldn't be universal -- I've said nothing about the on-disk format, and, indeed, some filesystems exist on Linux solely for that purpose -- vfat, ntfs, udf, etc. Those filesystems could be done pretty much exactly the way they're done now. After all, the existence of a block layer in no way implies that every filesystem must be tied to a block device (see proc, sys, fuse, etc.)
But I think it would work very well for filesystems which did choose to implement it. I think it would provide the best of ZFS and LVM.
I haven't actually been seriously following filesystem development for years, so maybe this is already done. Or maybe it's a bad idea. If not, hopefully some kernel developers are reading this.

--
Don't thank God, thank a doctor!
when ext4 is feature complete it will be the #3 fs by ZeekWatson · 2008-10-19 18:29 · Score: 4, Interesting

I'd like to know why Ted Ts'o and others are working on ext4. Even when ext4 is feature complete it will be the #3 filesystem in Linux in terms of features and scalability, behind xfs and jfs. I'd like to know what grudge Ted Ts'o and the others hold against xfs and jfs, because they basically won't even acknowledge those filesystems.
btrfs does have some nice-looking features; it's basically a GPL rewrite of ZFS.
The weakness with Linux is in the LVM or EVMS layer. They both suck in that they are not enterprise ready (i.e. multi-TB filesystems, 100+ MB/s sustained read/write): they cause unexplained IO hiccups, lockups and kernel panics. LVM/EVMS certainly work fine for Joe Blow's HTPC, or a paltry 100GB database, but they fall down under serious load.
This is the problem with open source. Certain areas, like filesystem development, attract all the developers, while other areas like LVM/EVMS are seen as busting rocks and nobody wants to work on them. The result is we get a plethora of second-rate filesystems (i.e. ext4) and a buggy LVM/EVMS layer that nobody wants to work on.
Re:If you want a blazingly fast file system.... by moosesocks · 2008-10-19 18:52 · Score: 4, Interesting

> Max Volume Size: 8 TiB.

That's not enough. Given that 1TB storage devices are on the market now, that could become outdated quite quickly. You'd be foolish to adopt that sort of filesystem, unless you were absolutely positive that you'd never upgrade (unlikely).
Honestly, ZFS seems like it's the holy grail of filesystems. There are a few small issues that might need to be worked out, though it seems as close to "ideal" as you'd ever be able to get.

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:when ext4 is feature complete it will be the #3 by Jah-Wren Ryel · 2008-10-19 22:09 · Score: 5, Interesting

> The weakness with linux is in the LVM or EVMS layer. They both suck in that they are not enterprise ready (ie multi TB filesystems, 100+ MB/s sustained read/write) in that they cause unexplained IO hiccups, lockups and kernel panics. LVM/EVMS certainly work fine for Joe Blow's HTPC, or a paltry 100GB database but they fall down when under serious load.
LVM has been rock-solid for me with one ~7TB and two 2TB ext3 filesystems (24 500GB disks) over the course of a year and a half. No problems migrating extents all over the place when I needed to swap disks in and out. Almost identical to HP-UX in functionality, but without the sizing constraints.
But when I tried xfs for kicks I found out that a 7TB filesystem means you need 7GB of RAM to fsck it, which is impossible on a 32-bit system. I also had a week where it all went in the shitter because I ran free space to zero and started getting OS panics and data corruption.
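The 32-bit limit is a matter of arithmetic, taking the 7GB figure at face value. The sketch below assumes the checker needs that memory inside a single process; the helper name is made up for the example, and the 7GB requirement is the commenter's observation rather than a documented constant.

```python
GIB = 2**30


def fits_in_address_space(required_bytes, pointer_bits):
    """True if one process could address required_bytes at all."""
    return required_bytes <= 2**pointer_bits


needed = 7 * GIB  # figure quoted in the comment, not a measured value

print(2**32 // GIB)                       # 4: GiB addressable with 32-bit pointers
print(fits_in_address_space(needed, 32))  # False
print(fits_in_address_space(needed, 64))  # True
```

In practice the usable share on 32-bit Linux is lower still, since part of the 4 GiB is reserved for the kernel, so adding RAM or swap would not have helped.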
I'm definitely considering jfs for the next generation. My main complaint with ext3 has been ridiculously slow deletes and fscks, problems that, from what I have read, don't exist with jfs.

--
When information is power, privacy is freedom.
Re:Back when there was only fat16, ntfs, ext2 used by Chemisor · 2008-10-20 01:34 · Score: 4, Interesting

> Just search for benchmarks, something like reiserfs beats ext2 by huge margins
You mean like these ones where ext2 beats reiserfs in most cases and is at least as fast in the others?
> I hope you're joking. ext2 is nice and simple, but it's neither fast nor reliable.
> It uses a linear search to find directory entries, which means it's very slow on
> large directories, like Maildir mailboxes.
Believe it or not, the world does not revolve around huge mail servers. Some of us actually run Linux on a desktop, and so don't really care about how well an fs handles a million maildir mailboxes. Latency is the most important criterion, and reiserfs is just too complicated to deliver it, as well as being a largely fringe fs. Especially now with Hans gone, it will become even more fringe.
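For readers wondering what the quoted complaint about linear search amounts to, the sketch below contrasts the two lookup strategies in memory. It is an illustration only: the list and dict stand in for on-disk directory formats, the function names are invented for the example, and real filesystems add caching that this ignores.

```python
import timeit

names = [f"msg{i:07d}" for i in range(50_000)]

# Linear directory: an unsorted list of (name, inode) entries, scanned front to back.
linear_dir = [(name, inode) for inode, name in enumerate(names)]

# Indexed directory: a hash map from name to inode, standing in for an on-disk index.
indexed_dir = {name: inode for inode, name in enumerate(names)}


def lookup_linear(directory, wanted):
    """O(n): compare against every entry until the name matches."""
    for name, inode in directory:
        if name == wanted:
            return inode
    raise FileNotFoundError(wanted)


def lookup_indexed(directory, wanted):
    """O(1) on average: jump straight to the entry via its hash."""
    try:
        return directory[wanted]
    except KeyError:
        raise FileNotFoundError(wanted) from None


if __name__ == "__main__":
    last = names[-1]  # worst case for the linear scan
    t_lin = timeit.timeit(lambda: lookup_linear(linear_dir, last), number=20)
    t_idx = timeit.timeit(lambda: lookup_indexed(indexed_dir, last), number=20)
    print(f"linear: {t_lin:.4f}s  indexed: {t_idx:.6f}s")
```

With a few hundred entries per directory, as on a typical desktop, the gap is negligible, which is the point being argued; it only matters once directories grow into the tens of thousands of entries.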
> It doesn't do tail packing which means it wastes space and is slower with small files.
Yup, I'd like to have efficient small file handling. But really, it is better to avoid having many small files in the first place. Use compressed archives to store such things; it's quite a bit more efficient, and does not require exotic file systems which most normal people (i.e. your customers) will not use.
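The space cost being discussed can be estimated with a few lines of arithmetic. This is a rough sketch that assumes a 4096-byte block size and that every file occupies whole blocks; it ignores inodes and other metadata, and the function names are made up for the example.

```python
def allocated_bytes(file_size, block_size=4096):
    """Space consumed when a file occupies whole blocks (no tail packing)."""
    blocks = -(-file_size // block_size)  # ceiling division using integers only
    return blocks * block_size


def slack(file_sizes, block_size=4096):
    """Total bytes allocated but not used by file data."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


if __name__ == "__main__":
    sizes = [300] * 10_000  # ten thousand 300-byte files
    print(sum(sizes))       # 3000000 bytes of actual data
    print(slack(sizes))     # 37960000 bytes lost to partially filled blocks
```

Packing the same files into one archive leaves at most one partially filled block, which is the trade-off the comment recommends over relying on tail packing.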
> It's not reliable because without a journal it needs a fsck after a bad shutdown
I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in the near future.