Ext4 Advances As Interim Step To Btrfs


Posted by kdawson on Sunday October 19, 2008 @03:57PM from the butter-is-better dept.

Heise.de's Kernel Log has a look at the ext4 filesystem as Linus Torvalds has integrated a large collection of patches for it into the kernel main branch. "This signals that with the next kernel version 2.6.28, the successor to ext3 will finally leave behind its 'hot' development phase." The article notes that ext4 developer Theodore Ts'o (tytso) is in favor of ultimately moving Linux to a modern, "next-generation" file system. His preferred choice is btrfs, and Heise notes an email Ts'o sent to the Linux Kernel Mailing List a week back positioning ext4 as a bridge to btrfs.


BTRFS? by Anonymous Coward · 2008-10-19 16:05 · Score: 5, Funny

So it incorporates compression by vowel omission? Brllnt!
Why not ZFS? by mlts · 2008-10-19 16:06 · Score: 5, Interesting

Unless ZFS has patent issues, why not just work on having ZFS as Linux's standard FS, after ext3?
ZFS offers a lot of capabilities, from no need to worry about a LVM layer, to snapshotting, to excellent error detection, even encryption and compression hooks.
1. Re:Why not ZFS? by PhrostyMcByte · 2008-10-19 16:15 · Score: 5, Informative
  
  I am not aware of the differences, but from Theodore Ts'o:
  
  people who really like reiser4 might want to take a look at btrfs; it has a number of the same design ideas that reiser3/4 had --- except (a) the filesystem format has support for some advanced features that are designed to leapfrog ZFS, (b) the maintainer is not a crazy man and works well with other LKML developers (free hint: if your code needs to be reviewed to get in, and reviewers are scarce; don't insult and abuse the volunteer reviewers as Hans did --- Not a good plan!).
2. Re:Why not ZFS? by Anonymous Coward · 2008-10-19 16:18 · Score: 5, Informative
  
  The ZFS developers specifically wanted the open sourced code to be under a GPL-incompatible license, hence it has been released under the CDDL (there was an interview with the Sun open source rep; can someone provide info/links about this?). So ZFS cannot be part of the kernel, but there is a FUSE port of ZFS, and according to http://en.wikipedia.org/wiki/ZFS#Linux Sun is investigating a Linux port, so there may be something good coming.
3. Re:Why not ZFS? by Wonko · 2008-10-19 16:38 · Score: 5, Informative
  
  ZFS duplicates a lot of functionality that belongs outside of a filesystem. All of the above can already be done using any Linux filesystem, so why keep around a second copy of all that code that implements those features for just a single filesystem?
  It wouldn't be possible to duplicate RAID-Z with LVM. Other features of ZFS are very handy, but RAID-Z is by far my favorite. Same storage density as RAID 5 but without the horrible write performance. RAID-Z uses copy-on-write to avoid RAID 5's required read for every non-cached write.
  Being able to create filesystems just as easily as creating directories is quite handy as well, though. IIRC, the filesystem sizes in ZFS are controlled by a quota style system. That is much simpler than shrinking an LV (if your filesystem supports shrinking), then adding a new LV, and then creating a filesystem. I don't know about you, but I am always a bit nervous when I have to resize an LV.
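To make the RAID 5 write penalty mentioned above concrete, here is a minimal Python sketch of the parity arithmetic only. The function names are made up for illustration, and nothing here models the real on-disk layout of md RAID or ZFS. The point is that updating one block of an existing stripe needs the old data and old parity (two reads), while writing a whole stripe at once needs no reads at all.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    # Read-modify-write: old_data and old_parity must be read from disk
    # before the new parity can be computed.
    return xor_blocks(old_parity, old_data, new_data)


def full_stripe_parity(*data_blocks: bytes) -> bytes:
    # Full-stripe write: parity depends only on the new data, so no reads.
    return xor_blocks(*data_blocks)


# Hypothetical three-disk data stripe with two-byte blocks.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = full_stripe_parity(*stripe)

# Overwrite the middle block the RAID 5 way.
new_block = b"\xff\x00"
updated_parity = raid5_small_write(stripe[1], parity, new_block)
stripe[1] = new_block

# Both routes agree on the resulting parity.
assert updated_parity == full_stripe_parity(*stripe)
```

A copy-on-write filesystem never has to patch a stripe in place, so it can always take the second route. That is the property the comment above credits to RAID-Z.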
4. Re:Why not ZFS? by clarkkent09 · 2008-10-19 16:52 · Score: 5, Insightful
  
  (b) the maintainer is not a crazy man and works well with other LKML developers
  
  Also important, he might be more focused due to not being in prison for first degree murder
  
  --
  Negative moral value of force outweighs the positive value of good intentions.
5. Re:Why not ZFS? by GrievousMistake · 2008-10-19 17:49 · Score: 5, Interesting
  
  Huh. One of the interesting things about Reiser4 from an end-user perspective was Hans Reiser's plans for file metadata. From what I can find about btrfs, it currently doesn't even support normal extended attributes. There was also talk about making it easy for developers to extend the filesystem with plugins that could add e.g. compression schemes.
  I can't really recognize anything from Hans Reiser's ramblings in the btrfs documentation that isn't standard file system improvements already seen in e.g. ZFS. Does anyone have any specific examples of the ZFS-leapfrogging features referred to?
  
  --
  In a fair world, refrigerators would make electricity.
6. Re:Why not ZFS? by Anonymous Coward · 2008-10-19 18:13 · Score: 5, Funny
  
  Huh. One of the interesting things about Reiser4 from an end-user perspective was Hans Reiser's plans for file metadata.
  No, the most interesting feature of ReiserFS is this one (look to the far right).
  --
  ReiserFS: It puts the "stab" in "/etc/fstab".
7. Re:Why not ZFS? by deniable · 2008-10-19 18:23 · Score: 5, Funny
  
  Yep, BeaTeR FS is a kinder, gentler alternative to Reiser FS.
8. Re:Why not ZFS? by mml · 2008-10-19 18:31 · Score: 5, Informative
  
  > Rather, GPL is incompatible with anything else that can't be re-licensed as GPL, and
  > that includes GPL v2 and v3, which can't even be mixed among themselves.
  Saying that GPLv2 and GPLv3 "can't even be mixed among themselves" is wrong and
  misleading.
  Section 9 of GPLv2 specifically deals with the problem of later versions of the
  licence and sets out the options. A copyright holder can choose to allow work to be used
  with later versions, such as GPLv3, or can choose not to. There are also more
  complex options. The licence itself doesn't force the choice one way or the other.
  Matt
9. Re:Why not ZFS? by adrianwn · 2008-10-19 18:53 · Score: 5, Interesting
  
  A microkernel loads modules into the kernel space.
  No, that's the opposite of a microkernel. A microkernel loads its modules (then often called "servers") into user space. If the kernel and its drivers etc. run in the same address space (as is the case with, e.g., Linux), then we're talking about a monolithic kernel, even if it can dynamically load modules.
I can't believe... by arrenlex · 2008-10-19 16:21 · Score: 5, Funny

Butter FS? Are you kidding me?
Here is your first official list of jokes. Please contribute.
1. You're still running ext4? I can't believe it's not ButterFS!
2. But will it run on toast?
3. Will fsck be renamed to butterknife?
4. If your system overheats will your filesystem melt?
5. If you use ButterFS too much, will it turn into FAT?
6. If you leave ButterFS on your volume too long, will your hard drive start to reek?
7. Will the next version of ButterFS be called GoatButterFS, just like the next version of Leopard is Snow Leopard?
8. "Tough" notebooks will never have their hard drives formatted with ButterFS, because if you dropped them, they would always land hard drive down.
9. When you submit your dead ButterFS hard drive to a data recovery centre, will they have an intern lick it to get the data off instead of putting it under a read head?
These are getting kind of desperate -- your turn now.
Honestly, what is it with FOSS and crappy names? (looking at you, gimp)
1. Re:I can't believe... by Anonymous Coward · 2008-10-19 17:26 · Score: 5, Funny
  These are getting kind of desperate -- your turn now.
  Yeah, you're spreading yourself a bit thin.
  I hear some of the features in btrfs have been refined from ext3cow.
  I touch'd a file on a btrfs disk, and now it's sticky!
  I hear the standard block size of btrfs is 8 oz.
  How can I make a business case for btrfs? I'm all for introducing new tech, but my boss only cares about how it will affect our margarins.
  Will btrfs keep my servers from grinding? I'm a bit worried that if they churn too much, my files will separate!
  And most importantly, in an emergency, can I use btrfs for a smoother fsck?
Whoa! by aevans · 2008-10-19 16:33 · Score: 5, Funny

A Linux article on Slashdot!?
Re:BTRFS? REALLY? by initialE · 2008-10-19 16:38 · Score: 5, Insightful

Why not? It's a good analogy for FOSS after all. Great software, robust and all, but her face...

--
Starbucks, Harbuckle of Breath.
Re:BTRFS? REALLY? by hampton · 2008-10-19 16:47 · Score: 5, Funny

You're right. BTRFS is really silly. I recommend that the shortened form be ButtFS.
Re:BTRFS? REALLY? by blahplusplus · 2008-10-19 16:56 · Score: 5, Insightful

"Couldn't they come up with a better name than "BuTteR FaSe?" I know I can't be the only one who read it like that. Call it anything but that."
I read it as:
BeTteR FileSystem
I guess we'll have to part ways :P
Re:BTRFS? REALLY? by spazdor · 2008-10-19 16:56 · Score: 5, Funny

Good, strong file-bearing hips!

--
DRM: Terminator crops for your mind!
You're both right. by SanityInAnarchy · 2008-10-19 17:30 · Score: 5, Interesting

ZFS duplicates a lot of functionality that belongs outside of a filesystem.
Very true.

It wouldn't be possible to duplicate RAID-Z with LVM.
Also true.
And the features which could be duplicated, couldn't be done nearly as well without a little more knowledge of the filesystem.
The real problem here is that we're finding out that generic block devices aren't enough to do everything we want to do outside the filesystem itself. Or, if they are, it's incredibly clumsy. Trivial example: If I want a copy-on-write snapshot, I have to set aside (ahead of time) some fixed amount of space that it can expand into. If I guess high, I waste space. If I guess low, I have to either expand it (somehow, if that's even possible) or lose my snapshot.
A filesystem which natively implemented COW could also trivially implement snapshots which take up exactly as much space as there are differences between the increments. But because of the way the Linux VFS is structured, this kind of functionality would have to be in a single filesystem, and would be duplicated across all filesystems. Best case, it'd be like ext3's JBD, as a kind of shared library.
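As a rough illustration of why a native copy-on-write snapshot costs only as much space as the differences, here is a toy Python model. The class `CowVolume` and its methods are hypothetical, it keeps everything in memory, and it never frees blocks, so treat it as a sketch of the bookkeeping and not of how any real filesystem does it.

```python
class CowVolume:
    """Toy copy-on-write volume: a snapshot is just a frozen block map."""

    def __init__(self):
        self.store = {}      # physical block id -> data
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of a block map
        self._next_id = 0

    def write(self, lbn, data):
        # Never overwrite in place: every write lands in a fresh physical block.
        self.store[self._next_id] = data
        self.live[lbn] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the map (metadata) is copied; no data blocks are duplicated.
        self.snapshots[name] = dict(self.live)

    def read(self, lbn, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[lbn]]

    def snapshot_only_blocks(self, name):
        # Physical blocks kept alive solely by the snapshot: its real space cost.
        return set(self.snapshots[name].values()) - set(self.live.values())


vol = CowVolume()
for lbn in range(4):
    vol.write(lbn, b"version 1")

vol.snapshot("before-upgrade")
cost_at_creation = len(vol.snapshot_only_blocks("before-upgrade"))  # 0

vol.write(2, b"version 2")
cost_after_one_write = len(vol.snapshot_only_blocks("before-upgrade"))  # 1
```

No space has to be reserved ahead of time: the snapshot's cost starts at zero and grows by one block for each block that later diverges.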
A humble proposal: We need another layer, between the block layer and the filesystem layer -- call it an extent layer -- which is simply concerned with allocating some amount of space, and (perhaps) assigning it a unique ID. Filesystems could sit above this layer and implement whatever crazy optimizations or semantics they want -- linear vs btree vs whatever for directories, POSIX vs SQL, whatever.
The extent layer itself would only be concerned with allocating extents of some requested size, and actually storing the data. But this would be enough information to effectively handle mirroring, striping, snapshotting, copy-on-write, etc.
It wouldn't be universal -- I've said nothing about the on-disk format, and, indeed, some filesystems exist on Linux solely for that purpose -- vfat, ntfs, udf, etc. Those filesystems could be done pretty much exactly the way they're done now. After all, the existence of a block layer in no way implies that every filesystem must be tied to a block device (see proc, sys, fuse, etc.)
But I think it would work very well for filesystems which did choose to implement it. I think it would provide the best of ZFS and LVM.
I haven't actually been seriously following filesystem development for years, so maybe this is already done. Or maybe it's a bad idea. If not, hopefully some kernel developers are reading this.

--
Don't thank God, thank a doctor!
Re:BTRFS? REALLY? by Ragzouken · 2008-10-19 20:34 · Score: 5, Funny

This is the internet, it's never too soon.
Re:when ext4 is feature complete it will be the #3 by Jah-Wren Ryel · 2008-10-19 22:09 · Score: 5, Interesting

The weakness with linux is in the LVM or EVMS layer. They both suck in that they are not enterprise ready (i.e. multi TB filesystems, 100+ MB/s sustained read/write) in that they cause unexplained IO hiccups, lockups and kernel panics. LVM/EVMS certainly work fine for Joe Blow's HTPC, or a paltry 100GB database but they fall down when under serious load.
LVM has been rock-solid for me with a ~7TB and 2 2TB ext3 filesystems (24 500GB disks) over the course of a year and a half. No problems migrating extents all over the place when I needed to swap disks in and out. Almost identical to HPUX in functionality, but without the sizing constraints.
But, when I tried xfs for kicks I found out that a 7TB filesystem means you need 7GB of RAM to fsck it - impossible on a 32-bit system. I also had a week where it all went in the shitter because I ran free-space to zero and started getting OS panics and data corruption.
I'm definitely considering jfs for the next generation, my main complaint with ext3 has been ridiculously slow deletes and fsck's. Problems I have read don't exist with jfs.

--
When information is power, privacy is freedom.
Re:Back when there was only fat16, ntfs, ext2 used by vadim_t · 2008-10-19 22:37 · Score: 5, Informative

I hope you're joking.
ext2 is nice and simple, but it's neither fast nor reliable. It uses a linear search to find directory entries, which means it's very slow on large directories, like Maildir mailboxes. It doesn't do tail packing, which means it wastes space and is slower with small files. It's not reliable because without a journal it needs a fsck after a bad shutdown, which takes ages on a modern disk and recovers the filesystem worse than a journal would.
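To put a number on the linear directory search, here is a small Python sketch with a made-up Maildir-sized directory. The plain `dict` stands in for any indexed directory; it is an assumption for illustration and not how ext3's or reiserfs's on-disk indexes are actually laid out.

```python
def linear_lookup(entries, name):
    """Scan (name, inode) pairs in order; return (inode, names_compared)."""
    for compared, (entry_name, inode) in enumerate(entries, start=1):
        if entry_name == name:
            return inode, compared
    return None, len(entries)


def build_index(entries):
    """Build a name -> inode index so a lookup no longer scans every entry."""
    return {entry_name: inode for entry_name, inode in entries}


# Hypothetical directory holding 100,000 messages.
entries = [(f"msg{i:06d}", 1000 + i) for i in range(100_000)]

# Worst case for the linear scan: the file we want is the last entry.
inode, compared = linear_lookup(entries, "msg099999")

# The indexed lookup finds the same inode without walking the list.
index = build_index(entries)
same_inode = index["msg099999"]
```

Here the linear scan compares 100,000 names to find a single file, and it pays that cost again on every lookup. That is why large mail spools are the workload singled out above.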
Just search for benchmarks, something like reiserfs beats ext2 by huge margins when it comes to important workloads such as a mail server.
There are very good reasons why distributions generally go with ext3, or one of the other filesystems. I haven't seen ext2 as the default option for the root FS in a very long time.
buttfsck!! by Zaiff Urgulbunger · 2008-10-20 00:22 · Score: 5, Funny

You think that's bad? The file system check command is buttfsck!
Re:Back when there was only fat16, ntfs, ext2 used by illumin8 · 2008-10-20 02:30 · Score: 5, Insightful

I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.
Yeah, because systems never kernel panic, or crash for any other reason than power outages... Wake me up after you've been waiting for fsck to finish on your 1TB drive and it's been running for the last 72 hours.
Whether or not you've had a system shut down uncleanly in the past, you certainly will at some time in the future, so why not just use ext3 and save yourself the headache of a 3 day long fsck?
It's also painfully obvious that you've never worked as a sysadmin before. You try explaining to your manager that the reason why your company's server will take 3 days to come back online is that you wanted to save a few microseconds of latency when users were accessing files...

--
"When the president does it, that means it's not illegal." - Richard M. Nixon
All hardware can fail, including UPSes. by Medievalist · 2008-10-20 03:00 · Score: 5, Insightful

I used to do that, and then I got a UPS instead and switched back to pure ext2. The performance hit from journalling is simply too high to tolerate. A decent UPS (pretty much anything made by APC) will prevent the crashes in the first place, solving the problem completely and without any unnecessary overhead. With UPS prices being as low as they are, there is no excuse for not having one, so I think that journalling will become obsolete in some near future.
Our industrial UPS (which is orders of magnitude more reliable than any APC product ever made) recently exploded, burnt, and shorted out the entire building's power. It spiked thousands of volts through the protected equipment and destroyed a half-dozen servers. The fire was fierce enough to cause our fm200 system (halon equivalent) to dump, which put out the fire before the main battery bank was breached.
This was the first time I've ever seen a UPS bigger than a Chrysler fail, but I've seen dozens of failures from those crappy little APC units. At one time I had a stack of burnt-out ones in my basement (I used to salvage the batteries for cash).
If your disaster survivability plan depends on any single piece of hardware never failing, it's no good. Offsite backup is your friend.