Ext3 Filesystem Explained

Distro battles? Nah. Journaling fs battles! by Deal-a-Neil · 2001-11-16 20:29 · Score: 5, Informative

ext3 catches my fancy because there's no ext2 --> ext3 conversion -- you just have to unmount, make a journal file, and remount. reiserfs migration is a challenge for the huge partitions.

ext3 by FeeDBaCK · 2001-11-16 20:31 · Score: 4, Insightful

One thing I would have to agree on in the usage of ext3 is the fact that the machine can be booted with a kernel that does not understand ext3 (only ext2) and the filesystem can still be read. This is a major strong-point in my book.

--
wolf31o2 Developer, Gentoo Linux Games Team

Re:Distro battles? Nah. Journaling fs battles! by ThatComputerGuy · 2001-11-16 20:38 · Score: 3, Informative

And because there's only a journal as an addition, you can remount as ext2 after a clean unmount and everything will still work fine.

--
XML is like violence. If it doesn't solve the problem, use more.

Excellent engineering by ppetru · 2001-11-16 22:12 · Score: 5, Insightful

The very existence of ext3, and it's complete forward and backward compatibility with ext2, shows that ext2 was extremely well designed by it's authors. Kudos to Remy Card, Ted Tso, and the rest of the ext2 team!

Also, based on the same extensibility of ext2, Daniel Phillips is working on a directory indexing patch which speeds up ext2 by a huge factor when working with lots of files in a directory. You can get the preliminary patches here and see a graph of a simple file creation benchmark here. Amazing!

--

Petru

Re:Partition resizing? by Sapien__ · 2001-11-16 22:45 · Score: 5, Informative

This thread might be useful.

To summarize: yes, it's possible to resize ext3 partitions, so long as your resizer doesn't mind. Don't use Partition Magic to do it. It doesn't like it. Badly.

Re:The journalling filesystem myth by mj6798 · 2001-11-16 22:53 · Score: 3, Insightful

Journaling insures filesystem integrity, which is very important. Mounting an unclean ext3 fs will take seconds - no need to check the filesystem for mid-write evidence, etc.

Let's say the journaling file system has 5% overhead (it probably has more). That means you lose more than 1h per day on a busy server--it's spread out, but it's still lost. You'd have to do a lot of rebooting in order to make up for that in terms of "saved" fsck time.

Re:The journalling filesystem myth by cowbutt · 2001-11-16 23:56 · Score: 5, Informative

Let's say the journaling file system has 5% overhead (it probably has more). That means you lose more than 1h per day on a busy server--it's spread out, but it's still lost. You'd have to do a lot of rebooting in order to make up for that in terms of "saved" fsck time.

Actually, Andrew Morton reckons ext3 is actually quicker than ext2 in spite of the journalling. Go figure. :)

--

Re:The journalling filesystem myth by edhall · 2001-11-17 00:13 · Score: 5, Insightful

A few points:

You can't equate down-time to a slightly slower response time. Having a reboot time of tens of seconds vs. tens of minutes for (e.g.) a large source repository or a critical web server is well worth a minor performance hit. Reboot time is dead time for all who need access to the server.
If your file server is running so close to capacity that a 5% decrease in maximum filesystem throughput represents a 5% slowdown in actual throughput, your server is dangerously overloaded already.
In general, journaling affects write performance, not read performance. If your server performs mostly reads, the overall overhead of journaling may amount to much less than your 5% figure. Most (though not all) applications for file servers are read-intensive with incidental writes apart from the initial "load" of the server.
Fast fsck's aren't the main reason for journaled filesystems. Rather, its the improvement in filesystem integrity that is the main attraction -- an improvement that incidently allows for fast fsck's.

-Ed

Re:The journalling filesystem myth by be-fan · 2001-11-17 03:08 · Score: 3, Informative

Actually, the new journaling filesystems (ReiserFS, XFS, and JFS) are all *faster* than ext2. Also, journaling itself can cost very little these days because modern JFSs use large buffers and coalesce writes. For example, BFS achieves metadata performance nearly as high as ext2 on a heavily loaded system. So if all you're doing all day is creating/deleting/growing/shrinking files, the filesystem is only slightly slower. When you factor in all the performance improvements, it end up being faster.

--
A deep unwavering belief is a sure sign you're missing something...

Re:Still same old 2GB limit? by LunaticLeo · 2001-11-17 04:55 · Score: 3, Interesting

ext2 doesn't have a 2GB file size limit. That was a operating system limit which went away somewhere in the middle 2.2.x stable series.

Further, ext3 is not the-next-version-of-EXT. It is an extention of ext2 which is fully compatible with ext2. Think of ext2 as two things: the format of bits on the disk, and the code to read/write those bits. Ext3 keeps the same format (actually with compatable extentions), but mostly it changes the code for reading/writing to the disk (journelling).

The ext2 filesystem is tried and true. You can go back and forth between ext2 and ext3 with no reformating or issueing of commands other than the mount command.
ReiserFS is a more "sophisticated" filesystem than ext[23], and XFS is a more "sophisticated" filesystem than ReiserFS. But I keep "sophisticated" in quotes because the utility, reliability, and speed of a FS relies more on your usage patterns, than on the genius of the filesystem designers/coders.

FFS-style: ext2,ufsFFS+journel: ext3, ufs+
B+tree directories, B+tree block layout, Journelling: ReiserFS
B+tree directories, B+tree block layout, extents, Journelling : XFS, JFS
Loggin FS: VxFS (my favorite)

I use ext3 at home. Good speed, no need to tar up all my files..reformat drives..untar all my data, journelling, mainline kernel support, tried and true.

One place I would seriously consider ReiserFS is for home directories. The place it really shines is constantly reading and writing lots of "small" files (small ~50k). For Gnome and KDE config files, Mozilla disk caches, CVS checkouts, and untaring of source, ReiserFS is going to be a leader of the benchmarking pack. You'll notice the difference.

But don't get into holy wars over FS, and don't think that Linux is whole generations behind Commercial Unixen. Linux Kernel is dramaticallly ahead in some areas and minorly behind in others. The only place it is dramatically behind is places where the computer you are running the OS on cost more than a half million dollars.

--
-- I am not a fanatic, I am a true believer.

snapshots by Anonymous Coward · 2001-11-17 06:23 · Score: 3, Interesting

Not to "troll" for my fav OS or whatever, but I've been playing with snapshots in FreeBSD-CURRENT for the last few days, and I must say that this is quite possibly the coolest filesystem technology I have ever seen.

In short, a snapshot is approximately equal to an image of a filesystem. To create a snapshot, you run a mount command like "-u -o snapshot /var/snapshots/snap1 /var". Becase of the way snapshots work, the snapshot must reside in the same filesystem that it contains.

Now, once the snapshot is created, it can be treated like another filesystem. You can run fsck on it, dump it, or even mount it. The only difference is that within the snapshot, previous snapshots will appear as null files.

Basically, when you create a snapshot, you tell the filesystem that you want it's contents at the current time preserved, and the snapshot file is where it does this. Now, whenever said filesystem is modified, the modification is basically applied in reverse to extant snapshots. So, when a snapshot is first taken, it doesn't contain much information at first, but when you rm a file living in the directory, the file is saved into the snapshot. When you modify a file, deltas to reverse the change are saved to the snapshot.

This is extremely powerful used in the hands of a good sysadmin. Imagine your server that is backed up to tape every week. When someone comes asking for a file they clobbered or deleted by accident, you say "how old was the file?" - you know if they say "8 days", you have to go restore from tape, and if they say "2 days", you have to tell them that they are out of luck. Now imagine if a cron job was set up to take a snapshot once a day, and clear out old ones once a week. If they say "8 days", you still have to go fetch the tapes, but if they say "2 days", all you need is some mdconfig, mount, cp, and umount action to restore the file. How cool is that?

Snapshots essentially give your filesystems the "undo" capabilities that your editor has.

Re:Performance by rodgerd · 2001-11-17 08:52 · Score: 3, Informative

IIRC RH7.2 installs ext3 with both data and metadata logging enabled by default, so your performance change is most likely that you're doing two writes for every one you did before.

You are missing the point by sigwinch · 2001-11-17 14:30 · Score: 4, Insightful

However, if you want reliability and avoid downtime, you must have redundant servers or replication; journaling will not protect against most of the problems that cause downtime.

Here in the real world we cannot afford triple redundant drives, motherboards, RAM, CPUs, power supplies, keyboards, mice, monitors, NICs, routers, and network cables for every single computer on every desktop in the entire organization. Sure, we could do it, but the cost would be ludicrous for a very small payback.

Most computers simply don't need guaranteed zero downtime. What they need is bounded downtime. It's OK if they crash every once in a while, as long as they reboot cleanly within a few minutes. The biggest contributor to boot time after a crash is the file system check. Since a journalling file system can recover the file system within a few minutes, it is a huge win.

Relying on it for "filesystem integrity" or "reduced downtime" or "reliability" is foolish.

Here in the real world, even the big real-time transaction processing systems occassionally have common-mode failures that wipe out all the redundant subsystems at the same time. Lightning strikes, idiots frob the emergency power switch, etc. Thus, the big real-time systems need journalling even more desparately than the small systems.

You pay for fast reboots in slower performance and more complex file system code.

Sheer ignorance. Replication of filesystems and databases has at least as much of a performance hit as journalling, and the complexity is likely to be vastly higher.

--

--
Kuro5hin.org: where the good times never end. ;-)

13 of 174 comments (clear)