Ext3 Filesystem Explained
sheckard writes: "The next installment of the wonderful Advanced filesystem implementor's guide, part 7, details the ext3 filesystem in all of its glory. This is another great voyage into the world of journaling filesystems, and ext3 has been rock-solid in my experience."
Journaling file systems / ext3 -- is this the best? Or should we be looking in another direction entirely?
"I think there is a world market for, maybe, five computers." __ IBM Chairman, 1943 __
ext3 catches my fancy because there's no ext2 --> ext3 conversion -- you just have to unmount, make a journal file, and remount. reiserfs migration is a challenge for the huge partitions.
One thing I would have to agree on in the usage of ext3 is the fact that the machine can be booted with a kernel that does not understand ext3 (only ext2) and the filesystem can still be read. This is a major strong-point in my book.
wolf31o2 Developer, Gentoo Linux Games Team
It isn't inherently better. It does have benefits of being more customisable, and a non-windows policy means you have a little choice, rather than piling more money into the pockets of a company as disreputable as MS.
And because there's only a journal as an addition, you can remount as ext2 after a clean unmount and everything will still work fine.
XML is like violence. If it doesn't solve the problem, use more.
Power supply dies. Power goes out and UPS dies after 30 minutes. Playing shuffle-the-cables at the co-lo facility and you mistakenly unplug the NAS unit. There are still a few non-Microsoft OS related catastrophes that exist, believe it or not. By the way, that last scenario was completely hypothetical. [whistling/twiddling thumbs]
Actually... You don't even have to unmount to create the journal... just to actually *use* the journaling.
wolf31o2 Developer, Gentoo Linux Games Team
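For anyone who wants the steps from the comments above spelled out, here is a rough sketch of the ext2 to ext3 switch and the way back. The device and mount point are placeholders, and the `RUN` dry-run switch is just a convention of this sketch: by default the functions only print the commands, and you would set `RUN=` (empty) and run as root to actually execute them.

```shell
#!/bin/sh
# Sketch only: add a journal to an existing ext2 filesystem and start using it.
# By default every step is printed rather than executed (RUN unset => echo).

ext2_to_ext3() {
    dev=$1   # block device, e.g. a placeholder like /dev/hdX1
    mnt=$2   # where it is mounted
    # Create the journal; this can be done while the filesystem is mounted.
    ${RUN-echo} tune2fs -j "$dev"
    # Journaling only takes effect once the filesystem is mounted as ext3.
    ${RUN-echo} umount "$mnt"
    ${RUN-echo} mount -t ext3 "$dev" "$mnt"
}

ext3_back_to_ext2() {
    dev=$1
    mnt=$2
    # After a clean unmount the same filesystem can be mounted as plain ext2.
    ${RUN-echo} umount "$mnt"
    ${RUN-echo} mount -t ext2 "$dev" "$mnt"
}
```

Calling `ext2_to_ext3 /dev/hdX1 /mnt/data` prints the three commands in order. The filesystem type in /etc/fstab would also need changing by hand if the switch is meant to survive a reboot.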
"ext3 catches my fancy because there's no ext2 --> ext3 conversion "
In addition, you can actually read ext3 from a kernel that only supports ext2. The only catch is that the partition has to be cleanly unmounted for this to work. This is a "Really Good Thing (TM)", because then you can boot from an old bootdisk and still access your files, or share the partition if you are running multiple distributions.
On my new machine I installed Linux as my primary OS, expecting to soon get tired of it (again) and reconfigure a dual-boot system with Windows as my primary OS. While installing Linux, I didn't think much (since I would soon be destroying the partition anyway) and installed the system on reiserfs. To my surprise that didn't happen, and the unreliability of reiserfs started to bother me more and more. With this article I'm convinced that ext3 is what I want. Now, how do I convert from ReiserFS to ext3? I have plenty of empty space on a soon-to-be-destroyed NTFS partition and a CD writer, so backing up existing data is no problem, but simply copying the files back will not do the trick, right?
Gentlemen, you can't fight in here, this is the War Room!
I know RH7.2 has ext3 support, right from the setup too. Very sweet.
Does anyone know if there is a working ext3 patch for 2.2.20? The ext3 home for 2.2 kernels (ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/) doesn't seem to have been updated since June.
Is EXT3 in Linus' tree yet? Other thing I'm wondering is if it's worth moving to the Alan Cox tree to get it?
Hey, if Best Buy sells you hardware that's broken, just take it back to the store, and get it exchanged or refunded. Easy, isn't it? And if they won't stand up for their products, well, then you know where you won't be shopping the next time around.
Ext3 is quite nice, I've moved to it myself because the root partition until recently had to be ext2. I still can't help feeling that ext3 is slower though compared to Reiser. Anyone else got comments on performance?
Although I did enjoy the paragraph on filesystem journaling -- After pulling one of my [gasp] Win2000 servers offline the other night to do a defrag, I could appreciate the fact that a developer could tweak Ext3 to do some neat things. (ahh, for linux, at least) Like when I save and resave files on a test server, the journaling approach could be made more efficient by only saving the changed data! (not the whole freakin fragmented file)
Now the question could be -- Is there someone who will step up to the plate and produce several custom filesystems? The article points out that there is no "best" file system, but given the options, I'm sure the techie end users could tweak settings to meet their needs, be it server or desktop.
Newt-dog
My Doctor prescribed daily nasal saline irrigation, hehe
I've tried both, but when I played with XFS it took forever to rm -rf a 2 GB directory. Was it me, or is XFS extremely slow with removing lots of files? If so, is this because it takes forever to update the journal?
I hope that joe public will eventually realize journalling filesystems don't guarantee data integrity in the event of an unclean system shutdown.
I've converted over to ext3fs, and am curious about one thing: resizing the ext3fs partitions. I know Partition Magic can resize ext2fs partitions with no difficulty, and Linux won't miss a beat. If the file systems are cleanly unmounted, as during a shutdown, and the ext3fs partitions are resized using Partition Magic, will there be problems? Is there anything in the journal that would make the kernel panic and puke on the newly changed partitions? I have no plans to do this; I'm just curious what would happen if I did.
"Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
Just had my first power-loss since switching to ext3 last night. Normally would take 10-15 minutes for my computer to restart after checking /home, etc. But today came up in just a couple of minutes with no corruption (or none I have noticed, or has been reported). So ext3 gets my thumbs-up!
The very existence of ext3, and its complete forward and backward compatibility with ext2, shows that ext2 was extremely well designed by its authors. Kudos to Remy Card, Ted Tso, and the rest of the ext2 team!
Also, based on the same extensibility of ext2, Daniel Phillips is working on a directory indexing patch which speeds up ext2 by a huge factor when working with lots of files in a directory. You can get the preliminary patches here and see a graph of a simple file creation benchmark here. Amazing!
Petru
If your root partition is formatted as ReiserFS, you're pretty much limited. Try to make a partition big enough on your free space, and make an ext[2-3] filesystem there. Then copy everything that is on the root partition to the new ext* one (use "cp -pR" to preserve permissions). Try to reboot the system, passing 'root=/dev/hd??' to the kernel, where ?? is the new ext partition. If everything boots fine, you're on your way. If not, you won't lose anything on your old ReiserFS root; just reboot as usual.
Never had any hardware that was easier to install in Windows than Linux. Windows keeps autodetecting it, insisting on the Windows disc, installing the wrong driver, and then rebooting. Then I have to go through this again with the right driver. Installing a TNT2 took 5 reboots. Under Linux I just had to run XF86config. 3D required that I rebooted and added a line to the config file, but that was easy enough. BeOS just worked out of the box.
A good UPS should let the computer know when it is running low, and at the very least remount all the disks read-only. Mistakenly unplugging the cables at a co-lo isn't going to happen to most systems, and so is not really a reliable way to determine the reliability of a journalled file system.
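A rough sketch of the copy step described above, for what it's worth. The device name and the mount point are placeholders, and the function only prints the commands unless `RUN=` is set to empty (a dry-run convention of this sketch, not part of any tool). It swaps the suggested "cp -pR" for GNU cp's "-ax", on the assumption that GNU cp is available, so that /proc and other mounted filesystems are not dragged along.

```shell
#!/bin/sh
# Sketch only: build an ext3 filesystem on a spare partition and copy the
# running root onto it. Prints the commands by default (RUN unset => echo).

copy_root_to_ext3() {
    newdev=$1               # spare partition, e.g. a placeholder like /dev/hdXN
    mnt=${2:-/mnt/newroot}  # temporary mount point (assumed name)
    # mke2fs -j creates an ext2 filesystem with a journal, i.e. ext3.
    ${RUN-echo} mke2fs -j "$newdev"
    ${RUN-echo} mkdir -p "$mnt"
    ${RUN-echo} mount -t ext3 "$newdev" "$mnt"
    # -a preserves ownership, permissions and links; -x stays on the root
    # filesystem, so the new partition itself is not copied recursively.
    ${RUN-echo} cp -ax / "$mnt"
    ${RUN-echo} umount "$mnt"
}
```

After a real run, the /etc/fstab inside the copy still names the old root device and would have to be edited by hand, and the boot loader configuration has to point at the new partition before the old one can be destroyed.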
I booted and my TNT2 worked. Haven't rebooted since (because of that) in winXP
Do you still need the every 20th mount fsck???
- Pimp
I like computers, women and computers... in that order...
Explanation: I am of two minds about everything, so I can never decide how to organize my files, i.e. by category (like executables, libraries, html, music) or by product (like gnome, kde, MyPreferredApp). So I want to do both.
I want a filesystem in which you can define directories by query of file attributes : e.g. :
mkdir ~/gnome_bin --query -type=executable -package=gnome
And then the system keeps my directory up to date, and I can handle it with standard filesystem tools.
I know that it isn't easy: that is why I'm asking for it as a Christmas gift.
Ciao
----
FB
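Short of a real filesystem feature, the "directory defined by a query" wish can be roughly approximated in userspace with find and symlinks. The sketch below is only that: the function name is made up, "type=executable" is approximated by the owner-execute bit, and "package=gnome" by a filename pattern, since a plain filesystem has no package attribute to ask about. It would have to be re-run (from cron, say) to stay current.

```shell
#!/bin/sh
# Sketch only: rebuild DEST as a flat directory of symlinks to every regular,
# owner-executable file under SRC whose name matches PATTERN.
# refresh_query_dir is a hypothetical helper name.

refresh_query_dir() {
    dest=$1
    # Resolve SRC to an absolute path so the symlinks work from anywhere.
    src=$(cd "$2" && pwd) || return 1
    pattern=$3
    mkdir -p "$dest"
    # Remove links left by the previous run so deleted files disappear.
    find "$dest" -maxdepth 1 -type l -exec rm -f {} +
    # -perm -100 matches files with at least the owner-execute bit set.
    find "$src" -type f -perm -100 -name "$pattern" -exec ln -sf {} "$dest"/ \;
}
```

Something like `refresh_query_dir ~/gnome_bin /usr/bin 'gnome-*'` would then give a directory that ordinary tools can list and use, although files with the same name in different subdirectories would overwrite each other's links.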
As simple as your example is, it's perfect. Sums up everything quick and easily.
:)
Now if only other people could understand this epiphany. (or spell it)
extreme, perhaps? extendable? extraneous? extatic?
I just moved over to this Reiserfs a couple of months ago. I like it and all, but is ext3 better or faster? Faster is always better.
ACK
Yeah, but reiserfs was stable much sooner, so all my big filesystems are reiser. Now it would be a pain to convert them to ext3.
I'd choose ext3 for new installations however, if only for the purpose of trying it out and comparing to reiserfs.
I don't think that much advertised compatibility is going to count much. The faster/more reliable fs will be the sysadmin choice.
Yes. Have you tried changing your hardware in WinXP? Besides, installing a new OS is not the easiest way for me to get hardware to work.
Excuse me, but if it takes your servers more than 30 minutes to complete a shutdown then you have problems that are bigger than a filesystem.
At my facility, which is small and only has 15 servers, the procedure is this: power drops, UPSes kick in, generator starts.
If the generator starts then all is fine.
If the generator doesn't start then the UPSes signal to the servers that power is lost and the servers shut down. Everything starts back up when power is restored.
It's happened 3 times without anyone there, and we had no problems, except for the NT machines hanging and one person (me, oops) leaving an NT install CD in a CD drive.
A properly designed backup power system will cause ZERO problems to a computer server system or network.
Oh, and if you use one big UPS instead of a dedicated UPS for each server then you are asking for trouble. (Reminds me of eggs in a basket.)
I've seen the data center's 3 million dollar APC ups fail to work 3 times during tests. My APC 2200's never fail me (I replace all batteries every 18 months) so spending an insane amount of money for a power backup solution is not a smart thing to do.
Do not look at laser with remaining good eye.
Actually the "games" issue is going away.
I have 10 major brand games running on linux now, and 5 more under wine in linux.
No effort taken to install them.
As for better, you are mistaken. Linux is free. ZERO cost. I also don't have to agree to legal bullcrap or get trapped into complying with MS's wishes. If I have a friend that wants my OS I just burn him/her a copy and legally give it to them.
It's the legal nightmare and Microsoft's dirty tricks that make linux better. Microsoft's lawyers are the best thing to happen to linux.
Their greed and stupidity digs the hole faster and faster for microsoft.
MS could overtake everyone instantly with one simple move. Non commercial use of their os is free. but that will never happen..
Do not look at laser with remaining good eye.
I am setting up a lab of 30 machines for internet surfing at my school. They are p200's with 32megs of Ram. I decided to go with XFS basically because I know SGI has been using it for a long time, and therefore, most of the bugs are probably worked out of it. For the Lab, I am using Redhat 7.1 XFS with IceWM as the window manager. The system boots, runs an autologin script I made, and goes into IceWM with Netscape.
I was using Blackbox, but I decided not to, because it didn't look "Windowish" enough, and I didn't want people confused by it. IceWM looks great, runs fast, and has a little Penguin for the start button. It took me about a month to get all the net cards into the 30 computers (along with other stuff), and now all I have to do is haul them over to the middle school and ghost them with the image I have on CD.
I am very happy, because I have been working the bugs out of this project since August, and am almost done. Next Wednesday I hope to have all the machines done with. Then I get to find out how easily kids can trash Linux. But I didn't secure it that much, because I feel that if they want to mess it up, all they would have to do is boot with a floppy and nuke the partitions. And it only takes a few minutes to re-ghost them. The 486 lab they have now has been surviving for 5 years with no reinstall, so I think I'm safe.
Does anyone have any comments that would help me out?
has anyone done any benchmarks comparing ext2 and ext3 to reiserfs? i know the article mentions differences in performance, but i wanna see graphs and pictures :)
Blah.
Yet another Red Hat revolutionary product that the rest of the distributions promptly ignore. And with good cause.
This talk of ext3 being faster than Reiser or XFS is crap. It's not faster, and on IDE hardware the journaling capabilities are offset by the way the IDE drives work. Ext3 is the weaker of the bunch on IDE hardware, to the point that you might as well not even use it. It seems the point of ext3 is to eliminate the need of fsck and not the benefits that can be had with journaling (as in XFS's xfsdump and xfsrestore).
If you want a good journaling filesystem, use Reiser or XFS on FAST drive hardware. If you're not up to making the investment in SCSI or ATA 100 drives and insist upon running XFS or Reiser on your 5200 rpm 10 gig IDE drive, of *course* it'll be slow.
I've been using all three filesystems on various machines at work (XFS being on SGIs), and I have to say that ReiserFS seems to be much faster than Ext3, Ext3 is much easier to upgrade from Ext2 (very convenient), and XFS is just plain powerful. I can't compare XFS under Linux to the other Linux journaling filesystems, but I'm getting ready to see what it's like.
If XFS for Linux is anything close to the SGI version, XFS is going to beat the socks off of both Ext3 and ReiserFS.
Actually, you can use it, with ext2 *and* ext3. The ACL group implemented ACLs as extended attributes, that can also be used for metadata (icons, mime types, whatever):
Check out the ACL guys' homepage for more details.
Is it just me, or does ext3 sound like FAT16 > FAT32 and VFS, in that it's for all the little nancy boys who are too chicken/lazy to upgrade to a much better filesystem (and OS, while they're at it)?
Not that the work done by the ext2/ext3 people isn't excellent, it's just that time is coming for extX to move on (be incompatible), or move aside.
Opportunity knocks. Karma hunts you down.
I now wish I had waited for your reply, instead I did what AC said, tar everything, mkfs and untar. For some unclear reason, lilo refused to understand that I was trying to install it on /dev/hda5 (/) instead of /dev/hda1 (backup.) But your suggestion does not seem to be working around this problem either, how could I move boot stuff to ext3 and destroy reiserfs after transition?
Gentlemen, you can't fight in here, this is the War Room!
I've been running ext3 fine for a few weeks now on my home box and my Linux workstation at work. On Monday I decided to update our cvs server to kernel 2.2.20 (from .19) and ext3, and the next morning it was down big time. Reading the logs, I could see that something had gone wrong during the big backup cronjob after 6am. It creates a 150-meg tmp tarball of our cvs repository for replication, and it had only managed to do the first 4 megabytes. I also had a few "hda: lost interrupt" entries in the syslog, right during the time the backup process had halted. The disk was sloppy and not responding much, so it might be some h/w failure as well. I booted, ext3 replayed the journal, and everything seemed fine until I found some weird files with mysterious access bits set in some directories. I couldn't delete or move them. Also, some files had disappeared and some others were corrupted, AFAIK. I took the system down to runlevel one, remounted the partitions read-only and ran fsck.ext2 on them. It reported hundreds if not thousands of blocks belonging to more than one inode.
This may just be some weird hardware failure but it just sounds too coincidental. The box has been rock stable for at least a year in its current h/w setup. I've been testing 2.2.20's fine on many machines before, both with ext3 and ext2. Now that I restored the old system from backups it's running on 2.2.19+ext2 again quite happily.
I'd like to know if anyone else has had problems that may be related to ext3? I'm still running it on my personal boxen but it seems that our servers won't be seeing this new filesystem at least until it appears on Debian Potato, included in the standard 2.2.x kernel release. If ever.
ext2 doesn't have a 2GB file size limit. That was an operating system limit which went away somewhere in the middle of the 2.2.x stable series.
Further, ext3 is not the-next-version-of-EXT. It is an extension of ext2 which is fully compatible with ext2. Think of ext2 as two things: the format of bits on the disk, and the code to read/write those bits. Ext3 keeps the same format (actually with compatible extensions), but mostly it changes the code for reading/writing to the disk (journaling).
The ext2 filesystem is tried and true. You can go back and forth between ext2 and ext3 with no reformatting or issuing of commands other than the mount command.
ReiserFS is a more "sophisticated" filesystem than ext[23], and XFS is a more "sophisticated" filesystem than ReiserFS. But I keep "sophisticated" in quotes because the utility, reliability, and speed of a FS relies more on your usage patterns, than on the genius of the filesystem designers/coders.
FFS-style: ext2, ufs
FFS+journal: ext3, ufs+
B+tree directories, B+tree block layout, journaling: ReiserFS
B+tree directories, B+tree block layout, extents, journaling: XFS, JFS
Logging FS: VxFS (my favorite)
I use ext3 at home. Good speed, no need to tar up all my files, reformat drives, and untar all my data; journaling, mainline kernel support, tried and true.
One place I would seriously consider ReiserFS is for home directories. The place it really shines is constantly reading and writing lots of "small" files (small ~50k). For Gnome and KDE config files, Mozilla disk caches, CVS checkouts, and untarring of source, ReiserFS is going to be a leader of the benchmarking pack. You'll notice the difference.
But don't get into holy wars over FS, and don't think that Linux is whole generations behind Commercial Unixen. The Linux kernel is dramatically ahead in some areas and minorly behind in others. The only place it is dramatically behind is places where the computer you are running the OS on costs more than a half million dollars.
-- I am not a fanatic, I am a true believer.
I run a large lab and use IBM's LCCM Package to install OS images to 60 client systems that I use for benchmarks.
LCCM does not support installing Linux like it does Windows OS's.
I attempted to use the latest Norton Ghost, and it will only allow ext2 filesystems to be created.
Anyone out there used IBM's LCCM to install ext3 filesystems? Or have a good process for making an image of an already installed system for mass installs?
Not to "troll" for my fav OS or whatever, but I've been playing with snapshots in FreeBSD-CURRENT for the last few days, and I must say that this is quite possibly the coolest filesystem technology I have ever seen.
In short, a snapshot is approximately equal to an image of a filesystem. To create a snapshot, you run a mount command like "mount -u -o snapshot /var/snapshots/snap1 /var". Because of the way snapshots work, the snapshot must reside in the same filesystem that it contains.
Now, once the snapshot is created, it can be treated like another filesystem. You can run fsck on it, dump it, or even mount it. The only difference is that within the snapshot, previous snapshots will appear as null files.
Basically, when you create a snapshot, you tell the filesystem that you want its contents at the current time preserved, and the snapshot file is where it does this. Now, whenever said filesystem is modified, the modification is basically applied in reverse to extant snapshots. So, when a snapshot is first taken, it doesn't contain much information, but when you rm a file living in the directory, the file is saved into the snapshot. When you modify a file, deltas to reverse the change are saved to the snapshot.
This is extremely powerful used in the hands of a good sysadmin. Imagine your server that is backed up to tape every week. When someone comes asking for a file they clobbered or deleted by accident, you say "how old was the file?" - you know if they say "8 days", you have to go restore from tape, and if they say "2 days", you have to tell them that they are out of luck. Now imagine if a cron job was set up to take a snapshot once a day, and clear out old ones once a week. If they say "8 days", you still have to go fetch the tapes, but if they say "2 days", all you need is some mdconfig, mount, cp, and umount action to restore the file. How cool is that?
Snapshots essentially give your filesystems the "undo" capabilities that your editor has.
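The daily-snapshot cron job and the "mdconfig, mount, cp, and umount action" could look roughly like the sketch below. This is a sketch for FreeBSD under stated assumptions: the function names, the weekday-based naming (which overwrites each snapshot after a week instead of cleaning up in a separate weekly pass), the md unit number 4 and the /mnt mount point are all choices of this example. Commands are only printed unless `RUN=` is set to empty.

```shell
#!/bin/sh
# Sketch only: rotate daily snapshots of a filesystem and pull one file back
# out of a snapshot. Prints the commands by default (RUN unset => echo).

SNAPDIR=${SNAPDIR:-/var/snapshots}   # must live inside the snapshotted filesystem

take_snapshot() {
    fs=$1                      # filesystem to snapshot, e.g. /var
    tag=${2:-$(date +%a)}      # weekday name, so seven snapshots rotate
    snap="$SNAPDIR/snap-$tag"
    # Drop last week's snapshot of the same weekday, then take a new one.
    ${RUN-echo} rm -f "$snap"
    ${RUN-echo} mount -u -o snapshot "$snap" "$fs"
}

restore_from_snapshot() {
    snap=$1                    # snapshot file
    file=$2                    # path relative to the snapshotted filesystem
    dest=$3                    # where to put the recovered copy
    unit=${4:-4}               # md unit number (arbitrary choice)
    mnt=${5:-/mnt}             # temporary mount point
    # Attach the snapshot file as a memory disk and mount it read-only.
    ${RUN-echo} mdconfig -a -t vnode -f "$snap" -u "$unit"
    ${RUN-echo} mount -r "/dev/md$unit" "$mnt"
    ${RUN-echo} cp -p "$mnt/$file" "$dest"
    ${RUN-echo} umount "$mnt"
    ${RUN-echo} mdconfig -d -u "$unit"
}
```

A crontab entry calling `take_snapshot /var` once a night would cover the "2 days" case; anything older than a week still means going to tape.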
Remember that Namesys is Hans Reiser's company, so they like ReiserFS, but I don't think they cheat with the benchmarks.
Cheat, probably not, but accurate to common usage of a filesystem?
Be very careful interpreting those benchmarks, because the ones they consistently list first are the ones with a bunch of files that are 100 bytes in length, which is essentially the only area where Reiserfs really pulls ahead. Reiserfs is essentially tied with ext2 for all reasonably sized files that you would expect to find on a system. (Unless you're dealing with intense processing of millions of 100 byte files) When comparing ReiserFS to XFS and JFS, ReiserFS pulls way ahead for extremely small files, but the other filesystems perform notably better for reasonably sized files (10k) when synchronized.
For practical uses, neither filesystem seems to really pull ahead, so it's worth considering other features when deciding which to use.
Kernel 2.4 is not solid enough for me to even start testing. If it doesn't run on 2.2 then it's not solid enough for my needs. I don't want to have to upgrade a kernel every 2-3 months to fix a critical bug. Reiserfs is decent, I use it on a few machines, but ext2 is still the dominant filesystem on my ~40 Linux servers. Most have 2 hours of battery backup and never crash, so journalling isn't much of an issue. The last power outage lasting more than 30 mins that I've experienced was back in 96 or 97, when a tree branch broke a line and caused most of the west coast to go to a brownout/blackout state for 3-4 hours.
I've just recompiled my kernel to 2.4.14. So when will they add ext3 to the regular file systems rather than waiting for the patch to come out? I mean, the first time I tried to apply the patch, bad things happened.
I don't like seeing kernel panic messages.
=================
Unix is very user friendly, it's just picky about who its friends are.
It's my understanding that 2gb limit was never ext2 related, and was a limitation of the 2.2 vfs.
And no, you can make files as big as you want with both ext2 and ext3 with kernel 2.4.
Ext3 is a decent filesystem that offers solid reliability, performance, and feature set. XFS may be a better option in the long run, but right now ext3 is the best filesystem for Linux.
Most computers simply don't need guaranteed zero downtime. What they need is bounded downtime. It's OK if they crash every once in a while, as long as they reboot cleanly within a few minutes. The biggest contributor to boot time after a crash is the file system check. Since a journalling file system can recover the file system within a few minutes, it is a huge win.
Here in the real world, even the big real-time transaction processing systems occasionally have common-mode failures that wipe out all the redundant subsystems at the same time. Lightning strikes, idiots frob the emergency power switch, etc. Thus, the big real-time systems need journalling even more desperately than the small systems. Sheer ignorance. Replication of filesystems and databases has at least as much of a performance hit as journalling, and the complexity is likely to be vastly higher.-- ;-)
Kuro5hin.org: where the good times never end.
Good news first: Something similar to what you're describing does exist. :) I'd imagine you could adapt one of these to suit you, at least under the Hurd. I am not a developer so I can only tell you that when people speak in general of porting the Hurd servers over to Linux, there is much groaning and ye olde mile long list 'o reasons why it's not such a great idea. If you're interested in taking a look anyway, feel free to mail me and I'll tell you the rest of what I know: I don't know that porting a single server over would get the same reaction but all the code is GPL'ed with the explicit intention of allowing people to adapt it to their needs. If this is what you need, then be prepared to do a lot of coding, though. They'll certainly wish you well, but I'm afraid no one from the GNU/Hurd project is gonna write it for ya. :)
Bad news: It's for the GNU/Hurd, not Linux.
The servers I think may interest you would be usermux and hostmux. I haven't booted the Hurd in a while, but as I remember usermux creates a virtual fs that contains usernames. If you cd into their name you end up in their homedir. Hostmux does something similar but links to networking info about a given host, not part of its filesystem. These vfs's last across reboots in the Hurd by writing information directly to an inode and they're automounted as they're required. All of this is done in ext2fs and I'm told it works in the BSD fs as well.
Now, what this means to you... I'm not quite sure, to be honest.
Lanir
Actually it's more like:
umount /dev/sda1
fsck.ext2 -y -f /dev/sda1
fsck.ext2 -y -f /dev/sda1 (Just to make sure)
tune2fs -j -C 0 -i 0d -c 0 /dev/sda1
mount /dev/sda1 /whereever
You have to make sure to check the filesystem before you put the journal on it. Also, set the fsck checktime to 0 so it will never be fsck'ed.
Linux O Muerte!
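On the "every 20th mount fsck" question from earlier in the thread: whether periodic checks are still configured can be read out of the superblock with "tune2fs -l". A small sketch, assuming the usual "Maximum mount count" and "Check interval" lines in that output; the function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: read `tune2fs -l` output on stdin and succeed (exit 0) if a
# mount-count or time-based periodic fsck is still configured.
# periodic_fsck_enabled is a hypothetical helper name.

periodic_fsck_enabled() {
    awk -F: '
        # A maximum mount count of 0 or -1 means mount-count checks are off.
        /^Maximum mount count:/ && $2 + 0 > 0 { on = 1 }
        # A check interval of 0 means time-based checks are off.
        /^Check interval:/      && $2 + 0 > 0 { on = 1 }
        END { exit (on ? 0 : 1) }
    '
}
```

Usage would be along the lines of `tune2fs -l /dev/sda1 | periodic_fsck_enabled && echo "periodic fsck still on"`. Whether turning the checks off entirely is wise is a separate argument; the journal protects against unclean shutdowns, not against bad hardware.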