Slashdot Mirror


Ext3 Filesystem Explained

sheckard writes: "The next installment of the wonderful Advanced filesystem implementor's guide, part 7, details the ext3 filesystem in all of its glory. This is another great voyage into the world of journaling filesystems, and ext3 has been rock-solid in my experience."

40 of 174 comments

  1. Distro battles? Nah. Journaling fs battles! by Deal-a-Neil · · Score: 5, Informative

    ext3 catches my fancy because there's no ext2 --> ext3 conversion: you just unmount, add a journal, and remount. ReiserFS migration is a challenge for huge partitions.

  2. ext3 by FeeDBaCK · · Score: 4, Insightful

    One thing I'd agree with about ext3 is that the machine can be booted with a kernel that understands only ext2, and the filesystem can still be read. This is a major strong point in my book.

    --
    wolf31o2 Developer, Gentoo Linux Games Team
  3. Re:Distro battles? Nah. Journaling fs battles! by ThatComputerGuy · · Score: 3, Informative

    And because there's only a journal as an addition, you can remount as ext2 after a clean unmount and everything will still work fine.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  4. Re:Of course its been solid by Deal-a-Neil · · Score: 2, Funny

    Power supply dies. Power goes out and the UPS dies after 30 minutes. Playing shuffle-the-cables at the co-lo facility, you mistakenly unplug the NAS unit. There are still a few catastrophes that have nothing to do with Microsoft OSes, believe it or not. By the way, that last scenario was completely hypothetical. [whistling/twiddling thumbs]

  5. Re:Distro battles? Nah. Journaling fs battles! by FeeDBaCK · · Score: 2

    Actually... you don't even have to unmount to create the journal, only to actually *use* the journaling.

    --
    wolf31o2 Developer, Gentoo Linux Games Team
  6. Re:Distro battles? Nah. Journaling fs battles! by Andreas(R) · · Score: 2, Insightful

    "ext3 catches my fancy because there's no ext2 --> ext3 conversion "

    In addition, you can actually read ext3 from a kernel that only supports ext2. The only catch is that the partition has to have been cleanly unmounted for this to work. This is a "Really Good Thing (TM)", because you can boot from an old boot disk and still access your files, or share the partition between multiple distributions.

  7. how to convert to ext3? by nusuth · · Score: 2, Interesting

    On my new machine I installed Linux as my primary OS, expecting to get tired of it soon (again) and reconfigure it as a dual-boot system with Windows as my primary OS. While installing Linux, I didn't think much (since I would soon be destroying the partition anyway) and installed the system on ReiserFS. To my surprise, that didn't happen, and the unreliability of ReiserFS started to bother me more and more. With this article, I'm convinced that ext3 is what I want. Now, how do I convert from ReiserFS to ext3? I have plenty of empty space on a soon-to-be-destroyed NTFS partition and a CD writer, so backing up existing data is no problem, but simply copying the files back will not do the trick, right?

    --

    Gentlemen, you can't fight in here, this is the War Room!

  8. Good Teckie Post by Newt-dog · · Score: 2, Interesting
    Well, another great post that went over my head! :-)

    I did enjoy the paragraph on filesystem journaling, though. After pulling one of my [gasp] Win2000 servers offline the other night to do a defrag, I could appreciate that a developer could tweak ext3 to do some neat things (ahh, for Linux, at least). For example, when I save and resave files on a test server, the journaling approach could be made more efficient by saving only the changed data (not the whole freakin' fragmented file)!

    Now the question is: will someone step up to the plate and produce several custom filesystems? The article points out that there is no "best" filesystem, but given the options, I'm sure techie end users could tweak settings to meet their needs, be it server or desktop.

    Newt-dog

  9. Partition resizing? by Bun · · Score: 2, Interesting

    I've converted over to ext3fs, and am curious about one thing: resizing the ext3fs partitions. I know Partition Magic can resize ext2fs partitions with no difficulty, and Linux won't miss a beat. If the file systems are cleanly unmounted, as during a shutdown, and the ext3fs partitions are resized using Partition Magic, will there be problems? Is there anything in the journal that would make the kernel panic and puke on the newly changed partitions? I have no plans to do this; I'm just curious what would happen if I did.

    --
    "Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
    1. Re:Partition resizing? by Sapien__ · · Score: 5, Informative
      This thread might be useful.

      To summarize: yes, it's possible to resize ext3 partitions, so long as your resizer doesn't mind. Don't use Partition Magic to do it. It doesn't like it. Badly.

  10. Re:Performance by Anonymous Coward · · Score: 2, Informative

    I've had an ext3 root partition for over a year - it needed a reboot to change root to ext3 in those days, though. All other partitions were done with a remount.

    Now, any ext2 fs can be given a journal by a single tune2fs command (tune2fs -j), with no unmount and no reboot; it just has to be mounted as ext3 afterwards to actually use the journal.

    I'm sure you could come up with several benchmarks which show reiserfs to be faster than ext2/ext3/xfs/whatever, but for most desktops and servers, filesystem performance is not a factor to be overly concerned about - unless you choose something silly like umsdos or NFS over PPP. News servers and high load fileservers are a different matter of course.

  11. Power loss by nick255 · · Score: 2, Informative

    Last night I had my first power loss since switching to ext3. Normally it would take 10-15 minutes for my computer to restart, what with checking /home, etc. But today it came up in just a couple of minutes with no corruption (none that I have noticed, or that has been reported). So ext3 gets my thumbs-up!

    1. Re:Power loss by Znork · · Score: 2

      Yep, I was reinstalling our main fileserver here at home the other week, upgrading to Red Hat 7.2. Unfortunately, space was a bit cramped and I didn't bother to put the cover back on the computer, so the power button ended up resting against my chair. Of course, this resulted in several instant power outages as I got up to get coffee, etc. I think I managed to instakill it 6 times total. Not a single problem noted, just fast log replays and it was up and running again :). Thumbs up for ext3 from me too.

  12. Re:The journalling filesystem myth by Subjective · · Score: 2, Insightful

    Nothing can ensure data integrity in the case of a mid-write shutdown. That's logically obvious.

    Journaling ensures filesystem integrity, which is very important. Mounting an unclean ext3 fs takes seconds: there's no need to scan the filesystem for evidence of interrupted writes, because the journal says exactly which updates were in flight. Committed transactions are replayed, and incomplete ones are discarded.

    If your system crashes in the middle of your work, and your hard drive wasn't physically damaged (it can happen; use RAID if you're that paranoid), everything but your open files will be normal. Your open files might be 'un-journaled' (new official term? no) back to the state they were in before you wrote them.
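    To make the replay idea concrete, here is a toy write-ahead journal in Python. It is a sketch, not ext3's code: the record format and the name `replay` are made up for illustration, and ext3's real journaling layer (JBD) logs whole metadata blocks rather than symbolic records like these. What it does show is the rule that keeps recovery fast: a transaction is applied only if its commit record reached the disk, and anything after the last commit is thrown away.

```python
# Toy write-ahead journal (hypothetical record format, not ext3's on-disk layout).
# Records are ("update", txid, block, value) or ("commit", txid).

def replay(journal, disk):
    """Apply every committed transaction in `journal` to `disk` (block -> value)."""
    pending = {}
    for record in journal:
        if record[0] == "update":
            _, txid, block, value = record
            pending.setdefault(txid, []).append((block, value))
        elif record[0] == "commit":
            _, txid = record
            # Only now is the transaction known to be complete on disk.
            for block, value in pending.pop(txid, []):
                disk[block] = value
    # Whatever is left in `pending` never committed (the crash hit mid-write),
    # so it is simply discarded; no full-disk scan is needed.
    return disk


if __name__ == "__main__":
    journal = [
        ("update", 1, "inode 12", "size=4096"),
        ("update", 1, "bitmap 3", "block 900 used"),
        ("commit", 1),
        ("update", 2, "inode 13", "size=8192"),  # power lost before commit 2
    ]
    disk = {"inode 12": "size=0", "inode 13": "size=0"}
    print(replay(journal, disk))
```

    Transaction 1 is applied in full, while the half-written transaction 2 leaves inode 13 exactly as it was.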

    --
    My other .sig is also this bad
  13. Excellent engineering by ppetru · · Score: 5, Insightful

    The very existence of ext3, and its complete forward and backward compatibility with ext2, shows that ext2 was extremely well designed by its authors. Kudos to Remy Card, Ted Tso, and the rest of the ext2 team!

    Also, building on the same extensibility of ext2, Daniel Phillips is working on a directory indexing patch that speeds up ext2 by a huge factor on directories containing lots of files. Preliminary patches are available, along with a graph of a simple file-creation benchmark. Amazing!

    --

    Petru
  14. Re:how to convert to ext3? -- as far as I know by Dante'sPrayer · · Score: 2, Informative

    If your root partition is formatted as ReiserFS, your options are pretty limited. Create a big enough partition in your free space and make an ext2 or ext3 filesystem on it. Then copy everything on the root partition to the new one (use "cp -pR" to preserve permissions), and edit /etc/fstab on the new partition so that / points at it with type ext3. Reboot, passing 'root=/dev/hd??' to the kernel, where ?? is the new ext partition. If everything boots fine, you're on your way. If not, you won't lose anything on your old ReiserFS root; just reboot as usual.
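    Booting with root= only tells the kernel which partition to mount first; the boot scripts on the new partition typically still read its copied /etc/fstab, which describes the old ReiserFS root. A minimal sketch of that edit in Python, assuming a standard whitespace-separated fstab. `set_root_entry` and the device names in the example are hypothetical, so check the result by hand before rebooting.

```python
def set_root_entry(fstab_text, new_device, fstype="ext3"):
    """Return a copy of fstab_text whose "/" entry uses new_device and fstype.

    Comments, blank lines and all other entries are passed through unchanged.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and not fields[0].startswith("#") and fields[1] == "/":
            fields[0] = new_device  # device column: the new ext3 partition
            fields[2] = fstype      # filesystem type column
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    old = (
        "# device   mount  type     options   dump pass\n"
        "/dev/hda2  /      reiserfs defaults  1    1\n"
        "/dev/hda1  /boot  ext2     defaults  1    2\n"
    )
    print(set_root_entry(old, "/dev/hda5"), end="")  # hypothetical device names
```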

  15. Re:The journalling filesystem myth by mj6798 · · Score: 3, Insightful
    Journaling insures filesystem integrity, which is very important. Mounting an unclean ext3 fs will take seconds - no need to check the filesystem for mid-write evidence, etc.

    Let's say the journaling file system has 5% overhead (it probably has more). That means you lose more than an hour per day on a busy server (5% of 24 hours is 72 minutes); it's spread out, but it's still lost. You'd have to do a lot of rebooting to make up for that in terms of "saved" fsck time.
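    The arithmetic behind that claim, and the break-even point it implies, fits in a few lines. A sketch, assuming (as the comment does) that a fixed percentage slowdown over a fully busy day can be counted as lost time; the function names are made up for illustration.

```python
def daily_overhead_minutes(overhead, busy_hours=24.0):
    """Minutes per day 'lost' if the server is busy for busy_hours at `overhead` slowdown."""
    return overhead * busy_hours * 60


def reboots_per_day_to_break_even(overhead, fsck_minutes, busy_hours=24.0):
    """How many fsck-bound reboots per day journaling must save to pay for itself."""
    return daily_overhead_minutes(overhead, busy_hours) / fsck_minutes


if __name__ == "__main__":
    print(daily_overhead_minutes(0.05))             # about 72 minutes
    print(reboots_per_day_to_break_even(0.05, 15))  # about 4.8 reboots/day with a 15-minute fsck
```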

  16. fsck... Is it needed??? by PimpNasty · · Score: 2, Interesting

    Do you still need the fsck on every 20th mount???

    --
    - Pimp

    I like computers, women and computers... in that order...
  17. Ok which is best Reiserfs or ext3? by ACK!! · · Score: 2

    I moved over to ReiserFS a couple of months ago. I like it and all, but is ext3 better or faster? Faster is always better.

    --
    ACK /ak/ interj. 2. [from the comic strip "Bloom County"] An exclamation of surprised disgust, esp. i
  18. Re:The journalling filesystem myth by cowbutt · · Score: 5, Informative
    Let's say the journaling file system has 5% overhead (it probably has more). That means you lose more than 1h per day on a busy server--it's spread out, but it's still lost. You'd have to do a lot of rebooting in order to make up for that in terms of "saved" fsck time.

    Andrew Morton reckons ext3 is actually quicker than ext2 in spite of the journalling. Go figure. :)

    --

  19. Re:The journalling filesystem myth by edhall · · Score: 5, Insightful

    A few points:

    1. You can't equate down-time to a slightly slower response time. Having a reboot time of tens of seconds vs. tens of minutes for (e.g.) a large source repository or a critical web server is well worth a minor performance hit. Reboot time is dead time for all who need access to the server.
    2. If your file server is running so close to capacity that a 5% decrease in maximum filesystem throughput represents a 5% slowdown in actual throughput, your server is dangerously overloaded already.
    3. In general, journaling affects write performance, not read performance. If your server performs mostly reads, the overall overhead of journaling may amount to much less than your 5% figure. Most (though not all) applications for file servers are read-intensive with incidental writes apart from the initial "load" of the server.
    4. Fast fscks aren't the main reason for journaled filesystems. Rather, it's the improvement in filesystem integrity that is the main attraction, an improvement that incidentally allows for fast fscks.
    -Ed
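    Point 3 can be turned into a rough number. A minimal sketch, assuming reads pay nothing for the journal and every write pays the same fixed overhead; `effective_overhead` is a hypothetical helper, not a measurement.

```python
def effective_overhead(write_fraction, write_overhead):
    """Overall slowdown when only the write share of the I/O pays the journaling cost."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    return write_fraction * write_overhead


if __name__ == "__main__":
    # A read-mostly file server: 10% of I/O is writes, each 5% slower.
    print(f"{effective_overhead(0.10, 0.05):.1%}")  # 0.5%
```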
  20. Re:Of course its been solid by Lumpy · · Score: 2

    Excuse me, but if it takes your servers more than 30 minutes to complete a shutdown, then you have problems bigger than a filesystem.

    At my facility, which is small and only has 15 servers, the procedure is this: power drops, the UPSes kick in, the generator starts.

    If the generator starts, all is fine.
    If the generator doesn't start, the UPSes signal to the servers that power is lost and the servers shut down. Everything starts back up when power is restored.

    It's happened 3 times without anyone there, with no problems... except for the NT machines hanging and one person (me, oops) leaving an NT install CD in a CD drive.

    A properly designed backup power system will cause ZERO problems for a server system or network.

    Oh, and if you use one big UPS instead of a dedicated UPS for each server, you are asking for trouble (all your eggs in one basket).

    I've seen the data center's 3-million-dollar APC UPS fail to work 3 times during tests. My APC 2200s never fail me (I replace all batteries every 18 months), so spending an insane amount of money on a power backup solution is not a smart thing to do.

    --
    Do not look at laser with remaining good eye.
  21. Re:My semi-Weekly Drunken Comment... by Lumpy · · Score: 2

    Actually, the "games" issue is going away.
    I have 10 major-brand games running on Linux now, and 5 more under Wine.

    It took no effort to install them.

    As for "better", you are mistaken. Linux is free. ZERO cost. I also don't have to agree to legal bullcrap, and I'm not trapped into complying with MS's wishes. If I have a friend who wants my OS, I just burn him/her a copy and legally give it to them.

    It's the legal nightmare and Microsoft's dirty tricks that make Linux better. Microsoft's lawyers are the best thing to happen to Linux.
    Their greed and stupidity dig the hole faster and faster for Microsoft.

    MS could overtake everyone instantly with one simple move: make non-commercial use of their OS free. But that will never happen.

    --
    Do not look at laser with remaining good eye.
  22. Ext3 Is Dead at Birth by Anonymous Coward · · Score: 2, Interesting


    Yet another Red Hat revolutionary product that the rest of the distributions promptly ignore. And with good cause.

    This talk of ext3 being faster than Reiser or XFS is crap. It's not faster, and on IDE hardware the journaling capabilities are offset by the way IDE drives work. Ext3 is the weakest of the bunch on IDE hardware, to the point that you might as well not even use it. It seems the point of ext3 is to eliminate the need for fsck rather than to deliver the benefits that can be had with journaling (as in XFS's xfsdump and xfsrestore).

    If you want a good journaling filesystem, use Reiser or XFS on FAST drive hardware. If you're not up to making the investment in SCSI or ATA 100 drives and insist on running XFS or Reiser on your 5200 rpm 10 gig IDE drive, of *course* it'll be slow.

  23. Re:XFS, Ext3, ReiserFS... by be-fan · · Score: 2

    It's really nifty. It's got attributes and ACLs, and you can grow it without even rebooting!

    --
    A deep unwavering belief is a sure sign you're missing something...
  24. Re:The journalling filesystem myth by be-fan · · Score: 3, Informative

    Actually, the new journaling filesystems (ReiserFS, XFS, and JFS) are all *faster* than ext2. Also, journaling itself can cost very little these days, because modern journaling filesystems use large buffers and coalesce writes. For example, BFS achieves metadata performance nearly as high as ext2 on a heavily loaded system. So if all you're doing all day is creating/deleting/growing/shrinking files, the filesystem is only slightly slower. When you factor in all the other performance improvements, it ends up being faster.
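    Coalescing is easy to see in a toy count of journal writes. A sketch under simplified assumptions: each metadata update dirties one block, an uncoalesced update is logged as its own transaction (the block plus a commit record), and a coalesced transaction logs each distinct dirty block once plus one commit record. This mirrors the idea of logging a block once per transaction no matter how often it changed, but `journal_block_writes` and the workload are illustrative only.

```python
def journal_block_writes(updates, batch_size=None):
    """Count journal writes for a list of dirtied metadata block numbers.

    batch_size=None: every update is its own transaction (block + commit record).
    Otherwise updates are grouped batch_size at a time; within a batch each
    distinct block is logged once, followed by a single commit record.
    """
    if batch_size is None:
        return 2 * len(updates)
    writes = 0
    for start in range(0, len(updates), batch_size):
        batch = updates[start:start + batch_size]
        writes += len(set(batch)) + 1
    return writes


if __name__ == "__main__":
    # Toy workload: creating 100 files in one directory, assumed to dirty only
    # the directory block (50) and the inode bitmap block (7) each time.
    updates = [50, 7] * 100
    print(journal_block_writes(updates))                  # 400
    print(journal_block_writes(updates, batch_size=200))  # 3
```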

    --
    A deep unwavering belief is a sure sign you're missing something...
  25. Re:wow by be-fan · · Score: 2

    Actually, the best filesystem on Linux right now for most uses is probably XFS. It's a little slow on deletes, and not as fast as Reiser for extremely small files, but from what I've done with both (compiling, tar/untar, moving directories around, general workstation stuff), XFS is just as fast as Reiser for normal-sized files, and much faster for large files. JFS is the dark horse here, though. I've seen some benchmarks showing its large-file performance to be as good as XFS's, with much better metadata (creating, deleting, growing, etc.) performance. But there's not much info on it yet, and it's not entirely rock solid.

    --
    A deep unwavering belief is a sure sign you're missing something...
  26. Re:extensions by Karpe · · Score: 2

    Actually, you can use it, with ext2 *and* ext3. The ACL group implemented ACLs as extended attributes, that can also be used for metadata (icons, mime types, whatever):

    Check out the ACL guys' homepage for more details.

  27. Re:ext3 vs xfs by be-fan · · Score: 2, Informative

    XFS historically has very bad delete performance. I don't think it's the fault of the journal, since other things involving the journal (growing or creating files) aren't slow (though ReiserFS does seem to have the best journaling code). I don't know what the official take on this is, but here's my theory. Most filesystems use a bitmap to keep track of free blocks. XFS, on the other hand, uses a pair of B+ trees to manage extents of free space. This allows it to find better (more contiguous) blocks more quickly when an allocation has to be done. A bitmap, on the other hand, has to scan through the bits and can't afford to spend a lot of time looking in different places for the "best" place to allocate. However, when deleting a file, the bitmap approach already has all the addresses of the blocks, so it's just a matter of clearing some bits. XFS, on the other hand, has to reinsert the blocks into the B+ tree, which takes many more disk accesses and much more time. Normally this is an okay tradeoff, since you usually grow files more often than you delete them (i.e. you grow a file many times while writing it out to 2GB, but delete it in one go). On systems like a Squid server, on the other hand, you create and delete files like mad, so Reiser is often faster in that case.
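    The commenter's theory can be sketched in Python. Both classes are toy models with made-up names: `BitmapFreeSpace` keeps one flag per block, in the style of ext2, and `ExtentFreeSpace` keeps sorted (start, length) extents in a plain list standing in for XFS's pair of free-space B+ trees. The point is the asymmetry the comment describes: the extent version can hand out contiguous runs, but freeing means reinserting and merging extents, whereas the bitmap just clears bits.

```python
import bisect


class BitmapFreeSpace:
    """One in-use flag per block, in the style of ext2's block bitmaps."""

    def __init__(self, nblocks):
        self.used = [False] * nblocks

    def allocate(self, n):
        """Take the first n free blocks found by a linear scan (may be scattered)."""
        found = []
        for block, in_use in enumerate(self.used):
            if not in_use:
                found.append(block)
                if len(found) == n:
                    break
        if len(found) < n:
            raise OSError("no space left")
        for block in found:
            self.used[block] = True
        return found

    def free(self, blocks):
        """Deleting is cheap: the file already lists its blocks, so just clear bits."""
        for block in blocks:
            self.used[block] = False


class ExtentFreeSpace:
    """Sorted (start, length) free extents; a list stands in for XFS's B+ trees."""

    def __init__(self, nblocks):
        self.extents = [(0, nblocks)]

    def allocate(self, n):
        """Best fit: the smallest free extent holding n contiguous blocks.

        A real size-indexed B+ tree finds this without the linear scan used here.
        """
        fits = [(length, start) for start, length in self.extents if length >= n]
        if not fits:
            raise OSError("no contiguous extent large enough")
        length, start = min(fits)
        i = self.extents.index((start, length))
        if length == n:
            del self.extents[i]
        else:
            self.extents[i] = (start + n, length - n)
        return list(range(start, start + n))

    def free(self, start, n):
        """Deleting means reinserting the extent and merging it with its neighbours."""
        i = bisect.bisect_left(self.extents, (start, 0))
        self.extents.insert(i, (start, n))
        if i + 1 < len(self.extents):  # merge with the following extent
            next_start, next_len = self.extents[i + 1]
            if start + n == next_start:
                self.extents[i] = (start, n + next_len)
                del self.extents[i + 1]
        if i > 0:  # merge with the preceding extent
            prev_start, prev_len = self.extents[i - 1]
            cur_start, cur_len = self.extents[i]
            if prev_start + prev_len == cur_start:
                self.extents[i - 1] = (prev_start, prev_len + cur_len)
                del self.extents[i]
```

    For example, allocating 10 and then 20 blocks from a fresh 100-block `ExtentFreeSpace` and freeing both runs again merges the free list back into the single extent (0, 100).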

    --
    A deep unwavering belief is a sure sign you're missing something...
  28. Re:Performance by Dictator+For+Life · · Score: 2
    I'm sitting at a humble (K6-2 450MHz, 128MB RAM, 8GB HD) box that ran for 230 days on RH7, and I never noticed speed issues with it...until installing RH 7.2 with ext3.

    The box is definitely slower doing disk access now. It's a little disappointing, but I'm not really concerned: I'd rather have the protection of the journaling and I'm willing to pay with a little speed tradeoff. Eventually I'll get a bigger/newer/faster HD for this thing, and that will probably help as well.

    My perceptions may be skewed; my main box for the last year has been a PIII-750MHz with a 30GB drive, and that bad boy sings, so it may be that I just don't remember precisely what performance is "supposed" to be like on this one.

    --

    DFL

    Never send a human to do a machine's job.

  29. Re:Still same old 2GB limit? by LunaticLeo · · Score: 3, Interesting

    ext2 doesn't have a 2GB file size limit. That was an operating system limit which went away somewhere in the middle of the 2.2.x stable series.

    Further, ext3 is not the-next-version-of-EXT. It is an extension of ext2 which is fully compatible with ext2. Think of ext2 as two things: the format of bits on the disk, and the code to read/write those bits. Ext3 keeps the same format (actually with compatible extensions), but mostly it changes the code for reading/writing to the disk (journalling).

    The ext2 filesystem is tried and true. You can go back and forth between ext2 and ext3 with no reformatting and no commands other than mount.
    ReiserFS is a more "sophisticated" filesystem than ext[23], and XFS is a more "sophisticated" filesystem than ReiserFS. But I keep "sophisticated" in quotes because the utility, reliability, and speed of a FS relies more on your usage patterns, than on the genius of the filesystem designers/coders.

    FFS-style: ext2, ufs
    FFS + journal: ext3, ufs+
    B+tree directories, B+tree block layout, journalling: ReiserFS
    B+tree directories, B+tree block layout, extents, journalling: XFS, JFS
    Logging FS: VxFS (my favorite)

    I use ext3 at home. Good speed, no need to tar up all my files, reformat drives, and untar all my data, journalling, mainline kernel support, tried and true.

    One place I would seriously consider ReiserFS is for home directories. The place it really shines is constantly reading and writing lots of "small" files (small ~50k). For Gnome and KDE config files, Mozilla disk caches, CVS checkouts, and untaring of source, ReiserFS is going to be a leader of the benchmarking pack. You'll notice the difference.

    But don't get into holy wars over filesystems, and don't think that Linux is whole generations behind the commercial Unixen. The Linux kernel is dramatically ahead in some areas and slightly behind in others. The only place it is dramatically behind is where the computer you are running the OS on costs more than half a million dollars.

    --
    -- I am not a fanatic, I am a true believer.
  30. snapshots by Anonymous Coward · · Score: 3, Interesting

    Not to "troll" for my fav OS or whatever, but I've been playing with snapshots in FreeBSD-CURRENT for the last few days, and I must say that this is quite possibly the coolest filesystem technology I have ever seen.

    In short, a snapshot is approximately equal to an image of a filesystem. To create a snapshot, you run a mount command like "mount -u -o snapshot /var/snapshots/snap1 /var". Because of the way snapshots work, the snapshot file must reside on the same filesystem it captures.

    Now, once the snapshot is created, it can be treated like another filesystem. You can run fsck on it, dump it, or even mount it. The only difference is that within the snapshot, previous snapshots will appear as null files.

    Basically, when you create a snapshot, you tell the filesystem that you want its contents at the current time preserved, and the snapshot file is where this happens. From then on, whenever the filesystem is modified, the modification is basically applied in reverse to existing snapshots. So a newly taken snapshot doesn't contain much information at first, but when you rm a file on that filesystem, the file's contents are saved into the snapshot. When you modify a file, the original blocks needed to reverse the change are saved to the snapshot.

    This is extremely powerful in the hands of a good sysadmin. Imagine your server is backed up to tape every week. When someone comes asking for a file they clobbered or deleted by accident, you ask "how old was the file?" You know that if they say "8 days", you have to go restore from tape, and if they say "2 days", you have to tell them they are out of luck. Now imagine a cron job set up to take a snapshot once a day and clear out old ones once a week. If they say "8 days", you still have to go fetch the tapes, but if they say "2 days", all you need is some mdconfig, mount, cp, and umount action to restore the file. How cool is that?

    Snapshots essentially give your filesystems the "undo" capabilities that your editor has.
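    The copy-on-write behaviour described above can be modelled in a few lines of Python. This is a toy with made-up names (`SnapshotFS`, `take_snapshot`, `read_snapshot`) working on abstract blocks rather than a real filesystem; FreeBSD keeps each snapshot as a file inside the filesystem and copies real disk blocks. It shows why a new snapshot is nearly empty: a block's old contents are saved only the first time that block changes after the snapshot was taken.

```python
class SnapshotFS:
    """Toy block store with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live filesystem: block number -> contents
        self.snapshots = []         # one dict per snapshot: block -> preserved contents

    def take_snapshot(self):
        """A new snapshot starts empty; it fills up only as the live data changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def _preserve(self, block):
        # Before the first change to `block`, copy its old contents into every
        # snapshot that has not saved it yet (None means "did not exist then").
        for snap in self.snapshots:
            if block not in snap:
                snap[block] = self.blocks.get(block)

    def write(self, block, data):
        self._preserve(block)
        self.blocks[block] = data

    def delete(self, block):
        self._preserve(block)
        self.blocks.pop(block, None)

    def read_snapshot(self, snap_id, block):
        """Saved copy if the block changed since the snapshot, else the live block."""
        snap = self.snapshots[snap_id]
        if block in snap:
            return snap[block]
        return self.blocks.get(block)


if __name__ == "__main__":
    fs = SnapshotFS({1: "report v1", 2: "notes"})
    daily = fs.take_snapshot()
    fs.write(1, "report v2")
    fs.delete(2)  # the user "clobbers" a file
    print(fs.read_snapshot(daily, 1), fs.read_snapshot(daily, 2))  # report v1 notes
```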

  31. Careful interpreting! by zCyl · · Score: 2

    Remember that Namesys is Hans Reiser's company, so they like ReiserFS, but I don't think they cheat with the benchmarks.

    Cheat, probably not, but accurate to common usage of a filesystem?

    Be very careful interpreting those benchmarks, because the ones they consistently list first are the ones with a bunch of files that are 100 bytes in length, which is essentially the only area where Reiserfs really pulls ahead. Reiserfs is essentially tied with ext2 for all reasonably sized files that you would expect to find on a system. (Unless you're dealing with intense processing of millions of 100 byte files) When comparing ReiserFS to XFS and JFS, ReiserFS pulls way ahead for extremely small files, but the other filesystems perform notably better for reasonably sized files (10k) when synchronized.

    For practical uses, neither filesystem seems to really pull ahead, so it's worth considering other features when deciding which to use.

  32. Re:The journalling filesystem myth by mj6798 · · Score: 2
    Those are all the usual arguments. However, if you want reliability and no downtime, you must have redundant servers or replication; journaling will not protect against most of the problems that cause downtime. Once you have redundant servers, you can easily tolerate a little more time for fsck.

    What it comes down to is that journaling is a convenience feature. Relying on it for "filesystem integrity" or "reduced downtime" or "reliability" is foolish. You pay for fast reboots in slower performance and more complex file system code.

  33. Re:The journalling filesystem myth by mj6798 · · Score: 2
    That may well be, but it doesn't really affect the argument. Journaling imposes an additional set of constraints on when and where data needs to be written to disk, and an optimal file system designed under those constraints is going to have more overhead than an optimal file system designed without those constraints. Generally, with journaling, either you write the same data more than once (which, I believe, ext3 does), or you put data in places where it may take longer to get to when reading it.

    The only time where journaling doesn't have any significant overhead is if you put the journal on another device that can operate in parallel.

  34. Re:Performance by rodgerd · · Score: 3, Informative

    IIRC RH7.2 installs ext3 with both data and metadata logging enabled by default, so your performance change is most likely that you're doing two writes for every one you did before.

  35. Re:Performance by msaavedra · · Score: 2

    Actually, RH 7.2 uses the "data=ordered" mode, not the "data=journal" mode. The data is not stored in the journal, but it is written to disk before the corresponding metadata changes are committed to the journal (according to the article, that is). This should guarantee data consistency, and is faster than full data and metadata journaling, but still gives a minor performance hit.
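    The difference between the modes is mostly about where file data goes and when. Here is a deliberately simplified Python model of the writes for one changed data block plus its metadata; it ignores caching, batching and checkpoint timing, and the step labels are descriptive, not ext3 internals. The mode itself is chosen with the data=journal, data=ordered or data=writeback mount option.

```python
# Simplified write order for one changed data block plus its metadata,
# under ext3's three data= modes. Labels are descriptive, not ext3 internals.
WRITE_ORDER = {
    # Everything passes through the journal first and is written again
    # to its home location at checkpoint time.
    "journal": [
        "data -> journal",
        "metadata -> journal",
        "commit record -> journal",
        "data -> home location",
        "metadata -> home location",
    ],
    # Data goes straight home, but must reach the disk before the
    # metadata transaction is committed.
    "ordered": [
        "data -> home location",
        "metadata -> journal",
        "commit record -> journal",
        "metadata -> home location",
    ],
    # Only metadata is journalled; data may be written before or after
    # the commit, so its position in this list is arbitrary.
    "writeback": [
        "metadata -> journal",
        "commit record -> journal",
        "metadata -> home location",
        "data -> home location (any time)",
    ],
}


def data_writes(mode):
    """Number of times the file data itself is written in the given mode."""
    try:
        steps = WRITE_ORDER[mode]
    except KeyError:
        raise ValueError(f"unknown data= mode: {mode!r}") from None
    return sum(1 for step in steps if step.startswith("data"))


if __name__ == "__main__":
    for mode in WRITE_ORDER:
        print(mode, data_writes(mode))
```

    Only data=journal writes the file data twice, which is the "two writes for every one" cost mentioned in the parent comment.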

    FWIW, I have tried both reiserfs and ext3fs on the same system, and haven't noticed a significant speed difference. Both seemed to work well for me.
    --
    "Any fool can make a rule, and any fool will mind it."
    --Henry David Thoreau
  36. Re:ext3 vs xfs by WNight · · Score: 2

    Sounds like they might benefit from the optimization of not putting newly freed blocks back into the main B+ trees until there's some idle time...

    If they built another B-tree (only trivially balanced) as they delete files, they could return control to the system quickly, and later pull the free blocks out of the temporary tree and take the time to properly balance the main trees as they merge them in.

    In the event that it needs those blocks *now*, it could stop and take the time to merge them into the tree immediately.

    The benefit is that it would only perform badly on very full drives, where it writes into freshly freed space immediately after a delete, as opposed to now, where it sounds like every delete performs badly.

    Deletes are a common enough action that I think you'd want to optimize for them.
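    That suggestion can be sketched as a deferred-free list. Everything here is hypothetical (it is not how XFS works, only the idea in the comment): frees go into a cheap pending list, and they are merged into the sorted main index either when `merge_pending` is called during idle time or when an allocation cannot be satisfied without them. A plain sorted list stands in for the balanced B+ trees, and extents are assumed never to overlap.

```python
class DeferredFreeSpace:
    """Free extents with deferred merging of freed space (hypothetical design)."""

    def __init__(self, extents):
        self.extents = sorted(extents)  # stand-in for the balanced main B+ trees
        self.pending = []               # freed but not yet merged: (start, length)

    def free(self, start, length):
        """Fast path for deletes: just record the extent and return."""
        self.pending.append((start, length))

    def merge_pending(self):
        """Fold pending frees into the main index, coalescing adjacent extents.

        Meant to run when the system is idle, or on demand from allocate().
        """
        merged = []
        for start, length in sorted(self.extents + self.pending):
            if merged and merged[-1][0] + merged[-1][1] == start:
                prev_start, prev_len = merged[-1]
                merged[-1] = (prev_start, prev_len + length)
            else:
                merged.append((start, length))
        self.extents = merged
        self.pending = []

    def _take(self, n):
        """First fit from the main index; returns a start block or None."""
        for i, (start, length) in enumerate(self.extents):
            if length >= n:
                if length == n:
                    del self.extents[i]
                else:
                    self.extents[i] = (start + n, length - n)
                return start
        return None

    def allocate(self, n):
        start = self._take(n)
        if start is None and self.pending:
            # The freed blocks are needed *now*: pay for the merge immediately.
            self.merge_pending()
            start = self._take(n)
        if start is None:
            raise OSError("no contiguous extent large enough")
        return start
```

    Deletes cost a list append; the balancing work is paid later, and only an allocation that actually needs the freed space forces it early.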

  37. You are missing the point by sigwinch · · Score: 4, Insightful
    However, if you want reliability and avoid downtime, you must have redundant servers or replication; journaling will not protect against most of the problems that cause downtime.
    Here in the real world we cannot afford triple redundant drives, motherboards, RAM, CPUs, power supplies, keyboards, mice, monitors, NICs, routers, and network cables for every single computer on every desktop in the entire organization. Sure, we could do it, but the cost would be ludicrous for a very small payback.

    Most computers simply don't need guaranteed zero downtime. What they need is bounded downtime. It's OK if they crash every once in a while, as long as they reboot cleanly within a few minutes. The biggest contributor to boot time after a crash is the file system check. Since a journalling file system can recover the file system within a few minutes, it is a huge win.

    Relying on it for "filesystem integrity" or "reduced downtime" or "reliability" is foolish.
    Here in the real world, even the big real-time transaction processing systems occasionally have common-mode failures that wipe out all the redundant subsystems at the same time. Lightning strikes, idiots frob the emergency power switch, etc. Thus, the big real-time systems need journalling even more desperately than the small systems.
    You pay for fast reboots in slower performance and more complex file system code.
    Sheer ignorance. Replication of filesystems and databases has at least as much of a performance hit as journalling, and the complexity is likely to be vastly higher.
    --
    Kuro5hin.org: where the good times never end. ;-)

  38. Re:Distro battles? Nah. Journaling fs battles! by Mullen · · Score: 2

    Actually it's more like:

    umount /dev/sda1
    fsck.ext2 -y -f /dev/sda1
    fsck.ext2 -y -f /dev/sda1 (Just to make sure)
    tune2fs -j -C 0 -i 0d -c 0 /dev/sda1
    mount -t ext3 /dev/sda1 /wherever

    You have to make sure to check the filesystem before you put the journal on it. Also, set the maximum mount count (-c) and the check interval (-i) to 0 so it will never be fsck'ed automatically (-C 0 just resets the current mount count).

    --
    Linux O Muerte!