Slashdot Mirror


ext3fs in Linus' Kernel Tree

peloy writes: "According to Linus' changelog for Linux 2.4.15pre2, the long-awaited ext3fs, the successor of ext2 with journaling capabilities, has finally made its way into the official kernel tree. I have never tried ext3fs, but it looks like now that it is "blessed" by Linus I'll be upgrading my old and trusty ext2fs partitions soon."

34 of 384 comments

  1. Finally! by alien88 · · Score: 3, Informative

    I've been running ext3 for about a month now, and it is so much better than ext2. I'm glad to see that Linus decided to merge it in. I know that there were some issues for a while with ext3 not working with the new VM, but they finally started releasing patches for the latest 2.4 kernels.

    -Alien88

  2. ext3, a journaled ext2 and not much more... by SpamapS · · Score: 5, Informative

    I've been using ext3 ever since I upgraded to 2.4.14 a few days ago. It's nice to have the journaled FS... as I have been testing out a lot of !cough!nvidia!cough! proprietary drivers and bleeding edge software lately, and subsequently crashing. W/ ext3, I can get back to the crashing very quickly.

    That said, I also use ReiserFS for some other things (try /var first, it's simple to convert). It definitely speeds up the directory access... and on my squid it cut the average response time by a full half second... :-P.

    I personally think ext3 will win out, as it takes about 20 seconds to convert a 6GB partition... vs. XFS or ReiserFS taking nearly 10 minutes, and much more complexity.

    --
    SpamapS -- Undernet #Linuxhelp
    1. Re:ext3, a journaled ext2 and not much more... by Anonymous Coward · · Score: 1, Informative

      If mkreiserfs takes ten minutes on your computer then there is something wrong with your computer.

      I did mkreiserfs on a 40gig drive and it took seconds. Literally. Compare that to 45 minutes for NTFS 5.0

    2. Re:ext3, a journaled ext2 and not much more... by psamuels · · Score: 4, Informative
      how come Office takes 20 full seconds to start up on my *4* GB system anyway?

      Because you're not really converting the filesystem. The process consists of:

      1. creating a journal file
      2. marking it as a journal file in the various copies of your superblock

      That's the beauty of ext3 - it is essentially ext2 with journaling, no more no less.

      In fact, since this is the case, you can mount an ext3 filesystem as ext2 if you ever need to - the compatibility goes both forward and backward.
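The two steps above can be sketched by looking at the superblock flags that change. This is a minimal illustration, not the e2fsprogs implementation: the field offsets and flag values are taken from the ext2 on-disk layout as commonly documented and should be treated as assumptions to verify against the kernel headers, and `make_fake_superblock` is a hypothetical helper that only builds a synthetic image for demonstration.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # primary superblock starts 1 KiB into the device
EXT2_MAGIC = 0xEF53           # s_magic, shared by ext2 and ext3
COMPAT_HAS_JOURNAL = 0x0004   # s_feature_compat bit: a journal exists
INCOMPAT_RECOVER = 0x0004     # s_feature_incompat bit: journal needs replay


def describe_superblock(image: bytes) -> dict:
    """Report journal-related flags from a raw ext2/ext3 image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 100:
        raise ValueError("image too short to contain a superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    # s_feature_compat and s_feature_incompat are adjacent 32-bit fields.
    compat, incompat = struct.unpack_from("<II", sb, 92)
    has_journal = bool(compat & COMPAT_HAS_JOURNAL)
    needs_recovery = bool(incompat & INCOMPAT_RECOVER)
    return {
        "has_journal": has_journal,
        "needs_recovery": needs_recovery,
        # Simplification: only the recovery flag is considered here.
        "mountable_as_ext2": not needs_recovery,
    }


def make_fake_superblock(compat: int = 0, incompat: int = 0) -> bytes:
    """Hypothetical helper: build a 2 KiB image with just the fields we read."""
    image = bytearray(2048)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)
    struct.pack_into("<II", image, SUPERBLOCK_OFFSET + 92, compat, incompat)
    return bytes(image)
```

Under these assumptions, a cleanly unmounted ext3 image differs from ext2 only by the has_journal bit, which an ext2-only kernel can ignore. An image that still has the recovery bit set is the case where falling back to ext2 is refused.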

      --
      "How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README
    3. Re:ext3, a journaled ext2 and not much more... by BinaryAlchemy · · Score: 2, Informative

      The problem with the nvidia drivers is that after X barfs and you kill it (CTRL+ALT+BKSP or ALT+SysRq+k) the kernel driver hangs on and doesn't let it fall back to console mode (nvidia uses a kernel driver to get direct access to the card). You can type in to the console (ALT+SysRq+s then ALT+SysRq+u then ALT+SysRq+b is my standard system when wine pukes on me), you just can't see it.

      --
      ----- The problem with browsing at +5 is that everyone thinks you're being redundant
  3. Some important points... by ThatComputerGuy · · Score: 5, Informative

    Of course we'll have a lot of posts here talking about the issues of backwards compatibility, ext3's offerings, etc, so we might as well get those out of the way now.

    From what I understand, ext3fs is just ext2 with journaling support, so in the (somewhat rare) event of a system crash you don't have to go through a time-consuming fsck during the next boot. Results in better data protection and more uptime.

    If an ext3fs enabled kernel on an ext3 partition needs to go back to a previous kernel for some reason, or say, you forget to compile ext3 into a kernel, any ext2 kernel will still be able to read/write to an ext3 partition, as long as it was cleanly unmounted with the ext3 kernel.

    Why not push ReiserFS, XFS, etc? It seems that most of these are not very well proven yet. ext2 is tried and true, kernel support is good, and the new revision adds journaling, so why not stick with ext3?

    AFAIK, these are some of the most FAQs about ext3. I wonder how often they'll show up below...

    --
    XML is like violence. If it doesn't solve the problem, use more.
    1. Re:Some important points... by Reikk · · Score: 3, Informative

      : Why not push ReiserFS, XFS, etc? It seems that most of these are not very well proven yet. ext2 is tried and true, kernel support is good, and the new revision adds journaling, so why not stick with ext3?

      Bzz. Try again. XFS is extremely well proven. It's been in use for years in systems with massive storage - nuclear war simulations, automobile designing, and the area I've been dealing with the last several years, weather forecasting.

    2. Re:Some important points... by SurfsUp · · Score: 3, Informative

      From what I understand, ext3fs is just ext2 with journaling support,

      Yes and no. Functionally, that's strictly true. Internally, ext2 and ext3 have diverged somewhat. Ext3 does not share any common files with ext2 at this point. Ext3 is still buffer-oriented, whereas Ext2 has largely been converted to use the page cache. The page cache aspects of ext2 are expected to be added to ext3 in due course. At some point, there may be a full merge of the two code bases, though that's going to be a fair amount of work.

      --
      Life's a bitch but somebody's gotta do it.
    3. Re:Some important points... by dbarclay10 · · Score: 4, Informative

      I'd just like to clarify some of this post's points:

      From what I understand, ext3fs is just ext2 with journaling support, so in the (somewhat rare) event of a system crash you don't have to go through a time-consuming fsck during the next boot. Results in better data protection and more uptime.

      That's not entirely true for a couple of reasons; first of all, the ext3 code *started* as an exact duplicate of ext2, then they added journalling support. A lot has changed since then (in both code bases), so they're not identical any more. Secondly, journalling does not mean that there's no fsck; it just means that it's an order of magnitude or four faster. This is because during the filesystem consistency check, we know *exactly* where to look for problems (thanks to the journal). This doesn't result in better data protection, but it does result in better availability (and hence uptime).

      Why not push ReiserFS, XFS, etc? It seems that most of these are not very well proven yet. ext2 is tried and true, kernel support is good, and the new revision adds journaling, so why not stick with ext3?

      It should be noted that XFS has been around for years. I think your basic premise is still correct, though - neither XFS(in the scope of the Linux kernel) nor ReiserFS have been tested as extensively as ext3. And since ext3's code base started as ext2's code base, it doesn't even need so much checking.

      --

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)
    4. Re:Some important points... by Anonymous Coward · · Score: 1, Informative

      I agree, ext3 is all the good sides of ext2 + journaling, which makes it a fantastic fs for my uses. Personally I do need the backward compatibility as I twiddle around with dozens of kernels on some boxes all on their way to some embedded machines.

      _BUT_ I must say that having read more about reiserfs I have to say that what the guys are after is a really intuitive and good approach.

      They're thinking that, since in a unix system everything's a file anyway, why on earth should there be all sorts of databases, hash structures etc, when the same thing could be accomplished with directories and files, given that the filesystem were good at handling small files.
      I.e. why build another database abstract file system inside a large file, rather than use the fs underneath as-is?

      Now, ext2 is particularly good with large files so it's not really suitable for this, _but_ what if we had a journaling filesystem that were really good at that.

      Thus the bottom line is that with reiserfs they're trying to make a file system that can handle a shitload of small files as fast as possible.

      (now, the above was my understanding of the reiserfs manifesto alone, personally I use both ext3 and reiserfs (never tried xfs), so correct me if I'm wrong)

    5. Re:Some important points... by be-fan · · Score: 2, Informative

      In Linux there are two caching mechanisms. The first one, called the buffer cache, caches physical disk blocks. For example, there might be a buffer that caches blocks 8-16 on a particular disk. The second one, called the page cache, is much newer and caches files. So the page cache would cache, for example, the first page of a file. The difference between the two is that the page cache is much higher level, and thus much more flexible.

      For example, blocks on a disk are 512 bytes in size. Pages are 4KB in size. Thus the first page of a file might be contained in 8 different blocks on different parts of the disk. The page cache doesn't have to care about that, since it's up to the filesystem to map pages to blocks.

      The page cache also interfaces very nicely with memory mapped files. Normally what happens when a process writes to a memory mapped file is that the kernel allocates a page of memory, and allows the process to write to that page. Eventually, the kernel writes out that page of memory to the disk file. With the buffer cache, there is no connection between what the process sees (pages) and what the disk deals with (blocks). Thus, the kernel has to manually make sure that the buffer cache and the memory mappings are in sync with each other. If a process read()'s from a file that another process is writing to with memory mapping, the kernel has to make sure that any changes to the buffers (read()/write()) agree with changes in the pages (memory mapping). The page cache, on the other hand, deals only with pages. So what happens with the page cache is that when a process writes to a memory mapping, it points the process to the page that is caching that part of the file. When another process uses regular read() to read that file, the kernel simply copies that data from the caching page.

      Another benefit is that it lets stuff like NFS (in which the kernel never deals with a disk, just files) use the same caching mechanism as regular files.

      The last benefit is that you don't have to treat file caches any differently from regular memory. The Linux VM system automatically swaps out pages that haven't been touched in a while. With the page cache, the VM doesn't have to deal at all with buffers. It simply has to care about how often a particular page of memory has been written to (either by memory mapping, or read/write system calls).

      The original cache in the Linux kernel was the buffer cache. After the page cache was added, things like NFS were immediately built to use it. Older parts, like ext2, continued to use the older buffer cache. Over time, there has been a trend to converting the kernel to using the page cache more often. In Linux 2.2, for example, ext2 used the page cache to do file reads, and only dealt with the buffer cache for writes. In 2.4, more of the filesystem layer switched to using the page cache for writes as well.
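The page-to-block mapping described in this post can be illustrated with a few lines of arithmetic. This is only a sketch using the post's own sizes (512-byte blocks, 4KB pages), so one page covers 4096 / 512 = 8 blocks. The `block_map` list is a hypothetical stand-in for whatever structure a real filesystem uses to locate a file's blocks.

```python
PAGE_SIZE = 4096    # bytes per page, as in the post
BLOCK_SIZE = 512    # bytes per block, as in the post
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE


def blocks_for_page(page_index, block_map):
    """Return the disk block numbers backing one page of a file.

    block_map is a hypothetical list where block_map[i] is the disk block
    holding the i-th BLOCK_SIZE chunk of the file.
    """
    first = page_index * BLOCKS_PER_PAGE
    return block_map[first:first + BLOCKS_PER_PAGE]


# A fragmented 10-block file: its chunks are scattered across the disk.
fragmented_file = [100, 101, 907, 908, 55, 56, 57, 300, 301, 302]

page0 = blocks_for_page(0, fragmented_file)   # eight scattered blocks
page1 = blocks_for_page(1, fragmented_file)   # partial last page, two blocks
```

A page cache user only asks for page 0 or page 1. The scattered block numbers stay the filesystem's problem, which is the flexibility the post is pointing at.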

      --
      A deep unwavering belief is a sure sign you're missing something...
  4. Large file support? by RockyMountain · · Score: 2, Informative

    Question...

    What are the individual file size limits, and overall filesystem size limits for each of the various journalled filesystems?

    I ran into the file size limit on ext2 just recently (2GB, I think it was), and I want to upgrade to something that handles larger files.

    Thanks.

  5. Re:Yay! by beable · · Score: 2, Informative
    I hate having to fsck my / partition (which is still stuck in ext2 land because I'm afraid to change it).
    All you have to do is make a tiny /boot partition which can be ext2. Then you can easily use ReiserFS, ext3, XFS, or whatever you want for your root partition. If your system crashes, you would only have to fsck about 15 megabytes or whatever the smallest partition you could use is.
    --
    ...
  6. Re:Forgive my ignorance, here... by PlaysWithMatches · · Score: 3, Informative

    Here's a quick explanation of a journaled filesystem, courtesy of LinuxPlanet.com:

    The term "journaled" means that the filesystem maintains a log or record of what it is doing to the main data areas of the disk, so that if a crash occurs it can re-create anything that was lost.

    ...

    The idea is that the system can crash at any point in this process but that such a crash won't have lasting effect. ... So when the system reboots, it can simply replay the journal entries and complete the update that was interrupted, or it can back out a partially completed update to restore the file's previous state. In either case, you have valid data and not a trashed partition.

    Basically, it means no more long disk checks at startup after a crash or power outage. :) And it virtually eliminates disk fragmentation too, I believe. Hope that helps.
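The replay-or-discard idea in the quoted explanation can be shown with a toy model. This is a sketch under heavy assumptions: a Python dict stands in for the disk, the class and method names are invented for illustration, and real journals (ext3's included) work on blocks with commit records rather than on dictionaries.

```python
class ToyJournaledStore:
    """A dict standing in for the disk, plus a journal of pending updates."""

    def __init__(self):
        self.disk = {}       # "main data area": block name -> contents
        self.journal = []    # pending transactions, oldest first

    def begin(self, updates):
        """Log the intended updates before touching the main area."""
        txn = {"updates": dict(updates), "committed": False}
        self.journal.append(txn)
        return txn

    def commit(self, txn):
        """Mark the log record complete; only committed records are replayed."""
        txn["committed"] = True

    def checkpoint(self, txn, crash_after=None):
        """Copy updates to the main area; optionally 'crash' partway through."""
        for count, (block, data) in enumerate(txn["updates"].items()):
            if crash_after is not None and count >= crash_after:
                return False  # simulated crash: the journal entry stays behind
            self.disk[block] = data
        self.journal = [t for t in self.journal if t is not txn]
        return True

    def recover(self):
        """After a reboot: replay committed transactions, discard the rest."""
        for txn in self.journal:
            if txn["committed"]:
                self.disk.update(txn["updates"])
        self.journal = []
```

In this model a crash in the middle of `checkpoint` leaves the disk half-updated, and `recover` finishes the job from the journal. A transaction that never reached `commit` is simply dropped, so the disk keeps its previous state.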

    --

    Mozilla's a nice operating system, but it needs a better browser.
  7. Re:Is it light on HD requirements? by Anonymous Coward · · Score: 2, Informative

    It will take a _little_ extra space. The journal is normally a file in the root of each ext3 partition which is untouchable/hidden from all but the kernel.

    The new fstools will create it for you with the -j option to mke2fs or tune2fs, but in the old days we created it with dd and passed its inode to the kernel by a mount option - but only for the first mount.

    For my partitions of between 250MB and 1.5GB, I use a journal of 8MB and have no problems. A bigger journal will allow more data to be journaled before it fills and a flush is forced, so is more efficient, but for a small disk with no big writes, a 4MB to 8MB journal is more than sufficient.

    BTW, the current code allows (I think) off-media journals, so you could use journal across disks, or to a battery-backed ramdisk, or an IDE disk implemented with battery backed DRAM, or SRAM.

    Unfortunately, FLASH disks would exceed their maximum-number-of-writes specification in about a year, based on a write every 30 seconds.

    astfgl@iamnota.org

  8. Re:Is it light on HD requirements? by Scooby+Snacks · · Score: 2, Informative
    Is this going to chew up more HD room?
    Unfortunately, yes. The journal itself takes up some room, and there's no getting around that.

    With ReiserFS, the journal size is 32MB, regardless of the partition size. Apparently, though, the journal size on an ext3 partition is variable, and is just 15MB by default. (Look for "Disk space" toward the end of the page.) See also the man page for tune2fs(8) with a reasonably recent version of e2fsprogs.

    --
    Runnin' around, robbin' banks all whacked on the Scooby Snacks...
  9. Linus on preemptible kernel (and Tweedie on ext3) by kingdon · · Score: 5, Informative

    Someone asked Linus about the preemptible kernel patches (and latency in general) at the Annual Linux Showcase on Thursday night. The thing about the preemptible kernel is that it is only for uniprocessor - SMP kernels aren't preemptible. So unless you want the SMP case to be capable of tying up a processor for "too long" at a time, then you need to re-do each bit of code which is capable of long latencies anyway. The other thing which came up is that responsiveness of the system improved quite a bit recently with VM fixes (2.4.14 was the improved version, I think). It was a matter of the VM queueing up too much I/O (and the drivers trying to throttle it, instead of just throttling it all in the VM - or something like that). The preemptible kernel won't solve that kind of problem - although it may change/mask the symptoms enough to make it a bit hard to be sure where a problem is.

    Oh, and to bring things back to ext3, Stephen Tweedie was also there and made a number of comments about ext3. He has been fairly busy/nervous lately as ext3 just got into the hands of Lots Of(TM) users (when it shipped with Red Hat 7.2). The most serious problem I remember him talking about was that the 7.2 installer had a box marked "upgrade my ext2 to ext3" and one marked "makefs the filesystem" (or something like that), and some people were checking both - which would create a nice new empty filesystem in place of the one which was being "upgraded". But of course that is just user error plus a confusing installer, not a kernel problem. Most of the things which looked like ext3 kernel problems seem to be something else, as far as Stephen has been able to tell so far.

  10. ext2 limit graph by Anonymous Coward · · Score: 4, Informative

    The ext2/ext3 limit is 4 terabytes, but Linux
    device files have a 1 terabyte limit.

    http://www.cs.uml.edu/~acahalan/linux/ext2.gif

    Pay attention to the note on the right.
    That explains why apps often break at 2 GB.
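The three limits mentioned here fall out of simple powers of two. The graph's note is not reproduced in this thread, so the derivations below are assumptions: a signed 32-bit file offset for the 2 GB application limit, 32-bit block numbers with 1 KiB blocks for the 4 terabyte figure, and a signed 32-bit count of 512-byte sectors for the 1 terabyte device limit.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

# Largest byte offset a signed 32-bit file offset can represent.
max_signed_32bit_offset = 2 ** 31 - 1

# Assumed source of the 4 TB figure: 2^32 block numbers times 1 KiB blocks.
fs_limit_1k_blocks = (2 ** 32) * KIB

# Assumed source of the 1 TB figure: 2^31 sectors of 512 bytes each.
device_limit = (2 ** 31) * 512

print(max_signed_32bit_offset + 1 == 2 * GIB)   # offsets stop just short of 2 GiB
print(fs_limit_1k_blocks // TIB, "TiB")
print(device_limit // TIB, "TiB")
```

If those assumptions hold, an application built with a 32-bit signed offset hits the wall at 2 GiB no matter how large a file the filesystem itself could store.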

  11. Re:I wish Linus would stop this by cvanaver · · Score: 2, Informative

    A) No one is forcing you to upgrade. If you are throwing the bleeding edge onto Prod servers you deserve what you get. That doesn't just apply to Linux, it applies to Solaris, AIX, HP-UX, Windoze and everything else. Get a clue.
    B) At least Linus isn't terrified of making changes to the existing code base and fixing inherent problems, regardless of his testing base. I just ran into an issue on Solaris where we had a middleware daemon (TIBCO Rendezvous) which hit upon a rather serious flaw in 32-bit Solaris where it could not resolve more than 256 calls to alias a port/service. At 257 the call to resolve a service alias would just fail. We talked to Sun about it and they said they knew it was an issue and refused to change it (ever) cause it would require them to change a foundation C struct that might break a bunch of apps. I understand Sun's viewpoint, but they have taken the 'safe' approach where they are locked into code limitations of the past. Great, so I get stuck with un-documented bug-crap forever.
    C) Linus is the visionary, not the tester. If you don't trust RedHat or others to test the upgrades, and you don't have the desire/bandwidth to do it yourself, then you either shouldn't be running Linux or shouldn't be considering upgrades.

  12. Re:Ugh... More FUD From Within... by Anonymous Coward · · Score: 1, Informative

    More nonsense. ext2 will lose data if the data isn't written to the disk when a failure occurs. So will UFS. But you won't experience corruption of data you're not working with otherwise. ext2 is stable and solid. It gets corrupted if you fuck with it. Same goes for every other fs.

    The difference is that EXT2 is in general mounted async while UFS+SUP isn't, so he's right. UFS+SUP is better, it performs on par with EXT2 while not corrupting meta data in a crash, unlike EXT2.

  13. Re:FreeBSD by Anonymous Coward · · Score: 1, Informative

    If you run tunefs -enable on xBSD you get pretty much that. My webserver loses power all of the time (15 minute UPS) and it has never lost anything. It runs OpenBSD with FFS/Softupdates.

    fscks take about 4 seconds (50Mhz MicroSparc)

  14. Re:FreeBSD by cperciva · · Score: 5, Informative

    FreeBSD doesn't need a journaling file system: FreeBSD has softupdates, which ensure that the filesystem metadata is always in a consistent state while providing better performance than journaling.

  15. What about 2.5.x? by Cryptnotic · · Score: 2, Informative
    Shouldn't this REALLY REALLY be in Linux-2.5.x? What ever happened to the old mantra "odd numbers, development, new features; even numbers, stable, bug fixes"? Has Linus forgotten? First, completely changing the virtual memory system in 2.4.10, now this.


    Cryptnotic

    --
    My other first post is car post.
  16. Re:ext3 and quotas.... by selmer · · Score: 3, Informative

    It looks like there are still some quota-problems with the Linus-kernels, check this post on the ext3 user-list, in which Andrew Morton says that there are no known quota-problems with the ac-kernels but that he wants to test a bit more on the Linus-kernel as it used to cause deadlocks.

  17. Re:Ugh... More FUD From Within... by Anonymous Coward · · Score: 1, Informative

    There was no reason to keep support for it lying around, but MS did anyway and it was responsible for a LOT of the instability in Windows 9x/2000.

    You're off your rocker if you believe that Windows 2000 has DOS underpinnings.

  18. Tip: Root partition not being mounted ext3? by Spoing · · Score: 4, Informative
    If you're converting from ext2 to ext3, update /etc/fstab so that 'auto' is used instead of either 'ext2' or(!)'ext3'. Auto makes it easy to dynamically switch to a kernel that doesn't support ext3.

    Unfortunately, if your file system tools aren't up to date, your root partition won't be mounted ext3. A quick check to see if everything worked is to look at the output from either df or /proc/mounts like this:

    1. df -T

    In the second column, it should report the filesystem type of each mounted partition. If you don't see / , you should upgrade fileutils.

    2. cat /proc/mounts

    This is basically how your fstab is currently interpreted, as recorded in /etc/mtab.

    If either of these look wrong, check the kernel sources for Documentation/Changes, and verify that you are using the supporting program versions mentioned in the Current Minimal Requirements section.
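The /proc/mounts check above can also be scripted. This is a small sketch, assuming the usual whitespace-separated layout of that file (device, mount point, filesystem type, options, and two numeric fields). The function name and the sample text are made up for illustration.

```python
def root_fs_type(mounts_text):
    """Return the filesystem type of the last entry mounted on '/', or None."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 1 is the mount point, field 2 the filesystem type.
        if len(fields) >= 3 and fields[1] == "/":
            fstype = fields[2]  # later entries override earlier ones
    return fstype


# Illustrative sample only; real output depends on the machine.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
proc /proc proc rw 0 0
/dev/hda1 /boot ext2 rw 0 0
"""

print(root_fs_type(SAMPLE_MOUNTS))
```

On a live system the same function could be fed the contents of /proc/mounts, read with open("/proc/mounts").read(). If it reports ext2 after a conversion, the journal is not being used for the root partition.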

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  19. Re:History of kernel video drivers by AndyS · · Score: 2, Informative

    Nvidia's drivers aren't DRI. They use their own approach, and have a much larger kernel driver. I use (at the moment) a Radeon DRI driver, and it's very solid, albeit slower. The amount of actual kernel code used in DRI is tiny, unlike the nvidia code (which was unbelievably huge)

    Aside from this, as I understand it, due to the design of PCs, it's not possible to stop anything that directly writes to your graphics card from screwing it up, as your graphics card can do exceptionally unpleasant things to the rest of your machine. DRI is meant to have lots of checks to try and avoid nastiness happening to your kernel from your video card, and apparently it seems to work.

  20. Re:Ugh... More FUD From Within... by Od1um · · Score: 2, Informative

    Both io.sys and msdos.sys are 0 byte files.. looks like there's lots of DOS stuff in there.. They're there so old programs that look for them will still run.

    >Worse, try deleting these files and see what it gets you. Then try to do a repair install... oh my. They aren't put back... muahahahaha!

    I gave you the benefit of the doubt and deleted them to see what would happen, and then rebooted. Win2K started up with no problems. It didn't recreate them, but I'm posting this with no problems, so obviously they're not very important (which would make sense considering they were 0 byte files).

    Command.com is there, but it doesn't run natively - notice the extremely slow speed compared to the native NT command line program 'cmd.exe'.

  21. Re:Tip: Root partition not being mounted ext3? by Miles · · Score: 2, Informative

    Actually, I've found that putting ext3 (in /etc/fstab) with no ext3 support will automagically mount as ext2. I've also heard that having something like ext3,ext2 will work, but I've never tried it.

    Oh, and to check if you have ext3 you can also use tune2fs -l /dev/blah and look for the has_journal flag in the Filesystem features field.
    For your root filesystem, you may also see something like VFS: Mounted root (ext3 filesystem).

    Andrew.

  22. minor gotcha with "auto" by David+Jao · · Score: 2, Informative
    If you mount all your filesystems as "auto", and you use slocate, then be sure to edit the small file /etc/updatedb.conf and remove "auto" from the list of PRUNEFS types.

    Otherwise updatedb will ignore your "auto" filesystems (i.e., your whole system) and the slocate database will be empty.

  23. You can't just power off by ChrisWong · · Score: 2, Informative

    Red Hat 7.2's release notes on ext3:

    Please keep in mind that even a journaling file system can be damaged by power loss. When a system loses power, that system's behavior is
    undefined. For example, memory contents can decay (become randomly corrupt) as the contents are copied to a hard drive running on the
    last bit of power. This is a fundamentally different situation from the more defined sequence of events caused by pressing the system's "reset" button while the system is running. In addition, IDE hard drives do not provide all of the write order guarantees that SCSI drives do.

  24. Re:A dumb question... by Anonymous Coward · · Score: 1, Informative

    As far as I know, there is no "in place" ext2 -> reiserfs conversion; you have to do the copy out / copy back routine. That said, you *could* make a tar saveset on a tape, then reformat the partition as reiser, reboot with a suitable boot disk and restore the backup to the new, empty reiser partition. You need to have a GOOD tape backup first, though..

  25. Re:Ugh... More FUD From Within... by SmittyTheBold · · Score: 2, Informative

    There was no reason to keep support for it lying around, but MS did anyway and it was responsible for a LOT of the instability in Windows 9x/2000.

    Just FYI, DOS was gone in the "pro" MS OSes for years. There is no trace of it in NT, 2000, or XP.

    2000 is actually quite stable in a production environment. It may not be as stable as a properly stripped-down and customized Linux install, but it's pretty damn close. It manages to do that with a good bit more user-friendliness.

    Backwards compatibility is good if the older stuff is still used. Also, the backwards compatibility in ext3 does not break its implementation.

    The old stuff in DOS was still used by some people, that's why it's there. You know what? Much of Linux is not so much binary compatibility with previous releases, but it's idea compatibility with ancient software. If I was writing an OS from scratch, you can bet I would not target source compatibility with 60s software as my primary goal. That's what Linux is - a clone of old software. I find it quite amusing all these people insist Linux is the future, when all it really tried to do was emulate the past.

    Back to filesystem design - just because it's still used, doesn't mean it should stay in use. You have to keep using something while the new is brought in...but justifying software's existence by the fact it's in use is the exact argument MS uses. EXT2 is dead. EXT3 is a hack on top of EXT2 to make it slightly more modern. Think of it like Windows 3.1 on top of DOS. Now you get it. We need something new, and there are filesystems coming that will be the new thing. If you want to see where the Linux FS scene will be in a few years, look at BFS. Journalling, attributes, 64-bit, you name it. EXT3 only does a little of what an FS will have to do in the future. Don't ignorantly assume because something can still be useful now it will be useful in the coming days.

    --
    ± 29 dB
  26. Ext3 not safe against power-down by Euphonious+Coward · · Score: 2, Informative
    I see post after post reporting gleefully that people now can just pop off the power, believing that journaling will save their data from harm.

    It's not true.

    If you have only SCSI disks, it may be true, if your disks are from a very reputable manufacturer. (They are few, and charge more.)

    If you have IDE disks, it is almost certainly false. IDE disks report data successfully written to disk when it is still only in on-board RAM buffers. Even when told not to, they often do it anyway. (Lying results in better benchmark scores.)

    If you have IDE disks, journaling will help protect you against various lockups and crashes, but if the disk is active when the power goes out, all bets are off. If you think you didn't lose data, maybe you got lucky, or maybe you just haven't noticed your losses yet.

    The reason IDE disks are cheaper than SCSI is that the people who buy IDE disks have much, much lower quality standards. To compete, the manufacturers are forced to deliver lower quality. If you care about reliability, buy SCSI (or fiber-channel, or ...).

    If you use IDE, watch that power switch, and keep current backups. If you maintain critical data, invest in a UPS. Journaling is not a substitute for a UPS, it's just a time saver.