Reliability of Journalling Filesystems Under Linux?
chrysrobyn asks: "Every write-up I see about journalling file systems under Linux discusses efficiency (embedded) or speed (desktop/server). Have any studies been done on reliability? I've used Linux since Slackware 96 (and kernel 2.0.0), and put it on 9 or 10 machines over the years (Slackware on x86 and Debian on PPC), but I've never strayed from ext2. Always, when the uptime gets high, 20-50 days, the filesystems start to get minor fsck errors. Not that I repair the system and expect it to stay live, I just use the fsck -n to help me decide when a repair is in order. Since the same thing has happened on a variety of hardware (386-PII and every interface in between and 601 and 750 processors with Apple hardware), I'm leaning on blaming the ext2 filesystem for these, the slightest of problems. I typically keep my servers up for as long as possible because 95% of my hardware problems have happened during resets and cold power-ups. It's time for my every-other-year rebuild of my personal server, with another on its way, so I was hoping to incite some anecdotal Slashdot conversation on the journalling file systems available for Linux. Personally, I'm most interested in hearing about the file systems supported under Debian stable for ease of administration for this machine which is a 5 hour drive away from home. I've been around the block a few times, so I'm not fearful of patching the kernel with better patches, but I'm respectful of the work the Debian assurance teams have done."
You have to expect some errors to show up from time to time, because the filesystem may change while fsck is running, and if so it will not be internally consistent.
Not knowing the answer to this myself I present to you a few links that may be helpful. Hope this helps.
This link has some good benchmarks of Ext2, ReiserFS and XFS.
And here is a fairly good news group discussion relating to what you are talking about.
man
No manual entry for
You should be aware that if you are running fsck -n on the fs while it is mounted in rw-mode, then it can and will report inconsistencies which are not real, simply because the fs has changed between the passes in fsck, something which it does not expect.
For this reason, I suggest you try again with remounting the fs in ro-mode before running fsck -n. I am fairly sure you will find that your errors go away. Especially since you state this has happened on diverse hardware and presumably diverse kernels.
That said, I would recommend going with a journaling fs for the extra safety that comes from never getting inconsistent even if the power goes out at the worst moment. ext3 and reiserfs are both good; my preference would be ext3 for the simple reason that it can be mounted as an ext2 fs, which means that you will be able to read it with any old rescue disk or whatever. Reiserfs typically requires you to redo all your rescue disks, and to make sure that your backup/restore scheme handles it right.
If the remounting in ro-mode does *not* make the reported errors in fsck -n go away, and you are somehow able to reproduce this, please report the bug to linux-kernel.
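For what it's worth, the "is it mounted rw?" check can be scripted before trusting fsck -n output. This is only a minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, comma-separated options); the function names and the /dev/hda1 and /dev/hda2 device names are made up for illustration.

```python
def mount_mode(mounts_text, device):
    """Return 'rw', 'ro', or None if the device is not mounted.

    mounts_text is the content of /proc/mounts (passed in as a string
    so the function can be tried out without touching a real system).
    """
    modes = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[0] != device:
            continue
        options = fields[3].split(",")
        modes.append("ro" if "ro" in options else "rw")
    if not modes:
        return None
    # A device can be listed more than once; any rw entry counts as rw.
    return "rw" if "rw" in modes else "ro"


def fsck_n_is_trustworthy(mounts_text, device):
    """fsck -n results are only meaningful if nothing can write to the fs."""
    return mount_mode(mounts_text, device) != "rw"


# Hypothetical sample in /proc/mounts format.
sample = (
    "/dev/hda1 / ext2 rw 0 0\n"
    "/dev/hda2 /home ext2 ro,noatime 0 0\n"
)
print(fsck_n_is_trustworthy(sample, "/dev/hda1"))  # mounted rw
print(fsck_n_is_trustworthy(sample, "/dev/hda2"))  # mounted ro
```

On a real box you would feed it the text read from /proc/mounts. Note that the root filesystem may be listed under a different device name than the one you expect, so treat a None result with suspicion rather than as a green light.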
I've used ReiserFS since it first appeared in Mandrake's distro. I have never had any problems whatsoever with it. It just keeps running.
For the last series, I have not noticed any unexpected filesystem errors after 200-300 days of uptime (they need to be rebooted from time to time for kernel upgrades).
To conclude, always suspect your hardware first, especially if it's at least a couple years old.
I don't know about XFS and other journalling fs's since I've only used ReiserFS and Ext3 so far.
:-)
;-) The decision whether to use one or another is more performance/religion-wise, IMHO :-)
My experience so far is that Ext3 is more reliable (read: repairable) than ReiserFS simply due to the fact that Ext3 is a kind of "extension" to Ext2, so you can just run the good old well-tested and known-to-work fsck.ext2 on an Ext3 partition should it screw up.
But I have yet to see an Ext3 partition screw up. I've set up several PCs and servers with Ext3 and it works fine, not a single problem to date.
Unlike ReiserFS. I have to admit, my only experiences with ReiserFS were about one and a half years ago or so, but at that time I had set up a home PC with ReiserFS and somehow I f***ed it up beyond all repair. I don't remember what I did then, but I just got scared of ReiserFS.
On the other hand I have still another home PC, running SuSE Linux 7.2 updated to 7.3 with ReiserFS which just runs fine, and this is my home server, running 24-7.
So I guess as long as you don't do anything stupid like I did, both ReiserFS and Ext3 are pretty reliable today. Given their widespread use, you would probably have heard of any major glitches/problems.
"when the uptime gets high, 20-50 days"
That's not high uptime! Maybe if you're running Windows 95. I've had my system running for a little over 320 days now, and I haven't experienced any problems on any of my ext3 drives. And I've never before experienced any problems, on ext3/2 HDs. If you want reliability, I think the best thing you can do is buy a UPS. That makes it much more reliable than any FS change can do.
Yes, this is a known troll but I still want to comment on this particular line:
On other unices, crashes usually are caused by external sources like power outages. Crashes in Linux are a regular thing, and nobody seems to know what causes them, internally. Linux advocates try to hide this fact by denying crashes ever happen. Instead, they have frequent "hardware problems".
Crashes in Linux are NOT a regular thing, unless you want to be extremely bleeding edge and/or use NVidia's drivers and/or ALSA (at least up to 0.90rc5) on 2.4 with lowlatency and preemptive patches. Especially if the above stuff is used on SMP systems.
My system used to crash (freeze) frequently (every 2nd or 3rd day).. But after I sold my GeForce4 card and got a Matrox G450 instead, and switched back to using OSS instead of ALSA (I've got a SB Live..), I've not had a single crash! It has been running for several months without a single reboot, and everything is super-stable! I've used it heavily every day, burnt more than 150 CD-Rs, been on Direct Connect and Freenet 24/7 etc.. That's despite running the heavily patched 2.4.19-gentoo-r8 kernel, with my whole system (including the kernel) compiled with gcc 3.2 "-march=athlon-mp -O3 -mfpmath=sse -pipe"..
So my conclusion is: Linux IS stable! Extremely stable! The cause of 99% of the "linux crashes!"-bullshit is NVidia's crap drivers (fast but unstable) and drivers that are still not "preemptive"-safe (ALSA on SMP for example).. But those things are not used on servers anyway.
And about the "hardware problems": Yes, you DO get hardware problems MUCH MUCH more often on cheap PCs than on multi-million-dollar Unix servers from Sun/HP/IBM.. Cheap PCs use the cheapest-of-the-cheapest variant of all components to cut down the price. Expensive Unix servers use expensive components and have a lot of redundancy, so you don't have to have downtime just because a CPU, a hard disk, some RAM or something else failed.
My other account has a 3-digit UID.
We had a bad network adapter which would fail when other DMA devices were busy. This meant that whenever disk I/O was heavy, using the network adapter was likely to cause a complete system lockup. This took a while to diagnose as the problems took upward of two weeks to reproduce.
Despite the equivalent of having the power cable yanked randomly a dozen times when the machine was at its busiest, we never had a single problem with Reiser. The file which was being written to existed as the old version, and there wasn't even a lengthy fsck. Integrity was 100%.
Says the RIAA: When you EQ, you're stealing bass!
trust your backup.
Two cents from an old admin.
This indicates, to me, some hardware flakiness on your end. (Even though you say this happens on a wide variety of hardware.) In every account I've seen, journaling filesystems are more stressful on the hardware because - surprise! - the journal is constantly being written to. I'd stick to ext2 if I were you, and figure out why you get any errors when you fsck a dismounted file system.
I'm in charge of roughly forty Linux boxes, including many desktops and many servers. I've never seen any problems that I could blame on the filesystem. (Though there have been kernel releases in the past - including one in the 2.4.x series, IIRC - where there was a bad filesystem bug, fixed within a day.)
This is correct. Actually, however, suspect that your hardware has developed a bad connection first. Many problems are corrected by pulling every adapter and cable out about 1 millimeter, then pushing it in again. That wipes the contacts clean of oxide.
I have been using ext3 since around 2.4.13...and it has not given me one problem yet and has in fact saved my ass several times. I have frequent power failures at my house, but ext3 recovers gracefully every single time....
Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
I agree. The only time I've hung my systems
I can't speak for ALSA drivers being bad -- when I used them they seemed to be fine -- but Nvidia's do cause regular hangs for me on an AMD Athlon system (Chipset: "VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]"). Even adding mem=nopentium to the boot line and using Nvidia's latest drivers, this system hangs on a regular basis.
The type of instability I get with Nvidia's drivers reminds me of the odd crashes I used to get when I used Windows. For what it's worth.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
I've been running NVIDIA's drivers on my desktop machine for a year now... the only crashes I've experienced were when I experimented with the 2.5 kernel, or development versions of KDE (which in turn just crash or sometimes hang X, which can be fixed by sshing in from another box). My uptime is generally 2-3 months (I don't have a UPS).
Lex orandi, lex credendi.
Weird, I use the preempt, low latency, and ALSA 0.9.0rc[forgot, I think four] on my 2.4.19 kernel. My uptime is now 87 days, that doesn't seem unstable at all.
HAL 7000, fewer features than the HAL 9000, but just as homicidal!
Test your backup. Not just once, but periodically.
Two cents from a (different) old admin.
I've only tried reiserfs and xfs for a few days each; for the most part I've stuck to ext3 in recent days. I've hard-crashed (pull the plug type of thing) several different machines with ext3 while filesystem write activity was going on and never had a problem. Based on my time with ext3, my limited experience with reiserfs/xfs, and reading lots of lkml, I think ext3 is the safest choice at this point in time, even if it's not necessarily the best performance.
11*43+456^2
Aside from everybody telling you "that shouldn't happen, you're doing something wrong", which is probably true, I just wanted to chime in with my support of ext3. I think you're making a mountain out of a molehill.
You obviously haven't looked very closely into ext3, because it's an extremely simple layer on top of a standard ext2 filesystem. Essentially, all it is, is an extra file in /, a daemon to do journalling, and a bit or two toggled on the disk itself.
The FAQ has one question that lists the two steps required to install a journal on a stock ext2 filesystem (provided you've got a 2.4.16+ kernel, or have patched your older kernel).
Not only is it very simple to install, but it's very simple to uninstall too. Blindingly easy, in fact. Mount your filesystem as ext2. Done. No journal. If you want to do it permanently, there's an answer about that in the FAQ too.
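To give an idea of how little is involved, here is a rough sketch in Python. It only builds the command and rewrites fstab text; it does not run anything. As far as I know, tune2fs -j is the usual way to add a journal to an existing ext2 filesystem, but I am not claiming these are the exact steps in the FAQ. The function names and the /dev/hda1 device are made up for illustration.

```python
def add_journal_command(device):
    """Command (as an argv list) that adds a journal to an ext2 fs.

    Nothing is executed here; run it yourself once you are happy.
    """
    return ["tune2fs", "-j", device]


def fstab_to_ext3(fstab_text, device):
    """Return fstab text with the given device's type switched ext2 -> ext3."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # fstab layout: device mountpoint fstype options dump pass
        if (not is_comment and len(fields) >= 3
                and fields[0] == device and fields[2] == "ext2"):
            fields[2] = "ext3"
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fstab content.
fstab = (
    "# <device> <mountpoint> <type> <options> <dump> <pass>\n"
    "/dev/hda1 / ext2 defaults 1 1\n"
    "/dev/hda2 /home ext2 defaults 1 2\n"
)
print(" ".join(add_journal_command("/dev/hda1")))
print(fstab_to_ext3(fstab, "/dev/hda1"), end="")
```

Going back is the reverse: put ext2 back in fstab and the journal is simply ignored.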
So really, you have nothing to lose by trying ext3. I've had 0 problems with it, and I use it on a laptop that gets a lot of abuse WRT being turned off at random times (I can't view my battery level in Linux, but I can in Windows. Thanks broken ACPI BIOS...)
The only downside is that the filesystem will sync every 5 seconds or so, which completely destroys any possibility of ever letting the disks spin down for power saving, but that's more of a laptop issue than a server issue.
Random and weird software I've written.
" This is interesting, considering that the DOS heritage in the Windows 9x/ME series was considered a very bad thing by the Linux community, even though it provided what could be called one of the best examples of compatibility, ever. "
IBM's Mainframe line of computers kicks WinDOS ass. You can run binaries compiled on slow, clunky 1960's System-360 refrigerators on modern multiprocessing, fault-tolerant, redundant zSeries systems. I can't even run my favorite DOS 5.0 apps under DOS 6.0, least of all under Windows 3.1 or Windows 95. My PC, when it was a DOS machine, had DOS 3.0, DOS 5.0, DOS 6.2, Windows 3.1, and Windows 95. Lots of rebooting to use all my old apps, unless I wanted mysterious crashes and freezes.
Linux can still run QMAGIC executables compiled against BSD libc4 on a modern ELF/glibc2.3 system by turning on a kernel option and copying a few .so files.
My only complaint about Linux compatibility, actually, is just the careless programmers who change the API of their library without changing the major revision number. (*cough QT cough*)
--TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
I've been using reiserfs exclusively since SuSE 7.1, and it's been great. I haven't had a single problem, even during power outages and such (no, I don't have a UPS).
That's about as anecdotal as it gets!
Anyway, I'm not going to recommend reiser over the others since I don't have any experience with them, but I will say that I've developed great confidence in reiser's reliability. If I had any old data that I really cared about and wanted to use the same drive, though, I would probably go with Ext3 for the non-destructive (or so I've heard) upgrade.
Under capitalism man exploits man. Under communism it's the other way around.
Maybe it's an SMP bug then, because I've only used ALSA on my dual athlon box... I've heard others who have had problems with ALSA being unstable on SMP-boxes also...
My other account has a 3-digit UID.