Slashdot Mirror


Reliability of Journalling Filesystems Under Linux?

chrysrobyn asks: "Every write-up I see about journalling file systems under Linux discusses efficiency (embedded) or speed (desktop/server). Have any studies been done on reliability? I've used Linux since Slackware 96 (and kernel 2.0.0), and put it on 9 or 10 machines over the years (Slackware on x86 and Debian on PPC), but I've never strayed from ext2. Always, when the uptime gets high, 20-50 days, the filesystems start to get minor fsck errors. Not that I repair the system and expect it to stay live, I just use the fsck -n to help me decide when a repair is in order. Since the same thing has happened on a variety of hardware (386-PII and every interface in between and 601 and 750 processors with Apple hardware), I'm leaning on blaming the ext2 filesystem for these, the slightest of problems. I typically keep my servers up for as long as possible because 95% of my hardware problems have happened during resets and cold power-ups. It's time for my every-other-year rebuild of my personal server, with another on its way, so I was hoping to incite some anecdotal Slashdot conversation on the journalling file systems available for Linux. Personally, I'm most interested in hearing about the file systems supported under Debian stable for ease of administration for this machine which is a 5 hour drive away from home. I've been around the block a few times, so I'm not fearful of patching the kernel with better patches, but I'm respectful of the work the Debian assurance teams have done."

66 comments

  1. If you are fscking a live filesystem by Pathwalker · · Score: 4, Informative

    you have to expect some errors to show up from time to time, because the filesystem may change while fsck is running, and if so it will not be internally consistent.

    1. Re:If you are fscking a live filesystem by Anonymous Coward · · Score: 0

      Well duh.. if you are stupid enough to run (a real)fsck on a live(mounted) filesystem, you *deserve* to get your filesystem thrashed.

    2. Re:If you are fscking a live filesystem by Anonymous Coward · · Score: 0

      Also, be sure to unmount the filesystem before fscking it, otherwise it could change while you're fscking it, and won't be internally constant.

    3. Re:If you are fscking a live filesystem by sofar · · Score: 2

      If you read carefully, he uses 'fsck -n' to *check* for errors without fixing them. Doing this regularly will show that even a perfectly fine running system will pollute an ext2 fs.

      I suspect that he is sane enough to init S before doing a *REAL* fsck.

    4. Re:If you are fscking a live filesystem by gowen · · Score: 1
      > Doing this regularly will show that even a perfectly fine running system will pollute an ext2 fs.
      Bollocks. All you'll get on a live filesystem is a series of false positives as the filesystem changes behind fsck's back. You won't trash it, but your results won't be reliable...
      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  2. Few links for you by Rubbersoul · · Score: 4, Informative

    Not knowing the answer to this myself I present to you a few links that may be helpful. Hope this helps.

    This link has some good benchmarks of Ext2, ReiserFS and XFS.

    And here is a fairly good news group discussion relating to what you are talking about.

    --
    man .sig
    No manual entry for .sig.
  3. /something/ is wrong here. by Eivind · · Score: 5, Informative
    ext2 is a quite stable fs. It is not journaling, so crashing at an inconvenient time can lead to an inconsistent fs, but other than that there is no reason why an ext2 fs should magically develop inconsistencies after 3-6 weeks of runtime.

    You should be aware that if you are running fsck -n on the fs while it is mounted in rw-mode, then it can and will report inconsistencies which are not real, simply because the fs has changed between the passes in fsck, something which it does not expect.

    For this reason, I suggest you try again with remounting the fs in ro-mode before running fsck -n. I am fairly sure you will find that your errors go away. Especially since you state this has happened on diverse hardware and presumably diverse kernels.

    That said, I would recommend going with a journaling fs for that extra safety that comes from never getting inconsistent even if the power goes out at the worst moment. ext3 and reiserfs are both good. My preference would be ext3, for the simple reason that it can be mounted as an ext2 fs, which means that you will be able to read it with any old rescue-disk or whatever. Reiserfs typically requires you to redo all your rescue-disks, and make sure that your backup-restore-scheme handles it right.

    If the remounting in ro-mode does *not* make the reported errors in fsck -n go away, and you are somehow able to reproduce this, please report the bug to linux-kernel.
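
    The check-before-you-fsck advice above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample device names are made up, and the text is assumed to follow the usual /proc/mounts layout (device, mount point, type, options, dump, pass). It answers one question: is the filesystem either unmounted or mounted read-only, so that the output of fsck -n can be believed?

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not mounted.

    mounts_text is text in the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # later entries shadow earlier ones
    return found


def safe_for_fsck_n(mounts_text, mount_point):
    """True when the filesystem is unmounted or mounted read-only."""
    options = mount_options(mounts_text, mount_point)
    return options is None or "ro" in options


# Hypothetical sample in the /proc/mounts layout; device names are made up.
SAMPLE = """\
/dev/hda1 / ext2 rw 0 0
/dev/hda3 /home ext2 ro,noatime 0 0
"""

print(safe_for_fsck_n(SAMPLE, "/"))      # False
print(safe_for_fsck_n(SAMPLE, "/home"))  # True
```

    On a real machine the text would come from reading /proc/mounts, and a False answer is the cue to remount read-only (mount -o remount,ro) before trusting any fsck -n report.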

    1. Re:/something/ is wrong here. by vsync64 · · Score: 0, Troll
      > ext2 is a quite stable fs. It is not journaling, so crashing at an inconvenient time can lead to an inconsistent fs, but other than that there is no reason why an ext2 fs should magically develop inconsistencies after 3-6 weeks of runtime.

      No, no! ext2 loses data like a firehose loses water! I know it's true because some guy keeps posting it to Slashdot!

      --
      TO BUY A NEW CAR WOULD MAKE YOU SEXUALLY ATTRACTIVE.
    2. Re:/something/ is wrong here. by chrysrobyn · · Score: 3, Informative

      > /something/ is wrong here.

      I think I may be able to point to the answer.

      > You should be aware that if you are running fsck -n on the fs while it is mounted in rw-mode, then it can and will report inconsistencies which are not real, simply because the fs has changed between the passes in fsck, something which it does not expect.

      I'm doing my fsck -n's in RW mode. From the less file system experienced Linux user's perspective, I wonder what ext2 does when going from RW to RO that cleans up for fsck. I can understand the value of delaying some writes, but shouldn't that get flushed when the box is not active? Would fsck -n work on a RW mounted ReiserFS, JFS, XFS or ext3?

      I'm not being argumentative, this sounds like one of those typical Unix behaviors, but learning why may help me with other potential issues.

    3. Re: /something/ is wrong here. by Black+Parrot · · Score: 1


      > ext2 is a quite stable fs ... there is no reason why an ext2 fs should magically develop inconsistencies after 3-6 weeks of runtime.

      Agreed. My typical login time is 180 days, and I almost always have > 100 processes alive on the system for that whole time, but the only time I've had disk problems in several years of operating that way is when I had hardware failures.

      --
      Sheesh, evil *and* a jerk. -- Jade
    4. Re:/something/ is wrong here. by ChadN · · Score: 2

      Fsck reports inconsistencies when checking a RW file system, because from fsck's perspective, it IS inconsistent. Things have changed that fsck didn't see, and it (correctly) reports it. Some file systems are designed to allow fsck to work on a RW filesystem (FreeBSD soft updates, to a limited extent).

      Remounting RO "cleans up" for fsck, by flushing all pending writes to disk, and ensuring that no new writes happen during fsck's run. Even on an 'idle' system, writes to the file system can and do occur (to update meta-data, if nothing else). If you REALLY want to run fsck on a RW file-system, you should at least use the "noatime" mount flag, and possibly "sync", and "dirsync" as well.

      You can also use LVM, and create "snapshots" of the live filesystem, which can then be checked for consistency offline, while the original filesystem chugs along.

      I agree that it is highly unlikely that the fsck errors you see are anything more than anomalies caused by your method of checking (unless you have some funky power source or radiation that is affecting a variety of systems). ext2 is considered to be quite stable (i.e. won't become inconsistent over time on working hardware). Are these SCSI disks or IDE? I'd suspect the disk driver code before the filesystem code.

      --
      "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
    5. Re:/something/ is wrong here. by budgenator · · Score: 2

      >That said I would recommend going with a journaling fs for that extra safety that comes from never getting inconsistent even if the power goes out at the worst moment. ext3 and reiserfs are both good,

      Agreed. I put reiserfs on a 'puter for my wife's use, for safety, since she was a total computer newbie. She'd sometimes just punch the power button out of frustration, but there was never a problem with the file system. I couldn't detect any seat-of-the-pants difference in speed.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    6. Re:/something/ is wrong here. by Zeio · · Score: 3, Informative

      I have evaluated file systems of late, and wish only to express the need for more attentiveness in one's file system. Being nonchalant about this can lead to "bad situations."

      I just finished evaluating JFS 1.0.24 for Linux. My opinion of 1.0.24 and JFS is that IBM is doing the port as a courtesy to AIX and OS/2 migrators. It is extremely robust, but slow, 2x slower than XFS or Reiser. I had maximal R/W activity (tar untar create deletes in while loops, Xwin started, downloading via ftp, scp, etc) and powered off hot several times, and never saw anything but "file system clean."

      I am in the process of evaluating XFS 1.2pre3. 1.1 XFS for Linux is unreal. It does "everything," it has done it for years, it's high performance, has a robust heritage and is all around very good. I have cold killed it, inserted and removed hot swap drives while running, while doing fairly absurd amounts of activity on the test box. Not using this file system is a shame. They release patched kernels, one catering to the Redhat droids and the other a vanilla with their magic patched in. This isn't a Marcelo kludge either, these are professionals who care greatly about the stability of their product and do a great job in their little corner of the kernel. The Mandrake and SuSE kernels have this stuff patched in, along with extended attributes and ACLs, and the XFS kernel only has ACL and DMAPI support, and the JFS patches won't apply clean to their kernel, but one thing is true of SGI's version: It actually compiles. The Mandrake 9 and SuSE 8.1 kernels seem not able to compile outside of their proprietary environments. I am upset about this. Typical second tier vendors who fail to bring coherency to the fragmented set of projects loosely and informally known as the nebulous "Linux."

      EXT3 is a dirty hack (EXT2 with fake journaling). I don't know how EXT3 gets high performance marks - ever - my experience has suggested awful and inconsistent performance with several nasty changes made to e2fsprogs in succession to address potentially severe problems. It's insulting to enterprise customers that RedHat touts this garbage as a journaling filesystem. Reiser is a UFO, and is easily corruptible, and I fail to understand its wide use and early integration in the kernel - my only guess is its simplicity required the least cleaning up of the kludged Linux file system underpinnings. I also get sick to death of Hans blaming everyone and their mother while the guys at XFS and JFS quietly patch away the problems, while Hans whines. Hans did have a good point about the broken RedHat compiler back when it was an issue. I base my opinion of EXTx, and Reiser based on experience. I am appalled, and disappointed at the lack of respect the Linux kernel maintainers have given to XFS. The best of the litter being the last to go in - typical, and Appalling.

      UFS+logging on Solaris and UFS+S on FreeBSD are both superior. I have never seen these go haywire. Ever. Interestingly, UFS+S is apparently the 'softcore' journaling method that EXT3 uses, but it's far less damageable by empirical determination, and it's clearly faster and runs more smoothly. Anytime Veritas appears, which ironically is included in SCO, and is available for Solaris and NT based OSs, things come along quite nicely.

      Recently OS X added journaling to the already pathetic HFS+ filesystem. My experience with Mac OS 10.X, including 10.2 has been horrible. I think it's inferior, the Mach kernel was deprecated by its progenitors, CMU, in 1994. I think the FreeBSD userland is outdated. I think HFS+ is a pathetic file system and fail to understand why they don't use UFS, but if you have ever tried using it with OS X you know it's not "finished." [defined as: nothing works if UFS is used - don't try and say otherwise] Adding journaling to HFS+ will only slow down an already horrifically bloated and underpowered platform. I find it laughable Apple hardware does not get submitted to www.spec.org, but I have CPU2000 results for PPC 1.25GHz, and of course it is so horrible they can't submit - everything including the SPARC beats it hands down. I also thought having to have OS 9 installed on a separate partition from OS X for classic to work properly laughable. I base my deprecation of the Apple efforts on real life experience and objective comparison. I only have to convince myself, but for those who can't easily see where the truth lies on the speed of a Mac vs. a PC, my condolences to any significant other you might be lucky to have.

      FreeBSD 5. UFS2 will probably be one of the best filesystems to ever see the light of day, and vinum will be there as well.

      [I hate Eugenia Dork Loli and her horrible crap "editing" and "journalism," but there are interviews with Steve Best [JFS],Hans Reiser, and Nathan Scott [XFS], held prisoner on OS"News" (more like OSCrapConjecture), very informative; http://www.osnews.com/story.php?news_id=69 ; with some more Journaling info here, http://www.linuxgazette.com/issue55/florido.html showing how Robust XFS is]

      When examining the facts, the superiority of XFS becomes clear, and I advocate its use, it's the responsible thing to do. I have recently beaten heavily on a 2.4.19 stock + XFS pre3 of release 1.2 merged in. I can tell you from my experience with the Dell 1650 and constant filesystem abuse that the filesystem is the last thing I would worry about in that kernel. I am eagerly awaiting the release of the 2.4.20 kernel, typically long overdue as we seem to have an absentee maintainer that rarely speaks, however, upon its release I believe the XFS 1.2 stable will be merged in or completed and I will have a configuration good to go for use on the order of years.

      While I may have harsh words for certain practices and sometimes people, I find XFS and the 2.4.19 kernel to be acceptably stable. I ran that 1650 through the washing machine fairly rigorously, and besides the idiotic spurious " Warning - running *really* short on DMA buffers" errors (which caused a flame war on LKML), it seems to be a useful kernel. The RedHat 2.4.18-17.7x kernel, by the way, is the worst most untested pile I have ever seen. What is wrong with these people? Several net drivers with no working promiscuous mode, kernel panics, the list is endless.

      --
      Legalize the constitution. Think for yourself question authority.
    7. Re:/something/ is wrong here. by matman · · Score: 1

      Yikes, if that's not flamebait, I don't know what is. Not through the content, but rather through the tone.

    8. Re:/something/ is wrong here. by Anonymous Coward · · Score: 0

      Way to have a sense of humor, moderators.

    9. Re:/something/ is wrong here. by emag · · Score: 2

      Having one of the most unreliable laptops I've ever seen (a He^H^HDell Latitude, work-supplied, with enough replacement parts to build 3 new systems in a year), I determined after one-too-many crashes for no apparent reason, I *needed* a journalling FS if I wanted any of my work data to remain intact.

      I was faced with a difficult decision, however, since I didn't particularly have anywhere to STORE all my data to convert to ReiserFS, (at the time) fairly new XFS, JFS, etc. So I made the logical (for me) choice of going with ext3. The best part? Sitting in LAX waiting for a flight, converting my FS to ext3 (I'd compiled the kernel the night before, during my < 36 hours "home time").

      Since then, I've had the system board in the machine spontaneously fry itself while the machine was sitting on a desk not being touched, but updating Debian in the background, I've had various other lockups (usually some combination of ALSA and resuming from a suspend), etc, and not once have I lost any data, or had FS corruption, nor have I experienced anything that could be attributed as "slowness".

      I can see the merits of XFS, especially since I have friends who are familiar with the Irix version, but, if you've got a running system that absolutely can't take another drive in it to migrate data to, or happen to be somewhere you can't get temp storage, EXT3 is a logical way to go.

      It might not be the BEST solution, or the solution for everyone, but so far for me, it's been a damned good one.

      (Incidentally, I lucked out w/ that fried system board, since I was on-site at a customer with other coworkers, and was able to verify immediately, via swapping drives, that all my data was intact. Yay.)

      --
      "The urge to save humanity is almost always a false front for the urge to rule." --H.L. Mencken
    10. Re:/something/ is wrong here. by Eivind · · Score: 2
      > I'm doing my fsck -n's in RW mode. From the less file system experienced Linux user's perspective, I wonder what ext2 does when going from RW to RO that cleans up for fsck. I can understand the value of delaying some writes, but shouldn't that get flushed when the box is not active? Would fsck -n work on a RW mounted ReiserFS, JFS, XFS or ext3?

      Well, the reason why you get inconsistencies when doing fsck -n in rw mode is actually quite simple. Remember that fsck checks that the filesystem is consistent. This means, for example: reference-counts are correct, directories all have a ".." and a "." entry, no block is both marked as being part of a file and on the list of free blocks, and so on.

      Many fs-operations are not atomic. For example, to remove a directory the kernel will first remove everything it contains (including "." and "..") and then remove the directory itself. Now, what happens if fsck happens to touch the directory *between* those two steps ? That is, after "." and ".." have been removed, but before the dir itself is gone ? You'll get an inconsistency.

      This is just one silly example, it's easy to think of others. You are right that the problems should be smaller if the filesystem has been inactive for a while so that most fs-operations are finished, but a total guarantee that everything is finished you don't get. Possibly running "sync" would be enough though, you might try that.

      Remounting in ro-mode helps because it ensures that all write operations fully complete, and also stops the kernel from making any changes to the fs in the middle of your fsck-run.
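
      The directory-removal race described above can be mimicked with a toy model in Python. Nothing here is how ext2 or fsck actually work internally: the dict standing in for the filesystem, the two-step removal, and all the names are made up for illustration. It only shows why a checker that looks between the two steps sees a directory without "." and "..", and why the complaint disappears once nothing is in flight.

```python
def check(fs):
    """Toy consistency check: every directory must contain '.' and '..'."""
    return [f"{name}: missing {entry}"
            for name, entries in sorted(fs.items())
            for entry in (".", "..")
            if entry not in entries]


def rmdir_steps(fs, name):
    """Remove a directory in two separate steps, yielding after each one."""
    fs[name].clear()   # step 1: drop the entries, '.' and '..' included
    yield
    del fs[name]       # step 2: drop the directory itself
    yield


fs = {"/": {".", ".."}, "/tmp/work": {".", ".."}}

removal = rmdir_steps(fs, "/tmp/work")
next(removal)                  # the "kernel" is halfway through rmdir
mid_flight = check(fs)         # the checker looks at the worst moment
next(removal)                  # the removal completes
settled = check(fs)            # nothing is changing any more

print(mid_flight)  # ['/tmp/work: missing .', '/tmp/work: missing ..']
print(settled)     # []
```

      The "error" in the first result was never on its way to becoming real damage. It is just a snapshot taken halfway through an operation, which is the situation a read-only remount rules out.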

  4. fsck around 50 days ? by Anonymous Coward · · Score: 0

    Something must be very very wrong if errors occur around every 50 days, and an fsck might be required say half/one third of the time.
    That would be totally unacceptable to me, and I'm very glad it doesn't happen on our servers, or we'd be out of business.
    Our servers run ext2/ext3, and so far I haven't noticed any difference in reliability between them.

  5. ReiserFS by e8johan · · Score: 2

    I've used ReiserFS since it first appeared in Mandrake's distro. I have never had any problems whatsoever with it. It just keeps running.

  6. old hardware by wotevah · · Score: 2, Informative
    I had a couple of these systems, they would develop filesystem errors out of nowhere. It always turned out to be faulty old hardware (memory, cables, motherboard etc). The PC components get old really fast, my plan has been to get new hardware at least every three years and get real server hardware (ServerWorks mobo etc).

    For the last series, I have not noticed any unexpected filesystem errors after 200-300 days of uptime (they need to be rebooted from time to time for kernel upgrades).

    To conclude, always suspect your hardware first, especially if it's at least a couple years old.

  7. Ext3 vs ReiserFS by DarkDust · · Score: 5, Informative

    I don't know about XFS and other journalling fs's since I've only used ReiserFS and Ext3 so far.

    My experience so far is that Ext3 is more reliable (read: repairable) than ReiserFS simply due to the fact that Ext3 is a kind of "extension" to Ext2, so you can just run the good old well tested and known to work fsck.ext2 on a Ext3 partition should it screw up.

    But I have yet to see a Ext3 partition screwing up, I've set up several PCs and servers with Ext3 and it works fine, no single problem to date.

    Unlike ReiserFS. I have to admit, my only experiences with ReiserFS were about one and a half years ago or so, but at that time I had set up a home PC with ReiserFS and somehow I f***ed it up beyond all repair. I don't remember what I did then but I just got scared of ReiserFS :-)

    On the other hand I have still another home PC, running SuSE Linux 7.2 updated to 7.3 with ReiserFS which just runs fine, and this is my home server, running 24-7.

    So I guess as long as you don't do anything stupid like I did, both ReiserFS and Ext3 are pretty reliable today. Given their widespread use you would probably have heard of any major glitches/problems ;-) The decision whether to use one or another is more performance/religion-wise, IMHO :-)

    1. Re:Ext3 vs ReiserFS by GigsVT · · Score: 5, Informative

      Well I can fill in your gaps on XFS. XFS has run flawlessly for me for over a year now, on a 1.9TB RAID volume. EXT2/3 seems to have rough edges in regard to large file systems, for example, by default mkfs will set aside 5% of your disk for root use. That's 97GB wasted! Another problem is that you need to specify -T largefile4 or it will try to create way too many inodes, taking forever to create the filesystem, or fsck if you ever need to.

      XFS is a very mature file system, and file systems that are many TB work fine with its defaults. Performs more consistently too. EXT2/3 was very sensitive to RAID stripe size, and things like that. Even setting the special stride option, you had to recreate the filesystem many times to make sure things worked right. XFS performed consistently at any stripe size, with no strange dips in performance if boundaries didn't line up just right.

      In all, if you are building a large RAID, I would go with XFS. For day-to-day use of 200GB or less on a single disk, EXT2/3 is fine. (You probably still don't want to let it waste 5% of the disk, that is such a retarded default, use -m 1 to help reduce it)
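
      For anyone wondering where a figure like 97GB comes from, here is the arithmetic as a small Python sketch. It assumes the 1.9TB volume is counted in binary units (1 TB = 1024 GB), which is a guess that happens to reproduce the number quoted above. The function name is made up.

```python
GIB_PER_TIB = 1024


def reserved_gib(volume_tib, reserved_percent):
    """Space held back for root, in GiB, for a volume given in TiB."""
    return volume_tib * GIB_PER_TIB * reserved_percent / 100


default_reserve = reserved_gib(1.9, 5)   # the 5% default mentioned above
reduced_reserve = reserved_gib(1.9, 1)   # after -m 1

print(f"{default_reserve:.1f} GiB")  # 97.3 GiB
print(f"{reduced_reserve:.1f} GiB")  # 19.5 GiB
```

      Even at 1% the reserve on a volume this size is far larger than root is likely to need in an emergency, which is the point being made about the default.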

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:Ext3 vs ReiserFS by DarkDust · · Score: 2

      Good to know, thank you... I've never set up something beyond the 200GB scale, not even RAID servers so I didn't run into any of your mentioned problems yet... except for the 5% root-reservation, but normally I just don't care about that ;-)

    3. Re:Ext3 vs ReiserFS by OeLeWaPpErKe · · Score: 2

      I once did dd if=disk.img of=/dev/md0 instead of /dev/fd0 (only 1 letter difference). It took about 4 days of reading docs before I even could begin any recovery.

      fucking up your filesystem is easy, and you generally have yourself to blame (in my experience).

    4. Re:Ext3 vs ReiserFS by JLester · · Score: 3, Interesting

      The more recent ReiserFS is much more stable. We run it on a RAID5 system that serves e-mail for about 7000 users. We use Maildir format mailboxes and ReiserFS is supposed to be really fast for small files. We converted from a dual-550 to a dual-1.26 system and went from ext2 to ReiserFS and noticed a huge speed increase in opening large mailboxes.

      Jason

      --
      "FORMAT C:" - Kills bugs dead!
    5. Re:Ext3 vs ReiserFS by joshki · · Score: 3, Insightful
      My personal experience with XFS has been horrible -- I've had several partitions completely trashed.

      Quoting from the Gentoo x86 install guide:

      "Please be careful with XFS; this filesystem has a tendency to fry lots of data if the system crashes or you lose power. Originally, it seemed like a promising filesystem but it now appears that this tendency to lose data is a major achilles' heel."

      I would not recommend XFS until some major work has been done.

      --
      I do not read or respond to AC's. If you want a discussion, log in. Otherwise, don't waste your time.
    6. Re:Ext3 vs ReiserFS by schotty · · Score: 1

      I second his motion. Twice now I have been burnt by Reiser. Granted the first time can be very easily attributed to the infancy of the FS itself, but the last time (about a year ago) was inexcusable. Ext on the other hand, has been very nice to work with, with a once in a blue moon repair, and upgrade. I was able to migrate my Ext2 partition to Ext3 VERY easily. I didn't have to worry about the integrity of the data. Considering that I would literally have to WORK at not finding a tool to assist me in a crisis with Ext -- I feel at this point it's a no brainer. Now, I would like to ask this one question however: RedHat pumped an assload of time and money into ext3. Has anyone done the same for Reiser? I realize that Mandrake, Slackware and SuSE have included it as an FS option for quite some time (whereas RedHat has not).

      --
      Sigs are nice guns ...
  8. High uptime? by mnordstr · · Score: 2, Insightful

    "when the uptime gets high, 20-50 days"

    That's not high uptime! Maybe if you're running Windows 95. I've had my system running for a little over 320 days now, and I haven't experienced any problems on any of my ext3 drives. And I've never before experienced any problems, on ext3/2 HDs. If you want reliability, I think the best thing you can do is buy a UPS. That makes it much more reliable than any FS change can do.

    1. Re:High uptime? by Anonymous Coward · · Score: 1, Funny

      When you say "320 days", do you mean "320 days since the last reboot", or "320 days since the last crash, ignoring normal maintenance restarts (e.g. for security patches)"? If the former, what is the host's IP address? I need another zombie for my DDoS network.







      (For the sarcasm impaired, I'm just kidding about that DDoS thing. My zombie network has nothing to do with denial of service attacks. Instead, I will use it to take over the world! Mwhahahahaha! Phear my elite haxor skillz, luddite-mortals!)

    2. Re:High uptime? by MrResistor · · Score: 2

      I used to get 20-50 days uptime regularly with Win98SE, back when I let it run 24/7. That was before I started downloading patches and such for it, though, which seems to have made it much less stable. I wonder if they do that on purpose, so people will be encouraged to upgrade for increased stability?

      --
      Under capitalism man exploits man. Under communism it's the other way around.
  9. Reiserfs keeps saving my ass by rise · · Score: 1

    It's anecdotal evidence only, but I swear by Reiser these days. My Dell Inspiron 7500 has developed a problem that causes it to hang on returning from suspend a couple of times a week and I haven't lost data once. At this point I've probably had upwards of one hundred hangs and crashes because of the flakey hardware and Reiserfs has saved my data every time. It can't do anything to protect data that hasn't hit disk yet, but once it's there it's pretty good about keeping it intact. After living through crashes that left Ext2 filesystems chiseled spam it's a good feeling to be blase about such things.

  10. Re:There's more (thanks for crediting me) by Per+Wigren · · Score: 5, Informative

    Yes, this is a known troll but I still want to comment on this particular line:

    > On other unices, crashes usually are caused by external sources like power outages. Crashes in Linux are a regular thing, and nobody seems to know what causes them, internally. Linux advocates try to hide this fact by denying crashes ever happen. Instead, they have frequent "hardware problems".

    Crashes in Linux are NOT a regular thing, unless you want to be extremely bleeding edge and/or use NVidia's drivers and/or ALSA (at least up to 0.90rc5) on 2.4 with lowlatency- and preemptive-patches. Especially if the above stuff is used on SMP-systems.

    My system used to crash (freeze) frequently (every 2nd or 3rd day).. But after I sold my GeForce4-card and got a Matrox G450 instead, and switched back to using OSS instead of ALSA (I've got a SB Live..), I've not had a single crash! It has been running for several months without a single reboot, and everything is super-stable! I've used it heavily every day, burnt more than 150 CD-Rs, been on Direct Connect and Freenet 24/7 etc.. That's despite I run the heavily patched 2.4.19-gentoo-r8 kernel, and my whole system (including the kernel) is compiled with gcc 3.2 "-march=athlon-mp -O3 -mfpmath=sse -pipe"..

    So my conclusion is: Linux IS stable! Extremely stable! The cause of 99% of the "linux crashes!"-bullshit is NVidia's crap-drivers (fast but unstable) and drivers still not "preemptive"-safe (ALSA on SMP for example).. But those things are not used on servers anyway.

    And about the "hardware problems": Yes, you DO get hardware problems MUCH MUCH more often on cheap PCs than on multi-million-dollar Unix-servers from Sun/HP/IBM.. Cheap PCs use the cheapest-of-the-cheapest variant of all components to cut down the price. Expensive Unix-servers use expensive components and have a lot of redundancy, so you don't have to have downtime just because a CPU, a hard disk, some RAM or something else failed.

    --
    My other account has a 3-digit UID.
  11. No problems with Reiser by inkfox · · Score: 5, Informative
    I'm afraid you won't get much more than anecdotal evidence on this question. Here's mine.

    We had a bad network adapter which would fail when other DMA devices were busy. This meant that whenever disk I/O was heavy, using the network adapter was likely to cause a complete system lockup. This took a while to diagnose as the problems took upward of two weeks to reproduce.

    Despite the equivalent of having the power cable yanked randomly a dozen times when the machine was at its busiest, we never had a single problem with Reiser. The file which was being written to existed as the old version, and there wasn't even a lengthy fsck. Integrity was 100%.

    --
    Says the RIAA: When you EQ, you're stealing bass!
    1. Re:No problems with Reiser by lewiscr · · Score: 1

      I'm having a similar problem. What kind of symptoms did you see? I get a blank screen, and the machine requires a power cycle.

      What kind of NIC was it?

      How did you diagnose? Just start some massive dd's and ftp's?

      I was getting ready to buy a new motherboard, because I thought the IDE chipset was going bad (had a power supply overheat, so it's possible but unlikely.) The NIC is much more likely.

      Any info is appreciated!

    2. Re:No problems with Reiser by inkfox · · Score: 2
      RealTek 8139.

      The RealTek DMA system is a joke. Tiny failures aren't tolerated, nor do the two popular drivers handle buffer overfills -- a bad network cable is enough to destabilize a system.

      Still, it was a hard lock, no screen blanking. (Or are you sure it hasn't locked up while the console screen blanker was active?)

      Grabbing a DEC Tulip-based card will increase performance and stability both, and you can get one for $5-15 if you have a decent shop near you.

      --
      Says the RIAA: When you EQ, you're stealing bass!
    3. Re:No problems with Reiser by lewiscr · · Score: 1

      Still, it was a hard lock, no screen blanking. (Or are you sure it hasn't locked up while the console screen blanker was active?)

      Most of the lockups are while the blanker is active (the machine sits in a closet.) The one time I saw it lock up, it hung leaving the boot-up fsck progress indicator on the screen. No errors were displayed.

      The machine has 2 different 3Com cards. But as I think back, I did see the machine lock up with both cards out. That leaves the Video card, motherboard, RAM or CPU. Guess I swap out the RAM, since that's cheap-n-easy.

  12. Do not trust your fs by jsse · · Score: 5, Insightful

    trust your backup.

    Two cents from an old admin.

  13. If ext2 is flaky, journaling will be worse by shoppa · · Score: 2
    You say that when you fsck your ext2 filesystems, you get errors. I'm assuming that you're only running fsck on dismounted partitions here.

    This indicates, to me, some hardware flakiness on your end. (Even though you say this happens on a wide variety of hardware.) In every account I've seen, journaling filesystems are more stressful on the hardware because - surprise! - the journal is constantly being written to. I'd stick to ext2 if I were you, and figure out why you get any errors when you fsck a dismounted file system.

    I'm in charge of roughly forty Linux boxes, including many desktops and many servers. I've never seen any problems that I could blame on the filesystem. (Though there have been kernel releases in the past - including one in the 2.4.x series, IIRC - where there was a bad filesystem bug, fixed within a day.)

  14. Reseating eliminates contact cruft. by Futurepower(R) · · Score: 2


    This is correct. However, first suspect that your hardware has developed a bad connection. Many problems are corrected by pulling every adapter and cable out about 1 millimeter, then pushing it in again. That wipes the contacts clean of oxide.

  15. EXT3 by haplo21112 · · Score: 2

    I have been using ext3 since around 2.4.13... and it has not given me one problem yet and has in fact saved my ass several times. I have frequent power failures at my house, but ext3 recovers gracefully every single time....

    --
    Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
  16. Re:There's more (thanks for crediting me) by Spoing · · Score: 2
    So my conclusion is: Linux IS stable! Extremely stable! The cause of 99% of the "linux crashes!"-bullshit is because of NVidia's crap-drivers (fast but unstable) and drivers still not "preemptive"-safe (ALSA on SMP for example). But those things are not used on servers anyway.

    I agree. The only time I've hung my systems

    I can't speak for ALSA drivers being bad -- when I used them they seemed to be fine -- but Nvidia's do cause regular hangs for me on an AMD Athlon system (Chipset: "VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]"). Even after adding mem=nopentium to the boot line and using Nvidia's latest drivers, this system hangs on a regular basis.

    The type of instability I get with Nvidia's drivers reminds me of the odd crashes I used to get when I used Windows. For what it's worth.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
  17. Re:There's more (thanks for crediting me) by Covener · · Score: 1

    Crashes in Linux are NOT a regular thing, unless you want to be extremely bleeding edge and/or use NVidia's drivers and/or ALSA (at least up to 0.90rc5) on 2.4 with lowlatency- and preemptive-patches. Especially if the above stuff are used on SMP-systems.

    Don't you think you're projecting your anecdotal experience just a little much?

  18. Re:There's more (thanks for crediting me) by Paladin128 · · Score: 2

    I've been running NVIDIA's drivers on my desktop machine for a year now... the only crashes I've experienced were when I experimented with the 2.5 kernel, or development versions of KDE (which in turn just crash or sometimes hang X, which can be fixed by sshing in from another box). My uptime is generally 2-3 months (I don't have a UPS).

    --
    Lex orandi, lex credendi.
  19. Re:There's more (thanks for crediting me) by Unknown+Lamer · · Score: 2

    Weird, I use the preempt, low latency, and ALSA 0.9.0rc[forgot, I think four] on my 2.4.19 kernel. My uptime is now 87 days, that doesn't seem unstable at all.

    --

    HAL 7000, fewer features than the HAL 9000, but just as homicidal!
  20. ext3 on woody by Anonymous Coward · · Score: 1, Informative

    I've been running Debian3.0 with ext3 on a few machines here. None of them have extremely high loads, but I haven't had any problems. For me it seems the biggest advantage of ext3 is that if something happens you can always use something without ext3 support and mount it as ext2.

    1. Re:ext3 on woody by Anonymous Coward · · Score: 0

      I'm using ext3 on Slackware 8.1 for our database server in a RAID5 configuration. I have not seen any problems with it. Of course, I just started using it the past month so I can't give a real hard opinion, but as far as I know, I've not received any problems related to the filesystem itself.

      I just hope it keeps on chugging. :D

  21. Well, ext3 has worked fine for me... by Anonymous Coward · · Score: 0

    I had a problem a while ago when the UPS batteries failed. The first time we noticed was when, a few minutes into an extended power failure, I decided to take the servers down and noticed the main file server had already dropped. Oops.

    Fortunately, it was running ext3 and there were no problems.

    So after things came back, I decided to do a more controlled test to see how much life the batteries had left. I got a stopwatch, pulled the plug and waited for low battery. Plug out, wait a few seconds... low battery, boom!

    Something was definitely fucked. Urgent battery replacement needed! I really should have scheduled that test for after-hours...

    It happened once more (Florida has a *lot* of lightning strikes at the right time of year) before the battery arrived.

    Fortunately, in all these cases, ext3 kept the file system together. Nice, because running fsck over a large RAID array after a business-hours unscheduled outage while everyone is breathing down your neck is not a pleasant wait.

    Note that if you run fsck -n on a live file system, you WILL see errors. But the "inconsistency" is just the file system changing while fsck is running. It needs to be at least quiescent to get a real idea. (And even then, deleted temp files held open will show up.)
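
    For what it's worth, here is a minimal Python sketch of that sanity check: before reading anything into fsck -n output, look at whether the device is currently mounted read-write. The function name and the sample text are made up for illustration, and the input is assumed to follow the usual /proc/mounts layout (device, mount point, type, options, ...).

```python
def mount_state(device, mounts_text):
    """Return 'rw', 'ro', or None (not mounted) for a device.

    mounts_text is assumed to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue
        options = fields[3].split(",")
        return "ro" if "ro" in options else "rw"
    return None


# Hypothetical sample; on a real box you would read /proc/mounts instead.
SAMPLE = """\
/dev/hda1 / ext3 rw,noatime 0 0
/dev/hda2 /boot ext2 ro 0 0
"""

for dev in ("/dev/hda1", "/dev/hda2", "/dev/hda3"):
    state = mount_state(dev, SAMPLE)
    if state == "rw":
        print(dev, "is mounted read-write: fsck -n output is not trustworthy")
    else:
        print(dev, "is", state or "not mounted")
```

    Only the first matching entry is looked at, which is good enough for a sketch but ignores devices mounted more than once.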

  22. Do not trust your backup... by V.+Mole · · Score: 5, Insightful

    Test your backup. Not just once, but periodically.

    Two cents from a (different) old admin.
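
    One way to make "test your backup" concrete, sketched in Python: restore into a scratch directory and compare checksums against the originals. The function names are invented for this example, and it assumes both trees are plain directories you can read; it says nothing about tapes or any particular backup tool.

```python
import hashlib
import os
import tempfile


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_restore(original, restored):
    """Return (missing, changed): sorted relative paths that are absent
    from the restore, or present but with different contents."""
    a, b = tree_digests(original), tree_digests(restored)
    missing = sorted(p for p in a if p not in b)
    changed = sorted(p for p in a if p in b and a[p] != b[p])
    return missing, changed


# Tiny demonstration with two throwaway directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for root in (src, dst):
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("same")
    with open(os.path.join(src, "b.txt"), "w") as f:
        f.write("only in the original")
    print(compare_restore(src, dst))
```

    Files that changed on the live system after the backup was taken will show up as "changed", so this works best against a snapshot or a quiet directory.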

    1. Re:Do not trust your backup... by SpaceLifeForm · · Score: 2

      Backup twice.
      Two cents from (another) old admin.

      --
      You are being MICROattacked, from various angles, in a SOFT manner.
    2. Re:Do not trust your backup... by Chemical · · Score: 1
      Don't trust your offsite data storage company. We use Iron Mountain and they have let us down consistently. We recall a container for emergency delivery (2 hours max) and get charged $200, and they still don't show up for 6 hours after numerous calls, because the assholes can't find the friggin container. They apparently don't understand what "emergency" means. And to top it all off, they still charged us $200 which we had to fight to get refunded.

      My advice, keep all the tapes buried under your house. It's still safer than Iron Mountain.

  23. Been paranoid, but its okay so far by teqo · · Score: 1

    I have been paranoid about moving away from ext2 in production environment myself, so I have been trying out ext3 and reiserfs on personal boxes for the last eight months. No real problem so far (once I still had a fs error on reiserfs after replaying the log, a fsck fixed that). These are no hard hit or database servers, but I tend to copy a lot of files and data (several gigs) simultaneously between reiserfs file systems and mix up my cd-rom eject button with the reset button from time to time, and on the ext3 box I tried hacking some usb driver which made it crash every ten minutes over three weeks...

    Being concerned for reliability, you might look into the ext3 options to have ordered writes etc., or otherwise your meta data might be fine, although your actual file data might get screwed... The kernel mailing list should hold hints on stability (the more absent complaints are, the more stable it is ;), and as said before, in the end you have your backups... Don't you?
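
    To make the "ordered writes etc." remark a bit more concrete, here is a small Python sketch of the three ext3 data= mount modes and how to pick one out of an option string. The descriptions are summarised from memory rather than quoted from the kernel documentation, the helper name is invented, and "ordered" as the default is an assumption about 2.4-era ext3.

```python
# What each ext3 data= mount option journals (wording is approximate).
EXT3_DATA_MODES = {
    "journal": "data and metadata both go through the journal",
    "ordered": "metadata only, but data blocks are flushed before the "
               "metadata that points at them is committed",
    "writeback": "metadata only, no ordering: files written just before a "
                 "crash may contain stale blocks",
}

DEFAULT_MODE = "ordered"  # assumption: the default on 2.4-era ext3


def ext3_data_mode(options):
    """Return the data= mode named in a comma-separated option string."""
    mode = DEFAULT_MODE
    for opt in options.split(","):
        if opt.startswith("data="):
            mode = opt[len("data="):]
    if mode not in EXT3_DATA_MODES:
        raise ValueError("unknown ext3 data mode: " + mode)
    return mode


for opts in ("defaults", "rw,noatime,data=journal", "rw,data=writeback"):
    mode = ext3_data_mode(opts)
    print(opts, "->", mode + ":", EXT3_DATA_MODES[mode])
```

    The "meta data fine, file data screwed" case above corresponds to the writeback entry; the other two modes trade some speed for stronger guarantees about file contents.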

    I still wonder about meta journaling on database servers with huge data files, and the next thing I am paranoid about is using LVM on Linux :)

  24. ext3 gets my vote by photon317 · · Score: 2


    I've only tried reiserfs and xfs for a few days each, for the most part I've stuck to ext3 in recent days. I've hard-crashed (pull the plug type of thing) several different machines with ext3 while filesystem write activity was going on and never had a problem. Based on my time with ext3, my limited experience with reiserfs/xfs, and reading lots of lkml, I think ext3 is the safest choice at this point in time, even if it's not necessarily the best performance.

    --
    11*43+456^2
  25. ext3 is simple to install or uninstall. I promise. by Cecil · · Score: 3, Informative

    Aside from everybody telling you "that shouldn't happen, you're doing something wrong", which is probably true, I just wanted to chime in with my support of ext3. I think you're making a mountain out of a molehill.

    You obviously haven't looked very closely into ext3, because it's an extremely simple layer on top of a standard ext2 filesystem. Essentially, all it is, is an extra file in /, a daemon to do journalling, and a bit or two toggled on the disk itself.

    the FAQ has one question that lists the two steps required to install a journal on a stock ext2 filesystem (provided you've got a 2.4.16+ kernel, or have patched your older kernel).

    Not only is it very simple to install, but it's very simple to uninstall too. Blindingly easy, in fact. Mount your filesystem as ext2. Done. No journal. If you want to do it permanently, there's an answer about that in the FAQ too.
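
    The FAQ itself isn't quoted here, so the following Python sketch is a guess at what the steps look like, not a copy of them. It only builds a plan and runs nothing: tune2fs -j (add a journal) and tune2fs -O ^has_journal (remove it) are real e2fsprogs invocations, while the function names and the /dev/hda1 entry are hypothetical.

```python
def to_ext3_plan(device, fstab_line):
    """Return (commands, new_fstab_line) for an ext2 -> ext3 conversion.

    Nothing is executed; the commands are only listed for review.
    Whitespace in the fstab line is collapsed to single spaces.
    """
    fields = fstab_line.split()
    if len(fields) < 3 or fields[0] != device or fields[2] != "ext2":
        raise ValueError("expected an ext2 fstab entry for " + device)
    fields[2] = "ext3"  # step 2: mount it as ext3 from now on
    commands = [["tune2fs", "-j", device]]  # step 1: add the journal
    return commands, " ".join(fields)


def to_ext2_plan(device):
    """Return the command that permanently drops the journal.

    Assumes the filesystem is unmounted (or mounted read-only) first.
    """
    return [["tune2fs", "-O", "^has_journal", device]]


cmds, line = to_ext3_plan("/dev/hda1", "/dev/hda1  /  ext2  defaults  0  1")
print(cmds)
print(line)
print(to_ext2_plan("/dev/hda1"))
```

    For the temporary "uninstall" described above, no command is needed at all: the same filesystem can simply be mounted with type ext2.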

    So really, you have nothing to lose by trying ext3. I've had 0 problems with it, and I use it on a laptop that gets a lot of abuse WRT being turned off at random times (I can't view my battery level in Linux, but I can in Windows. Thanks broken ACPI BIOS...)

    The only downside is that the filesystem will sync every 5 seconds or so, which completely destroys any possibility of ever letting the disks spin down for power saving, but that's more of a laptop issue than a server issue.

  26. Re:There's more (thanks for crediting me) by orangesquid · · Score: 2, Interesting

    " This is interesting, considering that the DOS heritage in the Windows 9x/ME series was considered a very bad thing by the Linux community, even though it provided what could be called one of the best examples of compatibility, ever. "

    IBM's Mainframe line of computers kicks WinDOS ass. You can run binaries compiled on slow, clunky 1960's System-360 refrigerators on modern multiprocessing, fault-tolerant, redundant zSeries systems. I can't even run my favorite DOS 5.0 apps under DOS 6.0, least of all under Windows 3.1 or Windows 95. My PC, when it was a DOS machine, had DOS 3.0, DOS 5.0, DOS 6.2, Windows 3.1, and Windows 95. Lots of rebooting to use all my old apps, unless I wanted mysterious crashes and freezes.

    Linux can still run QMAGIC executables compiled against BSD libc4 on a modern ELF/glibc2.3 system by turning on a kernel option and copying a few .so files.

    My only complaint about Linux compatibility, actually, is just the idiots careless programmers who change the API of their library without changing the major revision number. (*cough QT cough*)

    --
    --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
  27. Reiser works for me by MrResistor · · Score: 2

    I've been using reiserfs exclusively since SuSE 7.1, and it's been great. I haven't had a single problem, even during power outages and such (no, I don't have a UPS).

    That's about as anecdotal as it gets!

    Anyway, I'm not going to recommend reiser over the others since I don't have any experience with them, but I will say that I've developed great confidence in reiser's reliability. If I had any old data that I really cared about and wanted to use the same drive, though, I would probably go with Ext3 for the non-destructive (or so I've heard) upgrade.

    --
    Under capitalism man exploits man. Under communism it's the other way around.
  28. Re:There's more (thanks for crediting me) by ethereal · · Score: 1

    I'm pretty sure you're supposed to work in "charnel house" somewhere, so points off for that.

    Oh wait, that's the other troll. Sorry, I got you guys confused for a minute :)

    --

    Your right to not believe: Americans United for Separation of Church and

  29. Re:There's more (thanks for crediting me) by Per+Wigren · · Score: 2

    Maybe it's an SMP bug then, because I've only used ALSA on my dual athlon box... I've heard others who have had problems with ALSA being unstable on SMP-boxes also...

    --
    My other account has a 3-digit UID.
  30. Re:There's more (thanks for crediting me) by Anonymous Coward · · Score: 0

    just because the parent post is long, doesn't mean it's insightful, informative, or interesting

    FOR THE LOVE OF ALL THAT IS GOOD, PRAISEWORTHY, AND OF GOOD REPORT, MOD THE PARENT TROLL DOWN

  31. My (abnormal) experience with ext3 by Mulligan · · Score: 1

    I know this is not common, but I had a bad experience with ext3 a few months back that resulted in the first ever time I have had a complete catastrophic data loss without hardware failure.

    I'll start with the observed events (the stuff I know that happened):

    1. System crash. Probably one of those that happens when my laptop's flaky power management hardware picks a fight with the kernel
    2. First reboot. fsck runs and claims to be fixing some broken stuff. The box doesn't make it through the init scripts before hitting a busted program
    3. Second reboot. fsck runs and claims to be fixing some broken stuff. The box makes it less far through the init scripts before hitting another (newly) busted program
    4. Third reboot. Kernel cannot find init
    5. Boot from floppy and manually run fsck on the disk. (In retrospect, this was probably not the best of choices.)
    6. Fourth reboot. lilo cannot find the kernel. (Or more appropriately, the kernel cannot make it through its own init.)
    7. Rescue disk, take two. No surviving superblocks to be found on the disk.

    The following is my hypothesis as to what was happening at the low level to cause this series of problems. (Note, however, that my knowledge of the ext3 internals is sketchy, so the following is probably somewhere between slightly mistaken and out to pasture in left field.)

    1. System crashes. Data corruption happens. Probably both in the journal file and the superblocks/inodes referring to the journal. Possibly, the magic ext3 bit indicating an active journal on disk is not set properly.
    2. Apparently, fsck.ext2 was running, trying to fix broken file structure, but ended up making changes to the journal itself.
    3. Some brilliant (buggy) piece of software on my poor machine decides to replay the "fixed" journal. Since this is no longer the carefully constructed entry we expect (see #2), this is effectively spitting random information around my hard drive.
    4. I observe still busted machine and think that I should give the restore process another chance. I reboot, effectively returning to step 2.
    5. There is no step 5. The partition has effectively ceased to exist.
    6. Note that I cannot confirm that steps 2 and 3 took place in that order. My guess is that if the journal damage was bad enough, step 2 would not even be necessary (though it was clearly happening). I can only assume that the progressive nature of the failure resulted from data from journal replays between each step #2

    My guess is that some buggy/un-updated version of some piece of software was likely to blame. (I was quite a bit behind on my updates.) However, my hypothesis leads me to believe that storing the journal as a file on the journaled filesystem itself is a bad design decision that probably contributed to the extent of data loss on my system.

  32. Hmmmm... by BrokenHalo · · Score: 1

    Yes, I looked at a lot of benchmark studies a few months ago when I was overhauling my systems.

    In the end i figured that I would stay with ext2 for the time being.

    I'm not simply being reactionary here. Being an old-timer (since late '70s) as a sysadmin and sysprog, I have always placed a high value on good backups carried out rigorously and systematically, e.g. grandfather/father/son in daily, weekly, monthly etc cycles as required by the data turnover, even for desktop boxen.
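
    As an illustration of a grandfather/father/son cycle, here is a minimal Python sketch that decides which tier a given day's backup belongs to. The schedule is an assumption made up for the example (monthly on the 1st, weekly on Sundays, daily otherwise), not the poster's actual rotation.

```python
from datetime import date


def gfs_tier(day):
    """Classify a backup date as 'grandfather', 'father' or 'son'.

    Assumed schedule: monthly on the 1st, weekly on Sundays,
    daily on every other day. The monthly backup wins on a clash.
    """
    if day.day == 1:
        return "grandfather"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "father"
    return "son"


for d in (date(2002, 12, 1), date(2002, 12, 8), date(2002, 12, 9)):
    print(d.isoformat(), gfs_tier(d))
```

    How many sons, fathers and grandfathers to keep before reusing media is a separate decision that depends on the data turnover, as noted above.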

    I've seen all too many people (often very technically savvy) lose important stuff by ignoring backups and trusting journalling systems.

    With my setup (on the basis of the benchmarks I read) ext2 outperforms any of the journalling systems, and I'll live with the risk of an occasional fsck on bootup.

  33. Journalling Filesystems Under Linux by charon.de · · Score: 1

    Hi,

    I've been using various Linux filesystems on multiple production servers, ext2/reiserfs, no problems at all. Today I'm using ext3; quota support is better and you can still use chattr with ext3 ;)

    Anyway, you should run RAID1 or 5, if possible with a hw RAID controller, hotswapable disks and perform regular backups, if you are serious about your data.

    If anyone has some info on why ext3 is faster than ext2 in most benchmarks, I would like to hear it.

  34. Re:There's more (thanks for crediting me) by chunkwhite86 · · Score: 1

    FYI I'm running Athlon SMP (tyan tiger mp) with 2.4.19, preemptive patch, NVidia drivers, and haven't had a single problem. I use my machine heavily for gaming, web/email, 3D rendering, DIVX movie encoding, and i have two folding@home threads running in the background the whole time. I guess it's a YMMV thing.

    --
    I'd rather be a conservative nutjob than a liberal with no nuts and no job.
  35. Anecdotal evidence. by Anonymous Coward · · Score: 0

    With EXT3, everything was good till I moved the drive to my $400 duron system with all sorts of issues involving AGP bus signal inconsistencies and the NVidia drivers. It crashed once a week, then once a month after some tweaking, and now it's mostly stable except mplayer does a system reboot 45 minutes into most of my DVD's. (Remember: $400 system.)

    (In response to any comments about Linux being unstable, I have actually isolated these crashes as being hardware problems, and the Win2k drivers require the same workarounds that I've had to use on my system.)

    OK, so it crashes when I play DVD's. Fine. Ext3 recovers the journal, who cares? It's a desktop. It was really cheap. While it boots, I can watch the DVD on my Shinsonic.

    Well, Debian 3 decided to force a real FS check after 170 days of this, and found all sorts of errors on (only!) one of its partitions that the journal playback had missed. There was no data corruption that I could tell, but it still makes me a bit nervous, since I've been running an inconsistent FS for possibly months.

    I've used Reiser on other hardware, and my big beef was that in all of the setups I've had, after downloading a 5MB+ file the entire system would hang for a few seconds while the journal flushed out to disk.

    Again, I didn't see any file corruption on those systems either.

    For the production systems I deal with at work, I've stuck with ext2, mostly because of performance concerns with the journaling systems. (We deal with very large files, and system responsiveness and file throughput are both very important to us.) Hopefully, some of the performance fixes in 2.6 will fix these problems.

    I think there's a fundamental problem with the journalling systems though. If you get a flakey disk controller, you might not catch the problem as soon if you're not running real checks fairly often. By definition, journalling systems are skipping some of these full fs checks. On the other hand, you're saving lots of downtime. The ability to have it fsck a read write mounted file system (maybe as a cron job) would be very nice.
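
    On the "real checks fairly often" point: ext2/ext3 keep a maximum mount count and a check interval in the superblock (adjustable with tune2fs -c and tune2fs -i), which is presumably the kind of thing that triggered the forced check above. The Python sketch below only models that decision; the function name and the numbers are invented for the example, and treating a zero limit as "disabled" is an assumption modelled on how tune2fs describes those settings.

```python
from datetime import datetime, timedelta


def full_check_due(mount_count, max_mount_count, last_checked, interval, now):
    """True if a full fsck is due by mount count or by elapsed time.

    A max_mount_count of 0 or an interval of zero disables that trigger.
    """
    by_count = max_mount_count > 0 and mount_count >= max_mount_count
    by_time = interval > timedelta(0) and now - last_checked >= interval
    return by_count or by_time


last = datetime(2002, 1, 1)
limit = timedelta(days=180)

# 170 days after the last check, 5 mounts out of 30: nothing forced yet.
print(full_check_due(5, 30, last, limit, datetime(2002, 6, 20)))
# Same date, but the mount count has reached its limit.
print(full_check_due(30, 30, last, limit, datetime(2002, 6, 20)))
# 181 days after the last check: the time limit kicks in.
print(full_check_due(5, 30, last, limit, datetime(2002, 7, 1)))
```

    A machine that crashes and reboots often hits the mount-count trigger sooner, which is arguably what you want on flaky hardware even with a journal.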

  36. Re:ext3 is simple to install or uninstall. I promi by Omniscient+Ferret · · Score: 1
    The only downside is that the filesystem will sync every 5 seconds or so, which completely destroys any possibility of ever letting the disks spin down for power saving, but that's more of a laptop issue than a server issue.

    If you mount it with the noatime option, then it won't constantly rewrite the last access time for files; this means that your disks can spin down again. It's worked for me.