Slashdot Mirror


Putting Linux Reliability to the Test

Frank writes "This paper documents the test results and analysis of the Linux kernel and other core OS components, including everything from libraries and device drivers to file systems and networking, all under some fairly adverse conditions, and over lengthy durations. The IBM Linux Technology Center has just finished this comprehensive testing over a period of more than three months and shares the results of their LTP (Linux Test Project) testing."

6 of 296 comments

  1. Almost 1P, but I RTFAd :( by Anonymous Coward · · Score: 4, Interesting

    Anyone know if the test will be repeated with kernel 2.6.x?

  2. s/w -vs- h/w failure? by Quixote · · Score: 5, Interesting
    I skimmed over the article (heretic!), and was wondering: how do they distinguish between software failures (the purpose of the test) and hardware failures (for example, random bit errors in the memory that could be caused by higher temperatures due to the stress testing)?

    I seem to recall getting random crashes with cheapo memory, and it was a pain to track down the offending component. Of course, one would assume that IBM wouldn't go for cheapo components, but still: how does one point the finger at the software, instead of hardware? Is it just repeatability?

  3. Re:You don't trust Microsoft to evaluate Windows.. by davidstrauss · · Score: 5, Interesting
    Why shouldn't we trust this test?

    The people performing it have a vested financial interest in having it turn out a specific way, namely positive. If the test results showed poor reliability, I would understand trusting them, because that would go against the motives of the people performing the test. Since the test affirms their business model, it should be suspect no matter how well documented it is.

    It doesn't appear to be a test rigged to make one platform look better than the other.

    It looks a bit skewed to me. Many of the test results depend on the computer systems meeting the expectations of the people testing them, particularly in overload cases. Since the testers work in the Linux Technology Center, their expectations stand a greater likelihood of being consistent with the system's behavior.

    Take C/C++ and Java. Someone who regularly works with C/C++ knows that certain library functions (notably the character-classification ones) return an int as status, with 0 meaning false and any nonzero value meaning true. If someone expects that, the system meets expectations and passes. If someone comes from a different background, say Java, he or she may not expect that, and the system would consequently fail the test of meeting expectations. I would like an evaluation from somewhere in between, not from someone whose years of experience allow them to gloss over what might be problems for another person.
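
To make the point about expectations concrete, here is a minimal Python sketch. Everything in it is hypothetical: `c_style_isdigit` is only a stand-in for a C-style classifier (the C standard promises "nonzero" for true, not any particular value, and 2048 is an arbitrary choice for illustration), and the two `passes_*` helpers model two testers with different backgrounds. None of this comes from the IBM paper.

```python
def c_style_isdigit(ch: str) -> int:
    """Hypothetical stand-in for a C-style classifier.

    Returns 0 for false and some nonzero int for true.
    """
    # 2048 is an arbitrary nonzero value picked for illustration only.
    return 2048 if "0" <= ch <= "9" else 0


def passes_c_expectation(result: int) -> bool:
    # A tester used to C treats any nonzero status as "true".
    return result != 0


def passes_strict_expectation(result: int) -> bool:
    # A tester expecting a real boolean accepts only the object True.
    return result is True


status = c_style_isdigit("7")
print(passes_c_expectation(status))       # judged by the C convention
print(passes_strict_expectation(status))  # judged by a strict-boolean convention
```

The function returns the same value both times; only the tester's expectation changes whether it counts as a pass.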

  4. Why? Here's why... by Crypto+Gnome · · Score: 5, Interesting
    • because the test methodologies are documented
    • because it's disclosed up-front that it's the IBM Linux team testing Linux (i.e. no hidden conflict of interest)
    As opposed to the usual (i.e. in the Microsoft world):
    • ZDNet (and/or others) "testing" Microsoft Products (but only vaguely describing how things were configured)
    • Microsoft paying someone to "report" on the quality/performance of a Microsoft product, with the evaluation worded so as to convince the reader that it's an independent review, and the "funded by Microsoft" fact never mentioned anywhere in the evaluation
    --
    Visit CryptoGnome in his home.
  5. WHAT is the failure? by SharpFang · · Score: 4, Interesting

    95% success ratio... does that mean that 1 in 20 programs I run segfaults, or what? What do they mean by "failure"? Not finishing a given task in a predefined time? Getting the results wrong? Hanging?

    Sorry, but that means nothing. Even if there -was- a comparison to other systems, it would still mean nothing. 95% success ratio, 78% happiness factor and 93% user satisfaction.

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  6. My experience: Linux survives hard drive crash by crow · · Score: 4, Interesting

    I've been using an old P120 laptop as a firewall/router for my house for the past several years, running 2.2.something. After noticing an uptime of only a day or two, I wondered why it had rebooted, but found that I was instead experiencing the uptime rollover bug (at about 500 days; Windows used to crash on a similar bug after about 49.7 days). About a month ago, it stopped giving out DHCP addresses. I went downstairs to investigate, as I couldn't log in remotely, and found that the hard drive was making that nasty clicking sound. I eventually managed to ssh in (sshd and sh were in RAM; I just waited for the logging to time out). I was able to kill syslog and cron, and now DHCP is again giving out addresses.

    It's been running just fine for a month now with a dead hard drive.

    (Yes, I'm getting a replacement because it won't survive an extended power outage on that ancient battery.)
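
The rollover figures in the comment above can be checked with a little arithmetic. This is a rough sketch under two assumptions not stated in the comment: that the 2.2 kernel on this laptop counted uptime in a 32-bit tick counter running at 100 ticks per second, and that the Windows bug involved a 32-bit millisecond counter. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
COUNTER_STATES = 2 ** 32  # a 32-bit unsigned counter wraps after this many ticks


def days_until_wrap(ticks_per_second: int) -> float:
    """Days of uptime before a 32-bit tick counter wraps back to zero."""
    return COUNTER_STATES / ticks_per_second / SECONDS_PER_DAY


def displayed_uptime_days(real_days: float, ticks_per_second: int) -> float:
    """Uptime a wrapped 32-bit counter would report after `real_days` of running."""
    ticks = int(real_days * SECONDS_PER_DAY * ticks_per_second)
    wrapped_ticks = ticks % COUNTER_STATES  # what survives in 32 bits
    return wrapped_ticks / ticks_per_second / SECONDS_PER_DAY


# Assumed tick rates: 100 Hz for the old Linux box, 1000 Hz (milliseconds) for Windows.
print(f"100 Hz counter wraps after {days_until_wrap(100):.1f} days")
print(f"1000 Hz counter wraps after {days_until_wrap(1000):.1f} days")
print(f"After 499 real days, a 100 Hz counter shows {displayed_uptime_days(499, 100):.1f} days")
```

Under those assumptions the first counter wraps a little before 500 days and the second a little before 50, and a machine that has really been up 499 days would report an uptime of under two days, which fits the "day or two" described in the comment.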