Slashdot Mirror


Calculating the Mean Time Between Failures?

Blue Booger asks: "I was looking over some fibrechannel hard drives and noticed that the Mean Time Between Failures was rated at 1.2 million hours. I thought that was pretty high, and figured it up to be close to 137 YEARS!! I went to check some regular IDE drives just for comparison, and they were rated at 500,000 hours (57 years). Now, as I understand it, this is supposed to be the average time that you can expect the drive to last before failures. I rarely have an IDE drive last more than 4 years, and my record is 10 years, so what is the deal? BTW, that is 57 years running 24 hours a day...the MTBF is rated as power on time. Here you can find Western Digital's glossary that defines the term MTBF (pdf). Here you can find a spec sheet on one of their 20GB IDE drives. I checked, and Seagate also lists similar MTBFs. How the heck are they coming up with these numbers?"

16 of 100 comments

  1. Duty Cycle by m0rph3us0 · · Score: 3, Informative

    An MTBF figure usually assumes a particular duty cycle, so the effective MTBF at a 100% duty cycle can be drastically different.

  2. not just drives... by ryanmoffett · · Score: 4, Interesting

    Cisco used to sell Catalyst 3548XL switches that were listed as having an MTBF of 120,000+ hours. Their current replacement for that line (3550) comes in at 163,000+ hours. We had 7 of 24 3548XL switches fail in the first year we had them. They had poor air flow from a tiny fan, no heatsinks and tons of hot chips. The newer model has the same issue, though they did stuff a cheap foam baffle in the case to get air to flow closer to the chips, none of which have heatsinks. I have no idea how they tested them and got an MTBF of 13 years.

  3. Simple, it's called "lies" by Anonymous Coward · · Score: 5, Funny
    Sure, the test engineers sit and rub their chins and write numbers on paper and do stupid tests in the lab, but in the end it comes down to this:
    • WD Guy 1 Hey, what's the MTBF for our new drive?
    • WD Guy 2 Dunno, what's Maxtor saying?
    • WD Guy 1 sez here "300,000" hours
    • WD Guy 2 okay, ours is 500,000 then
    • WD Guy 1 I smell a NEW VICE PRESIDENT
  4. You are wrong by Mensa+Babe · · Score: 4, Informative

    I rarely have an IDE drive last more than 4 years, and my record is 10 years, so what is the deal?

    If you have twenty drives, each with a twenty-year MTBF (Mean Time Between Failures), then you have one failure per year on average. Basic statistics are always working against you.
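
    A minimal sketch of that fleet arithmetic, assuming a constant failure rate and drives powered on 24 hours a day. The function name and the 100-drive fleet are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours, assuming the drive never spins down


def expected_failures_per_year(n_drives: int, mtbf_hours: float) -> float:
    """Expected failures per year across a fleet, assuming a constant failure rate."""
    return n_drives * HOURS_PER_YEAR / mtbf_hours


# Twenty drives, each with a twenty-year MTBF.
twenty_drive_fleet = expected_failures_per_year(20, 20 * HOURS_PER_YEAR)

# Hypothetical fleet of 100 drives at the quoted 500,000-hour MTBF.
hundred_drive_fleet = expected_failures_per_year(100, 500_000)

print(twenty_drive_fleet, hundred_drive_fleet)
```

    The more drives you run, the more often you see a failure, even though no single drive got any worse.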

    --
    Karma: Positive (probably because of superiour intellect)
  5. Here's a wild-ass guess by HotNeedleOfInquiry · · Score: 4, Insightful
    First they specify a sample period, perhaps a year. Then they multiply the number of units shipped during that time by the estimated hours per year that the drives are run, then divide that by the number of units returned due to failure.

    For example, say they shipped 2 million drives last year and each ran 2,080 hours (8 hours * 5 days * 52 weeks), roughly 4.16 billion hours total. Out of those 2 million units, they got 3,466 returns. So the MTBF was about 1.2 million hours.
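
    The guess above can be sketched in a few lines, assuming each drive runs 8 hours a day, 5 days a week, 52 weeks a year (2,080 hours). The function name is made up, and this is only the guessed method, not a documented vendor procedure.

```python
def mtbf_from_returns(units_shipped: int, hours_per_unit: float, units_returned: int) -> float:
    """Total operating hours across all shipped units divided by the number of failures."""
    total_hours = units_shipped * hours_per_unit
    return total_hours / units_returned


hours_per_unit = 8 * 5 * 52               # 2,080 hours per drive per year (assumed usage)
total_hours = 2_000_000 * hours_per_unit  # 4.16 billion drive-hours
mtbf = mtbf_from_returns(2_000_000, hours_per_unit, 3466)

print(f"{total_hours:,} drive-hours, MTBF of roughly {mtbf:,.0f} hours")
```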

    --
    "Eve of Destruction", it's not just for old hippies anymore...
    1. Re:Here's a wild-ass guess by elmegil · · Score: 3, Interesting

      A lot of hardware vendors actually test before they ship. But aside from that your basic math is about right. A controlled number of units is tested (possibly in stressed environments) and used to build the statistics that say what the expected MTBF should be.

      --
      7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
  6. Look at the definition by aaarrrgggh · · Score: 3, Interesting

    If they run 500 drives for 1,000 hours and observe only one failure, that is an MTBF of 500,000 hours.

    Unfortunately, that equation doesn't take into account the fact that some equipment degrades over time; if a product is very reliable for 1,000 hours, and less reliable after that, just double the sample size (maybe triple for statistics), and see what you get.
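
    A rough sketch of how a short test can hide wear-out. It assumes a made-up two-phase model: a low constant failure rate for the first 1,000 hours and a higher constant rate afterwards. All three numbers are hypothetical and chosen only for illustration.

```python
import math


def mean_life_piecewise(rate_early: float, rate_late: float, t_switch: float) -> float:
    """Mean lifetime when the hazard is rate_early before t_switch and rate_late after it."""
    survive_to_switch = math.exp(-rate_early * t_switch)
    early_part = (1.0 - survive_to_switch) / rate_early  # integral of survival on [0, t_switch]
    late_part = survive_to_switch / rate_late            # integral of survival on [t_switch, inf)
    return early_part + late_part


RATE_EARLY = 1 / 500_000  # hypothetical: failures per hour during the first 1,000 hours
RATE_LATE = 1 / 35_000    # hypothetical: failures per hour once wear-out sets in
T_SWITCH = 1_000          # hypothetical: hours before the rate changes

apparent_mtbf = 1 / RATE_EARLY  # what a test shorter than T_SWITCH would suggest
true_mean_life = mean_life_piecewise(RATE_EARLY, RATE_LATE, T_SWITCH)

print(f"apparent MTBF: {apparent_mtbf:,.0f} h, mean life under this model: {true_mean_life:,.0f} h")
```

    A test that ends before the switch point only ever sees the early rate, so the published figure says nothing about what happens later.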

    Real reliability calculations are much more difficult than just what users think MTBF means...

  7. Labs by MazTaim · · Score: 4, Insightful

    That's the key word.

    MTBF is probably determined by taking a bunch of drives and putting them into PERFECT conditions that NEVER exist in the real world. They run them in a way that, although it tests all functionality, really doesn't reflect true conditions for drives (i.e., head always reading/writing up and down the disk, probably never seeking, disks always spinning, etc.). Something that drives never do in real life. Statistics...statistics...statistics...(speeling too :)

  8. Marketing BS... by Alomex · · Score: 3, Insightful

    Anybody who has a large number of drives running knows that the figures have become meaningless over time. They used to predict the expected time of failure to a T. They are now a marketing term assuming "a duty cycle" and computed by an absurd "units x time to failure". Using that system, the MTBF of the Honda Civic engine is over 80,000 years, as there are 1 million Civics out there and none of them had their engine seize up in the first month.

    Somebody ought to sue them for deceptive advertising.

  9. No, you are wrong by anthony_dipierro · · Score: 3, Insightful

    Actually, you are wrong... If you have one drive fail per year for 20 years, then the drives lasted 1, 2, ..., 20 years, and the mean time between failures is (1 + 2 + ... + 20) / 20 = 10.5 years.

  10. Re:As a sidenote by NickDngr · · Score: 5, Funny

    DISCLAIMER: The views expressed hereafter are not necessarily those of MENSA, which I am only a member of.

    Shouldn't that be "The views expressed hereafter are not necessarily those of MENSA, of which I am only a member"? I would think proper grammar usage would be a prerequisite for being a MENSA member.

    --
    Yoda of Borg am I! Assimilated shall you be! Futile resistance is, hmm?
  11. MTBF... by m0rph3us0 · · Score: 3, Insightful

    The best way to determine *REAL* MTBF is to look at how long the drives are warrantied for; no one warranties a product longer than it is supposed to last. When you see a company reduce its warranty, expect quality to drop accordingly.

  12. MTBF calculation and estimation by crmartin · · Score: 4, Informative

    You know, it's almost a shame to screw up the amusing notions /.ers come up with by adding actual information, but I can't help it, all those years of teaching I guess.

    Okay, first of all: "mean time between failures" is obviously a statistical measure -- it is an average over a large number of individual items. In most electronic components (including light bulbs!) the statistical distribution of the time between failures is the exponential distribution, which has the odd property that it's "memory-less" -- it doesn't matter how long it's been since the last failure, the mean time to the next failure will still be the same. A consequence of this is that if the MTBF is 10,000 hours, the probability of failure in any particular hour would be 1/10,000th. So, if you set up 10,000 components, all running simultaneously, you'd expect one of them to fail within the first hour; conversely, if you ran them for 1,000 hours, and 998 of them failed, you could be fairly certain that the MTBF would be around 10,000 hours.

    Note, by the way, that this is only true when the failure time distribution is exponential -- so it works for electronic components, but not for, say, bicycles and cars and roller skates, which are more likely to fail the older they get.
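
    A short sketch of what the exponential assumption implies for the 500,000-hour figure from the question, assuming 24-hour-a-day operation. The function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the drive is always powered on


def survival(hours: float, mtbf_hours: float) -> float:
    """Probability that a unit is still working after `hours`, for exponential lifetimes."""
    return math.exp(-hours / mtbf_hours)


def prob_failure_within(hours: float, mtbf_hours: float) -> float:
    """Probability of at least one failure within `hours`."""
    return 1.0 - survival(hours, mtbf_hours)


MTBF = 500_000  # hours, the IDE figure quoted in the question

# Chance that a new drive fails during its first year.
p_first_year = prob_failure_within(HOURS_PER_YEAR, MTBF)

# Memorylessness: chance of failing during year five, given it survived four years.
p_fifth_year_given_4 = 1.0 - survival(5 * HOURS_PER_YEAR, MTBF) / survival(4 * HOURS_PER_YEAR, MTBF)

print(f"first year: {p_first_year:.4f}, fifth year given four survived: {p_fifth_year_given_4:.4f}")
```

    Under this model the two probabilities are equal, which is exactly the property that real drives with moving parts tend not to have.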

    This has an obvious problem, of course: if the MTBF is high, it can take forever to test. Consider, for example, something I worked on for NASA some years ago: trying to prove that a fly-by-wire system will have a mean time between failures of 1e10 hours. (This is about the same failure rate as the airframe, which is how they came up with the number.) 1e10 hours is about 1.141 million years, by the way.

    (Pop quiz: if MTBF is a million years, how do you explain the occasional airframe failure, e.g., TWA 800? Hint: It doesn't require any foul play.)

    At that point, you've got a couple of choices: first, you can make a lot of copies and run them simultaneously. Relatively easy for $50 disks, hard for billion dollar 747s.

    Second, you can make the estimate by computation and modeling which is what you do for web systems. Conceptually, it's pretty simple to do this, although it can be a kind of pain in the ass.

    The third way, which is new and cool, is by Bayesian estimation of failure rates. This method lets you make increasingly accurate estimates of the failure rate based on short experiments. I don't have time to go into it, but there are some good sources available on the web.
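
    One common form of Bayesian failure-rate estimation, sketched under stated assumptions: a Gamma prior on the failure rate, updated with the number of failures seen over a known number of unit-hours. This is not necessarily the method meant above, and the prior and the three experiments are invented for illustration.

```python
def update_gamma(shape: float, rate_hours: float, failures: int, unit_hours: float):
    """Conjugate update of a Gamma(shape, rate_hours) prior on the failure rate.

    With exponentially distributed lifetimes, the posterior after observing
    `failures` in `unit_hours` is Gamma(shape + failures, rate_hours + unit_hours).
    """
    return shape + failures, rate_hours + unit_hours


# Hypothetical weak prior: mean failure rate of 1 per 100,000 hours.
shape, rate_hours = 1.0, 100_000.0

# Hypothetical short experiments: (failures observed, total unit-hours run).
experiments = [(0, 200_000.0), (1, 300_000.0), (0, 500_000.0)]

for failures, unit_hours in experiments:
    shape, rate_hours = update_gamma(shape, rate_hours, failures, unit_hours)
    mean_failure_rate = shape / rate_hours
    print(f"posterior mean rate: {mean_failure_rate:.3e} per hour "
          f"(about one failure per {1 / mean_failure_rate:,.0f} hours)")
```

    Each short experiment tightens the estimate without anyone having to wait for an MTBF's worth of hours on a single unit.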

  13. Re:As a sidenote by The+Clockwork+Troll · · Score: 4, Funny
    I would think proper grammar usage would be a prerequisite for being a MENSA member.
    <input type="radio" name="gift" value="IQ" disabled>
    <input type="radio" name="gift" value="money" disabled>
    <input type="radio" name="gift" value="penis size" disabled>
    <input type="radio" name="gift" value="ability to nitpick trivia" checked>
    --

    There are no karma whores, only moderation johns
  14. Why is hardware always mean? by cookd · · Score: 4, Funny

    Whatever happened to *NICE* time between failure?

    --
    Time flies like an arrow. Fruit flies like a banana.
  15. It all depends on the distribution... by anthony_dipierro · · Score: 3, Informative

    I went to check some regular IDE drives just for comparison, and they were rated at 500,000 hours (57 years). Now, as I understand it, this is supposed to be the average time that you can expect the drive to last before failures. I rarely have an IDE drive last more than 4 years, and my record is 10 years, so what is the deal?

    Let's say I have a drive that has a 99% chance of failing after 10 years, and a 1% chance of failing after 4710 years. The MTBF is 57 years.

    In fact, with the proper distribution (think 2^n) you could have an infinite MTBF, but still have a 99% chance of failure within 10 years. See for example the St. Petersburg paradox.
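
    Both points can be checked with a few lines. The first part reproduces the two-outcome example. The second is a sketch of a St. Petersburg-style distribution, assuming a lifetime of 2^n years with probability 2^-n. It only illustrates how the mean diverges and is not tuned to give exactly a 99% chance of failure within 10 years. The function names are made up.

```python
def mean_lifetime(outcomes):
    """Mean of a discrete lifetime distribution given as (probability, years) pairs."""
    return sum(p * years for p, years in outcomes)


# 99% of drives fail at 10 years, 1% at 4,710 years.
two_point = [(0.99, 10), (0.01, 4710)]
two_point_mean = mean_lifetime(two_point)


def truncated_st_petersburg_mean(n_terms: int) -> float:
    """Mean contribution of the first n_terms outcomes: 2**n years with probability 2**-n."""
    return sum((0.5 ** n) * (2 ** n) for n in range(1, n_terms + 1))


print(f"two-point mean: {two_point_mean:.1f} years")
for n_terms in (10, 100, 1000):
    # Every term contributes one year, so the sum grows without bound.
    print(n_terms, truncated_st_petersburg_mean(n_terms))
```

    In both cases the mean is driven by rare, very long-lived units that almost nobody actually owns.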