Slashdot Mirror


Disk Failure Rates More Myth Than Metric

Lucas123 writes "Mean time between failure ratings suggest that disks can last from 1 million to 1.5 million hours, or 114 to 170 years, but study after study shows that those metrics are inaccurate for determining hard drive life. One study found that some disk drive replacement rates were greater than one in 10. This is nearly 15 times what vendors claim, and all of these studies show failure rates grow steadily with the age of the hardware. One former EMC employee turned consultant said, 'I don't think [disk array manufacturers are] going to be forthright with giving people that data because it would reduce the opportunity for them to add value by 'interpreting' the numbers.'"
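
For readers who want to check the summary's arithmetic, here is a minimal sketch that converts an MTBF rating into years and into an implied annual failure rate. It assumes a constant failure rate (an exponential lifetime model) and 8,760 powered-on hours per year. Neither assumption comes from the studies mentioned, and the function names are purely illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760; assumes the drive is powered on all year


def mtbf_to_years(mtbf_hours: float) -> float:
    """Express an MTBF rating in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_rate(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for mtbf in (1_000_000, 1_500_000):
    afr = annual_failure_rate(mtbf)
    print(
        f"MTBF {mtbf:,} h = {mtbf_to_years(mtbf):.0f} years, "
        f"implied annual failure rate {afr:.2%}, "
        f"a 10% replacement rate is about {0.10 / afr:.0f}x that"
    )
```

Under these assumptions the implied annual failure rate is well under 1%, so a one-in-10 replacement rate works out to somewhere between 11 and 17 times the rated figure, which brackets the "nearly 15 times" in the summary.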

8 of 283 comments

  1. There are only two kind of peeps... by loki969 · · Score: 5, Insightful

    ...those that make backups and those that never had a hard drive fail.

    1. Re:There are only two kind of peeps... by Raineer · · Score: 5, Insightful

      I see it the other way... Once I start taking backups my HDDs never fail; it's when I forget that they crash.

  2. Marketplace can't function without good data by dpbsmith · · Score: 5, Insightful

    If everyone knows how much a disk drive costs, and nobody can find out how long a disk drive really will last, there is no way the marketplace can reward the vendors of durable and reliable products.

    The inevitable result is a race to the bottom. Buyers will reason they might as well buy cheap, because they at least know they're saving money, rather than paying for quality and likely not getting it.

  3. Re:Never had a drive fail by Anonymous Coward · · Score: 5, Funny

    Wait. You've got a huge Wang, and you're throwing it out? D00d, that's just uncool. Give it to someone else at least. It would be fun to ask people "wanna come see my huge Wang?" just to see their reaction! :)

    hah. captcha word: largest

  4. What MTBF is for. by sakusha · · Score: 5, Insightful

    I remember receiving a service management manual from DEC back in the mid-1980s. It had some information that really opened my eyes about what MTBF was really intended for. It had a calculation (I have long since forgotten the details) that allowed you to estimate how many service spares you would need to keep in stock to service any installed base of hardware, based on MTBF. This was intended for internal use in calculating spares inventory levels for DEC service agents. High MTBF products needed fewer replacement parts in inventory; low MTBF products needed lots of parts in stock. Presumably internal MTBF ratings were more accurate than those released to end users.

    So anyway.. MTBF is not intended as an indicator of a specific unit's reliability. It is a statistical measurement to calculate how many spares are needed to keep a large population of machines working. It cannot be applied to a single unit in the way it can be applied to a large population of units.

    Perhaps the classical example is the old tube-based computers like ENIAC: if a single tube has an MTBF of 1 year but the computer has 10,000 tubes, you'd be changing tubes (on average) more than once an hour, and you'd rarely even get an hour of uptime. (I hope I got that calculation vaguely correct.)
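
The tube arithmetic in the comment above is easy to check, and the same population-level reasoning gives a rough spares estimate. This is only a sketch under a constant-failure-rate assumption. It is not the DEC calculation, which the commenter says is long forgotten. The function names, the 1,000-drive fleet and the 95% service level are all made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760


def system_mtbf_hours(unit_mtbf_hours: float, unit_count: int) -> float:
    """MTBF of a machine that stops whenever any one of its identical units fails."""
    return unit_mtbf_hours / unit_count


def expected_failures(unit_mtbf_hours: float, unit_count: int, period_hours: float) -> float:
    """Average number of unit failures across the whole population in the period."""
    return unit_count * period_hours / unit_mtbf_hours


def spares_needed(unit_mtbf_hours: float, unit_count: int, period_hours: float,
                  service_level: float = 0.95) -> int:
    """Smallest stock s such that P(failures <= s) >= service_level (Poisson model)."""
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must be strictly between 0 and 1")
    mean = expected_failures(unit_mtbf_hours, unit_count, period_hours)
    if mean <= 0:
        return 0
    cumulative = 0.0
    spares = 0
    while True:
        # Poisson probability of exactly `spares` failures, computed in log space
        log_p = -mean + spares * math.log(mean) - math.lgamma(spares + 1)
        cumulative += math.exp(log_p)
        if cumulative >= service_level:
            return spares
        spares += 1


# The ENIAC-style example: 10,000 tubes, each with a one-year MTBF
tube_hours = system_mtbf_hours(HOURS_PER_YEAR, 10_000)
print(f"Mean time between tube failures: {tube_hours * 60:.0f} minutes")

# Hypothetical fleet: 1,000 drives rated at 1,000,000 hours, stocked for one year
print("Expected failures:", expected_failures(1_000_000, 1_000, HOURS_PER_YEAR))
print("Spares to stock:", spares_needed(1_000_000, 1_000, HOURS_PER_YEAR))
```

With a one-year tube MTBF and 10,000 tubes, the first function gives 0.876 hours, or about 53 minutes between failures, which agrees with the "more than once an hour" estimate.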

  5. Re:Failure rates ! warranty period. by ABasketOfPups · · Score: 5, Informative

    Warranty periods for 750 gig and 1 terabyte drives from Western Digital, Samsung, and Hitachi are 3 to 5 years, according to the info on zipzoomfly.com.

    A one year warranty doesn't seem that common. External drives seem to have one year warranties, but even SATA drives at Best Buy mostly have 3 years.

  6. Re:Never had a drive fail by serviscope_minor · · Score: 5, Funny

    I'm about to lug a huge Wang hard drive out to the trash pickup on Monday - weighs over 100 pounds... still runs. Actually it uses removable platters but still...

    <Indiana Jones> IT BELONGS IN A MUSEUM!</Indiana Jones>

  7. Re:Temperature is the key by ABasketOfPups · · Score: 5, Interesting

    Google says that's just not what they've seen. "The figure shows that failures do not increase when the average temperature increases. In fact, there is a clear trend showing that lower temperatures are associated with higher failure rates. Only at the very high temperatures is there a slight reversal of this trend."

    On the graph it's clear that 30-35C is best at three years. But up until then, 35-40C has lower failure rates, and both have much lower failure rates than the 15-30C range.