Disk Drive Failures 15 Times What Vendors Say
jcatcw writes "A Carnegie Mellon University study indicates that customers are replacing disk drives more frequently than vendor estimates of mean time to failure (MTTF) would require. The study examined large production systems, including high-performance computing sites and Internet services sites running SCSI, FC and SATA drives. The data sheets for the drives indicated MTTF between 1 and 1.5 million hours. That should mean annual failure rates of at most 0.88%, but annual replacement rates were between 2% and 4%. The study also shows no evidence that Fibre Channel drives are any more reliable than SATA drives."
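For anyone wondering where the 0.88% comes from, here is a minimal sketch of the usual conversion, assuming the drive runs 24/7 and that the annual failure rate can be approximated as hours per year divided by MTTF (reasonable when MTTF is much longer than a year). The function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous operation


def annual_failure_rate(mttf_hours: float) -> float:
    """Approximate annual failure rate as hours-per-year / MTTF."""
    return HOURS_PER_YEAR / mttf_hours


# The two data-sheet MTTF figures quoted in the summary.
for mttf in (1_000_000, 1_500_000):
    print(f"MTTF {mttf:>9,} h -> AFR {annual_failure_rate(mttf):.2%}")
# MTTF 1,000,000 h -> AFR 0.88%
# MTTF 1,500,000 h -> AFR 0.58%
```

So the 0.88% figure is the worst case of the quoted MTTF range; the 1.5 million hour data sheets imply an even lower rate.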
I have had 3 personal-use hard drives go bad in the last 5 years; they were either Maxtor or Western Digital. I am not hard on the drives other than leaving them on 24/7. The drives that failed were all just for data backup, and I put them in big, well-ventilated boxes. With this use I would think the drives would last for years (at least 5 years), but nope! The drives did not arrive broken either; they all functioned great for 1-2 years before dying. The quality of consumer hard drives nowadays is way, WAY low, and the manufacturers should do something about it.
I don't consider myself a fluke because I know quite a few other people who have had similar problems. What's the deal?
Also, does anyone else find this quote interesting?:
"and may have failed for any reason, such as a harsh environment at the customer site and intensive, random read/write operations that cause premature wear to the mechanical components in the drive."
It's a f$#*ing hard drive! Jesus H Tapdancing Christ, how can they call that premature wear? Do they calculate the MTTF by just letting the drive sit idle and never reading and writing to it? That actually wouldn't surprise me.
"If they told me it was 100,000 hours, I'd still protect it the same way. If they told me it was 5 million hours I'd still protect it the same way. I have to assume every drive could fail."
"Just common sense."
It's "common sense," but not as useful as one might hope. What MTTF tells you is, within some expected margin of error, how much failure you should plan on in a statistically significant farm. So, for example, I know of an installation that has thousands of disks used for everything from root disks on relatively drop-in-replaceable compute servers to storage arrays. On the budgetary side, that installation wants to know how much replacement cost to expect per annum. On the admin side, that installation wants to be prepared with an appropriate number of redundant systems, and wants to be able to assert a failure probability for key systems. That is, if you have a RAID array with 5 disks and one spare, then you want to know the probability that three disks will fail on it in the, let's say, 6-hour worst-case window before you can replace any of them. That probability is non-zero, and must be accounted for in your computation of anticipated downtime, along with every other unlikely but possible event that you can account for.
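To put a rough number on that RAID example, here is a back-of-the-envelope sketch. It assumes every disk fails independently with a constant hazard rate derived from the annual failure rate, which is my simplification and not something the study claims; if failures are correlated (same batch, same enclosure), the real probability would be higher. The function names and the two sample rates are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_disk_fails(afr: float, window_hours: float) -> float:
    """Chance that one disk fails within the window (constant hazard rate assumed)."""
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * window_hours)


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability that at least k of n independent disks fail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 5 disks plus one spare = 6 drives, 6-hour replacement window.
# 0.88% is the data-sheet figure, 3% is the middle of the observed 2-4% range.
for afr in (0.0088, 0.03):
    p = p_disk_fails(afr, window_hours=6)
    print(f"AFR {afr:.2%}: P(at least 3 of 6 fail in 6 h) = {p_at_least(3, 6, p):.3e}")
```

The absolute numbers are tiny either way, but the tail probability scales roughly with the cube of the per-disk rate, so tripling the failure rate makes a triple failure far more than three times as likely.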
When a vendor tells you to expect a 0.2% failure rate, but it's really 2-4%, that's a HUGE shift in the impact to your organization.
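On the budget side, the shift is easy to see with a quick calculation. This is only a sketch: the fleet size of 5,000 disks is a made-up figure for illustration, and the rates are the 0.2% mentioned above and the 2-4% range from the study.

```python
def expected_replacements(disk_count: int, annual_rate: float) -> float:
    """Expected number of drives replaced per year at a given annual rate."""
    return disk_count * annual_rate


FLEET_SIZE = 5_000  # hypothetical installation size

for label, rate in [("vendor 0.2%", 0.002), ("observed 2%", 0.02), ("observed 4%", 0.04)]:
    print(f"{label:>12}: about {expected_replacements(FLEET_SIZE, rate):.0f} drives/year")
#  vendor 0.2%: about 10 drives/year
#  observed 2%: about 100 drives/year
#  observed 4%: about 200 drives/year
```

That is the difference between keeping a handful of spares on the shelf and budgeting for a replacement every couple of days.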
When you just have one or a handful of disks in your server at home, that's a very different situation from a datacenter full of systems with all kinds of disk needs.
Fibre Channel drives, like SCSI drives, are assumed to be "enterprise" drives and therefore better built than "consumer" SATA and PATA drives. It's nothing inherent to the interface, but a consequence of the environment in which that interface is expected to be used. At least, that's the idea.
Redundant Array of Irritating Discussions?
This is handled in the paper. See this graph: http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/img14b.PNG
Unfortunately there is no big "spike"; the average replacement rate just grows and grows with time.
Not that this is actually relevant or anything, but there's been a long-standing schism between the computing community and the scientific community concerning the meaning of the SI prefixes Kilo, Mega, and Giga. Until computers showed up, Kilo, Mega, and Giga referred exclusively to multipliers of exactly 1,000, 1,000,000, and 1,000,000,000, respectively. Then, when computers showed up and people had to start speaking of large storage sizes, the computing guys overloaded the prefixes to mean powers of two which were "close enough." Thus, when one speaks of computer storage, Kilo, Mega, and Giga refer to 2**10, 2**20, and 2**30 bytes, respectively. Kilo, Mega, and Giga, when used in this way, are properly slang, but they've gained traction in the mainstream, causing confusion among members of differing disciplines.
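As such, there has been a decree to give the powers of two their own prefix names (these come from the IEC and are not part of SI proper). The following have been established:
Kibi (Ki) = 2**10
Mebi (Mi) = 2**20
Gibi (Gi) = 2**30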
These new prefixes are gaining traction in some circles. If you have a recent release of Linux handy, type /sbin/ifconfig and look at the RX and TX byte counts. It uses the new prefixes.
Schwab
Editor, A1-AAA AmeriCaptions
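For what it's worth, the prefix distinction described above is easy to show in a few lines. This is a minimal sketch with a hypothetical helper, not how ifconfig or any other tool actually formats its counters.

```python
def format_bytes(n: int, binary: bool = True) -> str:
    """Format a byte count with binary (KiB, MiB, GiB) or decimal (kB, MB, GB) prefixes."""
    base = 1024 if binary else 1000
    units = ["B", "KiB", "MiB", "GiB"] if binary else ["B", "kB", "MB", "GB"]
    value = float(n)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.1f} {unit}"
        value /= base
    # Anything larger than the last unit is still reported in that unit.
    return f"{value:.1f} {units[-1]}"


print(format_bytes(2**30))                  # 1.0 GiB
print(format_bytes(2**30, binary=False))    # 1.1 GB
print(format_bytes(500 * 10**9))            # 465.7 GiB
print(format_bytes(500 * 10**9, binary=False))  # 500.0 GB
```

The last two lines are the classic case: a drive sold as 500 GB in decimal units shows up as roughly 465.7 GiB when counted in powers of two.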