Everything You Know About Disks Is Wrong
modapi writes "Google's wasn't the best storage paper at FAST '07. Another, more provocative paper looking at real-world results from 100,000 disk drives got the 'Best Paper' award. Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? The paper crushes a number of (what we now know to be) myths about disks such as vendor MTBF validity, 'consumer' vs. 'enterprise' drive reliability (spoiler: no difference), and RAID 5 assumptions. StorageMojo has a good summary of the paper's key points."
MT[TB]F has become a completely BS metric because it is so poorly understood. It only works if your failure rate is linear with respect to time. Even if you test for a stupendously huge period of time, it is still misleading because of the bathtub curve effect. You might get an MTBF of say, two years, when the reality is that the distribution has a big spike at one month, and the rest of the failures forming a wide bell curve centered at say, five years.
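To put numbers on that example (a back-of-the-envelope sketch, not anything from the paper): treat the distribution as just two lumps, one at one month and one at five years, and ask what fraction of drives must die early for the mean to come out at two years. The function name and the two-lump simplification are assumptions made purely for illustration.

```python
def early_fraction_for_mean(mean_years, early_years, late_years):
    """Fraction p of early failures such that
    p * early_years + (1 - p) * late_years == mean_years."""
    return (late_years - mean_years) / (late_years - early_years)


# Figures from the comment: spike at one month, bulk at five years, mean of two years.
p = early_fraction_for_mean(mean_years=2.0, early_years=1 / 12, late_years=5.0)
mean_check = p * (1 / 12) + (1 - p) * 5.0

print(f"early-failure fraction: {p:.1%}")  # about 61%
print(f"resulting mean lifetime: {mean_check:.2f} years")
```

Under that simplification roughly three drives in five would have to die in the first month, yet the single number "two years" describes almost none of them.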
Suppose a tire manufacturer drove their tires around the block, and then observed that not one of the four tires had gone bald. Could they then claim an enormous MTBF? Of course not, but that is no less absurd than the testing being reported by hard drive manufacturers.
Every single mechanism with moving parts will fail. It's just a matter of when. In a few years, when everybody is using solid state drives, people will look back and shake their heads, wondering why we were using spinning magnetic platters to hold all of our critical data for such a long time.
I don't respond to AC's.
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million.
In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some for an entire lifetime of five years. The data include drives with SCSI and FC, as well as SATA interfaces. The mean time to failure (MTTF) of those drives, as specified in their datasheets, ranges from 1,000,000 to 1,500,000 hours, suggesting a nominal annual failure rate of at most 0.88%.
We find that in the field, annual disk replacement rates typically exceed 1%, with 2-4% common and up to 13% observed on some systems. This suggests that field replacement is a fairly different process than one might predict based on datasheet MTTF.
We also find evidence, based on records of disk replacements in the field, that failure rate is not constant with age, and that, rather than a significant infant mortality effect, we see a significant early onset of wear-out degradation. That is, replacement rates in our data grew constantly with age, an effect often assumed not to set in until after a nominal lifetime of 5 years.
Interestingly, we observe little difference in replacement rates between SCSI, FC and SATA drives, potentially an indication that disk-independent factors, such as operating conditions, affect replacement rates more than component specific factors. On the other hand, we see only one instance of a customer rejecting an entire population of disks as a bad batch, in this case because of media error rates, and this instance involved SATA disks.
Time between replacement, a proxy for time between failure, is not well modeled by an exponential distribution and exhibits significant levels of correlation, including autocorrelation and long-range dependence.
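For anyone wondering where the abstract's 0.88% comes from, here is a minimal sketch of the arithmetic. It assumes a drive powered on all year (8,760 hours) and a constant failure rate, which is the very assumption the paper calls into question; the function names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours, assuming 24/7 operation


def nominal_afr(mttf_hours):
    """Nominal annual failure rate (as a fraction) implied by a datasheet MTTF."""
    return HOURS_PER_YEAR / mttf_hours


def implied_mttf(annual_rate):
    """MTTF in hours implied by an observed annual replacement rate (as a fraction)."""
    return HOURS_PER_YEAR / annual_rate


for mttf in (1_000_000, 1_500_000):
    print(f"datasheet MTTF {mttf:>9,} h -> nominal AFR {nominal_afr(mttf):.2%}")

for rate in (0.02, 0.04, 0.13):
    print(f"observed rate {rate:.0%} -> implied MTTF {implied_mttf(rate):,.0f} h")
```

Run the other way, the 2-4% replacement rates reported in the field correspond to an implied MTTF of a few hundred thousand hours rather than a million or more.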
Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
It means I should be storing my important, important data on a service like S3.
The theory of relativity doesn't work right in Arkansas.
http://www.cs.cmu.edu/~bianca/
;-P
I would love to give her my very large hard drive. For "performance evaluation and measurement", you understand.
You mean to tell me these people have found hard drives that don't fail beyond repair by the end of the first year? I've never encountered a HD that has done this, much to the despair of my wallet. Now, I am serious, what is wrong with the hard drives I choose that kills them so quickly? Is Western Digital no longer a good manufacturer? Should I maybe not run a virus check nightly and a disk defrag weekly? Is 6.5GB of virtual memory too much to ask? Of course not, the manufacturers are just making crappier HDs. This article has told me one thing: it's time to get a RAID setup. I've been looking at RAID 5, but two things still trouble me, the price and the performance hit. Does anyone have any information on just how much of a performance hit I might experience if I have to access the HD a lot?
Demented But Determined.
Didn't I read about this on Slashdot a few days ago or did some drives fail and the story was lost?? Must have been a drive failure cause it's unlike Slashdot to have dups :)
And here I knew disks stored data on a set of rotating platters. I guess it's really stored on the alien spacecraft hiding behind Hale-Bopp!
VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!
I suspect that the 'infant mortality' syndrome really has to do with the drives being abused before they are installed in the machines (getting dropped during shipping, for example).
The large shops that these studies are looking at get their drives in bulk directly from the manufacturer. The rest of us, who have to go through several middlemen before we get our drives, have more of a chance that something happened to them before we received them.
David Lang
You might get an MTBF of say, two years, when the reality is that the distribution has a big spike at one month, and the rest of the failures forming a wide bell curve centered at say, five years.
Well, the article actually says that drives don't have a spike of failures at the beginning. It also says failure rates increase with time. So you're right that MTBF shouldn't be taken for a single drive, since the failure rate at 5 years is going to be much higher than at one.
The other thing that the article claims is that the stated MTBF is simply just wrong. It mentioned a stated MTBF of 1,000,000 hours, and an observed MTBF of 300,000 hours. That's pretty bad. It's also quite interesting that the "enterprise" level drives aren't any better than the consumer level drives.
AccountKiller
Or maybe powering up the drives off and on is more stressful to the components; say in a desktop environment. With servers racked up, the drives are always spinning with near constant thermal conditions.
Life is not for the lazy.
Machines built at the molecular level can't wear out. There's either enough energy to break the bond at the molecular level or there's not. Just run it within spec. and it'll never break.
From StorageMojo's article: Further, these results validate the Google File System's central redundancy concept: forget RAID, just replicate the data three times. If I'm an IT architect, the idea that I can spend less money and get higher reliability from simple cluster storage file replication should be very attractive.
:w
For best-of-breed open source IMAP, that means Cyrus IMAP replication.
as head of an independent testing lab. That would probably be a heckuva lot more interesting, and lucrative, than some random gig with Google, IBM, or MS Research.
If something has an MTBF of 1 million hours (that's 114 years or so), then you'll be a long time dead before it fails.
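The 114 years is what you get by reading the MTBF as the lifetime of a single drive. Datasheet MTBF is more usually read as a statistic over a whole population, so here is a rough sketch of both readings, assuming a constant failure rate and drives running 24/7. The function names are invented for the example; the 100,000-drive fleet size is borrowed from the paper's data set.

```python
HOURS_PER_YEAR = 24 * 365  # assuming drives are powered on around the clock


def mtbf_in_years(mtbf_hours):
    """The naive per-drive reading: MTBF expressed in years."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(n_drives, mtbf_hours):
    """The population reading: expected failures per year across a fleet."""
    return n_drives * HOURS_PER_YEAR / mtbf_hours


print(f"{mtbf_in_years(1_000_000):.0f} years per drive")
print(f"{expected_failures_per_year(100_000, 1_000_000):.0f} failures per year in a 100,000-drive fleet")
```

So even if the datasheet figure were accurate, a site with 100,000 drives would expect to swap out hundreds of them every year.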
At this stage, the only reasonable non-volatile solid state alternative is NAND flash which costs approx 2 cents per MByte ($20/Gbyte) and dropping. NAND flash has far slower transfer speeds than HDD, but is far smaller, uses less power and is mechanically robust. NAND flash typically has a lifetime of 100k erasure cycles and needs special file systems to get robustness and long life.
Engineering is the art of compromise.
Everything else in there, I think most of us already knew... Except the "infant mortality" one really surprised me.
I have to wonder, though, did she include DOAs in that, or did she only include drives that worked at least for a few minutes/hours/days? I have to strongly suspect the latter - I can't argue with the statistics from 100k drives, but my personal experience with a few dozen drives has shown that they have a strong bias toward either never working, or working for at least a year.
Love the RAID5 stat, though... Perhaps this study will finally convince people to only use RAID for performance or huge-JBOD reasons, never for (the illusion of) reliability.
What's interesting about both of these papers is that previously-believed myths are shown to be, in fact, myths.
The Google paper shows that relatively high temperatures and high usage rates don't affect disk life.
The current paper shows that interface (SCSI, FC vs ATA) had no effect either. The Google paper shows
a significant infant mortality that the CMU paper didn't, and the Google paper shows some years of flat
reliability where the current paper shows decreasing reliability from year one.
They both show that the failure rate is far higher than the manufacturers specify, which shouldn't come
as a surprise to anybody with a few hundred disks.
I'm particularly pleased to see a stake driven through the heart of "SCSI disks are more reliable."
Manufacturers have been pushing that principle for years, saying that "oh, we bin-out the SCSI disks
after testing" or some other horseshit, but it's not true and it's never been true. The disks are
sometimes faster, but they're not "better".
Thad
I love Mondays. On a Monday, anything is possible.
Of course if we count relatively minor failures (like forgetting to take out the trash or pick up dirty underwear), then MTBF is approx 27 minutes!
Engineering is the art of compromise.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
> Um, but doesn't the summary of the paper say that there is no infant mortality effect, and that
> failure rates increase with time, and thus the bathtub curve doesn't actually apply?
That may be the new 'theory' but we all know about theory vs reality. Here in reality if you put a couple of dozen new drives into service you have one or two spare hard drives to replace the ones that WILL fail in the first week. Especially with consumer grade drives typical in workstation deployment. If you only have one dud out of twenty it was a good rollout.
And as for some of the other assertions in this paper (well the summary, haven't read this one yet, still wanting to reread the Google paper again, need more hours in a day.... bah!).......
> Costly FC and SCSI drives are more reliable than cheap SATA drives.
Sorta. Again, real world vs theory. Try banging the hell out of an off the shelf consumer drive 24/7/365 and see how long it holds up. Yea, thought so. Hope you didn't have anything important on that paperweight.
> RAID 5 is safe because the odds of two drives failing in the same RAID set are so low.
This one should bother ya if you are overly relying on the 'infallibility' of RAID5. Remember kids, drives fail from two major groups of causes, internal and external. If a power event kills one drive in the array the odds are pretty low of only one being dead, you just might not KNOW about #2 yet. And filesystem corruption will be faithfully mirrored onto the array. Obey the 1st Commandment: "Thou Shalt Make Backups."
Democrat delenda est
No, it's 131072. Does no one care about base-2 anymore?? /To the sarcasm disabled.. It's a joke..
Software RAID FTW!!
In all seriousness, in truly critical storage you save your stuff under a RAID1. RAID5 is simply too unreliable for the task (not to mention that those controllers aren't exactly cheap).
So save yourself trouble, money, and grief, and just use logical volume management to replicate drives.
It didn't conclude RAID 5 doesn't help, it concludes RAID 5 doesn't help as much as people think, because people think the probability of another failure before the rebuild is complete is negligible and they're wrong.
It helps, and distributing the data more helps more. Someone concerned about multi-drive failures can, for example, use a 3-way RAID 1 array, or a RAID 6 array (which can tolerate the loss of any 2 drives).
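Here is a rough sketch of the calculation people usually have in mind when they call a second failure during rebuild "negligible". It assumes independent drives with exponentially distributed lifetimes, which is exactly what the paper's abstract says does not hold (replacements are correlated and not exponential), so treat the result as an optimistic floor. The array size and rebuild time are made-up inputs; the 300,000-hour figure is the observed MTBF quoted elsewhere in this thread.

```python
import math


def p_second_failure(n_drives, mttf_hours, rebuild_hours):
    """Probability that at least one of the n-1 surviving drives fails
    during the rebuild window, assuming independent exponential lifetimes."""
    combined_rate = (n_drives - 1) / mttf_hours  # failures per hour across survivors
    return 1.0 - math.exp(-combined_rate * rebuild_hours)


# Hypothetical 8-drive RAID 5 set with a 24-hour rebuild.
datasheet = p_second_failure(n_drives=8, mttf_hours=1_000_000, rebuild_hours=24)
observed = p_second_failure(n_drives=8, mttf_hours=300_000, rebuild_hours=24)

print(f"datasheet MTTF: {datasheet:.4%} chance of a second failure per rebuild")
print(f"observed MTTF:  {observed:.4%} chance of a second failure per rebuild")
```

Swapping in the observed failure rate already makes the risk more than three times the datasheet estimate, and that is before accounting for the correlation between failures that the paper reports.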
I rarely criticize things I don't care about.
I wonder if anyone looked at what actually failed in the drives? An arm, a platter, an actuator, a board, an MPU?
Would an analysis tell us that SSDs are not only faster but more reliable and if so by how much?
The fact that another drive in an array is more likely to fail if one has already failed makes a lot of sense, but the conclusion to forget RAIDs doesn't. Arrays are normally composed of the same drive model, even the same manufacturing batch, and are in the same operating environment. If something is "wrong" with any of these three variables, and it causes a drive to fail, it's common sense the other drives have a good chance at following. I've seen real-world examples of this.
In my real-world situations, the RAID still did its job, the drive was replaced, and nothing was lost, despite subsequent failure of other drives in the array. Sure you can get similar reliability at a lower price by replicating data, but I think that's always been understood as the case. Furthermore, as someone else in the forum mentioned, enterprise-class RAIDs are often used primarily for performance reasons. A modern hardware RAID controller (with a dedicated processor and RAM) can create storage performance unattainable outside of a RAID.
is neither working nor broken... Unless you look at it of course ;)
What's interesting to me is that neither of these papers mentions the issue of pre-installation handling. The good folks over at Storage Review seem to be of the opinion that the shocks and bumps that happen to a drive between the factory and the final installation are the most significant factor in drive reliability (much more than brand, for example).
The Google paper talks a bit about certain drive "vintages" being problematic, but I wonder if they buy drives in large lots, and perhaps some lots might have been handled roughly during shipping. If they could trace back each hard drive to the original order, perhaps they could look to see if there's a correlation between failure and shipping lot.
-R
I doubt MTBF fits into anyone's thoughts when buying a drive, unless they are buying bulk or such for a business and have to justify the choice. I am only talking about home use here.
Personally I have only ever had one drive go on me (a Quantum Sirocco) in 10 years. For myself, and most home users, that's a great track record. On the other hand, I have had friends and relatives whose drives just up and quit. New ones, old ones, many brands. As long as you buy a major brand, they seem to be more or less equal in practice.
That said, with drives going at 10K rpm, the heat, etc, there are going to be lemons. I suspect that will always be the case as long as we use mechanical drives. I am not surprised warranty periods dropped about the time drives began to exceed 7200 rpm. Always remember to back up data that's important and keep those receipts.
I'm particularly pleased to see a stake driven through the heart of "SCSI disks are more reliable."
I have been saying that for at least 10 years. Back then I worked at a large government contractor and we set up what was then a very large 2 TB array of SCSI drives (about 100 drives). Those damn things were "industrial grade" certified by a large well known server vendor yet we were losing 2 or 3 drives per day for several months. Totally ridiculous, because I extrapolated the failure rates of IDE drives from another government setup and found it was actually much better than the SCSI drives, and they weren't even rated for heavy duty usage.
Of course prior to this article the group-think Slashweenies would moderate me into oblivion (probably will anyway, but meh).
Hard drives die often because the manufacturers build them cheaply, the same as every other component in a PC. Why would they ever make a bulletproof hard drive? They'd go out of business!
Sure, some of them end up being replaced under warranty, but a lot of them don't, and so Maxtor/IBM/Hitachi make another buck off your sorry ass. There isn't a sane server admin that doesn't keep a set of spares in his desk drawer, because it's not a question of "if" it dies but WHEN. Hell, most decently-geared techies have a whole box of hard drives, pre-mounted in hotswap bays ready to rock. And if it weren't for the fact that I was just laid off a month ago, I'd be buying a couple spare SATA drives myself, I just have a funny feeling something's going to go tits up in my media server. I haven't had any warnings or hiccups, but I just know the Seagate devil's planning his move, waiting for 2 drives to start straying so he can kill my Raid-5 nice and fast. Hard drives are little more than Murphy's Law in a box.
-Billco, Fnarg.com
All the hard drives I installed in my family's computers have failed in the last 5 years - including mine. :-(
Waaaah! They cry, when I tell them there is no hope for the family photos, barring a media reclamation service == $$$
I tell everyone: "Assume your hard drive will fail at any moment, starting now! What is on your hard drive that you would be upset if you never saw it again?"
"No matter where you go, there you are." -- Buckaroo Banzai
This paper was co-authored by Garth Gibson!
Almost makes some of these posts look like these in retrospect.
https://www.eff.org/https-everywhere
Yeah I wouldn't worry about quantum effects on machines built at the quantum level either.
As mechanical devices, hard drives are appallingly reliable.
The electronics on the hard drive rank as major players in heat generation in the boxen.
Heat kills transistorized components.
"Hard Drive Data Recovery" companies often have nothing more sophisticated than a hard drive buying program, and very competent techs soldering and unsoldering drive electronics. They buy a few each of most available hard drives, as the drives appear on the market. When a customer sends them a hard drive for "recovery", the techs find a matching drive in inventory, disconnect the electronics, and replace the electronics in the drive. The percentage of drive failures due to mechanical failure is very low.
When I bought a desktop computer for an unsophisticated family member, I also purchased and installed a drive cooler - a special fan that blows directly on the drive electronics.
I was very concerned about MTBF. I just assumed that the manufacturer's information was totally irrelevant to my situation - a hard drive in a corner of the tower, covered with dust, and no air circulation.
I occasionally pick up used equipment from family and friends. Usually, it is broken. Often, it is the hard drive. What is amazing is not that they failed, but that they lasted so long with a 1.5 inch coating of insulating dust.
I suspect this would also explain the rising failure rate with time. Nobody seems to clean the darned things. They just sit and run 24/7/365, until they fail.
All is paradox. Retired lawyer, so this is just one more layman's opinion.
I suppose dupes are good!
Several boxes in my office closet contain a pretty good history of desktop PC hard drive technology from about 1988-2005. Much like archaeological sediments, on the bottom you will find the oldest, 10MB and 20MB drives, capacities increasing as you move up through the layers, and at the top the most recent addition, a 30GB drive from a Dell retired last Xmas. All of these HDs were retired in good working order, and as far as I know they all still work. Every one of them succumbed to the REAL nemesis of hard drives, that is they were swallowed up by new drives with 10x their capacity.
Sure, I've seen a couple drives fail, but they've been few and very far between. I've seen a lot more drives run long beyond their usefulness whilst packed solid in dust-bunnies, running scorchingly hot, on questionable power, some even sticky with spilled Mountain Dew.
Just be sure to get good backups, and enjoy the cheap storage.
You can have my SIG when you pry it from my cold, dead hands.
> I tell everyone: "Assume your hard drive will fail at any moment, starting now! What is on your
> hard drive that you would be upset if you never saw it again?"
True enough, I use a similar warning. Mine is, "Don't leave anything on your hard drive you care about. If you manage to make it a year without reloading Windows the drive can crap out with no warning. Burn anything you can't download again to a CD/DVD."
Personally I don't have to worry about Windows and I have a RAID5 at home.... but I still burn anything I care about. Important stuff like photos get backed up to a DVD-RAM until I fill it, then I burn two DVD-R copies on different brands of quality media.
The problem is hard drives have become freaking huge. Where can you back up a modern large drive? We are back where we were when backing up a 60MB drive meant a crate of floppies, only now we need a spindle of DVD-Rs and we actually need more time. We need those holographic DVDs!
I have taken to recommending RAID1. It is cheap and almost any non-laptop can do it these days. With drives as unreliable as they have become it makes sense for anything other than a gaming rig.
Democrat delenda est
I've had more RAM chips die than hard drives.
Really! My experience is just the opposite. I've had three drives fail within the past year or so, but I've never had a RAM chip fail. I would guess that hard drives fail more often than RAM chips, and that your experience is the exception to the rule. (Perhaps some better grounding would help.)
I've only seen flash fail once, and that was a failure of my USB key to turn up after disappearing into the crack between the sofa cushions. Other than that, my flash experience is the same as yours: flawless in low volume usage.
When our name is on the back of your car, we're behind you all the way!
Nothing is wrong? Phew!
Until it encounters an energetic cosmic ray or an alpha particle.
Mea navis aericumbens anguillis abundat
Well, the article actually says that drives don't have a spike of failures at the beginning.
Hmm, the Google paper says they do, from 3-6 months (Figure 2).
Which leaves us with confirmation that 50% of all studies are wrong.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
"oh, we bin-out the SCSI disks
after testing"
As I understand it, the kernel of truth in those claims is that the testing/sampling rate on the SCSI assembly line is higher. I don't know how much higher or if it's statistically significant, but I've heard from quality engineers who work in some of these plants that they do (or did a couple years back). I expect they still do even if it's only to have something marginally defensible to back up their salesmen's pitches.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
I keep hearing this persistent rumor that it's disk spin-up which is the most significant contribution to disk failure. The moral of the story is that systems which are left on 24/7 are less likely to see HD failures than systems turned on/off every day.
Now if that's really true, wouldn't it be quite simple for the manufacturers to simply spin up the disk more slowly by putting in very simple and reliable motor control circuitry?
Does anyone have any real evidence, i.e. not anecdotal, that this is really true?
Absolute statements are never true
Why haven't we moved to a system where you NEVER delete?
Security would become a concern (But not too much of one, you should already shred your drive, and if you could overwrite all of one type of bits with the other one)...
This would require a new type of file system (one with a pretty strange [or flash-based] file table). But you could have data that would last thousands or millions of years, and considering how many dots I can fit on a piece of paper, and how much I suck at making dots, a pretty damn large storage size.
Is comparing MTBF correct when saying different drive types are of the same reliability? By looking at an existing system you really aren't looking at independent variables. Let's say I have two servers, one that hits the drives a lot and another that barely touches the drives, so using what I'll call "the old rules of thumb" I would put the FC drives on the intensive server and SATA drives on the less intensive server. So after 1 year I get my first drive failure on each. One could conclude that SATA is as reliable as FC, but is it really? I set up my environment to more heavily hit the FC drives; when would the SATA drive have failed if I had placed SATAs where the FCs were? The only way to really compare drives would be to hit a large number of each different drive type with the same workload. If you look at the way most places do tiered storage, they'll put highly accessed data on an FC tier and then migrate less used data to a SATA tier; this might be one reason why the failure rates of the two drives look the same.
On the other hand, you could get a cheap drive controller, and do software RAID, using OSS tools; the setup might be more complex than hardware RAID, but there shouldn't be any issues with recovering your data later due to the format it's written in.
I agree though, that for most people, some sort of "userland RAID" where the disks are just mounted as regular volumes to the filesystem, and then you just write the data twice, is probably the best bet. There are no format problems, and you'll always be able to pull a drive out, stick it in another machine, and get at your data.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
You're absolutely right. I do the IT work for an office of about 30 people, which recently has involved building a couple of new servers and setting up a solid backup plan. In turn this has involved buying around a dozen new disks over the course of about 6 months.
Of these drives, two (one each from two different retailers) were damaged upon unpacking them. One had a section of the plastic surrounding the jumper block punched in at an angle, and the other had a small but obvious bit of rasping on the metal of one side. (Once the damage was found, it was obvious that there was corresponding damage to the packaging, but in the process of paying etc at the store I hadn't looked close enough. Lesson: with fragile goods such as disks, insist on unpacking them in the store before paying.)
I returned both of course; the first one the retailer wouldn't accept back until I kicked up a stink and demanded to speak to the manager; the second took it back with no arguments (and even an apology).
I guess the thing is though, for every drive that has visible damage, how many have been mishandled and will show a correspondingly shorter lifetime without having any obvious damage? I can imagine many ways these drives could be dropped or shocked without leaving marks.
Every single mechanism with moving parts will fail. It's just a matter of when. In a few years, when everybody is using solid state drives, people will look back and shake their heads, wondering why we were using spinning magnetic platters to hold all of our critical data for such a long time.
You mean the "solid state" of the CF card I am recovering just this moment because the partition table magically became hosed the moment I removed the card from the camera? Or the "solid state" of the CF card that died so hard Lexar themselves could get nothing from it?
Just because something does not have moving parts does not mean it'll last forever, and if you keep a solid state device in operation long enough you'll figure that out.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
This paper proves something I've always suspected - that the best backup solution for a small organization (or a home) is RAID 1 (mirror) along with swapping out one of the drives regularly. You then have three copies (as Google mandates) and also a little bit of a buffer for recovery in case you delete or modify something you should not have.
All those people who have RAID5 at home are just asking for trouble, especially if there is ever a fire...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Google didn't see overtemp failures only because Google kept their drives cool. Possibly too cold. Their graph
cuts off at only 10 degrees hotter than a typical PC. But if you extrapolate the data on the right hand side of
the graph, you see that drives fail at higher temperatures just as expected. Also, they appear to have looked
at average temperatures over the life of the drive, not the temperatures near the time of failure. And they
totally ignored temperature fluctuations.
In fact, the conclusion one should reasonably draw from their data (if it can be trusted, and I called that into
question Saturday) is that drives are designed to operate at 40 degrees C (which happens to be the operating
temperature of the hard drive on this machine right now in a typical mid tower case) and that any deviation higher
or lower will result in increased failure rates.
But it is also possible that the cooling systems, and not the temperatures themselves, are responsible for the
drive failures seen in Google's systems. They had some hard drives (the ones particularly responsible
for the low temp failures) that were operating at around room temperature. With light fan cooling, a drive
operates at around 20 Celsius degrees above ambient. So how do you get an operating temperature around
room temperature? You cool the server room to freezing, you put A/C evaporator coils inside the server boxes or racks,
you water cool them, or you sandblast the drives with hurricane force winds (slight exaggeration). All of those
approaches raise the possibility of creating environmental hazards other than temperature.
But it is quite possible that it is just the temperature and that drive manufacturers have done the sensible thing and optimized their designs for the typical operating temperature of a drive. I also point out that there are
a number of failure modes associated with over-temp, under-temp, and temperature variation.
In a typical PC, the most likely cause of an overtemp failure is a fan failure.
http://hardware.slashdot.org/comments.pl?sid=222978&cid=18063644
Using Google, ironically, I found at least one example dating back to 2003 of people discussing the effects of
too low an operating temperature (i.e. room temperature) through excessive cooling adversely affecting hard drives (not even getting into industrial or outdoor temperature ranges). And I wasn't even looking for that: http://www.silentpcreview.com/forums/viewtopic.php?t=7677
Conclusion: For best results use a closed loop temperature control system with redundant variable speed fans to keep
the drive itself (not the ambient air) at a constant temperature of 40 degrees C. Or operate your machine with
moderate cooling in an environment comfortable for humans and use software to power down the drive and raise alarms if
it gets much above 50 degrees C. Whether you should shut down if the drive gets below 25 degrees C (after time to
come up to operating temp) is debatable. If you have had a major heating system failure or a broken window in winter, the drive's own heat might be giving it some protection, but the drive is also more vulnerable when operating than when shut down.
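The second option in that conclusion boils down to a simple threshold policy. Below is a minimal sketch of just the decision logic, using the 50 and 25 degree figures from the comment. It does not read a real drive: getting the temperature (for example from SMART data) and actually spinning the drive down are left out, and the function name and return labels are hypothetical.

```python
def drive_temp_action(temp_c, high_c=50.0, low_c=25.0):
    """Decide what to do for a given drive temperature in degrees C.

    high_c and low_c default to the thresholds suggested in the comment;
    whether to act on the low threshold at all is, as noted, debatable.
    """
    if temp_c > high_c:
        return "power_down_and_alarm"
    if temp_c < low_c:
        return "cold_warning"
    return "ok"


# Sample readings, made up for illustration.
for reading in (20.0, 40.0, 55.0):
    print(f"{reading:.0f} C -> {drive_temp_action(reading)}")
```

A real monitor would also want to ignore low readings during warm-up, since the comment only applies the cold threshold after the drive has had time to come up to operating temperature.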
forget RAID, just replicate the data three times.
Sounds incompatible with most DRM that ties a key to hardware.
The truth shall set you free!
They'd go out of business!
Are you kidding? With Moore's law, repeat consumers would build extreme brand loyalty. Let's face it. Even though it works great, there is very little market for my 20 Meg CDC 5-1/4 inch drive on an ST-503 interface.
It's yours for free if you want to pick it up.
Having worked for a disk drive manufacturer, this doesn't surprise me one bit.
The specified MTBF is theoretical - after all you print the data sheets before you start selling drives, and by the time you have some experience making the drive you can't back off. Your customers would crucify you. Also, the competitive pressure is too high to be realistic.
The theoretical failure rate is basically the sum of the component failure rates, under the assumption that there are no surprises. That means you'll see the actual MTBF approaching the theoretical MTBF by the time the product has matured, aka is getting obsolete. If it ever gets that far given the short product cycles.
A realistic failure rate that's about four times the theoretical value sounds about right.
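For a sense of scale, the arithmetic behind those statements can be sketched in Python. It assumes constant failure rates, which is exactly the assumption the paper calls into question, so treat it as datasheet math rather than a prediction. The function names and the component MTTF figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def nominal_afr(mttf_hours):
    """Nominal annual failure rate implied by a datasheet MTTF."""
    return HOURS_PER_YEAR / mttf_hours


def series_mttf(component_mttfs):
    """MTTF of a device that fails as soon as any one component fails.

    With constant failure rates, the component rates (1 / MTTF) add up.
    """
    total_rate = sum(1.0 / m for m in component_mttfs)
    return 1.0 / total_rate


datasheet = nominal_afr(1_000_000)
print(f"datasheet AFR:   {datasheet:.2%}")
print(f"four times that: {4 * datasheet:.2%}")

# Three made-up components, each rated at 3,000,000 hours.
print(f"device MTTF: {series_mttf([3e6, 3e6, 3e6]):,.0f} hours")
```

With a 1,000,000 hour MTTF this works out to roughly 0.88% per year, so a real-world rate four times higher would be about 3.5% per year.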
Posting as AC for a reason...
Which is why all computers in the future will be placed in VERY large isolated lead boxes 100 meters below the earth's surface. As for quantum effects, we won't be able to open these boxes to see if they are functioning, thus they are in a state of superposition and therefore cannot actually fail.
A patriot must always be ready to defend his country against his government. -edward abbey
The two don't really contradict each other that much. Google's spike is relatively small and it's really a spike in the first 1-3 months. By the 6th month it's basically settled. In this paper half the time they graph in whole year increments, so that kind of a spike would be averaged into the first year. So, no, they don't contradict each other as such. And in at least one of the graphs by month in this paper (HPC1), there is something that looks like a spike in the first month.
More importantly, they don't contradict each other in respect to the rest of the curve. With or without that spike, the curve just doesn't look like the bathtub fairy tale that drive makers try to bullshit us with. You're led into a false sense of security that, basically, if a drive didn't fail within the first couple of months, then it'll be at a (nearly) constant and very small probability to fail for the whole next 5 years, and only then it starts rising again. Basically that if you upgrade your drives every 4 years, whatever didn't fail within 2-3 months, heck, it's very unlikely to fail. And the curve just doesn't look that way. The probability to fail rises continuously, and (again whether that spike actually exists or not) after as little as 1 year you're above the starting height of the "bathtub" already.
In retrospect, I don't even know when and why the "bathtub" myth even started. The bathtub distribution was originally for stuff like electronic components, without moving parts. For something with mechanical wear and tear like a hard drive, who the heck came up with the idea that the same curve must apply? Shouldn't it have been common sense all along that it linearly gets more wear and tear?
Both papers also tell us that the manufacturers' MTBF numbers are, basically, pure bullshit. They're some impressive number put there for the benefit of the marketing department, not because someone at Seagate/Maxtor/whatever actually believes that number.
In retrospect, again, we should have had an alarm signal when the manufacturers lowered their warranty from 3 years to 1 year. If indeed there was (1) the MTBF they claim, and more importantly (2) the bathtub curve they claim, the reduction wouldn't have even made too much of a difference. I mean, most drives would have failed within a couple of months, followed by barely a trickle of defective drives for the next 5 years straight. Why bother doing the bad-for-marketing thing of lowering the warranty in that scenario? Or did they already know that they lie?
And finally, a very important point is that (again, bullshit marketing claims be damned) there is no difference in reliability between cheap SATA and expensive SCSI and FC. There is this assumption permeating the whole society that if something is expensive, it _must_ automatically be better and more durable than the cheap stuff. That if you buy a big plasma TV, it's automatically better and lasts longer than an el-cheapo CRT. (Yeah, right. Plasma is actually known for its decay over time.) A whole edifice of consumerism, conspicuous consumption, and SFV (Stupid Fashion Victim) syndrome is based on that bullshit excuse to spend more than you need to spend. "Yeah, but it'll be better and last longer!" Yeah, right.
I've actually met people who wouldn't even _consider_ putting an ATA drive in any kind of server. "What, you're going to put your enterprise data on ATA drives???" (Said with a perplexed look, as if I had proposed flushing it to /dev/null or something.) Well, now we know they're not actually any worse. If you don't actually need the extra bandwidth or lower latency of a 15,000 RPM drive, then you can just as well drop a SATA drive in that machine. Even for 10,000 RPM, 4.5ms, there are the WD Raptor drives with SATA interface, and they're cheaper than a SCSI or FC drive. For a lot of stuff you don't even need those, a 7200 RPM will do perfectly fine.
A polar bear is a cartesian bear after a coordinate transform.
What do you consider "reliable"?
There are USB sticks around that survive being driven over by a semi.
Flash regularly survives a round in the washing machine/dryer.
The "ruggedness" of SSDs is much greater than that of mechanical drives, although the "soft errors" have yet to be really explored (i.e. aging of flash cells with the lower structure sizes, cell vs. controller failure rates, etc.).
OTOH, with SSDs, it should be very cheap to create on-disk redundancy (maybe 2 redundant controllers with fail-over, chipkill for the flash banks, etc.).
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
There. Fixed that for you. Enterprises don't put up with this crap if an alternative exists, and they often have the purchasing power to ensure that alternative exists. Consumers shouldn't have to either.
It's good for VMware that they still have a significant edge in management and failover tools as compared to Linux-based solutions using Xen and KVM/QEMU. When the latter two start getting sufficiently sophisticated, CIOs worth their salt will look at moving to Xen or KVM not because of licencing costs, but because of the hassle (and additional failure modes) of dealing with the new Licence Manager in ESX 3.0/VI 2.0
Laissez lire, et laissez danser; ces deux amusements ne feront jamais de mal au monde. - Voltaire
They should use 3 phase commit :-)
http://en.wikipedia.org/wiki/Three-phase_commit
Of course even 3 phase commit can fail lol. Though I'm a bit rusty from my Operating Systems class...
- overwrite the logically overwritten sector
- write to a different sector, but move that different sector to the logically overwritten sector
So as you see, it's not possible. The only way they can save your disk is if you don't use the whole thing. Normally they don't let you access the whole space, but that's just like spinning drives, which have reserved sectors for failures. And the flash disks that allow direct access, without IDE controllers, don't do any load balancing. But normally one will use a load-balancing filesystem designed for flash, like JFFS.
Unfortunately, this paper is severely flawed. Similar to the Google paper, it is written by academics with little understanding of the subject matter, but a strong desire to publish lengthy papers.
To write a meaningful paper, there is a lot of data about the drives and the systems they are used in that needs to be collected. These are initial conditions and operating conditions that any real system scientist will tell you cannot be ignored (to say the least). One cannot look at drives in the abstract, but must look at many details of how they are used, including the storage systems they are part of.
Google, to their credit, did collect the SMART measurements. That is a good start, but not sufficient data to support the conclusions of the Google paper.
For example, the orientation of each drive needs to be taken into account. What percentage of the drives analyzed were mounted horizontally vs. vertically? How were the drives themselves mounted? Specific mounting techniques result in a greater incidence of particular failure patterns. How were the drives cooled? Particular cooling techniques similarly result in specific failure patterns. What sort of data usage patterns were in use? What levels of RAID were used across the various drives?
I see no measurements of vibration in this paper. Drive orientation and drive vibration (including system-based vibration) are two factors that are very important in determining drive reliability. Drives have a certain resistance to vibration (and shock) that varies based on the directionality of the vibration.
We also see no meaningful treatment of the conditions for the HPC1, COM1, and COM2 systems. In HPC1 and COM1 we see massive failure levels for memory, likely indicating severe heat problems in those systems. In the COM2 system, we see a very high incidence of motherboard failure, again most likely indicating heat problems (or possibly bad caps). Specific heat conditions are operating conditions for drives that must be taken into account. Maybe the early onset of wear-out degradation is at least in part due to heat?
I have merely touched on several important elements of study that were neglected in both papers. To gain a real understanding of drive failure in the "real world", real and comprehensive data is needed first. Otherwise we are dealing with merely variations on the "GIGO / Garbage In Garbage Out" theme.
Also, I see a number of irrational conclusions being put forth by readers -- no value in RAID just replicate your data 3 times? This sounds a bit like how to get home from Oz. It works in the movies. But it doesn't work as well in real life.
RAID1 is a very solid solution for many businesses (and their correspondent data usage models), especially if there is a hot spare on the system as well. Many studies have shown the business value of the simple, transparent, low cost redundancy that RAID1 delivers. Even simple probability theory will tell you that RAID1 has clear potential for reliability improvements (that are well measured and proven in the real world).
I see a lot of analysis of RAID5 which people in the real world know is not a good choice for data that matters. There is no sane recovery procedure for RAID5. The drive access patterns tend to result in a lot of vibration as well.
Overall, I am disappointed that with all the investment that large organizations make in purchasing and deploying storage, they seem to have no one in their organization that (1) understands the mechanics and physics of even a single disk drive, (2) understands the concept of initial conditions, (3) understands the concept of operating environment/conditions (4) has the willingness to make actual measurements vs. barf up a bunch of hearsay, and (5) truly wants to understand the reliability of storage systems vs. take pot shots at the drive industry.
Each of these papers, CMU's and Google's, is incomplete. There is not enough data to support the conclusions. There is not even enough data to support almost any conclusion beyond the basic observation, "drives fail, some days more than others."
It turns out they are actually triangular
Home fucking is killing prostitution.
This paper from Seagate claims that SCSI drives are individually tested, whilst (S)ATA discs are only batch tested. As a result, I started running badblocks in write-test mode on my new ATA discs before putting them into service so as to attempt to reclaim that relative advantage. I also suspect that SCSI drives have a larger pool of reserved blocks for remapping failed blocks, which would go some way to explaining their funny sizes.
Google's idea of "high temperature" however is somewhere around 40 degrees C, and you'll be lucky to get such a low temperature in most desktop PC's, especially models that only bother to cool the CPU with a fan that only kicks in when CPU temperature becomes too high.
In most desktop systems drives can easily push beyond 45 C, and then you WILL see a drastic reduction of hard disk life (most drives are rated for 55 degrees C max). Often enough I see setups where people have packed 2 or 3 hard disks on top of each other in 3.5" bays, often sandwiched above or below a floppy drive. Air flow is minimal. Drives in such a configuration easily go beyond 60 degrees C.
Most of these are useless to me as my point of view is consumer and you have to name names.
Hrnghk!
The Google paper shows that relatively high temperatures do significantly affect disk life, and pins the safety point at about 45 degrees. Which is about where the manufacturers said it should be in the operating specs for the disks, if you bothered to read them.
They absolutely do not show that high temperatures don't affect disk life. Quite the opposite. Their graphs clearly show increased failure rates as the temperature rises above 45 degrees.
What they do show is that abnormally low (below 40 degree) temperatures don't improve it. That disproves the sanity of attaching watercooling rigs to your hard drive, but apart from that it's not very significant (did anybody seriously think that was a good idea?).
Normal operating temperature for a correctly installed disk in a 1-drive PC is typically 35-40 degrees. 10k RPM disks are hotter. Densely packed stacks of disks in servers can reach 50-60 degrees if not cooled *very* carefully, and that's *bad*, as the Google study shows. No myth was disproved - rather, one of the few figures which the manufacturer can and does test properly was demonstrated to be correct (it's really easy for them to find the safe operating temperature, so this is no surprise).
"When I hear of Schrödinger's cat, I reach for my gun" - Stephen Hawking
No I don't mean rugged I mean reliable. USB drives so far have limited performance and a limit on the rewrite cycles. Real SSD's are basically, complicated RAM assemblies with their own power backup. I wonder how, if you remove all of the mechanical components, the reliability of the drive stacks up, over time? Do SSD's have the same kind of error vs age characteristics? Compared to RAID-5 what's the real difference in availability vs recoverability for example? See we really don't care about RAID-5 except that it purports to offer us better availability and recoverability. But if that's not really the case vs BODs or mirroring then it's possible or it at least bears some investigation whether chucking mechanical drives altogether is a better approach. We're spending gobs of money anyway. So why not spend it differently?
but what happens when we run out of cats to power them?
While having two or more drives in a system would reduce the MTBF on the system as a whole, a raid array still makes sense as cheap insurance. You still should back things up because data loss doesn't only happen due to disk failure, there is the human error element too! As soon as one drive in a raid1 array fails, you should replace BOTH drives at once, first rebuilding the array after replacing the bad drive, and again after replacing the other original drive. Of course at the time you have a failure the exact same make/model hard disks you had in the array are probably no longer available. Just buy something as large or larger and create partitions as large or larger than on the original. When you are done replacing both drives, everything will be identical again.
Having a hot spare drive in a raid1 array isn't the best idea as the spare is aging at the same rate as the two in use. If the hot spare was kept totally powered down until needed this might make sense (but then it's not really a 'HOT' spare is it?).
The real advantage of raid isn't protection from data loss (backups are the only way to do that), but rather a good way to recover from the loss of a drive with minimal down time while the data is being restored (since rebuilding the array can sometimes be done while the array is in use, though I'd rather bring the system down to level 1 while rebuilding).
Please send us your broken RAID5s
We're quite good at recovering them and other difficult ones. (unabashed self promotion)
ESS Data Recovery
simple, fast homepage with your links: http://www.ngumbi.com/
Best dating site ever?
http://women.cs.cmu.edu/Who/Profiles/Grad/index.p
All schools should have something like that.
Am I the only person who read the person's name as Bianca Schrödinger?
Schrödinger's Hard drives are (dead/not dead).
"Forget RAID, just replicate the data three times".
Just be sure not to do it on the same disk...
Slashdot: news for Apple. Stuff that Apple.
I just looked up specs to jog my memory. The RK05 drive is not 5 Meg. It was only 2.4 Meg.
That would probably be leftover from the good old days of stiction. There is nothing quite like pulling a drive out of a computer and smacking/snapping/"what the hell are you doing to that!" in front of a novice user.
I want to know about end-user experience with data recovery services. I ignored that click click sound on my WD Caviar, and now I'm crying. What is the success rate of these data recovery services, and the price range? Are there happy endings to disaster?
This is a good idea, and if/when computer tech stops advancing so fast it'll be possible. Right now, if you want 5 year data you're looking at 10GB drives using completely different technology than what is currently available. If I want to buy a 500GB SATA drive there simply isn't data going back very far. Once the data on 500GB SATA drives is collected, I'll be buying a 20TB QWERTY drive using holographic biostorage.
I suppose there might be a small amount of value in knowing how good each manufacturer's drives were 5 years ago, but I'd be surprised if they are even making drives in the same country now.
Man, you really need that seminar!
Are you a datacenter engineer? Do you have extensive experience with component cooling in datacenters?
Your whole analysis of the Google paper relies on premises like "they cool their drives too much, so moisture must be killing them" and "they really don't know how to analyze their data, they should have done forensic analysis on their drives". Particularly ridiculous was your assertion that their data should be analyzed by people less concerned with statistical analysis. Do you realize that these people are datacenter engineers? That as part of one of the biggest custom datacenter operators on the planet, they probably are mostly concerned with the engineering and cost-effectiveness aspects of their analysis? That this data was collected using highly automated methods based on SMART readings in huge environments, and performing forensic analysis on failed parts is usually a ridiculous proposition on several grounds? That this paper's material is nothing more than an extract of their internal reliability analysis, whose sole purpose is to maximize reliability and which probably analyzes factors like cooling regimes and humidity to death?
The above mostly applies to the post you linked. Each of your statements here has merit, and most agree with the Google paper's conclusions, but the difference is they're analyzing their massive operational data, while you seem to be drawing shaky conclusions from rationalizations.
So you're right that MTBF shouldn't be taken for a single drive, since the failure rate at 5 years is going to be much higher than at one.
That's like saying that the inertia of an object shouldn't be taken at standstill, since the inertia at near-light-speed is going to be much higher than at rest. It's reductionist and absurd. There is no reason to discard legitimate and informative statistical measurements based on someone else's inability to apply them correctly.
An MTBF implies exactly the behavior that paper presents. If you saw MTBF 1 million hours and thought that meant for all your million drives, the flaw isn't in the measurement, it's in your comprehension of tenth grade mathematics.
Anyone with a functional understanding of basic statistics knows that the risk of a failure goes up with every non-redundant failure source you add. That means the MTBF goes *down* as you add drives. Assuming independent drives with a constant failure rate, two non-redundant drives with an MTBF of X have a mean time to the first failure in the group of X/2. Three non-redundant drives, and it's X/3.
Really, there's nothing sadder than someone sitting by the sidelines, claiming that good solid measurements should be discarded because they're too stupid to not be misled by them. You remind me of the jerks who sued because they don't know the difference between a megabyte and a mebibyte.
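As a sanity check on how MTBF scales with drive count, here is a small Python sketch. It assumes independent drives with a constant failure rate (exponential lifetimes), which is the model a single MTBF figure implies, not what the field data shows. Under that model the mean time to the first failure among n drives is X/n. The function names, trial count and seed are arbitrary choices for the example.

```python
import random

HOURS_PER_YEAR = 24 * 365


def mean_time_to_first_failure(mtbf_hours, n_drives):
    """Closed form for n independent drives with exponential lifetimes."""
    return mtbf_hours / n_drives


def expected_failures_per_year(mtbf_hours, n_drives):
    """Expected drive failures per year in a population kept at n drives."""
    return n_drives * HOURS_PER_YEAR / mtbf_hours


def simulate_first_failure(mtbf_hours, n_drives, trials=20000, seed=0):
    """Monte Carlo estimate of the mean time until the first drive fails."""
    rng = random.Random(seed)
    rate = 1.0 / mtbf_hours
    total = 0.0
    for _ in range(trials):
        # The group "fails" when its shortest-lived drive fails.
        total += min(rng.expovariate(rate) for _ in range(n_drives))
    return total / trials


for n in (1, 2, 3):
    exact = mean_time_to_first_failure(1_000_000, n)
    approx = simulate_first_failure(1_000_000, n)
    print(f"{n} drive(s): exact {exact:,.0f} h, simulated {approx:,.0f} h")

print(f"{expected_failures_per_year(1_000_000, 1000):.2f} failures/year per 1000 drives")
```

Read this way, an MTBF of 1,000,000 hours is a statement about a population: about 8.76 failures per year for every 1,000 drives in service, not a promise that any one drive lasts a century.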
StoneCypher is Full of BS
Every critical piece of data should have a backup plan, making this expensive recovery obsolete. But you knew that already, didn't you.
I hear your pain though. I'm not looking forward to the first time I have a customer bring me a dead solid state drive with vital data. Telling him that there is nothing that anyone can do will be painful.
(this assumes that new recovery tech is not developed - but this will certainly be much more expensive)
I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
I recently put together a system for a client and one of the three drives was DOA from the factory. Western Digital, 250gig - the exact ones you use in raid arrays for servers/drive arrays.
Booted up - but the heads never moved. Probably something broken inside with the armature or stepper motor.
So, yes, it happens all the time. The business I work for replaces drives in their data center every day. Now, they have something like a thousand drives in there, but that's an astonishing rate when you think about it. A year is about all you get out of a drive today before you are on borrowed time. Two years is common for home users, IME.
As for data protection, four things are key:
#1: Make a CD or DVD with all of your installers. AV, firewall, Acrobat, DivX, and all the rest - so you can get the machine ready to install your main apps from a clean boot in an hour or so. Also include all of your data recovery software and utilities, plus sound and video drivers.
#2: Weekly backup of email and documents and such (use the tool of your choice) - this should be 20-30MB at most per week. A few minutes at most out of your schedule. Save it to a USB drive that you leave in one of the rear slots. This can be a 128MB "free" drive that you see coming with a spindle of CDs or whatever. In my case, it's a 512MB card, now, since my email and such is included, and it's grown quite large.
Now, obviously, if you have a 4 gig flash drive, this solves #1 and #2. If it's an 8 gig model, you can install Windows (to boot/recover with) and still have enough room to partition it for data backup. But even 128MB is better than nothing, since the data is good for more than a decade.
#3: Go out and buy a good surge protector. By this, I mean an IsoBar strip (more like a metal brick - heh). It works. Most everything else doesn't. If you can nail the issues related to power as a cause for failure, or mitigate them to the level of "freak accident", you're that much better off. A UPS is, of course, better, but the number of people running without either is astounding.
#4: RAID 1 is a godsend for the average user. The odds of losing both drives at once are amazingly low - on the order of 1/100K+ per day versus something closer to 1/250 or higher for a single drive. Given that a drive to run as a mirror is $60-$80 these days, it's infinitely cheaper than data recovery costs if you have a drive crash on you.
$60 now or ~$2000 to have Drivesavers recover it (no joke - it's that expensive).
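The RAID 1 odds in #4 can be roughed out in Python. This is a sketch under a strong assumption: the two drives fail independently of each other. It ignores the rebuild window and failures with a common cause (same batch, same power surge). The 1/250 per-day figure is the one quoted above, not a measured value, and the function name is made up for the example.

```python
from fractions import Fraction


def both_fail_same_day(p_single):
    """Chance that two independent drives both fail on the same day."""
    return p_single * p_single


p_single = Fraction(1, 250)  # per-day figure quoted above for one drive
p_pair = both_fail_same_day(p_single)
print(f"single drive: 1 in {p_single.denominator:,} per day")
print(f"mirrored pair: 1 in {p_pair.denominator:,} per day")
```

Squaring the single-drive figure gives 1 in 62,500 per day, which is in the same ballpark as the 1/100K quoted. Correlated failures push the real number the wrong way, which is one more reason to keep backups alongside the mirror.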
Thank you for your input. I will consider it carefully.
Love Vellmont.
AccountKiller
Good thinking, I've been doing the same lately after a couple painful failures. It really hurts under USB.
You might need to do that with the SCSI drives anyway. From the paper you linked: That doesn't say to me that they test the entire SCSI drive, just that they test them for more time than ATA. Which isn't good enough for me. I admit, I only skimmed the paper.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
um, you're not logged in, so you can go fuck yourself.
As you know, you can buy different speeds of CPU or RAM. The reason why they come in fast and slow is not because anyone sets out to make a slow CPU. They make one speed of CPU, and then test them. Most of them are duds and get melted down to make new wafers. Some test okay, and they get packaged and shipped. Some test bad at high clock speeds and okay at low clock speeds, and these get packaged as lower-speed CPUs.
Now I don't know much about hard drive manufacturing, but I would guess that there's a similar thing going on here. An "enterprise"-level drive is one that tested better at the factory, but it's designed to the same specs as a consumer-level one.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Even so... holy crap, that's still impressive. I remember thinking, in 1983, that a megabyte was an inconceivably large amount of information. I couldn't imagine anyone ever needing to store a megabyte, and that was three years after you had a 2.4M drive. Whoa.
-----
PGP Key ID 0xCB8FF658
Good thinking, I've been doing the same lately after a couple painful failures. It really hurts under USB. :(
Yes, raw PATA speeds are ~50-60MB/s, the best I've managed via USB 2.0 is ~25-30MB/s, with the same disc rehoused in a caddy using a Prolific PATA-to-USB bridge chipset, plugged into an Intel USB controller (NEC was slower, IIRC).
Also, another advantage of running badblocks in write-test mode before using the drive is that hopefully any marginal or failed blocks will be remapped before they contain useful data. I've never seen that happen, and to be honest, I'd now be inclined to reject a drive that did so as D<ead|amaged>OA.
Also, another advantage of running badblocks in write-test mode before using the drive is that hopefully any marginal or failed blocks will be remapped before they contain useful data. I've never seen that happen, and to be honest, I'd now be inclined to reject a drive that did so as DOA.
Good policy. I've been inclined to think that way myself in the past 6 months or so, and the Google study adds credence to my anecdotal experience.
It is a waste of time and money.
And it creates unnecessary risks (why the fluffy bunny should you be touching that hardware when it is not failing?).
You will need to replace disks, no question about it, but given the redundancy and hot swappability in modern devices of enterprise quality, preemptive action just increases the risks of something else going wrong (pulling a cable, doing something stoopid).
Or tell me, how do you explain shutting down that machine by mistake when you were changing a disk that was in perfect working order? If I was your user, it would make absolutely no sense to me.
IANAL but write like a drunk one.
I think you don't have a full grasp of priorities....