Hard Drive Reliability Study Flawed?
storagedude writes "A recent study of hard drive reliability by Backblaze was deeply flawed, according to Henry Newman, a longtime HPC storage consultant. Writing in Enterprise Storage Forum, Newman notes that the tested Seagate drives that had a high failure rate were either very old or had known issues. The study also failed to address manufacturer's specifications, drive burn-in and data reliability, among other issues. 'The oldest drive in the list is the Seagate Barracuda 1.5 TB drive from 2006. A drive that is almost 8 years old! Since it is well known in study after study that disk drives last about 5 years and no other drive is that old, I find it pretty disingenuous to leave out that information. Add to this that the Seagate 1.5 TB has a well-known problem that Seagate publicly admitted to, it is no surprise that these old drives are failing.'"
Is he saying that 1.5TB drives are all 5 years old? If you look at the table in TFA, it talks about "release date" -- which may well be some time ago, but I'm sure 1.5TB drives can still be had new, even if the design hasn't changed in a while.
Back in '99 we used to get factory sealed boxes of Seagate drives DOA, or already having cluster collapse. There's nothing quite like 250-500+ brand new units which are all dead or dying, and then shipping them back. I've only started using Seagate again in the last few years.
I've either personally owned or purchased for companies I've worked for dozens of hard drives of all (except ESDI) technologies including MFM/RLL, IDE (parallel and serial), SCSI (original, wide, ultra wide, etc.) of form factors from full height 5.25 inch to 2.5 inch, dating back to 1991, and in my experience most hard drives last until you throw them away after 10 or 15 years because they're too small.
A few hard drives die in the first 6 months, and maybe 5-10% die in 3-5 years. Saying that disk drives last about 5 years just doesn't agree with my experience at all. Hard drives essentially last an infinite amount of time, defined by me as until they're so small that their storage can be replaced for under a dollar.
I do agree with the author's other points. Certain lines of hard drives have more like a 100% failure rate after 5 years. One 250 GB hard drive I purchased was RMA replaced with a 300 GB model because the 250 GB line was essentially faulty.
I think these studies might be looking at 7200 or 10000 RPM SCSI units under extremely high use. That's not how consumers use hard drives.
Someone's working overtime to make Seagate look good.
But the pile of dead Seagates at work says otherwise.
Yeah, this guy is essentially saying the pre-known facts validate this research finding so therefore the research was deeply flawed.
It really doesn't matter what the accumulated knowledge over the intervening years says. The fact remains that for this user, Backblaze, the results were the results, and they happened to match what the industry already knew.
Their results: Hitachi has the lowest overall failure rate (3.1% over three years). Western Digital has a slightly higher rate (5.2%), but the drives that fail tend to do so very early. Seagate drives fail much more often: 26.5% are dead by the three-year mark.
If anything, this guy just validated Backblaze's study.
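To compare those three-year figures with the annual rates people quote elsewhere in this thread, here is a rough sketch that converts a cumulative failure fraction into an equivalent annual rate. It assumes the failure probability is the same every year, which is my simplification and not something Backblaze claims (the WD drives reportedly fail early, so it fits them poorly). The function name is made up for illustration.

```python
def annualized_failure_rate(cumulative_failed, years):
    """Constant annual failure rate that yields the given cumulative failures.

    Assumes every year has the same failure probability (a simplification).
    """
    survival = 1.0 - cumulative_failed
    return 1.0 - survival ** (1.0 / years)


# Three-year cumulative failure fractions quoted from the Backblaze post
three_year = {"Hitachi": 0.031, "Western Digital": 0.052, "Seagate": 0.265}

annual = {brand: annualized_failure_rate(f, 3) for brand, f in three_year.items()}
for brand, rate in annual.items():
    print(f"{brand}: {rate:.1%} per year")
```

Under that assumption the Seagate figure works out to a bit under 10% per year, against roughly 1% for Hitachi.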
BS. I have had at least 2/3 of my newer Seagates fail, from 500 gigs to 2TB drives. At LEAST 10 in the last 3 years. In the same time I have had 1 of 6 Hitachi and 2 of 18 Western Digital fail. I will NEVER buy another Seagate drive. Just had my external 1.5TB USB3 drive go last week with 0 warning, along with a TON of my data. I hate Seagate with a passion that I feel for no other.
My understanding, based upon reading the originally posted materials, was that they published their reliability findings based upon their own experience. I did not see anywhere that they claimed it was comprehensive research into the reliability of hard drives. We should not crap upon Backblaze because people could not be bothered to read the articles and made some faulty assumptions based upon the headlines. To do so would just serve to dissuade others from releasing their experiences. As for the argument about some of the hardware having known faults... If a company does not want bad press, they should do more quality control before releasing crappy hardware...
Eh, it depends. I've had plenty of bad luck with Seagate's consumer drives dying pretty quickly. On the other hand, I've yet to have to replace a single enterprise ES (or ES2) series drive. We use Seagate's ES series drives in the arrays we depend on and Western Digital black drives in the arrays we don't care too much about (video editing rigs). Though I said, "Don't care too much about", I at least expected them to last more than a few months. Unfortunately, a few months is a tall order for Western Digital. The black drives die so often that their entire warranty department probably knows me by name...
Sorry, you're full of shit Henry Newman. How many people follow specifications about burn-in on a drive when they buy it wholesale OEM and it comes in nothing more than a plastic bag? How many people only buy drives released recently? If you're like most people and you want a 1.5TB drive you go out and buy the cheapest one that meets your needs. If Seagate still has 8 yr old drives on the market, then it's damned right that their failure rate should be considered. And so what if a drive "has a well-known problem that Seagate publicly admitted to"? As long as Seagate publicly admits all the issues with every drive they release we should then adjust stats to eliminate those flaws? That's ridiculous. This study was about "If you go out and buy a drive off the market, this is the rate you can expect it to fail at." I don't think any consumer that got a Seagate drive, had it fail and lose all their data, would then say "Oh! Well they publicly admitted to a problem! Shit! My bad!"
Sounds like Mr Newman is going to get a nice paycheck soon.
If the entire box is dead, wouldn't that imply mishandling during shipping?
Bad batches during production. Seagate used to be famous for this, and if you look back at their '90s financials you can see that for quite a while they were hanging on by the skin of their teeth to avoid folding.
This article states everything anyone competent already knew. Consumer drives come rated for a lighter workload than enterprise.
Duh? That's the point - it's a cost:reliability tradeoff. With "enterprise" drives being 1.5x+ the expense, for uses like Backblaze where you can survive multiple disk failures with ease it's a no-brainer.
I also got "burned" by these Seagate 1.5TB disks. By *far* the worst drives we have in production (~300 or so these days), and they have had an annual failure rate around 20% since the day they were put into service. Other consumer drives don't even come close to that metric, but are rated similarly.
I actually like Seagate - every disk manufacturer has problematic models from time to time. No big deal, we knew the risks when we bought them. However, the data Backblaze published is completely validated by our own internal data. It's a drive model to avoid when at all possible. Most of our disks have a less than 5% annual failure rate, but this specific model is close to, or over, 20%. That's a major difference.
This article just states the obvious. Consumer drives generally fail earlier under heavy loads. This is not interesting; it's a known tradeoff anyone with a high school degree can figure out for themselves by looking at cycle ratings and MTBF. The only thing I care about for this workload is whether the savings I get from using the lesser drives exceed the cost of my failure rate. The answer has thus far (even with 20% of drives failing each year) been a resounding yes.
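A back-of-the-envelope version of that tradeoff, as a sketch only. The prices are hypothetical, the 5% failure rate for the enterprise drive is an assumption, and the model ignores labor, warranty replacements, and downtime. Only the 1.5x premium and the ~20% annual failure rate come from the comment above.

```python
def cost_per_slot(price, annual_failure_rate, years):
    """Purchase price plus expected replacement purchases over the period.

    Ignores labor, warranty replacements, and downtime (all simplifications).
    """
    expected_replacements = annual_failure_rate * years
    return price * (1 + expected_replacements)


# Hypothetical numbers for illustration only
consumer_price = 100.0                    # assumed price, arbitrary units
enterprise_price = consumer_price * 1.5   # the "1.5x+" premium mentioned above
years = 3

consumer = cost_per_slot(consumer_price, 0.20, years)      # ~20% AFR, per the comment
enterprise = cost_per_slot(enterprise_price, 0.05, years)  # 5% AFR is an assumption

print(f"consumer: {consumer:.2f}, enterprise: {enterprise:.2f}")
```

With these made-up inputs the consumer drive still comes out cheaper per slot over three years, even while failing four times as often. Different prices or a realistic labor cost could flip that.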
There is a difference between consumer drives, and data like this is *great* to have published, since you can add it to your own data and compare notes. Will I make a buying decision based on it? Probably not. But it will certainly be one data point of many when it comes time to buy more disk. Known issue? I don't care. All I care about is whether the drive works or not, and this particular Seagate model does not. The author of this article completely glosses over the fact that Seagate admitted to the issue but did absolutely *nothing* to make it right for their customers, essentially blaming them. This fact is what bothers me the most, not the fact they had a problematic drive model, and it will likely be the largest factor when it comes to my evaluating Seagate products in the future.
Or a bad batch?
No, of course not. This is /. It must be that a major hard drive manufacturer that was around 20+ years prior, and is still around 14 years later made nothing but bricks and packaged them as hard drives. That's how they survived when so many of their competitors went bankrupt. Bricks are so much cheaper to produce, so the profit margin is considerably higher. ;-)
Install that drive in a server in an online backup company and see how long it lasts.
I've got ~170 failed Seagate Enterprise 500G drives sitting here in my cube. That's pretty close to a 50% failure rate after 4 years for that fleet. Sadly Dell, who branded them, won't warranty them after 1 year. I'm pretty close to playing hard drive dominoes with them and posting that on YouTube. Also noteworthy: we have almost as many Western Digital drives of that same generation with just one failure. Due to this, my company refuses to buy any more Seagates until we see things get better.
Yes. They are getting cheaper and faster. They are already much faster than magnetic rotating discs in read/write/iops.
Don't be facetious, you can't get a 1TB SSD for 100$ yet and you know this. The OP clearly wrote "getting cheaper", he didn't say they have parity on price.
The reliability rate for current generation SSDs is now higher than traditional HDDs. So in regards to "run 24hours/24hours for 5 years without any problems?", take your pick, they can all do it better than a traditional HDD.
I think traditional HDDs have precious few years left.
People seem to forget that Seagate denied the issue for almost a year.
I remember.
I was a seagate buyer, before they lied. It was my preferred vendor. We had a number of drives in disk arrays, but when it was time to swap them out, I avoided Seagate as replacements. Never had any data loss due to Seagate drives, but the company was a client of the software my team wrote for enterprise customers, so I did get a view on the edges of the company. Something changed.
Last year, those drives were 6 yrs old and had never given us any issues, but old drives can't be trusted. The new drives were Hitachi - because I can read reliability reports. I'm still using the old Seagates for unimportant things from time to time, mainly transporting large amounts of data. No issues, and if there are any at this point, the old drives have exceeded expectations.
However, I don't plan to buy another Seagate drive again. They lied! They didn't step up and tell the truth. That is a management issue, not a technical one, and I remember it. It was a management failure. I will always remember it, and where I work (BTW, I'm a CIO) we will never buy Seagate drives again if there is a choice.
Life and work is too important to deal with liars.
Find the full text here: http://www.justice.gov/osg/briefs/1996/w961430w.txt
Now, though, Seagate is not Miniscribe.
We have well over 150 3TB Seagate Barracudas at work that are halfway into their second year of operation. The first year was pretty flawless, maybe 1 failure, but in the second year we've already had about 15 get peppered with bad sectors, and it's continuing to happen on more drives at least once per week or so. This lends a lot of credibility to Backblaze's findings if you ask me. Again, these are modern 3TB Barracudas (non-XT). I was sad when they discontinued the XT line, simply because we have about 30 of the XT 2TB models into their 3rd year with no failures yet. Oh, right, they didn't discontinue the XT at all; rather, they turned it into the Constellation series and sold it for double the price!
This reflects my anecdotal experience of late as well. My Dell server just turned 3 years old (and I had a 3 year service agreement on it). It came with three 1-terabyte drives. All failed before my service period ended and were replaced; the last of the three was replaced this past summer. 100% failure of the original drives in less than 3 years.
No.
Henry Newman's response, however, is deeply flawed.
1) Newman complains that average drive age is a "useless statistic." But he seems to prefer "time since product release" which is far worse than useless -- it is an obviously incorrect way to estimate the age of a drive population and is directly contradicted by the average age data reported in the blog post.
2) Newman has questions about Backblaze's burn in. He can find answers by googling "Backblaze burn in" to learn more about the company's remarkably transparent operations. Beach does not go into these details because an effective blog post will focus on its key conclusions rather than discussing every detail of methodology. It is not a research paper.
3) Newman digresses into hard error rate, which is unrelated to drive failures. I look forward to a future Backblaze blog post about error rates. In any case, since all these drives are consumer drives and all but one have the same specified error rate, it is a non sequitur.
4) Newman points out that Backblaze probably vastly exceeds manufacturer specs for drive throughput. I think this is exactly the point. Is there really enough difference in reliability between commodity and enterprise drives to justify their price difference? Or is it just a form of price discrimination? Does the spec sheet reflect reality or is it a marketing-driven fiction?
Overall this article strikes me as being written by an industry flack: someone who is more interested in parroting jargon and received wisdom rather than indulging in genuine curiosity.
He seems to be trying his best to find flaws in the study, but his own logic is pretty poor. For instance:
"I’ve noted that we just found that the Seagate 1.5 TB drives are about 8 years old since release, for the failure rate, but the average age of the Seagate drives in use are 1.4 years old. Averages are pretty useless statistic, and if Seagate drives are so bad then why buy so many new drives?"
If the company rolled out Seagates at 5k a year for three years and then stopped because of the high failure rate, moving on to Hitachi and such, then the average age could very well be only 1.4 years even though the model was released 8 years ago. Because, let's face it, when it's your ass on the line and you see a particular type of drive putting your servers into a precarious state, you might start migrating away as fast as you can.
Those Seagate drives still running are probably either running in very low IO servers or very low-risk servers (clustered or such), but in such few quantities that their continued lifespans are not increasing the overall average much. The remainder could be shelved to avoid the risk of failing in a critical system and while they are listed in the total number of drives purchased, their age might not be included in the average presented.
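To make the arithmetic concrete, here is a small sketch of how the average age of a drive population depends on when the drives were bought, not on when the model was released. The cohort sizes and ages are hypothetical numbers picked to match the 5k-a-year example, not Backblaze's actual purchase history.

```python
def average_age(cohorts):
    """Weighted mean age in years for a list of (drive_count, age_years) cohorts."""
    total = sum(count for count, _ in cohorts)
    return sum(count * age for count, age in cohorts) / total


# Hypothetical purchase history: 5,000 drives bought per year for three years,
# with the newest batch installed about 0.4 years ago.
cohorts = [(5000, 2.4), (5000, 1.4), (5000, 0.4)]
years_since_release = 8.0  # age of the model design, per the article

fleet_average_age = average_age(cohorts)
print(f"average drive age: {fleet_average_age:.1f} years")
print(f"years since model release: {years_since_release:.1f}")
```

The release date never enters the calculation, which is why "time since product release" tells you nothing about how old the drives in service are.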
SSD is getting cheaper and faster every day.
You know what's getting cheaper? TLC flash, the kind that degrades WHEN YOU READ IT, the kind that has an internal read counter and needs to be rewritten after a certain number of reads to level cell voltages, the kind that has a ~300 write life span. It's designed to DIE no matter what you do with it.
I have an 80GB IDE Deskstar that runs as primary storage for my DNS, key, and SSH jump box for my home network.
The older a drive gets, the more you need to put it into higher positions of authority and privilege due to its years of experience. It inverts the failure rate (which is mainly from burnout and boredom of routine).
Please play hard drive dominos and post to YouTube and /.
IIRC Backblaze's workload is write once read maybe once (I mean, they are a backup company). So it's quite likely that they are massively under the specs for throughput.
The truly interesting thing about this study is that they name names; previous work in the area (like Bianca Schroeder's FAST 07 paper, http://www.cs.cmu.edu/~bianca/... or Google's FAST 07 paper, http://research.google.com/arc..., or NetApp's FAST 08 paper http://www.usenix.org/event/fa...) doesn't give away vendor names. The Backblaze results broadly agree with the previous results.
Correct me if I'm wrong, but I believe Seagate absorbed Miniscribe by way of Maxtor. I wouldn't be so sure that 'shipping bricks' isn't in their patent portfolio.
Since most /. folks weren't even alive back then, let me recap a few of Miniscribe's business tactics:
- Set up off-the-books companies to which they "sold" drives that were simply stored in warehouses.
- Claimed the sale of drives which had not yet been delivered to customers. Their outside auditors called them out on the fact that they couldn't claim the income from drives that were still on the boat from China, and made them restate earnings. When it all fell apart, the criminal investigation discovered that the drives had never even existed to begin with.
- Took returned dead disk drives, tossed them onto a pile in the office which was nicknamed the "dog pile", and when the pile got big enough, packed them up and shipped them out as new orders.
So no, Seagate is nothing at all like Miniscribe ;-)
One of the patterns I've noticed with Seagate is that drive failures seem to spike when manufacturing moves. The reliable Barracuda IV models made in Singapore were replaced by shoddy ones made by newer facilities in Thailand. Then around 2009-2010 they shifted a lot more manufacturing into China, and from that period the Thailand drives were now the more reliable ones from the mature facility. A lot of the troubled 1.5TB 7200.11 models came out of that, and perhaps some of your 500GB enterprise drives too.
If you think about this in terms of individual plants being less reliable when new, that would explain why manufacturers go through cycles of good and bad. I think buying based on what's been good the previous few years is troublesome for a lot of reasons. From the perspective of the manufacturer, if a plant is above target in terms of reliability, it would be tempting to cut costs there. Similarly, it's the shoddy plants that are likely to be improved, because returns from there are costing too much. There are a few forces here that could revert reliability toward the mean, and if that happens, buying from the company that's been the best recently will give you the worst results.
At this point I try to judge each drive model individually, rather than to assume any sort of brand reliability.
Recent reliability testing has been downright horrifying for the TLC based drives. I predict a whole lot of people buying Samsung 840 drives because they're cheap are going to regret that.
Well, I go through a lot of drives at the shop, and while I haven't had a chance to try the new 4TB, I can say their 1TB, 1.5TB, and 2TB had a LOT of fails, enough that I actively avoid them now. Rumor has it that it's caused by Maxtor's shitty ARM chips and lousy firmware, but that's rumor so who knows if it's true.
From what I've seen, in order from least failures to most: Samsung (especially Ecogreens, they just seem to last and last), Hitachi, WD, and finally Seagate. Maybe their business side is better, but on the consumer side their 500GB drives are good while anything bigger than 640GB just seems to die.
Sadly it really doesn't matter now that Samsung and Hitachi have had their drive businesses bought by Seagate and WD, so if you need real storage space? Gotta pick one or the other. The WD blues and greens seem to be decent ATM and the red NAS drives make good storage for HTPCs.
Although the 840 Series is clearly in worse shape than the competition, these results need to be put into context. 500TB works out to 140GB of writes per day for 10 years. That's an insane amount even for power users, and it far exceeds the endurance specifications of our candidates.
Seems like it's not as bad as you make it out. I don't think I'd be using a 'puny' 250GB drive in 10 years, much like I don't use 250GB HDDs now that drives over 1TB are around. 1TB SSDs are already around the $500 mark, and after ~5 years I think they'll be quite affordable.
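The quoted "140GB of writes per day" figure is easy to sanity-check. This is a sketch using decimal units (1 TB = 1000 GB) and 365-day years. The 20 GB/day workload in the second calculation is a number I made up for a typical desktop, not a measurement.

```python
def writes_per_day_gb(total_tb, years):
    """Average daily writes in GB needed to reach total_tb over the given years."""
    return total_tb * 1000 / (years * 365)


def years_to_reach(total_tb, gb_per_day):
    """Years needed to write total_tb at a steady gb_per_day."""
    return total_tb * 1000 / gb_per_day / 365


daily = writes_per_day_gb(500, 10)      # the 500TB / 10 year case from the quote
light_use = years_to_reach(500, 20)     # 20 GB/day is a hypothetical workload

print(f"500TB over 10 years: {daily:.0f} GB/day")
print(f"500TB at 20 GB/day: {light_use:.0f} years")
```

The first number lands at about 137 GB/day, so "140GB" is a fair rounding.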