Annual Hard Drive Reliability Report: 8TB, HGST Disks Top Chart Racking Up 45 Years Without Failure (arstechnica.com)

Posted by msmash on Wednesday February 1, 2017 @05:20AM from the reliability-report dept.

Online backup solution provider Backblaze has released its much-renowned annual hard drive reliability and failure report. From a report on ArsTechnica: The company uses self-built pods of 45 or 60 disks for its storage. Each pod is initially assembled with identical disks, but different pods use different sizes and models of disk, depending on age and availability. The standout finding: three 45-disk pods using 4TB Toshiba disks, and one 45-disk pod using 8TB HGST disks, went a full year without a single spindle failing. These are, respectively, more than 145 and 45 years of aggregate usage without a fault. The Toshiba result makes for a nice comparison against the drive's spec sheet. Toshiba rates that model as having a 1-million-hour mean time to failure (MTTF). Mean time to failure (or mean time between failures, MTBF -- the two measures are functionally identical for disks, with vendors using both) is an aggregate property: given a large number of disks, Toshiba says that you can expect to see one disk failure for every million hours of aggregated usage. Over 2016, those disks accumulated 1.2 million hours of usage without failing, healthily surpassing their specification. [...] For 2016 as a whole, Backblaze saw its lowest ever failure rate of 1.95 percent. Though a few models remain concerning -- 13.6 percent of one older model of Seagate 4TB disk failed in 2016 -- most are performing well. Seagate's 6TB and 8TB models, in contrast, outperform the average. Improvements to the storage pod design that reduce vibration are also likely to be at play.
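
The aggregate figures in the summary are plain arithmetic, and a short Python sketch makes them easy to check. This is an illustration only: it assumes a 365-day year and a constant failure rate (an exponential model), it treats the "more than 145" drive-years as exactly 145, and the function names are made up for this example rather than taken from Backblaze or Ars.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap days ignored (assumption)


def drive_years_to_hours(drive_years: float) -> float:
    """Convert aggregate drive-years of service into aggregate drive-hours."""
    return drive_years * HOURS_PER_YEAR


def expected_failures(drive_hours: float, mttf_hours: float) -> float:
    """Failures the rated MTTF predicts over the given aggregate hours."""
    return drive_hours / mttf_hours


def prob_zero_failures(drive_hours: float, mttf_hours: float) -> float:
    """Chance of seeing no failures at all, assuming a constant failure rate."""
    return math.exp(-expected_failures(drive_hours, mttf_hours))


def annualized_failure_rate(mttf_hours: float) -> float:
    """Per-drive probability of failing within one year, same assumption."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)


toshiba_hours = drive_years_to_hours(145)   # lower bound from the summary
rated_mttf = 1_000_000                      # Toshiba's quoted MTTF, in hours

print(f"aggregate hours:   {toshiba_hours:,.0f}")
print(f"expected failures: {expected_failures(toshiba_hours, rated_mttf):.2f}")
print(f"P(zero failures):  {prob_zero_failures(toshiba_hours, rated_mttf):.2f}")
print(f"AFR implied by MTTF: {annualized_failure_rate(rated_mttf):.2%}")
```

Under these assumptions, 145 drive-years is about 1.27 million hours, the rated MTTF predicts about 1.27 failures over that span, and a run of zero failures has roughly a 28 percent chance. The result beats the spec sheet, but it is well within what the spec sheet allows.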

HGST nearly always on top by Anonymous Coward · 2017-02-01 05:23 · Score: 5, Insightful

Every time Backblaze publishes a report the HGST drives always come out on top.
It's a little more expensive to fill your NAS with them but in my experience it's been worth it.
1. Re:HGST nearly always on top by VernonNemitz · 2017-02-01 05:49 · Score: 3, Funny
  
  Here's a wild idea that might extend disk-drive lifespan even more. All the drive-spindles/axles in one of those pods should be aligned parallel with the Earth's rotation axis (details in the link).
2. Re:HGST nearly always on top by OverlordQ · 2017-02-01 05:58 · Score: 2
  
  That effect is so inconsequential as to be practically zero.
  
  --
  Your hair look like poop, Bob! - Wanker.
3. Re:HGST nearly always on top by BenJeremy · 2017-02-01 06:22 · Score: 3, Insightful
  
  Seagate blows. I've got a lot of hard drives - Toshibas, Hitachis, WDs, Seagates.... I have exactly three Seagates out of 12 that are currently working.
  On the other hand, I've bought up some "refurb" Hitachis (server pulls with 20k hours) and they just work.
  Seagate hasn't made a quality drive since they bought up Maxtor and, apparently, dumped all of their factories, QA people and engineers in favor of Maxtor's. It's the only explanation I can think of for the nosedive in quality.
  It's rather ironic that the former "Deathstar" line is more reliable than Seagate these days
4. Re:HGST nearly always on top by dgatwood · 2017-02-01 06:46 · Score: 2
  
  Seagate was bad before they bought Maxtor. There was one year when I had a 100% failure rate of every Seagate drive that I had bought within the past year, consisting of various drives of various sizes, some laptop, some desktop, some internal, some external.
  If anything, my experience with Maxtor drives led me to hope that Seagate would actually improve after the merger, because Maxtor's quality seemed better to me than the level Seagate had fallen to over the few years leading up to the merger. And IIRC, the Seagates that were rebadged Maxtor drives had a considerably lower failure rate than the non-rebadged drives.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
5. Re:HGST nearly always on top by drinkypoo · 2017-02-01 07:09 · Score: 3, Funny
  
  I grew up in Santa Cruz, which meant I was very near to Seagate. One of the most popular local BBSes was run by a Seagate engineer. A lot of cheap used Seagate disks mysteriously found their way onto the local market, and you could generally get ST-506 disks for $1/MB, which was exciting and astounding at the time. Back then, Conner was pure crap, Maxtor was interesting and decent, WD or Quantum seemed to be the best, and even locally we called Seagate "Seizegate" and everyone learned to take their drives out of their PC and whack them just so in order to free them up from stiction... approximately weekly. Just one or two good temperature cycles without turning on the system was sometimes enough to make them freeze.
  These disks weren't marked in any way as engineering samples, though I suppose that's not impossible. But it wasn't just me. It was a fairly universal experience.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
6. Re:HGST nearly always on top by ASCIIxTended · 2017-02-01 09:40 · Score: 2
  
  We've used thousands of RE4 and hundreds of RE3 drives over the last decade and have had very few failures (15). We run them 24/7 but they do not get worked very hard - storing a very small amount of data at a 15 minute interval. We always use them in mirrored arrays. All but three failures have been within the first day of using them, with one being DOA.
  
  --
  I do not belong to the church of the lowercase 'i'
7. Re:HGST nearly always on top by adolf · 2017-02-01 13:26 · Score: 3, Informative
  
  The thing about anecdotes like this is that the sampleset is always very small, so it's not a very meaningful datapoint.
  I have several 8-year-old 250 gig Seagate 7200.10 drives that refuse to die. I took them out of service a few weeks ago.
  At work, I've seen a lot of Seagates die in NVRs. But then, the NVRs were all mostly populated with 2TB Seagates from day 1 -- so of course I'm going to see a lot of them die. I also see plenty of 2TB WD drives bite the dust. I've never seen an HGST drive die in an NVR, but then none of them have HGST drives installed....
  Statistically, from my perspective, it's about a wash. But that doesn't really mean anything, because again my own sampleset is very limited in scope compared to Backblaze or Google.
  Meanwhile, IBM at one point was making some absolutely stellar hard drives. Their 9ES SCSI drives were the bee's knees at the time, and were resoundingly reliable. It's hard to characterize a brand of hard drive -- some models are good, and some are bad, from just about any manufacturer.
  (Except for Quantum. Quantum was never good. And Miniscribe, because fuck those crooks.)
  
  --
  Kid-proof tablet..
8. Re:HGST nearly always on top by martinfb · 2017-02-02 07:26 · Score: 2
  
  I second this sentiment. HGST drives have been the best I have ever used.
  
  --
  
  Self-importance and self-indulgence is the root of ALL evil.
Real article by TypoNAM · 2017-02-01 05:29 · Score: 5, Informative

Arstechnica just borderline copy&pasting from the source. See the actual article at: https://www.backblaze.com/blog...
Shame on Arstechnica for not even bothering to link their source material.

--
This space is not for rent.
1. Re:Real article by supercell · 2017-02-01 05:55 · Score: 4, Insightful
  
  They don't want you to leave their web site, it's that simple.
The figure is meaningless. by jcr · 2017-02-01 05:35 · Score: 2

45 years spread over a bunch of drives without a failure doesn't mean that we can expect any individual drive to last 45 years.
-jcr

--
The only title of honor that a tyrant can grant is "Enemy of the State."
1. Re:The figure is meaningless. by pla · 2017-02-01 05:54 · Score: 2
  
  You are absolutely correct. The trivial counterexample is a device that contains a semi-consumable substance, such as bearings with an oil that slowly dries out; 100% might last a year, even if 0% will last two (not saying that is the case here, but just as a possibility).
  
  These numbers do, however, suggest that you can expect a very low failure rate of those drives within the first year (less than 2.2%). And realistically, you'll probably get far more than that under similar conditions.
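
The "less than 2.2%" figure above looks like roughly 1 failure per 45 drive-years. A more conservative way to read a zero-failure run is to ask which failure rates would still plausibly produce it. The sketch below is a hedged illustration: it assumes each drive-year is an independent trial with the same failure probability, and the function names are invented for this example.

```python
def zero_failure_upper_bound(drive_years: float, confidence: float = 0.95) -> float:
    """Largest annual failure probability still consistent, at the given
    confidence, with observing zero failures in `drive_years` trials.

    Solves (1 - p) ** drive_years = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / drive_years)


def rule_of_three(drive_years: float) -> float:
    """Common shortcut for the 95% bound when no failures are observed."""
    return 3 / drive_years


for label, n in (("HGST 8TB", 45), ("Toshiba 4TB", 145)):
    print(f"{label}: exact bound {zero_failure_upper_bound(n):.1%}, "
          f"rule of three {rule_of_three(n):.1%}")
```

With only 45 drive-years, an annual failure rate of about 6 percent cannot be ruled out at 95 percent confidence; the 145 drive-year sample tightens that to about 2 percent. A clean year in one pod is encouraging, but it is a small sample.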
Re:In other news - in 2062 they will have time tra by sinij · 2017-02-01 05:36 · Score: 2

And "few years" is approximately half of MTTF, or do we now know enough to determine failure distribution indirectly?
Re:In other news - in 2062 they will have time tra by Archangel+Michael · 2017-02-01 05:38 · Score: 3, Interesting

The bathtub curve is real, and if you follow BackBlaze tips, they show that years 2-4 are usually exceptional in terms of reliability.
My recommendation is to buy the NAS/SAN/POD/Whatever and spin it up for 3 months, then put it into production and then wait 42 months. After that, start planning and when the next drive fails in the 42-48 month range, start the purchasing process (depending on lead time needed), get it installed, wait 3 months to get early failures out of the way, then transfer data ... wash/rinse/repeat. You'll get close to five years between purchase and retirement, with a bit of overlap between versions.
If you have several decks of drives, you can get a reasonable cycle going, and it becomes second nature. Data loss is not an option.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
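
For readers unfamiliar with the term, a "bathtub curve" is usually modeled as the sum of three hazard rates: a falling early-failure term, a flat random-failure term, and a rising wear-out term. The Python sketch below shows only the shape of that model. Every parameter value is a hypothetical chosen to make the curve visible, not a figure fitted to Backblaze or Google data, and whether real drives follow this shape is exactly what this thread disputes.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at age t > 0.

    shape < 1 gives a falling rate, shape > 1 a rising rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(age_years: float) -> float:
    """Hypothetical bathtub model: failures per drive-year at a given age."""
    infant_mortality = weibull_hazard(age_years, shape=0.5, scale=50.0)
    random_failures = 0.02                      # constant background rate
    wear_out = weibull_hazard(age_years, shape=5.0, scale=7.0)
    return infant_mortality + random_failures + wear_out


for age in (0.1, 0.5, 1, 2, 3, 4, 5, 6):
    print(f"age {age:>4} y: hazard {bathtub_hazard(age):.3f} per drive-year")
```

A burn-in period like the one recommended above only helps if the first term is large; if early failures are rare, the model collapses to a flat line followed by a gradual rise.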
45 years by trb · 2017-02-01 05:49 · Score: 4, Insightful

Aggregate years are not years.
"Nine women can't make a baby in one month."
1. Re:45 years by The-Ixian · 2017-02-01 06:01 · Score: 4, Informative
  
  "Nine women can't make a baby in one month."
  Well... some people will tell you that babies are created at the time of conception...
  
  --
  My eyes reflect the stars and a smile lights up my face.
2. Re:45 years by sinij · 2017-02-01 06:21 · Score: 2
  
  No, this is not how it works. You measure pregnancy duration of 9 women, then conclude that if you have a collective of approximately 300 women they will produce 1 baby a day.
  
  However, 1 baby a day is not a useful metric unless you very carefully manage the process.
3. Re:45 years by Ant2 · 2017-02-01 07:13 · Score: 2
  
  Evidence to the contrary. I tried locking up 300 women for more than 2 years, yet no babies were produced.
  I plan to repeat the experiment by adding 1 man.
Re:In other news - in 2062 they will have time tra by ShanghaiBill · 2017-02-01 06:01 · Score: 3, Interesting

> The bathtub curve is real
This Backblaze report, previous Backblaze reports, and the Google longitudinal disk reliability study, have all found that the "bathtub curve" is a myth. HDDs do not have high early failure rates, nor does the failure rate suddenly rise after a set period of time.
Another myth that these studies have debunked is that HDDs do better if kept cool. Actually, failure rates are lower for disks kept at the higher end of the rated temperatures. This is one reason that Google runs "hot" datacenters today, with ambient temps over 100F.
Re:In other news - in 2062 they will have time tra by TechyImmigrant · 2017-02-01 06:24 · Score: 2, Informative

In my job we sell tens of millions of each product. We warrant for X years (E.G. 8-10 would be typical for something with a natural replacement cycle of 4-5 years), so we then design the things such that the curve is at a low point at time X. Component aging is heavily modeled and measured so we don't mess up. It would indeed get very expensive if there were lots of early failures. You find bad batches through testing.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:In other news - in 2062 they will have time tra by arth1 · 2017-02-01 06:24 · Score: 5, Interesting

I find it hard to believe. It isn't measuring 45 years worth of things like metal fatigue, material decay or degeneration, wear and tear etc.
For spinning disks, factors that do not materialize in the first few years and thus can't be determined from an aggregated short time test include (but are not limited to)
- demagnetization of fixed magnets (leads to write failures)
- magnetization of paramagnetic materials (leads to bit rot)
- wear on ball bearings (leads to all kinds of fatal crashes)
- accretion of and contamination of lubrication (leads to stiction)
Based on my own experience as a long term sysadmin, the quality and longevity of drives go up and down. Late 1990s drives were bad, early 2000s were good, late 2000s were bad, early 2010s were good, and now it's pretty bad again. It's not just vendor specific, because vendors seem to adjust to each other to arrive at common price/quality point. Sure, there are exceptions, like the Deathstars, but overall, I think the drives tend to be similar in longevity not so much based on brand, but what generation they are.
Re:In other news - in 2062 they will have time tra by lgw · 2017-02-01 06:40 · Score: 4, Informative

The bathtub curve is certainly real, but most drives aren't kept in service long enough to see the far wall - Google's study only goes to 5 years, for example. As failure rates start climbing you tend to replace the lot of them, rather than keep them in service until you reach 50% failure/year. (Note that the PDF you linked does show high infant mortality for drives in heavy use.)
I used to work with very old HDDs, though, and even with a busy used market, the supply of old drives would fall off a cliff at a certain point. When everyone is seeing 50% failure/year, it doesn't take long until spares just can't be found.
(If you're curious why anyone would put up with that sort of thing - the software that works only works on a machine old enough that only very old drives can attach to it. And since demand at the time was maybe 1% of the peak, you'd be using old drives until about 90% ever made had failed.)

--
Socialism: a lie told by totalitarians and believed by fools.
Model numbers much more important than brand name by raymorris · 2017-02-01 07:14 · Score: 3, Insightful

Looking at data from both Backblaze and Google, what's apparent to me is that all brands have some good models and some bad. Google made sure to point that out in their report. Something like "the most reliable model and the least reliable model are the same brand. While reliability is somewhat consistent within samples of the same model, there is little to no correlation between any brand name and reliability".
In other words, these studies show that HGST Model #12345678 is a good drive. They don't show that HGST (or any other company) consistently makes good drives.
Re:In other news - in 2062 they will have time tra by ShanghaiBill · 2017-02-01 07:42 · Score: 2

> I can believe the length of the bathtub curve is much longer than the useful life of the disk drive.
The "bathtub curve" has two ends. Neither end is valid for HDDs. If a HDD spins up and formats, then it is no more likely to suffer an "early death" in the first few months than it is to fail in subsequent months. Likewise, the sharp rise in failures after 3-4 years doesn't appear valid. There is certainly a rise, but it is not that sharp. Also, HDD failure is more strongly correlated with cumulative spin time than with calendar age.

> It's real in board products, cars and silicon.
These are different issues. Boards often fail because electrolytic capacitors dry out, and less often because of tin whiskers. Those are both aggravated by age and heat. I am not a "car guy" so I don't want to comment on that. I very much disagree that there is a "bathtub" failure rate for actual silicon (rather than chip to chip connections). I have 40 year old TTL chips that still work just fine.
Re:Model numbers much more important than brand na by AmiMoJo · 2017-02-01 08:46 · Score: 2

Hitachi is by far the most consistent though. With other brands a few models have really high failure rates, while Hitachi varies from excellent to just very good. They seem to test their designs much better, and of course you pay for that.
I'd be interested to see a comparison of SMART data between models and manufacturers too. I strongly suspect that some are much better at warning you of impending failure than others.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
That would be interesting. Some *different* by raymorris · 2017-02-01 08:59 · Score: 2

> I'd be interested to see a comparison of SMART data between models and manufacturers too. I strongly suspect that some are much better at warning you of impending failure than others.
That *would* be interesting. That might be more consistent, with some manufacturer normally providing good data. Of course no matter what SMART does, if a model with 5 platters is subject to catastrophic failure, SMART can't do anything about that.
People who know more than I about hard drives have written that different manufacturers calculate SMART data *differently*, so to make predictions based on SMART, you need to know how to interpret the data from that specific manufacturer. HGST data may not be *better* or *worse* than Toshiba, just different, so if you use Toshiba drives you want to know how to understand Toshiba data.
Example: 5 platters vs 1 platter, same manufacture by raymorris · 2017-02-01 09:04 · Score: 3, Interesting

P.S. A clear example of this is that all manufacturers make drives with different numbers of platters. A drive with 5 platters is FAR more likely to fail than a drive with 1 platter. They may be made by the same manufacturer, but the 5-platter model is at least 5 times as likely to fail (platters interfere with each other).
Even Backblaze warns these numbers mean little by Leslie43 · 2017-02-01 10:48 · Score: 2, Insightful

Even Backblaze warns these numbers shouldn't really be used by the average consumer to justify their drive purchases, and for very good reasons.

The numbers lie.
They lie because you don't use drives in the manner that they do. Backblaze starts a pod, it fills with data, and then it primarily sits IDLE from that point on. In other words, they fire it up, it does a ton of writes, and then it does nothing, whereas your drives write, read, erase, spin up, and spin down constantly. Your drives sit in a box that may be in a warm closet, lack air flow, or sit by your feet getting bumped all of the time.
Re:Check out the Google reports. 5 platter drives by adolf · 2017-02-01 13:06 · Score: 2

This can't be repeated often enough. The more complicated a thing is, the more likely it is to fail. Sometimes, it's a linear relationship.
I remember buying some early PoE switches about a decade ago. I needed 48 ports total. The 48-port model was about exactly twice as expensive as a 24-port, and had exactly half of the MTBF rating.
Based on this, I bought two 24-port switches. The net MTBF of the system was still halved to be the same as a singular 48-port switch (because the complexity was doubled) but I reasoned that it would fail modularly instead of absolutely, and then could be repaired modularly.
Hard drives are no different. You can count platters if you want, but I think the real factor is the number of heads. More heads on the stack == more chances for things to go south, and more work for the actuator to do.
Odd head counts are actually fairly common. It's entirely possible to have a 2-platter drive with 3 heads, or a 4-platter drive with 7 heads. Usually, this is due to yields and bin-sorting: One side of a platter might have a defect, where the other is perfectly fine.

--
Kid-proof tablet..
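
The switch arithmetic above follows from one rule: if every part has a constant failure rate and any single failure counts as a system failure, the rates add. A minimal Python sketch, with hypothetical MTBF numbers and an invented function name:

```python
from typing import Iterable


def series_mtbf(component_mtbfs: Iterable[float]) -> float:
    """MTBF of a system that counts as failed when ANY component fails.

    Assumes independent components with constant failure rates, so the
    system failure rate is the sum of the component rates (1 / MTBF each).
    """
    return 1 / sum(1 / mtbf for mtbf in component_mtbfs)


# Hypothetical ratings: a 24-port switch rated at 200,000 hours.
two_small_switches = series_mtbf([200_000.0, 200_000.0])
print(f"two 24-port switches: {two_small_switches:,.0f} hours until the first failure")

# The same rule applied to heads: five identical heads versus one.
five_heads = series_mtbf([1_000_000.0] * 5)
print(f"five heads at 1M hours each: {five_heads:,.0f} hours")
```

The first failure arrives just as soon with two small switches as with one large one, which matches the comment; the gain is that only half the ports go down when it does. The model treats parts as independent, so it cannot capture the claim elsewhere in the thread that platters interfere with each other.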
Re:Check out the Google reports. 5 platter drives by the_B0fh · 2017-02-02 04:03 · Score: 2

I use ZFS, with RAID-Z3, so, I personally go for the cheaper stuff. As long as I replace the drives when it fails, I'm good. Unless 3 of them fail at the same time.
But that's why I did a 1-2 month burn-in before I deployed my last set of disks.
Hopefully with scrubbing and automatic SMART by raymorris · 2017-02-02 05:27 · Score: 2

I do similar, using LVM and mdadm. I've found it works well. Reliability is much increased by a) automatic monitoring of SMART data which warns me of impending failures via email and b) weekly scrubbing, checking that all blocks are consistent.
> As long as I replace the drives when it fails
The above monitoring and scrubbing lets me replace drives shortly BEFORE they fail, and mostly ensures that the remaining disks don't have hidden errors. A rebuild is intensive, so it can certainly cause a "working" disk to fail at the worst possible time, if you're not verifying the health of those "working" disks weekly.
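
The SMART half of that routine can be sketched in a few lines of Python. This is an illustration, not the poster's actual setup: it assumes smartmontools is installed, that the script runs with enough privilege to query the device, and that the health line matches the wording `smartctl -H` normally prints for ATA or SCSI drives. The function names and the `/dev/sda` path are placeholders.

```python
import re
import subprocess
from typing import Optional

# Health summary lines as printed by `smartctl -H` (ATA and SCSI wording).
_HEALTH_LINE = re.compile(
    r"^SMART (?:overall-health self-assessment test result|Health Status):\s*(\S+)",
    re.MULTILINE,
)


def parse_smart_health(report: str) -> Optional[bool]:
    """Return True if the drive reports healthy, False if not,
    and None if no health line was found in the report text."""
    match = _HEALTH_LINE.search(report)
    if match is None:
        return None
    return match.group(1) in ("PASSED", "OK")


def read_smart_health(device: str) -> Optional[bool]:
    """Run `smartctl -H` on a device and interpret its health line.

    smartctl's exit status is a bitmask, so a nonzero status is not
    treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return parse_smart_health(result.stdout)


if __name__ == "__main__":
    sample = "SMART overall-health self-assessment test result: PASSED\n"
    print(parse_smart_health(sample))
    # On a real machine (placeholder device path):
    # print(read_smart_health("/dev/sda"))
```

The overall health flag is a coarse signal. As noted earlier in the thread, vendors compute individual SMART attributes differently, so per-attribute thresholds need to be tuned per drive model. For the scrubbing half, Linux md arrays accept a `check` request written to the array's `sync_action` file in sysfs, and many distributions already schedule this periodically.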
Re:In other news - in 2062 they will have time tra by myowntrueself · 2017-02-02 06:10 · Score: 2

> Based on my own experience as a long term sysadmin, the quality and longevity of drives go up and down. Late 1990s drives were bad, early 2000s were good, late 2000s were bad, early 2010s were good, and now it's pretty bad again. It's not just vendor specific, because vendors seem to adjust to each other to arrive at common price/quality point. Sure, there are exceptions, like the Deathstars, but overall, I think the drives tend to be similar in longevity not so much based on brand, but what generation they are.
I agree. And even within a vendor and within a model there can be huge variation. I worked on a server farm where we had hundreds of 'identical disks' (same make, model and vendor) except some were made in Hungary and some were made in Thailand. The Thailand disks were failing at an enormous rate.

--
In the free world the media isn't government run; the government is media run.