Slashdot Mirror


Annual Hard Drive Reliability Report: 8TB, HGST Disks Top Chart Racking Up 45 Years Without Failure (arstechnica.com)

Online backup solution provider Backblaze has released its much-renowned annual hard drive reliability and failure report. From a report on ArsTechnica: The company uses self-built pods of 45 or 60 disks for its storage. Each pod is initially assembled with identical disks, but different pods use different sizes and models of disk, depending on age and availability. The standout finding: three 45-disk pods using 4TB Toshiba disks, and one 45-disk pod using 8TB HGST disks, went a full year without a single spindle failing. These are, respectively, more than 145 and 45 years of aggregate usage without a fault. The Toshiba result makes for a nice comparison against the drive's spec sheet. Toshiba rates that model as having a 1-million-hour mean time to failure (MTTF). Mean time to failure (or mean time between failures, MTBF -- the two measures are functionally identical for disks, with vendors using both) is an aggregate property: given a large number of disks, Toshiba says that you can expect to see one disk failure for every million hours of aggregated usage. Over 2016, those disks accumulated 1.2 million hours of usage without failing, healthily surpassing their specification. [...] For 2016 as a whole, Backblaze saw its lowest ever failure rate of 1.95 percent. Though a few models remain concerning -- 13.6 percent of one older model of Seagate 4TB disk failed in 2016 -- most are performing well. Seagate's 6TB and 8TB models, in contrast, outperform the average. Improvements to the storage pod design that reduce vibration are also likely to be at play.

114 comments

  1. HGST nearly always on top by Anonymous Coward · · Score: 5, Insightful

    Every time Backblaze publishes a report the HGST drives always come out on top.

    It's a little more expensive to fill your NAS with them but in my experience it's been worth it.

    1. Re:HGST nearly always on top by rthille · · Score: 1

      Maybe, but apparently not at BackBlaze's scale. The higher failure rate of the Seagates is offset by the lower price.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    2. Re:HGST nearly always on top by VernonNemitz · · Score: 3, Funny

      Here's a wild idea that might extend disk-drive lifespan even more. All the drive-spindles/axles in one of those pods should be aligned parallel with the Earth's rotation axis (details in the link).

    3. Re:HGST nearly always on top by Anonymous Coward · · Score: 1

      It is speculated that the gyroscopic effect such an alignment would have, deployed on a massive scale, could actually shorten the length of the earth's day noticeably.

    4. Re:HGST nearly always on top by OverlordQ · · Score: 2

      That effect is so inconsequential as to be practically zero.

      --
      Your hair look like poop, Bob! - Wanker.
    5. Re:HGST nearly always on top by JustNiz · · Score: 1

      Just quickly scanning the figures rather than doing any actual math *cough*, it seems that WD are actually the worst brand for reliability.

    6. Re:HGST nearly always on top by Anonymous Coward · · Score: 1

      Well if that allowed us to avoid adding leap seconds every few years, perhaps it would be worth it.

    7. Re:HGST nearly always on top by BenJeremy · · Score: 3, Insightful

      Seagate blows. I've got a lot of hard drives - Toshibas, Hitachis, WDs, Seagates.... I have exactly three Seagates out of 12 that are currently working.

      On the other hand, I've bought up some "refurb" Hitachis (server pulls with 20k hours) and they just work.

      Seagate hasn't made a quality drive since they bought up Maxtor and, apparently, dumped all of their factories, QA people and engineers in favor of Maxtor's. It's the only explanation I can think of for the nosedive in quality.

      It's rather ironic that the former "Deathstar" line is more reliable than Seagate these days

    8. Re:HGST nearly always on top by Gr8Apes · · Score: 1

      Perhaps, but how much is your time worth to fix the data issues from failed drives? I personally haven't touched a Seagate since their terrible >1 TB drives came out. I personally have 3 of those that failed within a year of purchase. It wasn't even worth exchanging them, I just replaced them with WD/Hitachi and never looked back.

      --
      The cesspool just got a check and balance.
    9. Re:HGST nearly always on top by dgatwood · · Score: 2

      Seagate was bad before they bought Maxtor. There was one year when I had a 100% failure rate of every Seagate drive that I had bought within the past year, consisting of various drives of various sizes, some laptop, some desktop, some internal, some external.

      If anything, my experience with Maxtor drives led me to hope that Seagate would actually improve after the merger, because Maxtor's quality seemed better to me than the level Seagate had fallen to over the few years leading up to the merger. And IIRC, the Seagates that were rebadged Maxtor drives had a considerably lower failure rate than the non-rebadged drives.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    10. Re:HGST nearly always on top by drinkypoo · · Score: 3, Funny

      I grew up in Santa Cruz, which meant I was very near to Seagate. One of the most popular local BBSes was run by a Seagate engineer. A lot of cheap used Seagate disks mysteriously found their way onto the local market, and you could generally get ST-506 disks for $1/MB, which was exciting and astounding at the time. Back then, Conner was pure crap, Maxtor was interesting and decent, WD or Quantum seemed to be the best, and even locally we called Seagate "Seizegate" and everyone learned to take their drives out of their PC and whack them just so in order to free them up from stiction... approximately weekly. Just one or two good temperature cycles without turning on the system was sometimes enough to make them freeze.

      These disks weren't marked in any way as engineering samples, though I suppose that's not impossible. But it wasn't just me. It was a fairly universal experience.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    11. Re:HGST nearly always on top by Anonymous Coward · · Score: 1

      You read my mind. Mod parent Up. Leap seconds suck.

    12. Re:HGST nearly always on top by Jason+Levine · · Score: 1

      I bought a Seagate 3TB drive awhile back and it's currently not being recognized by my computer all the time. It gets noticed, works for a bit, and then dies. I have everything backed up in other locations, but I'm going to buy another (non-Seagate) drive to replace it with before it goes completely.

      --
      My sci-fi novel, Ghost Thief, is now available from Amazon.com.
    13. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      That's always been my experience. Higher price tag and they're the only ones that have completely failed from both my own experiences and those of others I know.

      Some 320GB Seagates definitely had some trouble and I got them refurbished by Seagate for free—they've been running just dandy since then. I haven't had any issues with any of their 2TB and 4TB drives.

    14. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      "It is speculated [...]" by people who don't understand about conservation of angular momentum.

    15. Re:HGST nearly always on top by flappinbooger · · Score: 1

      I've had some Seagate Constellations fail. I've had some WDs fail. Never a WD black though... But those aren't supposed to be used in RAID even though I have used one in a RAID in a pinch.

      When I buy a mechanical HD I use WD black. Shrug.

      --
      Flappinbooger isn't my real name
    16. Re:HGST nearly always on top by Aereus · · Score: 1

      You get what you pay for (usually) I think. Been running several WD RE3 and RE4 drives for 5+ years now and never had any issues with them.

    17. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      Seagate has sucked since they bought Conner! Maxtor ALWAYS sucked from day 1!

      Experience has taught me that all mechanical devices fail sooner or later. Later is always better though. Western Digital and Hitachi drives have proved to last longest and to be the most reliable in my experience. I have been using computers since the DOS 3.3 days.

      Experience has also taught me that multiple backups with at least one off-site are a necessity! I use a combination of USB hard drives (only connected while doing twice a month backups, Western Digital Sata drives in USB enclosures), and flash drives (64GB) for both twice a month backups, and daily file changes, additions/edits.

    18. Re:HGST nearly always on top by the_B0fh · · Score: 1

      What's really ironic is that Seagate bought HGST and now makes HGST drives too.

    19. Re:HGST nearly always on top by ASCIIxTended · · Score: 2

      We've used thousands of RE4 and hundreds of RE3 drives over the last decade and have had very few failures (15). We run them 24/7 but they do not get worked very hard - storing a very small amount of data at a 15 minute interval. We always use them in mirrored arrays. All but three failures have been within the first day of using them, with one being DOA.

      --
      I do not belong to the church of the lowercase 'i'
    20. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      This is incorrect. HGST is owned by Western Digital, not Seagate.

    21. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      The maxtors from just before it was bought seem pretty good, slow but reliable.

    22. Re:HGST nearly always on top by adolf · · Score: 3, Informative

      The thing about anecdotes like this is that the sampleset is always very small, so it's not a very meaningful datapoint.

      I have several 8-year-old 250 gig Seagate 7200.10 drives that refuse to die. I took them out of service a few weeks ago.

      At work, I've seen a lot of Seagates die in NVRs. But then, the NVRs were all mostly populated with 2TB Seagates from day 1 -- so of course I'm going to see a lot of them die. I also see plenty of 2TB WD drives bite the dust. I've never seen an HGST drive die in an NVR, but then none of them have HGST drives installed....

      Statistically, from my perspective, it's about a wash. But that doesn't really mean anything, because again my own sampleset is very limited in scope compared to Backblaze or Google.

      Meanwhile, IBM at one point was making some absolutely stellar hard drives. Their 9ES SCSI drives were the bee's knees at the time, and were resoundingly reliable. It's hard to characterize a brand of hard drive -- some models are good, and some are bad, from just about any manufacturer.

      (Except for Quantum. Quantum was never good. And Miniscribe, because fuck those crooks.)

    23. Re:HGST nearly always on top by SharpFang · · Score: 1

      I wouldn't be so sure. Take a running hard disk in your hand and tilt it this way and that. The gyroscopic force is surprisingly strong - it's well beyond "perceptible" - it actually requires some effort to fight it. Sure if you distribute the tilt over 24 hours it's much weaker, but then add that up over many years and it's no longer negligible.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    24. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      Well if you read the actual stats you'd know that WD blows. Your brand based anecdote isn't presented in the data from a company running 71,939 hard drives. How you got 5 Insightful is beyond me

    25. Re:HGST nearly always on top by martinfb · · Score: 2

      I second this sentiment. HGST drives have been the best I have ever used.

      --


      Self-importance and self-indulgence is the root of ALL evil.
    26. Re:HGST nearly always on top by haruchai · · Score: 1

      In 4 years in a 1000+ user environment, Seagate Barracudas were the #1 replaced drives, at least 2x more than WD or Seagate

      (Except for Quantum. Quantum was never good....

      By the time they were absorbed by Maxtor, yeah, they weren't ever going to be great again.
      But, holy shit, the Quantum Fireball 1Gbyte IDE was awesome. Far quicker than anything else at the time and mine lasted 6 years. For several years Boot Magazine, now Maximum PC, used almost nothing else in their testing rigs

      --
      Pain is merely failure leaving the body
    27. Re:HGST nearly always on top by adolf · · Score: 1

      You know, I think I do remember the 1GB Fireball being a trooper. But that takes a back-seat alongside my own ancient Seagates, as far as meaningless anecdotes go. There was also a time when the Fireball name was taken derogatively (like Deathstar) because the drives turned terrible, and then came the [fucking] Quantum Bigfoot abomination.

      Which, again, just backs up what I said. I'm going to have to modify my stance a bit, but I'm still on the same soapbox: Your sampleset is good, but without a proper study or a basis for comparison, it's just anecdotes and confirmation bias and a story.

      Mostly I've replaced Seagates professionally, too, but then most of the units I work with were populated entirely by Seagates from the factory.

      I always replace them with low-end WD Purple drives because the speed is adequate and the caching algorithm is allegedly optimized for this use-case.

      Eventually, the Seagates will all be gone and 100% of the drives I replace from then on will be WD (although if MTBF is to be believed, some of these Seagates will outlive me by a long shot).

      If most of the cars on the road are Fords, then most shops will see Ford cars needing repair.

      I've had an old BMW daily driver for over a decade. People say that they're expensive to work on (they aren't), and that they're unreliable (mine isn't). Exceptions: Small-town mechanic sees a fancy-pants red BMW roll up and starts thinking about "boat payments." Mechanic then realizes that his local Autozone rep can't get many parts for it, so they assume that they'll be paying someone to drive an hour to the nearest dealer for whatever it needs. The price (and therefore perceived unreliability) begins to multiply.

      And so, folks become biased about it.

      A smart, good mechanic (I know exactly one) can figure out how to get quality, OEM (literal OEM, not OEM-like) parts rather cheaply. But what I usually do when I get in over my head working on it myself, I chat with my mechanic about it on the phone for a bit, I order the parts myself (saving him the hassle of working out of his network), and he simply charges me his hourly shop rate to do the work. I don't quibble over the bill. My mechanic is, to me, unimpeachable.

      And so, I'm biased differently than a lot of other folks.

      None of us are right. We all have our stories, but that doesn't make us right. It just means that we have stories. We believe them because they're true -- after all, we were there -- but that doesn't mean that anyone else needs to place any value on them. Be it my own Seagate stories, or my long-winded car analogy, it's just a story like any other.

      And there's nothing wrong with that, I suppose, as long as we take them at face value.

    28. Re:HGST nearly always on top by Anonymous Coward · · Score: 0

      Basically, if you want to buy a disk for your own desktop, where a single failure means a lot of restoration problems, then buy HGST.
      Right now my desktop has one 1.5TB Seagate that is already showing SMART failures and another 4TB Hitachi NAS drive that is happily churning away. I will not buy Seagate desktop drives anymore if I can help it.

  2. In other news - in 2062 they will have time travel by sinij · · Score: 0

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    Is aggregate usage even a meaningful metric?

  3. Re:In other news - in 2062 they will have time tra by TechyImmigrant · · Score: 1

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    Is aggregate usage even a meaningful metric?

    It tells you the MTBF for right now, but it's not useful to predict MTTF unless you know the shape of the bathtub curve. It takes a few years to build that curve.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  4. Real article by TypoNAM · · Score: 5, Informative

    Arstechnica just borderline copy&pasting from the source. See the actual article at: https://www.backblaze.com/blog...

    Shame on Arstechnica for not even bothering to link their source material.

    --
    This space is not for rent.
    1. Re:Real article by supercell · · Score: 1

      I would *up vote* this if I had points.

    2. Re:Real article by Walter+White · · Score: 1

      Yes, Why the resistance to publishing the URL to the original article? Does ARS pay kickbacks?

    3. Re:Real article by supercell · · Score: 4, Insightful

      They don't want you to leave their web site, it's that simple.

  5. Re:In other news - in 2062 they will have time tra by enriquevagu · · Score: 1

    Yes, aggregate usage is a meaningful metric, if you know what it defines. MTBF can be tricky; in many cases it is converted to an annualized failure rate (AFR) to obtain a meaningful metric.

    However, it makes no sense to employ metrics based on an exponential distribution model (which does not have memory) to compare different sets of disks. In particular, the summary says 13.6 percent of one older model of Seagate 4TB disk failed in 2016... If such drives are older (and thus present a longer uptime) their age induces a higher failure rate, which is not observed in the model since it only considers the uptime hours in the given year but not the previous ones.
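
    A minimal sketch of that MTBF-to-AFR conversion, assuming a constant (memoryless) failure rate and 8,766 hours per year. The function names are just for illustration, and the inputs are the 1-million-hour rating and the 1.2 million drive-hours quoted in the summary:

```python
import math

HOURS_PER_YEAR = 8766  # assumption: average year length including leap years


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF under an exponential model."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures(mtbf_hours: float, drive_hours: float) -> float:
    """Expected number of failures over the given aggregate drive-hours."""
    return drive_hours / mtbf_hours


if __name__ == "__main__":
    print(f"AFR for a 1M-hour MTBF: {afr_from_mtbf(1_000_000):.2%}")
    print(f"Expected failures in 1.2M drive-hours: "
          f"{expected_failures(1_000_000, 1_200_000):.1f}")
```

    For a 1-million-hour MTBF this works out to an AFR of roughly 0.87%, and about 1.2 expected failures over 1.2 million drive-hours. It says nothing about how the rate changes with drive age, which is exactly the limitation described above.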

  6. The figure is meaningless. by jcr · · Score: 2

    45 years spread over a bunch of drives without a failure doesn't mean that we can expect any individual drive to last 45 years.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
    1. Re:The figure is meaningless. by Anonymous Coward · · Score: 0

      The number of times I've bought new drives because I needed increased capacity far exceeds the number of times I've had to replace a failed drive.

    2. Re:The figure is meaningless. by supercell · · Score: 1

      That number is very misleading, but does contain a valuable metric.

    3. Re:The figure is meaningless. by Anonymous Coward · · Score: 1

      Everyone who buys a drive isn't you.

    4. Re:The figure is meaningless. by pla · · Score: 2

      You are absolutely correct. The trivial counterexample is a device that contains a semi-consumable substance, such as bearings with an oil that slowly dries out; 100% might last a year, even if 0% will last two (not saying that is the case here, but just as a possibility).

      These numbers do, however, suggest that you can expect a very low failure rate of those drives within the first year (less than 2.2%). And realistically, you'll probably get far more than that under similar conditions.
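
      One hedged way to put a number on "very low" is an upper confidence bound for the case of zero observed failures. This is only a sketch: it assumes the drives fail independently with the same one-year failure probability, and the 95% confidence level and the function name are my own choices, not anything from the report:

```python
def zero_failure_upper_bound(n_drives: int, confidence: float = 0.95) -> float:
    """Largest one-year failure probability p still consistent, at the given
    confidence, with seeing 0 failures among n_drives observed for a year.

    Solves (1 - p) ** n_drives = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_drives)


if __name__ == "__main__":
    for n in (45, 135):  # one 45-disk pod, and three 45-disk pods
        print(f"{n} drives, 0 failures: p <= {zero_failure_upper_bound(n):.1%}")
```

      Under those assumptions, three pods (135 drives) give a bound of roughly 2.2%, while a single 45-drive pod only gets you to roughly 6.4%. The bound tightens as more failure-free drives are observed.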

    5. Re:The figure is meaningless. by Walter+White · · Score: 1

      The number of times I've bought new drives because I needed increased capacity far exceeds the number of times I've had to replace a failed drive.

      I had to replace a couple 2TB Seagate drives before I needed more capacity. Of course I replaced them with bigger drives but could have gone years with the existing capacity had the drives continued operation without difficulty. This model also had a very high failure rate in older Backblaze reports.

    6. Re:The figure is meaningless. by ilsaloving · · Score: 1

      That is true, and that is why absolutely no one has ever said, ever, that a single drive is guaranteed to last that long. That's like complaining about climate change because your local forecast was off.

      However, the metric does give a good indication of *general* reliability. Namely, HGST kicks ass, Toshiba has really upped their game, and Seagate is still as much of a wankjob as it has been for the past decade or so.

      I wish they'd put more effort into reliability and less into raw capacity. I have personally sworn completely off of Seagate because I have had horrible experiences with the last several that I purchased. The final straw was when I RMA'ed one of their hybrid drives. Not once, not twice, but FOUR times. All within 6 months. I didn't even bother RMAing the last one. I gave up and bought from a different manufacturer instead, and so far it's still running great.

    7. Re:The figure is meaningless. by thegarbz · · Score: 1

      Just because it's not the metric you're after doesn't make it meaningless.

      I for one am not at all interested in the length of time to reach wear-out related failure mechanisms because the mission life of my drives is so short that I don't expect any to actually get to this point.

      I am however interested in infant mortality and the statistically random failure rate. These two have been directly measured by aggregating the failure data of multiple drives over one year and it makes it a far more relevant metric than one drive running 45 years.

    8. Re:The figure is meaningless. by Anonymous Coward · · Score: 0

      How can he be absolutely correct that the figure is meaningless if you found a meaning to the figure? Pick a side and be consistent.

    9. Re:The figure is meaningless. by pla · · Score: 1

      How can he be absolutely correct that the figure is meaningless if you found a meaning to the figure?

      Well, I know this is Slashdot, but some of us can read beyond the subject line... He said, "45 years spread over a bunch of drives without a failure doesn't mean that we can expect any individual drive to last 45 years". That statement is entirely true.

      Going further, most people will, charitably, choose to infer a context that makes sense when reading something that could otherwise seem untrue. If you're in a theater that has "Cool Hand Luke" playing, and yell that title to your friend across the room at the ticket counter - Only a "special" few would choose to interpret that as complimenting the fingers of some guy named Luke.

  7. Re:In other news - in 2062 they will have time tra by sinij · · Score: 2

    And "few years" is approximately half of MTTF, or do we now know enough to determine failure distribution indirectly?

  8. Re:In other news - in 2062 they will have time tra by Archangel+Michael · · Score: 3, Interesting

    The bathtub curve is real, and if you follow BackBlaze tips, they show that years 2-4 are usually exceptional in terms of reliability.

    My recommendation is to buy the NAS/SAN/POD/Whatever and spin it up for 3 months, then put it into production and then wait 42 months. After that, start planning and when the next drive fails in the 42-48 month range, start the purchasing process (depending on lead time needed), get it installed, wait 3 months to get early failures out of the way, then transfer data ... wash/rinse/repeat. You'll get close to five years between purchase and retirement, with a bit of overlap between versions.

    If you have several decks of drives, you can get a reasonable cycle going, and it becomes second nature. Data loss is not an option.

    --
    Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  9. Er... by Anonymous Coward · · Score: 0

    45x3 = more than 145?? Who worked this out?

    1. Re:Er... by TrumpShaker · · Score: 1

      It's the new "Math", get over it or get with it! It's tremendous! More Yuge-er than before!

  10. Re:In other news - in 2062 they will have time tra by TechyImmigrant · · Score: 1

    Once you see the error rate start to rise, it can be effective to fit to the expected curve shape, but not always. Crystal balls are unreliable.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  11. Re:In other news - in 2062 they will have time tra by sinij · · Score: 1

    Interesting, wouldn't the manufacturer be incentivized to minimize early failures, as it would be most expensive for them?

    Also, how do you account for bad batches/production runs, or do they always show up during the initial 3-month period?

  12. 45 years by trb · · Score: 4, Insightful

    Aggregate years are not years.
    "Nine women can't make a baby in one month."

    1. Re:45 years by The-Ixian · · Score: 4, Informative

      "Nine women can't make a baby in one month."

      Well... some people will tell you that babies are created at the time of conception...

      --
      My eyes reflect the stars and a smile lights up my face.
    2. Re:45 years by PRMan · · Score: 1

      So, 9 women can make 9+ babies in a few days.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    3. Re:45 years by sinij · · Score: 2

      No, this is not how it works. You measure pregnancy duration of 9 women, then conclude that if you have a collective of approximately 300 women they will produce 1 baby a day.

      However, 1 baby a day is not a useful metric unless you very carefully manage the process.

    4. Re:45 years by Ant2 · · Score: 2

      Evidence to the contrary. I tried locking up 300 women for more than 2 years, yet no babies were produced.
      I plan to repeat the experiment by adding 1 man.

    5. Re:45 years by zifn4b · · Score: 0

      Aggregate years are not years. "Nine women can't make a baby in one month."

      Wow, any time I point out something like this, it's as if I receive the "-1 for being negative and not believing the marketing hype" moderation. Excuse me for questioning stuff on the internet that sounds suspicious. Maybe I should have wired that money to that Nigerian prince so I could retire comfortably on a deserted island and forget the internet even exists.

      --
      We'll make great pets
    6. Re:45 years by larryjoe · · Score: 1

      Aggregate years are not years.
      "Nine women can't make a baby in one month."

      It depends. The theoretical bathtub curve is real and simply says that there is an intrinsic constant failure rate that is dependent on the system and an assumed constant environment. That constant failure rate component always exists but is added to the effects of early-life failures and aging/wearout failures. The average failure rates due to early-life failures and wearout at any system age are never truly zero, but there is often an in-between period where both are near zero. This is where the constant failure rate becomes evident.

      If the systems are monitored only (or mostly) in this age range that manifests a constant failure rate, then the elapsed time may be aggregated.
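
      A toy sketch of that decomposition, with the hazard rate written as early-life + constant + wearout. Every number and curve shape here is made up purely for illustration (exponentially decaying early-life term, exponentially growing wearout term); none of it comes from Backblaze data:

```python
import math

CONSTANT_RATE = 0.01  # hypothetical intrinsic failures per drive-year


def early_life_hazard(t_years: float) -> float:
    """Hypothetical infant-mortality term that decays quickly with age."""
    return 0.05 * math.exp(-t_years / 0.25)


def wearout_hazard(t_years: float) -> float:
    """Hypothetical wearout term that grows with age."""
    return 0.01 * math.exp(t_years - 6.0)


def bathtub_hazard(t_years: float) -> float:
    """Total hazard rate: early-life + constant + wearout components."""
    return early_life_hazard(t_years) + CONSTANT_RATE + wearout_hazard(t_years)


if __name__ == "__main__":
    for age in (0.1, 1.0, 3.0, 5.0, 8.0):
        print(f"age {age:>4} years: hazard {bathtub_hazard(age):.4f} per drive-year")
```

      In the middle of this toy curve the total sits close to the constant term, which is the age range where adding up drive-hours across many drives is a reasonable thing to do.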

    7. Re: 45 years by Anonymous Coward · · Score: 1

      I will be that man, for science!

    8. Re:45 years by cerberusss · · Score: 1

      if you have a collective of approximately 300 women,

      This is my ultimate dream.

      they will produce 1 baby a day.

      This is my ultimate nightmare.

      --
      8 of 13 people found this answer helpful. Did you?
  13. Re:In other news - in 2062 they will have time tra by Anonymous Coward · · Score: 0

    In another study, Ford took 1,000 cars and ran them for a year without a problem. This translates to all Ford cars will last 1,000 years without a problem.

    Sheesh!

  14. Re:In other news - in 2062 they will have time tra by myowntrueself · · Score: 1

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    Is aggregate usage even a meaningful metric?

    I find it hard to believe. It isn't measuring 45 years worth of things like metal fatigue, material decay or degeneration, wear and tear etc.

    What it's really saying is that early failures are at a very low rate; they've measured lots of disks for a few years and can show that these disks don't typically fail in the first few years of use. Totally different from saying that one of these disks can last 45 years of continuous use. To represent it as that seems like something doomed to litigation.

    --
    In the free world the media isn't government run; the government is media run.
  15. Seagate still sucks. by Virtucon · · Score: 1

    In an always-on server environment these are great numbers, but in power-conserving desktop/home NAS situations I'd love to see CSS numbers. Again though, Seagate still sucks.

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
  16. Re:In other news - in 2062 they will have time tra by ShanghaiBill · · Score: 3, Interesting

    The bathtub curve is real

    This Backblaze report, previous Backblaze reports, and the Google longitudinal disk reliability study, have all found that the "bathtub curve" is a myth. HDDs do not have high early failure rates, nor does the failure rate suddenly rise after a set period of time.

    Another myth that these studies have debunked is that HDDs do better if kept cool. Actually, failure rates are lower for disks kept at the higher end of the rated temperatures. This is one reason that Google runs "hot" datacenters today, with ambient temps over 100F.

  17. Re:In other news - in 2062 they will have time tra by TechyImmigrant · · Score: 1

    >have all found that the "bathtub curve" is a myth

    I don't believe that. I can believe the length of the bathtub curve is much longer than the useful life of the disk drive.

    It's real in board products, cars and silicon.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  19. Re:In other news - in 2062 they will have time tra by TechyImmigrant · · Score: 2, Informative

    In my job we sell tens of millions of each product. We warrant for X years (e.g., 8-10 would be typical for something with a natural replacement cycle of 4-5 years), so we then design the things such that the curve is at a low point at time X. Component aging is heavily modeled and measured so we don't mess up. It would indeed get very expensive if there were lots of early failures. You find bad batches through testing.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  20. Re:In other news - in 2062 they will have time tra by arth1 · · Score: 5, Interesting

    I find it hard to believe. It isn't measuring 45 years worth of things like metal fatigue, material decay or degeneration, wear and tear etc.

    For spinning disks, factors that do not materialize in the first few years and thus can't be determined from an aggregated short time test include (but are not limited to)
    - demagnetization of fixed magnets (leads to write failures)
    - magnetization of paramagnetic materials (leads to bit rot)
    - wear on ball bearings (leads to all kinds of fatal crashes)
    - accretion of and contamination of lubrication (leads to stiction)

    Based on my own experience as a long term sysadmin, the quality and longevity of drives go up and down. Late 1990s drives were bad, early 2000s were good, late 2000s were bad, early 2010s were good, and now it's pretty bad again. It's not just vendor specific, because vendors seem to adjust to each other to arrive at a common price/quality point. Sure, there are exceptions, like the Deathstars, but overall, I think the drives tend to be similar in longevity not so much based on brand, but what generation they are.

  21. As always, YMMV by Anonymous Coward · · Score: 0

    Personally, I get tired of Backblaze numbers. It's great that someone takes the time to compile data but everyone in social settings points to them as a reason to buy/not buy a product. Drive failures will vary due to conditions, who you purchased them from, etc.

    Always run diagnostics before shoving a drive somewhere. Always have redundancy if it must be up 24/7. Always have backups.

    Personally, WD Reds have been my source of pain over the years.

    FYI, Toshiba does not provide diagnostic tools for their drives. If you have or suspect a failure, you must RMA the drive through them.

    1. Re:As always, YMMV by Anonymous Coward · · Score: 0

      I've yet to encounter a Toshiba drive that does not have SMART.

  22. Re:In other news - in 2062 they will have time tra by thegarbz · · Score: 1

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    It won't, but may. The only thing that we know for certain is that ... you don't understand reliability figures.

    The aggregate data shows the combined effect of random failures and infant mortality. It has nothing to do with wear out failures which limit the ultimate life of a drive. These combined effects are also what is most relevant to most HDD use cases, mainly those drives that aren't abused, and don't find a home in some bank's basement where they have to sit for 20 years without fault.

  23. Re:In other news - in 2062 they will have time tra by thegarbz · · Score: 1

    What its really saying is that early failures are at a very low rate;

    Not just early. Early and random failures. When you include both of those, provided your equipment has a relatively short mission time compared to wear-out, it gives you a good indication of how reliable it will be overall in your server.

  24. Re:In other news - in 2062 they will have time tra by lgw · · Score: 4, Informative

    The bathtub curve is certainly real, but most drives aren't kept in service long enough to see the far wall - Google's study only goes to 5 years, for example. As failure rates start climbing you tend to replace the lot of them, rather than keep them in service until you reach 50% failure/year. (Note that the PDF you linked does show high infant mortality for drives in heavy use.)

    I used to work with very old HDDs, though, and even with a busy used market, the supply of old drives would fall off a cliff at a certain point. When everyone is seeing 50% failure/year, it doesn't take long until spares just can't be found.

    (If you're curious why anyone would put up with that sort of thing - the software that works only works on a machine old enough that only very old drives can attach to it. And since demand at the time was maybe 1% of the peak, you'd be using old drives until about 90% ever made had failed.)

    --
    Socialism: a lie told by totalitarians and believed by fools.
  25. Re:In other news - in 2062 they will have time tra by lgw · · Score: 1

    MTBF has little to do with MTTF, Film at 11.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  26. Do you mean Asstechnica* by Anonymous Coward · · Score: 0

    because that's how I've started to think of them.

  27. Re:In other news - in 2062 they will have time tra by Anonymous Coward · · Score: 0

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    Is aggregate usage even a meaningful metric?

    It tells you the MTBF for right now, but it's not useful to predict MTTF unless you know the shape of the bathtub curve. It takes a few years to build that curve.

    Is that a standard tub or a "garden" tub?

  28. Again reinforces that Japanese technology by Anonymous Coward · · Score: 0

    is superior. WD may have bought them up, but it's the same engineering department, thus the highest quality drives. WD's own breed, as we can often see from reports like these, are churning out mechanical turds as ever.

  29. Model numbers much more important than brand name by raymorris · · Score: 3, Insightful

    Looking at data from both Backblaze and Google, what's apparent to me is that all brands have some good models and some bad. Google made sure to point that out in their report. Something like "the most reliable model and the least reliable model are the same brand. While reliability is somewhat consistent within samples of the same model, there is little to no correlation between any brand name and reliability".

    In other words, these studies show that HGST Model #12345678 is a good drive. They don't show that HGST (or any other company) consistently makes good drives.

  30. Do you keep your drives for 45 years each? by raymorris · · Score: 1

    Are you currently using a drive built in the 1970s? The 1980s or 1990s even? If not, you probably don't care about a drive that may last 45 years. You care, probably, about how likely it is that a drive will fail in the ~3 years before you upgrade it.

    1. Re:Do you keep your drives for 45 years each? by Anonymous Coward · · Score: 0

      Are you currently using a drive built in the ... 1990s even?

      Yes. Only as (tiny) backups, but still... yes.

  31. Data seems to mirror my experience mostly by mandark1967 · · Score: 1

    I'm no hosting/cloud service provider. I have a pretty solid rig outfitted with a boot SSD, 6 Mechanical drives, and a BluRay burner.

    I bought six of the 4TB Seagates when they first came out and all but one had died by the fall of last year. I replaced the 4TB Seagates with 8TB HGSTs the first day I could, and all have been working fine since installation (though I did have a problem getting one of the HGST drives recognized by the OS, Win 8.1 Pro x64).

    --
    Sig Follows: "Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." -- Mark Twain
  32. Re:In other news - in 2062 they will have time tra by ShanghaiBill · · Score: 2

    I can believe the length of the bathtub curve is much longer than the useful life of the disk drive.

    The "bathtub curve" has two ends. Neither end is valid for HDDs. If an HDD spins up and formats, then it is no more likely to suffer an "early death" in the first few months than it is to fail in subsequent months. Likewise, the sharp rise in failures after 3-4 years doesn't appear valid. There is certainly a rise, but it is not that sharp. Also, HDD failure is more strongly correlated with cumulative spin time than with calendar age.

    It's real in board products, cars and silicon.

    These are different issues. Boards often fail because electrolytic capacitors dry out, and less often because of tin whiskers. Those are both aggravated by age and heat. I am not a "car guy" so I don't want to comment on that. I very much disagree that there is a "bathtub" failure rate for actual silicon (rather than chip to chip connections). I have 40 year old TTL chips that still work just fine.

  33. Re:In other news - in 2062 they will have time tra by TechyImmigrant · · Score: 1

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    Is aggregate usage even a meaningful metric?

    It tells you the MTBF for right now, but it's not useful to predict MTTF unless you know the shape of the bathtub curve. It takes a few years to build that curve.

    Is that a standard tub or a "garden" tub?

    Neither. It's a rub-a-dub-dub-tub.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  34. Re:In other news - in 2062 they will have time tra by TechyImmigrant · · Score: 1

    If manufacturers do their job, consumers should never see the leading edge. If the HDD study says there's no leading edge then that's good enough for me.

    For modern silicon, electromigration has much less distance to travel than in 40 year old TTL chips. E.g. for a 10 year old chip, the distance to travel is a lot less than 1/4 of the distance to travel in a 40 year old chip that is 4X as old. You won't see the leading edge because manufacturing test is effective.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  35. Re:Model numbers much more important than brand na by Anonymous Coward · · Score: 0

    In other words, these studies show that HGST Model #12345678 is a good drive. They don't show that HGST (or any other company) consistently makes good drives.

    It doesn't matter if it's good when I can't find model #12345678 sold anywhere I search.

  36. Re:Model numbers much more important than brand na by AmiMoJo · · Score: 2

    Hitachi is by far the most consistent though. With other brands a few models have really high failure rates, while Hitachi varies from excellent to just very good. They seem to test their designs much better, and of course you pay for that.

    I'd be interested to see a comparison of SMART data between models and manufacturers too. I strongly suspect that some are much better at warning you of impending failure than others.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  37. Aggregate? by superdave80 · · Score: 1

    one 45-disk pod using 8TB HGST disks, went a full year without a single spindle failing... 45 years of aggregate usage without a fault.

    Oh, yeah? I have a 1,000 disk pod that went a MONTH without a single spindle failing... 83 years of aggregate usage without a fault!
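
    As a rough sketch of where figures like these come from: aggregate usage is just drive count times time in service. The function names below are made up for illustration, and the 1-million-hour MTTF is the Toshiba spec quoted in the summary, applied under an assumed constant failure rate:

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def aggregate_drive_years(num_drives: int, years_in_service: float) -> float:
    """Total drive-years accumulated by a group of drives."""
    return num_drives * years_in_service


def expected_failures(num_drives: int, years_in_service: float,
                      mttf_hours: float) -> float:
    """Expected failure count, assuming a constant rate of 1/MTTF per drive-hour."""
    drive_hours = aggregate_drive_years(num_drives, years_in_service) * HOURS_PER_YEAR
    return drive_hours / mttf_hours


pod_year = aggregate_drive_years(45, 1.0)           # one 45-disk pod for a year
big_pod_month = aggregate_drive_years(1000, 1 / 12)  # 1,000 disks for a month

print(f"45 disks x 1 year    = {pod_year:.0f} drive-years")
print(f"1000 disks x 1 month = {big_pod_month:.0f} drive-years")
# 146 drives (the Toshiba count cited elsewhere in this thread) at a 1M-hour MTTF
print(f"expected failures: {expected_failures(146, 1.0, 1_000_000):.2f}")
```

    By this measure a large fleet run briefly scores as well as a small fleet run for a long time, which is why the number says nothing about wear-out.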

  38. Sample size is too small by Solandri · · Score: 1

    There were only 146 of the Toshiba 4TB drives and only 45 of the 8TB drives. The math implodes when a sample has 0 failures (reflecting the possibility that what you're sampling simply can't fail - I think it's safe to say that's not really possible here). But with these sample sizes, if there had been the minimum number of failures (just 1), the margin of error is:
    • for 4TB with a 95% confidence interval, 1.96 * sqrt [ (1/146) * (145/146) / 146 ] = 0.0134, or +/- 1.3%
    • for 4TB with a 99% confidence interval, 2.58 * sqrt [ (1/146) * (145/146) / 146 ] = 0.0176, or +/- 1.8%
    • for 8TB with a 95% confidence interval, 1.96 * sqrt [ (1/45) * (44/45) / 45 ] = 0.0431, or +/- 4.3%
    • for 8TB with a 99% confidence interval, 2.58 * sqrt [ (1/45) * (44/45) / 45 ] = 0.0567, or +/- 5.7%
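
    The margins above use the normal approximation for a proportion, z * sqrt(p * (1 - p) / n). A minimal Python sketch of that calculation follows; the function name is illustrative, and the drive counts are the ones given in this comment:

```python
import math


def margin_of_error(failures: int, n: int, z: float) -> float:
    """Normal-approximation margin of error for an observed failure proportion."""
    p = failures / n
    return z * math.sqrt(p * (1 - p) / n)


# One hypothetical failure in each group, at the 95% and 99% z-values
for label, n in (("4TB Toshiba", 146), ("8TB HGST", 45)):
    for level, z in (("95%", 1.96), ("99%", 2.58)):
        moe = margin_of_error(1, n, z)
        print(f"{label}, {level} confidence: +/- {moe * 100:.1f}%")
```

    With zero failures p is 0 and the formula returns a margin of exactly 0, which is the sense in which the math breaks down for these samples.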

    These error margins put the actual failure rate (within the confidence interval) well within the range of most other drives tested. So you can't say with confidence that these particular drives with zero failures were the most reliable. (Looking over their data, it does seem HGST drives are statistically more reliable than Seagate and WD drives. Goody for me - I've been a big fan of the IBM/HGST/Toshiba drives ever since they went overboard improving them following the "DeathStar" fiasco, and have been using them predominantly. Sometimes an embarrassing product failure is the best thing for a company.)

  39. Re:In other news - in 2062 they will have time tra by Dogtanian · · Score: 1

    In other news, in 2062 they will have time travel, otherwise how could you possibly know that just-released 8TB drive would last 45 years?

    You know damn well that's unlikely and you're purposefully misunderstanding this.

    It's quite obvious to *anyone* with an ounce of common sense that it refers to an 8TB drive they've been running continuously since 1971. Occam's razor, see?

    --
    "Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
  40. Re:In other news - in 2062 they will have time tra by networkBoy · · Score: 1

    When everyone is seeing 50% failure/year, it doesn't take long until spares just can't be found.

    (If you're curious why anyone would put up with that sort of thing - the software that works only works on a machine old enough that only very old drives can attach to it. And since demand at the time was maybe 1% of the peak, you'd be using old drives until about 90% ever made had failed.)

    At what point do you look at emulation of the system?

    I supported an *old* customer tracking/billing system for a local oper for a while. I was able to move him off the 80286 to a new Pentium 4 (at the time) and was able to tune a QEMU system to support him correctly. Hardest part was supporting the printer (app was hardcoded for a positively ancient HP Laser, or an Oki dot matrix).

    --
    whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
  41. That would be interesting. Some *different* by raymorris · · Score: 2

    > I'd be interested to see a comparison of SMART data between models and manufacturers too. I strongly suspect that some are much better at warning you of impending failure than others.

    That *would* be interesting. That might be more consistent, with some manufacturer normally providing good data. Of course no matter what SMART does, if a model with 5 platters is subject to catastrophic failure, SMART can't do anything about that.

    People who know more than I about hard drives have written that different manufacturers calculate SMART data *differently*, so to make predictions based on SMART, you need to know how to interpret the data from that specific manufacturer. HGST data may not be *better* or *worse* than Toshiba, just different, so if you use Toshiba drives you want to know how to understand Toshiba data.

  42. Example: 5 platters vs 1 platter, same manufacture by raymorris · · Score: 3, Interesting

    P.S. A clear example of this is that all manufacturers make drives with different numbers of platters. A drive with 5 platters is FAR more likely to fail than a drive with 1 platter. They may be made by the same manufacturer, but the 5-platter model is at least 5 times as likely to fail (platters interfere with each other).

  43. Re:Model numbers much more important than brand na by the_B0fh · · Score: 1

    Show me a report where HGST drives are not in the top 20% in terms of reliability.

  44. Re:In other news - in 2062 they will have time tra by lgw · · Score: 1

    At what point do you look at emulation of the system?

    We were doing hardware-assisted emulation for new systems, but old systems were in the field. Eventually they were replaced, but they were expensive enough that we didn't until we finally couldn't get drives.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  45. compare to tires by Anonymous Coward · · Score: 0

    If I put 100 tires on a track, ran them for a year, and got 10,000 miles (a normal year) of wear on each of them,
    it would not mean that any of them would be good for 1 million miles.
    Once you get past the first burn in period, they should all run about the same distance.
    Run them until failure and let me know the MTBF.

  46. Even Backblaze warns these numbers mean little by Leslie43 · · Score: 2, Insightful

    Even Backblaze warns these numbers shouldn't really be used by the average consumer to justify their drive purchases, and for very good reasons.

    The numbers lie.
    They lie because you don't use drives in the manner that they do. Backblaze starts a pod, it fills with data, and then it primarily sits IDLE from that point on. In other words, they fire it up, it does a ton of writes, then it does nothing, whereas your drives write, read, erase, spin up, and spin down constantly. Your drives sit in a box that may be in a warm closet, lack air flow, or sit by your feet getting bumped all of the time.

    1. Re:Even Backblaze warns these numbers mean little by Anonymous Coward · · Score: 0

      It depends how long it takes to fill their pods, but you are right, not much random access.

    2. Re: Even Backblaze warns these numbers mean little by Anonymous Coward · · Score: 0

      Why did I have to scroll so far to find this? Looking at the general failure rates for other drives, there isn't enough data for the HGST or Toshiba drives that they mentioned. Statistics-wise, it is possible that they could be the best or the worst on the list.

  47. Check out the Google reports. 5 platter drives by raymorris · · Score: 1

    Check out the Google reports over the years. I think you'll find they have some good and some bad. Specifically, the more platters a drive has, the greater the chance of failure - regardless of manufacturer.

    If you want to be a fan of any one brand, that's fine, doesn't bother me.

    1. Re:Check out the Google reports. 5 platter drives by adolf · · Score: 2

      This can't be repeated often enough. The more complicated a thing is, the more likely it is to fail. Sometimes, it's a linear relationship.

      I remember buying some early PoE switches about a decade ago. I needed 48 ports total. The 48-port model was about exactly twice as expensive as a 24-port, and had exactly half of the MTBF rating.

      Based on this, I bought two 24-port switches. The net MTBF of the system was still halved to be the same as a singular 48-port switch (because the complexity was doubled) but I reasoned that it would fail modularly instead of absolutely, and then could be repaired modularly.
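
      A minimal sketch of that reasoning, assuming independent switches with constant failure rates, so that the rates add when a failure of either unit counts as a system failure. The 100,000-hour rating is hypothetical, since the real ratings aren't given here:

```python
def series_mtbf(*mtbf_hours: float) -> float:
    """MTBF of a system that is 'failed' as soon as any one component fails.

    Assumes independent components with constant failure rates (rate = 1/MTBF),
    so the system failure rate is the sum of the component rates.
    """
    return 1.0 / sum(1.0 / m for m in mtbf_hours)


MTBF_48_PORT = 100_000.0          # hypothetical rating, in hours
MTBF_24_PORT = 2 * MTBF_48_PORT   # the 24-port model rated at twice that

combined = series_mtbf(MTBF_24_PORT, MTBF_24_PORT)
print(f"one 48-port switch:   {MTBF_48_PORT:,.0f} h")
print(f"two 24-port switches: {combined:,.0f} h until the first one fails")
```

      The combined figure matches the single 48-port unit, but a failure takes out only half the ports.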

      Hard drives are no different. You can count platters if you want, but I think the real factor is the number of heads. More heads on the stack == more chances for things to go south, and more work for the actuator to do.

      Odd head counts are actually fairly common. It's entirely possible to have a 2-platter drive with 3 heads, or a 4-platter drive with 7 heads. Usually, this is due to yields and bin-sorting: One side of a platter might have a defect, where the other is perfectly fine.

    2. Re:Check out the Google reports. 5 platter drives by the_B0fh · · Score: 2

      I use ZFS, with raid Z3, so, I personally go for the cheaper stuff. As long as I replace the drives when it fails, I'm good. Unless 3 of them fail at the same time.

      But that's why I did a 1-2 month burn-in before I deployed my last set of disks.

  48. Re:In other news - in 2062 they will have time tra by adolf · · Score: 1

    But...modern HP LaserJet printers still grok PCL, and Oki still makes dot matrix printers.

    Wouldn't printing have been the easy part?

  49. You can't extrapolate like that. by SharpFang · · Score: 1

    With lower tolerances, more precise engineering, variance drops. Mean time to failure may remain the same, but instead of one disk working a month, and another ten years, you have twenty disks failing within three months of each other, five years from now. Bearings wearing the same, grease drying up at the same rate, springs losing flexibility at the same rate - the date of failure ceases to be a random factor, and becomes highly deterministic. And the fact that not a single disk failed within a year means only that they are performing very similarly - not that they are performing extremely well.
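
    To put numbers on the idea, here is a small sketch comparing two hypothetical drive populations with the same mean lifetime but different spread. The normal distribution and the specific figures (5-year mean, 3-year versus 0.1-year standard deviation) are assumptions chosen for illustration, not measurements:

```python
from statistics import NormalDist

MEAN_LIFE_YEARS = 5.0  # hypothetical mean time to failure for both populations

loose = NormalDist(mu=MEAN_LIFE_YEARS, sigma=3.0)  # wide manufacturing variance
tight = NormalDist(mu=MEAN_LIFE_YEARS, sigma=0.1)  # very consistent drives


def fraction_failed_by(dist: NormalDist, years: float) -> float:
    """Share of the population that has failed by the given age."""
    return dist.cdf(years)


for name, dist in (("loose", loose), ("tight", tight)):
    first_year = fraction_failed_by(dist, 1.0)
    # Share failing within three months either side of the mean lifetime
    near_mean = fraction_failed_by(dist, 5.25) - fraction_failed_by(dist, 4.75)
    print(f"{name}: {first_year:.1%} failed in year one, "
          f"{near_mean:.1%} fail within 3 months of the mean")
```

    In this toy model the tight population shows essentially no first-year failures even though its mean lifetime is no better.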

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  50. Helium by jgotts · · Score: 1

    I checked the specific 8 TB hard drive referenced in the article, and it's helium filled.

    That's not the type of hard drive I'd want to rely on for any more than a few years, at least until they've perfected helium technology.

    Mainly I wonder how they plan on keeping the helium sealed inside the hard drive given that seals degrade over time.

  51. Where do they get 145 from? by zennling · · Score: 1

    3 * 45 is 135?

  52. Is it so hard to link the PRIMARY source? by allo · · Score: 1

    The ars article even seems to lack the link to the primary source.

    So stop linking secondary sources here!

  53. Hopefully with scrubbing and automatic SMART by raymorris · · Score: 2

    I do similar, using LVM and mdadm. I've found it works well. Reliability is much increased by a) automatic monitoring of SMART data which warns me of impending failures via email and b) weekly scrubbing, checking that all blocks are consistent.

    > As long as I replace the drives when it fails

    The above monitoring and scrubbing lets me replace drives shortly BEFORE they fail, and mostly ensures that the remaining disks don't have hidden errors. A rebuild is intensive, so it can certainly cause a "working" disk to fail at the worst possible time, if you're not verifying the health of those "working" disks weekly.

  54. Re:In other news - in 2062 they will have time tra by myowntrueself · · Score: 2

    Based on my own experience as a long term sysadmin, the quality and longevity of drives go up and down. Late 1990s drives were bad, early 2000s were good, late 2000s were bad, early 2010s were good, and now it's pretty bad again. It's not just vendor specific, because vendors seem to adjust to each other to arrive at common price/quality point. Sure, there are exceptions, like the Deathstars, but overall, I think the drives tend to be similar in longevity not so much based on brand, but what generation they are.

    I agree. And even within a vendor and within a model there can be huge variation. I worked on a server farm where we had hundreds of 'identical disks' (same make, model and vendor) except some were made in Hungary and some were made in Thailand. The Thailand disks were failing at an enormous rate.

    --
    In the free world the media isn't government run; the government is media run.
  55. Re:In other news - in 2062 they will have time tra by networkBoy · · Score: 1

    That's exactly what I thought.
    no. Trying to get this thing to print was a *bitch*.

    and when I say hardcoded I mean hardcoded. Parallel port output only, puked on the windows virtual port, total pain in the ass. I honestly think they didn't actually use the printer drivers, but rather bit bashed the port output.

    --
    whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
  56. Deathstars and temperature by DrYak · · Score: 1

    Another myth that these studies have debunked is that HDDs do better if kept cool. Actually, failure rates are lower for disks kept at the higher end of the rated temperatures. This is one reason that Google runs "hot" datacenters today, with ambient temps over 100F.

    Funnily though, I've had very good success with IBM Deskstars (back during the infamous era of "Deathstars" click death) simply by running them cooler.
    Though again, I did only have a few. So I only have anecdotal evidence.
    Maybe I was just lucky to have the few Deskstars that didn't go "Deathstar".
    But maybe thermal management was indeed exceptionally a problem on these old drives.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  57. Re:In other news - in 2062 they will have time tra by adolf · · Score: 1

    MS-DOS didn't have printer drivers. Bit-banging was unusual because MS-DOS did provide a character interface to the printer port (typically as a device called LPT1:, which you can easily parse as the equivalent of /dev/whatever, plus or minus some CP/M-isms).

    But even early Windows releases were half-fucking-decent at capturing LPT1: output and spooling it appropriately for MS-DOS applications, but you said this shit the bed, too.

    That said, doesn't QEMU (and friends) provide a properly-virtualized parallel port -- bit-banging and all?

    (And if not, it should.)

  58. Not server drives by Anonymous Coward · · Score: 0

    The use case for Backblaze is very different from that of NAS users. Pay more for server NAS drives to reduce failures. I learned my lesson losing media files with no warning due to low-level sector failures. SMART is not smart. I am using WD RED drives now in my QNAP with very good results.