8TB Drives Are Highly Reliable, Says Backblaze (yahoo.com)

Posted by BeauHD on Tuesday August 2, 2016 @10:10AM from the data-loss dept.

An anonymous reader writes from a report via Yahoo News: Cloud backup and storage provider Backblaze has published its hard drive stats for Q2 2016. Yahoo News reports: "The report is based on data drives, not boot drives, that are deployed across the company's data centers in quantities of 45 or more. According to the report, the company saw an annualized failure rate of 19.81 percent with the Seagate ST4000DX000 4TB drive in a quantity of 197 units working 18,428 days. The next in line was the WD WD40EFRX 4TB drive in a quantity of 46 units working 4,186 days. This model had an annualized failure rate of 8.72 percent for that quarter. The company's report also notes that it finally introduced 8TB hard drives into its fold: first with a mere 45 8TB HGST units and then over 2,700 units from Seagate crammed into the company's Backblaze Vaults, which include 20 Storage Pods containing 45 drives each. The company moved to 8TB drives to optimize storage density. According to a chart provided in the report, the 8TB drives are highly reliable. The HGST HDS5C8080ALE600 worked for 22,858 days and only saw two failures, generating an annualized failure rate of 3.20 percent. The Seagate ST8000DM002 worked for 44,000 days and only saw four failures, generating an annualized failure rate of 3.30 percent." For comparison, Backblaze's reliability report for Q1 2016 can be found here.

UPDATE 8/2/16: Corrected Seagate Model "DT8000DM002" to "ST8000DM002."

25 of 209 comments

Yeah, but... by by (1706743) · 2016-08-02 10:13 · Score: 5, Funny

...they use helium in the drives, so all your music sounds like Alvin and the Chipmunks.
Re:Reliability by rthille · 2016-08-02 10:28 · Score: 2

OTOH, given SSDs and the inability to guarantee the erasure of all data on the drive, unencrypted data should never hit the drives at all, and the key should of course also never be stored on the same media (unencrypted).
That said, only my newer systems use encrypted volumes. My old drives I take apart and shatter/melt the platters.

--
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
High failure rate by JustAnotherOldGuy · 2016-08-02 10:44 · Score: 5, Insightful

"...the company saw an annualized failure rate of 19.81 percent with the Seagate ST4000DX000 4TB drive"
A failure rate of almost 20% in a data center? Geez, that's pathetic.
A temperature-controlled environment, clean power, low shock and vibration, and 1 out of 5 still fails? Remind me never to buy Seagate. Oh, wait, I already vowed never to buy another Seagate- about 10 years ago after experiencing their unequaled propensity to die fast and hard.
Maybe other people have had better luck with Seagate than I have, but for me they've always been disappointing.

--
Just cruising through this digital world at 33 1/3 rpm...
1. Re:High failure rate by radarskiy · 2016-08-02 13:01 · Score: 2
  
  If you wrote off every manufacturer that hit a 20% annualized failure rate you would now be unable to buy any drives.
2. Re:High failure rate by waveclaw · 2016-08-02 13:44 · Score: 4, Informative
  
  If only the Backblaze pods were even remotely like other datacenter equipment. As far as vibration is concerned, they are still pretty much a torture test for anything with a spinning motor: minimal vibration protection, mechanically coupled to a weak foundation, and crammed in as tightly as geometry allows.
  
  > A temperature-controlled environment, clean power, low shock and vibration, and 1 out of 5 still fails
  The density and structure of a pod is only temperature-controlled in that it is going to get hot, quickly.
  
  > Remind me never to buy Seagate.
  From the Backblaze numbers you'll actually see that you shouldn't buy one particular desktop model of hard drive for your "datacenter." Numbers like the ones Backblaze releases are fascinating because you can analyze them and find which models from any vendor to prefer or avoid.
  
  > Oh, wait, I already vowed never to buy another Seagate- about 10 years ago after experiencing their unequaled propensity to die fast and hard.
  Sorry to hear about your loss. I hope you kept backup copies. If not, I hope it taught you that if you don't have a copy then you don't have a backup.
  It is certainly reasonable to avoid a vendor when a lot of their products from many lines have defects at a given time. Seagate's desktop line certainly took a hit from the initial Backblaze numbers. The DM1000's huge failure rate is almost as legendary as the IBM Death Star line or Maxtor click-of-death. But stuff from before or after a given run may have better or worse quality. And of course even manufacturers can get batches of bad parts. (Hidden variables like that are one of the reasons why the singular of data isn't anecdote.)
  I also wonder if we'll ever get numbers from Backblaze on things like the actual temperature, decibels and power these drives lived through. More than just avoiding a particular model. It would be nice to know how hot, loud and nasty you can get before your commodity-class storage starts pooping out.
  
  --
  
  "You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
3. Re:High failure rate by brianwski · 2016-08-03 05:59 · Score: 2
  
  Brian from Backblaze here.
  
  > I also wonder if we'll ever get numbers from Backblaze on things like the actual temperature ... power these drives lived through.
  
  The raw data dump includes drive temperatures as reported by "smartctl". You can find a dump here: https://www.backblaze.com/b2/h...
  
  We analyzed the failures correlated with temperature in this blog post in 2014: https://www.backblaze.com/blog...
  
  In a conversation with some of the Facebook Open Storage people, they said hard drives have increased failure rates at extremely high temperatures but our drives never get anywhere NEAR the temperatures required to cause failures. We monitor every drive for temperature, taking readings once every 2 minutes, and we have had situations where the drive temperatures caused our internal warning alerts to go off (well below those catastrophic levels Facebook saw failures at). When we go to investigate, the most common cause of rising pod drive temperature is that some of our fans in that pod have died. We used to have 6 gigantic fans to keep it cool, but we reduced it to 3 with no increase in drive temperature. If one of the fans dies it doesn't get warm enough to set off any alerts, but if 2 out of 3 fans die it can't move enough air to keep the pod within reasonable operating temperatures. We don't monitor the fans directly, but drive temperature has been such a good proxy for it we don't feel any pressing need to figure out how to monitor the fans.
4. Re:High failure rate by brianwski · 2016-08-03 06:19 · Score: 2
  
  Brian from Backblaze here.
  
  > I think their pods only have GigE interfaces
  
  Originally (up until 3 years ago) that was true, but all new pods have 10 GbE interfaces, and 100% of the pods in our "Backblaze 20 pod Vaults" have 10 GbE interfaces. And there are some really strange (and wonderful) performance twists on using 20 pods to store each file: when you fetch a 1 MByte file from a vault, we need 17 pods to respond each supplying only 60k bytes to reassemble the complete file from the Reed Solomon. So the actual bandwidth when fetching just one medium size file can reach more like 170 Gbit/sec theoretical bandwidth. However, if you tried to fetch ALL the files from a pod all at once, the raw 7200 RPM drive performance is our current limiting factor.
  
  Here is a link to a blog post on the 20 pod Backblaze Vault architecture: https://www.backblaze.com/blog...
  
  Here is a link to the Reed Solomon encoding we open sourced that we use on the 20 pod Vaults: https://www.backblaze.com/blog...
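
The arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch of the numbers quoted there, not Backblaze's code: it assumes "1 MByte" means 2**20 bytes, and the constant and function names are made up for illustration.

```python
FILE_BYTES = 1024 * 1024   # "1 MByte", taken here as 2**20 bytes (assumption)
PODS_PER_VAULT = 20        # pods in one vault, per the comment
PODS_NEEDED = 17           # pods that must respond to rebuild a file, per the comment
LINK_GBIT = 10             # one 10 GbE interface per pod, per the comment


def shard_bytes(file_bytes: int, data_shards: int) -> int:
    """Bytes each responding pod supplies, rounded up to a whole byte."""
    return -(-file_bytes // data_shards)


def aggregate_gbit(pods: int, link_gbit: int) -> int:
    """Theoretical combined bandwidth if every responding pod saturates its link."""
    return pods * link_gbit


print(shard_bytes(FILE_BYTES, PODS_NEEDED))      # about 60k bytes per pod
print(aggregate_gbit(PODS_NEEDED, LINK_GBIT))    # 170 Gbit/sec theoretical
print(PODS_PER_VAULT - PODS_NEEDED)              # pods that can be missing on a read
```

The last line is just 20 minus 17: a read still succeeds with up to that many pods not answering, which is the point of spreading each file across the vault.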
Re:Reliability by Jahoda · 2016-08-02 10:53 · Score: 2

You know what? You're absolutely right and I do stand corrected - I recall this about the 3 TB - probably from Backblaze's data - and I want to say that I think they were first hitting the market after the Thailand disaster? It seems like the 4 TB models are pretty resilient. Anecdotally, I have 8 handling my home library and backups, and have had no failures since I started buying them in March 2013.
Re:Correct Seagate 8 TB Model by Anonymous Coward · 2016-08-02 10:54 · Score: 5, Funny

I've found one! The mythical Slashdot editor who edits. All hail the editing editor!
Re:More that HGST are reliable by gweihir · 2016-08-02 10:57 · Score: 2

More like Seagate 8TB not being utter trash (like so many other Seagate drives).

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:comment by cheater512 · 2016-08-02 11:13 · Score: 4, Insightful

If you've got 3,000 drives at home to come up with directly home applicable numbers, then please share them.
This is mostly useful to compare models vs models as the environment is kept the same.
It's completely legitimate to say model X is more reliable than model Y; it's not valid, however, to say model X has a Z% failure rate in a home environment.
Re:Reliability by PCM2 · 2016-08-02 11:22 · Score: 2

Reliability is not so great an issue with raid systems being what they are today.

At the scale Backblaze is talking about, I would say it's an issue. Somebody has to keep all those drives in stock and walk back to a cage to replace them. It's not data loss we're worried about here, it's costs.

--
Breakfast served all day!
22,858 and 44,000 days?!? by Linsaran · 2016-08-02 11:32 · Score: 4, Funny

I presume there's some detail I'm missing here since we did not have 8 TB hard drives 120 years ago.

--
In a bit of shameless internet panhandling, I accept Litecoin Donations at Lbd2oH9QsthD1GfuUXPyka12YxvWJYnBVf
1. Re:22,858 and 44,000 days?!? by sexconker · 2016-08-02 11:47 · Score: 2
  
  Drive days.
  1000 drives for 44 days each would get you 44000 drive days.
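
For anyone else puzzled by the day counts, here is a minimal sketch of the drive-day arithmetic and of an annualized failure rate computed from it. It assumes the usual definition, failures divided by drive-years with 365 days per year; the function names are invented for illustration and are not taken from the Backblaze report.

```python
def drive_days(drives: int, days_each: int) -> int:
    """Total drive days for a fleet in which every drive ran the same number of days."""
    return drives * days_each


def annualized_failure_rate(failures: int, total_drive_days: float) -> float:
    """Failures per drive-year, expressed as a percentage."""
    drive_years = total_drive_days / 365
    return 100 * failures / drive_years


print(drive_days(1000, 44))                           # 44000
print(round(annualized_failure_rate(2, 22858), 2))    # HGST 8TB figures from the summary
print(round(annualized_failure_rate(4, 44000), 2))    # Seagate 8TB figures from the summary
```

The two rates come out close to the 3.20 and 3.30 percent quoted in the summary; the small gap is presumably rounding in the quoted drive-day counts.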
Re:Not SSD Drives by msauve · 2016-08-02 12:30 · Score: 3, Interesting

"There are so few 8TB HGST drives, and they're so new, that the current data about them is statistically insignificant/unreliable"

The numbers in the summary come from different places, because the first chart in the linked article, for the April-June quarter says:

Seagate 8TB, 2720 drives, 35840 drive days, 3 failures (13 days average per drive, 3% annual failure rate)
HGST 8TB, 45 drives, 3825 drive days, 0 failures (85 days average per drive, 0% annual failure rate)

The second chart, from April 2013 through the end of June, doesn't show drive numbers, just days, failures, and rates. The numbers in the summary seem to be pulled from both.

Assuming that the 8TB drives stay in use until they die, here's where the stats seem to come from (drive days/# of drives; drive days pulled from the "all time" chart, # of drives from the latest quarter chart):

22858/45= 507 days average use HGST HUH728080ALE600
44000/2700= 16 days average use Seagate ST8000DM002

Now, anyone experienced with Seagate wouldn't expect the 3.3% annualized failure rate to be that low in another year and a half. The HGST rate _is_ after almost a year and a half.
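
The per-drive averages above are simple divisions, sketched here so they can be rechecked. The figures are the ones quoted in this comment; the dictionary labels and the function name are illustrative only.

```python
def average_days_per_drive(total_drive_days: int, drives: int) -> float:
    """Mean days in service per drive, assuming the drive count did not change."""
    return total_drive_days / drives


# (drive days, number of drives) as quoted in the comment above
fleets = {
    "Seagate 8TB, Q2 chart": (35840, 2720),
    "HGST 8TB, Q2 chart": (3825, 45),
    "HGST 8TB, all-time days over Q2 count": (22858, 45),
    "Seagate 8TB, all-time days over Q2 count": (44000, 2700),
}

for label, (days, count) in fleets.items():
    print(f"{label}: {average_days_per_drive(days, count):.1f} days")
```

Dividing an all-time day count by the latest drive count only gives a true average if no drives were added or removed along the way, which is the "stay in use until they die" assumption stated above.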

--
"National Security is the chief cause of national insecurity." - Celine's First Law
Re:Riiiiiight. by Anonymous Coward · 2016-08-02 13:23 · Score: 3, Insightful

Come back in 3 or 5 years and tell me out of all the 8TB sold in 2016/2017 just how many are still functional and THEN what the failure rate is/was.
My "prediction" is it will most likely be that there is a 70% failure rate with Seagate being the top offender.
By then the data is worthless to anybody except the manufacturer. We necessarily have to accept a deficit of statistical quality to make forward predictions that are actually worth something, like knowing if I'm building a SAN, what drives I should buy.
In 5 years, I'm not going to be buying 8TB drives, so knowing what the failure rate for some 8TB drive was is inconsequential. Either HDDs continue to improve and I buy 32TB or larger HDDs, or they don't, and I'll be filling my SAN with 8TB or larger SSDs, Xpoint memory, memristor, who knows.
I'm looking at this data and it's informing me that I ought to be buying HGST drives, and that I made a mistake installing 3TB Seagate drives (though the drives tested are not the capacity or exact model ones I have), and that as they begin to fail, I would be better to replace them with 4TB HGST.
I don't really care what happens to my drives 5 years out, they'll probably be replaced with higher capacity stock whether they start to fail or not. If my capacity needs to grow, I can buy new JBOD cards, a bigger mainboard to accommodate the extra channels, more JBOD trays, more racks, upgrade the AC, and pay a higher power bill -or- I add an extra slice to a mirror with a 2x density drive, resilver, replace one of the old drives with a 2x density drive, resilver, and continue until all the drives in the mirror have been 2x'd, rinse and repeat gradually throughout the array until sufficient capacity is reached. The cost works out lower. Sure the bigger drives are more expensive than the cheap drives and I only get the incremental value, but the balance of costs is such that it's still cheaper than endlessly growing JBODs.
The cost is lower still, considering that the drives are bought according to schedule, and when failures occur, the replacements are the larger capacity according to our schedule, and drives removed from mirrors due to capacity upgrades are put in the hotspare pool, ready to repair older, unupgraded mirrors.
All 2TB drives are outgoing at the end of this month, 6TB drives are incoming for replacements and capacity growth.
Re: More that HGST are reliable by dgatwood · 2016-08-02 13:33 · Score: 2

I dunno about you, but 3.3 vs 3.2 isn't blowing anyone's mind. Not even Backblaze, and they make their money by crunching the numbers.
They're both terrible numbers, though perhaps not terrible by Seagate standards. The best of the HGST 4 TB drives had an annualized failure rate of only 0.4%. If these numbers are correct, then these drives are about an order of magnitude less reliable than previous generations of hardware....
Of course, the confidence intervals on these numbers are huge. On the low end, the HGST 8 TB drive could be approximately as reliable as the 4 TB HGST drives (.4%). On the high end, it could be as bad as a 12% annualized failure rate, which would put it into the "complete junk" category. In other words, 45 drives just aren't enough data points to be much more reliable than the anecdotal evidence from folks posting on Slashdot.
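
One way to get bounds of this kind is an exact Poisson interval on the failure count, scaled to an annual rate. The sketch below assumes that method and a 95% confidence level, neither of which is stated in the comment, and uses the 2 failures in 22,858 drive days from the summary. It sticks to the standard library, so the quantiles are found by bisection rather than with a stats package; all names are illustrative.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))


def solve_mu(k: int, target: float) -> float:
    """Find mu with poisson_cdf(k, mu) == target by bisection.

    The CDF decreases as mu grows, so a CDF above the target means mu is too small.
    """
    lo, hi = 0.0, 10.0 * (k + 1) + 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def afr_interval(failures: int, total_drive_days: float, confidence: float = 0.95):
    """Exact Poisson interval on the annualized failure rate, in percent."""
    alpha = 1 - confidence
    low_count = 0.0 if failures == 0 else solve_mu(failures - 1, 1 - alpha / 2)
    high_count = solve_mu(failures, alpha / 2)
    scale = 100 * 365 / total_drive_days
    return low_count * scale, high_count * scale


low, high = afr_interval(2, 22858)
print(f"HGST 8TB annualized failure rate: {low:.1f}% to {high:.1f}%")
```

Under those assumptions the interval runs from roughly 0.4% to a bit under 12%, in line with the range described above.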

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
If it's working for them by dbIII · 2016-08-02 13:47 · Score: 4, Insightful

If it's working for them in their packed in boxes with crap airflow and really poor heat transfer then it will work even better in conventional file servers with hot swap drives at the front and a heap of airflow.

Take it with a grain of salt when Backblaze say a drive is crap since it may only be crap in their very hostile environment, but if they didn't break it then it's very likely to work well anywhere.
1. Re:If it's working for them by stoatwblr · 2016-08-03 10:28 · Score: 2
  
  "Short answer: the coolest drives are 21.92 Celsius and the hottest drive was 30.54 degrees."
  Based on the Google stats from a few years ago it was pretty clear that drive temperature was only a problem above 55C.
  I target 45C as allowable maximum and 35C as normal with no apparent increase in mortality over colder temperatures, but that saves a lot in terms of running the cooling plant. The batches of Seagate Constellations we had with stupidly high failure rates ran well under 30C.
  For home use my fileserver's drives have peaked out at 50C in hot weather (AC is rare in a UK house) and those drives currently have 42-48,000 hours on them (5-6 years) so it doesn't seem to have affected their reliability. OTOH Seagate ST2000DM drives in the same fileserver lasted less than 9 months (The DL series tended to run for 3 years before failing)
  Based on years of experience the best advice for RAID work I can offer is "Don't use a raid composed of the same make/model drives and if you must do that, then FFS try to ensure the drives come from different batches. Otherwise one drive failure is the only warning you have of impending array doom" (RAIDZ3 is good, but multiple drive failures within the same batch is still a high risk thing)
Re:Reliability by AK Marc · 2016-08-02 13:57 · Score: 4, Insightful

> OTOH, given SSDs and the inability to guarantee the erasure of all data on the drive,
Wow, SSD even survives incinerators? Where I used to work, the policy for drives was to open them up and strip them for their magnets, then have magnet fun. The platters made good frisbees, but the problem is that they go through car windows, and the dents in cars are deep, so frisbee with care.

--
Learn to love Alaska
Re:Did they really have 8TB drives 62 years ago? by ihtoit · 2016-08-02 18:02 · Score: 2

Yes, it would be unit-days, as in nX=22858, so each drive in the array (n) had an uptime of X=22858/n. We know what n is. It's 45. Therefore, X=22858/45=~508 days. The stated MTBF of the HGST Enterprise-class drives is 2.5 million hours. That would put the expected interval between failures across the array at about 2,314 days (2.5 million hours divided by the array size of 45, converted to days).
So don't be impressed, this is actually a failure report.
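
The comparison can be sketched in a few lines. The 2.5 million hour MTBF and the 45 drives are the figures quoted above; the observed side uses the 2 failures in 22,858 drive days from the summary. Treating failures as arriving at a constant rate is a simplifying assumption, and the function names are made up for illustration.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 365 * HOURS_PER_DAY


def days_between_array_failures(mtbf_hours: float, drives: int) -> float:
    """Expected days between failures somewhere in the array, at a constant failure rate."""
    return mtbf_hours / drives / HOURS_PER_DAY


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Approximate annualized failure rate (percent) implied by an MTBF figure."""
    return 100 * HOURS_PER_YEAR / mtbf_hours


def observed_mtbf_hours(total_drive_days: float, failures: int) -> float:
    """MTBF implied by the observed drive days and failure count."""
    return total_drive_days * HOURS_PER_DAY / failures


print(round(days_between_array_failures(2_500_000, 45)))   # the "2,314 days" figure, before rounding down
print(round(afr_from_mtbf(2_500_000), 2))                  # rate the datasheet MTBF would imply
print(round(observed_mtbf_hours(22858, 2)))                # MTBF implied by the observed failures
```

With only two failures observed, the implied MTBF is a very rough estimate, so the gap to the datasheet figure should be read with that in mind.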

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
Contrasting anecdote by billcopc · 2016-08-02 18:57 · Score: 4, Interesting

I'm an independent white-box NAS guy, and with the exception of the truly awful 1.5TB Seagate drives from 2008-2009 or so, I have not had any significant problems with them. I've got a few thousand 3 to 8 TB drives deployed with my clients, most of them cheap consumer drives (not even the "NAS" editions), and the annual failure rate is roughly 2% across all brands. This has been consistent for many years and I factor these stats into my costs and warranty projections. I have
The thing that bothers me about Backblaze, and the reason why I have a very hard time taking their results seriously, is the way they design their pods. They take a custom fabbed chassis, then fill it with the most ghetto components known to man: SATA port multipliers, ultra-low-end HBAs, dual "gamer" power supplies, very substandard cooling, and until recently they used super sketchy desktop boards. It's only last year that they finally changed the board for a Supermicro, primarily to get 10GbE very cheaply. For that same money, you can buy a ready-made 60-bay Supermicro chassis with redundant power and SAS - and a warranty. Hell, I bet SM would deliver directly to Backblaze's doorstep *and* give them a friendly discount.
Anyway... epic digression aside, when people ask me which brand is better, I tell them to buy whichever has the best warranty. A hard drive *will* die, the question is when, so the only logical course of action is to plan around its inevitable demise by keeping backups and redundancies, and learning the ins and outs of the RMA process.

--
-Billco, Fnarg.com
What I've learned... by adolf · 2016-08-02 20:01 · Score: 3, Insightful

What I've learned from reading the comments here is that people are just as clueless when it comes to storage reliability as they ever were, and are just as capable of throwing the baby out with the bathwater as at any other time.
Dear Slashdot: Never change.

--
Kid-proof tablet..
Re:Reliability by NormalVisual · 2016-08-02 20:45 · Score: 2

> The platters made good frisbees, but the problem is that they go through car windows, and the dents in cars are deep, so frisbee with care.

And they can hurt too. Not that I'd have any personal experience with that....

--
Please stand clear of the doors, por favor mantenganse alejado de las puertas
The biggest drives always are by bravecanadian · 2016-08-03 01:40 · Score: 2

the most unreliable.
That is why you buy in the sweet spot for best value and let someone else prove new technologies and HD densities for you.