Backblaze Hard Drive Stats for 2017 (backblaze.com)
Backblaze is back with its hard drive reliability report. From the blog post: Beginning in April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. As of the end of 2017, there are about 88 million entries totaling 23 GB of data. At the end of 2017 we had 93,240 spinning hard drives. Of that number, there were 1,935 boot drives and 91,305 data drives. This post looks at the hard drive statistics of the data drives we monitor. We'll review the stats for Q4 2017, all of 2017, and the lifetime statistics for all of the drives Backblaze has used in our cloud storage data centers since we started keeping track.
Seagate is garbage and cheap while HGST is better and more expensive. WD falls in the middle. Price per GB has not fallen in a long time either. I'm out of space and always wonder about saving $90 by shucking a WD EasyStore or paying for HGST.
Only the State obtains its revenue by coercion. - Murray Rothbard
Member Randi Van De Loo? He said so 100 years ago.
For like the 5th year in a row HGST has the lowest failure rates.
For personal use it's clear HGST is the way to go. Sure, you pay a few extra bucks, but I don't have a massive data farm with redundant data spread out all over the place. I can't afford multiple drive failures.
Can we get some stats about something interesting like APK Hosts File Engine Turbo 2 Alpha Extreme edition?!
How many machines use this software?
How many hosts entries are in a typical file?
Can we see metrics of this blazing fast kernel level speed?
These are the questions we sloshditters want answered!
APK for FCC Chairman 2018
Hi I’m Bilal, cofounder of ClearBrain (https://clearbrain.com). ClearBrain helps you automatically build predictive models for which of your users are most likely to convert or churn in your app. Think AmazonML for marketing analysts.
Our founding team comes from Optimizely & Google where we built similar predictive tools for our marketing teams. At each company, we kept building the same components of a predictive pipeline - JavaScript snippets to collect data, ETL jobs to transform that data, and cron jobs to run a regression. We were spending hours a week maintaining these pipelines, but the time-consuming part wasn’t the algorithms (as they’re open sourced); it was the transformations.
So with ClearBrain we decided to automate the data transformation steps. We built our system in Spark ML (Scala), Data Pipeline, and Go. Instead of instrumenting yet another JavaScript snippet, we use existing data in Segment (YC S11) and Heap (YC W13) through standard integrations. And because every Segment/Heap dataset has the same schema, our system can process it with the same transformations into a machine-readable feature matrix. When a customer selects a user action tracked in Segment/Heap to predict, our transformed matrix is run through a logistic regression via Spark ML, which outputs a probabilistic score for each user to perform that action based on users who performed it in the past.
This distills the predictive modeling process to a simple UI to identify high-probability users in minutes. We’ve built the tool with marketers in mind, to help them identify which users may convert or churn, and export those users to marketing tools like Facebook Ads, Hubspot, etc. We’ve also found good reception from startups that have marketing objectives but lack the resources to deploy ML-driven campaigns themselves.
We look forward to feedback from the /. community! :)
Bilal
I have used nothing but HGST drives for all the machines I have built, including NASes, for as long as I can remember. This is an awesome study and I am sure it has some peeps at Seagate steaming right about now.
Wondering if any other storage company releases their HD failure rates?
predictive analytics
This is actually pretty funny. How could they not predict that they would get ripped apart posting this here? I don't do predictive analytics and even I could have predicted this. They must not be very good at what they do.
I have them. 20+ dead Seagates... internals and externals. Only 2 drives in the past 10 years have survived... yet I have no dead Hitachis, one dead Samsung and a couple dead WDs.
Seagate and Maxtor merging combined the worst of both companies into one terrible behemoth.
Also, drive prices still suck. The floods in Thailand were an excuse to gouge customers as insurance companies funded the construction of shiny new plants capable of producing 10+TB drives as fast and as cheaply as they had been churning out 2TB drives (for around $45 - 7 years ago!). We should be getting 10TB drives for $50 by now.
Seagate SG4000 series life expectancy: 32 years
Average HD life expectancy: 50 years
HGST HDS5C series: 167 years
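The "life expectancy" figures above look like the reciprocal of an annualized failure rate, which is an assumption on my part since the comment does not say how they were derived. A minimal sketch of that conversion follows; the function names are made up for illustration. Note that 1/AFR is a mean time to failure under a constant failure rate, not a claim that any single drive will last that long.

```python
def afr_percent_from_life(years: float) -> float:
    """Annualized failure rate (percent) implied by a mean life in years,
    assuming a constant failure rate (an assumption, not Backblaze's claim)."""
    return 100.0 / years


def life_years_from_afr(afr_percent: float) -> float:
    """Inverse conversion: mean years to failure for a given AFR in percent."""
    return 100.0 / afr_percent


# The three figures quoted in the comment above.
quoted = {
    "Seagate SG4000 series": 32,
    "Average HD": 50,
    "HGST HDS5C series": 167,
}

for label, years in quoted.items():
    print(f"{label}: {years} years ~ {afr_percent_from_life(years):.1f}% AFR")
```

Read this way, the gap between 32 and 167 years is a gap between roughly 3% and well under 1% of drives failing per year.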
I'm curious if vendor is the most strongly correlated factor to drive failure. I could imagine that bad PDUs could cause drive failures across vendors, or different data centers or racking methods or ...
"Had they used Seagate's world-class global-leading enterprise-caliber hardware, their experience would be much more favorable." --Seagate rep
So, what does that mean? Does Seagate deliberately make drives that have a higher failure rate? Why?
What is Seagate's "enterprise-caliber"? Is there a "sloppy-manufacturing caliber" or a "bad-design caliber"?
The Backblaze statistics are of limited use because only a few units were tested for many of the drive models.
Only two of the Seagate drive models (ST4000DM001, ST4000DM005) had excessively high failure rates, and the worst one (ST4000DM005) had been in use the shortest time of all drive models in the report by far and suffered a single failure. The confidence interval chart shows this - the low end of the confidence interval of that model is 0.0% - meaning for all we know it could be the most reliable drive in the report; it just had the misfortune of a random failure soon after they began using it.
Subtract those two models, and Seagate's aggregate failure rate is lower than WD's.
Whenever Backblaze puts one of these reports out, I keep having to tell people: Every drive model tries using different components and different technologies to eke out better performance and capacity. Sometimes it works, sometimes it doesn't. The way you should be using these reports is to decide which drive models to avoid, not which manufacturer. That's why Backblaze breaks it down by drive model, and apparently they've wised up and not made a chart summarizing each manufacturer.
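The confidence-interval point in this comment can be made concrete. The sketch below computes an exact (Garwood-style) Poisson interval for a failure count and scales it to an annualized rate. The thread does not say how Backblaze computes its intervals, so this method is a stand-in, and the 1,255 drive-days of exposure is an assumed figure chosen to roughly match one failure at about a 29% annualized rate.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(X <= n) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(n + 1))


def solve_mu(n: int, p: float, hi: float = 1000.0) -> float:
    """Bisection for the mu at which poisson_cdf(n, mu) == p.
    The CDF decreases as mu grows, so move lo up while the CDF is above p."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(n, mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def failure_count_interval(failures: int, alpha: float = 0.05):
    """Two-sided exact interval for the expected number of failures."""
    lower = 0.0 if failures == 0 else solve_mu(failures - 1, 1 - alpha / 2)
    upper = solve_mu(failures, alpha / 2)
    return lower, upper


# Assumed exposure (hypothetical): 1,255 drive-days with a single failure.
drive_years = 1255 / 365
low, high = failure_count_interval(1)
print(f"expected failures: {low:.3f} to {high:.3f}")
print(f"annualized rate: {100 * low / drive_years:.1f}% to {100 * high / drive_years:.1f}%")
```

With one failure and so little exposure, the interval runs from under 1% to well over 100% per year, which is why a single-failure model says almost nothing about reliability either way.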
Can someone explain to me how the Seagate ST4000DM005, of which they had sixty running and a single failure in a quarter, equate to a massive 29.08% annualized failure rate?
They make an attempt to explain that case at the bottom of the page but it makes no sense to me. With a single failure causing such massive spikes I'd be leaving them off as "insufficient data" or at least introducing some error bars.
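The arithmetic behind that figure can be sketched as follows. Backblaze's reports describe the annualized failure rate as failures divided by drive-years of service (drive-days / 365), not failures divided by drive count. The drive-day total for this model is not given in the thread, so the sketch works backwards from the quoted 29.08%; the function names are illustrative.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR in percent: failures per drive-year of service."""
    drive_years = drive_days / 365
    return 100 * failures / drive_years


def implied_drive_days(failures: int, afr_percent: float) -> float:
    """Drive-days of service implied by a failure count and a quoted AFR."""
    return 365 * failures * 100 / afr_percent


days = implied_drive_days(failures=1, afr_percent=29.08)
print(round(days))          # total drive-days implied by the quoted rate
print(round(days / 60, 1))  # average days in service per drive, for 60 drives
print(round(annualized_failure_rate(1, 1255), 2))
```

One failure at 29.08% implies only about 1,255 drive-days in total, or roughly three weeks per drive across 60 drives. A single failure over so little service time extrapolates to a large annual rate, which is also why the number carries very little statistical weight.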
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
I see that Seagate continues being a piece of shit drive, and quite unfortunately the only one at reasonable price that I can find in the local market.
Oh well...
Backblaze is a backup service company. Basically, all they do with their drives is put them in a bespoke cabinet, slowly fill them with data at internet speed, then let them run for a long time doing hardly anything at all. Infrequently, when someone loses some data somewhere, they read a small portion of them. This is very far from what most people do with their drives. In particular, read/write performance and reliability do not matter to Backblaze.
With the highest failure rates.
This is neat and all, but I didn't see a mention of how the data is normalized in time. Just blindly comparing how many of a particular drive failed in any given quarter is misleading at best. The failures must be normalized in time, i.e. the failure rate must be scaled by the amount of time in service.
This is usually specified as FIT (failures in time). It makes no sense to directly compare a batch of drives that might have only been in service 1 year with ones that might have been in service for 1 day.
Maybe they are doing something like this, but it doesn't seem like it.
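The normalization asked for here can be sketched in a few lines. FIT is conventionally the number of failures per billion (10^9) device-hours. The two cohorts below are invented for illustration and are not taken from the Backblaze data.

```python
HOURS_PER_DAY = 24


def fit_rate(failures: int, drive_days: float) -> float:
    """Failures in time: failures per 10**9 device-hours of service."""
    device_hours = drive_days * HOURS_PER_DAY
    return failures / device_hours * 1e9


# Hypothetical cohorts of 1,000 drives each (made-up numbers).
old_batch = fit_rate(failures=20, drive_days=1000 * 365)  # one year in service
new_batch = fit_rate(failures=1, drive_days=1000 * 1)     # one day in service

print(f"old batch: {old_batch:.0f} FIT")
print(f"new batch: {new_batch:.0f} FIT")
```

By raw count the old batch looks twenty times worse, but once scaled by time in service the new batch has the far higher rate, which is the distortion the comment warns about.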
In order for Seagate to be at the top of most failures every year, I would have to say yes. They are doing it deliberately.
...that I most look forward to on Slashdot. I wish they'd also publish more stuff on SSD torture testing / failure rates.
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
This is actually the summary for 2017, and it was true for 2016 and 2015 as well. If you want reliability and quality, buy Toshiba or Hitachi/HGST, avoid Western Digital/WDC and Seagate.
There is no data for IBM, if they even make HDDs anymore, but most likely they follow the WD and Seagate trend. It is what it is.