Backblaze Releases Billion-Hour Hard Drive Reliability Report (extremetech.com)
jones_supa writes: The storage services provider Backblaze has released its reliability report for Q1 2016, covering cumulative failure rates of mechanical hard disk drives by specific model number and by manufacturer. The company noted that as of this quarter, its 60,000 drives have cumulatively spun for over one billion hours (more than 100,000 years). Hitachi Global Storage Technologies (HGST) is the clear leader here, with an annual failure rate of just 1% for three years running. The second position is also taken by a Japanese company: Toshiba. Third place goes to Western Digital (WD), with the company's ratings having improved in the past year. Seagate comes out the worst, though it is suspected that much of that rating was warped by the company's crash-happy 3 TB drive (ST3000DM001). Backblaze notes that 4 TB drives continue to be the sweet spot for building out its storage pods, but that it might move to 6, 8, or 10 TB drives as hardware prices come down.
...can these statistics be used to find parcel-throwing delivery drivers???
HGST is owned by WD now if I recall, so it's not Japanese anymore. (Sorry if somebody already mentioned this.)
It will affect you, if you ignore the results and choose to buy a Seagate drive. Trust me, I've been there...
Is this a troll?
The point isn't that any drive lasts that long. It's an appeal to the statistical significance of the result.
Can anyone tell me how this affects anyone? A billion hours is a ridiculous amount of time that makes this irrelevant to any reasonable person. No one cares if a hard drive lasts a billion hours.
I suggest you look at the definition of the word "cumulatively".
Here is a hint: divide 1,000,000,000 by the 60,000 HDDs in the report and you get about 16,667 hours per drive, which is approximately 2 years.
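The arithmetic in this exchange (and the "bit over 110,000 years" figure further down) can be checked with a few lines of Python. This is a rough sketch: it assumes the billion hours are spread evenly across the 60,000 drives, which the report does not claim, and it uses an average 365.25-day year.

```python
HOURS_PER_YEAR = 24 * 365.25  # average calendar year, 8,766 hours

def hours_to_years(hours: float) -> float:
    """Convert a number of hours into average-length years."""
    return hours / HOURS_PER_YEAR

total_drive_hours = 1_000_000_000  # "over one billion hours", cumulative across the fleet
drive_count = 60_000               # fleet size quoted in the summary

per_drive_hours = total_drive_hours / drive_count
print(f"{per_drive_hours:,.0f} hours per drive")                 # 16,667 hours per drive
print(f"{hours_to_years(per_drive_hours):.2f} years per drive")  # 1.90 years per drive
print(f"{hours_to_years(total_drive_hours):,.0f} years if it were a single drive")  # 114,077
```

So "a billion hours" is a fleet-wide total of drive-hours, averaging just under two years per drive, not a lifetime anyone expects from one disk.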
It puzzles me why they aren't running WD Red Pros, which have almost twice the warranty; the vanilla Reds do suck in terms of reliability.
Good god! Opening that webpage is like walking through treacle. I had to turn on Ghostery: 25 trackers!!
Yeah, I've been there. There's nothing quite like the sinking feeling you get when you first hear the "bird chirp" sound which is the first sign of impending catastrophic failure.
I had 3 of those drives fail in a 6 month period, all of them relatively new and only subjected to consumer-level usage. It got to the point where I was getting agitated every time there was birdsong outside my window. Seagate drives don't get anywhere near my home PC since then.
I think the author of the summary is mixing up the results for Seagate and WD.
From the summary:
That drive isn't even in the table in the report. It's the 4TB drives that pull Seagate's rating down.
Also, looking at the graph by manufacturer and year, it looks like it is Seagate that improved a lot over the last year, not WD. And it's Seagate that comes out third for 2016 in that graph, while WD is last, not the other way around.
From TFS:
Backblaze also notes that the 8.63% failure rate on the Toshiba 3TB is misleadingly high — the company has just 45 of those drives, and one of them happened to fail.
Wut?
That's what they call 'disk nearline storage'. MAID arrays. :)
I also remember that RAID originally stood for "Redundant Array of Inexpensive Disks", not for "top-of-the-line SCSI moneyburners"
Why does the summary compare annual failure rates if the measure that matters is bytes*MTBF/dollar?
As they note, one of their drive models has an 8% annual failure rate because they have 45 of them and one happened to fail this quarter. A lot of the others have similar numbers, with the difference between 0 and 1 failures being 4-8%. The only ones where they have enough data to be useful are HGST, one WD, and two Seagate models. One Seagate is a lot less reliable than most HGST drives (and less reliable than the worst HGST model); the other is the most reliable disk in the set. The WD drive is the least reliable.
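Why one failure swings a small group's rate so much can be seen from how an annualized failure rate (AFR) is usually computed: failures divided by drive-years of operation. The sketch below assumes that definition and a hypothetical group of 45 drives that were all in service for a full 91-day quarter; the report's exact 8.63% depends on the actual drive-days, which this sketch does not know.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-year, expressed as a percentage."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100.0 * failures / drive_years

# Hypothetical: 45 drives, each running for an entire 91-day quarter.
drive_days = 45 * 91
for failures in (0, 1):
    afr = annualized_failure_rate(failures, drive_days)
    print(f"{failures} failure(s): {afr:.2f}%")  # 0.00%, then 8.91%
```

With only about 11 drive-years of exposure, each single failure is worth roughly 9 percentage points, which is the effect the comment describes.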
I am TheRaven on Soylent News
Ever price out "enterprise" drives?
When you're buying 10 drives, you pay the premium because man-hours to deal with failures are expensive. When you're buying 10,000? Not so much because failures are built into the design at that scale.
At the scale Backblaze operates, it's cheaper to build redundant systems that can handle consumer drive failures and just buy twice as many drives.
when they sold out to Western Digital. This is the kind of thing they could capitalize on, to have the absolutely highest quality storage drives available. Anyway, what's clear from these charts is that the American products are of the lowest quality, and if quality is what you want, in general you buy Japanese or German.
It's how statistics work.
There are over 7 billion people on the planet divided among 100 or so ethnicities and about 200 countries. If you're trying to determine the demographics of the world, checking only 10 random people will not give you any meaningful data. Checking a million random people, on the other hand, will give you a fairly good idea of the demographics of the world.
Same with hard drives. Statistics on 5 hard drives won't tell you anything about the likelihood of a 6th drive failing. Statistics on 100,000 drives will.
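The sample-size point can be put in numbers with the standard error of an observed failure proportion. This is a sketch assuming each drive fails independently with the same hypothetical probability (2% here) over the observation window; the normal approximation behind the "1.96 x SE" margin is crude for n = 5, which only strengthens the point.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed failure proportion p measured on n independent drives."""
    return math.sqrt(p * (1 - p) / n)

p = 0.02  # hypothetical true failure probability over the observation window
for n in (5, 100, 100_000):
    margin = 1.96 * standard_error(p, n)  # approximate 95% margin of error
    print(f"n={n:>7}: observed rate is {p:.3f} +/- {margin:.4f}")
```

With 5 drives the margin of error is several times larger than the rate being measured; with 100,000 drives it shrinks by a factor of about 141 (the square root of 20,000).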
Yes, it is a troll. "Why does this matter?"
Not only have Seagate chained themselves to the declining HDD market, apparently happy with their inevitable fate of oblivion, but they can't even get that right! Their HDDs are totally crap as well.
== Jez ==
Do you miss Firefox? Try Pale Moon.
My 3x Seagate NAS 4TB drives sitting in my file server have power-on hours of about 19,000 without any sign of problems. I will replace them at the 3-year mark anyway, but mostly because I am running out of space. If you wanna talk about really horrible drives, look at the first-generation WD Green. I sold mine after 2 months because of extremely lacking performance and worrying mechanical noises. During the 20 years I have been using hard drives, I have come to the conclusion that one should never buy the largest drives, nor the first generation of a new drive technology. Choose the safe middle ground.
You misinterpreted: this is a billion drive-hours' worth of data, not a drive operating for a billion hours (given that that's a bit over 110,000 years, we don't really have that sort of reliability data, even if anyone cared).
And, when it comes to reliability analysis, that 'ridiculous amount of time' is enormously helpful. How else are you going to draw statistically significant conclusions about something with such an element of chance?
You're right, I misunderstood. Reliability does matter. I don't understand why people think Seagate has awful reliability, though. The real issue isn't which manufacturers make good drives, but which models are reliable.
Someone hasn't been paying attention. Their consumer drives have actually performed better over time than enterprise-grade drives. You should read their previous reports before typing on the internet and showing everyone else you don't know what you're talking about.
Four years ago, when I bought my old laptop, there were plenty with 1 TB hard drives, although a bit expensive. That one died on me (not the hard drive; the hinge snapped), so I went looking for a new laptop. Turns out we've regressed: laptop drives now hold only a fraction of what they used to, at double the price, all for a tiny bit of speed and thinness. Fucking bullshit, I say.
Depends on your use case: the Backblaze people are operating a system specifically designed for cheapo drives that are expected to have a fairly high chance of falling over and dying (pragmatically speaking, that's part of why they are so nice and friendly about drive reliability data and sharing the designs for their 'pods': their real asset as a company is the software sauce that allows them to offer cheap, reliable storage through software-level redundancy on top of a pile of low-end drives packed tight and connected with really cheap HBAs and SATA port multipliers: no fancy hardware RAID, no redundant-controller SAS, etc.).
If you are buying drives to use as the boot volume for computers that only get a single HDD, or even systems with small RAID arrays, you are going to be seriously inconvenienced by drive models that drop dead atypically fast, even if you save a few bucks upfront. Re-imaging a replacement drive or swapping out a failed RAID disk and rebuilding the volume take time and trouble.
If your purposes are very similar to theirs, then your sensitivity to failure is lower and getting a slightly better deal per GB might start to make sense; but you have to be pretty failure-insensitive (or the price of reliability really steep) to be in the same boat.
IIRC, the WD green drives had firmware issues - they were "green" because the FW would power them down prematurely in an effort to save energy, only to have them powered up again because the OS requested a read/write. Too many off/on cycles = premature failure.
Also, I must be lucky - I've had one Seagate failure in 12 years, and it was replaced under warranty. Small sample, admittedly - somewhere between 100 and 150 in domestic use.
They sentenced me to twenty years of boredom
Their whole thing is a software-level redundancy arrangement designed to provide adequate reliability through redundancy on top of utter shit hardware. That's the company's niche. It does mean that they massacre drives like crazy; but their cost/GB is pretty impressive, so long as you are doing fairly cold storage, not something IOPS intensive.
Actually, look at the URL your browser shows before modding anything down.
It's a plain simple google search link that will prove that what he said is true, the top (and probably third) comment in this thread is a troll.
And a dumb one at that, who starts all of his comments with the exact same words.
ST3000DM001 is just a particularly problematic drive. Backblaze has been using consumer drives for years, with great success. At their scale, it's much more cost-effective to keep replacing cheaper, maybe less reliable drives. Enterprise drives may make sense for some architectures, but certainly not for theirs.
Don't say "X is bad" or "Y is bad" without adding "today", and keep an eye on how things evolve.
5 years ago WD had a piss poor reputation here at work, while Seagate counted as decent.
I think a certain shipment of WD drives (don't remember the exact type) didn't contain a single one that has lasted a full year before giving up.
Similarly, at a certain point in time, I think about 15 years ago, I wouldn't have recommended Hitachi to anyone, and Samsung was even worse.
Funny how the industry is still making excuses for Seagate.
Worst crap in the industry for over 20 years.
I have an SGI Octane @ work with 15K Cheetahs that hasn't been shut off since 2003.
No, it will affect you if you choose to ignore the results and buy a *3TB* Seagate drive.
When will people stop picking stupid manufacturer sides when it comes to drive reliability? It has nothing to do with manufacturers and everything to do with models. *Every* drive maker has put out shitty models that fail in dumb ways, from HGST (ex-IBM)'s DeathStars to Samsung's firmware fail (I still own a bunch of HD204UIs with an unfixed firmware bug that eats data if you dare use SMART self-tests) to Seagate's 3TB failures. Picking manufacturer sides just means you'll get hit whenever they make the next broken drive.
If you actually look at their per-drive stats, you'll see that Seagate's 4TB drive is, so far, *more* reliable than WD's current drives. I have a bunch of those and they're mostly running fine - though I had one drop off the controller last weekend (came back after reboot), first failure in years, I need to look into that. We'll see. Right now, 4TB Seagates seem to be the best bang per buck with decent reliability. Next year it might be another brand/drive.
They are bad drives for sure, but they are not exactly using '90s RAID tech either. It's distributed mirrors: the OS just sees a JBOD, and higher levels deal with making copies, etc. Looking at their hardware spec, with a lot of SATA port multipliers, they are not really worried about performance. But their use case is write it twice and probably never access it again, outside of bitrot detection and correcting for failures.
No sir I dont like it.
And not some news website which doesn't even have the courtesy to provide a link to the actual source report.
https://www.backblaze.com/blog/hard-drive-reliability-stats-q1-2016/
It includes historical models as well as statistical confidence intervals - very useful for determining which model drive is more reliable. I know everyone wants to use an easy rule like "Seagate bad" when buying, but it's not that simple. Each new model of drive includes new design changes to try to increase capacity, improve speed and reliability, and/or reduce cost. Sometimes these design changes work, sometimes they don't and the model is less reliable (e.g. Samsung 840 EVO). The statistics have the greatest orthogonality when broken down by model, not by manufacturer.
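For readers who want to reproduce a confidence interval themselves: the report's exact method isn't described in this thread, so this sketch swaps in the standard exact (Garwood) Poisson interval, assuming failures arrive as a Poisson process with a constant rate per drive-year. It uses only the standard library, solving for the rate bounds by bisection.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if lam == 0:
        return 1.0
    log_lam = math.log(lam)
    return math.fsum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))

def _solve_rate(k: int, target: float) -> float:
    """Find lam with poisson_cdf(k, lam) == target; the CDF decreases as lam grows."""
    lo, hi = 0.0, max(1.0, 2.0 * k)
    while poisson_cdf(k, hi) > target:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(200):                # plain bisection
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def afr_interval(failures: int, drive_days: float, confidence: float = 0.95):
    """Exact Poisson confidence interval for the annualized failure rate, in percent."""
    alpha = 1 - confidence
    lower = 0.0 if failures == 0 else _solve_rate(failures - 1, 1 - alpha / 2)
    upper = _solve_rate(failures, alpha / 2)
    drive_years = drive_days / 365
    return 100 * lower / drive_years, 100 * upper / drive_years

low, high = afr_interval(1, 45 * 91)  # hypothetical: 45 drives, one 91-day quarter, 1 failure
print(f"95% CI for the annualized failure rate: {low:.2f}% to {high:.2f}%")
```

For a single failure among 45 drives in one quarter the interval runs from roughly 0.2% to 50%, which is why such a model's headline AFR says very little on its own.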
So they label the data table as being for the first quarter 2016, but then for some inexplicable reason they change the failure rate to be annual? Are they using historical or projected data? Why skew the failure rate?
And then the bar graph - failure rates by manufacturer. How are they getting this data? For example, 2016 for HGST they list a failure rate of 1.03%, but that isn't borne out in the table data. The table data suggests only a 0.2% failure rate (44 failures / 22731 drives).
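One likely reason the 0.2% figure doesn't match: 44 / 22,731 is the share of drives that failed during a single 91-day quarter, while an annualized rate divides failures by drive-years. The sketch below assumes every drive was in service for the whole quarter (if drives were added mid-quarter, this understates the rate slightly). It lands near 0.78%, not 1.03%, so the bar graph presumably covers a different period or drive population, which these two numbers alone can't settle.

```python
def annualize_quarter(failures: int, drives: int, days_in_quarter: int = 91) -> float:
    """Annualized failure rate (%), assuming every drive ran for the whole quarter."""
    drive_years = drives * days_in_quarter / 365
    return 100 * failures / drive_years

raw_quarterly = 100 * 44 / 22_731          # share of drives that failed within the quarter
annualized = annualize_quarter(44, 22_731)
print(f"quarterly: {raw_quarterly:.2f}%, annualized: {annualized:.2f}%")  # 0.19% vs 0.78%
```

Annualizing is what lets a one-quarter table be compared with the multi-year, per-year figures in the rest of the report.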
This is NOT the first report in which HGST hard drives came out as the most reliable, and very much not the first report where Seagate came dead last in reliability. In fact, Seagate's unreliability is becoming legendary.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Indeed. Of the (only) 100 desktops I deal with, exactly one had a 3TB Seagate, and it failed.
When will people stop picking stupid manufacturer sides when it comes to drive reliability?
When people stop continuing to get tons of anecdotal evidence. I was refurbishing some old storage boxes and testing all the drives. Found about 10 bad out of 50. All the bad ones were Seagate (various sizes and models). None of the WD were bad. This reinforces my belief that Seagate is likely to be crap. It works the same way for others.
What the fuck is this company doing using consumer hard drives in a goddamn data center?
Making lots of money while still offering services cheaper than anyone else. Their B2 storage is even cheaper than Amazon S3. Yet the service overall is very reliable, and the customers seem very happy.
Yes! And similarly, seeing the sun rise in the east every day for the last few thousand years of human recorded history tells us nothing about what'll happen tomorrow!
Stop being an ass. Obviously there are assumptions that go into predictions about the future, chief of which is that observations of the past are representative of general behaviour. But dismissing that as utterly invalid is dismissing the predictive power of the entirety of science.
not even once, and i run seagates everywhere and even an old maxtor 80gig drive on a 2002 desktop i use for torrenting, then again all have a fan for cooling and none use those fancy alternate power states in which the computer isnt on, but isnt off either, im not a faggot here, i like women with big titties and binary power states
but if you are running a seagate 3tb drive thats a monstrosity with 4 plates and 5 heads or whatever, or a model thats been on the market for 2 months and you love to put it to sleep and wake it up again until the firmware craps on you, you are dumb, and probably a faggot too
This data is only for a 91-day period. To actually be useful, data should be presented for rolling 6-month, 1-year, 2-year, and 3-year periods, or at least for however long they keep drives in service. They should also include the mean and median age of each group of drives. Perhaps Backblaze has that info elsewhere, but there is nothing like it in the article.
If you randomly select drives from a population, then it absolutely does tell you something about the unsampled units. Obviously they don't run their drives for one hour and then retire them.
Stop being a Grade-A cunt, you know damn well what he meant.
These drive statistics aren't used to give a 100% insight into how the next batch WILL behave, they're used to ESTIMATE what COULD happen to the next batch of drives, and sample sizes DO affect the results.
Go back and finish high school.
I will resist the impulse to shout "hey, stupid". If you even bothered to glance at the small table in the report, you would see that no Seagate 3 TB at all were covered. But the ST4000DX000 4 TB (5 failures, 9.63% failure rate) and ST4000DM000 4 TB (198 failures, 2.54% failure rate) were.
This is the most wrong reply I've seen on Slash this week, kudos!
You must be one of those "the chance of winning the lottery is 50:50, you either win or you don't" people.
A long analysis of statistics on 100,000 drives most definitely gives you information about the 100,001st drive when you compare its population group against another population group.
Build a drive that self destructs after 2 hours, run a billion of them for 1 hour ... billion hours of time with no failures!!!!!!
Your absurd abuse of statistics would give very valuable insight into the assembly process and QA process of a manufacturer. This would produce very valuable information despite your attempt to show it's worthless, especially since infant mortality is a thing.
YOU don't understand how statistics ACTUALLY work.
You don't seem to understand. The whole point of statistics is to give predictive power. If you can't predict what the 100,001st drive will probably do, then you're not using statistics.
Correct. I disabled the power-down timeout. It stopped the on/off cycles, but didn't improve performance. I ended up buying regular 7200 rpm Seagate disks. Sold them after 4 years and replaced them with my current Seagate NAS disks. They run 5 degrees cooler than my old disks with no noticeable performance decrease. Highly worth the small price difference.
Aside from comments on specific models and specific manufacturers, has anyone else noticed the downward trend?
I wonder whether this is due to more careful selection, to the manufacturers actually getting better (except in the case of Seagate, where it's quite obvious), or to age-related issues in the way the stats are reported.
I had two of the terrible 1.5TB Seagates fail early. Didn't even do a warranty exchange on them; it wasn't worth having to do another one 3-6 months down the road, and then another, and another... So I bought WD, Toshiba, HGST, pretty much anything but Seagate. I still won't buy Seagate. Trust once lost is hard to earn back. Their drives just haven't been better than the brands I do trust, so there's no reason to go back to them.
The cesspool just got a check and balance.
You were lucky to get a year out of one of their drives.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
They're using all of the drives the same way and one brand fails 3x+ more than the others. It doesn't matter if they're using the hard drives "wrong".
God I hope you're not in a position where anybody might mistake you for someone who knows what they're talking about.
In case ExtremeTech is listening, I added them to my hosts file (several months back) and now never go there any more. Used to be worth a periodic visit...
I come here for the love
A billion hours is 114 thousand years
Civilization didn't exist that long ago,
at least on this planet.
Even if they tested a thousand drives for 114 years, that would still be amazing. What sort of hard drives were around in 1902?
I know it's a reliability report, but shouldn't drive warranties be considered?
If a drive is still under warranty, do I really care if it fails at time X versus 2*X? Rather than choosing a drive based on overall reliability, shouldn't I make the decision based on reliability after the warranty period has elapsed?
I realize advertising is king here, but a link to the original and far more detailed report would have been nice. https://www.backblaze.com/blog...
They also intentionally broke RAID functionality to force you to buy their more expensive drives. I used one or two in a RAID anyway after correcting all the firmware settings, but it still caused problems.
They were still using 3TB Seagates in their last report (Q4 2015). They discontinued all use of them as a result of their findings.
Does it really pay off in the long-run to buy lower quality drives?
For example, a 5400 RPM 4 TB WD Blue (desktop) drive is $130 with a 2-year warranty. The 4 TB WD Gold (datacenter) is $264 with a 5-year warranty, but spins faster and has twice as much cache. The more expensive drive is slightly cheaper per warranty-year and provides more IOPS, but does draw almost 4 Watts more power when active.
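The per-warranty-year comparison can be written out directly. Prices, capacities, and warranties are the ones quoted in the comment (2016 pricing); the $0.12/kWh electricity price is a hypothetical, and charging the Gold's extra ~4 W around the clock is a worst case, since the comment says the extra draw applies when active. The sketch ignores performance and failure-handling costs.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365.25

@dataclass
class Drive:
    name: str
    price_usd: float
    capacity_tb: float
    warranty_years: float

    def cost_per_warranty_year(self) -> float:
        return self.price_usd / self.warranty_years

    def cost_per_tb_warranty_year(self) -> float:
        return self.cost_per_warranty_year() / self.capacity_tb

def energy_cost_per_year(watts: float, usd_per_kwh: float) -> float:
    """Yearly electricity cost of a constant extra load (worst case: always active)."""
    return watts * HOURS_PER_YEAR / 1000 * usd_per_kwh

drives = [
    Drive("WD Blue 4 TB", 130, 4, 2),
    Drive("WD Gold 4 TB", 264, 4, 5),
]
for d in drives:
    print(f"{d.name}: ${d.cost_per_warranty_year():.2f}/warranty-year, "
          f"${d.cost_per_tb_warranty_year():.2f}/TB/warranty-year")

extra_power = energy_cost_per_year(4, 0.12)  # hypothetical electricity price
print(f"Gold's extra 4 W costs about ${extra_power:.2f}/year at $0.12/kWh")
```

Even with the extra power charged in full, the Gold stays cheaper per warranty-year ($52.80 plus about $4.21 versus $65.00), which supports the comment's intuition; what neither number captures is behaviour after the warranty runs out.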
Without knowing how long the drives last beyond their warranty period, which the Backblaze report doesn't mention, isn't it less risky to buy the more expensive drive?
If I have 3 women have a baby, it will get done in 3 months.
MS Project told me so.
Never answer an anonymous letter. - Yogi Berra
They buy Seagate because Seagate will allow them to do volume purchases.
It's a bit easier to go to your local Best Buy and get one or two drives from whatever manufacturer you want than to buy 10,000 drives in a single order. The article specifically says that WD and Toshiba haven't been able to get that done, whereas Hitachi and Seagate have.
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
The sun could explode right now. Any number of reasons could cause such a catastrophic failure.
We might get ripped out of orbit by a small black hole, or our rotation nullified.
The UNIVERSE could just blip out of existence, rip, reverse time or worse.
Past events can only guess at the future. We only have one universe and one star as reference.
But just like weather prediction, a million little discrete weather events can bubble up and make it snow in the middle of a heat wave. That was a nice summer.
No matter how many drives you have, even if a million lasted 10 years, the next could easily fail to turn on.
It is like arguing 777777777 isn't as random as 564568962, they are both random, 2 different forms of random numbers.
Statistics != evidence. It's an estimated guess, a risk assessment, but not evidence.
Unless we have 100% reliable prediction, in which case the future is deterministic and boring!
Yup, they don't have any Seagate 3TB drives this time around... because they were so bad they ditched them all late last year. Meanwhile, as you mention, the ST4000DM000 (at 2.54% failure, sample size 34k drives) is doing better than the WD drives. The ST4000DX000 stat is not statistically significant, as they don't have many of those drives.
You do know that large part of the information necessary to know if a drive is good or not, is if it's reliable, right?
And how does one measure reliability? By operating a shitload of them for a reasonable period of time, and look for failure trends. Hey, wait...
> If a drive is still under warranty, do I really care if it fails at time X versus 2*X?
I certainly do. Buying the drive costs maybe $130. Compare the cost to handle a failure:
Having a tech pull the pod, hook it up to the pod tester to find the bad drive, run that drive through the test sequence to prove (to the manufacturer) that it really is bad, fill out the RMA request, box it up and ship it, put a replacement drive in the pod, reinstall and activate the pod, handle receipt of new drive later.
Handling costs of a failure (under warranty or not) is probably $200. That's more significant than the purchase cost of a drive that's out of warranty.
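This argument can be made concrete as a yearly cost per drive slot: the purchase price spread over the service life, plus the expected cost of handling failures. The $130 price and $200 handling cost come from the comment; the 4-year service life and the AFR values are hypothetical, and replacement drives are assumed to be free under warranty, so each failure is charged only its handling cost.

```python
def annual_cost_per_slot(price_usd: float, service_years: float,
                         afr: float, handling_usd: float) -> float:
    """Amortized purchase cost plus expected failure-handling cost, per slot per year.

    afr is a fraction (0.02 means 2% of drives fail per year). Replacements are
    assumed free under warranty, so only the handling cost is charged per failure.
    """
    return price_usd / service_years + afr * handling_usd

for afr in (0.01, 0.03, 0.10):
    cost = annual_cost_per_slot(130, 4, afr, 200)
    print(f"AFR {afr:.0%}: ${cost:.2f} per drive slot per year")
```

With these numbers, going from a 1% to a 10% AFR adds $18 per slot per year, more than half of the amortized purchase price; a higher handling cost, like the drive to the datacenter in the next paragraph, makes reliability matter even more.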
The above description is for Backblaze. In my case, the procedure for a failed disk starts with "drive over to the datacenter". It ends with "hope that the firmware on the replacement drive doesn't have any glitches in my environment". I don't want to drive over there and deal with it. I'm more interested in drives that don't fail than drives that will be replaced under warranty.
On a local PC, the process probably begins with "hope that the backups worked correctly last night, and that nothing goes wrong with the restore".
When will people stop picking stupid manufacturer sides when it comes to drive reliability? It has nothing to do with manufacturers and everything to do with models.
Completely disagree. Of course there are variations between models made by the same company, but it's the company's Big Bosses that decide on the margins and upper limits on tolerances and failures. If HGST is content with a 20% profit margin but Seagate expect 30% and both companies sell in the same segment, that extra 10% has to come from somewhere. Perhaps by using cheaper electrical components on the control board or more lax quality controls -- more likely some combination. That's the difference.
"What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
If there was ever a website that needed a "-1, Objectively Wrong" moderation option, it's Slashdot.
My cousin worked in a multi-petabyte datacenter and told me Seagate was crazy and Hitachi was the best.
Statistics can give you 99.9999%+ predictability of a group.
> What ... is this company doing using consumer hard drives in a ... data center? .... they will fall out of an array every time there's a URE
Brian from Backblaze here. You assume we use RAID (inside of one computer), which is incorrect. We wrote our own layer where any one piece of data is Reed-Solomon encoded across 20 different computers in 20 different locations in our datacenter (using some of the excellent ideas from RAID and ditching some of the parts that don't work well in our particular application). Our encoding happens to be 17 data drives plus 3 parity. We can make our own decisions about what to do with timeouts. When doing reads, we ask all 20 computers for their piece, and THE FIRST 17 THAT RETURN are used to calculate the answer. Now, if one of the computers does not respond at all, we send a datacenter tech to replace it. But if it was just momentarily slow a few times a day, we let it be (we don't eject it from the Reed-Solomon group).
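The "first 17 of 20" read described above can be sketched with Python's asyncio. This illustrates only the request pattern, not Backblaze's code: fetch_shard, the pod IDs, and the delays are hypothetical stand-ins for network reads, and the Reed-Solomon encoding and decoding that would rebuild the data from any 17 shards is deliberately left out.

```python
import asyncio

DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS

async def fetch_shard(pod_id: int, delay_s: float) -> tuple[int, bytes]:
    """Hypothetical stand-in for reading one shard from one storage pod over the network."""
    await asyncio.sleep(delay_s)
    return pod_id, f"shard-{pod_id}".encode()

async def read_first_k(delays: list[float], k: int = DATA_SHARDS) -> dict[int, bytes]:
    """Request every shard in parallel and keep the first k that arrive.

    With a (17 data + 3 parity) erasure code, any 17 shards are enough to rebuild
    the data, so the 3 slowest or unresponsive pods are simply ignored for this read.
    """
    tasks = [asyncio.create_task(fetch_shard(i, d)) for i, d in enumerate(delays)]
    shards: dict[int, bytes] = {}
    try:
        for next_done in asyncio.as_completed(tasks):
            pod_id, data = await next_done
            shards[pod_id] = data
            if len(shards) == k:
                break
    finally:
        for t in tasks:
            t.cancel()  # stop waiting on the stragglers (no-op for finished tasks)
        await asyncio.gather(*tasks, return_exceptions=True)
    return shards

delays = [0.01] * 17 + [0.05, 0.2, 1.0]  # hypothetical: three pods are slow this time
shards = asyncio.run(read_first_k(delays))
print(sorted(shards))                                          # pods 0 through 16
print(f"storage overhead: {TOTAL_SHARDS / DATA_SHARDS:.3f}x")  # 1.176x
```

The design choice this shows is why a momentarily slow pod doesn't need to be ejected: a read never waits on the slowest three responders, and the price is storing 20 shards for every 17 shards' worth of data.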
> These drives are only meant to be powered on a few hours a day and consumer workload duty cycles
I think a really interesting study would be to power a few thousand drives up once per day for an hour and shut them down. Compare it to a control group of the same drives left on so their temperature did not fluctuate. See which ones last longer without failure. I honestly don't have the answer. (Really, I don't.) What I do know is that Backblaze has left 61,590 hard drives continuously spinning, most of these are often labeled as "consumer drives", and that the vast majority of drives last so long that we copy the data off onto massively more dense drives (like copying all the data off a 1 TByte drive into an 8 TByte drive) not because the 1 TByte fails, but because it ECONOMICALLY MAKES SENSE. An 8 TByte drive takes less electricity per TByte, takes 1/8th the rack space rental, etc. So Backblaze honestly wouldn't care if the "Enterprise Drives" lasted 10x as long in our environment-> we would STILL replace them at the same moment.
Another person who doesn't understand data.
If 3 women have a baby and all end up the same, you know nothing. If two of them miscarry in those three months, you learn a hell of a lot without ever getting a baby. You don't need to run every life to failure to learn something from statistics.
Well, you do if you can't understand statistics... I recommend you do your master's thesis on mayflies, otherwise you'll never get done.
seeing the sun rise in the east every day for the last few thousand years of human recorded history tells us nothing about what'll happen tomorrow!
When you look a little closer and have enough data points, you'll find that the sun doesn't rise in the same exact place every day. The position varies over a cycle of about 365.25 days. You can indeed see that pattern with hundreds of thousands of data points. You cannot see it with 10.
Thanks for proving my point, even if you were being an ass in the process.
"Another person who does not understand sarcasm."
Fixed that for you.
Oh look! It's the autism-hating Slashdot troll again!
Sarcasm is easily understood when the context or the content alludes to it.
You're communicating using a form where 90% of the context is absent. I can't hear the tone of your voice, I can't see your facial expressions, and I just finished dealing with someone who had absolutely no idea. Hence your reply looked like just another person who had no clue.
In order to communicate sarcasm, you need to actually communicate it. Someone failing to understand it is a symptom of a failed communication.
But no one on Slashdot would EVER do that .... /sarcasm.
Well, the phrase "MS Project told me so" was sort of a huge fucking clue, mate. Unless you are insulting my intelligence by thinking that was a serious comment.
So was the fact that "2 women can't make a baby in half the time" is a well known axiom about the futility of trying to shorten the timeline of a single threaded task.
Unless you are insulting my intelligence by thinking that was a serious comment
I was just talking to someone who had no fucking clue at all. So... yes I actually took a good chunk of your comment seriously.
But again. Communication is a two way street and maybe I'm just a complete and utter idiot who takes everyone's word as literal. You would be wise to remember someone could always confuse what you're saying especially in an impersonal context free communication medium.
is a well known axiom
And to extend on this I've never heard this before. Not all slashdotters are programmers. ... well not yet ... according to the government we'll all be programmers soon :-)