How Reliable Are 10TB and 12TB Hard Drives? Backblaze Publishes Q1 2018 Hard Drive Reliability (zdnet.com)
Wolfrider writes: Backblaze's hard drive report for the first quarter 2018 makes very interesting reading for anyone who is interested in hard drive performance and reliability. As of March 31, 2018, the company had 100,110 hard drives working for it, made up of 1,922 boot drives and 98,188 data drives, ranging from 3TB WDC WD30EFRX drives all the way up to 10TB and 12TB Seagate ST10000NM0086 and ST12000NM0007 drives, along with 10 Samsung 850 EVO SSDs. [...] The overall Annualized Failure Rate (AFR) for Q1 sat at just 1.2 percent, well below the Q4 2017 AFR of 1.65 percent. Some drives had an AFR of 0 percent (in other words, no drives failed during the period), while the 4TB Seagate ST4000DM000 had the highest AFR of 2.3 percent (out of 30,941 drives the company had in service, 178 failed during the Q1 period).
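For anyone who wants to sanity-check the ST4000DM000 figure quoted above, here is a rough sketch of the arithmetic. It assumes all 30,941 drives were in service for the full 90 days of the quarter, which is a simplification (Backblaze describes its AFR as based on drive-days, not drive counts), and the function name is only illustrative.

```python
def annualized_failure_rate(failures, drive_days):
    """Return AFR in percent: failures per drive-year of service."""
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# Assumption: every drive ran for the whole quarter.
drives = 30_941
days_in_quarter = 90  # Jan (31) + Feb (28) + Mar (31) in 2018
afr = annualized_failure_rate(178, drives * days_in_quarter)
print(f"ST4000DM000 approximate AFR: {afr:.2f}%")
```

Under that assumption the result lands close to the reported 2.3 percent; any gap would come from drives added or retired mid-quarter.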
My NAS has 2 TB drives... I think it's maybe 10% full.
I recall their SCSI drives being the shit...
Over time I've had pretty good luck with Seagate drives, and if you look at the data it seems some models are more stable than others...
That said, it does seem that HGST has gotten pretty good in recent years, so I've started to shift to them.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If they're not mining Burst they're missing out on some money.
Burstcoin was a cute idea, but no: mining it isn't profitable if you buy 4TB drives for that. You'll lose money on the purchase,
and probably cause premature failure of your hardware.
The coin would either need much more value, or we'd need a much cheaper storage medium than even tape.
Do any other cloud-storage services publish stats like this?
Thank you, Backblaze.
A couple of months back, a separate division of my employer had 3 disks of a 16-disk group fail within 2 days, killing a RAID-6 group. These were 8TB Seagate disks and were ~6 months old.
This is why, as a rule, you shouldn't populate a RAID with drives from the same manufactured batch.
FYI Samsung only makes SSDs at this point. They sold off their HDD business to Seagate at the end of 2011: https://www.seagate.com/about-...
Very true. Old advice, but it cannot be repeated often enough!
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I have a Synology NAS loaded with Western Digital Red drives configured in a RAID 10 array.
First drive failure happened a few days ago with an unimpressive 385 hours on it. Drives normally run at around 88°F.
Considering the MTBF is supposed to be around 1,000,000 hours, I am less than impressed with its lifespan. Glad I don't have their 10TB models installed, as the cost to replace one is triple.
So, take any claims of how reliable it is with a grain of salt because while Unit X may outlast the universe, Unit Y may keel over tomorrow.
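To put a 1,000,000-hour MTBF in perspective, here is a back-of-the-envelope sketch. It assumes a constant failure rate (an exponential lifetime model), which real drives do not strictly follow, and the function names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of 24/7 operation

def chance_of_failure_by(hours, mtbf_hours):
    """Probability (0..1) that a drive has failed after `hours` of use,
    assuming a constant failure rate of 1 / MTBF."""
    return 1.0 - math.exp(-hours / mtbf_hours)

def afr_from_mtbf(mtbf_hours):
    """Probability (0..1) of failure within one year of 24/7 use."""
    return chance_of_failure_by(HOURS_PER_YEAR, mtbf_hours)

afr = afr_from_mtbf(1_000_000)
early = chance_of_failure_by(385, 1_000_000)
print(f"Implied yearly failure chance: {afr:.2%}")
print(f"Chance of failing within 385 hours: {early:.3%}")
```

Even a quoted MTBF that large implies a bit under 1 in 100 drives failing per year under this model, so a single early death is unlucky but not inconsistent with the spec.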
I have a wide variety of drives at work, both HDD and SSD. I mostly buy "enterprise" grade drives, and specifically look for models with a 5-year warranty. What I've discovered recently, however, is that there are huge differences in how manufacturers fulfill their warranties. When a drive fails, what I'm looking for is to obtain a replacement as soon as possible. I can live with a degraded RAID array for a few days, perhaps, but not for weeks. With an "Advance RMA", the manufacturer will ship a replacement drive immediately rather than waiting to receive the defective drive. (A credit card is provided to cover their loss if the defective drive is never received.)
My most recent experiences can be summarized as follows:
Western Digital HDD - Advance RMA is available
Seagate HDD - Advance RMA is not available
HGST HDD - Advance RMA is not available
Even with Advance RMA, I have to wait for ground shipping. I wish that expedited (air) shipping was also available.
I'm saving a special category of experiences for Intel SSDs - experiences so awful they are in a class by themselves. I've had the misfortune to suffer two failed Intel SSDs. Both happened to be M.2 format SSDs. One was SATA, the other NVMe. First, just getting an RMA started with Intel is painful. Be prepared to disassemble whatever computer is affected, because providing a model number and serial number is not enough. They also require something called an "SA" number that can only be found on a sticker attached to the device. Second, be prepared to wait a LONG time. I'm talking weeks to MONTHS to get a replacement. If you need the affected computer back up and running within a reasonable timeframe, you'll need to purchase another SSD in spite of your warranty coverage.
Once upon a time in the penultimate decade of the last century, I was chief fixer dude for a manufacturer which had built some custom stuff Seagate used to bulk-test drives in their engineering department. That stuff kept coming back for warranty service but nothing was ever found to be wrong with it, which was a red flag, and the creation of the test setup required about six hours of tech labor so the damn flag was on fire. I got nowhere in my first round of calls to Seagate, but when the stuff came back yet again I was more persistent and finally got to the bottom of it.
Seagate had a guy who was somehow involved with that engineering test system, and every time something went wrong, whether it was an actual system failure or just an unexpected outcome, said guy jerked everything still under warranty out of the system and sent it back to the manufacturers for service. Everything, whether it was potentially related to the troubling observation or not. In driving my way to someone in charge I spoke with folks at Seagate who were incredibly frustrated with the shotgun approach because it kept their test system out of service for far longer than it ever should have been, and eventually they allowed me to reach the shotgun monkey's boss's boss. I explained to him that our warranty terms applied only to product which had failed in normal service, and that on-demand conformance testing was a full pop T&M (time and materials) service for which they would henceforth be charged.
The stuff was not seen again in the time I remained employed by that company and I've happily avoided Seagate ever since.
Warning: This signature may offend some viewers.
I'm not talking about us guys, but the guys in the article, who are keeping almost a hundred thousand drives powered up all the time for their tests. It's a positive delta if they get some Burst while at it. While it's a rounding error for you and me, the rounding error might become noticeable at that capacity.
"Everybody's naked underneath" -- The Doctor
No.
Efficient frontier
Shannon's theorem: as you approach the Shannon coding limit, the cost of failure becomes linear.
The primary term in the linear model is cost_of_drive / (mean_working_life * drive_capacity). In metric, the unit comes out to Big Macs/B-s, but we'll use USD/TB-year.
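A minimal sketch of that primary term, in USD/TB-year. The prices and working lives below are made-up numbers for illustration, not Backblaze figures, and the function name is hypothetical.

```python
def cost_per_tb_year(drive_cost_usd, mean_working_life_years, capacity_tb):
    """Primary linear term: cost_of_drive / (mean_working_life * drive_capacity)."""
    return drive_cost_usd / (mean_working_life_years * capacity_tb)

# Hypothetical drives: (sticker price in USD, mean working life in years, capacity in TB)
small = cost_per_tb_year(100.0, 5.0, 4.0)
large = cost_per_tb_year(360.0, 7.0, 12.0)

print(f"4TB drive:  {small:.2f} USD/TB-year")
print(f"12TB drive: {large:.2f} USD/TB-year")
```

With these invented inputs the larger drive wins on the primary term alone; the power and performance terms would be added on top.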
There is also a power consumption term, and a performance term. The first of these is significant to Backblaze, whereas the second does little to differentiate the qualified brands in the present Backblaze business model (though it could quickly push you into a different product mix between HDDs and SSDs if the model changed; the proximal economic margin is vertical, not horizontal).
The main effect of the power term is that technological evolution in W/TB-year makes it reasonable to hard-cap drive service life somewhere around seven years (it used to be much less, but times they aren't a-changing very much these days).
If one brand hits the seven year wall with 95% of the drives functional, and another brand hits the wall with 98% of the drives functional, this justifies about a 3% difference in drive sticker price for the same capacity (and mean power draw under the specified workload).
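One way to read that 3% figure, sketched under a loud assumption: every drive that fails before the seven-year cap is replaced once at sticker price, and nothing else (labor, power, rebuild time) is counted. The function names are illustrative, not anything Backblaze publishes.

```python
def cost_per_slot(sticker_price, surviving_fraction):
    """Cost to keep one drive slot filled until the service-life cap,
    assuming each failed drive is replaced once at sticker price."""
    failed_fraction = 1.0 - surviving_fraction
    return sticker_price * (1.0 + failed_fraction)

def break_even_premium(surviving_worse, surviving_better):
    """Fractional price premium the more reliable brand can charge
    before its cost per slot matches the less reliable brand's."""
    worse = 1.0 + (1.0 - surviving_worse)
    better = 1.0 + (1.0 - surviving_better)
    return worse / better - 1.0

premium = break_even_premium(0.95, 0.98)
print(f"Break-even price premium: {premium:.1%}")
```

Under this simple replacement model the break-even premium comes out just under 3%, in line with the figure above.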
Now, if you were working with a business model where coincident failures could feasibly add up to a risk of total loss, this calculation starts to involve an exponential term, and the factor of two in failure rate over seven years begins to matter again. Even if your total loss is just a spindle loss, and you have a 24-hour restore from tape, the magnitude of your 24-hour out-of-normal-service event starts to lift the exponential term into economic view.
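To see why coincident failures stop being linear, here is a rough sketch using a 16-disk group that survives two failures but not three (RAID-6 style). It assumes drive failures are independent, which same-batch drives may well violate, and the per-drive failure probabilities for the rebuild window are invented for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability that at least k of n drives fail,
    given independent per-drive failure probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

GROUP_SIZE = 16      # disks in the group
FATAL_FAILURES = 3   # two failures are survivable, the third is not

# Hypothetical per-drive failure probabilities within one rebuild window.
p_low, p_high = 0.001, 0.002

loss_low = prob_at_least(FATAL_FAILURES, GROUP_SIZE, p_low)
loss_high = prob_at_least(FATAL_FAILURES, GROUP_SIZE, p_high)
ratio = loss_high / loss_low

print(f"Group loss chance at p={p_low}: {loss_low:.2e}")
print(f"Group loss chance at p={p_high}: {loss_high:.2e}")
print(f"Doubling the per-drive rate multiplies group loss by about {ratio:.1f}")
```

In this sketch a factor of two in per-drive failure rate turns into roughly a factor of eight in the chance of losing the group, which is the sense in which the reliability gap matters again once total loss is on the table.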
I've given enough information here to construct a pretty good first-order linear approximation to Backblaze's efficient frontier.
Hint: to usefully diagram this, you need to 86 your third-year engineering school log-linear graph paper, and haul out some third-year elementary school linear-linear graph paper.
Over in MBA school—which you like to pretend is populated by linear-linear spreadsheet-toting cluetards—they will teach you that one of the arts of business is to devise a business model which runs against the grain of prevailing economic assumptions.
Shannon's theorem, properly understood, allows one to do this.
And, yes, the exponential cost model requires engineering school to properly understand, while the linear cost model requires only Econ 101 to fully understand, so of course, as hugely overtrained engineers we deride this model, with a sniff, as merely buying the cheapest shit they can find.
But a funny thing happened on the way to the forum: it's the engineers here who have failed to make the cognitive leap on Shannon's model, properly understood.
Shannon's corollary: once my theorem is properly understood, you don't even need to be smart anymore.
Shannon's theorem, properly understood, is a universal wormhole from log-linear to linear-linear. This pretty much makes it the most Fucking A theorem of the 20th century.
There's an underlying reason why our digital technology grew like the Beanstalk of Babel, yet never ultimately toppled over under its own weight.
Victory: MBA.