Backblaze Dishes On Drive Reliability In Their 50k+ Disk Data Center
Online backup provider Backblaze runs hard drives from several manufacturers in its data center (56,224 of them, they say, by the end of 2015), and as you'd expect, the company keeps an eye on how well they work. Yesterday they published a stats-heavy look at the performance, and especially the reliability, of all those drives, which makes for fun reading even if you're only running a drive or ten at home. One upshot: they buy a lot of Seagate drives. Why? "A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."
The workload they subject their drives to is very unlike the one you will likely see. I wouldn't try to read too much into their statistics; they really apply only to Backblaze itself, or to someone in the same business.
Architecture or filesystem? Anyone know? ZFS perhaps?
Whether a drive fails and you replace it, or it's GOING to fail so you replace it early, you still need a rebuild. Why not just let them fail?
This page can’t be displayed.
Check your links, editors...
Considering how awful their failure rates are in general, they need to get good at reporting failures beforehand, or they (as a company) won't exist much longer. After all, investing in quality is clearly too expensive...
Around here, Seagate 6TB disks cost 50ish% more than WD Red NAS disks, and Hitachi disks are more expensive still. So all these graphs are basically in line with the old adage "You get what you pay for".
The comment about Seagate's SMART being more on point seems to make those disks a nice compromise.
Funnily enough, there is a saying in Switzerland: "Sie geit oder sie geit ned." ("Sie geit" sounds awfully close to "Seagate".) It roughly translates to "It works or it doesn't" and is a stab at the sometimes abysmal failure rates they had back in the day.
Can't help but feel for all the people who read Backblaze's previous report, decided Seagate was junk, and bought WD instead. I tried to warn them that the model of the drive mattered more than the manufacturer, because each manufacturer tries new technologies and new cost-cutting strategies with each model. Sometimes it works and the model is reliable. Sometimes it doesn't and the model is unreliable. But everyone was eager to jump on the bash-Seagate, praise-WD bandwagon and ignored me.
Well, WD was least reliable this time around. The Seagate stats in the previous report were probably being skewed by just one or two bad models. This time they're skewed by one bad model, which, due to the passage of time, makes up only a tiny portion of their Seagate sample, so it doesn't spike Seagate's score like before. (You can pretty much ignore WD in the 4TB graph: with a sample size of just 46 drives, the confidence interval for the failure rate runs from 0.3% to 8.8%.)
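The poster doesn't say how that interval was computed or how many of the 46 drives failed. As a rough illustration of why so small a sample gives such a wide range, here is a minimal Wilson score interval in Python. Treating the 46 drives as independent trials, and the 2-failure count in the example, are assumptions for illustration, not numbers from Backblaze.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a failure proportion (z=1.96 gives roughly 95%).

    Treating n as a count of drives (or drive-years) is a simplifying assumption;
    annualized failure rates built from drive-days are closer to a Poisson rate.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical: 2 failures among 46 drives (the thread gives only the drive count).
    low, high = wilson_interval(2, 46)
    print(f"{low:.1%} to {high:.1%}")
```

With those hypothetical inputs the interval comes out at roughly 1.2% to 14.5%, so it won't reproduce the 0.3% to 8.8% quoted above; the point is only how wide the range stays at this sample size. Since failure rates built from drive-days behave more like Poisson rates, an exact Poisson interval would be the more careful choice.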
At least Backblaze addressed my criticism from before: they've broken down the stats by individual drive model. And you can see that, like I said, there's huge variability in reliability between models within a manufacturer's lineup. Now they just need to add confidence intervals to the graphs.
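Breaking the numbers down per model comes down to grouping before dividing. A minimal sketch, assuming a per-drive record of (model, drive-days, failures) and an annualized failure rate computed as failures per drive-year; the record layout and the model names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def afr_by_model(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Annualized failure rate (%) per drive model.

    Each record is (model, drive_days, failures) for one drive; this layout is
    hypothetical. AFR = failures / (drive_days / 365) * 100.
    """
    days: Dict[str, int] = defaultdict(int)
    fails: Dict[str, int] = defaultdict(int)
    for model, drive_days, failures in records:
        days[model] += drive_days
        fails[model] += failures
    return {
        model: 100.0 * fails[model] / (days[model] / 365.0)
        for model in days
        if days[model] > 0
    }


if __name__ == "__main__":
    sample = [("MODEL-A", 365, 1), ("MODEL-A", 365, 0), ("MODEL-B", 730, 0)]
    print(afr_by_model(sample))
```

Summing drive-days per model before dividing is what keeps one bad model from being averaged away into (or from dragging down) its manufacturer's overall figure.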
Does Seagate send out the SMART data before or after the failure? Their crap 3TB drive reported nothing before it crashed.
"When will your hard drive fail"
I pointed out that Backblaze's chassis configuration improperly stressed the fragile SATA data and power connectors by mounting the drives vertically, where the mass of the drive (and its vibration) is placed upon those connectors.
This type of vertical drive storage/RAID cabinet is not conducive to a long, reliable drive lifespan, so any number of other factors could kick in and cause a premature failure.
I'm impressed by the HGST drives, less than 1% failure rate. I haven't touched the Deskstar line of drives since the IBM Deathstar debacle, but I think it's time to take a second look. Hopefully they have not switched over to Western Digital's technology.
What is Backblaze doing to check the drives for bad sectors? I manage a 10,000-disk OpenStack Swift installation, and I've noticed the automatic sector remapping doesn't work correctly: a portion of the drives (maybe 3%) have a few bad sectors that need to be remapped manually using ddrescue. I ended up having to write a custom monthly cron job script that ran badblocks to first identify these drives, and then ddrescue to force a sector remap.
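The poster's script isn't shown, so here is only a hypothetical sketch of the identification step in Python: run `badblocks` in its default read-only mode on each device and collect the block numbers it prints to stdout (one per line). The ddrescue remap step is deliberately left out, since the post doesn't give that invocation; the function names and the 4096-byte block size are assumptions.

```python
import subprocess
import sys
from typing import Dict, List


def parse_badblocks(stdout: str) -> List[int]:
    """badblocks writes one bad block number per line to stdout."""
    return [int(token) for token in stdout.split() if token.isdigit()]


def scan_devices(devices: List[str], block_size: int = 4096) -> Dict[str, List[int]]:
    """Run badblocks in its default read-only mode on each device (needs root)."""
    results: Dict[str, List[int]] = {}
    for dev in devices:
        proc = subprocess.run(
            ["badblocks", "-b", str(block_size), dev],
            capture_output=True,
            text=True,
            check=False,
        )
        results[dev] = parse_badblocks(proc.stdout)
    return results


def block_to_lba(block: int, block_size: int = 4096, sector_size: int = 512) -> int:
    """Convert a badblocks block number to the first 512-byte LBA it covers."""
    return block * (block_size // sector_size)


if __name__ == "__main__":
    # Device paths come from the command line, e.g. /dev/sdb /dev/sdc (placeholders).
    for dev, blocks in scan_devices(sys.argv[1:]).items():
        if blocks:
            lbas = [block_to_lba(b) for b in blocks]
            # The remap step (the poster used ddrescue) is intentionally not automated here.
            print(f"{dev}: {len(blocks)} bad blocks, first LBAs {lbas[:5]}")
```

Scheduled monthly, a crontab line along the lines of `0 3 1 * * /usr/local/bin/check_bad_sectors.py /dev/sdb /dev/sdc` would match the poster's cadence (the path is hypothetical). A full read pass over a multi-terabyte drive takes hours, so staggering devices is worth considering.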
ZFS is what saved my ass. No SMART warnings. No other indication of a failure other than my scrub going "Eh, your drives are shit, we took them out of the pool".
All things fail, including hard drives. The question isn't "if", it is "when".
Picking between WD and Seagate in hopes of getting a "good drive" misses the point: what happens when both drives fail?
Do you have your data backed up?
I run both CrashPlan and Backblaze; I also have a copy stored on Amazon Glacier and important files on OneDrive. I also have two external drives that I rotate backups on and keep unplugged.
For most people, what I do is "overkill", but I've lost data before... never again...
I lost data once too, when an IBM Deskstar died suddenly and my backups somehow got corrupted as well. So I have an external drive that backs up every night, another that backs up on the first Sunday of every month, and one that backs up on the last Sunday of every month. That last one, and one other that I run manually, get swapped with drives that are kept off-site. I'm still trying to decide on an online option. So no, I don't think that's overkill at all.
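That rotation is easy to get wrong around month boundaries, so here is a minimal sketch in Python of deciding which drives are due on a given date. The drive labels are hypothetical; the rule for "last Sunday" is simply that no Sunday remains later in the month.

```python
import calendar
import datetime as dt
from typing import List


def backups_due(day: dt.date) -> List[str]:
    """Which of the three external backup drives should run on `day`."""
    due = ["nightly"]  # the nightly drive runs every day
    if day.weekday() == calendar.SUNDAY:
        if day.day <= 7:
            due.append("first-sunday")
        days_in_month = calendar.monthrange(day.year, day.month)[1]
        # Last Sunday of the month: another Sunday would fall past the month's end.
        if day.day + 7 > days_in_month:
            due.append("last-sunday")
    return due


if __name__ == "__main__":
    print(backups_due(dt.date.today()))
```

Since every month has at least four Sundays, the first-Sunday and last-Sunday jobs can never land on the same day.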
I think the point here isn't that there's a drive or manufacturer out there that doesn't fail. The point is that with such a huge sample, you can draw somewhat useful trends and comparisons between failure rates on a macro scale that no ordinary user could produce themselves. If you look at 56,000 disks and see that Seagate accounts for a larger percentage of drives and has a lower failure rate than comparable drives from other manufacturers, you can *generally* expect that buying one of the models evaluated here will get you *on average* better reliability than a comparable drive shown to have a worse rate in this study. None of this absolves you of responsibility for your data, but it gives you a guideline toward making your storage medium as reliable as possible.
While those are fair points, and good advice... I still have a concern...
I don't think there is a large enough disclaimer that Backblaze runs their equipment in a 24/7 environment that is quite different from most users'. Oh sure, they say it and it is there, but I think it deserves highlighting.
If you look at the percentage failure rates, they are higher across the board than what I've seen. Sure, drives fail, but honestly I have some of those same Seagate drives in a server here, and they have been running for years without an issue. They are, however, installed flat in tower cases (rather than vertically), and the most I have installed in one tower is 8, each in its own drive bay.
I suspect Backblaze is quite hard on drives and the rates are worse than you'd see outside of that environment. It is also worth noting that those drives are not all installed in the same type of "pod". Backblaze has changed pod designs a few times and now uses an "anti-vibration" system it didn't use before.
Their data is interesting, and I'm glad they offer it. I like how open they try to be, more companies should do that. However, it is just one slant and not the whole picture. I fear that some people will read it and say to themselves, "well I bought a WD, so I guess I don't have to backup". And yes, I've heard such things from real computer users, sadly...
I don't understand the point of doing a burn-in with known-good drives and then replacing them with new units of unknown reliability. I would think you'd want to do the burn-in with the drives you're actually going to use, since disk failures typically happen either early or late in a drive's life.
One of the significant notes is that it seems the Seagate 4TB drives are doing much better than some earlier versions, and that WD is no longer doing so well.
Another thing that gets brought up every time one of these is released is "Why are they still using Seagate drives if they're so bad?" and the answer is simple: it remains a balancing act between cost and reliability. Backblaze has the redundancy and processes in place to not worry about single-drive failures, so FOR THEIR USAGE the lower drive cost is more important. If you're on a smaller setup where you have everything on just a few drives with inadequate redundancy, a few dollars extra for better reliability is worth the cost.
When you really get down to it, Backblaze is looking at cost per gigabyte per day, and if ($LESS_RELIABLE_DRIVE_COST + $DRIVE_REPLACEMENT_COST) is lower than $MORE_RELIABLE_DRIVE_COST, they're going with the cheaper option.
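That inequality can be made a little more concrete by turning failure rates into expected replacements over a drive's service life. A rough sketch in Python; the prices, failure rates, service life, and per-swap cost below are hypothetical inputs, not Backblaze figures, and using AFR times years as the expected failure count is a linear approximation.

```python
def expected_slot_cost(price: float, afr: float, years: float, swap_cost: float) -> float:
    """Rough expected cost of keeping one drive slot populated for `years`.

    afr is an annualized failure rate as a fraction (0.03 means 3%). Each failure
    is assumed to cost another drive at the same price plus `swap_cost` for labour
    and rebuild time. afr * years as the expected failure count is a linear
    approximation that ignores replacements failing in turn.
    """
    expected_failures = afr * years
    return price + expected_failures * (price + swap_cost)


def cheaper_choice(cheap: float, cheap_afr: float, reliable: float,
                   reliable_afr: float, years: float, swap_cost: float) -> str:
    """Return "cheap" when the cheaper drive plus its expected replacements still wins."""
    cheap_total = expected_slot_cost(cheap, cheap_afr, years, swap_cost)
    reliable_total = expected_slot_cost(reliable, reliable_afr, years, swap_cost)
    return "cheap" if cheap_total < reliable_total else "reliable"


if __name__ == "__main__":
    # Hypothetical: $100 drive at 10% AFR vs $130 drive at 1% AFR, 4 years, $20 per swap.
    print(cheaper_choice(100, 0.10, 130, 0.01, 4, 20))
```

The swap cost is where redundancy and automation enter: the cheaper and less disruptive a replacement is, the more often the cheaper drive comes out ahead, which matches the point above about Backblaze's scale versus a small setup with little redundancy.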
A: They're cheap
B: They scream really loud before they die, hopefully when someone's listening.
C: They're cheap.
I'll stick with Western Digital and HGST.
If they die off that infrequently in their sweatbox environments, the chances that they're going to die under normal desktop use are orders of magnitude less.
I want to switch to ZFS, but I'm not sure how ZFS handles failure on the boot drive, and my Google searches weren't very successful in answering the question either.
I haven't had any issues. I've randomly pulled a boot drive and ZFS doesn't complain. I use FreeBSD with ZFS on root.
Sounds like a very secure pr0n collection.
Should work just as you would expect, from my experience with FreeNAS. If there is a boot drive with errors, it will use the good copy or whatever parity it has to boot up and inform you of a bad boot disk. If there are no more good copies then it can't boot. You can do all the normal scrubs on them to catch drives going bad.
The response from guys that claim to work there is that the duty is fairly light, and that they choose drives by focusing on cost before reliability.
The data might be from more rigorous conditions, but that doesn't make it useless. If a drive model exhibits a low failure rate even under supposedly awful conditions, then that reflects even better on the drive. If anything, I'd be more concerned about ways in which their environment is better than a typical consumer environment, such as how a forced-airflow server in a temperature-controlled datacenter is probably going to keep the drives at a better (or at least more consistent) temperature than some random dust-clogged PC with one wimpy fan.
HGST makes the most reliable stuff, but the models they were using are no longer available, and they're expensive.
Seagate is in a dead tie for Worst Shit In The Universe. (esp. when you use the "DM" series DESKTOP drives) They use them because they're dirt cheap, and falling off every truck in NY. Plus, they'll tell you when they're about to die. (i.e. shortly after first power-on. :-))
The short of it is: when you buy 10,000 drives a year, you care more about price and availability than reliability.
Consider the conditions: this is selecting for an environment of a lot of drives packed into poorly ventilated cases, so the drives that cope best with heat will win.
While heat over time is a common cause of drive failure, there are others, so the results are not so useful for drives in desktop cases or in well-ventilated servers (e.g. ones with hot-swap bays, where there is no way to pack the drives in as densely as Backblaze does).
You don't have a backup until you've tested the restore. The nice thing about simply copying all files to an external drive (with nothing clever going on, just a file tree copy) is that the "restore" is just using the new drive. But that approach doesn't really scale past home/home office use.
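For the plain file-tree copy, the restore test can at least be automated by hashing both trees and comparing. A minimal sketch in Python, assuming SHA-256 over every regular file; ideally it runs after the backup drive has been unplugged and re-mounted, so the reads come from the disk rather than the OS cache.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def tree_digest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_copy(source: Path, backup: Path) -> Dict[str, List[str]]:
    """Compare a source tree with its plain file-tree copy."""
    src, dst = tree_digest(source), tree_digest(backup)
    return {
        "missing": sorted(set(src) - set(dst)),
        "extra": sorted(set(dst) - set(src)),
        "changed": sorted(k for k in set(src) & set(dst) if src[k] != dst[k]),
    }


if __name__ == "__main__":
    # Tiny self-check with throwaway directories: one file differs by a single byte.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(a) / "notes.txt").write_text("hello")
        (Path(b) / "notes.txt").write_text("hellO")
        print(verify_copy(Path(a), Path(b)))
```

This only proves the bytes match; it doesn't replace actually restoring onto a clean machine.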
I wish there was a better selection of tape backup software in the world: LTO-7 finally shipped, and a 6 TB (uncompressed) tape is nice.
ZFS on Ubuntu is problematic because it doesn't properly rebuild the kernel modules when the kernel is upgraded.
And to add to this: you need not just a tested restore, BUT an actual plan for recovery that goes beyond the testing.
How are you going to retrieve data from a remote location?
What processes will you use to mitigate an attack during recovery?
What known-clean system can you put online to retrieve patches in the case of malware or compromise?
Disparate networks to ensure clean recovery?
Things like that. It needn't be written down, but it should be planned out. If it's a business, it should be written down and a policy. At home, you can be more relaxed and not have it set in stone. I keep certain recovery options at the OS level, store almost no data locally, and often don't even use an installed OS but that's just because I like to play. (When you've got this much RAM, you can do that.) I also like images from VMs. I keep things located all over the place - including multiple on-site locations and disparate physical locations.
Like you folks mentioned... I lost data once. It was extremely flaky and absolutely absurd and infuriating. I'm still not entirely sure how it happened, but a very, very close lightning strike hit and all magnetic media was gone. Not even the MBR remained. Drives that weren't powered on were gone. Some would not work, even after reformatting. I've no idea how or why (I suspect EMP), but it was infuriating. I had *some* data at a different location and am very fortunate that it was mostly my personal data and wasn't at my office. The following Monday, however, some serious discussions were had, and we had a whole new backup plan and all the rest within a week. We did some testing, and I'd say we were fully set within six months, but we were already pushing data out (and it was a lot of data) as well as buying more tapes and shipping them to a nearby storage unit before we moved them further out.
Never again.
SMART monitoring is where modern OSes utterly fail. It should be a core part of OS functionality: the OS should warn you when a SMART stat goes bad, but MS et al. would rather put some stupid shopping experience into the OS instead.
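Until OSes do this themselves, a small user-space check can fill the gap. A minimal sketch in Python that parses the ATA attribute table printed by `smartctl -A` and flags a handful of attributes with a nonzero raw value. The watched set (5, 187, 188, 197, 198) and the "any nonzero raw value" threshold are assumptions on my part, not something the post or Backblaze specifies here; raw-value encodings also vary by vendor.

```python
import subprocess
import sys
from typing import Dict

# Attribute IDs assumed worth watching (an assumption, not taken from the post):
# 5 reallocated sectors, 187 reported uncorrectable errors, 188 command timeouts,
# 197 current pending sectors, 198 offline uncorrectable sectors.
WATCHED_IDS = {5, 187, 188, 197, 198}


def smart_warnings(attribute_table: str) -> Dict[int, int]:
    """Return {attribute_id: raw_value} for watched attributes with a nonzero raw value.

    Expects the ATA attribute table printed by `smartctl -A`, where RAW_VALUE is
    the tenth whitespace-separated column.
    """
    warnings: Dict[int, int] = {}
    for line in attribute_table.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0].isdigit():
            attr_id, raw = int(parts[0]), parts[9]
            if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
                warnings[attr_id] = int(raw)
    return warnings


def check_device(device: str) -> Dict[int, int]:
    """Run `smartctl -A` on one device (usually needs root)."""
    proc = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True, check=False)
    return smart_warnings(proc.stdout)


if __name__ == "__main__":
    # Device paths such as /dev/sda are passed on the command line.
    for dev in sys.argv[1:]:
        found = check_device(dev)
        if found:
            print(f"WARNING {dev}: {found}")
```

Wiring the output into a desktop notification or email is left open; smartmontools also ships its own `smartd` daemon for this kind of ongoing monitoring.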
For home use it's worth paying a little more for a Hitachi (HGST) drive. They are owned by WD, but use different tech, different factories, etc. You pay more but get better reliability.
That link is the average of all the drives, including the ones that did not fail, over a long time period and has been discussed elsewhere in this thread.
An average sadly doesn't tell us as much as you appear to think it does, especially about failed drives since the data is diluted by the ones that did not fail and since the drives are idle for so much of the time. Maximum temperatures of the ones that failed would support your argument but that is not what you are using.
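The distinction between a fleet-wide average and the peak temperature of the drives that actually failed can be sketched in a few lines of Python. The reading layout (drive id, failed flag, temperature in °C) is hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def temperature_views(readings: Iterable[Tuple[str, bool, float]]) -> Tuple[float, Dict[str, float]]:
    """Return (mean over every reading, max temperature per failed drive).

    Each reading is (drive_id, drive_failed, temp_c); this layout is hypothetical.
    Raises statistics.StatisticsError if there are no readings.
    """
    rows = list(readings)
    overall_mean = mean(temp for _, _, temp in rows)
    failed_max: Dict[str, float] = {}
    for drive, failed, temp in rows:
        if failed:
            failed_max[drive] = max(temp, failed_max.get(drive, temp))
    return overall_mean, failed_max
```

A single hot reading on a failed drive moves its per-drive maximum a lot but is diluted in the fleet-wide mean, along with all the idle time and all the drives that never failed.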
First off, based on this I would be buying HGST drives exclusively. 1.8% or less failure rate? Yes please! I've been buying HGST drives (mostly Ultrastars) for a couple of years now. They are super fast and reliable in my home NAS. My second choice would be WD RE drives. In bulk, the price difference between HGST and Seagate can't be THAT much... I would think the additional reliability would give you a better ROI than constantly replacing cheaper drives.
Your typical home desktop/server drive is likely to see a far harsher life than your average Backblaze drive.
Instead of jumping on threads to play some wank of a mass debate game where you try to convince people of things contrary to reality, why don't you do something useful, or at least less annoying? You are in slimy-confidence-trickster-preying-on-the-weak territory and nowhere near the "Devil's Advocate" you are probably telling yourself you are.
You had me going for a while, and I really did think you were as dim as your posts suggest, but the bit about working drives not getting hot was a clue that you don't believe your own words and are just playing an argument game at my expense.
It's not funny.
I know to watch out for you next time - it's pieces of shit like you playing mind games that make this site far less enjoyable than it used to be.