Backblaze Dishes On Drive Reliability In their 50k+ Disk Data Center

Posted by timothy on Wednesday February 17, 2016 @05:50AM from the learning-from-experience dept.

Online backup provider Backblaze runs hard drives from several manufacturers in its data center (56,224 of them by the end of 2015, they say), and as you'd expect, the company keeps an eye on how well they work. Yesterday they published a stats-heavy look at the performance, and especially the reliability, of all those drives, which makes fun reading even if you only run a drive or ten at home. One upshot: they buy a lot of Seagate drives. Why? "A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."
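
As a rough illustration of watching SMART stats, here is a small shell sketch that filters smartctl -A output for a few attributes commonly watched for early signs of failure and prints any whose raw value is non-zero. The summary doesn't say which stats Backblaze monitors: the default attribute IDs are an assumption, and the function name smart_warnings is hypothetical.

```shell
# smart_warnings [ID...]: read `smartctl -A` attribute-table text on stdin and
# print "ID NAME RAW" for each watched attribute whose raw value is non-zero.
# ASSUMPTION: the default IDs (5 Reallocated_Sector_Ct, 187 Reported_Uncorrect,
# 188 Command_Timeout, 197 Current_Pending_Sector, 198 Offline_Uncorrectable)
# are a commonly watched set, not necessarily the stats Backblaze monitors.
smart_warnings() {
  local ids="${*:-5 187 188 197 198}"
  awk -v ids="$ids" '
    BEGIN { n = split(ids, list, " "); for (i = 1; i <= n; i++) watch[list[i]] = 1 }
    # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
    $1 ~ /^[0-9]+$/ && ($1 in watch) && ($10 + 0) > 0 { print $1, $2, $10 }
  '
}
```

Typical use (as root, with smartmontools installed): smartctl -A /dev/sda | smart_warnings. Passing IDs, e.g. smart_warnings 5 197, overrides the default list.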

30 of 145 comments

Seagate SHOULD be good at that by damn_registrars · 2016-02-17 06:09 · Score: 3, Insightful

Considering how awful their failure rates are in general, they need to get good at reporting them beforehand or they (as a company) won't exist much longer. After all, investing in quality is clearly too expensive...

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
1. Re:Seagate SHOULD be good at that by mattventura · 2016-02-17 06:29 · Score: 5, Funny
  
  Seagates are great at reporting impending failures.
  Does it say Seagate on it? It's about to fail.
2. Re:Seagate SHOULD be good at that by damn_registrars · 2016-02-17 06:36 · Score: 2
  
  That is more the result of how few manufacturers remain than anything.
3. Re:Seagate SHOULD be good at that by slaker · 2016-02-17 06:41 · Score: 4, Informative
  
  HGST drives are manufactured by a different division, using different processes and different engineering teams. I was told by a WD engineer that HGST stuff is still entirely separate on a manufacturing level.
  Of course, I'm just some guy on the internet, but based on my own experiences with a few hundred 3 and 4TB drives in service, the Hitachi/HGSTs are worth going out of my way to obtain and Seagate 4TB drives don't seem to have the problems the 3TB units did.
  
  --
  -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
Re:Doesn't make any mention of.. by Anonymous Coward · 2016-02-17 06:11 · Score: 4, Informative

https://www.backblaze.com/blog/vault-cloud-storage-architecture/
They mention their architecture here
Re:RAID, let them fail by Dareth · 2016-02-17 06:18 · Score: 5, Insightful

The purpose of RAID is to keep data available. You have some level of redundancy, measured as the number of disks that can fail before the array loses data. Once a disk raises an impending-failure SMART alert, you no longer have full confidence in that disk. If you leave it to fail, what happens if another disk in the array fails too? You now have an array with a failed disk, possibly running in degraded mode, plus a disk with a higher than normal chance of failure. It just makes sense to be proactive and fix the issue before it escalates into a failure.

--

I only look human.
My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
Sorry WD fans by Solandri · 2016-02-17 06:21 · Score: 5, Interesting

Can't help but feel for all the people who read Backblaze's previous report, decided Seagate was junk, and bought WD instead. I tried to warn them that the drive model mattered more than the manufacturer, because each manufacturer tries new technologies and new cost-cutting strategies with each model. Sometimes it works and the model is reliable; sometimes it doesn't and the model is unreliable. But everyone was eager to jump on the bash-Seagate, praise-WD bandwagon and ignored me.

Well, WD was the least reliable this time around. The Seagate stats in the previous report were probably skewed by just one or two bad models. This time they're skewed by one bad model, but with the passage of time it makes up only a tiny portion of the Seagate sample, so it doesn't spike Seagate's score like before. (You can pretty much ignore WD in the 4TB graph: with a sample of just 46 drives, the confidence interval spans a 0.3% to 8.8% failure rate.)

At least Backblaze addressed my criticism from before: they've broken the stats down by individual drive model. And as I said, you can see huge variability in reliability between models within a manufacturer's lineup. Now they just need to add confidence intervals to the graphs.
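
To make the small-sample point concrete, the sketch below computes a Wilson score interval for a plain failed/not-failed proportion. It assumes a hypothetical 1 failure among 46 drives; Backblaze reports annualized failure rates over drive-days, so this simplified interval is not expected to reproduce the 0.3% to 8.8% figure above. The function name wilson_ci is made up for illustration.

```shell
# wilson_ci K N [Z]: print the Wilson score interval for K failures among
# N drives as "lower upper" proportions. Z defaults to 1.96 (~95% confidence).
wilson_ci() {
  awk -v k="$1" -v n="$2" -v z="${3:-1.96}" 'BEGIN {
    p = k / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    lo = center - half
    if (lo < 0) lo = 0          # clamp floating-point noise when k = 0
    printf "%.4f %.4f\n", lo, center + half
  }'
}
```

With the hypothetical 1-in-46 count, wilson_ci 1 46 gives an interval of roughly 0.4% to 11.3%: a single extra failure moves the point estimate by more than 2 percentage points, which is why a 46-drive bar on a graph says very little.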
1. Re:Sorry WD fans by 0100010001010011 · 2016-02-17 06:50 · Score: 2
  
  I wish I saw Backblaze's previous report. I have a whole lot of Seagate paperweights. I couldn't do anything but laugh when one of their SNs ended in FML
  In comparison, all of the WD Reds that I bought to replace those (and their warranty replacements) are still going strong. I did everything 'right': spread out my purchases, bought from Newegg and Amazon, kept them cool, etc. Out of the 12 or so 2 and 3TB Seagate drives, I think my current FreeNAS machine still has only 1 or 2 running. And one of those just started throwing SMART errors (even though the zpool scrubs go through fine).
Re:Not very useful. by Anonymous Coward · 2016-02-17 06:35 · Score: 5, Funny

Exactly, so even though these are the best large-scale numbers we have, they are garbage. We shouldn't use them even though they come from the largest sample. They're useless, like the people who carefully compiled these numbers. Instead, we should trust drive manufacturers' marketing numbers, as you suggest.
This is a repeat of 6/23/15 topic . "When will" by FirstOne · 2016-02-17 06:36 · Score: 2, Interesting

"When will your hard drive fail"
I pointed out that the Backblaze chassis configuration improperly stresses the fragile SATA data and power connectors by mounting the drives vertically, so that the mass of each drive (and its vibration) rests on those connectors.
This type of vertical drive storage/RAID cabinet is not conducive to long-term, reliable drive life, so any number of other factors could kick in and cause a premature failure.
1. Re:This is a repeat of 6/23/15 topic . "When will" by Anonymous Coward · 2016-02-17 08:32 · Score: 3, Insightful
  
  Considering they are hitting 5-6 years on a decent population of their drives I think they are doing OK.
Re:RAID, let them fail by Old97 · 2016-02-17 06:47 · Score: 3, Insightful

Yes, and if one disk in an array fails, the likelihood that another disk in the same array will fail soon goes way up. That's because many disk failures are related to environmental factors: power, air, particulate matter, etc. Whatever contributed to the first failure is also present for the other disks in the array. So it's best to replace disks showing impending failure as soon as you can.

--
Very often, people confuse simple with simplistic. The nuance is lost on most. - Clement Mok
Re:RAID, let them fail by sexconker · 2016-02-17 06:55 · Score: 4, Informative

Because you don't know how it will fail, you don't know what other drive may fail next, and you don't know when a 2nd, 3rd, nth, drive will fail.
Further, drives that actually manage to report that they're dying are typically fucked to the point of significantly impacting your performance. If you're still writing to a drive that's hobbling along, it will slow down the whole array.
Reads are usually okay (depending on your controller and setup) but writes need to be completed at some point, regardless of your cache scheme or cache size.
Sustained writes to an array with a crippled drive will eventually either result in the drive being taken offline or the array's write performance turning to shit. If you're lucky, the drive is taken offline gracefully, doesn't catch fire, and you do the hot spare / cold spare dance, the rebuild boogaloo, etc.
Bad sectors? by nbritton · 2016-02-17 07:00 · Score: 5, Interesting

What is Backblaze doing to check the drives for bad sectors? I manage a 10,000-disk OpenStack Swift installation and I've noticed that automatic sector remapping doesn't work correctly: a portion of drives (maybe 3%) have a few bad sectors that need to be remapped manually using ddrescue. I ended up writing a custom monthly cron job script that runs badblocks to identify these drives first, and then ddrescue to force a sector remap.
1. Re:Bad sectors? by nbritton · 2016-02-17 09:36 · Score: 2
  
  How are you doing this with ddrescue?
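  # For each kernel "I/O error, dev sdX, sector N" line: try to read that 512-byte sector; if the read fails, rewrite that spot in place with dd_rescue (-A writes zeros for unreadable blocks) so the drive remaps it.
  grep "error.*sector" /var/log/kern.log | awk '{print $(NF-2)$NF}' | sort -u | while IFS=, read -r device sector; do dd if="/dev/$device" of=/dev/null bs=512 count=1 skip="$sector" 2>/dev/null || dd_rescue -d -A -m8b -s "${sector}b" "/dev/$device" "/dev/$device"; done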
  For the badblocks cron job I use this:
  #!/bin/bash
  # Monthly cron job: read-only badblocks scan of every whole disk, run in the
  # background at idle I/O priority and lowest CPU priority, one log per disk.
  if [ "$EUID" -ne 0 ]; then
    echo "you must be root to run this... exiting."
    exit 1
  fi
  if ! [ -x /sbin/badblocks ]; then
    echo "can't find /sbin/badblocks... exiting."
    exit 1
  fi
  if ! mkdir -p /var/log/badblocks; then
    echo "can't create /var/log/badblocks... exiting."
    exit 1
  fi
  # Skip partition links (names containing "part") so each disk is scanned once.
  for dev in /dev/disk/by-path/*; do
    name=$(basename "$dev")
    case "$name" in *part*) continue ;; esac
    nohup ionice -c 3 nice -n 19 /sbin/badblocks "$dev" > "/var/log/badblocks/${name}.log" 2>/dev/null &
  done
2. Re:Bad sectors? by nbritton · 2016-02-17 09:46 · Score: 2
  
  It may be different with 10,000 disks vs 4 disks, but I wouldn't trust a drive once it has one remapped (or pending remap) sector. I'd be worrying about replacing it, not remapping, because it tends to be a sign of impending failure.
  Of the drives with sector errors (n = 286) the number of bad sectors typically ranged from 4 to 16, with a median of 8. However, values above 25 bad sectors were statistical outliers, meaning they were more than 3 standard deviations off the normal curve. Our policy now is to replace any drive with more than 25 bad sectors.
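
The outlier rule in that quote (flag counts more than 3 standard deviations above the mean) can be sketched as below. It's a minimal illustration assuming one bad-sector count per drive on stdin and the population standard deviation; the quote doesn't say exactly how the mean and deviation were computed, and the fixed cutoff of 25 comes from the quote, not from this code. The function name is hypothetical.

```shell
# outlier_threshold: read one bad-sector count per line on stdin and print
# mean + 3 * standard deviation (population form) of those counts. Counts
# above the printed value are more than 3 standard deviations above the mean.
outlier_threshold() {
  awk '{ n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0) exit 1
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0  # guard against tiny negative rounding error
      printf "%.2f\n", mean + 3 * sqrt(var)
    }'
}
```

With hypothetical counts, printf '%s\n' 2 4 4 4 5 5 7 9 | outlier_threshold prints 11.00 (mean 5, standard deviation 2). Once a cutoff like that is computed from fleet data, it can be frozen into a fixed replacement policy, as the quoted "more than 25 bad sectors" rule does.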
All drives fail, sooner or later... plan for it.. by FlyHelicopters · 2016-02-17 07:19 · Score: 2

All things fail, including hard drives. The question isn't "if", it is "when".
Picking between WD or Seagate hoping to get a "good drive" misses the point: what happens when both drives fail?
Do you have your data backed up?
I run both CrashPlan and Backblaze, and I also have a copy stored on Amazon Glacier and important files on OneDrive. I also have two external drives that I rotate backups on and keep unplugged.
For most people, what I do is "overkill", but I've lost data before... never again...
Re:RAID, let them fail by shawn2772 · 2016-02-17 07:26 · Score: 2

Yes, and if one disk in an array fails, the likelihood that another disk in the same array will fail soon goes way up. That's because many disk failures are related to environmental factors: power, air, particulate matter, etc.
Even more, the process of rebuilding a degraded array is very intensive, touching every sector of every disk in the array, old and new. This means that if there are any latent failures that just haven't been noticed, the rebuild process will find them with very high probability. RAID is good and useful, but as soon as there's a hint of a failure on any disk, you should replace it ASAP. This is also why I favor RAID modes that allow for more than one failed disk. That way, if you have one failure and the rebuild triggers or uncovers another, you're not out of luck. I learned this the hard way.
Re:RAID, let them fail by shawn2772 · 2016-02-17 07:31 · Score: 3, Informative

Oh, one more thing: You should also ensure that every sector of every disk is read regularly. There are more sophisticated options available, but just setting up a cron job that does something like "cat /dev/sdX > /dev/null" on every drive once per week or so is a reasonable and very simple approach. The goal is to trigger failures early, before they get too bad.
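
A sketch of that weekly full read as a hypothetical shell function, read_scan, which reads each listed device end to end with cat (as suggested) and reports which ones hit read errors. Reading a whole disk needs root and takes hours on multi-TB drives.

```shell
# read_scan DEV...: read every byte of each device and discard it, so latent
# unreadable sectors surface now rather than during a rebuild.
# Prints a message for each device that fails and returns non-zero if any did.
read_scan() {
  local dev status=0
  for dev in "$@"; do
    if ! cat -- "$dev" > /dev/null; then
      echo "read_scan: read error on $dev" >&2
      status=1
    fi
  done
  return "$status"
}
```

To run it weekly, a script wrapping this function could be called from an /etc/crontab line such as 0 3 * * 0 root /usr/local/sbin/read-scan /dev/sda /dev/sdb, where /usr/local/sbin/read-scan is a hypothetical script name. The more sophisticated options mentioned above (for example, a RAID or filesystem scrub) also verify redundancy, which a plain read does not.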
Re:All drives fail, sooner or later... plan for it by Keiran+Halcyon · 2016-02-17 07:52 · Score: 2

I think the point here isn't that there's a drive or manufacturer out there that doesn't fail. The point here is that with such a huge sample, you can make somewhat useful trends and comparisons between failure rates on a macro scale that no standard user would be able to do themselves. If you look at 56,000 disks and see that Seagate accounts for a larger percentage of drives and lower equivalent failure rate among manufacturers, you can *generally* expect that buying a drive of an equivalent model as compared and evaluated here will have *on average* a better reliability rate than a comparative drive shown to have a worse value in this study. None of this absolves you of responsibility to your data, but it gives you a guideline toward making your data storage medium as reliable as possible.
Re:Not very useful. by brianwski · 2016-02-17 08:15 · Score: 5, Informative

Disclaimer: I work at Backblaze.

> very unlike the type of use case you will likely see

Being extremely specific - we (Backblaze) keep the drives powered up and spinning 24 hours a day, 7 days a week. So if you leave your drives powered off most of the time and boot them only occasionally, the failure rates we see may or may not resemble yours.

I'm curious if anybody can suggest other differences from "what you will see". Most of our drive activity is lightweight: we archive data, for goodness' sake. We write the data once, then maybe read it once per month to make sure it has not been corrupted. We stopped using RAID a while ago, so you can't say you need drives that are designed for RAID, because we don't use RAID. (We do a one-time Reed-Solomon encoding, send the resulting shards to different machines in different parts of our datacenter, and write each shard to disk with a SHA1; the shard then lives its life independently, without RAID.)
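
The write-once, verify-periodically pattern with a per-shard SHA1 can be sketched with sha1sum from GNU coreutils. This only illustrates the idea; Backblaze's actual shard format isn't described here, so the file layout (each shard stored next to a NAME.sha1 file) and the function names are hypothetical.

```shell
# store_shard SRC DIR: copy a shard into DIR and record its SHA-1 next to it
# as NAME.sha1 (hypothetical layout, for illustration only).
store_shard() {
  local src="$1" dir="$2" name
  name=$(basename -- "$src")
  cp -- "$src" "$dir/$name" || return 1
  (cd "$dir" && sha1sum -- "$name" > "$name.sha1")
}

# verify_shards DIR: re-read every shard that has a .sha1 file and check it.
# Returns non-zero if any shard no longer matches its recorded checksum.
verify_shards() {
  local dir="$1" sum status=0
  for sum in "$dir"/*.sha1; do
    [ -e "$sum" ] || continue   # no .sha1 files: the glob did not match
    (cd "$dir" && sha1sum --quiet -c -- "$(basename -- "$sum")") || status=1
  done
  return "$status"
}
```

A monthly verify_shards run over each shard directory would catch silent corruption; in an erasure-coded layout, a shard that fails its check can then be rebuilt from the other shards rather than repaired in place.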

ANOTHER POINT MANY PEOPLE MISS -> you can't just pick the lowest failure rate drive and then skip backups!! *EVERY* drive fails, every single solitary last drive. So you must have a backup if you care about the data, you really really do. And if you have a backup, then you are free to choose a drive that fails at a higher rate if there are other considerations such as it is a much cheaper drive. Hint: Backblaze doesn't always choose the most reliable drive, we look at the total cost of ownership including the amount of power the drive will consume and the drive's failure rate and let a spreadsheet kick out the correct drive for us to purchase this month. It is rarely the most reliable drive.
Re:Doesn't make any mention of.. by brianwski · 2016-02-17 08:21 · Score: 5, Interesting

Brian from Backblaze here.

The individual drives in our datacenter run ext4 (the OS is Debian). We do an extremely simple Reed-Solomon encoding that is 17+3 (17 data drives and 3 parity) but the 20 drives are spread across 20 different computers in 20 different locations in our datacenter. This means we can lose any 3 drives and not lose data at all.
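
The storage cost and fault tolerance of the 17+3 layout work out as follows: every 17 shards of data are stored as 20 shards, so about 17.6% extra raw disk buys tolerance of any 3 lost shards. For comparison (not from the post), keeping three full copies survives the loss of any 2 copies but needs 200% extra raw capacity.

```latex
\frac{\text{raw capacity}}{\text{usable capacity}} = \frac{17 + 3}{17} = \frac{20}{17} \approx 1.176,
\qquad
\frac{\text{parity shards}}{\text{total shards}} = \frac{3}{20} = 15\%
```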

We released the Reed-Solomon source code free (open source but even better) for anybody else to use also. You can read about it in this blog post: https://www.backblaze.com/blog...
Drive generation matters and You Are Not Backblaz by Fencepost · 2016-02-17 08:27 · Score: 3, Insightful

One of the significant notes is that it seems the Seagate 4TB drives are doing much better than some earlier versions, and that WD is no longer doing so well.

Another thing that gets brought up every time one of these is released is "Why are they still using Seagate drives if they're so bad?" and the answer is simple: it remains a balancing act between cost and reliability. Backblaze has the redundancy and processes in place to not worry about single-drive failures, so FOR THEIR USAGE the lower drive cost is more important. If you're on a smaller setup where you have everything on just a few drives with inadequate redundancy, a few dollars extra for better reliability is worth the cost.

When you really get down to it Backblaze is looking at cost per gigabyte per day, and if ($LESS_RELIABLE_DRIVE_COST + $DRIVE_REPLACEMENT_COST) is lower than ($MORE_RELIABLE_DRIVE_COST) then they're going with the cheaper option.
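
That comparison leaves out how often the cheaper drive fails. A slightly fuller sketch weights the replacement cost by the expected number of failures, adds electricity, and normalizes per TB. Every input, including the function name drive_tco and its parameter order, is a hypothetical illustration, not Backblaze's actual spreadsheet.

```shell
# drive_tco PRICE AFR YEARS REPLACE_COST WATTS KWH_PRICE CAPACITY_TB
# Hypothetical helper: estimated cost per TB of keeping one drive slot filled
# for YEARS, computed as
#   purchase price
#   + expected failures (AFR * YEARS, a simplification) * REPLACE_COST
#     (REPLACE_COST = replacement drive + technician time)
#   + electricity (WATTS, running 24x7, at KWH_PRICE per kWh)
# divided by CAPACITY_TB. AFR is a fraction (0.05 = 5% per year).
drive_tco() {
  awk -v price="$1" -v afr="$2" -v years="$3" -v repl="$4" \
      -v watts="$5" -v kwh="$6" -v tb="$7" 'BEGIN {
    hours = years * 365 * 24
    power = watts / 1000 * hours * kwh
    failures = afr * years
    printf "%.2f\n", (price + failures * repl + power) / tb
  }'
}
```

Run it once per candidate drive with your own numbers and buy whichever prints the lower figure; for example, drive_tco 100 0.1 1 150 10 0.10 2 prints 61.88. Folding labor into REPLACE_COST is what makes a cheap but failure-prone drive lose once technician time gets expensive.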

--
fencepost
just a little off
Re:Not very useful. by Archangel+Michael · 2016-02-17 09:23 · Score: 2

Actually, since they KEEP their stats, they are the most reliable source of information on drive failures. My rough experience is similar: about a 5% failure rate across the fleet. Some drives last a long time, others not so much. Same drives.
My take on this is that Backblaze has dispelled plenty of myths about drive lifespans. I don't really trust anecdotal evidence offered by geeks (including my own above!)

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:RAID, let them fail by brianwski · 2016-02-17 13:01 · Score: 3, Interesting

Brian from Backblaze here.

Sometimes the "drive failure" is as simple as the little circuit board on the bottom of the hard drive has a component die. This won't be predicted by SMART stats at all. We have chatted very informally with the people at "Drive Savers" ( http://www.drivesaversdatareco... ) and they say one of the early steps in attempting to recover the data from a drive that won't work is to replace the circuit board with the board from an identical hard drive of same make and model.

I have no affiliation with "Drive Savers" but from my interactions with them I trust them as quite a good and valuable service who know their craft. We even used them once, in a panic, to get back the minimum number of drives for data integrity in a RAID array (a long time ago, before our multi-machine vault architecture). It worked - we got all the data back from the drive!
Consider the conditions - YMMV by dbIII · 2016-02-17 13:25 · Score: 2

Consider the conditions - this is selecting for the environment of a lot of drives packed into poorly ventilated cases so those that cope best with heat will win.
While heat over time is a common cause of drive failure there are others, so the results are not so useful for drives in desktop cases or in well ventilated servers (eg. ones with hot-swap bays so there is no way to pack the drives in as densely as Backblaze do).
Re:Not very useful. by Anonymous Coward · 2016-02-17 13:40 · Score: 3, Interesting

Backblaze doesn't always choose the most reliable drive, we look at the total cost of ownership including the amount of power the drive will consume and the drive's failure rate and let a spreadsheet kick out the correct drive for us to purchase this month. It is rarely the most reliable drive.
Do you factor in the work cost? In an environment where services are bought by the hour, the cost of a single maintenance operation is more than the cost difference between the most expensive drive in a selected class and the drive of an average cost.
Re:Not very useful. by brianwski · 2016-02-17 15:41 · Score: 3, Informative

Brian from Backblaze here.

> Do you factor in the work cost?

Yes. And I think the mods were being unreasonable to vote you down, it is a fine question!

We have enough drives (56,000+ all in one datacenter) that we need a team of 4 full-time employees working inside the datacenter to take care of it. If we purchase a drive with higher replacement rates, we will need to hire more datacenter techs, so that gets entered into the equation. ANOTHER area where this comes up is server design: most datacenter servers mount the drives up front for fast and easy replacement without having to slide the computer around. Our pods hold 45 drives accessed through the lid of the pod, which means it takes longer to swap a drive: the pod is shut down, the pod is slid out like a drawer, some screws or (most recently) a tool-less lid is detached, the drive is swapped, then the steps are repeated backwards to put the pod back in service. We did the math, and we feel there are (significant) cost savings that outweigh the additional effort and time to replace the drives. Front-mounted (traditional) designs get something like 1/3 the drive density of ours, which means the datacenter space bill would be 3x larger, though we would hire fewer datacenter techs.
Re: Not very useful. by omnichad · 2016-02-18 02:09 · Score: 2

From Backblaze's own mouth:

After looking at data on over 34,000 drives, I found that overall there is no correlation between temperature and failure rate.
To check correlations, I used the point-biserial correlation coefficient on drive average temperatures and whether drives failed or not. The result ranges from -1 to 1, with 0 being no correlation, and 1 meaning hot drives always fail.
Correlation of Temperature and Failure: 0.0
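
For readers curious about the statistic quoted above: the point-biserial coefficient is the ordinary Pearson correlation between a continuous variable (average temperature) and a 0/1 variable (failed or not). Below is a minimal shell/awk sketch, assuming one "TEMP FAILED" pair per line on stdin; the function name is hypothetical and it says nothing about how Backblaze prepared its data.

```shell
# point_biserial: read "TEMP FAILED" pairs on stdin (FAILED is 1 or 0) and
# print the point-biserial correlation between temperature and failure:
#   r = (M1 - M0) / s * sqrt(n1 * n0 / n^2)
# where M1/M0 are mean temperatures of failed/surviving drives and s is the
# population standard deviation of all temperatures.
point_biserial() {
  awk '{ n++; t = $1; sum += t; sumsq += t * t
         if ($2 == 1) { n1++; s1 += t } else { n0++; s0 += t } }
       END {
         if (n1 == 0 || n0 == 0) { print "nan"; exit 1 }
         mean = sum / n
         sd = sqrt(sumsq / n - mean * mean)
         if (sd == 0) { print "nan"; exit 1 }
         printf "%.3f\n", (s1 / n1 - s0 / n0) / sd * sqrt(n1 * n0 / (n * n))
       }'
}
```

A result near 0.0, as in the quote, means failed and surviving drives had about the same average temperature; 1 would mean every failed drive ran hotter than every survivor.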
Re:Not very useful. by nerdbert · 2016-02-18 10:28 · Score: 2

I worked in the drive industry for almost two decades, and I've made this comment before, but I'll make it again.
Modern disk drives detect failing sectors automatically. They go through increasingly complex recovery schemes to recover that data (I know one company used to use around a dozen unique methods with various parameters), and once that data is recovered they remap the failing sector onto spare tracks. All without the knowledge of the user, and without triggering SMART (unless the sector is unrecoverable, of course). This is not cheating; it's actually fairly common and an expected part of drive aging as debris hits the surfaces.
It's when the drive runs out of spare tracks that SMART comes into the picture and starts letting you know that things are heading south. That's why my advice has been consistent for more than the last decade: when the drive starts telling you it's having trouble, back it up and replace it fast. There are failure modes that SMART isn't good about detecting, too, like the electronics components since those usually give you far less warning than the magnetic or mechanical components.
But SMART is NOT a universal standard; it's more a format for reporting errors, and what one company views as "normal rewriting" is another's "exceeds thresholds". How useful SMART actually is depends on how the drive makers set their thresholds (what level of recovery algorithm was required to read the sector, and whether that is something SMART should know about) and on what they measure. The biggest differentiator there is company culture. IMO, HGST had (has?) probably the best balance of any company I worked with, having had the longest cultural exposure to the idea of SMART, and some really, really sharp guys.