Backblaze Dishes On Drive Reliability In their 50k+ Disk Data Center

Posted by timothy on Wednesday February 17, 2016 @05:50AM from the learning-from-experience dept.

Online backup provider Backblaze runs hard drives from several manufacturers in its data center (56,224, they say, by the end of 2015), and as you'd expect, the company keeps its eye on how well they work. Yesterday they published a stats-heavy look at the performance, and especially the reliability, of all those drives, which makes fun reading, even if you're only running a drive or ten at home. One upshot: they buy a lot of Seagate drives. Why? "A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."

Seagate SHOULD be good at that by damn_registrars · 2016-02-17 06:09 · Score: 3, Insightful

Considering how awful their failure rates are in general, they need to get good at reporting them beforehand or they (as a company) won't exist much longer. After all, investing in quality is clearly too expensive...

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Re:RAID, let them fail by Dareth · 2016-02-17 06:18 · Score: 5, Insightful

The purpose of RAID is to keep data available for a purpose. You have some level of redundancy, measured in terms of the number of disks that can fail before you have a data loss for the array. Once a disk has an impending-failure SMART alert, you no longer have full confidence in that disk. If you leave it to fail, what if another disk in the array happens to fail? You now have an array with a failed disk, possibly in a degraded mode. You also have a disk with a better than normal chance of failure. It just makes sense to be proactive and fix the issue before it escalates into a failure.
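
To put rough numbers on that argument, here is a minimal sketch of the chance that at least one disk in an array fails within a given window. It assumes disks fail independently at a constant annual failure rate, which is a simplification, and the function name, array size, and rates are all hypothetical rather than figures from Backblaze.

```python
def prob_any_failure(annual_failure_rates, window_days):
    """Chance that at least one disk fails within window_days.

    Assumes independent failures and a constant annual failure
    rate (AFR) per disk; both are simplifying assumptions.
    """
    survive_all = 1.0
    for afr in annual_failure_rates:
        # Per-disk survival over the window, scaled from one year.
        survive_all *= (1.0 - afr) ** (window_days / 365.0)
    return 1.0 - survive_all


# Hypothetical 8-disk array that has already lost one disk,
# leaving 7 survivors. Rates are illustrative only.
healthy = [0.02] * 7                 # every survivor looks normal
with_flagged = [0.02] * 6 + [0.30]   # one survivor has a SMART warning

for label, rates in (("healthy", healthy), ("flagged", with_flagged)):
    print(label, round(prob_any_failure(rates, window_days=7), 4))
```

The gap between the two results is the extra risk carried by leaving a flagged disk in place while the array is degraded.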

--

I only look human.
My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
Re:RAID, let them fail by Old97 · 2016-02-17 06:47 · Score: 3, Insightful

Yes, and if one disk in an array fails, the likelihood that another disk in the same array will fail soon goes way up. That's because many disk failures are related to environmental factors - power, air, particulate matter, etc. Whatever factors contributed to the first disk failure are also present for the other disks in the array. So it's best to replace disks that have impending failure as soon as you can.

--
Very often, people confuse simple with simplistic. The nuance is lost on most. - Clement Mok
Drive generation matters and You Are Not Backblaz by Fencepost · 2016-02-17 08:27 · Score: 3, Insightful

One of the significant notes is that it seems the Seagate 4TB drives are doing much better than some earlier versions, and that WD is no longer doing so well.

Another thing that gets brought up every time one of these is released is "Why are they still using Seagate drives if they're so bad?" and the answer is simple: it remains a balancing act between cost and reliability. Backblaze has the redundancy and processes in place to not worry about single-drive failures, so FOR THEIR USAGE the lower drive cost is more important. If you're on a smaller setup where you have everything on just a few drives with inadequate redundancy, a few dollars extra for better reliability is worth the cost.

When you really get down to it Backblaze is looking at cost per gigabyte per day, and if ($LESS_RELIABLE_DRIVE_COST + $DRIVE_REPLACEMENT_COST) is lower than ($MORE_RELIABLE_DRIVE_COST) then they're going with the cheaper option.
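
That comparison is simple enough to write down. A minimal sketch, where the function name, parameter names, and dollar figures are all made up for illustration and are not Backblaze's actual numbers:

```python
def cheaper_option(less_reliable_cost, replacement_cost, more_reliable_cost):
    """Pick the drive with the lower total cost, per the comparison above.

    replacement_cost is the expected extra spend (hardware plus labor)
    caused by the cheaper drive failing more often. All inputs are
    hypothetical values in the same currency.
    """
    if less_reliable_cost + replacement_cost < more_reliable_cost:
        return "less_reliable"
    return "more_reliable"


# Illustrative numbers only.
print(cheaper_option(100, 20, 150))   # cheap drive plus replacements still wins
print(cheaper_option(100, 60, 150))   # replacement overhead tips the balance
```

For a small setup without much redundancy, the replacement cost would also have to include the cost of lost data and downtime, which is why the answer can flip.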

--
fencepost
just a little off
Re:This is a repeat of 6/23/15 topic . "When will" by Anonymous Coward · 2016-02-17 08:32 · Score: 3, Insightful

Considering they are hitting 5-6 years on a decent population of their drives, I think they are doing OK.