Backblaze Dishes On Drive Reliability In their 50k+ Disk Data Center

Posted by timothy on Wednesday February 17, 2016 @05:50AM from the learning-from-experience dept.

Online backup provider Backblaze runs hard drives from several manufacturers in its data center (56,224, they say, by the end of 2015), and as you'd expect, the company keeps its eye on how well they work. Yesterday they published a stats-heavy look at the performance, and especially the reliability, of all those drives, which makes fun reading, even if you're only running a drive or ten at home. One upshot: they buy a lot of Seagate drives. Why? "A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."

145 comments

Not very useful. by Anonymous Coward · 2016-02-17 05:54 · Score: 0, Interesting

The type of use case they subject their drives to is very unlike the type of use case you will likely see. I wouldn't try to read too much into their statistics. They really apply only to themselves, or someone in the same business.
1. Re:Not very useful. by Anonymous Coward · 2016-02-17 06:35 · Score: 5, Funny
  
  Exactly, so even though these are the best large scale numbers we have, they are garbage. We shouldn't use them even though they are the largest sample size. They're useless like the people that carefully compiled these numbers. Instead, we should trust drive manufacturer's marketing numbers, as you suggest.
2. Re:Not very useful. by omnichad · 2016-02-17 07:59 · Score: 1
  
  If you pick something that doesn't fail under their extreme circumstances, it's a lot less likely to fail at home.
  I should have used their report to plan my home RAID. I have 4 x 3TB Toshiba drives. And 3 of them are the DT01ACA300, which is a little less reliable than the Seagates they chose (but thankfully way better than Western Digital). I didn't even buy the 3 drives from different vendors. I bought 3 in one place, likely from the same batch. The fourth was an external drive, scavenged for its internal drive and as a replacement enclosure for a friend because it was a good price and for a little diversity. That was a DT01ABA300, which is apparently a little different and potentially more reliable.
3. Re:Not very useful. by b0bby · 2016-02-17 08:13 · Score: 1
  
  That may be so, but my experience with the 3TB Seagates mirrors theirs - they were the worst drives we ever used in our RAIDs.
4. Re:Not very useful. by brianwski · 2016-02-17 08:15 · Score: 5, Informative
  
  Disclaimer: I work at Backblaze.
  
  > very unlike the type of use case you will likely see
  
  Being extremely specific - we (Backblaze) keep the drives powered up and spinning 24 hours a day, 7 days a week. So if you leave your drives powered off most of the time and boot them only sometimes, the failure rates we see may or may not be something like yours?
  
  I'm curious if anybody has any other suggested differences with "what you will see". Most of our drive activity is lightweight - we archive data for goodness' sake, we write the data once then maybe read it once per month to make sure the data has not been corrupted. We stopped using RAID a while ago, so you can't say you need drives that are designed for RAID, because we don't use RAID (we do a one-time Reed-Solomon encoding and send it to different machines in different parts of our datacenter and write it to disk with a SHA1 on this "shard" where that shard lives its life independently without RAID).
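
A minimal sketch of the "SHA1 on each shard" idea described above, not Backblaze's actual code. The Reed-Solomon step is left out entirely; the function names and the on-disk layout (digest first, payload after) are assumptions made for illustration.

```python
import hashlib

SHA1_LEN = 20  # a SHA-1 digest is always 20 bytes


def seal_shard(payload: bytes) -> bytes:
    """Store the payload behind its own SHA-1 digest (hypothetical layout)."""
    return hashlib.sha1(payload).digest() + payload


def verify_shard(sealed: bytes) -> bytes:
    """Return the payload if the stored digest still matches, else raise."""
    digest, payload = sealed[:SHA1_LEN], sealed[SHA1_LEN:]
    if hashlib.sha1(payload).digest() != digest:
        # In a real system this is where the shard would be rebuilt
        # from the other shards of the same Reed-Solomon group.
        raise ValueError("shard failed its checksum")
    return payload


sealed = seal_shard(b"archived bytes")
print(verify_shard(sealed) == b"archived bytes")
```

The periodic "read it once per month" pass would then amount to calling something like `verify_shard` on every stored shard and repairing the ones that raise.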
  
  ANOTHER POINT MANY PEOPLE MISS -> you can't just pick the lowest failure rate drive and then skip backups!! *EVERY* drive fails, every single solitary last drive. So you must have a backup if you care about the data, you really really do. And if you have a backup, then you are free to choose a drive that fails at a higher rate if there are other considerations such as it is a much cheaper drive. Hint: Backblaze doesn't always choose the most reliable drive, we look at the total cost of ownership including the amount of power the drive will consume and the drive's failure rate and let a spreadsheet kick out the correct drive for us to purchase this month. It is rarely the most reliable drive.
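
A rough sketch of the kind of total-cost comparison described above. This is not Backblaze's spreadsheet: the cost terms, the five-year service life, the electricity price, the swap labor cost, and both example drives are made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class DriveOption:
    name: str
    price_usd: float
    capacity_tb: float
    watts: float
    annual_failure_rate: float  # fraction of drives failing per year


def cost_per_tb(drive: DriveOption, years: float = 5.0,
                usd_per_kwh: float = 0.10, swap_cost_usd: float = 30.0) -> float:
    """Purchase + power + expected replacements, per TB (all inputs hypothetical)."""
    power = drive.watts / 1000 * 24 * 365 * years * usd_per_kwh
    expected_failures = drive.annual_failure_rate * years
    # Each failure costs a new drive plus the labor to swap it;
    # warranty replacements are ignored to keep the sketch short.
    replacements = expected_failures * (drive.price_usd + swap_cost_usd)
    return (drive.price_usd + power + replacements) / drive.capacity_tb


options = [
    DriveOption("cheap", 100.0, 4.0, 6.0, 0.05),
    DriveOption("reliable", 160.0, 4.0, 6.0, 0.01),
]
best = min(options, key=cost_per_tb)
print(best.name, round(cost_per_tb(best), 2))
```

With these invented numbers the cheaper, less reliable drive comes out ahead, which is the point being made: reliability is one input among several.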
5. Re:Not very useful. by Voyager529 · 2016-02-17 08:29 · Score: 0
  
  > Backblaze doesn't always choose the most reliable drive, we look at the total cost of ownership including the amount of power the drive will consume and the drive's failure rate and let a spreadsheet kick out the correct drive for us to purchase this month. It is rarely the most reliable drive.
  You must be a force to be reckoned with in EVE Online.
6. Re:Not very useful. by Maxo-Texas · 2016-02-17 08:34 · Score: 1
  
  Exactly- I mean the drives in four of my computers are never used since I don't turn those drives off.
  Any data that doesn't give failure rates for drives which are only used a couple times a year is pointless.
  
  --
  She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
7. Re:Not very useful. by dgatwood · 2016-02-17 08:55 · Score: 1
  Yeah, early Seagate perpendicular storage drives had serious problems, including (supposedly) some firmware bugs that made the problem worse. This was about the same time period where I lost five or six drives in the same year, all Seagate. I stopped using their hardware after that, and haven't looked back. Good to know that their reliability has gotten back to acceptable levels since then, but they should never have shipped that junk.
  The things that stand out to me in that data are:
  
  The Toshiba drives have such a wide confidence interval (presumably because of low device count) that the data is mostly useless.
  Same goes for the newest HGST (8 TB) drive.
  Same goes for the smallest WD (2 TB) drive, to some degree, but there's cause for concern there.
  The 1.5 TB Seagate appears to be garbage, and should probably be recalled en masse.
  The 3 TB WD line appears to be garbage and should probably be recalled en masse.
  The rest of the WD line looks dubious and should be watched very carefully.
  But the biggest takeaway is that HGST drives appear to be about an order of magnitude more reliable than any other manufacturer, on average, with the possible exception of Toshiba (for which the data is insufficient to render judgment), and ignoring HGST's 8 TB drive, for which there's still not enough data to judge its reliability.
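
To illustrate why a low device count widens the confidence interval, here is a small sketch using the Wilson score interval. It treats each drive as one pass/fail trial, which is a simplification (Backblaze reports work in drive-days), and the drive counts below are invented.

```python
import math


def wilson_interval(failures: int, drives: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a failure proportion."""
    if drives <= 0:
        raise ValueError("need at least one drive")
    p = failures / drives
    denom = 1 + z * z / drives
    centre = (p + z * z / (2 * drives)) / denom
    half = z * math.sqrt(p * (1 - p) / drives + z * z / (4 * drives * drives)) / denom
    return centre - half, centre + half


# Same observed 4% failure proportion, very different sample sizes.
small = wilson_interval(2, 50)
large = wilson_interval(400, 10_000)
print([round(x, 4) for x in small])
print([round(x, 4) for x in large])
```

Both samples show the same observed rate, but the interval for 50 drives is many times wider than the one for 10,000, so the small sample says far less about the true rate.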
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
8. Re:Not very useful. by Anonymous Coward · 2016-02-17 09:00 · Score: 0
  
  Um, I think the summary covers that pretty easily. All drives will fail, but knowing when that failure will occur is hard and gold in a corporate environment. So if you can know this you are ahead of the game. Do you want to come in, in the morning to a dead workstation or server or do you want to be warned ahead of time, plan and replace that drive? Apparently you don't care or think about this stuff or you're just too young to realize that stuff happens. Be prepared and you'll survive, be unprepared and you'll be on the streets.
9. Re:Not very useful. by Archangel+Michael · 2016-02-17 09:23 · Score: 2
  
  Actually, since they actually KEEP their stats, they are the most reliable information on drive failures. My rough experience is similar, about 5% failure rate across the fleet. Some drives last a long time, others not so much. Same drives.
  My take on this is that Backblaze has dispelled plenty of myths about drive lifespans. I don't really trust anecdotal evidence offered by geeks (including my own above!)
  
  --
  Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
10. Re:Not very useful. by gweihir · 2016-02-17 09:52 · Score: 1
  
  I very much agree on the backups. Sure, catastrophic failures are rarer today, but they happen and only backup protects you.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
11. Re:Not very useful. by wbr1 · 2016-02-17 10:02 · Score: 1
  
  Interesting stuff.
  I work for a small break/fix shop that is transitioning to MSP (2000 nodes and counting). I have made drive recommendations based off of your data for some time now. While our clients tend to use smaller consumer grade drives, one can make a generalization that manufacturing quality follows brand to some degree regardless of capacity and model. Anecdotally, our experience fairly closely matches the numbers you have given in past reports, but any numbers I generate are not corrected for factors like market share.
  For our MSP clients we do monitor various drive factors. Currently we monitor the drives on SMART pass/fail and for other OS level errors (bad blocks, failed reads, etc). My experience also shows that many drives fail with other indicators in the SMART stats that still fall within manufacturer tolerances. Primarily with reallocated/pending/uncorrectable sector counts. Once you see any increase above 0 there, typically the drive is done and the failure proceeds fairly rapidly. So we are working on scripting to monitor those numbers too.
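
A sketch of the kind of check described above: flag any reallocated, pending, or uncorrectable sector count above zero. It assumes the attribute table layout printed by smartmontools' `smartctl -A` and the commonly used attribute IDs 5, 197, and 198; vendors differ, so treat both as assumptions. The sample text is made up.

```python
# Attribute IDs commonly used for sector trouble (vendor-dependent).
WATCHED_IDS = {5, 197, 198}

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       21900
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def nonzero_sector_counts(smartctl_output: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    alerts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED_IDS:
            continue
        try:
            raw = int(fields[9])
        except ValueError:
            continue  # raw value in a format this sketch does not handle
        if raw > 0:
            alerts[fields[1]] = raw
    return alerts


print(nonzero_sector_counts(SAMPLE))
```

In practice the text would come from running the tool against each drive; the parser is kept separate here so it can be tried without touching real hardware.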
  Interestingly, we often see failures from completely idle and powered down drives too. My suspicion is that a large portion of failures are magnetic domain issues that increase with age regardless of usage, but I have nothing to back that conjecture with.
  I hope that as we grow I can collect failure data from our install base as well as it can help with many IT decisions.
  
  --
  Silence is a state of mime.
12. Re:Not very useful. by KingMotley · 2016-02-17 12:27 · Score: 1
  
  Same here. I've just about phased out the 3TB Seagates (The ST3000DM001 variety). Had absolutely horrible fail rates with that model. In fact, I have a class action notice sitting on my counter regarding that particular model as well. They were so bad that even when they would fail within the 1 year warranty period I refused to send them back. I just replaced them as they failed. I believe 1 single drive now remains.
13. Re:Not very useful. by dbIII · 2016-02-17 13:36 · Score: 1
  
  > If you pick something that doesn't fail under their extreme circumstances, it's a lot less likely to fail at home.
  That depends entirely on the failure mechanism.
  If their drives are running hot for very long periods of time and you have very well ventilated case in comparison then the extreme test may not be very relevant.
  Conversely if there's a drive prone to failure from frequently powering up and down then their results from running 24/7/365 wouldn't pick it up.
14. Re:Not very useful. by dbIII · 2016-02-17 13:38 · Score: 1
  
  Correct me if I am wrong, but aren't the Toshiba drives still just rebadged Hitachi drives?
15. Re:Not very useful. by Anonymous Coward · 2016-02-17 13:40 · Score: 3, Interesting
  
  > Backblaze doesn't always choose the most reliable drive, we look at the total cost of ownership including the amount of power the drive will consume and the drive's failure rate and let a spreadsheet kick out the correct drive for us to purchase this month. It is rarely the most reliable drive.
  Do you factor in the work cost? In an environment where services are bought by the hour, the cost of a single maintenance operation is more than the cost difference between the most expensive drive in a selected class and the drive of an average cost.
16. Re: Not very useful. by omnichad · 2016-02-17 14:34 · Score: 1
  
  I do run 24/7/365, so most of their data will just be more extreme. I think that it's better than no starting point at all for buying drives for a home server.
17. Re:Not very useful. by craighansen · 2016-02-17 15:35 · Score: 1
  
  Your 3TB Toshiba drives are way better than the 3TB Seagates (ST3000DM001) - Backblaze had a cumulative failure RATE of 28% - that's 28% failed per year. In my experience, they are ALL going bad before their third year of use. Backblaze has taken them all out of service, and mine are now paperweights. I do concur with Backblaze that most of them showed SMART failures before they died.
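
To make "28% failed per year" concrete, here is a small sketch assuming the usual failures-per-drive-year definition of an annualized failure rate, plus a deliberately naive constant-rate survival estimate. The drive-day figure is invented to reproduce the 28% rate.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-year, as a fraction (0.28 means 28% per year)."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    return failures / (drive_days / 365.0)


def surviving_fraction(afr: float, years: float) -> float:
    """Naive estimate: the same fraction of survivors fails every year."""
    return (1.0 - afr) ** years


afr = annualized_failure_rate(28, 36_500)  # 100 drive-years, hypothetical
print(round(afr, 2), round(surviving_fraction(afr, 3), 3))
```

Under that constant-rate assumption roughly 37% of drives would still be alive after three years, so a population where every drive dies before year three is failing faster as it ages than a flat 28% would suggest.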
18. Re:Not very useful. by brianwski · 2016-02-17 15:41 · Score: 3, Informative
  
  Brian from Backblaze here.
  
  > Do you factor in the work cost?
  
  Yes. And I think the mods were being unreasonable to vote you down, it is a fine question!
  
  We have enough drives (56,000+ all in one datacenter) so that we need a team of 4 full time employees working inside the datacenter to take care of it. If we purchase a drive with higher replacement rates, we will need to hire more datacenter techs, so it gets entered into the equation. ANOTHER area this comes up is server design: most datacenter servers put the drives mounted up front for fast and easy replacement without having to slide the computer around. Our pods put 45 drives accessed through the lid of the pod which means it takes longer to swap the drive - the pod is shut down, the pod is slid out like a drawer, some screws or (most recently) a tool-less lid is detached, the drive is swapped, then repeat backwards to put the pod back in service. We did the math, and we feel there is (significant) cost savings that outweighs the additional effort and time to replace the drives. Front mounted (traditional) is something like 1/3rd the drive density of what we have, which means the datacenter space bill would be 3x larger but we would hire fewer datacenter techs.
19. Re:Not very useful. by KGIII · 2016-02-17 17:24 · Score: 1
  
  To add to the above: not just backups but *verified* backups *and* a recovery plan. A decent backup strategy should include several things including value, location, and recovery speed. In the days of cheap hardware and cheap bandwidth, it's really silly to rely on a backup that's just an external disk or two in your house. Put a box up at a friend's and host one for them.
  Err... As mentioned before, I'm kind of anal about backups. I have all but the verification pretty well automated. I can automate the verification but I can't automate it with a level of confidence that matches my paranoia. So, I sometimes (not always) do manual verification including sometimes slapping the image right back on the box and ensuring it works. So far, so good. I do multiple stages to disparate locations with varied values depending on the type of data.
  
  --
  "So long and thanks for all the fish."
20. Re:Not very useful. by KGIII · 2016-02-17 17:30 · Score: 1
  
  Just an FYI...
  They weren't voted down (according to the data given - the pop-up shows no history when you click on it). AC posts start at 0 by default. Logged in users start at 1. People with Excellent Karma can start at 2 if they want but most of us keep that turned off. It's not really much of a metric to determine quality, at that level, but it's a way to filter out AC posts as some people are disinclined to read those posts. You can change your filter settings to -1 (the lowest of the low and where I prefer to read) if you want.
  
  --
  "So long and thanks for all the fish."
21. Re:Not very useful. by dgatwood · 2016-02-17 18:22 · Score: 1
  
  Apparently, they are based on designs that they acquired from WD as part of WD's acquisition of HGST, but I wouldn't go so far as to say that they're rebadged Hitachi drives. After all, HGST is owned by WD, not Toshiba. So the answer is kind of convoluted. :-)
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
22. Re: Not very useful. by dbIII · 2016-02-17 20:26 · Score: 1
  
  I'd better be a bit more blunt and clear.
  For a variety of reasons BackBlaze pack more drives into each case than others consider sane and they have a lot of heat with very minimal airflow. It may make sense with their business model but it's not a typical server environment.
  So the results are selecting almost purely for drives that handle high temperatures better than others. While that may simulate rapid ageing for some sorts of defects they are mostly going to fail due to the conditions that most drives are not going to be exposed to.
  
  So IMHO use it as a rough guide but take it with a bucket of salt.
23. Re:Not very useful. by Anonymous Coward · 2016-02-17 20:41 · Score: 0
  
  I'm curious whether you think LTO-7 can be used competitively with hard drives. A 6TB LTO-7 tape cartridge is cheaper than a 6TB hard drive, but the LTO-7 tape drives are ridiculously expensive.
24. Re: Not very useful. by drsmithy · 2016-02-17 22:23 · Score: 1
  
  > For a variety of reasons BackBlaze pack more drives into each case than others consider sane and they have a lot of heat with very minimal airflow.
  Backblaze have their drives in datacentres with ambient temperatures in the low 20s C, probably less. I'd be surprised if their drives got over 30C.
  Most home PCs/servers would be lucky to keep their drives _under_ 30C unless they're somewhere where the ambient is quite low.
25. Re: Not very useful. by omnichad · 2016-02-18 02:09 · Score: 2
  
  From Backblaze's own mouth:
  
  > After looking at data on over 34,000 drives, I found that overall there is no correlation between temperature and failure rate.
  > To check correlations, I used the point-biserial correlation coefficient on drive average temperatures and whether drives failed or not. The result ranges from -1 to 1, with 0 being no correlation, and 1 meaning hot drives always fail.
  > Correlation of Temperature and Failure: 0.0
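
For readers unfamiliar with the statistic quoted above, this is a minimal pure-Python sketch of the point-biserial correlation between a continuous value (average drive temperature) and a yes/no flag (failed or not). The temperatures below are invented; this is not Backblaze's data or code.

```python
import math


def point_biserial(values, flags):
    """Point-biserial correlation between numbers and 0/1 flags."""
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    group1 = [v for v, f in zip(values, flags) if f]
    group0 = [v for v, f in zip(values, flags) if not f]
    n, n1, n0 = len(values), len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one drive in each group")
    mean = sum(values) / n
    # Population standard deviation of all values.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("all values are identical")
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    return (m1 - m0) / sd * math.sqrt(n1 * n0 / (n * n))


temps = [20.0, 30.0, 20.0, 30.0]   # hypothetical average temperatures, deg C
failed = [0, 0, 1, 1]              # hot and cool drives fail equally often
print(point_biserial(temps, failed))
```

A result near zero means failed and surviving drives have about the same average temperature; it says nothing about temperatures outside the range that was actually observed.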
26. Re: Not very useful. by omnichad · 2016-02-18 02:28 · Score: 1
  
  They keep their drives between 20 and 30 degrees Celsius and find no correlation at all between drive failure and temperature: https://www.backblaze.com/blog...
27. Re:Not very useful. by PapaSurf · 2016-02-18 06:11 · Score: 1
  
  Yes! Very unlikely that people will use hard drives to store data! Brilliant.
28. Re: Not very useful. by drsmithy · 2016-02-18 09:08 · Score: 1
  
  Yes, the Google study found the same.
  However, I suspect this is mostly because in datacentres drives simply don't get hot enough for heat to become a factor - rarely over 30 degrees.
  In home PCs and servers, 40-50 degrees C is quite common. Hard disks in machines like iMacs regularly get over 50 degrees C.
29. Re: Not very useful. by omnichad · 2016-02-18 09:25 · Score: 1
  
  Mine are in the basement, and are at 35 degrees right now. And I have 5 drives crammed into a mini tower (OS + 4 drive RAID). Not too far off from their test range, so I'll be using their help in a year or so when I replace all my drives. I probably won't even need more storage by then, but I hate having old spinning drives. The last upgrade was so I could rip my Blu-Ray collection.
30. Re:Not very useful. by nerdbert · 2016-02-18 09:51 · Score: 1
  
  You're wrong. Toshiba's desktop/server line has very, very little in common component wise with HGST, and their own design and manufacturing centers. In fact, HGST has relatively little in common with their now parent WD as far as hardware goes (although I suspect that will change now). Toshiba drives have more components in common with WD than with HGST.
31. Re:Not very useful. by nerdbert · 2016-02-18 10:28 · Score: 2
  
  I worked in the drive industry for almost two decades, and I've made this comment before, but I'll make it again.
  Modern disk drives detect failing sectors automatically. They go through increasingly complex recovery schemes to recover that data (I know one company used to use around a dozen unique methods with various parameters) and once that data is recovered they remap the failing sector onto spare tracks. All without the knowledge of the user, and without triggering SMART (unless the sector is unrecoverable, of course). This is not cheating, it's actually fairly common and an expected part of the drive aging as debris hits the surfaces.
  It's when the drive runs out of spare tracks that SMART comes into the picture and starts letting you know that things are heading south. That's why my advice has been consistent for more than the last decade: when the drive starts telling you it's having trouble, back it up and replace it fast. There are failure modes that SMART isn't good about detecting, too, like the electronics components since those usually give you far less warning than the magnetic or mechanical components.
  But SMART is NOT a universal standard, it's more a format for reporting errors and what one company views as "normal rewriting" is another's "exceeds thresholds". It's how the drive makers decide to set their thresholds (what level of recovery algorithm was required to read the sector and whether that is something that SMART should know about) and what they measure that determines how useful SMART actually is. The biggest differentiation there is company culture. IMO, HGST had (has?) probably the best balance of any company I worked with having had the longest cultural exposure to the idea of SMART, and some really, really sharp guys.
32. Re:Not very useful. by gweihir · 2016-02-18 11:17 · Score: 1
  
  Well, if you have verified backups (I fully agree on that, and just a trial-read of the backup is not enough, you need a compare), and then no recovery plan, you can still pay somebody with a clue a lot of money to do it for you ;-)
  External disks are fine, but make that in an independent location. A locker at work, in the gym, etc. is fine (you should decidedly encrypt any sensitive backup). Of course, you can have on-site backups for convenience as well, but for everything you really do not want to lose, you should store a copy off-site.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
33. Re:Not very useful. by KGIII · 2016-02-18 13:03 · Score: 1
  
  Yeah, I've a whole "server room" in my basement at home - complete with racks. I provision space for my friends and they do the same for me though I provide my own equipment at their homes as well as have it set up for them so that they can just let it run and verify on their own. I have a garage that's close to the house and there's a monthly storage in there. Then, I have the house that was here on the property when I bought it and there's storage in there - that also has connected media so I can push remote stuff there - it has its own disparate connection to the 'net and runs a "seed box" that is more akin to a small cluster.
  I lost data once. Never again. Ever, ever, ever... I will not lose data again, at least not any meaningful data. I already don't store much data locally on the system I'm working on. It's all pushed out to storage that syncs automatically into disparate locations which will then push it out at varied intervals and cycle things or do incremental back ups. There's physically attached devices that get replication in case of drive failure. I don't use *any* RAID but disparate disks - I'll buy more of them if I gotta. I will not lose data again. I'm pretty anal about it except, well... I'm kind of lazy so I automated the hell out of it. 'Snot like I'm going to remember to do it manually.
  I had a strange lightning event, very close to the house, years and years ago when I lived in NC. It split a tree that was hanging over the glassed/screened in porch and the gear was mostly in that area. Yet, every single bit of magnetic storage, even that which wasn't plugged in, was erased - some ruined. Even the MBR was gone. Floppies (those old things) and Zip disks were erased. Hard drives that weren't plugged in were erased. Two powered-on HDDs died completely as did one that wasn't plugged in. One wasn't "dead" but made a clicking noise like the Zip click-of-death and SMART indicated it would be failing soon. I had, by sheer luck, some of my stuff at the office. I lost things that were from the 1980s that I'd carefully kept moving but never once bothered to put in a separate location. Fortunately, some was at the office just waiting for me to bring it *back home.* Yeah...
  Needless to say, we also ended up with a whole new backup process at work as well. I had a new backup plan running in a week and then spent about another month formalizing it. We'd had backup and remote backup but we ended up with a secondary, off site, and then used an archival service and multiple forms of media.
  Never again.
  
  --
  "So long and thanks for all the fish."
34. Re: Not very useful. by dbIII · 2016-02-18 13:42 · Score: 1
  
  It appears you don't understand the situation since they don't have typical servers but have drives packed in very tightly in multiple rows. Ambient temperature is what happens outside the case. If airflow is very poor then there are pockets at a much higher temperature inside the case. It can be 50C+ in there since the 20C air can only trickle through the gaps and is heated by each successive row of drives instead of only having a single row of drives at the front of the case.
  They can get away with it because they have the writing load spread over multiple servers at once and very little reading load (archiving with retrieval of portions instead of entire archives at once). However they are still going to have some peak times and some cases are going to get very hot. It's cheaper for them to burn through drives instead of taking up more rack space so I'm not suggesting they are stupid, even if in a different context it appears to be very much so.
  I've seen a case where someone attempted to copy Backblaze for a normal, if somewhat low usage, server and it lost seven of twelve drives in one week - arranged around the midpoint where the heat could not so easily escape. Three more drives were damaged enough that they failed soon after in a different case with decent airflow.
35. Re: Not very useful. by dbIII · 2016-02-18 13:48 · Score: 1
  
  Imagine if you had the air output from those five drives feeding into another five, then another five, then five more. Now turn down the speed of the single row of cooling fans. The Backblaze usage of write once, read not so much can survive in that situation and just shed a drive every now and again but even your home server is likely to get in trouble with such a design.
36. Re: Not very useful. by dbIII · 2016-02-18 13:53 · Score: 1
  
  Since the temperature is going to vary wildly over the 45 drives in each "pod" something that would be far more indicative would be the average temperature of the drives that actually failed instead of all of the drives.
37. Re:Not very useful. by wbr1 · 2016-02-18 14:27 · Score: 1
  
  Good info. Some questions though. If a drive's firmware is reallocating without modifying SMART numbers, what is the reallocated sectors count for? Additionally, if the drive waits until it is out of spare tracks before incrementing the stats, how can it ever reallocate sectors? Of course the manufacturers can do quite a bit of invisible hand waving in the firmware.
  I guess one could test for this by benchmarking sequential reads and writes across the drive surface when new, then again later. If there was a significant degradation in seq. R/W speed it could indicate fragmentation due to firmware sector remapping.
  
  --
  Silence is a state of mime.
38. Re: Not very useful. by drsmithy · 2016-02-19 01:23 · Score: 1
  
  > It appears you don't understand the situation since they don't have typical servers but have drives packed in very tightly in multiple rows.
  I'm quite aware of "the situation" as I've been watching Backblaze's designs for years and I've spent 15+ years dealing with Tier 1 server hardware. They're quite clearly inspired by Sun's X45xx Thumper series, with vertically stacked drives. Yes, the drives at the back will get hotter than those at the front, but it is difficult to see them getting anywhere close to 50. You do not need to shift a huge amount of air over a drive to make a non-trivial difference to how hot it gets. Even a slow-spinning, practically silent fan like some manufacturers put in their 4-in-3 cages will knock 5-10C off the typical operating temperature of drives (unless the ambient is high).
  Ambient temperature matters because blowing <20 degree air through a case (datacentre scenario) will obviously have a better cooling effect than blowing 25+ degree air through a case (average home scenario).
  Here is some temperature information from Backblaze. It shows what I expect, with the vast majority of drives under 30 degrees. Without bothering to check drive specs, I'm going to guess all the drives at the hotter end of the scale are 7200rpm models.
39. Re:Not very useful. by Biolo · 2016-02-19 02:21 · Score: 1
  
  I'm running the Toshiba DT01ACA300 drives mentioned in the report, not had a single one fail over several years of usage. Compare that to the Seagate ST3000DM001, also in that report, I had 10 of them at one point, and over 4 years 90% of them failed (not counting those replaced in the first year under warranty!). They report a nearly 30% failure rate, which is comparable to my experience. Only one Seagate left, and I expect that will be gone within the year (it's got a hot spare waiting to take over when it does).
  My Toshiba drives (and a couple of HGST HDS5C3030ALA630, which became the DT01ACA300 Toshibas after the plant transfer to Toshiba) were installed as the Seagates died, have up to 27,700 power on hours (3+ years), and so far flawless reliability. They don't look so reliable in their report, but the failure rate they report is low enough that it's not unexpected that I wouldn't have seen one die yet.
  I had sworn off Seagates, but it looks like it may have just been one bad model. Useful to see these sorts of numbers released as it's helped to remind me not to so easily write off the entire company's drives. Having said that, that specific drive is still available in the retail channel but I wouldn't touch it with a bargepole.
  
  --
  Stealing a rhinoceros should not be attempted lightly.
40. Re:Not very useful. by omnichad · 2016-02-19 02:29 · Score: 1
  
  > couple of HGST HDS5C3030ALA630, which became the DT01ACA300 Toshibas
  
  I think that's how I ended up buying Toshiba. When I realized that it's what had been HGST, which has had a good track record for a while now (especially on Backblaze's report).
  
  > it's helped to remind me not to so easily write off the entire company's drives.
  
  Except Western Digital, right? What happened to the Red drives? They run slow and don't have intentional anti-RAID hobbling. They should be decent choices. If they weren't overpriced, I would have gone with those (thankfully I didn't).
41. Re: Not very useful. by dbIII · 2016-02-19 10:59 · Score: 1
  
  > Yes, the drives at the back will get hotter than those at the front, but it is difficult to see them getting anywhere close to 50
  I've seen it happen.
  Remember we are describing shoving drives in anywhere they will fit instead of a server case designed by someone that went somewhere near a technical college or university for anything other than coding.
  
  > Yes, the drives at the back will get hotter than those at the front
  What about the next row, or the one after? 45 drives jammed in tight and almost no airflow.
  It may work for them but it's not a typical environment so their results shouldn't be taken as anything other than a rough guide for a typical environment.
  
  Also with respect, it's the peak temperatures and not the averages that really matter if heat is killing those drives.
42. Re: Not very useful. by drsmithy · 2016-02-19 13:40 · Score: 1
  
  > I've seen it happen.
  Anecdotes are not data.
  They don't seem to have seen any problems
  
  "About a year ago, we took a group of Storage Pods and removed the 3 fans at the end, leaving just three middle fans to cool the unit. We placed these pods into production and monitored the temperature of the hard drives utilizing the SMART stats we take each day. Nothing changed, as the drives stayed cool and didn’t fail at higher rates."
  The average drive temperatures are overwhelmingly in the 18-26 degree range. That means if an appreciable fraction of drives (15-20%, you seem to be implying) is typically reaching the 40-50 degree mark, then to keep the overall average so low there must also be a decent percentage of drives at mid-teens, if not lower, temperatures. How do you think operating drives are going to sustain temperatures that low?
  There is no evidence that drives in Backblaze pods are overheating. So either we can take the reasonable and logical conclusion - that they're not - or we can take the conclusion that they're lying about drive temperatures for some reason.
  Remember we are describing shoving drives in anywhere they will fit instead of a server case designed by someone that went somewhere near a technical college or university for anything other than coding.
  Yeah, you're right. I'm sure a company with a massive business interest in designing high-capacity storage servers hasn't invested a cent in hiring or consulting people with expertise in the field. </SARCASM>
  Like I said, their design is largely the same as Sun's X45xx series. So they're far from the first to line up drives one behind the other.
  Also with respect, it's the peak temperatures and not the averages that really matter if heat is killing those drives.
  What evidence is there that peak temperatures are more significant than sustained temperatures? What evidence is there that heat is killing drives at all?
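  The averaging argument above can be checked with a few lines of arithmetic. This is only a sketch: the 22 degree fleet average, the 20% hot fraction and the 45 degree hot-drive temperature are illustrative picks from the ranges quoted in this comment, not figures published by Backblaze, and `required_mean_of_rest` is a made-up helper name.

```python
def required_mean_of_rest(overall_mean, hot_fraction, hot_mean):
    """Mean temperature the remaining drives must have so that the whole
    fleet still averages overall_mean."""
    if not 0 <= hot_fraction < 1:
        raise ValueError("hot_fraction must be in [0, 1)")
    # overall = hot_fraction * hot_mean + (1 - hot_fraction) * rest_mean
    return (overall_mean - hot_fraction * hot_mean) / (1 - hot_fraction)


# Illustrative numbers only (assumptions, see the text above).
rest = required_mean_of_rest(overall_mean=22.0, hot_fraction=0.20, hot_mean=45.0)
print(f"remaining drives would need to average {rest:.2f} C")
```

  With those assumed inputs the other 80% of drives would have to sit in the mid-teens, which is the implausibility the comment is pointing at.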
43. Re: Not very useful. by dbIII · 2016-02-19 14:14 · Score: 1
  
  Anecdotes are not data
  You gave a blanket opinion of it not happening so a single data point is enough to tell you that your opinion does not always reflect reality. Your failed oneupmanship with the resume stuff is also an anecdote BTW and is a bit of an odd thing to do in a place like this.
  
  I'm sure a company with a massive business interest in designing high-capacity storage servers
  You don't seem to have been following the thread. My point has always been that it is for a specific use-case that does not make much sense at all outside Backblaze - thus results from them shouldn't be taken as more than an indication when the drives are used outside of the tiny pods with 45 drives packed in tightly and very little airflow.
  
  What evidence is there that peak temperatures are more significant than sustained temperatures?
  Tribology :)
  The short story is that temperature excursions take a lot of the life out of bearings, motors, lubricants etc, semiconductors have their resistance go up and generate even more heat, materials expand at different rates, polished surfaces stick to each other more etc etc. Something that gets hot every now and then is likely to fail in a different way to something that operates at a constant temperature.
44. Re: Not very useful. by drsmithy · 2016-02-19 14:56 · Score: 1
  
  You gave a blanket opinion of it not happening so a single data point is enough to tell you that your opinion does not always reflect reality. Your failed oneupmanship with the resume stuff is also an anecdote BTW and is a bit of an odd thing to do in a place like this.
  Your "someone tried to copy a Backblaze pod and got it wrong" is hardly a powerful counter-example.
  There's no evidence to suggest their pod design has heat problems. None. That - and a non-trivial amount of experience with a wide variety of server hardware, including ones with near identical designs to Backblaze - is what my opinion was based on.
  This is nothing to do with "oneupmanship". It's data vs an anecdote.
  You don't seem to have been following the thread. My point has always been that it is for a specific use-case that does not make much sense at all outside Backblaze - thus results from them shouldn't be taken as more than an indication when the drives are used outside of the tiny pods with 45 drives packed in tightly and very little airflow.
  And my point is that your reasoning is wrong, because it is based on an incorrect assumption that they have heat problems, when all the evidence suggests they do not. Your fundamental argument is that the design of the Backblaze pod has inherent hotspot issues ("For a variety of reasons BackBlaze pack more drives into each case than others consider sane and they have a lot of heat with very minimal airflow."). This would most certainly manifest in the average temperatures and is completely independent of any drive failure stats.
  MY point was actually, in a roundabout fashion, similar to yours, but with the opposite reasoning. Their conclusions need to be taken with a grain of salt because they are based on drives in datacentres that generally will not see the sorts of high temperatures, temperature fluctuations and power cycling that home desktops and servers do. As I said originally - and before actually checking - "I would be surprised if their drives get over 30C". Well, I was a bit surprised because some of their drives are getting up into the mid-30s (though clearly not many). But that's a long way from the 40-50 a typical desktop or home server PC will be cycling into day-in, day-out.
  You're off on a dead end arguing Backblaze have heat problems in their design based on a single half-arsed anecdote, rather than looking at all the evidence available that says they do not. Yes, their conclusions should be taken with a grain of salt, but that's because the typical home user disk sees much _harsher_ conditions than the typical Backblaze drive, not vice-versa.
45. Re: Not very useful. by dbIII · 2016-02-19 17:06 · Score: 1
  
  There's no evidence to suggest their pod design has heat problems
  Do I have to keep on repeating myself - for what they do it makes sense but for general usage take a look at their first design - utterly insane if those disks are getting a lot of use at once. 45 disks packed in with very little airflow due to not much in the way of fans and disks stacked in direct contact with very little space between physical piles of disks. Almost nothing in the way of fans. Stagnant air in corners and edges. If that makes no sense to you why are you commenting? The example I mentioned was a far more conservative design but lost a lot of drives from overheating because it was used more intensely than the Backblaze guys say they use theirs.
  
  Well, I was a bit surprised because some of their drives are getting up into the mid-30s
  It is an average so some of their drives are staying at that temperature for very long periods of time. What they get up to as a maximum we can only guess at due to the design and expected usage. If they supplied that information it would be very interesting and far more useful than an average including idle time and including disks that did not fail.
  
  You're off on a dead end arguing Backblaze have heat problems in their design based on a single half-arsed anecdote
  No it's based on looking at photographs of a Backblaze pod. There are plenty of good reasons why almost nobody else packs in drives like that, and when they do (like Sun) they put in intermediate rows of fans. While modelling heat transfer was the way I got into cluster computing from engineering some time back that background is not needed to make this call - seriously it's high school stuff. Convection is better if you can move the air a bit more and conduction has to have somewhere to go. Get all those drives running and it's going to be an oven in the middle - so it's just as well Backblaze distribute what load they have.
  
  Which brings me back to the point that matters - test results from an atypical environment shouldn't be taken as more than a guideline in a typical environment.
  
  My second point that matters less is that the design as presented on the net is likely to lead to overheating and even if it doesn't overheat some drives are going to be hotter than others due to being surrounded by slow moving preheated air - so the environment for the drives is going to vary depending on where they are in the "pod" making average failure results a bit less applicable across the board. That design may work for Backblaze but in a situation where the drives are all likely to run at once it's an insane design with almost zero thought put into keeping the drives cool. If instead of running it the way they do you put ZFS on it and scrubbed the drives the heat would probably kill a few of them.
46. Re:Not very useful. by drsmithy · 2016-02-19 17:58 · Score: 1
  
  If you pick something that doesn't fail under their extreme circumstances, it's a lot less likely to fail at home.
  I said this elsewhere but it might get lost in the noise.
  Your typical home PC or server drive will likely see far, far harsher conditions than any Backblaze drive.
  So take their conclusions with a grain of salt, especially the ones around heat.
47. Re: Not very useful. by drsmithy · 2016-02-20 01:56 · Score: 1
  
  Do I have to keep on repeating myself [...]
  You can repeat yourself as much as you want, but it doesn’t make you right.
  The data says Backblaze don’t have any heat problems.
  Backblaze explicitly say they don’t have any heat problems and have done as long as I’ve been reading about them (they obviously track the necessary data to know and have no reason to lie that I can see).
  Experience says that even slow moving 30-35 degree air over a group of drives that would otherwise be running at 45-50 degrees will bring them down to the ~40 degree mark.
  vs
  Random internet guy says Backblaze must have heat problems because somebody he knew built something similar and it did.
  Also, their first design did not have drives in direct contact (clearly a few mm gap between each one - no worse than a standard server case with caddies) and had six case fans (three front, three rear). Any heat problems they might have had in the first design would have come from the vibration-damping rubber sheath around each drive insulating it, not lack of airflow. But, again, according to them they've never had any heat problems.
  It is an average so some of their drives are staying at that temperature for very long periods of time. What they get up to as a maximum we can only guess at due to the design and expected usage. If they supplied that information it would be very interesting and far more useful than an average including idle time and including disks that did not fail.
  No, we can’t “only guess” because we have data. They’ve stated the drives run 24/7. Drive activity - even heavy drive activity - does not significantly change operating temperature over and above idle spinning temps (a few degrees maybe). If they were getting to high maximums, that would show in the data through higher averages (i.e. there’d be a significant percentage up around the high 30s or low 40s, rather than just the 7200rpm drives). It’s quite reasonable to use averages because it’s quite reasonable to assume that, even in the worst case, the load is equally distributed between all drives due to their scale (the best and more likely case is that they explicitly seek to evenly distribute workload).
  Nothing supports your assertion except your one friend who made a system something like Backblaze's and some drives in it died.
48. Re: Not very useful. by dbIII · 2016-02-20 03:46 · Score: 1
  
  The data says Backblaze don’t have any heat problems.
  I've written about the data and so have you. You pointed out that the average temperature looked very high to you. I pointed out that it's an average over a wide range of conditions and doesn't tell us enough to justify a statement that they don't have any heat problems - and now I'm suggesting that the very high average that you noticed is probably due to much higher temperatures when they are not idle skewing the average up.
  
  no worse than a standard server case with caddies
  It appears you are just being silly for the sake of an argument since it's obviously nothing like that at all.
  
  Nothing supports your assertion except your one friend
  Who said anything about a friend? I was called in to solve a problem a company had with a new server. I suggested a case with better airflow. Problem solved.
  
  Drive activity - even heavy drive activity - does not significantly change operating temperature over and above idle spinning temps
  I do believe it's time for you to read a book on the topic instead of making stupid shit up or is that a deliberate lie to catch me in some silly argument trap or some kind of joke? Do you really have so little experience with computer hardware that you have not touched a drive that has just been removed after heavy usage, such as cloning a drive?
49. Re: Not very useful. by drsmithy · 2016-02-20 14:09 · Score: 1
  
  I've written about the data and so have you. You pointed out that the average temperature looked very high to you.
  I did not.
  I said I’d be surprised if they got over 30. They are getting over 30, but not by much. The vast bulk of drives are clearly under 30. So I am surprised, but not by much.
  “Very high” to me would be, as I have alluded to several times, a substantial percentage of drives into the high 30s to low 40s. Which would be represented in the stats by that great big fat chunk of the curve shifting 10-15 degrees rightwards.
  It appears you are just being silly for the sake of an argument since it's obviously nothing like that at all.
  LOL. You’ve clearly got a giant chip on your shoulder about Backblaze and you’re having a go at me about “being silly” ?
  I do believe it's time for you to read a book on the topic instead of making stupid shit up or is that a deliberate lie to catch me in some silly argument trap or some kind of joke? Do you really have so little experience with computer hardware that you have not touched a drive that has just been removed after heavy usage, such as cloning a drive?
  FFS.
  As reported by SMART, the 16 drives in my home server idle (spinning) at 35-40 degrees. They sit in four Supermicro hot swap cases like these, though I have swapped the rear fans for quieter, slower-spinning Noctua NF-P12s as the machine sits in my office. The case is an Antec 1200 and the cabling inside is messy. It’s 30-31 degrees in the room.
  After nearly an hour of zpool scrubbing, the warmest drive is at 44 degrees and the coolest at 39. Temps have been stable for fifteen minutes. Some small fraction of that will be the ~1 degree rise in ambient over that hour.
  So, like I said, intensive drive activity pushes the temperature up by a few degrees (and in reasonably pessimistic conditions at that). The fans in Backblaze cases are shifting 2.5-3x as much air, the ambient temperature is probably 10-15 degrees cooler and their drives will be seeing constant, though maybe not constantly intensive, activity, meaning less temperature variation from “idle” to “flat out”. I could probably knock a degree or two off my numbers above just by changing the fan speeds in the hot swap cages from 900 to 1300rpm.
  There is no evidence Backblaze have any heat problems with their cases.
50. Re: Not very useful. by dbIII · 2016-02-21 18:50 · Score: 1
  
  So now you are saying something about a conventional server and pretending it's the same as a 45 drive backblaze pod - what a pathetic and stupid shell game just for the sake of keeping an argument going!
  You can not possibly be as stupid as you are pretending to be.
51. Re: Not very useful. by drsmithy · 2016-02-22 00:41 · Score: 1
  
  So now you are saying something about a conventional server and pretending it's the same as a 45 drive backblaze pod [...]
  No. I am saying something about how drive activity affects temperature. (Though it is worth pointing out the drives in my cages are physically as tightly packed as the ones in a Backblaze pod).
  Specifically, I said "drive activity - even heavy drive activity - does not significantly change operating temperature over and above idle spinning temps (a few degrees maybe)."
  As it turns out it changes it by around 4 degrees, as predicted. Not the 10-20 degrees it would need to for your scenario of drives commonly overheating to 40-50 given baselines in the 16-34 degree range.
  So, if those drives waaaayyy up the back in row three of the Backblaze case are sitting there idling along at 28-34 degrees, like the data strongly suggests they do, when they get smashed by activity they're going to heat up to maybe 34-40 degrees, tops (and probably less).
  You can try this yourself pretty easily. Put a drive in one of those USB cradles and point a desk fan at it. Let it sit and spin for 20-30 minutes to get up to a stable operating temperature. Measure that with SMART. Then smash it with a zpool scrub or a dd, or whatever you want for half an hour and measure the temperature again. Observe the difference.
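  For the "measure that with SMART" step, a minimal sketch of pulling the temperature out of the attribute table that `smartctl -A` prints. It assumes the drive reports temperature as attribute 194 (some drives use 190 instead), and the two sample rows are made up to show the layout, they are not measurements from any drive mentioned here. `parse_temperature_celsius` is a hypothetical helper name.

```python
import re


def parse_temperature_celsius(smartctl_output):
    """Return the raw value of attribute 194 (Temperature_Celsius) from the
    text printed by `smartctl -A`, or None if the attribute is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the
        # tenth whitespace-separated column.
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


# Made-up sample rows in the usual `smartctl -A` layout (assumption).
IDLE = "194 Temperature_Celsius 0x0022 117 098 000 Old_age Always - 33 (Min/Max 20/45)"
BUSY = "194 Temperature_Celsius 0x0022 113 098 000 Old_age Always - 37 (Min/Max 20/45)"

rise = parse_temperature_celsius(BUSY) - parse_temperature_celsius(IDLE)
print(f"temperature rise under load: {rise} C")
```

  In practice the text would come from running `smartctl -A` against the device once at idle and once after the half hour of load, then comparing the two readings.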
  You can not possibly be as stupid as you are pretending to be.
  [...]
  Instead of jumping on threads to play some wank of a mass debate game where you attempt to convince people of things contrary to reality why don't you do something useful, or at least less annoying?
  You're like a study in psychological projection. Or an incredibly committed troll - in which case, well done.
  Everything I have written here has been supported by evidence, which I have referenced.
  Nothing you have written has.
  Not to mention the two times (at least) you substantially misrepresented something I wrote trying to pretend it was somehow disingenuous.
  "Reality" is that there is no evidence Backblaze have any temperature problems in their servers. Why this bothers you so much you feel the need to abuse people who point it out, I cannot even begin to fathom.
Doesn't make any mention of.. by Anonymous Coward · 2016-02-17 05:56 · Score: 1

Architecture or Filesystem. Anyone know? ZFS perhaps?
1. Re:Doesn't make any mention of.. by Anonymous Coward · 2016-02-17 06:11 · Score: 4, Informative
  
  https://www.backblaze.com/blog/vault-cloud-storage-architecture/
  They mention their architecture here
2. Re:Doesn't make any mention of.. by magwm · 2016-02-17 07:49 · Score: 1
  
  ext4
3. Re:Doesn't make any mention of.. by brianwski · 2016-02-17 08:21 · Score: 5, Interesting
  
  Brian from Backblaze here.
  
  The individual drives in our datacenter run ext4 (the OS is Debian). We do an extremely simple Reed-Solomon encoding that is 17+3 (17 data drives and 3 parity) but the 20 drives are spread across 20 different computers in 20 different locations in our datacenter. This means we can lose any 3 drives and not lose data at all.
  
  We released the Reed-Solomon source code free (open source but even better) for anybody else to use also. You can read about it in this blog post: https://www.backblaze.com/blog...
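  To put numbers on the 17+3 layout described above, here is a small sketch of the overhead and fault tolerance. It only does the arithmetic and relies on the standard Reed-Solomon property that any 17 of the 20 shards are enough to rebuild the data. It is not the Backblaze code, and `erasure_code_summary` is a hypothetical helper.

```python
def erasure_code_summary(data_shards, parity_shards):
    """Basic figures for an erasure code where any `data_shards` of the
    total shards are enough to reconstruct the original data."""
    total = data_shards + parity_shards
    return {
        "total_shards": total,
        "tolerated_failures": parity_shards,
        "raw_bytes_per_stored_byte": total / data_shards,
        "usable_fraction": data_shards / total,
    }


summary = erasure_code_summary(data_shards=17, parity_shards=3)
for key, value in summary.items():
    print(f"{key}: {value}")
```

  So 85% of raw capacity holds data, and each stored byte costs a little under 1.18 bytes of disk, while any three of the twenty drives can be lost.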
4. Re:Doesn't make any mention of.. by castionsosa · 2016-02-17 08:48 · Score: 1
  
  Probably the closest one can get, next to having the "secret sauce" software, would be to have a zpool with 20 drives, and have it configured as RAID-Z3, so it would take four drive failures to lose your data.
  These are ideal for vast sums of low tier data... but Backblaze is based on storing stuff as cheaply as possible. For other needs like faster access, either add a "landing zone" for data in the pool with SSDs for ZIL/L2ARC, or as a compromise, go with smaller, faster HDDs for data being worked on fairly often.
RAID, let them fail by Anonymous Coward · 2016-02-17 05:57 · Score: 0

Whether a drive fails and you replace it, or whether its GOING to fail, so you replace it early, you still need a rebuild. Why not just let them fail?
1. Re:RAID, let them fail by Dareth · 2016-02-17 06:18 · Score: 5, Insightful
  
  The purpose of RAID is to keep data available for a purpose. You have some level of redundancy measured in terms of number of disk that can fail before you have a data loss for the array. Once a disk has an impending failure smart alert, you no longer have full confidence in that disk. If you leave it to fail, what if another disk in the array happens to fail. You now have an array with a failed disk, possibly in a degraded mode. You also have a disk with a better than normal chance of failure. It just makes sense to be proactive and fix the issue before it escalates into a failure.
  
  --
  
  I only look human.
  My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
2. Re:RAID, let them fail by Old97 · 2016-02-17 06:47 · Score: 3, Insightful
  
  Yes, and if one disk in an array fails, the likelihood that another disk in the same array will fail soon goes way up. That's because many disk failures are related to environmental factors - power, air, particulate matter, etc. Whatever factors contributed to the first disk failure are also present for the other disks in the array. So it's best to replace disks that have impending failure as soon as you can.
  
  --
  Very often, people confuse simple with simplistic. The nuance is lost on most. - Clement Mok
3. Re:RAID, let them fail by sexconker · 2016-02-17 06:55 · Score: 4, Informative
  
  Because you don't know how it will fail, you don't know what other drive may fail next, and you don't know when a 2nd, 3rd, nth, drive will fail.
  Further, drives that manage to actually report that they're dying are typically fucked to the point of impacting your performance significantly. If you're still writing to a drive that's hobbling along, it will slow down the whole array.
  Reads are usually okay (depending on your controller and setup) but writes need to be completed at some point, regardless of your cache scheme or cache size.
  Sustained writes to an array with a crippled drive will eventually either result in the drive being taken offline or the array's write performance turning to shit. If you're lucky, the drive is taken offline gracefully, doesn't catch fire, and you do the hot spare / cold spare dance, the rebuild boogaloo, etc.
4. Re:RAID, let them fail by shawn2772 · 2016-02-17 07:26 · Score: 2
  
  Yes, and if one disk in an array fails, the likelihood that another disk in the same array will fail soon goes way up. That's because many disk failures are related to environmental factors - power, air, particulate matter, etc.
  Even more, the process of rebuilding a degraded array is very intensive, touching every sector of every disk in the array, old and new. This means that if there are any latent failures that just haven't been noticed, the rebuild process will find them with very high probability. RAID is good, and useful, but as soon as there's a hint of a failure on any disk, you should replace it ASAP. This is also why I favor RAID modes that allow for more than one failed disk. That way if you have one failure and the rebuild triggers/uncovers another, you're not out of luck. I learned this the hard way.
5. Re: RAID, let them fail by Anonymous Coward · 2016-02-17 07:27 · Score: 0
  
  The problem with raid, using modern really large drives, is that the likelihood of a single bit error is so large that if one drive fails you will likely have a bit error on one of the other drives and then you can't restore the array. Raid5 is therefore inadequate and even raid 6 is problematic.
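  A rough sketch of the arithmetic behind that claim. It assumes an unrecoverable read error rate of 1 per 10^14 bits (a figure commonly quoted on consumer drive spec sheets, used here purely as an assumption) and treats errors as independent, which real drives need not obey. The four surviving 4 TB drives are a hypothetical array.

```python
import math


def prob_read_error(bytes_read, errors_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while reading
    bytes_read bytes, assuming independent errors at a fixed per-bit rate."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-errors_per_bit))


# Hypothetical rebuild: four surviving 4 TB drives are read in full.
p = prob_read_error(4 * 4e12)
print(f"chance of at least one unrecoverable read error: {p:.0%}")
```

  Under those assumptions a single-parity rebuild has a poor chance of finishing without hitting an error, which is the case for double or triple parity.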
6. Re:RAID, let them fail by shawn2772 · 2016-02-17 07:31 · Score: 3, Informative
  
  Oh, one more thing: You should also ensure that every sector of every disk is read regularly. There are more sophisticated options available, but just setting up a cron job that does something like "cat /dev/sdX > /dev/null" on every drive once per week or so is a reasonable and very simple approach. The goal is to trigger failures early, before they get too bad.
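  The same idea as the `cat` one-liner, sketched in Python for anyone who wants a byte count or their own error handling. It simply reads a path from start to end and throws the data away. `read_all` is a made-up helper, and reading a raw device such as /dev/sdX this way normally needs root.

```python
def read_all(path, chunk_size=1024 * 1024):
    """Read path from start to end, discarding the data, and return the
    number of bytes read. An unreadable sector surfaces as an OSError."""
    total = 0
    # buffering=0 gives raw, unbuffered reads of up to chunk_size bytes.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(chunk_size)
            if not chunk:  # empty bytes object means end of device/file
                break
            total += len(chunk)
    return total
```

  Called as `read_all("/dev/sdX")` from a weekly cron job, an OSError (or new entries in the kernel log) is the early warning the comment is after.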
7. Re:RAID, let them fail by Anonymous Coward · 2016-02-17 08:35 · Score: 0
  
  So you're not worried that, in the 12 hours it takes to write 4TB of data to a new drive (assuming it runs at a sustained 150MB/s, which is best-case), another drive in the array will fail? I hope you're not running RAID 5.
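  As a rough check on those figures, a sketch of the time needed to fill a drive at a constant rate. It assumes decimal units (4 TB = 4 x 10^12 bytes, 150 MB/s = 150 x 10^6 bytes/s) and ignores any other load on the array. `rebuild_hours` is a made-up helper.

```python
def rebuild_hours(capacity_bytes, throughput_bytes_per_s):
    """Hours needed to write capacity_bytes at a constant throughput."""
    return capacity_bytes / throughput_bytes_per_s / 3600


hours = rebuild_hours(4e12, 150e6)
# Average rate that a 12 hour rebuild of the same drive would imply.
implied_mb_per_s = 4e12 / (12 * 3600) / 1e6

print(f"4 TB at a sustained 150 MB/s: {hours:.1f} hours")
print(f"a 12 hour rebuild implies about {implied_mb_per_s:.0f} MB/s on average")
```

  At a truly sustained 150 MB/s the write takes closer to seven and a half hours. Twelve hours corresponds to an average in the low 90s of MB/s, which may be the more realistic figure for an array that is still serving other I/O.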
8. Re:RAID, let them fail by thhamm · 2016-02-17 10:22 · Score: 1
  
  I run SMART short tests every day and SMART long tests once a week, but now i'm not sure, i thought the 'long tests' check all sectors?
9. Re:RAID, let them fail by mattventura · 2016-02-17 10:49 · Score: 1
  
  If you're going to use a brute-force solution like this, run it through ionice so that it doesn't suck up all the disk bandwidth.
10. Re:RAID, let them fail by Stolpskott · 2016-02-17 11:37 · Score: 1
  
  BackBlaze might have their own alternative reasons, but in my case ... Because whether you are using RAID, the Reed-Solomon setup that BackBlaze are using, or no distributed data system at all, it is easier/quicker to recover data directly from a drive that is showing signs of failure than it is to restore from a backup or recover from a RAID parity check.
  Yes, it means that I am removing drives from my arrays that still have useful life in them, but they get repurposed - I am quite frequently asked by friends and family for a drive they can use as a one-time transport mechanism for music, photos, videos, documents, pretty much anything, and I go to great pains to point out that the drive is potentially failing. If the data size is "only" a few GB/10's of GB, then I usually have a USB drive or 3 lying around that they can use. But I also have a few external drive caddies that I can drop an old drive into, and which is either preferable or taken as a second copy, because the USB can get lost very easily, while a 1/2 kilo drive is less "lose-able".
11. Re:RAID, let them fail by Anonymous Coward · 2016-02-17 11:56 · Score: 0
  
  SMART doesn't always predict failure. As I recall from that one Google paper, SMART was only accurate about 50% of the time (drives fail just as often without any SMART warning). Backblaze talks about SMART too; some brands have SMART stats that predict failure more accurately than others.
  Anyway, just have backups so it doesn't matter if it fails.
12. Re:RAID, let them fail by Anonymous Coward · 2016-02-17 12:37 · Score: 0
  
  Or if you're using Software RAID on Linux, just do a resync weekly. Which will also read every sector on every drive with the bonus of making sure that all drives report back good information.
  Most hardware RAID cards have a similar feature to check the array for errors.
13. Re:RAID, let them fail by Trogre · 2016-02-17 12:53 · Score: 1
  
  Don't forget to prefix that with a:
  ionice -c 3
  or you'll kill your performance.
  
  --
  "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
14. Re:RAID, let them fail by brianwski · 2016-02-17 13:01 · Score: 3, Interesting
  
  Brian from Backblaze here.
  
  Sometimes the "drive failure" is as simple as the little circuit board on the bottom of the hard drive has a component die. This won't be predicted by SMART stats at all. We have chatted very informally with the people at "Drive Savers" ( http://www.drivesaversdatareco... ) and they say one of the early steps in attempting to recover the data from a drive that won't work is to replace the circuit board with the board from an identical hard drive of same make and model.
  
  I have no affiliation with "Drive Savers" but from my interactions with them I trust them as quite a good and valuable service who know their craft. We even used them once in a panic to get back the minimum number of drives for data integrity in a RAID array (a long time ago before our multi-machine vault architecture). It worked - we got all the data back from the drive!
15. Re:RAID, let them fail by craighansen · 2016-02-17 15:57 · Score: 1
  
  Or if you're using Software RAID on Linux, just do a resync weekly. Which will also read every sector on every drive with the bonus of making sure that all drives report back good information.
  Most hardware RAID cards have a similar feature to check the array for errors.
  mdadm already does a "checkarray" starting at 00:57 on the first Sunday of each month by default. See /etc/cron.d/mdadm
16. Re:RAID, let them fail by Anonymous Coward · 2016-02-17 16:02 · Score: 0
  
  But I also have a few external drive caddies that I can drop an old drive into....
  Because, rather perversely, external drives are cheaper than internal drives, one can end up with a crapload of external drive boxes. I've even seen someone go to the trouble to sell them on ebay. The extra power supplies come in handy as spares, too.
17. Re:RAID, let them fail by AmiMoJo · 2016-02-17 22:23 · Score: 1
  
  ZFS seems like a good solution to all this. As well as having RAID-like levels of redundancy, it checksums all data (not just files, even FS metadata) and can check it on a schedule. What you really care about is your files being intact, so it's better to checksum those than to rely on the disk itself detecting bad sectors. That test will also pick up things like a bad SATA cable or failing enclosure.
  I just wish something that good was available for Windows.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
18. Re: RAID, let them fail by Anonymous Coward · 2016-02-18 03:32 · Score: 0
  
  He is correct.
  When something irreparable is going to break, getting every second's use is more important than warning that it is going to break.
  The signal of the drive becoming unresponsive is the only thing you should care for, even more so when you have years of data to justify new drive purchases anyway.
  In the end, this company has just thrown away useful runtime, which adds up over those potential lost hours * the dropped drives.
  They've probably spent 5-10% more dropping drives before failure with only a minor slowdown on re-image. (Given a well-designed system, that is)
This page can’t be displayed by Anonymous Coward · 2016-02-17 05:57 · Score: 0

This page can’t be displayed.
Check your links, editors...
1. Re:This page can’t be displayed by Anonymous Coward · 2016-02-17 06:03 · Score: 0
  
  Follow up: It *used* to be a valid link. This does not appear to be the fault of the editors and I sincerely apologize.
2. Re:This page can’t be displayed by Anonymous Coward · 2016-02-17 06:09 · Score: 0
  
  The link works, fast job.
Seagate SHOULD be good at that by damn_registrars · 2016-02-17 06:09 · Score: 3, Insightful

Considering how awful their failure rates are in general, they need to get good at reporting them before hand or they (as a company) won't exist much longer. After all, investing in quality is clearly too expensive...

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
1. Re:Seagate SHOULD be good at that by Anonymous Coward · 2016-02-17 06:11 · Score: 0
  
  Funny... the stats from Backblaze show that they were the 2nd most reliable.
2. Re:Seagate SHOULD be good at that by BarbaraHudson · 2016-02-17 06:15 · Score: 1
  
  What's even weirder is that the HGST drives are from a Western Digital subsidiary.
  
  --
  "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
3. Re:Seagate SHOULD be good at that by mattventura · 2016-02-17 06:29 · Score: 5, Funny
  
  Seagates are great at reporting impending failures.
  Does it say Seagate on it? It's about to fail.
4. Re:Seagate SHOULD be good at that by damn_registrars · 2016-02-17 06:36 · Score: 2
  
  That is more the result of how few manufacturers remain than anything.
  
  --
  Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
5. Re:Seagate SHOULD be good at that by slaker · 2016-02-17 06:41 · Score: 4, Informative
  
  HGST drives are manufactured by a different division, using different processes and different engineering teams. I was told by a WD engineer that HGST stuff is still entirely separate on a manufacturing level.
  Of course, I'm just some guy on the internet, but based on my own experiences with a few hundred 3 and 4TB drives in service, the Hitachi/HGSTs are worth going out of my way to obtain and Seagate 4TB drives don't seem to have the problems the 3TB units did.
  
  --
  -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
6. Re:Seagate SHOULD be good at that by BarbaraHudson · 2016-02-17 13:12 · Score: 1
  
  I said they were made by a subsidiary of Western Digital, not Western Digital themselves.
  
  --
  "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
7. Re:Seagate SHOULD be good at that by Anonymous Coward · 2016-02-17 16:06 · Score: 0
  
  WD really ought to learn how HGST is doing so much better - making WD drives more reliable should be worth billions to the company. I'm guessing that it's more likely that HGST will be eventually dragged down to WD's level.
8. Re:Seagate SHOULD be good at that by dave420 · 2016-02-18 02:43 · Score: 1
  
  ... which isn't "weird" at all, negating your initial point.
9. Re:Seagate SHOULD be good at that by BarbaraHudson · 2016-02-18 03:42 · Score: 1
  
  Not really - when you buy another company for their tech, you usually expand its use to your other similar product lines. Guess this didn't happen - like when Seagate bought Maxtor and instead of tearing down and rebuilding the plants, continued running them as is, resulting in high failure rates - the same problem that drove Maxtor to be acquired by Seagate in the first place.
  
  --
  "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
Not that surprising by Kokuyo · 2016-02-17 06:16 · Score: 1

Around here, Seagate 6TB disks cost 50ish % more than WD Red NAS and Hitachi disks are yet more expensive. So all these graphs are basically in line with the old adage "You get what you pay for".
The comment about Seagate's SMART being more on point seems to make those disks a nice compromise.
Funny enough, considering there is this saying in Switzerland: "Sie geit oder sie geit ned." (where "Sie geit" sounds awfully close to "Seagate") which roughly translates to "It works or it doesn't" and is a stab at the sometimes abysmal failure rates they had back when.
1. Re:Not that surprising by drinkypoo · 2016-02-17 06:50 · Score: 1
  
  Funny enough, considering there is this saying in Switzerland: "Sie geit oder sie geit ned." (where "Sie geit" sounds awfully close to "Seagate") which roughly translates to "It works or it doesn't" and is a stab at the sometimes abysmal failure rates they had back when.
  Here in the USA, especially around the Monterey Bay Area where Seagate was (and still is) located, we just called them "Seizegate" for the tendency of their drives to fail due to stiction.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
2. Re:Not that surprising by itsownreward · 2016-02-17 06:51 · Score: 1
  
  We have a rack-mountable QNAP NAS device that our field support people back up files to when they are rebuilding a workstation. We used 3T Seagates from the compatibility list in it, and I had constant problems; we've replaced them with WD Reds, and the problems have gone away. Now in retrospect, seeing that Seagate drives report SMART events earlier, it makes sense that I had all the problems. The QNAP firmware drops and refuses to reattach any disk to an mdadm array that has SMART errors. Granted, if your data is very important you might want that warning. However, they should still have the data on an existing disk for a while, so I'd rather not be playing musical disks, so if the warnings come late it's all good.
3. Re:Not that surprising by Anonymous Coward · 2016-02-17 08:00 · Score: 0
  
  > "Seizegate" for the tendency of their drives to fail due to stiction.
  Put the drive in the freezer. Out of the nearly a hundred Seagates we had that seized, I think an hour in a freezer fixed all of them long enough to copy the data off.
Sorry WD fans by Solandri · 2016-02-17 06:21 · Score: 5, Interesting

Can't help but feel for all the people who read Backblaze's previous report and decided Seagate was junk and bought WD instead. I tried to warn them that the model of the drive mattered more than the manufacturer, because each manufacturer tries new technologies and new cost-cutting strategies with each different model. Sometimes it works and the model is reliable. Sometimes it doesn't and the model is unreliable. But everyone was eager to get on the bash Seagate, praise WD bandwagon and ignored me.

Well, WD was least reliable this time around. The Seagate stats in the previous report were probably being skewed by just one or two bad models. It's skewed this time by one bad model, which due to the passage of time means it makes up a tiny portion of their Seagate sample, so doesn't spike Seagate's score like before. (You can pretty much ignore WD in the 4TB graph, as a sample size of just 46 drives means the confidence interval is a 0.3% - 8.8% failure rate.)

At least Backblaze addressed my criticism from before - they've broken down the stats to individual drive models. And you can see that like I said, there's huge variability in reliability between models within a manufacturer's lineup. Now they just need to add confidence intervals to the graphs.
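
For anyone who wants error bars on a small sample like those 46 drives, here is a rough sketch using a Wilson score interval on the failure proportion. This is only one possible method and is not necessarily how the 0.3% - 8.8% range above was derived (that figure may depend on drive-days and a different interval). The function name `wilson_ci` and the single failure in the usage line are hypothetical.

```shell
# wilson_ci FAILURES DRIVES
# Prints the lower and upper bound (in percent) of an approximate 95%
# Wilson score interval for the proportion FAILURES/DRIVES.
wilson_ci() {
  awk -v x="$1" -v n="$2" 'BEGIN {
    z = 1.96                       # normal quantile for roughly 95% coverage
    p = x / n                      # observed failure proportion
    denom  = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half   = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    printf "%.1f %.1f\n", 100 * (center - half), 100 * (center + half)
  }'
}

# Hypothetical usage: 1 failure observed among 46 drives
#   wilson_ci 1 46
```

The interval is wide for 46 drives and narrows as the drive count grows, which is why the small samples in the report deserve less weight.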
1. Re:Sorry WD fans by Anonymous Coward · 2016-02-17 06:50 · Score: 1
  
  They make the Full data set available so you can run your own stats.
2. Re:Sorry WD fans by 0100010001010011 · 2016-02-17 06:50 · Score: 2
  
  I wish I saw Backblaze's previous report. I have a whole lot of Seagate paperweights. I couldn't do anything but laugh when one of their SNs ended in FML
  In comparison all of the WD Reds that I bought to replace those (and their warrantied replacements) are still going strong. I did everything 'right'. Spread out my purchases, bought from Newegg and Amazon, kept them cool, etc. I think out of the 12 or so 2 & 3TB Seagate drives my current FreeNAS machine still has all of 1 or 2 still running. And one of those just started throwing SMART errors (even though the zpool scrub goes through fine).
3. Re:Sorry WD fans by LordKronos · 2016-02-17 07:03 · Score: 1
  
  I don't know if I'd say it was 1 or 2 bad models that plagued seagate. When I buy drives, I go by the ratings on amazon and newegg, and regardless of the drive model it seems there's always a lot more reviews of seagate drives failing than other brands.
4. Re:Sorry WD fans by epine · 2016-02-17 07:42 · Score: 1
  
  Can't help but feel for all the people who read Backblaze's previous report and decided Seagate was junk and bought WD instead.
  Why feel for them? By your own inefficient market hypothesis, every course of action is a crap shoot. The report was great for me, because we actually had one or two of those highly suspect drives in service.
  But in the larger scheme, you're absolutely right. Every vendor has manufactured a few duds. IBM, Hitachi, Seagate, Western Digital. Every company has made some poor models. Not every company has a "click of death" drive to remember them by, but still you take your chances.
  The difference with the 3 TB Seagate is that based on manufacturing conditions, I have a strong feeling Seagate knew about their excessive vice well before they sold them. (Bad Seagate!) In some other cases, the failure rate came as a shock from the field, such as the one I vaguely recall that was later attributed to tin whisker growth sensitive to environmental factors.
  You think this is easy? You try to design one. And no, you're not allowed patch Tuesday, unless Tuesday falls on a palindrome.
5. Re:Sorry WD fans by Gondola · 2016-02-17 09:17 · Score: 1
  
  The problem with this tactic is that manufacturers will change their manufacturing methodology over time. An extremely well-reviewed model can be replaced later in its product life by a worse version that retains the same exact model number. If you go to NewEgg and Amazon and look at hard drive reviews for the best drives, then look at only the more recent reviews, you may see a big drop in the average rating for some models. Bait and switch. So, be careful!
6. Re:Sorry WD fans by craighansen · 2016-02-17 16:11 · Score: 1
  
  The 3TB Seagate (ST3000DM001) wasn't in the main table because it had a 28%/year failure rate and they've all been retired. It's not that they bought a small number of them - they ripped them out - I've been doing the same. The 4TB Seagates have been about average in reliability.
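  
  A per-year rate like that 28% can be worked out from failures and drive-days. The sketch below assumes the usual annualized formula, failures divided by drive-years (drive-days / 365), times 100. The function name `afr` and the numbers in the usage line are made up for illustration and are not taken from the report.

  ```shell
  # afr FAILURES DRIVE_DAYS
  # Prints the annualized failure rate in percent:
  #   100 * failures / (drive_days / 365)
  afr() {
    awk -v f="$1" -v d="$2" 'BEGIN {
      if (d <= 0) exit 1           # no observation time, no rate
      printf "%.2f\n", 100 * f / (d / 365)
    }'
  }

  # Hypothetical usage: 28 failures over 36500 drive-days (100 drive-years)
  #   afr 28 36500
  ```

  Counting drive-days rather than drives keeps a model that was only in service for a few months from looking better than it is.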
That one time Seagate didn't send out SMART data by Anonymous Coward · 2016-02-17 06:36 · Score: 0

Does Seagate send out the SMART data before or after the failure? Their crap 3TB drive reported nothing before it crashed.
This is a repeat of 6/23/15 topic . "When will" by FirstOne · 2016-02-17 06:36 · Score: 2, Interesting

"When will your hard drive fail"
I pointed out that Backblaze's chassis design improperly stresses the fragile SATA data and power connectors by mounting the drives vertically, so that the mass of each drive (plus vibration) rests on those connectors.
This type of vertical drive storage/RAID cabinet is not conducive to a long, reliable drive lifespan, so any number of other factors could kick in and cause a premature failure.
1. Re:This is a repeat of 6/23/15 topic . "When will" by Anonymous Coward · 2016-02-17 08:32 · Score: 3, Insightful
  
  Considering they are hitting 5-6 years on a decent population of their drives I think they are doing OK.
Impressive stats for HGST drives. by nbritton · 2016-02-17 06:39 · Score: 1

I'm impressed by the HGST drives, less than 1% failure rate. I haven't touched the Deskstar line of drives since the IBM Deathstar debacle, but I think it's time to take a second look. Hopefully they have not switched over to Western Digital's technology.
1. Re:Impressive stats for HGST drives. by Anonymous Coward · 2016-02-17 06:52 · Score: 0
  
  Read further... that HGST model is no longer available. Too bad for us.
2. Re:Impressive stats for HGST drives. by tlhIngan · 2016-02-17 09:03 · Score: 1
  
  I'm impressed by the HGST drives, less than 1% failure rate. I haven't touched the Deskstar line of drives since the IBM Deathstar debacle, but I think it's time to take a second look. Hopefully they have not switched over to Western Digital's technology.
  Well, HGST drives are still more expensive than Seagate or WD drives of similar capacity.
  Remember a hard drive is a very high precision mechanical device that has traditional economic pressures applied to them - everyone wants more for less dollars. So the high precision equipment gets compromised in the name of lower costs.
  HGST drives cost more, so presumably they didn't try to eke out every penny of savings out of it and can still use higher quality parts and manufacturing techniques.
  Of course, the best indicator I've found is warranty length - avoid drives that have warranties of under 5 years and you'll probably hit the ones with the lowest failure rates. Manufacturers don't like to do warranty replacements - it costs a lot of time and effort to do, so if they're confident enough to give it a 5 year warranty, chances are it will last all 5 years without a problem. But economic pressures have made cheaper drives with 2-3 year warranties and those generally mean they're not only saving on the warranty, but they compromised the mechanism to achieve lower costs and the cheaper mechanism really is only good for the warranty length.
3. Re:Impressive stats for HGST drives. by Anonymous Coward · 2016-02-17 12:44 · Score: 0
  
  Yeah, if you're buying drives for arrays, then you really need to:
  - Stick with drives that have 5+ year warranties
  - Use drives that support TLER (Time Limited Error Recovery) so that the drive drops out of the array quickly when it has issues
  - Use array modes that can support at least 2 drive failures (no RAID-5, no RAID-0)
  That generally means you pay about 1.5x to 2x the $/TB of the cheap consumer drives. But unless you have a really big system with hundreds of drives and failure modes that can tolerate 6+ simultaneous failures, you're playing with fire trying to go cheap.
  Back all that up with fully encrypted removable backup media (USB drive or tape, whatever floats your boat). And maybe a hot or cold spare of the data at a remote site (at least 100 miles away) if you can afford it. Plan on total array meltdown at least once per year (and pray that it never happens).
4. Re:Impressive stats for HGST drives. by Anonymous Coward · 2016-02-22 02:27 · Score: 0
  
  I remember a comment on one of the previous backblaze articles that some of the discontinued HGST drives seem to have resurfaced under the toshiba brand (toshiba bought some assets related to 3.5 inch drive manufacturing from WD as part of the HGST takeover process though the news articles weren't clear on exactly what).
Bad sectors? by nbritton · 2016-02-17 07:00 · Score: 5, Interesting

What is Backblaze doing to check the drives for bad sectors? I manage a 10,000 disk openstack swift installation and I've noticed the auto sector remapping doesn't work correctly, there are a portion of drives (maybe 3%) that have a few bad sectors that need to be manually remapped using ddrescue. I ended up having to write a custom monthly cron job script that ran badblocks to first identify these drives, and then ddrescue to force a sector remap.
1. Re:Bad sectors? by Anonymous Coward · 2016-02-17 07:43 · Score: 0
  
  What is Backblaze doing to check the drives for bad sectors? I manage a 10,000 disk openstack swift installation and I've noticed the auto sector remapping doesn't work correctly, there are a portion of drives (maybe 3%) that have a few bad sectors that need to be manually remapped using ddrescue. I ended up having to write a custom monthly cron job script that ran badblocks to first identify these drives, and then ddrescue to force a sector remap.
  How are you doing this with ddrescue? badblocks with '-n' is supposed to force the remap using the traditional read-and-write-back, but I've also noticed a lot of times the sectors never get realloc'd, judging by the SMART attributes. Unfortunately I don't know if this is because the spare block list is full, but they don't get fixed with a complete security wipe and badblock r/w pass either, so it's clearly not rebuilding the p-list correctly.
  Of course this is also using typical SATA drives with typical garbage firmware, no SAS or SCSI drives.
  One other thing you might know, is why badblocks doesn't always pick up on bad sectors? Consider a drive that fails the extended DST/surface scan, finding a bad sector and displaying where it occurred. Using that, I can start badblocks a little before the sector that failed, but badblocks doesn't think the sector is bad, it just rolls right past it without any errors or warnings. I'd love to use badblocks as a replacement for extended DST since it gives me useful information other than pass/fail, but it just doesn't cut the mustard correctly.
2. Re:Bad sectors? by omnichad · 2016-02-17 08:09 · Score: 1
  
  It may be different with 10,000 disks vs 4 disks, but I wouldn't trust a drive once it has one remapped (or pending remap) sector. I'd be worrying about replacing it, not remapping, because it tends to be a sign of impending failure.
3. Re:Bad sectors? by Anonymous Coward · 2016-02-17 08:43 · Score: 0
  
  From my own research and tinkering, sectors can become unstable over time, just because that's what they do. Where you see trouble is when the surface wears out from lots of reads, writes, or dust damage. In the latter case, you will see more and more bad blocks spreading from the damaged area, and it tends to show itself fairly quickly if the area is growing. SCSI drives have command sets that allow you to address sectors individually for repairs, as well as give you much better info about spare block counts and the like.
  In my experience remapped drives perform worse on that specific file considering the inherent fragmentation of info across the disc surface, but this should be cleared up once the file is deleted and recopied. However the real trouble comes from areas which are so bad that even having the head move across that portion of the surface degrades it, and then it's definitely time to throw it away. Then again, sectors can sometimes be considered "bad" because of a transient bad write, and it had nothing to do with damaged heads or disk surface. It shouldn't happen, but sometimes does.
  To be good in my eyes, it has to take a secure erase at firmware level, then pass two badblock test patterns. If badblocks doesn't come back with errors, I assume the remapping functionality is working correctly and ignore the old count from SMART. (which should be reset to zero because the bad blocks need to be added to the p-list and the g-list zeroed back out. Bad firmware, bad!)
  Your mileage will definitely vary.
4. Re:Bad sectors? by nbritton · 2016-02-17 09:36 · Score: 2
  
  How are you doing this with ddrescue?
  grep "error.*sector" /var/log/kern.log | awk '{print $(NF-2)$NF}' | sort -u | while IFS=, read device sector; do dd if=/dev/$device of=/dev/null bs=512 count=1 skip=$sector 2>/dev/null || dd_rescue -d -A -m8b -s ${sector}b /dev/$device /dev/$device; done;
  For the badblocks cron job I use this:
  #!/bin/bash
  # Run a read-only badblocks scan on every whole disk, in the background at idle priority.
  if [ $EUID -ne 0 ]; then
      echo "you must be root to run this... exiting."
      exit 1
  fi
  if ! [ -f /sbin/badblocks ]; then
      echo "can't find /sbin/badblocks... exiting."
      exit 1
  fi
  if ! [ -d /var/log/badblocks ]; then
      if ! mkdir /var/log/badblocks; then
          echo "can't create /var/log/badblocks... exiting."
          exit 1
      fi
  fi
  # Skip partition entries; one log per disk, an empty log means no bad blocks were reported.
  for i in $(ls /dev/disk/by-path/ | grep -v part); do
      nohup ionice -c 3 nice -n 19 badblocks /dev/disk/by-path/${i} > /var/log/badblocks/${i}.log 2>/dev/null &
  done
5. Re:Bad sectors? by nbritton · 2016-02-17 09:46 · Score: 2
  
  It may be different with 10,000 disks vs 4 disks, but I wouldn't trust a drive once it has one remapped (or pending remap) sector. I'd be worrying about replacing it, not remapping, because it tends to be a sign of impending failure.
  Of the drives with sector errors (n = 286) the number of bad sectors typically ranged from 4 to 16, with a median of 8. However, values above 25 bad sectors were statistical outliers, meaning they were more than 3 standard deviations off the normal curve. Our policy now is to replace any drive with more than 25 bad sectors.
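  
  A cutoff like that can be recomputed whenever the fleet changes. This is a minimal sketch, assuming one bad-sector count per line on stdin and a threshold of mean plus three sample standard deviations. The function name `sector_threshold` and the sample counts are hypothetical, and this is not the script actually used on the cluster.

  ```shell
  # sector_threshold: reads one bad-sector count per line on stdin and
  # prints mean + 3 * sample standard deviation as a replacement cutoff.
  sector_threshold() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
           if (n < 2) exit 1       # need at least two drives for a deviation
           mean = sum / n
           sd = sqrt((sumsq - n * mean * mean) / (n - 1))
           printf "%.1f\n", mean + 3 * sd
         }'
  }

  # Hypothetical usage with three drives reporting 4, 8 and 12 bad sectors:
  #   printf '4\n8\n12\n' | sector_threshold
  ```

  Bad-sector counts are usually skewed, so a percentile cutoff may fit better than a normal-curve rule, but this matches the 3-sigma policy described above.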
6. Re:Bad sectors? by Anonymous Coward · 2016-02-17 12:31 · Score: 0
  
  Cool beans, I really appreciate that. It's amazing what a well written one-liner can do.
  The best script I've written so far is running hdparm read tests and polling for ata errors in the kernel logs to tell me whether a macbook pro hard drive cable is bad or going bad.
  # the useful bits
  hdparm -t $dev #speed should be > 70 MB/s
  hdparm -T $dev #link stress
Re:That one time Seagate didn't send out SMART dat by 0100010001010011 · 2016-02-17 07:04 · Score: 1

ZFS is what saved my ass. No SMART warnings. No other indication of a failure other than my scrub going "Eh, your drives are shit, we took them out of the pool".
All drives fail, sooner or later... plan for it.. by FlyHelicopters · 2016-02-17 07:19 · Score: 2

All things fail, including hard drives. The question isn't "if", it is "when".
Picking between WD or Seagate hoping to get a "good drive" is missing the point, what happens when both drives fail?
Do you have your data backed up?
I run both Crashplan and Backblaze, I also have a copy stored on Amazon Glacier and important files on OneDrive. I also have two external drives that I rotate backups on and keep unplugged.
For most people, what I do is "overkill", but I've lost data before... never again...
Re:All drives fail, sooner or later... plan for it by The+Grim+Reefer · 2016-02-17 07:47 · Score: 1

I run both Crashplan and Backblaze, I also have a copy stored on Amazon Glacier and important files on OneDrive. I also have two external drives that I rotate backups on and keep unplugged.
For most people, what I do is "overkill", but I've lost data before... never again..
I lost data once too when an IBM Deskstar died suddenly and my backups somehow got corrupted too. So I have an external drive that backs up every night, and another that backs up the first Sunday of every month and one that backs up the last Sunday of every month. That last one and one other that I run manually get swapped with ones that are kept off site. I'm still trying to decide on an on line option. So no, I don't think that's overkill at all.
Re:All drives fail, sooner or later... plan for it by Keiran+Halcyon · 2016-02-17 07:52 · Score: 2

I think the point here isn't that there's a drive or manufacturer out there that doesn't fail. The point here is that with such a huge sample range, you can make somewhat useful trends and comparisons between failure rates on a macro scale that no standard user would be able to do themselves. If you look at 56,000 disks and see that Seagate accounts for a larger percentage of drives and lower equivalent failure rate among manufacturers, you can *generally* expect that buying a drive of an equivalent model as compared and evaluated here will have *on average* a better reliability rate than a comparative drive shown to have a worse value in this study. None of this absolves you of responsibility to your data, but it gives you a guideline toward making your data storage medium as reliable as possible.
Re:All drives fail, sooner or later... plan for it by FlyHelicopters · 2016-02-17 07:57 · Score: 1

While those are fair points, and good advice... I still have a concern...
I don't think there is a large enough disclaimer that Backblaze runs their equipment in a 24/7 environment that is quite different than most users. Oh sure, they say it and it is there, but I think it deserves highlighting.
If you look at the percentage failure rates, they are higher across the board than what I've seen. Sure, drives fail, but honestly I have some of those same Seagate drives in a server here and they have been running for years without an issue. They are however, installed in tower cases flat (rather than vertical) and the most I have installed in one tower is 8, each in its own drive bay.
I suspect Backblaze is quite hard on drives and the rates are worse than you'd see outside of that environment. It is also worth noting that those drives are not all installed in the same type of "pod". Backblaze has changed pod designs a few times and now uses an "anti-vibration" system they didn't used to.
Their data is interesting, and I'm glad they offer it. I like how open they try to be, more companies should do that. However, it is just one slant and not the whole picture. I fear that some people will read it and say to themselves, "well I bought a WD, so I guess I don't have to backup". And yes, I've heard such things from real computer users, sadly...
Replace drives after burn-in testing? by h4ck7h3p14n37 · 2016-02-17 08:23 · Score: 1

We actually didn’t retire these 1TB WD drives – they just changed jobs. We now use many of them to “burn-in” Storage Pods once they are done being assembled. The 1TB size means the process runs quickly, but is still thorough. The burn-in process pounds the drives with reads and writes to exercise all the components of the system. In many ways this is much more taxing on the drives than life in an operational Storage Pod. Once the “burn-in” process is complete, the WD 1TB drives are removed and we put 4- or 6TB drives in the pods for the cushy job of storing customer data. On the other hand, the workhorse 1TB WD drives are returned to the shelf where they dutifully await the next “burn-in” session.
I don't understand the point of doing a burn-in with known good drives and then replacing them with new units of unknown reliability. I would think you'd want to do a burn in with the drives you're going to use since disk failures typically happen either early or late in a drive's life?
1. Re:Replace drives after burn-in testing? by Anonymous Coward · 2016-02-17 08:44 · Score: 0
  
  I think that test was a burn-in of the pod, not of the drives.
Drive generation matters and You Are Not Backblaz by Fencepost · 2016-02-17 08:27 · Score: 3, Insightful

One of the significant notes is that it seems the Seagate 4TB drives are doing much better than some earlier versions, and that WD is no longer doing so well.

Another thing that gets brought up every time one of these is released is "Why are they still using Seagate drives if they're so bad?" and the answer is simple: it remains a balancing act between cost and reliability. Backblaze has the redundancy and processes in place to not worry about single-drive failures, so FOR THEIR USAGE the lower drive cost is more important. If you're on a smaller setup where you have everything on just a few drives with inadequate redundancy, a few dollars extra for better reliability is worth the cost.

When you really get down to it Backblaze is looking at cost per gigabyte per day, and if ($LESS_RELIABLE_DRIVE_COST + $DRIVE_REPLACEMENT_COST) is lower than ($MORE_RELIABLE_DRIVE_COST) then they're going with the cheaper option.
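
To make that comparison concrete, here is a back-of-the-envelope sketch. It assumes the expected replacement cost is the annual failure rate times the number of years times (drive price + swap labor). That is a simplification, and the function name `pick_drive` and every number in the usage line are invented for illustration.

```shell
# pick_drive CHEAP_PRICE CHEAP_AFR_PCT PRICEY_PRICE PRICEY_AFR_PCT SWAP_COST YEARS
# Prints which option has the lower expected cost, then both expected costs.
pick_drive() {
  awk -v cp="$1" -v ca="$2" -v pp="$3" -v pa="$4" -v swap="$5" -v yrs="$6" 'BEGIN {
    # purchase price plus expected replacements over the period
    cheap  = cp + (ca / 100) * yrs * (cp + swap)
    pricey = pp + (pa / 100) * yrs * (pp + swap)
    printf "%s %.2f %.2f\n", (cheap < pricey ? "cheaper" : "pricier"), cheap, pricey
  }'
}

# Hypothetical usage: $100 drive at 10%/yr vs $150 drive at 1%/yr,
# $20 of labor per swap, over 3 years
#   pick_drive 100 10 150 1 20 3
```

With little redundancy, the cost of lost data and downtime belongs in the swap cost too, and that is what tips the balance toward the more reliable drive for small setups.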

--
fencepost
just a little off
So, the only reasons to use Seagate by Chas · 2016-02-17 09:11 · Score: 1

A: They're cheap
B: They scream really loud before they die, hopefully when someone's listening.
C: They're cheap.
I'll stick with Western Digital and HGST.
If they die off that infrequently in their sweatbox environments, the chances that they're going to die under normal desktop use are orders of magnitude less.

--

Chas - The one, the only.
THANK GOD!!!
1. Re:So, the only reasons to use Seagate by Anonymous Coward · 2016-02-17 10:16 · Score: 0
  
  Accurate B is worth its weight in gold. I'd take a drive that even has a 50% chance of dying this year if I could be assured it will let me know it will die early enough I can just copy the data.
  Backup is nice, but never having to use it is even nicer. At home I don't back up my data constantly. Losing a drive may put me behind by a day, or even a week.
2. Re:So, the only reasons to use Seagate by Chas · 2016-02-17 11:53 · Score: 1
  
  You're assuming that, in a standard use-case, that warning's going to come early enough.
  You're also assuming that the data extraction won't kill it either.
  In the long run, you're better off with a more reliable drive and a reliable primary backup device.
  
  --
  
  Chas - The one, the only.
  THANK GOD!!!
3. Re:So, the only reasons to use Seagate by Anonymous Coward · 2016-02-17 12:05 · Score: 0
  
  I've never had an HDD fail catastrophically, until last week when my laptop's Seagate went almost completely unreadable.
  I'll keep using WD, since the last laptop one ran fine for a year (no important data) after it developed unrelocatable bad sectors.
  Also for some reason 3.5'' drives are much more reliable (even when 2.5'' are used on desktop replacements with few vibrations and good airflow).
Re:That one time Seagate didn't send out SMART dat by Gondola · 2016-02-17 09:19 · Score: 1

I want to switch to ZFS, but I'm not sure how ZFS handles failure on the boot drive, and my Google searches weren't very successful in answering the question either.
Re:That one time Seagate didn't send out SMART dat by 0100010001010011 · 2016-02-17 09:39 · Score: 1

I haven't had any issues. I've randomly pulled a boot drive and ZFS doesn't complain. I use FreeBSD with ZFS on root.
Re:All drives fail, sooner or later... plan for it by Anonymous Coward · 2016-02-17 09:42 · Score: 0

Sounds like a very secure pr0n collection.
Re:That one time Seagate didn't send out SMART dat by Zargg · 2016-02-17 09:54 · Score: 1

Should work just as you would expect, from my experience with FreeNAS. If there is a boot drive with errors, it will use the good copy or whatever parity it has to boot up and inform you of a bad boot disk. If there are no more good copies then it can't boot. You can do all the normal scrubs on them to catch drives going bad.
Re:All drives fail, sooner or later... plan for it by Anonymous Coward · 2016-02-17 10:34 · Score: 0

The response from guys that claim to work there is that the duty is fairly light, and that they choose drives by focusing on cost before reliability.
Re:All drives fail, sooner or later... plan for it by mattventura · 2016-02-17 11:02 · Score: 1

The data might be from more rigorous conditions, but that doesn't make it useless. If a drive model exhibits a low failure rate even under supposedly awful conditions, then that reflects even better on the drive. If anything, I'd be more concerned about ways in which their environment is better than a typical consumer environment, such as how a forced-airflow server in a temperature-controlled datacenter is probably going to keep the drives at a better (or at least more consistent) temperature than some random dust-clogged PC with one wimpy fan.
TL;DR by Cramer · 2016-02-17 12:55 · Score: 1

HGST makes the most reliable stuff, but the models they were using are no longer available, and they're expensive.
Seagate is in a dead tie for Worst Shit In The Universe. (esp. when you use the "DM" series DESKTOP drives) They use them because they're dirt cheap, and falling off every truck in NY. Plus, they'll tell you when they're about to die. (i.e. shortly after first power-on. :-))
The short of it is: when you buy 10,000 drives a year, you care more about price and availability than reliability.
Consider the conditions - YMMV by dbIII · 2016-02-17 13:25 · Score: 2

Consider the conditions - this is selecting for the environment of a lot of drives packed into poorly ventilated cases so those that cope best with heat will win.
While heat over time is a common cause of drive failure there are others, so the results are not so useful for drives in desktop cases or in well ventilated servers (eg. ones with hot-swap bays so there is no way to pack the drives in as densely as Backblaze do).
Re:All drives fail, sooner or later... plan for it by lgw · 2016-02-17 13:49 · Score: 1

I lost data once too when an IBM Deskstar died suddenly and my backups somehow got corrupted too.

You don't have a backup until you've tested the restore. The nice thing about simply copying all files to an external drive (with nothing clever going on, just a file tree copy) is that the "restore" is just using the new drive. But that approach doesn't really scale past home/home office use.
I wish there was a better selection of tape backup software in the world: LTO-7 finally shipped, and a 6 TB (uncompressed) tape is nice.
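
For the plain file-tree-copy case, testing the restore can be as simple as comparing the two trees. A minimal sketch, assuming a `diff` that supports `-r` and `-q` (GNU and BSD do). The function name `verify_restore` and the paths in the usage line are hypothetical.

```shell
# verify_restore SOURCE_DIR RESTORED_DIR
# Recursively compares the two trees byte for byte and reports the result.
verify_restore() {
  if diff -rq "$1" "$2" >/dev/null; then
    echo "restore matches"
  else
    echo "restore differs"
    return 1
  fi
}

# Hypothetical usage:
#   verify_restore /home/me/documents /mnt/backup/documents
```

This only proves the copy is readable and identical today, so it is worth repeating on a schedule rather than once.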

--
Socialism: a lie told by totalitarians and believed by fools.
Re:That one time Seagate didn't send out SMART dat by craighansen · 2016-02-17 16:14 · Score: 1

ZFS on Ubuntu is problematic because it doesn't properly rebuild the kernel modules when the kernel is upgraded.
Re:All drives fail, sooner or later... plan for it by KGIII · 2016-02-17 17:55 · Score: 1

And to add to this -- not just tested the restore BUT actually have a plan for recovery, that goes beyond just the testing.
How are you going to retrieve data from a remote location?
What processes will you use to mitigate an attack during recovery?
What known-clean can you put online to retrieve patches in the case of malware/compromise?
Disparate networks to ensure clean recovery?
Things like that. It needn't be written down, but it should be planned out. If it's a business, it should be written down and a policy. At home, you can be more relaxed and not have it set in stone. I keep certain recovery options at the OS level, store almost no data locally, and often don't even use an installed OS but that's just because I like to play. (When you've got this much RAM, you can do that.) I also like images from VMs. I keep things located all over the place - including multiple on-site locations and disparate physical locations.
Like you folks mentioned... I lost data once. It was extremely flaky and absolutely absurd and infuriating. I'm still not entirely sure how it happened but a very, very close lightning strike hit and all magnetic media was gone. Not even the MBR remained. Drives not powered on were gone. Some would not work, even after reformat. I've no idea the how or why (I suspect EMP) but it was infuriating. I had *some* at a different location and am very fortunate that it was mostly my personal data and wasn't at my office. The following Monday, however, some serious discussions were had and we had a whole new backup plan and all the rest within a week. We did some testing and I'd say that we were fully set within six months but we were already pushing data out (and it was a lot of data) as well as buying more tapes and shipping them to a nearby storage unit before we moved further out with it.
Never again.

--
"So long and thanks for all the fish."
Core OS by MrL0G1C · 2016-02-17 19:56 · Score: 1

SMART monitoring is where modern OSes utterly fail. It should be a core part of OS functionality: the OS should warn you when a SMART stat goes bad, but MS et al. would rather put some stupid shopping experience into the OS instead.
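As a rough illustration of what such a warning could look like, here is a minimal Python sketch. It assumes the attributes have already been read from the drive by some other tool, and it uses the usual SMART convention that an attribute is failing once its normalized value drops to or below the vendor threshold. The `SmartAttr` record, the `smart_warnings` function and the choice of watched IDs (5, 197, 198) are assumptions for the example, not any OS's or vendor's actual policy:

```python
from dataclasses import dataclass


@dataclass
class SmartAttr:
    attr_id: int    # SMART attribute ID, e.g. 5 = reallocated sectors
    name: str
    value: int      # normalized current value (higher is healthier)
    threshold: int  # vendor failure threshold (0 means no threshold set)
    raw: int        # raw counter as reported by the drive


# Assumed watch list: reallocated, pending and offline-uncorrectable sectors.
WATCHED_RAW = {5, 197, 198}


def smart_warnings(attrs: list[SmartAttr]) -> list[str]:
    """Return human-readable warnings for attributes that look bad."""
    warnings = []
    for a in attrs:
        if a.threshold > 0 and a.value <= a.threshold:
            warnings.append(
                f"{a.name}: normalized value {a.value} "
                f"at or below threshold {a.threshold}"
            )
        elif a.attr_id in WATCHED_RAW and a.raw > 0:
            warnings.append(f"{a.name}: raw count {a.raw} is nonzero")
    return warnings
```

A nonzero raw count on a watched attribute is only an early hint, which is why it is reported separately from a threshold crossing.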

--
Waterfox - a Firefox fork with legacy extension support, security updates and better privacy by default.
Re:Drive generation matters and You Are Not Backbl by AmiMoJo · 2016-02-17 22:06 · Score: 1

For home use it's worth paying a little more for a Hitachi (HGST) drive. They are owned by WD, but use different tech, different factories etc. You pay more but get better reliability.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
That link by dbIII · 2016-02-19 14:21 · Score: 1

That link is the average of all the drives, including the ones that did not fail, over a long time period and has been discussed elsewhere in this thread.
An average sadly doesn't tell us as much as you appear to think it does, especially about failed drives since the data is diluted by the ones that did not fail and since the drives are idle for so much of the time. Maximum temperatures of the ones that failed would support your argument but that is not what you are using.
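The dilution effect is easy to see with made-up numbers. The sketch below uses entirely hypothetical readings (nine healthy drives at 25 °C and one failed drive at 45 °C), not Backblaze data, and the function name `temperature_summary` is invented for the example:

```python
from statistics import mean

# Hypothetical (drive_id, temperature_c, failed) readings, for illustration only.
READINGS = [(f"ok{i}", 25.0, False) for i in range(9)] + [("bad0", 45.0, True)]


def temperature_summary(readings):
    """Compare the fleet-wide mean with statistics for failed drives only."""
    all_temps = [t for _, t, _ in readings]
    failed_temps = [t for _, t, failed in readings if failed]
    return {
        "mean_all": mean(all_temps),
        "mean_failed": mean(failed_temps) if failed_temps else None,
        "max_failed": max(failed_temps) if failed_temps else None,
    }


print(temperature_summary(READINGS))
```

With these numbers the fleet-wide mean comes out at 27 °C even though the one drive that failed was running at 45 °C, so the overall average hides exactly the reading that matters.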
HGST FTW! by Peedy · 2016-02-19 18:33 · Score: 1

First off, based on this I would be buying HGST drives exclusively. 1.8% or less failure rate? Yes please! I've been buying HGST drives (mostly Ultrastars) for a couple of years now. They are super fast and reliable in my home NAS. My second choice would be WD RE drives. In bulk the price difference between HGST and Seagate cannot be THAT much... I would think the additional reliability would give you a better ROI than continually replacing cheaper drives.
Re:All drives fail, sooner or later... plan for it by drsmithy · 2016-02-21 02:35 · Score: 1

I suspect Backblaze is quite hard on drives and the rates are worse than you'd see outside of that environment. It is also worth noting that those drives are not all installed in the same type of "pod". Backblaze has changed pod designs a few times and now uses an "anti-vibration" system they didn't use to.
Your typical home desktop/server drive is likely to see a far harsher life than your average Backblaze drive.
I should never have taken your words at face value by dbIII · 2016-02-21 18:48 · Score: 1

Instead of jumping on threads to play some wank of a mass debate game where you attempt to convince people of things contrary to reality, why don't you do something useful, or at least less annoying? You are in slimy, confidence-trickster, preying-on-the-weak territory and nowhere near the "Devil's Advocate" you are probably telling yourself you are.
You had me going for a while and I really did think you were as dim as your posts suggest, but the bit about working drives not getting hot was a clue that you do not believe your own words yourself and are just playing an argument game at my expense.
It's not funny.
I know to watch out for you next time - it's pieces of shit like you playing mind games that make this site far less enjoyable than it used to be.