Hard Drive Reliability Study Flawed?
storagedude writes "A recent study of hard drive reliability by Backblaze was deeply flawed, according to Henry Newman, a longtime HPC storage consultant. Writing in Enterprise Storage Forum, Newman notes that the tested Seagate drives that had a high failure rate were either very old or had known issues. The study also failed to address manufacturer's specifications, drive burn-in and data reliability, among other issues. 'The oldest drive in the list is the Seagate Barracuda 1.5 TB drive from 2006. A drive that is almost 8 years old! Since it is well known in study after study that disk drives last about 5 years and no other drive is that old, I find it pretty disingenuous to leave out that information. Add to this that the Seagate 1.5 TB has a well-known problem that Seagate publicly admitted to, it is no surprise that these old drives are failing.'"
Thank you for posting this. I've always used Seagate which have had a high success rate in terms of reliability and quietness (unlike some other makers that I won't mention).
Slashdot, fix the reply notifications... You won't get away with it...
Someone's working overtime to make Seagate look good.
But the pile of dead Seagates at work says otherwise.
Do we ALL need to put out a paper showing all the different drives we use and how Seagate keeps topping the failures list?
"Insert anecdotal evidence from some desktop user that says seagate is the BEST EVER!"
Nope. Wish it all you want. But Seagate has long been known to be a high failure cheap drive. Buy them at your own risk.
My personal fav currently is the Hitachi Deskstar line. They really cleaned up their mess when the 'deathstar' problems bit them in the ass.
And that's even assuming you want a HD anymore. SSD is getting cheaper and faster every day.
No matter what you buy, though, keep a backup.
Is he saying that 1.5TB drives are all 5 years old? If you look at the table in TFA, it talks about "release date" -- which may well be some time ago, but I'm sure 1.5TB drives may be had new, even if the design hasn't changed in a while.
What changed under Obama? Nothing Good
There was a column for 'drive age'
I have an 80 GB IDE hard drive in my old desktop machine that's still alive and kicking from - I'm not even sure how old it is. At least 10 years, I'd say. I use it for temporary storage.
I've either personally owned or purchased for companies I've worked for dozens of hard drives of all (except ESDI) technologies including MFM/RLL, IDE (parallel and serial), SCSI (original, wide, ultra wide, etc.) of form factors from full height 5.25 inch to 2.5 inch, dating back to 1991, and in my experience most hard drives last until you throw them away after 10 or 15 years because they're too small.
A few hard drives die in the first 6 months, and maybe 5-10% die in 3-5 years. Saying that disk drives last about 5 years just doesn't agree with my experience at all. Hard drives essentially last an infinite amount of time, defined by me as until they're so small that their storage can be replaced for under a dollar.
I do agree with the author's other points. Certain lines of hard drives have more like a 100% failure rate after 5 years. One 250 GB hard drive I purchased was RMA replaced with a 300 GB model because the 250 GB line was essentially faulty.
I think these studies might be looking at 7200 or 10000 RPM SCSI units under extremely high use. That's not how consumers use hard drives.
Who cares about known issues. If I buy a hard drive and two years from now it has a 'known issue', then I would much rather not buy it in the first place.
BS. I have had at least 2/3 of my newer Seagates fail. From 500 gigs to 2TB drives. At LEAST 10 in the last 3 years. In the same time I have had 1 of 6 Hitachi and 2 of 18 Western Digital fail. I will NEVER buy another Seagate drive. Just had my external 1.5TB USB3 drive go last week with 0 warning, taking a TON of my data with it. I hate Seagate with a passion that I feel for no other.
My understanding, based upon reading the originally posted materials, was that they published their reliability findings based upon their own experience. I did not see anywhere that they claimed that it was comprehensive research into the reliability of hard drives. We should not crap upon Backblaze because people could not be bothered to read the articles and made some faulty assumptions based upon the headlines; to do so would just serve to dissuade others from releasing their experiences. As for the argument about some of the hardware having known faults... If a company does not want bad press they should do more quality control before releasing crappy hardware...
"I myself am made entirely of flaws, stitched together with good intentions."
Sorry, you're full of shit Henry Newman. How many people follow specifications about burn-in on a drive when they buy it wholesale OEM and it comes in nothing more than a plastic bag? How many people only buy drives released recently? If you're like most people and you want a 1.5TB drive you go out and buy the cheapest one that meets your needs. If Seagate still has 8 yr old drives on the market, then it's damned right that their failure rate should be considered. And so what if a drive "has a well-known problem that Seagate publicly admitted to"? As long as Seagate publicly admits all the issues with every drive they release we should then adjust stats to eliminate those flaws? That's ridiculous. This study was about "If you go out and buy a drive off the market, this is the rate you can expect it to fail at." I don't think any consumer that got a Seagate drive, had it fail and lose all their data, would then say "Oh! Well they publicly admitted to a problem! Shit! My bad!"
Sounds like Mr Newman is going to get a nice paycheck soon.
This article states everything anyone competent already knew. Consumer drives come rated for a lighter workload than enterprise.
Duh? That's the point - it's a cost:reliability tradeoff. With "enterprise" drives being 1.5x+ the expense, for uses like Backblaze where you can survive multiple disk failures with ease it's a no-brainer.
I also got "burned" by these Seagate 1.5TB disks. By *far* the worst drives we have in production (~300 or so these days), and they have had an annual failure rate around 20% since the day they were put into service. Other consumer drives don't even come close to that metric, but are rated similarly.
I actually like Seagate - every disk manufacturer has problematic models from time to time. No big deal, we knew the risks when we bought them. However, the data Backblaze published is completely validated by our own internal data. It's a drive model to avoid when at all possible. Most of our disks have a less than 5% annual failure rate, but this specific model is close to, or over, 20%. That's a major difference.
This article just states the obvious. Consumer drives generally fail earlier under heavy loads. This is not interesting; it's a known tradeoff anyone with a high school degree can figure out for themselves by looking at cycle ratings and MTBF. The only thing I care about for this workload is whether the savings I get from using the lesser drives outweigh the cost of their failures. The answer has thus far (even with 20% of drives failing each year) been a resounding yes.
There is a difference between consumer drives; data like this is *great* to have published, as you can add it to your own data and compare notes. Will I make a buying decision based off it? Probably not. But it will certainly be one data point of many when it comes time to buy more disk. Known issue? I don't care. All I care about is if the drive works or not, and this particular Seagate model does not. The author of this article completely glosses over the fact that Seagate admitted to the issue but did absolutely *nothing* to make it right for their customers, essentially blaming them. This fact is what bothers me the most, not the fact they had a problematic drive model - and will likely be the largest factor when it comes to my evaluating Seagate products in the future.
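To put the failure rates quoted above in concrete terms, here is a minimal sketch of the arithmetic. The fleet size of about 300 and the 5% and 20% rates come from the comment itself; the formula (failures divided by drive-years of service) is one common way to express an annual failure rate and is an assumption here, not necessarily how Backblaze or the poster computed theirs. The function names are made up for illustration.

```python
def annual_failure_rate(failures, drive_days):
    """Failures per drive-year of service, as a fraction (0.20 means 20%)."""
    drive_years = drive_days / 365.0
    return failures / drive_years


def expected_failures_per_year(drive_count, afr):
    """Expected number of drive failures in one year for a fleet of this size."""
    return drive_count * afr


if __name__ == "__main__":
    fleet = 300  # "~300 or so" drives, taken from the comment
    for afr in (0.05, 0.20):
        dead = expected_failures_per_year(fleet, afr)
        print(f"AFR {afr:.0%}: about {dead:.0f} failed drives per year")
```

For a fleet of 300, that is roughly 15 replacements a year at 5% versus roughly 60 at 20%, which is the "major difference" the poster is describing. Whether the cheaper drives still come out ahead depends on prices and replacement overhead that the comment does not give.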
Install that drive in a server in an online backup company and see how long it lasts.
I have a 120MB Conner CP30101G - wonder if it still works?
I haven't spun it up in ages, doubt modern hardware knows how to talk to it.
I do have a 20GB laptop drive that does work as a really slow backup device.
If the users at large are not aware of such defects, and Seagate has not proactively sought to inform users AND replace the defective drives at the company's cost, then including such defects in a study is perfectly legitimate, only with the additional takeaway that you should factor likely RMAs into the cost of Seagate products.
(I'd also add that Seagate's warranty does not cover advance replacements, which can only be had with an additional, non-refundable service fee.)
that the garbage company he loves produces garbage. Well, that's why they're known as a garbage company. They employ idiots, rip them off on pay, and steal from suppliers by not paying their bills. Of course their products are going to be complete garbage, and this moron is a piece of garbage for trying to defend this garbage company.
none of his commentary really matters because it clearly shows the failure rates of short-aged drives....
Just because Seagate admitted that they had a flaw in a drive does not automatically negate the unreliability of those drives in the first place. It is, however, a good move to try to appease the people that are angry that they just lost data or ended up with a defective drive and to try to deflect some of the negativity. Which hopefully will try to prevent further erosion of their brand's reliability.
Suppose Company A makes a product that doesn't fail, and Company B makes a product that also doesn't fail except when there was a manufacturing issue. Company A is still more reliable than Company B. In this situation it could be that Company A had better quality control than Company B. They both came across the same flaw, but Company A caught it and fixed it before selling the product. In this situation, I would trust Company A's goods over Company B's goods in the future.
People seem to forget that Seagate denied the issue for almost a year.
I remember.
I was a seagate buyer, before they lied. It was my preferred vendor. We had a number of drives in disk arrays, but when it was time to swap them out, I avoided Seagate as replacements. Never had any data loss due to Seagate drives, but the company was a client of the software my team wrote for enterprise customers, so I did get a view on the edges of the company. Something changed.
Last year, those drives were 6 yrs old and had never given us any issues, but old drives can't be trusted. The new drives were Hitachi - because I can read reliability reports. I'm still using the old Seagates for unimportant things from time to time. Mainly transporting large amounts of data. No issues, and if there are any at this point, the old drives have exceeded expectations.
However, I don't plan to buy another Seagate drive again. They lied! Didn't step up and tell the truth. That is a management issue, not technical, and I remember it. It was a management failure. I will always remember it, and where I work (BTW, I'm a CIO) - we will never buy Seagate drives again, if there is a choice.
Life and work is too important to deal with liars.
I have a 40 MB IDE Conner... a few years back I connected it and it still worked...
But this is possibly the worst Slashdot article in a while. The study specifically mentioned drive models and ages. In any case I have massive stacks of bad Seagate drives from my datacenter and a much lower number of dead Western Digital drives. Those are all enterprise models, not consumer.
'The oldest drive in the list is the Seagate Barracuda 1.5 TB drive from 2006. A drive that is almost 8 years old!
I recently had a 1.5TB drive die and it was still new enough to be under warranty. Seagate shipped a 2TB as a replacement.
Pffft... In all fairness I have a Seagate 7200.7 that has 72180 power_on_hours. That is 11 years of power-on time. There is not a single fault on it and I still use that drive daily.
The fact is, the model matters. All manufacturers produce duds. Some drives are better than others, even from the same manufacturer.
I like it that Newman complains of "lack of intellectual rigor" then mentions a "known problem" as an excuse for eliminating from the study the 1.5TB Seagate drives. Except that according to Seagate, the known problem with this series "does not result in data loss nor does it impact the reliability of the drive". Additionally, the "known problem" was for firmware versions SD15, SD17 & SD18. Did Mr. Newman have the "intellectual rigor" to check if the tested drives were having one of the affected firmware versions?
We have well over 150 3TB Seagate Barracudas at work that are halfway into their second year of operation. The first year was pretty flawless, maybe 1 failure, but in the second year we've already had about 15 get peppered with bad sectors, and it's continuing to happen on more drives at least once per week or so. This lends a lot of credibility to Backblaze's findings if you ask me. Again, these are modern 3TB Barracudas (non-XT). I was sad when they discontinued the XT line, simply because we have about 30 of the XT 2TB models into their 3rd year with no failures yet. Oh, right, they didn't discontinue the XT at all, rather turned it into the Constellation series and sold it for double the price!
I still use some hard drives that are more than 5 years old. My Seagate external hard drive did go on the fritz one time after I plugged it into Linux, but it is still alive and well after 8 years of service, and I use it daily. I recently got a new internal drive. My old one was 500GB and over 5 years old; the new one is only 1TB, but at least it's not a laptop hard drive, which the old one was (an accidental buy). Yes, I used a crummy laptop hard drive in my PC for over 5 years for gaming, development, compiling, and rendering, and it still functions perfectly. It's not a Seagate either, but a Samsung, I think. I have some hard drives, often less than 1GB in size, that I occasionally go back to for old memories, and they still work; they are slow as molasses and make strange sounds, but they work. At an old job of mine, my boss bought a Seagate hard drive and put all his important crap on it; it died not long after, and he lost hundreds of hours of work plus a lot of other important things that cost a lot of money. I think it's more about luck than anything, no matter what hard drive you get. Sure, some hard drives, like Hitachi, may be more prone to breaking, but I think that overall it's based on how lucky you are. If your hard drive does crash, be prepared to cry or fork over thousands for a drive recovery.
The only thing I found flawed in the study was how many seagate drives actually made it through the warranty period.
My personal experience shows a failure rate of Seagate drives at around 300-400% (pool of 20-30 drives). What I am saying is that not only did the original drives fail, but the "refurbished" replacements failed as well, numerous times. Not a single drive got through warranty without the nice green border. The amount I spent on advanced replacements could have bought me quite a few new drives from another vendor.
I no longer buy seagate drives. I do not have any abnormal failure rates on the other brands I use.
The results of this test, accurate or not, do not replace actual experience with these drives. I have had nothing but problems with Seagate drives. The last two I purchased have had lots of problems. Between those two drives, I have had five failures. Four of them were RMA'd. I didn't bother with the fifth because, to be quite frank, even when repaired I simply cannot bring myself to trust it by any stretch of the imagination.
Meanwhile, I put WD drives in place of the Seagates and have had zero failures.
http://en.wikipedia.org/wiki/H...
http://www.forensicswiki.org/w...
http://hddguru.com/software/20...
http://hddguru.com/software/20...
http://hddguru.com/software/20...
http://www.itsecure.at/hparemo...
http://www.sleuthkit.org/infor...
No.
Henry Newman's response, however, is deeply flawed.
1) Newman complains that average drive age is a "useless statistic." But he seems to prefer "time since product release" which is far worse than useless -- it is an obviously incorrect way to estimate the age of a drive population and is directly contradicted by the average age data reported in the blog post.
2) Newman has questions about Backblaze's burn in. He can find answers by googling "Backblaze burn in" to learn more about the company's remarkably transparent operations. Beach does not go into these details because an effective blog post will focus on its key conclusions rather than discussing every detail of methodology. It is not a research paper.
3) Newman digresses into hard error rate, which is unrelated to drive failures. I look forward to a future Backblaze blog post about error rates. In any case, since all these drives are consumer drives and all but one have the same specified error rate, it is a non sequitur.
4) Newman points out that Backblaze probably vastly exceeds manufacturer specs for drive throughput. I think this is exactly the point. Is there really enough difference in reliability between commodity and enterprise drives to justify their price difference? Or is it just a form of price discrimination? Does the spec sheet reflect reality or is it a marketing-driven fiction?
Overall this article strikes me as being written by an industry flack: someone who is more interested in parroting jargon and received wisdom rather than indulging in genuine curiosity.
Does anyone have a magneto-optical disk drive? I've got some old files I'd like to retrieve.
Install that drive in a server in an online backup company and see how long it lasts.
Probably longer, since once the drive is filled with data, it basically just sits there spinning. Sure, there might be a patrol read of the disk every month or so, but no real work.
I expect that almost all my drives in server environments would be running fine at 8-10 years, but most get replaced after about 6 simply because bigger drives are so much cheaper at that point.
My experience has been pretty different. The only drive to ever fail pretty much beyond recovery for me, was a Western Digital drive (that was about ten years ago).
On the other hand, I've bought a lot of Seagate drives over the years and they have held up really well - only having data issues when a computer crashed at some particularly bad point. They've also generally performed really well.
Hitachi has been OK for me also, but they don't seem to be very performant.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
FWIW (not much), I've bought forty or fifty 2.5" and 3.5" drives a year for the last nine years, mostly for resale in my computer repair business; lately, I pick them up at our local Tiger Direct retail store or order them from Amazon. I have the fewest problems with Seagate drives.
Almost every time I buy another brand, the damn thing takes a crap and I get to do the job again for free. (Thank FSM Maxtor went away: they were the WORST).
" Since it is well known in study after study that disk drives last about 5 years and no other drive is that old,"
Shit son, I still have 20MB HDDs half the size of a full ATX tower and they work flawlessly. You fuckers can't seem to pick reliable hardware, can you?
Out of the four hard drive failures I have had in the last ten years (I often replace smaller drives with bigger ones before they fail), 3 of them were Seagate drives and one was a Hitachi. I will never buy Seagate again. Meanwhile my other Hitachi drives and Western Digital drives still spin on.
In all fairness, comments like "is well known in study after study that hard disks only last about 5 years" are unhelpful. I have designed our environment to extract value from hard disks until failure. Sometimes, but not often, we retire gear before it reaches that point for other reasons (generally cost of power and cooling). I have a large (for my site) number of drives from single batches approaching 7 years old, and in that timeframe we've lost to complete failure perhaps 2 drives from a single batch. The original Backblaze author had a point to convey and presented some data. I took what was useful to my situation from that data without being terribly concerned about the author's point. If you don't realise that in 6 years the technology changes such that you can't really use past performance as a measure when selecting future equipment, then you are in my opinion viewing the world with an optimism you can't really afford when you are responsible for critical data. Some people would equally call me an idiot for using equipment that is well beyond its warranty date in a production environment. To this I say: if you haven't planned for total data loss on any device at any time then it's you who are the idiot. So why not repurpose all your equipment (storage in particular) whenever you need to and create hierarchies of reliability from live production data down to hourly change snapshots?
If you wanted to keep your version-controlled data you should have backed it up. Time Machine is a backup, but also a version control repository which itself must be backed up... it can fail for other reasons too.
Not to mention you should ALWAYS have at least two duplicates of data, so backing up your backup is just a good idea anyway.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I can't tell, the data was on my hard-drive, which just crashed.
Table-ized A.I.
The 7200.11 and 7200.12 drives (some 500GB, 1TB and 1.5TB) had firmware issues. Seagate Expansion hard drives (which used a SATA to USB adapter) had flawed AC to DC adapters, causing the drive to lose power and click when accessed. I have had one of these 500GB 7200.12 drives just lose the firmware and brick itself. I have been seeing a LOT of desktop PCs with the 7200.11 and 7200.12 get bad sectors; some even click. The drives I have had the most luck with are WD. I have a 1.5TB Barracuda LP (5900 rpm) that has been through 2 PCs that have fallen over while running and has been shipped 2 times (sold, then I bought it back). I have also had the drive in a few servers as well as regular desktop PCs, and let me tell you, the drive has yet to get even 1 bad sector. SMART health is showing 100% and it has been through HELL!!!!!! I guess with that drive I got lucky. However, I had a 2TB green Barracuda that was in a PC that fell over and it got 900 bad sectors. After recovering data off of it I took the cover off, swiped a magnet over it and then killed it. I guess I'm lucky?
I am kind of mystified at all of these bad Seagate results, as I've had a number of Seagate drives without problems.
However there is a factor that I wonder makes a difference - generally I've been buying drives that were not the cheapest, but were more on the upper end of the model line - for example my latest drive purchases have been mostly 4TB drives. I'm wondering if buying early runs of newer model HD's brings you a greater success rate.
Also it seems like some models are better than others and perhaps I've just lucked into buying the more durable runs of HD models from Seagate.
I did buy a 4TB Hitachi. It may be reliable, but the performance seems rather poor - using a USB 3.0 dock, from a disk speed test app I was getting 85MB/s read 55.1MB/s (!) write on the 4TB Hitachi, while I was getting 95 MB/s read and 10.8 MB/s write on the Seagate 4TB drive. It could be the Hitachi would last longer, but would I care if I have to live with much worse performance? I'd rather just spend a bit more effort to make sure backups are rigorous and use the faster drive.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I have an 80gb ide deskstar that runs as primary storage for my DNS, key, and SSH jump box for my home network.
The older a drive gets, the more you need to put it into higher positions of authority and privilege due to its years of experience. It inverts the failure rate (which is mainly from burnout and boredom of routine).
Can't give exact numbers, as I only worked in an IT role dealing with largish storage for a couple of years (2006-2008): ~200TB of spinning disk on ~400 disks in a dozen or so RAID arrays. Anyways: failures seemed fairly clustered. We'd lose a drive in an array, get the replacement, then a month later the same chassis would lose another drive. It might have been the power supply stressing the drives, it might have been that for whatever reason those disks were getting hit harder over time than other arrays, it might have been the load of doing the rebuild, or just that they were in the same stripe set and so getting similar load, or a similar/same batch of drives since they came together. Who knows? Anyways, server load might have a longer MTA, but intra-chassis failure rates seem, from my (albeit limited) experience, to be highly correlated.
This guy is saying the previous hard drive failure study was flawed because the drives they used had flaws. Isn't that, well, the purpose of a study? If a drive had no flaws, it would not fail.
Any test of a specific type or model of drive will probably yield a very specific failure profile. That profile may not be the same as another type of drive or one with different components.
There is a shocker..
A while ago I bought three of their HDDs...and somehow within a month seven of them failed. Not only that, a friend of mine tried to top off a rack of Seagate hybrid SSDs with unleaded and the whole server just burst into flames on the spot.
If Seagate comes in last, the study is fake. That's my golden rule. If Seagate comes in last by like 10x, the author is actually criminally crooked or technically inept.
IIRC Backblaze's workload is write once read maybe once (I mean, they are a backup company). So it's quite likely that they are massively under the specs for throughput.
The truly interesting thing about this study is that they name names; previous work in the area (like Bianca Schroeder's FAST 07 paper, http://www.cs.cmu.edu/~bianca/... or Google's FAST 07 paper, http://research.google.com/arc..., or NetApp's FAST 08 paper http://www.usenix.org/event/fa...) doesn't give away vendor names. The Backblaze results broadly agree with the previous results.
granted, it's a notebook drive (old ide style, though) but it's been in my car for about 10 yrs now and still has not shown any errors, music plays and does not seem glitchy and yet it's in the trunk of my car being bounced around during the daily commute every day for nearly 10 years.
drives last only 5 years? really? who said that? that's not at all my experience with home drives or notebook drives. if the drive is not bad by design, I've gotten 10 yrs continuous use from most of mine. 5 yrs seems very conservative to me.
--
"It is now safe to switch off your computer."
I'm not quite clear what general conclusion the author was going for here, but I take it that one thing he wants to convey is that it would be irrational for a person to consult this data when making purchasing decisions for desktop drives. I don't think he's quite made the point. For one, how does the fact that Seagate admitted to there being a problem with one of their drives make the failure rate of that drive irrelevant? If a car company made a model that tended to fall apart or malfunction due to a systemic problem with one of its systems, this seems like a relevant (though hardly conclusive) reason to think twice about buying a car from that manufacturer. Second, just because the ST31500341AS was first released in 2006, it doesn't follow that the drives Backblaze has are that old. Their average age is only 3.8 years. As you can see from the data (from the chart in the actual Backblaze post, not the one produced in this article), WD greens with an average of 4.4 years fared much better (though the sample is smaller). In fact, they fared better than every other Seagate drive in that table, even the younger ones. I take the point about the amount of IO, but wouldn't it be kind of surprising if these storage pods didn't evenly distribute data? Not doing so would be rather silly. While the specifications say that these Seagates shouldn't be used in high vibration environments, the same is true of the WD drives that performed better. That they handle vibration better seems to me a good indication of long-term reliability in less harsh environments, given that heat and vibration are the killers even there, despite this not being as extreme. Again, if a car from one manufacturer fares better in tests that push it beyond its intended limits than those from another manufacturer, this seems like relevant information for a consumer looking to buy a car and keep it long-term. I think the author's washing machine analogy speaks to this point.
If a laundromat published this kind of data on the failure rate of consumer-grade laundry machines in its laundromats, it seems to me that the fact that one did better than another is a good (though perhaps not decisive) reason for thinking that one of the kind that fared better will last longer in my home than the other. Given the lack of other data out there on failure rates, this is the best information we consumers have, though more information of the type the author is asking for would help to make our future purchasing decisions more informed. What the author needs to show is that this limited data is so limited that it is as good as no data, but I don't see that he's done that.
Well there it is.
Burning in a drive is basically when you connect it up and run a program to exercise the drive for a set period to make it fail. The idea is that it's better that a drive dies during the burn-in process than when in use and there's actual data stored on it. It's a great idea when you want to keep your service's availability figures up, but it won't make the drives themselves any more reliable.
It will however skew the numbers so that drives die much quicker, and will probably have people saying it's now not fair because the drives were pushed to fail.
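As a rough illustration of the "run a program to exercise the drive" step, here is a sketch of a single write-then-verify pass. It is an illustration only: the function name and parameters are hypothetical, it exercises a file on a mounted filesystem rather than the raw device, and the read-back may be served from the OS cache rather than the platters. Real burn-in is usually done with dedicated tools running for hours or days.

```python
import hashlib
import os
import tempfile


def burn_in_pass(path, block_size=1024 * 1024, block_count=8):
    """Write random blocks to `path`, read them back, and return the number
    of blocks whose contents did not survive the round trip."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(block_count):
            block = os.urandom(block_size)
            digests.append(hashlib.sha256(block).digest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    mismatches = 0
    with open(path, "rb") as f:
        for expected in digests:
            if hashlib.sha256(f.read(block_size)).digest() != expected:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Scratch directory for demonstration; point this at the drive under test.
    with tempfile.TemporaryDirectory() as scratch:
        bad = burn_in_pass(os.path.join(scratch, "burnin.bin"))
        print(f"blocks that failed verification: {bad}")
```

A drive that returns mismatches or I/O errors during passes like this is one you would rather lose now than after it holds real data, which is the whole point of the exercise.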
C'mon Slashdot! I was expecting to see at least five of those kinds of posts.
We found that some combinations of SATA backplanes + lots of vibrating drives would actually cause the backplanes to short out and screw up the drives. We got sturdier chassis and the problem went away.
A drive that is almost 8 years old
A sentence that is almost complete!
If this article refers to the previous article where the low priced SAN vendor used the lowest priced drives possible, I am not surprised. They touted their low cost, and tried to mask it by stating that "These are the same drives that regular old Joes buy."
Well surprise! There are bad batches out there. I remember WAY back in the day, the Maxtor 540MB drives (yes kids, that is megabytes, as in about one half of a gigabyte). They had a ridiculously high failure rate. HP was putting them in their Vectras. It took the better part of two years to clear those out of the channel. These things happen from time to time.
Are we at all surprised that the company that purchases the cheapest drives possible ended up with a bad batch? I am sure that whoever had those sitting in a warehouse was more than happy to sell the entire lot to the SAN vendor.
What I am curious about is how the failure rates affected their bottom line. I wonder what kind of failure rates they predicted. Anyone who has handled a storage array larger than their home computer knows that drives fail, and do so fairly regularly. The difference between a good vendor and a bad vendor is how quickly they can ship spares. In my line of work, if we do not have spares on the shelves, we expect the vendor to deliver them in four hours or less. Given that, failed drives can get costly for a vendor in a hurry if they are storing, processing service tickets, shipping, and in some cases, dispatching techs along with failed drives to replace them. The failure of a $30-50 commodity drive can easily cost the vendor many multiples of that in overhead associated with the replacement and return processes.
In addition to the numerous errors that others have pointed out, he's completely wrong for mocking Backblaze for not reading the specifications.
That's the entire point of all this testing: to determine what consumer-grade hard drives actually do as opposed to what the manufacturer guarantees.
They may do better, or they may do worse. If Backblaze depended on guaranteed specifications, they'd sometimes buy turkeys, but they'd also be paying a lot more for high-reliability drives that (as their testing has shown) aren't necessarily more reliable at all.
Complaining that Backblaze included 1.5 TB Seagate drives with a "known flaw" is specious. The flaw is not one that Backblaze would regard as a drive failure. It just caused long delays ("hiccups") in drive responses, but with no loss of data. (And can be fixed with a firmware upgrade, which I don't know if Backblaze applies.)
And claiming that a drive first marketed in 2006 is 8 years old... Backblaze first opened for business (with a 500-customer beta) in mid-2008. Does it seem likely that a startup purchased large volumes of drives in 2006?
My 11-year-old primary laptop drive has ~65,000 hours of power-on time & no errors, according to SMART tools (smartctl) -- it's a Hitachi Travelstar 80GN, model IC25N060ATMR04-0. Definitely the longest-lasting drive I've owned, and certainly backs the claims that Hitachi is the most reliable.
The O.P. says it's well-known that hard drives last about 5 years.... As somebody who's been in the computer industry since the late 70's I have to say that I've never observed any truth in this statement. I have hard drives in my facility that are still perfectly reliable even though they are 25 years old, and while I have purchased hundreds of drives, only two have ever failed (one had a head crash, a mechanical failure, when somebody in the lab bumped the machine while the drive was very active, and the other simply stopped responding to the host machine - probably a semiconductor failure). I LOVE hard drives for volatile data - they have excellent data retention, well-understood environmental requirements, and NO limit on read/write cycles (unlike flash memory, which dies a little more with each write cycle and can be destroyed by code that hammers it with writes ... ever notice the gradual shrinkage of capacity in a flash drive as failing sectors get mapped out???)
As I posted before, this study had included non-enterprise drives which any thoughtful enterprise data preservation expert would not have ever used for enterprise data storage.
Kriston
Install that drive in a server in an online backup company and see how long it lasts.
My experience with SANs and SAN operators is that they are far too sensitive when it comes to detecting drive failures: if the SAN even thinks the drive might possibly entertain the idea of doing anything slightly like failing in the next 30 years, it'll say the drive has failed.
This is not necessarily a bad thing in an enterprise environment dependent on your SAN, especially if you've got a support contract where EMC/NetApp et al. send you replacement disks for free.
Calling someone a "hater" only means you can not rationally rebut their argument.
The whole point is to challenge that idea.
The study was horribly flawed, to the point of being statistically meaningless. Specifically, it assumes a uniform rate of failure. In reality, it's commonly believed that drives have an unusually high failure rate in the first few months (DOA, infant mortality), and it's certainly the case that the failure rate increases for old drives (worn bearings). It's therefore invalid to compare annual failure rates for drives of different ages.
For example, assume drives with the following characteristics:
* A drive has a 15% chance of failing within the first 6 months.
* If it survives the first 6 months, a drive has a 10% chance of failing each year for the next 2.5 years (i.e. until it's 3 years old).
* After 3 years, surviving drives have a 30% chance of failing for each subsequent year.
For simplicity, assume each failing drive is replaced with a working drive of the same age (in reality, they'd be replaced with new ones, but the maths would be a bit too much for a Slashdot post). This means we can get annual failure rates simply by adding the ones above and dividing by age. The results vary wildly depending on when we calculate the rate:
* If we look at 6 month old drives, 15% have failed, and so the annual failure rate is 30%.
* If we look at 3 year old drives, 15% failed in the first 6 months and 25% failed in the next 2.5 years, for a total of 40%. The annual failure rate is 13.3%.
* If we look at 5 year old drives, 40% failed in the first 3 years, and 60% in the next 2, for a total of 100% (which means that some of the replacements failed as well). The annual failure rate is 20%.
So the annual failure rate massively penalizes very old and very young drives, and that's roughly what we see in the Backblaze study: the offending Seagate drives are old (average 3.8 years) and young (average 0.8 years) ones. The penalty for young drives is more severe, because it's not buffered by a long time operating with high reliability. In the example above, the actual failure rate in the first 6 months is the same as the rate after 3 years (30%), but the young drives look much worse than the old ones.
And just to stick another spanner in the works, suppose we have two batches of drives, as above. Each batch has the same number of drives, but one is 6 months old and one is 5 years old. The average age is therefore 2.75 years, and the annual failure rate is about 21% (15% + 100% = 115% cumulative failures over 0.5 + 5 = 5.5 years of drive age). Those batches put together have a higher failure rate than a single batch of drives with a higher average age!
TLDR: drives don't fail according to a Poisson distribution, and so the study is mostly nonsense.
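For anyone who wants to check the single-batch figures in the example above (30%, 13.3% and 20%), here is a rough Python sketch. It assumes the same simplification as the comment (a failed drive is replaced by a working drive of the same age, so the percentages just add up); the names `PHASES`, `cumulative_failures` and `annual_failure_rate` are made up for illustration.

```python
# Piecewise failure model taken from the worked example in the comment.
# Each entry: (age in years at which the phase ends, failures per year in that phase).
# The "15% in the first 6 months" is written as 30% per year for half a year.
PHASES = [
    (0.5, 0.30),           # infant mortality: 15% over the first 6 months
    (3.0, 0.10),           # 10% per year until the drive is 3 years old
    (float("inf"), 0.30),  # wear-out: 30% per year after that
]


def cumulative_failures(age_years):
    """Expected failures per drive slot up to the given age.

    Assumes a failed drive is replaced by a working drive of the same age,
    so the per-phase figures simply add (and the total can exceed 1.0).
    """
    total = 0.0
    phase_start = 0.0
    for phase_end, rate_per_year in PHASES:
        if age_years <= phase_start:
            break
        time_in_phase = min(age_years, phase_end) - phase_start
        total += rate_per_year * time_in_phase
        phase_start = phase_end
    return total


def annual_failure_rate(age_years):
    """Cumulative failures divided by age, as computed in the comment."""
    return cumulative_failures(age_years) / age_years


for age in (0.5, 3.0, 5.0):
    print(f"{age} years old: {annual_failure_rate(age):.1%} per year")
```

The same underlying drive looks three different ways depending only on the age at which you sample it, which is the point of the comment.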
The data from the Backblaze study is fine; it's not flawed, it's real. The issue is how it's being interpreted, the conclusions drawn from it, and the lack of further data, such as how old the drives that failed were vs. the ones that were being "very reliable".
Why would we care what an HPC consultant has to say about consumer-grade products? (Which we would purchase off the net for similar reasons as Backblaze's purchasing policy?)
Yes, there are known flaws in some drives (which anyone can check on the manufacturer's website). Yes, they use them to the extreme or beyond the specifications of the products. Yes, we can draw the same conclusions as Mr Newman, who has quite honestly stated the obvious...
And contrary to his beliefs, consumers can retain HDDs in PCs for over 5 years... Who hasn't put an ancient HDD in a kids PC, or media server etc, just because it works?
What the 'study' does show us is that even with manufacturer noted flaws, the drives are still working 'pretty' well for the price, beyond their years... and I'm pretty much safe buying any HDD and (excluding the Lemon Factor) it should last me as long as I need it for at my 'consumer' level requirements.
Thank you Henry Newman for your inspiring words of wisdom in a domain that you clearly have no authority in. Please stick to your day job!
Thoughtful as in "unlimited budget" or "doesn't know how to use statistics to engineer more reliable solutions with lower-cost parts on the same budget"?
I still have and use my Seagate Barracuda 7200.7 (ST380011A; 7200 RPM; 80 GB) HDD for storage/backup/secondary in my current Debian stable box. I got it on 12/18/2005 for my old Linux box to replace the dying and super slow Maxtor 30 GB HDD, according to my http://zimage.com/~ant/antfarm... list. ;)
# /usr/sbin/smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourcefor...
=== START OF INFORMATION SECTION === ...
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus
Device Model: ST380011A
Serial Number: 4JV5[deleted]
Firmware Version: 8.01
User Capacity: 80,026,361,856 bytes [80.0 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Thu Jan 30 02:43:34 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Just last December (2013), we purchased around 200 Toshiba drives (which are the same as the earlier Hitachi/HGST drives) and 250 Seagate drives (500GB). There were no DOAs among the Hitachi drives and a couple among the Seagates. After around a month, so far only Seagate drives have been sent in for warranty.
Previously we purchased around 350 Hitachi/HGST drives (500GB), and the failure rate is definitely less than 5% per annum over a span of around 3 years. I haven't processed the warranty claims for around 50 pcs. Probably it will be somewhere around 35 pcs.
In our near-line storage environment, we also had WD (1TB), Hitachi (2TB), and Seagate (1TB, 1.5TB, 3TB.) I have the following observations:
1. Enterprise drives (Seagate 1TB SAS) had similar failure rates to regular drives. We have 7 out of 10 in a span of around 5 years.
2. Eight Hitachi 2TB drives are still working well after 2-3 years without failure.
3. Seagate 1.5TB (7200.11) drives are around 3-4 years old where around 6 out of 26 drives are already dead.
4. WD Black 1TB drives are around 4-5 years old and that we have 13 out of 16 still working.
5. Seagate 3TB (SV35) drives are just over one year old and we have had 2 out of 24 fail after the first year.
Statistically, the failure rates observed in the study are similar to what we are getting. Unfortunately, we can no longer get Hitachi drives per se here, since they became part of Western Digital and are not promoted locally anymore. We are sticking with Toshiba, but I hope they are able to maintain the quality.
Note: All drives are 7200RPM
"Known issues" are part of reliability. You cannot exonerate poor mechanical or firmware design from reliability calculation just because you know about them. Age is also a part of reliability.
This guy seems to be nothing more than a paid Seagate shill.
Ok, I know this isn't definitive, but I have an old server that is running 24/7 on a relatively old Seagate hard drive: ST340810A. It's a 40 gig IDE drive and it's been running 24/7 since around 2002 or so, so say it's around 12 years old. I have another much older hard drive running on my Amiga 500. It's a Quantum LPS52s, not positive but I think it's 1990 era. It doesn't get turned on very often, but I usually fire it up at least once a month and it functions with no problems. It's over 20 years old at any rate. I have a bunch of other old computers that still run. Their hard drives have got to be at least 10 years old. At least the 40 gig hard drives seem to have good longevity. The newer stuff, yeah they don't seem to have the same longevity. In particular laptop hard drives seem to fail a lot faster, no doubt due to the jolting they get when moved around.
A couple of thoughts:
First, throwing out hard drives after 5 years is a pretty good idea. A drive can be easily cloned using Clonezilla or Ghost and the replacement drive slipped into place requiring only a short downtime. It's exceedingly risky and silly to continue to use a system that's older than 5 years - not only drives have finite lifetimes but other important system components tend to fail more rapidly than every five years. Cooling fans are absolutely necessary to system health, but they fail, and the less maintenance a system gets, the more dirty the environment, the more quickly failure comes.
Second, and less scientifically, but anecdotally, the study seems to square with experience. I've replaced perhaps 20 drives over the last couple of years, and I've replaced at least 6 Seagate drives for every WD or Hitachi drive that needed it. Most recently a 2TB Seagate drive purchased in April began performing poorly in my home PC. SMART stats showed that pending sectors and uncorrectable sectors were out of spec, and I cloned the drive to a new replacement (another brand). I took the suspect drive to the lab and tested it using Seagate's own diagnostic tools and it failed.
This last experience underscores the importance of testing. If a system begins performing poorly even after regular housekeeping, test the drives. If SMART is indicating parameters out of spec, it's worthwhile to replace a drive even if it hasn't failed yet. In RAID systems typically more robust storage is used, but I'm talking about non-RAID systems here. Drive replacement is cheap and quick, but system replacement is better.
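A rough sketch of the kind of check described above, in Python. It assumes you have already captured the attribute table that `smartctl -A` prints for the drive and just scans that text. Treating any non-zero raw value as "suspicious" is a simplification chosen for this sketch, not a vendor threshold, and the sample rows are made up for illustration, not taken from any drive in this thread.

```python
# Attributes matching the "pending" and "uncorrectable" sectors mentioned
# above, by the names smartctl prints for them.
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")


def suspicious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                flagged[fields[1]] = int(raw)
    return flagged


# Made-up sample rows, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

print(suspicious_attributes(SAMPLE))
```

If the dictionary comes back non-empty, that is the cue to clone the drive and run the manufacturer's diagnostics, as described above.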
Backblaze was analyzing their particular experience building a very large storage system using commodity drives. What they found was that certain manufacturers fared better than others. I didn't see anything in their paper about performing any types of performance tests, reliability tests, etc.
This was merely "We put X hard drives of brands A, B, C, D and E of sizes A', B', C', D' and E' and here's how they fared"
Yes Francis, the world has gone crazy.
Who cares about known issues? If I buy a hard drive and two years from now it has a 'known issue', then I would much rather not buy it in the first place.
A failed drive is a FAILED drive.
This guy sounds like some pedant who is butt-hurt because his favorite drive brand was panned. But, it doesn't really matter if it is a known issue. The drives still fail after they are put into use!
The Backblaze report showed that WD had a lower overall failure rate than the others. This precisely matches my professional and personal experience of the past 30 years, despite lots of derision about WD drives on teh internets. The weird thing is that Backblaze continues to buy Seagates, but I suspect they are basing their decision on cost and ROI rather than just reliability.
If those arrays were running RAID-5, I'm sure the stress under the rebuild is what caused the next already-marginal drive to fail.
Gamingmuseum.com: Give your 3D accelerator a rest.
As someone who has imported a bunch of 2nd hand hitachis from Japan (the land of the rising skank-whore) I must concur.....
This would all be much more interesting if Backblaze would configure their storage with drives from different manufacturers, e.g., RAID10 with one each of Seagate, Western Digital, Hitachi, Samsung. Then we would have a level playing field.
Competition Good, Monopoly Bad.
The big issue, as you discovered, is that a RAID-5 rebuild puts a lot of stress on the remaining disks (RAID-6 has similar issues).
I much prefer RAID-10 (which is RAID-0 over RAID-1 pairs) because when you have to replace a drive, the rebuild only impacts the drives in that RAID-1 pairing. The rebuild time is related to the size of an individual drive, not the overall size of an array. A 12-disk RAID-10 array of 600GB drives takes just as long to rebuild as a 24-disk RAID-10 array of 600GB drives. Whereas a 24-disk RAID-5 array of 600GB drives takes at least 2x as long to rebuild as a 12-disk RAID-5 array of 600GB drives.
(Sometimes, there's no choice but to use RAID-6. You just need to understand the failure modes and that it is not as forgiving.)
Wolde you bothe eate your cake, and have your cake?
Yep that they were. Also had an issue once where we had configuration issues in a new fabric we'd installed and had to bounce an array a couple times to get it stable (some firmware setting had to be set to get 10Gb fiber working if I recall). Over the next few months a couple drives failed in that array. Once spinning it is best if you can get away with never having to stop them.
The big issue, as you discovered, is that a RAID-5 rebuild puts a lot of stress on the remaining disks (RAID-6 has similar issues).
If you call a linear read pass "a lot of stress" ... yes.
I much prefer RAID-10 (which is RAID-0 over RAID-1 pairs) because when you have to replace a drive, the rebuild only impacts the drives in that RAID-1 pairing.
Correct.
The rebuild time is related to the size of an individual drive, not the overall size of an array.
Also correct.
A 12-disk RAID-10 array of 600GB drives takes just as long to rebuild as a 24-disk RAID-10 array of 600GB drives.
Mostly correct. Assuming similar I/O load per spindle the 24-drive array should rebuild slightly faster (lower % of user I/O hits the pair that's busy rebuilding).
Whereas a 24-disk RAID-5 array of 600GB drives takes at least 2x as long to rebuild as a 12-disk RAID-5 array of 600GB drives.
Wrong.
With no user I/O, a fast enough controller and enough bandwidth to the drives, rebuild time is the same for raid1, raid10, raid5 and raid6 independent of the number of drives.
A 12-disk RAID-10 reads 1 drive of data from 1 drive and writes 1 drive of data to 1 drive.
A 24-disk RAID-10 reads 1 drive of data from 1 drive and writes 1 drive of data to 1 drive.
A 12-disk RAID-5 reads 11 drives of data from 11 drives and writes 1 drive of data to 1 drive.
A 24-disk RAID-5 reads 23 drives of data from 23 drives and writes 1 drive of data to 1 drive.
A 12-disk RAID-6 reads 10 drives of data from 11 drives and writes 1 drive of data to 1 drive.
A 24-disk RAID-6 reads 22 drives of data from 23 drives and writes 1 drive of data to 1 drive.
Note that we are always limited by linear write speed of the replacement drive.
Now if you don't have enough bandwidth, a controller with a slow parity engine or are rebuilding under heavy user I/O load, raid10 easily wins.
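The tallies above can be written down as a small Python sketch. It only encodes the numbers given in this comment for the idealized case (no user I/O, fast controller, enough bandwidth); the function names and the 150 MB/s write speed in the usage line are assumptions for illustration, not measurements.

```python
def rebuild_io(level, n_drives):
    """Return (drives_read_from, drive_volumes_read, drive_volumes_written)
    for rebuilding one failed drive, following the tallies in the comment."""
    if level in ("raid1", "raid10"):
        # Only the surviving mirror partner is read.
        return (1, 1, 1)
    if level == "raid5":
        # Every surviving drive is read in full.
        return (n_drives - 1, n_drives - 1, 1)
    if level == "raid6":
        # All survivors are read from, but only n-2 drives' worth of data is needed.
        return (n_drives - 1, n_drives - 2, 1)
    raise ValueError(f"unsupported RAID level: {level}")


def ideal_rebuild_hours(drive_gb, write_mb_per_s):
    """Time to fill the replacement drive at its linear write speed.

    Idealized: the survivors are read in parallel, so the single replacement
    drive is the bottleneck regardless of RAID level or array size.
    """
    return drive_gb * 1000 / write_mb_per_s / 3600


for level, n in (("raid10", 12), ("raid10", 24), ("raid5", 12),
                 ("raid5", 24), ("raid6", 12), ("raid6", 24)):
    print(level, n, rebuild_io(level, n))

# 600GB drives as in the parent post; 150 MB/s is an assumed write speed.
print(f"{ideal_rebuild_hours(600, 150):.2f} hours")
```

The amount written is always one drive, which is why the ideal rebuild time comes out the same in every row; the differences only show up once bandwidth, the parity engine or user I/O become the limit.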
I recently found a similar vintage Conner that had been decommissioned back in the 90's and hadn't been touched since. Still worked fine, once I tried hooking it up to an old P3. The drive had Windows 95 (the original, not OSR2) and AOL 3.0 on it :)
The top covers are stainless steel, the rest is aluminium (pretty much). Take them to the scrap yard after a thorough DBaNing; at least you'll know they'll be melted down and used again rather than ending up in landfill...
Apologies if that's what you meant.....
You stupid, worthless hillbilly. Of course you're an Apple fan.
You see, you admitted in your OP that you're a Seagate fanboy. And ALL fanboys of ALL kinds are ALL the same, no exceptions ever. You all share the same childish mindset and the same crippling mental deficiencies. You're all so terrified of facing reality that you hide in the comforting lie that there is one Objectively Best Choice to make about some petty thing and that you're smart for making it.
You didn't base your opinion of Seagate on any study, you incompetent liar. Certainly not on "every other study ever made"; partly because "every other study ever made" does not in fact say "Seagate is the best", but mostly because you've never actually read any at all. No, you based your opinion on your own subjective, deliberately-limited experience and your pathetic need to feel like your choice MUST be the best one possible. That is the ONLY possible reason anyone would have such a "golden rule".
Seagate, Apple, Microsoft, Sony, Ford, GM...all irrelevant, a fanboy is a fanboy. You're identical to Apple fanboys in every possible way. Therefore, you're one of them. Period.
Now you will prove me right. You will never be capable of doing otherwise, you filthy rube. All you'll ever do is struggle to get Steve Jobs' rotten cock just a little bit further down your throat.
That's pretty unusual for a Conner. They had a fairly uniform habit of forgetting data if they sat unpowered for 6 months or so, and some would forget the entire partition. I've seen some that didn't, but they're the exception.
As to well-aged stuff, I have a 1991 W.D. that still works fine and tests 100% perfect. (If you have a 386 or older to hook it to!)
~REZ~ #43301. Who'd fake being me anyway?
Almost all WD drives over the last 16 years. With 5 computers loaded, that's a total of 28 drives in service. Even with WD they have a number of different ratings. I typically go for the ones tested for several million hours MTBF, and the computers vary from 24 X 7 to about half that, but mostly 24 X 7. In all those years, I've only had 3 drive failures.
HOWEVER: I have to add that as drives and CPUs have gone down in price and up in capacity, I've upgraded HDs every couple of years to keep up with the CPU capacity, AVI work, and photography, with over 30,000 high resolution scans and high res digital images running 35 MB per photo, or more. I think I counted 32 HDs, mixed parallel and SATA, in the pile. A few old ones starting with an 80 GB, a 160 and about 4 200s. There's a bunch of 250s and 6 or 7 500s. Most of my drives in service are 2 TB, and there are 4 1 TB and 2 4 TB.
I hit the old drives with a bulk tape eraser, which has always rendered them beyond economical repair. Erasing them using the suggested methods, which would leave them usable but none of the data recoverable, would take a day or two per drive, so I just trash them and then use them for target practice. If you can get a new 500 Gig HD for less than 50 bucks, it's not worth spending a day or two to wipe them for what they are worth. Even at 3 years, that's 8766 hrs per year, or 26,298 hours running 24 X 7, and a long way from even getting near the MTBF ratings for the drive. I no longer purchase drives without published MTBFs, and those are 2 million or more.
This makes me glad that my HDs have nowhere near the failure rate of cell phones or headphones. Both of those suffer a high mortality rate around here. Those and TVs I purchase only at brick and mortar stores where I can try them and get the extended warranty. Every headphone set and cell phone has made the extra insurance worth it. I just hand them a bag of parts and they replace them.
My present cell phone is less than 6 Mo old and it's getting difficult to read the display.
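The hours arithmetic in the comment above (8766 hours per year, 26,298 hours after 3 years at 24 X 7) can be checked with a few lines of Python. The second function goes one step further than the comment and turns an MTBF rating into a failure probability; that step ASSUMES a constant failure rate (exponential model), which other posts in this thread argue real drives do not follow, so treat it as an illustration of what the rating implies on paper and nothing more. All names here are made up for the sketch.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days x 24 hours, the figure used in the comment


def powered_on_hours(years, duty_cycle=1.0):
    """Hours of operation; duty_cycle=1.0 means running 24 X 7."""
    return years * HOURS_PER_YEAR * duty_cycle


def failure_probability(hours, mtbf_hours):
    """Chance of at least one failure within `hours`.

    ASSUMPTION: constant failure rate (exponential model), i.e. no infant
    mortality and no wear-out. Real drives may not behave this way.
    """
    return 1.0 - math.exp(-hours / mtbf_hours)


hours = powered_on_hours(3)
print(f"{hours:.0f} hours after 3 years at 24 X 7")
print(f"{failure_probability(hours, 2_000_000):.2%} implied by a 2 million hour MTBF")
```

Under that assumption the rated figure works out to a bit over 1% across three years, which is the sense in which 26,298 hours is "a long way" from the MTBF rating.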
For what you are wanting to do? Grab a copy of Hiren's Boot CD, specifically you'll want to use HDAT2 or if you are in Windows you can use HDTune and run the error scan.
ACs don't waste your time replying, your posts are never seen by me.
This study would be the guide to strengthen the hard drive and prevent from having viruses.
Thanks for the tip, Hairy.... I wish I had some modpoints for you, but all I can do at the present is express my gratitude in words.
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]