25,000-Drive Study Gives Insight On How Long Hard Drives Actually Last
MrSeb writes with this excerpt, linking to several pretty graphs: "For more than 30 years, the realm of computing has been intrinsically linked to the humble hard drive. It has been a complex and sometimes torturous relationship, but there's no denying the huge role that hard drives have played in the growth and popularization of PCs, and more recently in the rapid expansion of online and cloud storage. Given our exceedingly heavy reliance on hard drives, it's very, very weird that one piece of vital information still eludes us: How long does a hard drive last? According to some new data, gathered from 25,000 hard drives that have been spinning for four years, it turns out that hard drives actually have a surprisingly low failure rate."
And how much of the failure rate are counted for those?
Yah, except for my Western Digital Green which failed 3 days after the warranty expired. And similar accounts on newegg...
"Freedom in the USA is not the ability to do what you want. It is the ability to stop others from doing what THEY want"
>> hard drives actually have a surprisingly low failure rate.
You call a 20% failure rate in 3 years LOW? My career rate is closer to 5% over 5 years - who keeps buying all those crappy hard drives?
Am I the only one that read the title and thought they were talking about cars? Those long hard car drives can be frustrating.
Some people die at 25 and aren't buried until 75. -Benjamin Franklin
I would love to see the breakdown(ha ha) by brands. But I would also like to see if they had temperature variations or power cycling stats.
Does a HD that is always on last for more or fewer hours? Ideal temperature? And a hard one to test, vibrations.
http://hardware.slashdot.org/story/07/02/18/0420247/google-releases-paper-on-disk-reliability
"Surprisingly, despite hard drives underpinning almost every aspect of modern computing (until smartphones), no one has ever carried out a study on the longevity of hard drives — or at least, no one has ever published results from such a study."
I recall reading a /. story from Google on THEIR experiences with hard drive longevity several years ago, over a much larger sampling of drives. Even linked to a PDF with the particulars....
Maybe they are too small to count, compared to an upstart backup company...
Four years isn't long enough. Come back to us when you reach 6 or 8 years. The study looked at drives during the warranty period (WD drives have 5 year warranty).
Also the information they presented doesn't show that low of a failure rate.
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Long... Hard... How long can it last? That's what she said, that's what she said, that's what she said!
99% of consumers have no backups and no raid, so 20% failure rate = 20% chance of losing EVERYTHING.
I call that an unacceptably high failure rate.
And note: I also have seen a 20% failure rate at home. Higher if I use the crap WD green drives.
Do not look at laser with remaining good eye.
Backblaze did their study in their datacenter, which means they did it in a controlled environment. I'm sorry, but I don't have AC where my computer is, and the air is not filtered. My PC is in my basement (some people put theirs in a room) where there's 30-40% humidity, using the normal crappy air I breathe like we all do. Some of us (not me) smoke, or live in places with lots of humidity or very dry air. Is this taken into account? Nope.
So this study is to be taken with a grain of salt, as lots of variables are missing, but it is a good start for knowing which hard drives last longer under perfect conditions.
This study was completely useless. WHAT BRAND WERE THEY?! Hitachis and Fujitsus have a higher failure rate by a factor of about ten than a top of the line Seagate drive.
Backblaze, an unlimited online backup company that keeps 25,000 hard drives spinning at all time,
Drives take far more wear and tear if they're power-cycled on a daily basis, and allowed to spin down when a machine is idle. I'd like to see the figures from an organization that services a large number of desktop machines.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
You just mocked the essential point of this whole question. Drives if they fail (other than from abuse) will usually fail in the first few days of use. My question is why are you here mocking anything at all?
This used to be a place to learn interesting technology from interesting and smart people. You won't notice that none of them are still around and there aren't even any good stories on the front page anymore "because you just showed how stupid you are on the internet".
Run the test longer and show us the data for span of 10 years. Additionally, reveal the brands and models of the disks. Thanks.
I worked at an on-line service for several years way back in the late 90s and early 00s and this data is consistent with the data I collected then over perhaps an order of magnitude more units. While 25K drives may not be a lot in the scale of today's internet services it is more than enough to draw statistically valid conclusions, as opposed to that, oh, 1 drive in your desktop gaming system that failed 1 day after the warranty expired.
I remember that all my Deskstar drives failed after each other very soon...
Regarding those statistics, I think we should rule out some brand and model well known for failure, because, as soon as the information goes public, we need to replace them with some other brand/model.
With such strategy we can achieve a lower effective failure rate.
You won't notice that none of them are still around
I see your DISINFORMATION double negative and your FALSE FLAG.
Regard ----> I am in fact STILL AROUND therefore (& everyone agrees with me who isnt already PROGRAMMED) those who know the TRUTH are still around.
I mock nobody these are POWERFUL MASTERS and you have to always sleep with 1 eye (EYE) %%%% OPEN. i feel only sorrow heartache despair and blueballs for Pikoro (844299) who is clearly a victim to the HARD DRIVE FALSE FLAG.
Maxtor disks were also famous for their unreliability.
I've had at least 7 Seagate ES.2 250GB drives fail on me. Luckily, not all at once.
It depends on what you're doing with the drives.
If the drives are mainly holding your torrent/nzb -client output and you only have so many SATA ports and drive bays, then you really don't care if they last over 3 years. In that situation, a four-year-old drive needs a capacity upgrade anyway.
I have three 2TB drives right now that I'm only keeping because I just massively upgraded my server (Lian Li PC-D8000 case and two SAS HBAs). If I were still crammed into a conventional tower case or living with only 10 SATA ports, I'd be replacing these kickass-for-2010 drives with newer models now.
These are the same stupid fucks that use rubber bands around hard drives in their "SAN" storage.
Given that anything remotely serious is based on the premise that you can't trust your hard drives, is a strategy that makes your HDDs incrementally less trustworthy; but much cheaper, actually 'stupid'?
I wouldn't want to use BackBlaze's 'Pods' on a small scale; because part of their low cost is achieved by moving all the redundancy, fault tolerance, etc. into software (and, for a small shop, paying a bit more for fancy hardware that handles that, along with backups, is cheaper than having a software guru on hand); but on a large scale, making the amount of 'overhead' (ie. dollars worth of hardware purchased to support each disk) as low as possible, and just using software (with its high up-front cost; but zero cost to copy an arbitrary number of times) seems pretty reasonable.
Now, if their arrangement was so dodgy that it was actively murdering drives, that'd be another story; but its thermals and electrical supply are good enough that the drives inside get to fail, or not, the same as though they were in any other enclosure, and these enclosures are crazy cheap, so why not?
You're welcome.
...that the failure rate of hard drives is an exponential function of the importance of the data residing on the drive multiplied by the business need for uninterrupted systems uptime.
I had one such thing in my Amiga 500. And it literally released the smoke out on the very same day I bought it.
This isn't surprising. To summarize: most early failures happen within the first year, and after 3 years, the survival rate drastically drops off.
This is a well-known phenomenon in IT storage, and it's why people will typically start replacing storage (or individual disks with any pre-fail signs) after 3 years.
That said, of the many disks I have still in service, most of them are older than 5 years, and I have some which are pushing 15 years old now without any concern of immediate failure. I've had pretty good luck with disk failures, and have only had SSDs die on me (Kingston, looking at you) personally.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
So it averages to 12.5 years, not too shabby for a HD.
Escher was the first MC and Giger invented the HR department.
In the first phase, which lasts 1.5 years, hard drives have an annual failure rate of 5.1%. For the next 1.5 years, the annual failure rate drops to 1.4%. After three years, the failure rate explodes to 11.8% per year. In short, this means that around 92% of drives survive the first 18 months, and almost all of those (90%) then go on to reach three years.
Extrapolating from these figures, just under 80% of all hard drives will survive to their fourth anniversary.
1.00 (total) - .051 (failure rate for 1.5 years) = .949 (non-failure), but only 92% survive for 18 months (a.k.a. 1.5 years)? What?
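The 92% figure works out if the 5.1% is read as an annual rate compounded over 1.5 years, rather than as the total for the whole period. A rough sketch of that arithmetic in Python, assuming the failure rate is constant within each phase (the function name is mine, not something from the article):

```python
def survival(phases):
    """Fraction of drives still alive after a sequence of phases.

    Each phase is (annual_failure_rate, duration_in_years); the rate is
    assumed constant within a phase and compounded over its duration.
    """
    alive = 1.0
    for annual_rate, years in phases:
        alive *= (1.0 - annual_rate) ** years
    return alive

# Annual rates quoted in the article excerpt above.
first_18_months = survival([(0.051, 1.5)])
three_years = survival([(0.051, 1.5), (0.014, 1.5)])
four_years = survival([(0.051, 1.5), (0.014, 1.5), (0.118, 1.0)])

print(f"18 months: {first_18_months:.1%}")
print(f"3 years:   {three_years:.1%}")
print(f"4 years:   {four_years:.1%}")
```

Under that assumption the three figures come out at about 92.4%, 90.5% and 79.8%, so the article's "90%" looks like the share of all drives that reach three years, not 90% of the 18-month survivors.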
With my limited sample of hard drives (around 50 over the years), here's what I've found so far. The drives range from 1.2GB to 1TB models, SCSI/IDE/SATA.
*ALL* but 1 or 2 of my Maxtors either died or sounded like a bandsaw pretty soon
My Seagates are all dead save 1 or 2
My WD seem fine, albeit some are noisy, but my two 1TB green pulled from external cases are pretty much about dead.
I've had only 1 out of 10 SCSI drive die so far.
So my experience so far is that Maxtor was crap, and when Seagate bought them it lowered Seagate's reliability. And since *ALL* the drives I've pulled from enclosures are dead, I'm guessing they are selling their crappiest drives to other manufacturers.
The problem is they are not trying to make better drives, they are trying to make *bigger* drives. Fuck a 4TB drive, gimme a reliable 1TB.
All my obsolete hard drives were dismantled and recycled, and from what I saw, the more recent the drive, the cheaper it's made (and less reliable)
I should've kept statistics while dismantling them.
I've got better things to do tonight than die.
The Google report based on many thousands of drives showed that while some MODEL NUMBERS had much higher failure, various brand names had similar failure rates. Western Digital will make two drives at the same time, one model that's very reliable while the one next to it is crap. Same with every other manufacturer.
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf
If you insist on buying based on the brand name, HGST models have been very good in our datacenter.
How did this unfunny and completely offtopic comment get modded up?? Sock puppets, maybe?
Google's numbers based on 100,000 drives showed that specific model numbers are reliable or not, while brand name doesn't matter as much.
All manufacturers make bad models and good models.
I've actually had the most luck with refurbished drives. If you find a brand on Newegg that's fairly new, you eliminate the re-furbs that failed due to wear and tear. The ones that are left are DOA drives that got sent back because of common manufacturing flaws. These drives are 100% QC tested and I've yet to have one fail. The awesome kicker is that the stigma of a re-furb virtually guarantees that they'll be cheaper as well.
Common Sense (+1)
... which is why I don't believe it, due to the CONFLICT OF INTEREST from Backblaze, who clearly would benefit if more people thought their hard drive had a one in five chance of dying after four years...
http://www.weibull.com/hotwire/issue21/hottopics21.htm
The behavior described is just what we should expect.
Of course, in many installations the failures aren't random but correlate to power, cooling or batch issues. This is especially important to bear in mind in disk arrays with long RAID rebuild times. The 2nd or even 3rd failure may come a lot quicker than you'd expect.
This is why even with reliable storage arrays one needs backups.
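For anyone who doesn't want to read the Weibull link: one common way to get a bathtub shape is to add up three Weibull hazard rates, one falling (infant mortality), one flat (random failures) and one rising (wear-out). A minimal sketch in Python; every parameter below is made up for illustration and is not fitted to the Backblaze data or anything else:

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    # All parameters are illustrative assumptions, not fitted values.
    infant = weibull_hazard(t, shape=0.5, scale=20.0)   # shape < 1: rate falls with age
    random = weibull_hazard(t, shape=1.0, scale=50.0)   # shape = 1: constant rate
    wearout = weibull_hazard(t, shape=5.0, scale=5.0)   # shape > 1: rate rises with age
    return infant + random + wearout

for age in (0.1, 2.0, 5.0):
    print(f"age {age:>4} years: hazard {bathtub_hazard(age):.3f} per year")
```

The combined rate is high for very young drives, lowest in the middle, and high again for old drives, which is the general shape the study describes.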
I had a similar experience with a fleet of about 20 Quantum hard drives that all died within 4 years. They started failing in earnest at about 2.5 years of age and then they started failing at the rate of one or two a month. The drives were all manufactured right about the same time that Quantum was being purchased by Maxtor.
It's six years old now, so perhaps drive failure characteristics have changed, but this study got some different results from a study published by Google in 2007. Google's study obviously involved a lot more than 25,000 drives.
For one, Google didn't observe a strong bathtub curve. They did see some infant mortality, but it was during the first 3-6 months, and the first-year failure rate was still lower than in subsequent years, so what "bathtub" there was hit the low point prior to the one-year mark and then began to climb.
The failure rates Google observed were also much lower. Perhaps drives have gotten less reliable.
Google also reported a lot of detail about how various SMART-reported values correlated with failures. Too bad Backblaze didn't do the same. It would have been very interesting.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
How do you know which ones failed due to wear and tear? Does it say in the item summary somewhere?
--
Promoting critical thinking since 1994.
So what's the oldest in-use HDD out there? San Jose's Computer History Museum's RAMAC doesn't count, as it was off-line (idle) for a couple of decades before being restored by professionals.
my oldest drive... hang on...122MB Quantum Prodrive ELS, still working, no bad sectors. Sees occasional use as a scratchpad for my print server.
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
Look for a newer model that hasn't been out long enough to have that much wear.
Common Sense (+1)
Seems like a conspiracy to me.
presented at a physics conference
http://indico.cern.ch/contributionDisplay.py?contribId=37&sessionId=3&confId=247864
real data from the source
Two AMERICAN fails in six words...
Why do you AMERICANS keep writing 'more THEN' and 'more THAT' instead of 'more THAN'?
How stupid do you have to be to do that?
UPS + good cooling. That said, I don't run any massive databases, just movies, music, and games.
I recently pulled a 40MB Miniscribe ATA from a CNC system we replaced. Thing had been running on the shop floor for 25 years...
I think he's saying that if the drive has only been on the market for a couple of months, the wear-and-tear failures haven't had time to happen yet.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
On the other hand every refurb I've had has been a dud. Most were DOA, few lasted more than a couple of weeks. The key is how thoroughly you test them.
Back when Magnetic Data Technology (MDT) started doing refurb drives (although they were sold as new, just 10% off the normal price) we got quite a few in over the months. It appeared that they would take drives which had failed due to having too many bad blocks and simply mask those blocks off at the firmware level, reset the failed block count and ship them back out. They seemed to work fine if you just installed them, ran basic SMART tests and installed an OS. If you did complete surface scans and long SMART tests though they would usually fail immediately as more bad blocks were discovered.
For the relatively small saving you make and crappy 1 year warranty it just isn't worth it.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
If you are sure that they were a relatively new model, and the refurb was a FACTORY refurb, that might be a good method. If Joe Stocking Clerk did the refurb, who knows what you will get.
When installing, and periodically thereafter, it is wise to run something like smartctl -a /dev/sd? on your drives and check the power-on hours and power cycle count (not to mention the reallocated sector count and spin retry count).
You would be surprised how many refurbs are actually fairly heavily used, with a lot of hours.
My current server's raid array is averaging 5.9 years, but has only seen 53 power cycles over that time. I actually tend to believe (without a great deal of evidence) that power cycles are harder on drives than running constantly.
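If you want to pull those numbers out with a script, here is a rough Python sketch. It parses text laid out like the attribute table that smartctl prints for ATA drives. The sample below is hand-written with invented values, and the column positions are an assumption that may not hold for every drive or smartctl version:

```python
import re

# Hand-written sample in the style of the attribute table printed by
# `smartctl -A`; the values are invented, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   041   041   000    Old_age   Always       -       51684
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       53
"""

def raw_values(table_text):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    values = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row does not.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                values[fields[1]] = int(match.group())
    return values

attrs = raw_values(SAMPLE)
years_on = attrs["Power_On_Hours"] / (24 * 365)
print(f"powered on for {years_on:.1f} years, {attrs['Power_Cycle_Count']} power cycles")
```

On a refurb, a large Power_On_Hours raw value is the quickest sign that the drive has already seen heavy use.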
Google actually did a similar study some years ago. Their study of over 100,000 drives largely agreed with the present study, right down to the three-node distribution of failures over time.
Sig Battery depleted. Reverting to safe mode.
Because it doesn't cost enough to be good, dammit!
Everyone knows that you always get what you pay for!
[/sarcasm, for the sarcasm-impaired]
Kid-proof tablet..
> 7 Seagate ES.2 250GB drives fail on me
The model number on the thirty sitting on a shelf in my office is ST3250310NS. They're Seagate Barracuda ES.2 250GB 7200 RPM drives. We've had a 125% failure rate after less than three years. They came four each in six Dell R410 1U servers. 30/24 = 1.25 or 125% failure rate. Dell servers are complete crap with poor airflow and a lot of vibration so that doesn't help, but still Seagate should be ashamed of a 125% failure rate. Of course since they refused to honor the warranty, they've shown they don't give a damn.
I've purchased a fair number of "refurb" products as well. That said, those products may not even be "repaired" in many cases. They could be drives returned due to some fault the QA process could not find (generally assumed to be "user" error by the manufacturer), or they could in some cases be leased equipment that has gone out of lease and been "recertified", etc etc etc.
Bottom line: don't assume that the quality is "better" just because it's gone through some official test.
My disks mostly last a long time. The few times I thought one had failed, it was actually something else, such as bad RAM (never buy value RAM), a bad controller card in an external drive, etc.
Losing a drive or two nowadays is no biggie; they are cheap enough to keep spares around and replace quickly. Most lost drives result in 0 downtime.
Now RAID controllers, those things are a scourge, nothing says "Fuck you" on a Friday at 2pm like a dead RAID Controller.
OK, I'll just say it, if I do "smartctl -a /dev/sda", I have no idea how to interpret the results. Every brand reports things differently.
Is there some way to sum up the report as Fine / Failing / Already lost data / Dead?
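One vendor-neutral angle: each attribute row carries a normalized VALUE and a vendor-set THRESH, and an attribute counts as failed once VALUE drops to or below THRESH, whatever the raw number means for that brand. (smartctl -H also prints the drive's own overall health verdict.) Here is a crude Python sketch of a three-way summary on top of that. The "Warning" rule and the list of watched attributes are my own heuristic, the sample rows are invented, and it makes no attempt at "Already lost data" or "Dead":

```python
# Invented sample rows in the style of `smartctl -A` output.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       212
  9 Power_On_Hours          0x0032   041   041   000    Old_age   Always       -       51684
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

# Raw counters treated as warning signs when non-zero (a heuristic assumption).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}

def summarize(table_text):
    """Return 'Failing', 'Warning' or 'Fine' for a block of attribute rows."""
    verdict = "Fine"
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        value, thresh = int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            return "Failing"      # normalized value has reached the vendor threshold
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            verdict = "Warning"   # above threshold, but sectors are going bad
    return verdict

print(summarize(SAMPLE))
```

Treat the result as a hint only; plenty of drives die without tripping any threshold first.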
Non-Linux Penguins ?
I built up a number of servers in the late 80's and fitted them with Seagate 100 Meg drives, 4 in all. Apart from about 5 occasions where the equipment has been moved to different buildings or floors or the power supplies replaced, they have been running virtually non stop for 23 years. I have a few spares in my bottom draw to replace them with but I am starting to think they might just last through to retirement.
The article claims that nobody has published a study of hard drive failure rates before. Wrong, of course - Google did some time ago.
The guy ends the article by talking about SSDs as if they are some sort of unknown quantity. The failure modes of SSDs are much more consistent and better understood even at this early stage. Flash blocks become unusable after a fairly fixed number of rewrites. The drives measure the total number of historic writes to the drive and failure can be predicted ahead of time.
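The prediction part is simple enough to sketch in Python. This assumes the drive is rated for some total amount of data written and that the workload stays constant; all the numbers are hypothetical and not taken from any real drive's spec sheet:

```python
def days_until_rated_endurance(rated_tbw, written_tb, tb_per_day):
    """Days left before total writes reach the rated endurance figure."""
    if tb_per_day <= 0:
        raise ValueError("tb_per_day must be positive")
    remaining_tb = max(rated_tbw - written_tb, 0.0)
    return remaining_tb / tb_per_day

# Hypothetical drive: rated for 150 TB written, 30 TB used so far,
# workload writing 0.05 TB (50 GB) per day.
days = days_until_rated_endurance(rated_tbw=150.0, written_tb=30.0, tb_per_day=0.05)
print(f"about {days / 365:.1f} years of writes left at this rate")
```

This only covers wear-out from rewrites; it says nothing about controller or firmware failures, which don't announce themselves this way.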
I personally replaced thousands of those deathstar drives. That's how you get the left side of the bathtub curve.
You may find this to be ridiculous....
I have seen more drives in RAID arrays fail than any other type/configuration of drives. I don't know why, but when you put disks into a raid array they seem to be so much more likely to fail. Maybe RAID controllers tend to overwork drives? We always buy the "enterprise" (expensive) drives too...
You want more? I've also seen more power supplies in servers that support redundant power supplies (especially Dell) fail than anything else.
I've got 286 computers with good drives and power supplies that will probably keep working until there's an EMP, but the "enterprise" stuff from today is just awful.
All of the drives we buy today are advertised at 1-1.4 million hours MTBF, and they're cheap as hell with nothing special about them. Obviously I have no idea what the manufacturer-rated MTBF of the disks they purchased was, but I find it hard to believe it would be significantly lower.
At 1m MTBF after 4 years the failure rate should be closer to 2% than 20%.
Thinking back to previous insane reports from Google and bit errors in RAM it is very hard for me to trust any of these "studies".
For all I know something is screwed up in their environment.. vibration, temperature, supply power, power management, grounding/interference or bad batch biasing outcomes.
Nor should it be dismissed that this report comes from an online backup company, where there is a fairly direct and obvious conflict of interest.
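For reference, here is the back-of-envelope math behind the MTBF comparison as a Python sketch. It assumes a constant failure rate (exponential lifetimes) and drives powered on 24/7, which is the usual textbook reading of an MTBF figure and not necessarily how the manufacturers derive theirs:

```python
import math

HOURS_PER_YEAR = 24 * 365

def failure_probability(mtbf_hours, years):
    """Chance that a drive fails within `years`, assuming a constant failure rate."""
    return 1.0 - math.exp(-(years * HOURS_PER_YEAR) / mtbf_hours)

p = failure_probability(mtbf_hours=1_000_000, years=4)
print(f"expected failures after 4 years at 1M hours MTBF: {p:.1%}")
```

That comes out at roughly 3.4% after four years, which is indeed much closer to 2% than to the 20% in the study.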
What equipment with 100 MB drives is still in production? Not berating you, just genuinely curious.
This makes me wonder if those replicants in Blade Runner were just common consumer products, not really designed to fail after a few years. Is there anything man made that can function longer than we live? I'm sure there are a few rare examples, but why is it so hard to create something that outlives us?
I'd believe that one for sure. I've had WD Blacks die after swearing by them the previous generation, a couple of the same model. Same with an office setup long ago, 25% failure in a year. No brand has held favor long enough to be useful info to me.
On a sadder note: My faithful Bigfoot drive failed to boot up this weekend, oh well, teenagers are sooo temperamental :(
Happier note: NOS OEM replacement in hand. LOL, long term planning was a tad longer term than expected but still....
I hope you have a strong shelf and nothing valuable underneath, also a failure rate of 25% more drives than you purchase seems somewhat illogical
I have that same drive, and mine still works flawlessly too. Always liked the hollow knocking sound they made when seeking.
My other cool drive story is earlier this year I found in a box an old Conner IDE drive in a bag with a small piece of paper with a bunch of mysterious numbers written on it. Curious as to what might be on it, I plugged it into a USB to IDE adapter. Drive spun up, but nothing else. I was kind of bummed, but then it dawned on me what that mysterious piece of paper was. Realizing that the USB to IDE adapter wasn't going to work, I dragged an old P3 out of the closet, typed those numbers into the BIOS screen after the auto-detect failed, and I found the drive still worked perfectly fine. Time stamps on the files indicated that it was taken out of commission sometime in 1998.
Until you put valuable data on it with no backups. Then they fail almost instantly.
There are three kinds of falsehood: the first is a 'fib,' the second is a downright lie, and the third is statistics.
It is true that their arrangement isn't particularly fast (lots of port multipliers, all redundancy operations, whether RAID or something else, are handled in software by the not-especially-distinguished CPU) compared to 42-spindle systems actually designed to wring maximum I/O operations out of HDDs; but it's fast enough, by all accounts, for nearline storage purposes, which is what it was designed for.
Exactly. This is obviously not a fool-proof method. But, like I said, I've had some luck with it so I figured I'd share.
Common Sense (+1)
Obviously not. It is at least as much of a gamble as buying a new drive. But where you can intentionally pick a product that has a slightly better chance of having been checked I've experienced some luck. Especially in the case where the defect that caused it to be returned in the first place is unlikely to be related to hours of use or power cycles.
Common Sense (+1)
My first real computer (a 286 that I finally retired for good in 2001, tho I still have it) has an ST-225. Ran hot, slow, and needed a fresh low-level format every couple years, but the durn thing still worked. You could tell exactly what it was doing -- seek, read, write, and delete were four distinct sounds. Unrecognised sounds were a sign that a LLF was in its near future. :(
~REZ~ #43301. Who'd fake being me anyway?
Yep, absolutely.
Without any buzzword bingo: It's just a backup system. It does not have to be fast, it just has to be both big and geographically redundant.
Kid-proof tablet..
I've actually had the most luck with refurbished drives. If you find a brand on Newegg that's fairly new, you eliminate the re-furbs that failed due to wear and tear. The ones that are left are DOA drives that got sent back because of common manufacturing flaws.
Or re-stocked drives that have been abused but haven't failed (yet).
The two things I don't buy refurbished are hard-drives and toilet paper.
And underwear. Ok, the three things I don't buy refurbished are hard-drives, toilet paper, and underwear.
Oh and waffle cones. The four things I dont buy refurbis
I think this study is spot on from my experience. Yeah it is incomplete, and so is my study of HDD failure.... In fact my study of HDD will not be complete until I do not have them in my systems. But there ya go... endorsement..
The "bends" in the curve they plot are too abrupt. There must be something else going on.
Looking at the original article, they had only about 3500 drives around 2009. That's 4 years ago. So their "4 year" survival rate is not based on the 25000 drives they have now, but only on the 3500 that they had in 2009. With the sharp bends in the curves around 1.5 years and 3 years, I think they significantly changed their buying policy around those moments. Or the manufacturers started shipping them different drives.
How else can the drive "know" that it's been on for 1.5 years? The annual failure rate drops by a factor of four inside a month.
The explanation of the bathtub curve explains it a bit: the random failure rate is apparently about 1.4% per year, so the initial failures account for about 5.1 - 1.4 = 3.7% per year. But instead of the initial failures "tapering off" to "small" values around 1.5 years, they stay constant for 1.5 years, and then suddenly drop to zero. To me this points to something like: "they bought a big batch of drives about 1.5 years ago that has such a high random failure rate that it pulls the first-1.5-year average up to 5.1%/year".
Do the same analysis 3 months from now, and the "1.5 year bend" moves over to 1.75 years. That's my hypothesis based on the data they publish. Given the underlying data and some time to spare, one could already debunk or support it with the current data: if my hypothesis is correct, running the analysis on only the data that is now older than 3 months should show the bend around 1.25 years. If that happens, it makes my hypothesis very likely.....
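To show what I mean, here is a toy Python model of the first bend. It pools two cohorts with constant but different failure rates and reports the failure rate you would observe at each drive age. The 3500 drives and the 1.4%/year come from the numbers above; the size and failure rate of the newer batch are invented so that the pooled early rate lands near 5%:

```python
def observed_annual_rate(cohorts, age):
    """Pooled annual failure rate seen among drives that have reached `age`.

    Each cohort is (drive_count, current_age_years, annual_failure_rate).
    A cohort only contributes at ages it has already lived through.
    """
    failures = 0.0
    at_risk = 0.0
    for count, current_age, rate in cohorts:
        if current_age >= age:
            survivors = count * (1.0 - rate) ** age
            at_risk += survivors
            failures += survivors * rate
    return failures / at_risk

# Hypothetical fleet: 3500 older drives failing at 1.4%/year, plus a large
# batch bought 1.5 years ago failing at 5.7%/year (batch size and rate invented).
fleet = [(3500, 4.0, 0.014), (21500, 1.5, 0.057)]

for age in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"age {age}: {observed_annual_rate(fleet, age):.1%} per year")
```

Neither cohort "knows" it is 1.5 years old, yet the pooled rate sits around 5% up to 1.5 years and then drops to 1.4%, simply because the big batch has no drives older than that. It doesn't prove my hypothesis, and it says nothing about the second bend at 3 years, but it shows that a mix of batches is enough to produce a sharp corner.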
Haha Yeah, I suppose that's possible. While people do shady things like that, I'd like to believe that doesn't happen *most* of the time. I will say that I wholeheartedly agree with the other three things... ;)
Common Sense (+1)