Analyzing Long-Term SSD Failure Rates
wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting — based on numbers, it seems SSDs aren't more reliable than hard drives. "
or should we conclude 'just as reliable'?
SSDs haven't even been available "long term". The SSD in this computer hasn't been available for longer than 20% of its warranty period, for example. Extrapolating data from generation zero SSDs to today's SSDs seems foolish.
Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but the number of hard disks failing after three years looks higher than the number of SSDs failing after three years in all of the studies. For most workloads, the SSDs fail less often, and the SSD failures only exceed HD failures very early on in their lifetimes.
I am TheRaven on Soylent News
The chart linked is not terribly useful, since the legend doesn't really explain what the curves are (three completely different curves with the same label, HDD Schroeder 2007).
http://www.geoffreylandis.com
Hasn't this always been the assumption? I've always been told by everyone in every discussion about SSD vs HDD that SSD has a lesser lifetime.
I didn't read TFA but the chart doesn't tell me that "SSDs aren't more reliable than hard drives".. the SSDs were generally 6% or under (assuming the linear progression) whereas regular HDD approached 14%+ after five years. And "Long-term" in the title? The SSD data in the chart only goes for 1 year. Not exactly long term when the chart goes from 1-5 years of use. The actual data for the SSDs is only 20% of the time span.
it seems SSDs aren't more reliable than hard drives.
I have never ever heard a single fucking person make the claim that SSDs were more reliable. Faster, yes. Less reliable, yes. But I've never heard anyone claim they were more reliable as some sort of selling point.
Whoever wrote that article might know a lot about drives. But they don't know a lot about how to write an interesting and readable article.
Never email donotemail@WeAreSpammers.com
THG did a good job separating the cause of 'failures' between SSDs and HDDs as I get asked a lot by potential customers the differences between the two.
SSD is a whole other ball field and, personally, I'm more HDD biased than SSD.
I still see SSDs as a brand new technology that has yet to normalize as much as HDDs have in the past 15 years. However, the article does make me realize that SSDs really won't normalize as much as HDDs have due to how vastly differently they work. So, I'll have to give SSDs some slack in future customer builds.
In the beginning, the issue with SSDs was the write speed, but now the focus in this sector is on the controller chip (e.g. SandForce) more than the storage medium (SLC vs MLC) the SSD uses.
It looks like having the SSD/HDD combo in a PC is still the best way to go until the storage chips drop considerably.
Previewing comments are for sissies!
I don't think it's really fair to say at this stage that SSDs aren't more reliable than hard drives.
For one, SSDs are still rather new. Yes, they've been around for a few years but compared to hard drives they are still at the beginning of their development cycle, and it shows: firmware issues and recalls, as stated in the article, may be a heavy contributing factor to the SSD failure rates. We can expect this to drop as manufacturers continually revise their firmware and manufacturing techniques for the better.
For another, the article also notes that the SSD failure rates, to this point, are rather constant. If this trend holds, SSDs can easily outstrip HDDs in reliability by the 3rd or 4th year.
Finally, SSDs have often been called reliable from the perspective of the average consumer. The benefits are obvious: when mishandling occurs (which happens much more often than you'd think, even on desktops), HDDs will have a far higher chance of damage.
Hence, while the results of this article are indeed interesting, perhaps a study done when the technology matures further would be more useful.
I've not seen much evidence of bias from Tom's Hardware, although I did stop reading their site several years ago, just shocking levels of ignorance and stupidity.
I have had laptop hard drives fail on me because of the hits they take in my bicycle carrier or on my lap as I bounce along in the shuttle on city roads. My SSDs have yet to fail. Never again will I get a hard drive in my laptop, because of reliability.
Think of flash memory as acid/base chemistry: a one is stored by pH much lower than 7, a zero is stored by a pH much greater than 7. The reaction is confined to pores in a pumice stone. In order to reduce cost, pumice stone with increasingly small bubble cavities (and mineral wall thickness) has been pressed into service.
By the laws of solid state physics, this makes acid/base pumice stone inherently more reliable than magnetic domains spinning on a fluid bearing.
The bottom line here is that every SSD die shrink generation is an entirely new set of loaded dice. Due to the incredible churn rate in IC fabrication technology, no SSD product remains in the market after establishing a solid track record.
It could be that Intel SSD products are like the Staal brothers. Or not. If you're willing to average over a flock of white swans and black swans, it could well be more reliable than HDD storage.
I find the underlying variance frightening. The maturity model sucks, because as fast as they figure out one problem, the problem is immediately replaced by a harder problem, as per Douglas Adams.
The candle that burns twice as bright burns half as long, and you have burned so very, very brightly Roy...
Let me summarize:
A) Chart is worthless. I have never seen a more ambiguous, meaningless chart in my life. They might as well not bother to label things.
B) Let's do a reliability study on SSDs that they don't have any long-term data on past 2 years, yet compare it to HDDs that typically have at least a 3-year warranty. By that I only mean, I'll go out on a limb and guess that the average time to failure of an HDD is > 3 years, if only for economic self-preservation.
C) Results in either case depend highly on specific device model and configuration.
The poster probably saw the chart, as they seem to have actually read the article in addition to merely glancing at a picture on the last page. Right below that graph:
There are other numerous quotes as well about MTBF not being equivalent to reliability, correlation to vintage, etc.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Based on numbers, the study shows SSDs to be more reliable than HDDs. The best data I have seen in that article is the following:
SSDs: 1.28--2.19% over 2 years
HDDs: >=5% over 2 years
The HDD data comes from: http://media.bestofmicro.com/2/N/289103/original/google_afrtemputilization_475.png The SSD data comes from the table on Page #6.
I don't think any of this data is particularly surprising. HDDs are mechanical, so the curves for failure would not be linear. The most interesting part of the article for consideration with SSDs is that SMART is going to be near useless for them. Since most failures are random occurrences in electronics, which SMART isn't good at detecting, we may need better technology for detecting SSD failures.
It is also a lot easier to retrieve data from a failed disk than from an SSD, which most of the time goes without warning.
According to a perfectly baseless linear interpolation on several charts, SSDs have similar failure rates to HDDs. Just great... Call me back when we have 5 years of solid data, not just conjecture and inference.
Stupidity is the root of all evil.
Referring to the chart on the last page, the HDs look linear at the beginning too. I guess we don't really know yet. What if the SSDs start failing at a higher rate at three years?
Reply to un-do accidental moderation. Apologies.
Ditto here. About 50 Intel X-25s or 320s so far, many in service for 2+ years, and zero failures. All in laptops. We started buying all SSDs in laptops about a year ago. We see much higher (~5% annual) failure rates in our desktop mechanical disks, as well as the hundreds of "near-line" 7200 RPM and "enterprise" 15K drives in the datacenter. Our next SAN/NAS purchase will definitely have good MLC SSDs on tier-0 or as massive read/write cache, backed by spinning rust in RAID-6 or something similar for capacity. We will of course hammer the crap out of some demo units with random writes for several weeks to provide confidence in the SSD lifetimes.
I want to know if SSDs are more reliable than HDDs in an environment full of cat hair. I've never had a SCSI HDD outlast its warranty.
I use SSDs on systems I want to boot fast, but that's about the only use I have for them and find the 'no, don't upgrade CPU/RAM/whatever, get an SSD it's the best upgrade for any system' nutters rather amusing.
I have seen people saying that SSDs speed up compilation a lot though I'm surprised because header files and the like should pretty quickly go into the disk cache and never require another read from disk. However, those same people also say they have to replace the SSDs at least once a year because they wear out... which isn't a bad deal if a $100 SSD saves you an hour of programmer time every week.
My most used SSD here has about 1500 hours and has used 1% of its write cycles; but that one is set up to put all the regular writes (/tmp, /var/tmp, /var/log, etc) into a RAM disk instead of going to the SSD.
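For what it's worth, here is the naive extrapolation of those figures as a minimal Python sketch. It assumes wear stays linear in power-on hours, which is only a guess about this workload, and the function name is made up for illustration.

```python
def projected_wear_out_hours(hours_used: float, fraction_of_cycles_used: float) -> float:
    """Linearly extrapolate the total power-on hours at which write cycles run out.

    Assumes the drive keeps consuming write cycles at the same average rate.
    """
    if not 0 < fraction_of_cycles_used <= 1:
        raise ValueError("fraction_of_cycles_used must be in (0, 1]")
    return hours_used / fraction_of_cycles_used


# Figures from the comment above: about 1500 hours, 1% of write cycles used.
total_hours = projected_wear_out_hours(1500, 0.01)
years = total_hours / (24 * 365)
print(f"{total_hours:.0f} power-on hours, roughly {years:.1f} years of continuous use")
```

A straight-line estimate like this says nothing about failures unrelated to wear (firmware bugs, controller faults), so treat it as an upper bound on wear-out only.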
I have over 100 Intel 320 and X25-M drives in my organization and not had one fail yet.
Good luck. The 320 has a known bug where it will power up claiming to only have an 8MB capacity and requires a complete wipe to recover (do a web search for intel 320 8mb).
Sadly this was discovered about a week after I bought one.
I read TFA, and to me, it seems like a bunch of samples that aren't necessarily comparable nor do they necessarily agree with each other.
This is interesting as a starting point, and I applaud Tom's Hardware for the effort they have put into this article, but I think we will need a lot more data before we can get meaningful conclusions.
What did surprise me, though, are the return rates on hard disks. Multiple percent in a single year seems high to me! I'm glad I'm not in the hardware business.
Please correct me if I got my facts wrong.
We don't buy SSDs because they are more reliable (they don't seem to be in our large RAID arrays), but because they are faster than HDDs.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
That's my take. And unlike a hard drive, firmware is something which can be continuously improved. SSD manufacturers are starting to understand and deal with the failure modes.
One thing they don't mention is off-line storage. If I take a hard drive out of service and store it on a shelf for a year, it's virtually guaranteed to fail when I power it up. That is, every single HDD I've taken off the shelf will tend to work for a short while, long enough for me to get the data off of it usually, but every single one has failed within a month of being repowered.
I expect over the next few years the combination of firmware improvements and flash improvements (that is, improvements in being able to predict a flash cell failure) will result in the SSDs running away in the reliability department. Hard drives have been around for a long time and yet they still fail at a horribly high rate... too high for the higher capacities they now have. Intel has certainly already seen the light.
Several vendors are now putting high-capacity capacitors in their SSDs to remove one common failure mode... exploded meta-data/table due to unexpected power downs, which is a particular problem for SSDs which use idle time for wear leveling activities.
One thing for sure, we are going to get some excellent statistics over the next decade.
-Matt
Any thread on SSD failures should include a link to Jeff Atwood's blog entry on the topic:
Full post here: http://www.codinghorror.com/blog/2011/05/the-hot-crazy-solid-state-drive-scale.html
You are using the wrong sort of cat.
Put the disk in a box with SchrÃdinger's cat .. then the drive can be both alive and dead at the same time as the cat ;)
--I thought I was wrong once, but I was mistaken.
I think a good question is how HDs fail vs. how SSDs fail.
There are two distinct ways that an HD can fail, either the circuitry on the PC board goes bad or what's inside the sealed chamber goes bad.
In the former case the data SHOULD be recoverable. In the latter case there are three possible failure points: the platter motor, the head stepper motor, or the heads themselves. In the first two cases the drive could be repaired and the data salvaged, but it will take more effort and money to do so. If the heads themselves failed (with a resulting crash), the platters are likely destroyed and with them the data. Note that with the loss of the platter motor the heads will come to rest on the platter at some point, hopefully with a soft landing, or they will have been retracted first by the drive's protection features. If the head position servos died, it's possible that some areas of the platters were damaged if the heads didn't come to a soft landing (depends on whether the drive's electronics will refuse to spin up the platter if the servos are bad).
SSD failures are mostly in the storage itself. Maybe special firmware can be downloaded to recover something, if anyone has figured out how to do this.
Oh boy, did /. make a mess of Schrodinger's with an umlaut :)
I must preview before I submit!
Wouldn't you expect to see a bathtub curve for each item in the graph? Right at the very beginning, at year 0, there should be a higher-than-zero failure rate that decreases before it begins increasing again. Or is that part of the chart obscured by manufacturer burn-in, which you do not get to see because year 0 is actually after burn-in?
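For anyone who hasn't met the term, a bathtub curve can be sketched as the sum of three failure-rate terms: a falling "infant mortality" term, a flat random-failure term, and a rising wear-out term. The Python below is only an illustration of the shape; the Weibull parameters are invented and are not fitted to any data in the article.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years: float) -> float:
    """Illustrative bathtub-shaped failure rate (all parameters are hypothetical)."""
    infant_mortality = weibull_hazard(t_years, shape=0.5, scale=5.0)  # shape < 1: decreasing
    random_failures = 0.02                                           # constant background rate
    wear_out = weibull_hazard(t_years, shape=4.0, scale=6.0)         # shape > 1: increasing
    return infant_mortality + random_failures + wear_out


for t in (0.1, 1, 2, 3, 5):
    print(f"year {t}: hazard {bathtub_hazard(t):.3f}")
```

If burn-in at the factory removes most early failures, the left wall of the tub would be cut off before a buyer's "year 0", which is one way a field chart could end up missing it.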
Could someone explain why the failure rate for HDDs is a curve and the failure rate for SSDs (so far) is a line?
I've not seen much evidence of bias from Tom's Hardware, although I did stop reading their site several years ago, just shocking levels of ignorance and stupidity.
Exactly. What a bunch of idiots. I'm surprised that they're actually able to turn on their computers, much less run a website.
Our next SAN/NAS purchase will definitely have good MLC SSDs on tier-0 or as massive read/write cache
Did you perhaps mean SLC? MLC would be such a waste in that use case.
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
For example:
Steadfast Networks' Karl Zimmerman, as quoted from bottom of this page (emphasis added)
We simply get significantly higher I/O [with SSDs] at a lower cost than we'd be able to get with standard drives. We've had many customers needing more I/O than what 4x 15k RPM SAS drives in RAID 10 provide, and an upgrade involves moving to a larger server chassis to support more than four drives, a larger RAID card, etc. Other configurations have needed 16+ 15k RPM drives to get the necessary I/O. Going with a single SSD (or a couple SSDs in RAID) greatly simplifies the configurations and makes them much cheaper overall.
That is then compounded by the fact that you generally use one SSD to replace 4+ standard drives on average.
You're then looking at a 20%+ AFR with hard drives and 1.6% with an SSD.
(AFR = Annual Failure Rate)
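The 20%+ figure looks like simple addition of per-drive rates across the four or more drives being replaced. A minimal Python sketch of that arithmetic follows; the 5% per-drive AFR is an assumption (it is what 20% across four drives implies, not something the quote states), and failures are assumed independent.

```python
def any_failure_rate(per_drive_afr: float, drives: int) -> float:
    """Chance that at least one of `drives` independent drives fails within a year."""
    return 1.0 - (1.0 - per_drive_afr) ** drives


HDD_AFR = 0.05  # assumed per-drive rate, implied by "20%+" over four drives
SSD_AFR = 0.016  # the 1.6% figure from the quote

naive_sum = 4 * HDD_AFR                       # simple addition, as in the quote
four_hdds = any_failure_rate(HDD_AFR, 4)      # slightly lower once overlap is accounted for
one_ssd = any_failure_rate(SSD_AFR, 1)

print(f"4 HDDs, summed: {naive_sum:.1%}")
print(f"4 HDDs, at least one failure: {four_hdds:.1%}")
print(f"1 SSD: {one_ssd:.1%}")
```

Either way of counting, the comparison is between "some drive in the set needs replacing" and "the single SSD needs replacing", which is a maintenance figure rather than a data-loss figure if the set is redundant.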
What I care about is MTTDL (Mean Time to Data Loss) of a complete system. Hard drives are unreliable as is any component in your computer. That's why you should make at the very least every individual component your data path follows double or triple redundant. This is easily accomplished these days, ZFS, RAID6 etc. Make sure you don't just rely on the mechanical parts of your system to keep your data safe.
What else you want to know about is undetected data corruption. Hard disks are very bad at keeping data. SSDs may wear out too (as in the end there is still a physical/mechanical process), but you might be able to read the cells even if you can't write to them. On average hard disks will give an uncorrected error every 12TB read.
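The 12TB number appears to correspond to the commonly quoted spec-sheet rate of one unrecoverable read error per 10^14 bits. A small Python sketch of the conversion, treating that rate as an assumption and errors as independent per bit:

```python
import math

BITS_PER_ERROR = 1e14  # assumed spec-sheet rate: one unrecoverable error per 1e14 bits read


def tb_read_per_error(bits_per_error: float = BITS_PER_ERROR) -> float:
    """Average terabytes (10**12 bytes) read per unrecoverable error."""
    return bits_per_error / 8 / 1e12


def p_at_least_one_error(tb_read: float, bits_per_error: float = BITS_PER_ERROR) -> float:
    """Probability of hitting one or more unrecoverable errors while reading tb_read terabytes."""
    bits = tb_read * 1e12 * 8
    # 1 - (1 - 1/bits_per_error) ** bits, computed stably for tiny per-bit probabilities
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


print(f"{tb_read_per_error():.1f} TB read per error on average")
for tb in (1, 4, 12.5):
    print(f"reading {tb} TB: {p_at_least_one_error(tb):.1%} chance of at least one error")
```

This is why a full-array read during a rebuild is the risky moment: the amount of data read in one go becomes comparable to the average distance between errors.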
Besides that, SSDs give way more IOPS than any hard drive available (even the 15k RPM ones). So you can keep the system's MTTDL much higher as you have fewer parts, and for the same price you could even invest more in redundancy (double mirror an SSD) instead of a set of RAID10's.
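To make the "fewer parts" point concrete, here is a rough Python sketch using the textbook approximation for a two-way mirror, MTTDL = MTTF^2 / (2 * MTTR), which assumes independent failures at a constant rate and a repair time much shorter than MTTF. The MTTF and MTTR values are placeholders for illustration, not figures from the article.

```python
def mttdl_mirror_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for one two-way mirror.

    Data is lost only if the second drive dies while the first is being replaced.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


def mttdl_striped_mirrors_hours(mttf_hours: float, mttr_hours: float, pairs: int) -> float:
    """RAID 10 style: losing any one mirrored pair loses the array."""
    return mttdl_mirror_hours(mttf_hours, mttr_hours) / pairs


MTTF = 100_000.0  # hypothetical per-drive MTTF in hours
MTTR = 24.0       # hypothetical time to replace and resync a drive, in hours

single_pair = mttdl_mirror_hours(MTTF, MTTR)
raid10_four_drives = mttdl_striped_mirrors_hours(MTTF, MTTR, pairs=2)
print(f"one mirrored pair: {single_pair:,.0f} hours")
print(f"RAID 10 over four drives: {raid10_four_drives:,.0f} hours")
```

With identical drives, halving the number of mirrored pairs doubles the estimate, so replacing a multi-pair array with one mirrored pair helps before any difference in per-drive failure rate is considered.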
Custom electronics and digital signage for your business: www.evcircuits.com
You might find parts in an SSD which you can use for pinning stuff to your fridge. But anyone who ever put a hard drive magnet on his fridge knows that nothing can hold your data as reliably as a hard drive magnet.
By definition anything that contains powerful magnets is cool.
(Oh, and a fast spinning BLDC motor.)
Shiny platters!
The only data they could possibly have that is that old is data on some of the earliest mass produced consumer SSD drives. These were first generation products and logic would dictate that the drives being made today would be far more reliable. I think it's too early to try and draw any conclusions, everyone knows there were lots of problems with the first generation of drives, like most first gen products (eg - lack of TRIM, stuttering with the early Indilinx controllers, etc).
Exactly! If you just spent 3 hours installing and updating your system, what better time to make a backup image!
It would be my second, the first being after install, before updates.
It's just too easy not to!