Analyzing Long-Term SSD Failure Rates
wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting — based on numbers, it seems SSDs aren't more reliable than hard drives. "
Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but in all of the studies the number of hard disks failing after three years looks higher than the number of SSDs failing after three years. For most workloads, the SSDs fail less often, and the SSD failures only exceed HD failures very early on in their lifetimes.
I am TheRaven on Soylent News
I didn't read TFA, but the chart doesn't tell me that "SSDs aren't more reliable than hard drives". The SSDs were generally at 6% or under (assuming the linear progression), whereas regular HDDs approached 14%+ after five years. And "Long-term" in the title? The SSD data in the chart only goes for 1 year. Not exactly long term when the chart goes from 1-5 years of use. The actual data for the SSDs covers only 20% of the time span.
The author reviews several data sets that show SSDs are probably less likely to fail, and then describes several reasons why that information cannot be taken at face value. Not all of the data presented by the author is classified as reliable or even useful. The final chart is either not well-documented or would take a seminar to explain because it does not seem directly related to the rest of TFA.
Either way, SSDs are, oddly enough, about as good as spinning drives, but as with anything else, the data released by vendors should be taken with a grain of salt.
When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
They're more durable - you can bang one against the desk, throw it around the room all day, then plug it in and it should still work (or, at worst, require fixing a broken solder joint or two, SMD capacitors sometimes fall off the PCB after a strong enough jolt), while no HDD in the world is going to survive that. Maybe people got that confused, the word "reliable" means many different things in layman's speech.
This is Slashdot. Common sense is futile. You will be modded down.
Well, considering that the failure rates are bad enough that Atwood at Coding Horror says SSDs should be judged on a hot/crazy scale, I'd say that is a pretty bad sign. Note that he still buys them even though they keep failing, but this is a guy that spends $400 on a pair of headphones.
My problem with SSDs, and why I won't recommend them to anyone outside a few edge use cases (those who do a lot of traveling with their laptop, servers where IOPS is the #1 goal), is that when they DO fail, in my experience there is no warning at all, and that is simply unacceptable. I have a couple of "Must rule teh benchmarkz!" gamer customers and both went SSD. These guys ain't cheap and bought the baddest SSDs they could find, price be damned. For both guys the drives failed with NO warning, not even from SMART. They just turned on their machines one day and poof! Bye bye SSD. From one I was able to get a small amount of the data back; the other couldn't even be detected in BIOS. Sure, they both had warranties, but so what? It isn't like the warranties covered downtime or the HDDs they had to buy as replacements while they waited on the RMA. Both ended up selling their SSDs and going with a pair of Raptors in RAID 0.
So until they fix this major flaw I will simply tell my customers to avoid them. With HDDs I don't think I can remember a time I've had a HDD fail without ample warning. Windows delayed write failures, SMART, noise and temp of the drive, in all cases you were given ample time to get your data off the failing drive. Not so with SSD, when it goes it just goes poof! Having that risk hanging like the sword of Damocles over your head just isn't worth the speed IMHO.
ACs don't waste your time replying, your posts are never seen by me.
I remember there being lots of claims that SSDs would be more reliable because they had no moving parts.
The truth is that all men having power ought to be mistrusted. James Madison
Let me summarize:
A) Chart is worthless. I have never seen a more ambiguous, meaningless chart in my life. They might as well not bother to label things.
B) Let's do a reliability study on SSDs that they don't have any long-term data on past 2 years, yet compare it to HDDs that typically have at least a 3 year warranty. By that I only mean, I'll go out on a limb and guess that the average time to failure of an HDD is > 3 years, if only for economic self-preservation.
C) Results in either case depend highly on specific device model and configuration.
Based on numbers, the study shows SSDs to be more reliable than HDDs. The best data I have seen in that article is the following:
SSDs: 1.28--2.19% over 2 years
HDDs: >=5% over 2 years
The HDD data comes from: http://media.bestofmicro.com/2/N/289103/original/google_afrtemputilization_475.png The SSD data comes from the table on Page #6.
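To put those two figures side by side, here is a rough back-of-the-envelope sketch. It only divides the numbers quoted above; the variable and function names are made up for illustration, and because the HDD figure is a floor (">=5%"), the ratios are lower bounds and not measured values.

```python
# Two-year cumulative failure figures quoted in this comment.
ssd_low, ssd_high = 1.28, 2.19   # percent of SSDs failing over 2 years
hdd_floor = 5.0                  # percent of HDDs over 2 years (">=5%", a lower bound)

def ratio_range(hdd_pct, ssd_lo, ssd_hi):
    """Return (ratio vs. worst SSD figure, ratio vs. best SSD figure)."""
    return hdd_pct / ssd_hi, hdd_pct / ssd_lo

lo, hi = ratio_range(hdd_floor, ssd_low, ssd_high)
print(f"HDD/SSD failure ratio over 2 years: at least {lo:.1f}x to {hi:.1f}x")
```

That works out to HDDs failing at least 2.3 times as often as the worst SSD figure and at least 3.9 times as often as the best one, taking both data sets at face value.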
I don't think any of this data is particularly surprising, HDDs are mechanical so the curves for failure would not be linear. The most interesting part of the article for consideration with SSDs is that SMART is going to be near useless for them. Since most failures are random occurrences in electronics which SMART isn't good at detecting, we may need better technology for detecting SSD failures.
The fix for this was released a long time ago; it is called proper backups. Instead of avoiding a superior product, try using them along with proper backups.
sigh, someone else who didn't RTFA. If you look on page 8 you'll see an image where Intel's 'reliability study at IDF 2011' says HDDs are pants, SSDs are great.
of course, this is part of Intel's marketing for SSDs, so you'd expect them to say this kind of thing. Of course, that means someone has said this - specifically as some sort of selling point.
It is also a lot easier to retrieve data from a disk than from an SSD, which most of the time goes without warning.
always remember: RAID is not backup.
One day, with a traditional HDD based setup, you'll come into the office to find the place a mess, everyone standing around and when you ask "what's happened", you'll get the reply "we were burgled, your PC is right now being sold on ebay".
So who cares whether SSDs fail immediately or with a huge flashy light show whilst beeping out La Marseillaise, it won't help you none.
You'll find other stories of HDD RAID arrays that failed simultaneously (which is more common than you think; drives go bad in batches, or, I think, die at the same time just out of stubbornness), either due to power surges or a RAID failure that led to data corruption.
So the only solution is to have adequate backup. With the number of continuous backup solutions out there, there's no excuse not to run it.
PS. You replaced your SSDs with a pair of HDDs in RAID 0. Beggars belief.
The other failure mode is the "time warp" failure.
http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue
Also updated windows fully, customized everything to my liking... in short, a good 2-3h of work.
This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format. I have no idea how this is even remotely possible.
The big problem with this failure mode would be if the user doesn't notice anything wrong till too late.
A 100% dead drive sucks, but if you do regular backups you lose 1 day of data.
A "time warp" failure that you don't notice could result in you sending out of date info in an important email. Or overwriting something important with invalid data and not noticing. The resulting damage could be far far worse than a dead drive.
In my experience "spinning rust" rarely fails 100% without warning (or abuse - e.g. you drop the drive ;) ). You can often salvage some stuff out (just hope it's the stuff you want ;) ). I've managed to use knoppix to salvage data from people's failed spinning disk drives.
In contrast these SSDs just go totally dead. Or really weird shit happens.
In both cases the manufacturer might get an RMA. But they're not the same. If OCZ drives are getting RMA'ed at higher rates than spinning drives, and their failure modes are 100% dead or "time warp" they are far worse than the stats show: http://news.softpedia.com/news/French-Website-Publishes-HDD-SSD-and-Motherboard-RMA-Statistics-196538.shtml
The original poster said "it seems SSDs aren't more reliable than hard drives." Do not create a straw man. The article indicates that while marketing and simpletons may point out select statistics as "more reliable," there's a lot more to the story, and it's difficult to impossible to get meaningful data at this point. That is, based on their analysis, SSDs are not provably more reliable at this time.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
If you're unlucky backups won't save you from this:
http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue
yesterday I spent over an hour fomatting, re-installing windows and everything else I needed.
Also updated windows fully, customized everything to my liking... in short, a good 2-3h of work.
This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format. I have no idea how this is even remotely possible.
OCZ is calling this the time warp issue, and is related to the sandforce controller...
http://forum.notebookreview.com/alienware-m17x/552728-fresh-os-install-ocz-ssd-r3.html
any firmware before 1.29 can result in you experiencing what OCZ refers to as "Time Warp" (you lose all info stored on drive since last boot - happens at random). 1.29 decreases likelihood of this happening, but does not eliminate the possibility.
The big problem with this failure mode is that the drive still appears to work. So if you are unlucky enough not to notice that the pricelist/tender document you are about to send or commit to is no longer showing the corrected figures/information, things could get way more painful than if your drive just didn't work (in which case work would just be delayed while you restore from backups, or if you have no backups you would just have to deal with the data loss).
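One way you might at least notice a silent rollback is to keep a counter in two places, one on the suspect SSD and one somewhere independent (another disk, a USB stick, a network share), and compare them at boot. This is only a rough sketch of that idea, not anything OCZ or SandForce provides; the function names and file layout are invented, and `write_text` does not force data to the physical media, so a real tool would also need to flush and sync.

```python
import json
import time
from pathlib import Path

def read_counter(path: Path) -> int:
    """Return the stored counter, or 0 if the file is missing or unreadable."""
    try:
        return json.loads(path.read_text())["counter"]
    except (FileNotFoundError, ValueError, KeyError):
        return 0

def record_heartbeat(on_drive: Path, off_drive: Path) -> int:
    """Bump the counter and store it on the suspect drive and on independent storage."""
    counter = read_counter(off_drive) + 1
    payload = json.dumps({"counter": counter, "written_at": time.time()})
    on_drive.write_text(payload)   # copy living on the SSD being watched
    off_drive.write_text(payload)  # copy living somewhere the SSD cannot roll back
    return counter

def drive_rolled_back(on_drive: Path, off_drive: Path) -> bool:
    """True if the SSD's copy is older than the independent copy."""
    return read_counter(on_drive) < read_counter(off_drive)
```

You would call `record_heartbeat` at shutdown (or periodically) and `drive_rolled_back` at startup; a `True` result means the SSD has lost writes that the other location still remembers, which is the cue to stop trusting what is on it and restore from backup.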
Somewhere, Ed Tufte just puked and has no idea why. Poor guy.
For a personal computer SSDs are probably best used for OS/Application storage, not data (documents, images, music, etc.). The cost per GB is too bloody much to justify otherwise and the less noticeable failure symptoms bolster that notion. Besides that, application load time is where these toys have their niche.
Two of my imaginary friends reproduced once
So how long would it take for you to notice you had the "time warp" problem and actually start restoring from backups?
Given you don't appear to have read what I posted, you might not be one of those who would notice in time.