Analyzing Long-Term SSD Failure Rates
wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting — based on numbers, it seems SSDs aren't more reliable than hard drives. "
Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but the number of hard disks failing after three years looks higher than the number of SSDs failing after three years in all of the studies. For most workloads, the SSDs fail less often, and the SSD failures only exceed HD failures very early on in their lifetimes.
I am TheRaven on Soylent News
I didn't read TFA, but the chart doesn't tell me that "SSDs aren't more reliable than hard drives". The SSDs were generally at 6% or under (assuming the linear progression), whereas regular HDDs approached 14%+ after five years. And "Long-term" in the title? The SSD data in the chart only covers 1 year. That's not exactly long term when the chart spans 1-5 years of use. The actual data for the SSDs covers only 20% of the time span.
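To put rough numbers on that extrapolation, here is a minimal sketch. It assumes failures accumulate linearly and that the SSD line ends at about 6% in year five, so the 1.2% per year rate is my own back-calculation from that figure, not a number read off the chart.

```python
def linear_projection(rate_per_year, years):
    """Cumulative failure percentage under a constant yearly rate (an assumption)."""
    return rate_per_year * years

# Hypothetical inputs: 6% at year 5 implies 1.2% per year if growth is linear.
SSD_RATE_PER_YEAR = 6.0 / 5
OBSERVED_YEARS = 1      # years of actual SSD data, per the comment above
CHART_SPAN_YEARS = 5    # years shown on the chart

projected = [linear_projection(SSD_RATE_PER_YEAR, y)
             for y in range(1, CHART_SPAN_YEARS + 1)]
observed_fraction = OBSERVED_YEARS / CHART_SPAN_YEARS

print("projected cumulative SSD failure % by year:",
      [round(p, 1) for p in projected])
print(f"share of the chart backed by real SSD data: {observed_fraction:.0%}")
```

Everything past the first entry in that list is projection, which is the whole complaint: four of the five points rest on the linear assumption rather than on measured failures.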
Well considering that the failure rates are bad enough Atwood at Coding Horror says SSDs should be judged on a hot/crazy scale I'd say that is a pretty bad sign. Note that he still buys them even though they keep failing, but this is a guy that spends $400 on a pair of headphones.
My problem with SSDs, and the reason I won't recommend them to anyone outside a few edge use cases (people who do a lot of traveling with their laptops, servers where IOPS is the #1 goal), is that when they DO fail, in my experience, there is no warning at all, and that is simply unacceptable. I have a couple of "Must rule teh benchmarkz!" gamer customers and both went SSD. These guys ain't cheap and bought the baddest SSDs they could find, price be damned. With both guys the drives failed with NO warning, not even from SMART. They just turned on their machines one day and poof! Bye bye SSD. With one I was able to get a small amount of the data back; the other couldn't even be detected in the BIOS. Sure, they both had warranties, but so what? It isn't like the warranties covered downtime or the HDDs they had to buy as replacements while they waited on the RMA. Both ended up selling their SSDs and going with a pair of Raptors in RAID 0.
So until they fix this major flaw I will simply tell my customers to avoid them. With HDDs, I can't remember a time I've had a drive fail without ample warning. Windows delayed write failures, SMART, the noise and temperature of the drive: in all cases you were given ample time to get your data off the failing drive. Not so with an SSD; when it goes, it just goes poof! Having that risk hanging like the sword of Damocles over your head just isn't worth the speed IMHO.
ACs don't waste your time replying, your posts are never seen by me.
I remember there being lots of claims that SSDs would be more reliable because they had no moving parts.
The truth is that all men having power ought to be mistrusted. James Madison
Let me summarize:
A) The chart is worthless. I have never seen a more ambiguous, meaningless chart in my life. They might as well not bother to label things.
B) Let's do a reliability study on SSDs that have no long-term data past 2 years, yet compare them to HDDs, which typically have at least a 3-year warranty. By that I only mean, I'll go out on a limb and guess that the average time to failure for an HDD is > 3 years, if only for economic self-preservation.
C) Results in either case depend highly on specific device model and configuration.
The other failure mode is the "time warp" failure.
http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue
Also updated windows fully, customized everything to my liking... in short, a good 2-3h of work.
This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format. I have no idea how this is even remotely possible.
The big problem with this failure mode would be if the user doesn't notice anything wrong till too late.
A 100% dead drive sucks, but if you do regular backups you lose 1 day of data.
A "time warp" failure that you don't notice could result in you sending out of date info in an important email. Or overwriting something important with invalid data and not noticing. The resulting damage could be far far worse than a dead drive.
In my experience "spinning rust" rarely fails 100% without warning (or abuse - e.g. you drop the drive ;) ). You can often salvage some stuff out (just hope it's the stuff you want ;) ). I've managed to use knoppix to salvage data from people's failed spinning disk drives.
In contrast these SSDs just go totally dead. Or really weird shit happens.
In both cases the manufacturer might get an RMA. But they're not the same. If OCZ drives are getting RMA'ed at higher rates than spinning drives, and their failure modes are 100% dead or "time warp" they are far worse than the stats show: http://news.softpedia.com/news/French-Website-Publishes-HDD-SSD-and-Motherboard-RMA-Statistics-196538.shtml
If you're unlucky backups won't save you from this:
http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue
yesterday I spent over an hour formatting, re-installing windows and everything else I needed.
Also updated windows fully, customized everything to my liking... in short, a good 2-3h of work.
This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format. I have no idea how this is even remotely possible.
OCZ is calling this the time warp issue, and is related to the sandforce controller...
http://forum.notebookreview.com/alienware-m17x/552728-fresh-os-install-ocz-ssd-r3.html
any firmware before 1.29 can result in you experiencing what OCZ refers to as "Time Warp" (you lose all info stored on drive since last boot - happens at random). 1.29 decreases likelihood of this happening, but does not eliminate the possibility.
The big problem with this failure mode is that the drive still appears to work. So if you are unlucky enough not to notice that the pricelist/tender document you are about to send or commit to no longer shows the corrected figures/information, things could get way more painful than if your drive just didn't work (in which case work would just be delayed while you restore from backups, or, if you have no backups, you would just have to deal with the data loss).
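One way to at least notice a silent rollback is to keep a checksum manifest of the files you care about on a different physical device (a USB stick, a network share) and compare against it after the next boot. This is only a rough sketch of that idea: the function names are mine, it is not anything OCZ or SandForce provides, and it reads each file fully into memory, so it suits documents rather than disk images.

```python
import hashlib
import json
from pathlib import Path

def snapshot(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(root))
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests

def save_manifest(root, manifest_path):
    """Write the current digests to manifest_path (keep it OFF the SSD)."""
    Path(manifest_path).write_text(json.dumps(snapshot(root), indent=2))

def changed_since(root, manifest_path):
    """List files that are missing or differ from what the manifest recorded."""
    expected = json.loads(Path(manifest_path).read_text())
    current = snapshot(root)
    return sorted(name for name, digest in expected.items()
                  if current.get(name) != digest)
```

Run `save_manifest` when you finish work and `changed_since` the next morning before touching anything. Any file it lists has changed or vanished since the manifest was written, which is exactly what a "time warp" would look like. If the manifest lives on the same SSD it can roll back along with everything else, so the separate device is the part that matters.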