Analyzing Long-Term SSD Failure Rates
wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting: based on the numbers, it seems SSDs aren't more reliable than hard drives."
Well, considering that the failure rates are bad enough that Atwood at Coding Horror says SSDs should be judged on a hot/crazy scale, I'd say that is a pretty bad sign. Note that he still buys them even though they keep failing, but this is a guy who spends $400 on a pair of headphones.
My problem with SSDs, and why I won't recommend them to anyone outside a few edge use cases (those who do a lot of traveling with their laptop, servers where IOPS is the #1 goal), is that when they DO fail, in my experience there is no warning at all, and that is simply unacceptable. I have a couple of "Must rule teh benchmarkz!" gamer customers and both went SSD. These guys ain't cheap and bought the baddest SSDs they could find, price be damned. For both guys the drives failed with NO warning, not even from SMART. They just turned on their machines one day and poof! Bye bye SSD. From one I was able to get a small amount of the data back; the other couldn't even be detected in BIOS. Sure, they both had warranties, but so what? It isn't like the warranties covered downtime or the HDDs they had to buy as replacements while they waited on the RMA. Both ended up selling their SSDs and going with a pair of Raptors in RAID 0.
So until they fix this major flaw I will simply tell my customers to avoid them. I don't think I can remember a time I've had an HDD fail without ample warning. Windows delayed write failures, SMART, noise and temp of the drive: in all cases you were given ample time to get your data off the failing drive. Not so with an SSD; when it goes, it just goes poof! Having that risk hanging like the sword of Damocles over your head just isn't worth the speed IMHO.
ACs don't waste your time replying, your posts are never seen by me.
If you're unlucky, backups won't save you from this:
http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue
Yesterday I spent over an hour formatting, re-installing Windows and everything else I needed.
Also updated Windows fully, customized everything to my liking... in short, a good 2-3 hours of work.
This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format state. I have no idea how this is even remotely possible.
OCZ is calling this the "time warp" issue, and it is related to the SandForce controller...
http://forum.notebookreview.com/alienware-m17x/552728-fresh-os-install-ocz-ssd-r3.html
Any firmware before 1.29 can result in you experiencing what OCZ refers to as "Time Warp" (you lose all info stored on the drive since the last boot; it happens at random). 1.29 decreases the likelihood of this happening, but does not eliminate the possibility.
The big problem with this failure mode is that the drive still appears to work. So if you are unlucky enough not to notice that the pricelist/tender document you are about to send or commit to is no longer showing the corrected figures/information, things could get way more painful than if your drive just didn't work (in which case work would just be delayed while you restore from backups, or if you have no backups you would just have to deal with the data loss).
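For what it's worth, one way to at least notice a silent rollback like this is to keep a checksum manifest of your working files somewhere other than the suspect drive and compare against it after the next boot. Here is a rough Python sketch of that idea. The function names (`build_manifest`, `find_mismatches`) are made up for illustration, and it assumes you save the manifest to a different device yourself; it is not anything OCZ or the linked threads provide.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def find_mismatches(saved_manifest, root):
    """Return sorted relative paths that are missing or differ from the saved manifest."""
    current = build_manifest(root)
    return sorted(
        path
        for path, digest in saved_manifest.items()
        if current.get(path) != digest
    )
```

You would run `build_manifest` on your documents folder at the end of the day, store the result off the SSD (a USB stick, a network share), and run `find_mismatches` against it the next morning. Anything it reports that you did not change yourself deserves a second look before the document goes out. It only catches files that were in the saved manifest, so it is a tripwire, not a substitute for backups.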
The most interesting part of the article for consideration with SSDs is that SMART is going to be near useless for them. Since most failures are random occurrences in electronics which SMART isn't good at detecting, we may need better technology for detecting SSD failures.
Have you ever seen SMART perform in a useful way on a mechanical disk? At work and at home, I've gone through a crap-ton of hard disks in the last decade or so that SMART's been prevalent, and never have I seen SMART flag a drive as problematic before I already knew I had a serious problem. More often than not, I've had systems slow to a crawl due to massive numbers of read errors and sector reallocations while the drive firmware actively lied to me about the drive's condition. Only looking at the raw SMART stats and watching the counters increase wildly reveals the truth.
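If you want to watch the raw counters yourself instead of trusting the overall health flag, something like the following Python sketch would do it. It assumes you have saved the attribute table printed by `smartctl -A` at two different times and that the raw value sits in the tenth column, as in smartmontools' usual table layout. The attribute names in `WATCHED` are common ones but vary by vendor, and the helper names are my own invention.

```python
# Commonly reported attribute names; treat this list as an assumption,
# since vendors name and populate SMART attributes differently.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_raw_counters(smartctl_output):
    """Extract {attribute_name: raw_value} from `smartctl -A`-style table text."""
    counters = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header and any other text lines are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            counters[fields[1]] = int(fields[9])
    return counters


def growing_counters(before, after, watched=WATCHED):
    """Return {name: increase} for watched counters whose raw value went up."""
    growth = {}
    for name in watched:
        if name in before and name in after and after[name] > before[name]:
            growth[name] = after[name] - before[name]
    return growth
```

Feed it yesterday's and today's saved output, e.g. `growing_counters(parse_raw_counters(old_text), parse_raw_counters(new_text))`. A non-empty result means the reallocation or pending-sector counts are climbing, whatever the drive's pass/fail status claims.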