Slashdot Mirror


Analyzing Long-Term SSD Failure Rates

wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting: based on the numbers, it seems SSDs aren't more reliable than hard drives."

9 of 149 comments

  1. Uh, yes they are by TheRaven64 · · Score: 3, Informative

    Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but by every study shown, the number of hard disks failing after three years looks higher than the number of SSDs failing after three years. For most workloads, SSDs fail less often, and SSD failures only exceed HD failures very early in their lifetimes.

    --
    I am TheRaven on Soylent News
    1. Re:Uh, yes they are by Geoffrey.landis · · Score: 5, Insightful

      Did the poster even look at the chart he linked to?

      Did you? Apparently not.

      Ignore the dashed lines; those curves are not data, they are "projection." The chart has no data on SSD failures late in the lifetime. So when you say "...SSD failures only exceed HD failures very early on in their lifetimes," that is equivalent to saying "SSD failures only exceed HD failures in the region of the graph for which there is data."

      --
      http://www.geoffreylandis.com
    2. Re:Uh, yes they are by Baloroth · · Score: 3, Insightful

      Look closer. At every point where they have actual data, the failure rate for SSDs is higher than that of HDDs, except for the Google study, which I bet puts the drives under massive load or something else funky (given how different it is from all the other HDD curves). Only in the region where the SSD numbers are projections do the HDD curves begin to bend upwards, throwing off the graph. And from what I know of flash memory, especially MLC (which most SSDs use), I'd bet that the SSD curves will bend upwards too. Sure, wear leveling will help, but if a cell fails with data in it, which can still happen, then that data is lost. So yeah, for any section where they have actual data, SSDs do have a higher failure rate than hard drives. Incidentally, that's a really terrible and deceptive chart.

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
  2. Huh? by adamjcoon · · Score: 5, Insightful

    I didn't read TFA, but the chart doesn't tell me that "SSDs aren't more reliable than hard drives." The SSDs were generally at 6% or under (assuming the linear progression), whereas regular HDDs approached 14%+ after five years. And "long-term" in the title? The SSD data in the chart only covers 1 year. That's not exactly long term when the chart spans 1-5 years of use: the actual data for the SSDs covers only 20% of the time span.

  3. Re:Whaddayamean "long term"? by hairyfeet · · Score: 4, Interesting

    Well, considering that the failure rates are bad enough that Atwood at Coding Horror says SSDs should be judged on a hot/crazy scale, I'd say that is a pretty bad sign. Note that he still buys them even though they keep failing, but this is a guy who spends $400 on a pair of headphones.

    My problem with SSDs, and the reason I won't recommend them to anyone outside a few edge cases (those doing a lot of traveling with their laptop, servers where IOPS is the #1 goal), is that when they DO fail, in my experience there is no warning at all, and that is simply unacceptable. I have a couple of "Must rule teh benchmarkz!" gamer customers and both went SSD. These guys ain't cheap and bought the baddest SSDs they could find, price be damned. For both guys, the drives failed with NO warning, not even from SMART. They just turned on their machines one day and poof! Bye bye SSD. From one I was able to recover a small amount of the data; the other couldn't even be detected in the BIOS. Sure, they both had warranties, but so what? It isn't like the warranties covered the downtime or the HDDs they had to buy as replacements while they waited on the RMA. Both ended up selling their SSDs and going with a pair of Raptors in RAID 0.

    So until they fix this major flaw, I will simply tell my customers to avoid them. With HDDs, I can't remember a time I've had one fail without ample warning: Windows delayed write failures, SMART, the noise and temperature of the drive. In all cases you were given ample time to get your data off the failing drive. Not so with SSDs: when one goes, it just goes poof! Having that risk hanging like the sword of Damocles over your head just isn't worth the speed IMHO.

    --
    ACs don't waste your time replying, your posts are never seen by me.
  4. Re:Who said they were? by Attila+Dimedici · · Score: 3, Insightful

    I remember there being lots of claims that SSDs would be more reliable because they had no moving parts.

    --
    The truth is that all men having power ought to be mistrusted. James Madison
  5. Worst. Ever. by DarthVain · · Score: 4, Insightful

    Let me summarize:

    A) The chart is worthless. I have never seen a more ambiguous, meaningless chart in my life. They might as well not have bothered to label things.
    B) Let's do a reliability study on SSDs that they don't have any long-term data on past 2 years, yet compare them to HDDs that typically have at least a 3-year warranty. By that I only mean: I'll go out on a limb and guess that the average time to failure of an HDD is > 3 years, if only for the manufacturers' economic self-preservation.
    C) Results in either case depend highly on specific device model and configuration.

  6. Re:Whaddayamean "long term"? by TheLink · · Score: 3, Informative

    The other failure mode is the "time warp" failure.

    http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue

    Also updated windows fully, customized everything to my liking... in short, a good 2-3h of work.

    This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format. I have no idea how this is even remotely possible.

    The big problem with this failure mode would be if the user doesn't notice anything wrong till too late.

    A 100% dead drive sucks, but if you do regular backups you lose 1 day of data.

    A "time warp" failure that you don't notice could result in you sending out of date info in an important email. Or overwriting something important with invalid data and not noticing. The resulting damage could be far far worse than a dead drive.

    In my experience "spinning rust" rarely fails 100% without warning (or abuse - e.g. you drop the drive ;) ). You can often salvage some stuff out (just hope it's the stuff you want ;) ). I've managed to use knoppix to salvage data from people's failed spinning disk drives.

    In contrast these SSDs just go totally dead. Or really weird shit happens.

    In both cases the manufacturer might get an RMA. But they're not the same. If OCZ drives are getting RMA'ed at higher rates than spinning drives, and their failure modes are 100% dead or "time warp" they are far worse than the stats show: http://news.softpedia.com/news/French-Website-Publishes-HDD-SSD-and-Motherboard-RMA-Statistics-196538.shtml

    --
  7. Re:Whaddayamean "long term"? by TheLink · · Score: 5, Interesting

    If you're unlucky backups won't save you from this:
    http://www.dslreports.com/forum/r25491097-Dell-Laptop-and-SSD-Time-warp-issue

    yesterday I spent over an hour formatting, re-installing windows and everything else I needed.

    Also updated windows fully, customized everything to my liking... in short, a good 2-3h of work.

    This morning, I open up the laptop and surprise... EVERYTHING's back to the pre-format. I have no idea how this is even remotely possible.

    OCZ is calling this the time warp issue, and is related to the sandforce controller...

    http://forum.notebookreview.com/alienware-m17x/552728-fresh-os-install-ocz-ssd-r3.html

    any firmware before 1.29 can result in you experiencing what OCZ refers to as "Time Warp" (you lose all info stored on drive since last boot - happens at random). 1.29 decreases likelihood of this happening, but does not eliminate the possibility.

    The big problem with this failure mode is that the drive still appears to work. So if you are unlucky enough not to notice that the price list or tender document you are about to send or commit to no longer shows the corrected figures, things could get way more painful than if your drive just didn't work (in which case work would just be delayed while you restore from backups, or, if you have no backups, you would just have to deal with the data loss).

    --