Slashdot Mirror


Ask Slashdot: How Do SSDs Die?

First time accepted submitter kfsone writes "I've experienced, first-hand, some of the ways in which spindle disks die, but either I've yet to see an SSD die or I'm not looking in the right places. Most of my admin-type friends have theories on how an SSD dies but admit none of them has actually seen commercial grade drives die or deteriorate. In particular, the failure process seems like it should be more clinical than with spindle drives. If you have X many of the same SSD and none of them suffers manufacturing defects, then if you repeat the same series of operations on them they should all die around the same time. If that's correct, then what happens to SSDs in RAID? Either all your drives will start to fail together or, at some point, your drives will become out of sync in terms of volume sizing. So, have you had to deliberately EOL corporate grade SSDs? Do they die with dignity or go out with a bang?"

11 of 510 comments

  1. Umm by The MAZZTer · · Score: 4, Insightful

    It was my understanding that for traditional drives in a RAID you don't want to get all the same type of drive all made around the same time since they will fail around the same time too. Same would apply to SSDs.

    1. Re:Umm by Anonymous Coward · · Score: 5, Insightful

      yeah, sounds like submitter may be mildly deficient

      Which is why he's asking.

      Fuck people who ask questions when they don't know something, right?

    2. Re:Umm by statusbar · · Score: 5, Insightful

      I've seen two instances where a drive failed. Each time there were no handy replacement drives. Within a week a second drive died the same way as the first! Back to backup tapes! Better to have replacement drives in boxes waiting.

      --
      ipv6 is my vpn
    3. Re:Umm by ByOhTek · · Score: 4, Insightful

      In general, if you get such an issue, it will happen early on in the life of the drives. One coworker had what he called the 30-day thrash rule: he would plan ahead and get a huge number of drives (the cheapest available meeting requirements, including avoiding manufacturers we had issues with previously), take a handful, and thrash 'em for 30 days. If nothing bad happened, he'd either keep up 30-day thrashes on sets of hard drives, pulling out the duds, or just return the whole lot.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    4. Re:Umm by NeverVotedBush · · Score: 3, Insightful

      When a drive fails and a RAID goes into reconstruction (if you are set up that way), that's when you are significantly more likely to have another drive fail due to all the extra activity across the RAID.

      We see it all the time on a big array. One must hustle to repair/rebuild the RAID... ;-)
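
      A rough way to put numbers on that rebuild-window risk, as a sketch only: the daily failure probability, the rebuild duration, and the `stress_factor` multiplier below are all made-up illustrative inputs, and the model assumes drives fail independently, which same-batch drives may well not do.

```python
def p_second_failure(remaining_drives, p_fail_per_day, rebuild_days, stress_factor=1.0):
    """Chance that at least one surviving drive fails before the rebuild ends.

    All inputs are hypothetical; stress_factor is an assumed multiplier for
    the extra load a rebuild puts on the surviving drives.
    """
    # Daily failure probability under rebuild load, capped at 1.0.
    p_day = min(1.0, p_fail_per_day * stress_factor)
    # Probability that a single drive survives the whole rebuild window.
    p_survive_one = (1.0 - p_day) ** rebuild_days
    # Complement of "every remaining drive survives".
    return 1.0 - p_survive_one ** remaining_drives


# Illustrative comparison: same 8-drive array (7 survivors), 1-day rebuild.
idle = p_second_failure(remaining_drives=7, p_fail_per_day=0.0001, rebuild_days=1)
busy = p_second_failure(remaining_drives=7, p_fail_per_day=0.0001, rebuild_days=1,
                        stress_factor=10)
print(f"no extra stress: {idle:.4%}, 10x stress: {busy:.4%}")
```

      The point is only the shape of the curve: the risk grows with the number of surviving drives, the length of the rebuild, and whatever extra stress the rebuild adds, which is why hustling (and having spares on hand) matters.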

    5. Re:Umm by Anonymous Coward · · Score: 5, Insightful

      I've seen two instances where a drive failed. Each time there were no handy replacement drives. Within a week a second drive died the same way as the first! Back to backup tapes! Better to have replacement drives in boxes waiting.

      This. Your spares closet is your best friend in the enterprise. Ensure you keep it stocked.

      And locked. And don't label them "spares". Label them "cold swap fallback device" or something that management won't see as something "extra" that can be "repurposed" (i.e. stolen)

  2. Re:Die! by Quakeulf · · Score: 1, Insightful

    I am new to commenting on /. and I think lame attempts at humor belong to 9GAG and Reddit.

  3. Re:Die! by lister king of smeg · · Score: 1, Insightful

    No offense intended, but if you're new, why are you complaining about our long-standing culture of cracking lame jokes? If you don't like it, why did you join?

    --
    ---Saying gnome 3 is better than windows 8 not so much a compliment as it is damning with light praise.
  4. Re:CRC Errors by markhahn · · Score: 3, Insightful

    this is not very useful, as it mainly points out that the initial generations of commodity SSDs were immature. not to mention that return rates contain other phenomena than wear or even failure.

  5. Re:CRC Errors by arth1 · · Score: 5, Insightful

    I am running (6) OCZ Vertex2 256GB drives under heavy use 24/7. Almost 2 years on have only had one fail and it still works, just started kicking random errors.

    Your failure rate of > 8% per year isn't very reassuring.
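
    For anyone checking the arithmetic, a minimal sketch: it takes the quoted numbers at face value (1 failure among 6 drives over roughly 2 years) and uses a simple non-compounded average. The function name is just illustrative.

```python
def annualized_failure_rate(failed, total, years):
    """Simple (non-compounded) fraction of drives failing per year."""
    return failed / total / years


# 1 of 6 drives failed over about 2 years, per the quoted post.
rate = annualized_failure_rate(failed=1, total=6, years=2)
print(f"{rate:.1%} per year")  # prints "8.3% per year"
```

    With a sample of six drives the uncertainty on that figure is large, so treat it as a rough indication rather than a measured rate.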

  6. Re:CRC Errors by Dishwasha · · Score: 4, Insightful

    I would counter-argue that any flash drive manufacturer is asking for massive RMAs when the device is clearly targeted at the laptop market (otherwise they would manufacture it in a 3.5" format), where the operating environment is guaranteed to be running on a battery for long periods of time. Any research into battery operation would expose you to the vast differences in operating voltage as batteries discharge and as they age. It is just bad engineering not to take this into account.

    Reformatting the drive was not an option because the drive wouldn't even detect in the BIOS unless the special factory jumper was set which is a non-operational mode for the drive. This problem was reproduced over 10 times with over 10 different drives of the same model Vertex. Slightly bad power caused the entire drive to be rendered unusable. Amazingly, none of the other hardware in the laptop had any problem with the power (i.e. screen, cpu, memory, other spindle-based hard drive, gpu, etc.). As I said, bad engineering.