Slashdot Mirror


Why I'm Usually Unnerved When Modern SSDs Die on Us (utoronto.ca)

Chris Siebenmann, a Unix Systems Administrator at University of Toronto, writes about how hard it is to figure out what went wrong when an SSD dies: What unnerves me about these sorts of abrupt SSD failures is how inscrutable they are and how I can't construct a story in my head of what went wrong. With spinning HDs, drives might die abruptly but you could at least construct narratives about what could have happened to do that; perhaps the spindle motor drive seized or the drive had some other gross mechanical failure that brought everything to a crashing halt (perhaps literally). SSDs are both solid state and opaque, so I'm left with no story for what went wrong, especially when a drive is young and isn't supposed to have come anywhere near wearing out its flash cells (as this SSD was).

(When a HD died early, you could also imagine undetected manufacturing flaws that finally gave way. With SSDs, at least in theory that shouldn't happen, so early death feels especially alarming. Probably there are potential undetected manufacturing flaws in the flash cells and so on, though.) When I have no story, my thoughts turn to unnerving possibilities, like that the drive was lying to us about how healthy it was in SMART data and that it was actually running through spare flash capacity and then just ran out, or that it had a firmware flaw that we triggered that bricked it in some way.

5 of 358 comments

  1. Re:With spinning disks, you do not know either by Stonent1 · · Score: 4, Interesting

    Ok, I'm in IT and it unnerves me. I've seen numerous computers have an SSD totally die and lose all data with no SMART warnings in the last few years. (Not mine personally; I mean people at our organization.)

  2. Re: With spinning disks, you do not know either by omnichad · · Score: 3, Interesting

    Older SSDs didn't even have a wear-leveling SMART attribute or total host writes attribute. Some of the cheaper ones probably still don't. So there is no way to see how close you're getting to the estimated upper limit. There is a pretty clear progression on the newer drives. With hard drives, mechanical failure is actually less predictable than SSD wear-out (defects aside).

  3. Re:With spinning disks, you do not know either by I-am-a-Banana · · Score: 3, Interesting

    Seriously, you do not. You may know the end-result sometimes (head-crash), but the root-cause is usually not clear.

    So get over it. It is a new black-box replacing an older black-box.

    Well, I need to partially disagree with you there. With a traditional drive, when it fails and you take it apart carefully, you can try to determine what happened. If it was a head crash, you may be able to see what caused it. In my case it was a Quantum or Maxtor drive that shipped with 3 extra screws loose inside, where the internal control circuitry was. You could tell if it was a frozen motor or, if you were lucky, find that the external board had a fried electrical component on it. For friends, I desoldered the fried component and put a new one on, and the drive worked perfectly. Obviously we copied the data off of it onto something new, then put the drive into storage for safekeeping. With the older drives there is a small chance of repair. Yes, there are companies out there that will disassemble the drive, remove the platters, and put them into another working drive to recover data. Obviously with a head crash you may not be able to recover everything, but in absolute necessity you could. Or you could just be a nerd who wants to investigate and find out why. With SSDs, however, there is no chance of fixing it and no chance of knowing exactly why. However, I don't know why he would say that SSDs shouldn't have manufacturing defects. They do. They are just not mechanical, and I would hope that makes them less likely to be defective.

  4. Re:With spinning disks, you do not know either by Luckyo · · Score: 1, Interesting

    You do, actually. Many if not most disk failures have clearly predictable markers. This has been true for quite a long time at this point, to the level where my last two HDD failures in my home machine were diagnosable with no tools beyond a SMART reader. Better yet, they weren't "instant" failures: signs of impending failure started appearing months in advance, with clear-cut warnings on the SMART readout. This left sufficient time to buy a new drive and migrate all the data with no problems.

    With SSDs, the problem is that failure is utterly opaque and sudden. This is likely more a function of lack of expertise due to lack of time, though, as it took us decades to get hard drive monitoring systems to where they are now.

  5. Re:With spinning disks, you do not know either by gweihir · · Score: 4, Interesting

    Well, I originally bought OCZ. Today _all_ 5 of the OCZ drives I got are stone-dead. After that I moved to Samsung, mostly "Pro". They are all still working fine and some are older now than the first OCZ was when it died. So yes, it makes a difference. Incidentally, Samsung had excellent reliability in their spinning drives as well. It seems they just care more about quality and reputation.

    That said, I find it sad that you cannot get "high reliability" SSDs where you basically can forget about the risk of them dying. I am talking reliability levels like a typical CPU here. It seems the market for that is just not there.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.