Ask Slashdot: How Do SSDs Die?
First time accepted submitter kfsone writes "I've experienced, first-hand, some of the ways in which spindle disks die, but either I've yet to see an SSD die or I'm not looking in the right places. Most of my admin-type friends have theories on how an SSD dies but admit none of them has actually seen commercial grade drives die or deteriorate. In particular, the failure process seems like it should be more clinical than spindle drives. If you have X many of the same SSD drive and none of them suffer manufacturing defects, if you repeat the same series of operations on them they should all die around the same time. If that's correct, then what happens to SSDs in RAID? Either all your drives will start to fail together or at some point, your drives will become out of sync in-terms of volume sizing. So, have you had to deliberately EOL corporate grade SSDs? Do they die with dignity or go out with a bang?"
I had 2 out of 5 SSDs fail (OCZ) with CRC errors, I'm guessing faulty cells.
It was my understanding that for traditional drives in a RAID you don't want to get all the same type of drive all made around the same time since they will fail around the same time too. Same would apply to SSDs.
Screaming in agony, hissing bits and bleeding jumperless in the night
Wow - you've been here a long long time then
In general, if the SSD in question has a well-designed controller (Intel, SandForce), then write performance will begin to drop off as bad blocks start to accumulate on the drive. Eventually, wear levelling and write cycles have taken their toll, and the disk can no longer write at all. At this point, the controller does all it can: it effectively becomes a read-only disk. It should operate in this mode until else something catastrophic (tin migration, capacitor failure, etc.) keeps the entire drive from working.
BTW - I haven't seen this either, but that's the degradation profile that's been presented to me in several presentations by the folks making SSD drives and controllers. (Intel had a great one a few years back - don't have a link to it handy, though...)
"The future's good and the present is nothing to sneeze at." - Roblimo's last
The drives will shrink down to nothing. I believe that the drive controller considers a sector dead after 100,000 writes.
Filesystems, generally speaking, aren't resilient to the underlying disk geometry changing after they've been laid down. There's reserved space to replace bad cells as they start to die, but the disk won't shrink. Eventually, though, you get parts of the disk dying in an unrecoverable way and the drive is toast.
All three of the commercial grade SSD failures I've cleaned up after (I do PostgreSQL data recovery) just died. No warning, no degrading in SMART attributes; works one minute, slag heap the next. Presumably some sort of controller level failure. My standard recommendation here is to consider then no more or less reliable than traditional disks and always put them in RAID-1 pairs. Two of the drives were Intel X25 models, the other was some terrible OCZ thing.
Out of more current drives, I was early to recommend Intel's 320 series as a cheap consumer solution reliable for database use. The majority of those I heard about failing died due to firmware bugs, typically destroying things during the rare (and therefore not well tested) unclean shutdown / recovery cases. The "Enterprise" drive built on the same platform after they tortured consumers with those bugs for a while is their 710 series, and I haven't seen one of those fail yet. That's not across a very large installation base nor for very long yet though.
I have seen SSD death many times and it is a strange sight indeed. What is interesting about it when compared to normal drives is that when normal drives fail it is - mostly - and all or nothing ordeal. A bad spot on a drive is a bad spot on a drive. With SSDs you can have a bad spot one place, reboot, and you get a bad spot in another place. Windows loaded on an SSD will exhibit all kinds of bizarre behaviour. Sometimes it will hang, sometimes it will blue-screen, sometimes it will boot normally until it tries to read or write to that random bad spot. Rebooting is like rolling the dice to see what it will do next - that is, until it fails completely.
Not with a bang but a whimper.
I have had two SSD crashes. One was on a very cheap Zelman 32GB drive which never really worked (OK, about twice). The other was on a Kingston 64GB that I have in my server. When it gets really hot in the room (over 100, so probably over 120 for the drive itself in the case), it will crash. But when it cools down, it works perfectly well.
Peter predicted that you would "deliberately forget" creation 2000 years ago...
Total time from system operational to BSOD was about ten minutes. I first noticed difficulties when I invoked a script that called a second script, and the second script was missing. "ls -l" on the missing script confirmed that the other script wasn't present. While scratching my head about $PATH settings and knowing damn well I hadn't changed anything, a few minutes later, I discovered I also couldn't find /bin/ls.exe. In a DOS prompt that was already open, I could DIR C:\cygwin\bin - the directory was present, ls.exe was present, but it wasn't anything that the OS was capable of executing. Sensing imminent data loss, and panic mounting, I did an XCOPY /S /E... etc to salvage what I could from the failing SSD.
Of the files I recovered by copying them from the then-mortally-wounded system, I was able to diff them against a valid backup. Most of the recovered files were OK, but several had 65536-byte blocks consisting of nothing but zeroes.
Around this point, the system (unsurprisingly, as executables and swap and heaven knows what else was being riddled with 64K blocks of zeroes) crashed. On reboot, Windows attempted (and predictably failed) to recover (assinine that Windows tries to write to iself on boot, but also assinine of me to not power the thing down and yank the drive, LOL.) The system did recognize it as an 80G drive and attempted to boot itself - Windows logo, recovery console, and all.
On an attempt to mount the drive from another boot disk, the drive still appeared as an 80G drive once, unfortunately, it couldn't remain mounted long enough for me to attempt further file recovery or forensics.
A second attempt - and all subsequent attempts - to mount the drive showed it as an 8MB (yes, eight megabytes) drive.
I'll bet most of the data's still there. (The early X-25Ms didn't use encryption). What's interesting is that the newer drives have a similar failure mode that's widely recognized as a firmware bug. If there were a way to talk to the drive over its embedded debugging port (like the Seagate Barracuda fix from a few years ago), I'll bet I could recover most of the data.
(I don't actually need the data, as I got it all back from backups, but it's an interesting data recovery project for a rainy day. I'll probably just desolder the chips and read the raw data off 'em. Won't work for encrypted drives, but it might work for this one.)
SSD's have an advertised capacity N and an actual capacity M. Where M > N. In general the bigger M realtive to N the better the performance and lifetime of the drive. As it wears it will "silently" assign bad blocks and reduce M. Your write performance will degrade. If you have good analysis tools it will tell you when it starts getting a lot of blocks near end of life and when M is getting reduced.
Blocks near end of life are also more likely to get read errors. The drive firmware is supposed to juggle things around so all of the blocks near end of life about the same time. With a soft read error the block will be moved to a more reliable portion of the SSD. That means increased wear.
1. Watch write perforamance/spare block count
2. If you get any read errors do a block life audit
3. When you get into life limiting events things accelerate to bad due to the mitigation behaviors
Be carefull depending on the sensitivities of the firmware it will let you get closer to catastrophe before warning you. More likely to be closer in consumer grade.
With traditional mechanical drives, you usually get a clicking noise accompanied by a time period where you can offload data from the drive before it fails completely.
OK, so I'm sure some enterprising /.-er can write a script that watches the SSD controller and issues some clicks to the sound card when cells are marked as failed.
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
With traditional mechanical drives, you usually get a clicking noise accompanied by a time period where you can offload data from the drive before it fails completely.
Usually? No.
This does happen sometimes, but it certainly doesn't happen "usually". There's enough different failure mechanisms for hard drives that there isn't any one "usual" method --
1- drive starts reporting read and/or write errors occasionally, but otherwise seems to keep working
2- drive just suddenly stops working completely all at once
3- drive starts making noise (and performance usually drops massively), but the drive still works.
4- drive seems to keep working, but smart data starts reporting all sorts of problems.
Personally, I've had #1 happen more often than anything else, usually with a healthy serving of #4 at about the same time or shortly before. #2 is the next most common failure mode, at least in my experience.
The sectors you are talking about are often referred to as "remaps" (or "spares"), which is also used to describe the number of blocks that have been remapped. Strategies vary, but an off-the-cuff average would be around one available spare per 1000 allocatable blocks. Some firmware will only use a spare from the same track, other firmware will pull the next nearest available spare. (allowing an entire track to go south)
The more blocks they reserve for spares, the lower the total capacity count they can list, so they don't tend to be too generous. Besides, if your drive is burning through its spares at any substantial rate, doubling the number of spares on the drive won't actually end up buying you much time, and certainly won't save any data.
But with the hundreds of failing disks I've dealt with, when more than ~5 blocks have gone bad, the drive is heading out the door fast. Remaps only hide the problem at that point. If your drive has a single block fail when trying to write, it will be remapped silently and you won't ever see the problem unless you check the remap counter in smart. If it gets an unreadable block on a read operation, you will probably see an io error however. Some drives will immediately remap it, but most don't and will conduct the remap when you next try to write to that cell. (otherwise they'd have to return fictitious data, like all zeros)
So I don't particularly like automatic silent remaps. I'd rather know whean the drive first looks at me funny so I can make sure my backups are current and get a replacement on order, and swap it out before it can even think about getting worse. I prefer to replace a drive on MY terms, on MY schedule, not when it croaks and triggers any grade of crisis. There are legitimate excuses for downtime, but a slowly failing drive shouldn't be one of them.
All that said, on multiple occasions I've tried to cleanse a drive of IO errors by doing a full zero-it format. All decent OBCCs on drives should verify all writes, so in theory this should purge the drive of all IO errors, provided all available spares have not already been used. The last time I did this on a 1TB Hitachi that had ONE bad block on it, it still had one bad block (via read verify) when the format was done. The write operation did not trigger a remap, (and I presume it wasn't verified, as the format didn't fail) and I don't understand that. If it were out of remaps, the odds of it being ONE short of what it needed is essentially zero. So I wonder in reality just how many drive manufacturers aren't even bothering with remapping bad blocks. All I can attribute this to is crappy product / firmware design.
I work for the Department of Redundancy Department.
For flash memory it is the erase cycles, not the write cycles, that drive life.
http://en.wikipedia.org/wiki/Flash_memory
The quantum tunneling effect described for the erase process can weaken the insulation around the isolated gate, eventually preventing that gate from holding its charge. That's the typical end-of-life scenario for a bit of flash memory.
You generally don't say that writes are end-of-life because you could, in theory, write the same pattern to the same byte over and over again (without erasing it) and not cause reduction in part life. Or, since bits erase high and write low, you could write the same byte location eight times, deasserting one new bit each time, then erase the whole thing once, and the total would still only be "one" cycle.
It doesn't hurt to be nice.
I recently had a "old" (cir 2008) 64gb SSD drive die on me. It's death followed this pattern:
After popping a new disk in and doing a partition resize, my system was back up and running with no data loss. Of all the storage hardware failures I've experienced, this was probably the most pain-free as the failure caused the drive to simply degrade into a read-only device.
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
In theory they should degrade to read-only just as others have pointed out in other posts, allowing you to copy data off them.
In reality, just like modern hard drives, they have unrecoverable firmware bugs, fuses that can blow with a power surge, controller chips that can burn up, etc.
And just like hard drives, when that happens in theory you should still be able to read the data off the flash chips but there are revisions to the controller, firmware, etc that make that more or less successful depending on the manufacturer. You also can't just pop the board off the drive like with an HDD, you need a really good surface mount resoldering capability.
So the answer is "it depends"... If the drive itself doesn't fail but reaches the end of its useful life or was put on the shelf less than 10 years ago (flash capacitors do slowly drain away) then the data should be readable or mostly readable.
If the drive itself fails, good luck. Maybe you can bypass the fuse, maybe you can re-flash the firmware, or maybe it's toast. Get ready to pay big bucks to find out.
P.S. OCZ is fine for build it yourself or cheap applications but be careful. They have been known to buy X-grade flash chips for some of their product lines - chips the manufacturers list as only good for kid toys or non-critical, low-volume applications. Don't know if they are still doing it but I avoid their stuff.
Intel's drives are the best and have the most-tested firmware but you pay for it. Crucial is Micron's consumer brand and tends to be pretty good given they make the actual flash - they are my go-to brand right now. Samsung isn't always the fastest but seems to be reliable.
Do your research and focus on firmware and reliability, not absolute maximum throughput/IOPs.
Natural != (nontoxic || beneficial)
I have ordered approximately 500 Intel SSD's over the past 18 months (320 series and the 520 series primarily). To date, we have had exactly one fail to my knowledge. It was a 320 series 160 GB with known firmware issue. We have around 80 of that type and size, and the drive that failed did so on first image. We RMA'ed the drive and got a replacement.