Taking a Hard Look At SSD Write Endurance
New submitter jyujin writes "Ever wonder how long your SSD will last? It's funny how bad people are at estimating just how long '100,000 writes' are going to take when spread over a device that spans several thousand of those blocks over several gigabytes of memory. It obviously gets far worse with newer flash memory that is able to withstand a whopping million writes per cell. So yeah, let's crunch some numbers and fix that misconception. Spoiler: even at the maximum SATA 3.0 link speeds, you'd still find yourself waiting several months or even years for that SSD to start dying on you."
100,000 is only for SLC NAND. MLC, which is currently in most SSDs, is only 3,000, and TLC (found in USB drives, the Samsung 840, and probably more SSDs soon because it's cheaper) is only 1,000.
Is 1,000 fine for most people? Yes, but you should be aware of it. I have a fileserver that writes 200 GB per day, which would kill a Samsung 840 in about 6-7 months.
http://www.anandtech.com/show/6459/samsung-ssd-840-testing-the-endurance-of-tlc-nand
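For anyone who wants to check that kind of estimate, here is a rough back-of-the-envelope sketch. The 120 GB capacity and the write amplification factor of 3 are my own assumptions for illustration, not figures from the comment above, and it assumes perfect wear levelling.

```python
def days_until_worn_out(capacity_gb, pe_cycles, host_writes_gb_per_day,
                        write_amplification=1.0):
    """Rough days until the rated program/erase budget is spent.

    Assumes perfect wear levelling, so every cell ages at the same rate.
    """
    # Total data the NAND can absorb before hitting its rated cycle count.
    total_nand_writes_gb = capacity_gb * pe_cycles
    # The controller writes more to NAND than the host sends it.
    nand_writes_per_day_gb = host_writes_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_writes_per_day_gb


# Assumed: 120 GB TLC drive rated for 1,000 cycles, 200 GB of host writes per day.
ideal = days_until_worn_out(120, 1000, 200)
# Assumed: write amplification of 3 (hypothetical, workload-dependent).
pessimistic = days_until_worn_out(120, 1000, 200, write_amplification=3.0)

print(ideal, pessimistic)
```

With those assumed inputs the ideal case comes out to 600 days and the write-amplified case to 200 days, which is in the same ballpark as the 6-7 months quoted above. Different capacities or amplification factors move the answer a lot.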
Our company experienced what we believe was its first age-related failure in October of 2012: an office PC with an Intel SSD from the 2008 value-oriented line (which was still pricey at the time). Basically the drive behaved as a mechanical drive would behave with an occasional bad sector, and we were able to successfully image the data to a new one. Out of 200 Intel drives, that's pretty good. (We did have one failure in 2010, but that was an outright dead drive and we were able to RMA it.) Not sure if this contributes anything to the conversation, but I figured I'd throw this out there.
The Intel X25s in my PC, from 2009, are still humming along nicely, and my last benchmark in 2012 produced the same results as in 2010. But I've gone so far as to set environment variables to put user temp files on a mechanical drive, and internet temp files and system temp files on a RAM drive, to take some of the write load off the SSDs.
The AC is dead-on right. At 25nm the endurance for high-quality MLC cells is about 3,000 writes. That's a relatively conservative estimate so you are pretty much guaranteed to get the 3K writes and likely somewhat more, but it's a far far cry from the 100K writes you can get from the highly expensive SLC chips. Intel & Micron claimed that one of the big "improvements" in the 20nm process was hi-K gates that are claimed to maintain the 3K write endurance at 20nm, which otherwise would have dropped even more from the 25nm node.
The author of the article went to all the time & trouble to do his mathematical analysis without spending 10 minutes to find out the publicly available information about how real NAND in the real world actually performs....
AntiFA: An abbreviation for Anti First Amendment.
Meaningful life specs are tough to come by for flash. Yes, as noted above, SLC NAND has a rated life of 100k erases/page on the datasheet, but that's really a guaranteed spec under all rated conditions, so in reality it lasts quite a bit longer. If you were to write the same page once a second, you'd use up the rated 100k in a bit more than a day.
However, in real life, the "failure" criterion is when a page written with a test pattern doesn't read back as "erased" in a single readback. Simple enough, except that flash has transient read errors: that is, you can read a page, get an error, read the exact same page again and not get the error. Eventually, it does return the same thing every time, but that's longer than the "first error".
There's also a very strong non-linear temperature dependence on life, both in terms of cycles and just in terms of remembering the contents. Get the package above 85C and it tends to lose its contents (I realize that the typical SSD won't be hot enough that the package gets to 85C, although, consider the SSD in a Toughbook in Iraq at 45C air temp..)
In actual life, with actual flash devices on a breadboard in the lab at "room temperature", I've cycled SLC NAND for well over a million cycles (hit it 10-20 times a second for days) without failure. This sort of behavior makes it difficult to design meaningful wear leveling (for all I know, different pages age differently) and life specs, without going to a conservative 100k/page uniform standard, which, in practice, grossly understates the actual life.
What you really need to do is buy a couple drives and beat the heck out of them with *realistic* usage patterns.
I specifically had SLCs in mind when I ran the numbers. As for the 100k writes I used in my original calculations, I took those from this PDF: http://www.datasheetcatalog.org/datasheets2/16/1697648_1.pdf - see section 1.5, which lists "Endurance : 100K Program/Erase Cycles". As for the 1M write cycles: http://investors.micron.com/releasedetail.cfm?ReleaseID=440650 - that one came out in 2008, so using it as a baseline for "newer" SLCs didn't seem that far off. I'll have to revise the article to include those links, methinks...
Citation needed? The manufacturers typically tell you. For instance, here http://www.newegg.com/Product/Product.aspx?Item=N82E16820239045 it states: "Budget-minded gamers and enthusiasts will benefit from the lower price of Kingston’s new HyperX 3K SSD. This solid-state drive combines premium 3000 program-erase cycle Toggle NAND with the second-generation SandForce controller". So it gets only 3% of the author's most optimistic graph! Kind of a funny article, actually. It's like the mad scientist doing lots of good math but overlooking the most obvious information, which the dingbat brought along for comedy plot complications sees in a flash. I wrote a tutorial yesterday on how to make a RAM drive on Linux so as to avoid using your fancy fast flash drive. It can be found here: https://ioconnor.wordpress.com/2013/02/18/tutorial-on-automatically-moving-home-to-ram-drive-and-back-on-startup-and-shutdown/
RAM disks are cool and all, but except on live CDs they're usually unnecessary. The kernel's buffer cache and directory-name-lookup cache (in RAM) can often outperform RAM disks on second reads and writes.
(Claimer: I worked on file systems for HP-UX, and we measured this when we considered adding our internal experimental RAM FS to the production OS.)
17 December 2008.
5 years? Might as well write a white paper on the benefits of drum memory over mercury delay lines.
Actually, NAND flash doesn't "die" when you try to do the N+1 erase-write cycle (it's cycles, not writes. A cycle consists of flipping bits from 1 to 0 (aka write), and then from 0 to 1 (aka erase)). In practically all controllers, you do partial writes. With SLC NAND, it's fairly easy - you can write a page at a time, or even half pages. MLC lets you do page at a time as well - given typical MLC "big block" NAND of 32 4k pages, a block can be written 32 times before it's erased (once per page - you cannot do less than a page at a time).
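To make the page-versus-block point concrete, here is a toy model of a single block. This is only a sketch: the `NandBlock` class and its method names are made up for illustration, and real parts have more rules than this (program ordering, partial-page limits, and so on).

```python
class NandBlock:
    """Toy model of one NAND block: program per page, erase per block."""

    def __init__(self, pages_per_block=32):
        # False means the page is still erased and may be programmed.
        self.programmed = [False] * pages_per_block
        # Each whole-block erase completes one program/erase cycle.
        self.erase_count = 0

    def program(self, page):
        # A page cannot be rewritten in place; the block must be erased first.
        if self.programmed[page]:
            raise ValueError("page already programmed; erase the block first")
        self.programmed[page] = True

    def erase(self):
        # Erase resets every page in the block at once.
        self.programmed = [False] * len(self.programmed)
        self.erase_count += 1


block = NandBlock()
for page in range(32):
    block.program(page)   # 32 separate page writes...
block.erase()             # ...cost only one erase cycle
block.program(0)          # page 0 is writable again after the erase
print(block.erase_count)
```

The point of the model is that 32 page writes consume only one cycle of the block's rated endurance, so "writes" and "cycles" are not interchangeable when you run the numbers.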
And the other dirty little secret: the quoted cycle life is a guaranteed minimum. It means your part will be able to be written and erased 3000 times. Most typically, the ratings are an order of magnitude more conservative than reality, so a 3000-cycle flash can really get you 30,000 with proper care and tolerance.
Of course, a really big problem with cheap SSDs is lame firmware, because what you need is a good flash translation layer (FTL) which does wear levelling, sector translations, etc. These things are VERY proprietary and HEAVILY patented. A dirt cheap crappy controller you might find on low end thumbdrives and memory cards may not even DO translation or wear levelling. The other problem is that the flash translation table must be stored somewhere so the device can find your data (because of wear levelling, where your data is actually stored is different from where your PC thinks it is - again, the FTL handles this). For some things, it's possible to just scan the entire array and generate the table live, but generally that's impractical at large scale because of the time the scan takes. So usually the table is stored in flash as well, which of course is not protected by the FTL. Depending on how things go, this part could corrupt itself easily, leading to an unmountable device or, basically, a dead SSD.
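As a rough illustration of what the translation table and wear levelling are doing, here is a deliberately tiny sketch. `ToyFTL` is a made-up name, it maps whole blocks instead of pages, keeps its table in RAM, and ignores garbage collection and power loss, so it is nothing like a real (proprietary) FTL.

```python
class ToyFTL:
    """Toy flash translation layer with least-erased-first wear levelling."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        self.free = set(range(num_blocks))
        self.table = {}   # logical block -> physical block
        self.data = {}    # physical block -> contents

    def write(self, logical, contents):
        # Pick the free physical block with the fewest erases (ties: lowest index).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = contents
        old = self.table.get(logical)
        # Remap: the host keeps using the same logical address.
        self.table[logical] = target
        if old is not None:
            # The stale copy is erased and returned to the free pool.
            del self.data[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, logical):
        return self.data[self.table[logical]]


ftl = ToyFTL(num_blocks=4)
for i in range(8):
    ftl.write(0, f"v{i}")   # the host hammers one logical block

print(ftl.read(0), ftl.erase_counts)
```

Even though the host rewrites the same logical block eight times, the erases end up spread across all four physical blocks instead of landing on one. Lose `table` and the data is still in flash, but nothing knows where, which is the failure mode described above.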
For some REAL analysis, some brave souls have been stressing cheap SSDs to their limits until failure - http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm
Some of those SSDs are actually still going strong.
The best bet is to buy from people who know what they're doing - the likes of Samsung (VERY popular with the OEM crowd - Dell, Lenovo, Apple, etc.), Toshiba, and Intel - who all make NAND memory and thus actually do have experience on how to best balance speed and reliability. Everyone else is just using the datasheet and just assembling them together like they would any other PC part.