Taking a Hard Look At SSD Write Endurance
New submitter jyujin writes "Ever wonder how long your SSD will last? It's funny how bad people are at estimating just how long '100,000 writes' are going to take when spread over a device that spans several thousand of those blocks over several gigabytes of memory. It obviously gets far worse with newer flash memory that is able to withstand a whopping million writes per cell. So yeah, let's crunch some numbers and fix that misconception. Spoiler: even at the maximum SATA 3.0 link speeds, you'd still find yourself waiting several months or even years for that SSD to start dying on you."
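The submitter's claim can be checked with a back-of-envelope sketch. It assumes perfect wear leveling and a write amplification of 1 (both optimistic), about 600 MB/s of usable bandwidth on the 6 Gbit/s SATA 3.0 link after 8b/10b encoding, and a hypothetical 256 GB drive. The capacity is an illustrative assumption, not a figure from the submission.

```python
# Back-of-envelope: how long until every cell has used its rated P/E cycles
# if the host writes flat out at link speed?
# Assumptions: perfect wear leveling, write amplification of 1, hypothetical capacity.

SECONDS_PER_DAY = 86_400

def days_to_wear_out(capacity_bytes: float, pe_cycles: int, write_bytes_per_s: float) -> float:
    """Days of continuous writing needed to consume every cell's P/E budget."""
    total_writable = capacity_bytes * pe_cycles  # total bytes the flash can absorb
    return total_writable / write_bytes_per_s / SECONDS_PER_DAY

SATA3_BYTES_PER_S = 600e6  # ~600 MB/s usable on a 6 Gbit/s link (8b/10b encoding)
CAPACITY = 256e9           # hypothetical 256 GB drive

for cycles in (100_000, 3_000, 1_000):
    days = days_to_wear_out(CAPACITY, cycles, SATA3_BYTES_PER_S)
    print(f"{cycles:>7} cycles: {days:8.1f} days")
```

Under these assumptions, 100,000 cycles works out to roughly 494 days of nonstop writing. The 3,000 and 1,000 cycle figures quoted for MLC and TLC in this thread bring it down to about 15 days and about 5 days.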
100000 writes? 1M writes?
What the fuck is this submitter smoking?
Newer NAND flash can sustain maybe 3000 writes per cell, and if it's TLC NAND, maybe 500 to 1000 writes.
100,000 is only for SLC NAND. MLC, which is what's currently in most SSDs, is only 3,000, and TLC (found in USB drives, the Samsung 840, and probably more SSDs soon because it's cheaper) is only 1,000.
Is 1,000 fine for most people? Yes, but you should be aware of it. I have a file server that writes 200 GB per day, which would kill a Samsung 840 in about 6-7 months.
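How long a given daily write volume takes to wear out a drive depends on capacity, rated cycles, and write amplification (the ratio of bytes actually programmed into flash to bytes written by the host). Here is a minimal sketch. The numbers in the usage lines are hypothetical and are not taken from the file server described above.

```python
def days_until_worn(capacity_gb: float, pe_cycles: int,
                    host_gb_per_day: float, write_amplification: float = 1.0) -> float:
    """Rough lifetime in days: total flash endurance divided by NAND writes per day.

    Assumes ideal wear leveling; write_amplification > 1 models the extra
    flash programming caused by random writes and garbage collection.
    """
    endurance_gb = capacity_gb * pe_cycles                   # total GB the cells can take
    nand_gb_per_day = host_gb_per_day * write_amplification  # what the flash really sees
    return endurance_gb / nand_gb_per_day

# Hypothetical 120 GB TLC drive (1,000 cycles) taking 200 GB/day of host writes.
for wa in (1.0, 3.0):
    print(f"WA {wa}: {days_until_worn(120, 1_000, 200, wa):.0f} days")
```

The estimate is about 600 days with no write amplification and about 200 days at a write amplification of 3. Small changes in assumptions therefore move the answer a lot.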
http://www.anandtech.com/show/6459/samsung-ssd-840-testing-the-endurance-of-tlc-nand
I have never had a laptop hard drive last more than two years, and only had one last more than eighteen months. Maybe your spinning-metal-one-micron-away-from-the-drive-head drives work well in a stationary, temperature-controlled environment, I guess.
But if your SSD is nearly full with data that you never change, wouldn't all the writing happen in the small area that is left? This would significantly reduce lifetime.
I didn't do the maths and just installed an SSD as my OS disk... in 2010. It's still there now despite being used daily and having been re-installed a couple of times (yes, Windows).
Obvious Troll is Obvious, but... while SSDs can & do fail (just like old hard drives can & do fail), the reason for SSD failure in the real world is very rarely flash memory wear. Hint: if your flash drive suddenly stops working one day, that ain't due to flash wear, which would manifest as gradual failure over time.
Cheaper? Maybe per GB but not for the IO.
How many platters am I going to have to raid to get even near what a single SSD can do? Am I ever going to be able to get random reads that high and fit it all in one WTX case?
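The spindle question reduces to a division. This is a minimal sketch that assumes ideal RAID 0 scaling with no controller bottleneck. The per-device figures, roughly 100 random-read IOPS for a 7,200 rpm disk and 40,000 for one SSD, are illustrative assumptions, not measurements.

```python
import math

def spindles_needed(target_iops: float, iops_per_disk: float) -> int:
    """Disks to stripe (ideal scaling, no controller limits) to reach target_iops."""
    return math.ceil(target_iops / iops_per_disk)

# Hypothetical figures: ~100 random-read IOPS per disk vs. 40,000 for a single SSD.
print(spindles_needed(40_000, 100))  # → 400
```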
Our company experienced what we believe was its first age-related failure in October 2012: an office PC with an Intel SSD from the 2008 value-oriented line (which was still high-priced at the time). The drive behaved as a mechanical drive would, with an occasional bad sector, and we were able to successfully image the data to a new one. Out of 200 Intel drives, that's pretty good. (We did have one failure in 2010, but that was an outright dead drive and we were able to RMA it.) Not sure if this contributes anything to the conversation, but I figured I'd throw it out there.
The Intel X25s in my PC, from 2009, are still humming along nicely, and my last benchmark in 2012 produced the same results as in 2010. But I've gone so far as to point the environment variables for user temp files at a mechanical drive, and those for internet and system temp files at a RAM drive, to cut down the writes the SSDs have to absorb.
Had an SSD in my laptop for just over a year and a half now, no issues whatsoever. Daily use, as well.
"It obviously gets far worse" is referring to "how bad people are at estimating", not the lifespan of the Flash Memory.
Reliability always trumps speed, for me anyway.
Meaningful life specs are tough to come by for flash. Yes, as noted above, SLC NAND has a rated life of 100k erases/page on the datasheet, but that's really a guaranteed spec under all rated conditions, so in reality it lasts quite a bit longer. If you were to write the same page once a second, you'd use it up in a bit more than a day.
However, in real life, the "failure" criterion is when a page written with a test pattern doesn't read back as "erased" in a single readback. Simple enough, except that flash has transient read errors: you can read a page, get an error, read the exact same page again and not get the error. Eventually the page does return the same wrong data on every read, but that happens well after the "first error".
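The gap between the first error and a persistent one can be expressed as two different failure criteria applied to the same log of read attempts. This sketch assumes a hypothetical log format in which each cycle holds the results of several re-reads of the same page, with True meaning the readback did not match. The sample data is made up to illustrate the point.

```python
from typing import Optional, Sequence

def first_error_cycle(reads: Sequence[Sequence[bool]]) -> Optional[int]:
    """Cycle of the first read error of any kind (strict, datasheet-style criterion)."""
    for cycle, attempts in enumerate(reads):
        if any(attempts):
            return cycle
    return None

def first_persistent_error_cycle(reads: Sequence[Sequence[bool]]) -> Optional[int]:
    """Cycle at which every re-read of the page returns an error."""
    for cycle, attempts in enumerate(reads):
        if attempts and all(attempts):
            return cycle
    return None

# Hypothetical log: three re-reads per cycle, True = mismatch.
log = [
    [False, False, False],
    [True, False, False],   # transient: a re-read comes back clean
    [False, False, False],
    [True, True, False],
    [True, True, True],     # error reproduces on every read
]
print(first_error_cycle(log), first_persistent_error_cycle(log))  # → 1 4
```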
There's also a very strong non-linear temperature dependence on life. Both in terms of cycles and just in terms of remembering the contents. Get the package above 85C and it tends to lose its contents (I realize that the typical SSD won't be hot enough that the package gets to 85C, although, consider the SSD in a ToughBook in Iraq at 45C air temp..)
In actual life, with actual flash devices on a breadboard in the lab at "room temperature", I've cycled SLC NAND for well over a million cycles (hit it 10-20 times a second for days) without failure. This sort of behavior makes it difficult to design meaningful wear leveling (for all I know, different pages age differently) and life specs, without going to a conservative 100k/page uniform standard, which, in practice, grossly understates the actual life.
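The cycle arithmetic in this comment is easy to reproduce. Here is a minimal sketch that assumes a constant erase rate on a single page.

```python
def hours_to_exhaust(pe_cycles: int, erases_per_second: float) -> float:
    """Hours until one page reaches a given erase count at a fixed erase rate."""
    return pe_cycles / erases_per_second / 3600

print(round(hours_to_exhaust(100_000, 1), 1))     # rated SLC life, 1 erase/s → 27.8
print(round(hours_to_exhaust(1_000_000, 15), 1))  # a million cycles at 15 erases/s → 18.5
```

At 10-20 erases per second, a page passes a million cycles in less than a day, so several days of that puts it well past a million.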
What you really need to do is buy a couple drives and beat the heck out of them with *realistic* usage patterns.
Almost certainly MLC. SLC is really only found in industrial SSDs these days. Enterprise and consumer SSDs are all MLC, with the exception of Samsung 840, the first SSD to use TLC.
So then you only use magnetic tape for storage?
How long does it take to boot from that?
I have backups, so I can always restore.
I don't expect most servers to swap at all. If your server is swapping, buy more RAM. Cell phones are still RAM-starved enough to need to do that.
Does anyone know whether the failure count for cells picks up along a nice smooth curve or is like running into a cliff? Intel seem to be suggesting in their spec sheets that the 20% over-provisioning on some of their SSDs (I'm assuming for bad-block remapping when failure is detected) can increase the expected write volume of a drive by substantial amounts:
http://www.intel.co.uk/content/www/us/en/solid-state-drives/solid-state-drives-710-series.html
This seems to go against the anecdotal evidence of sudden total SSD failures being attributed to cell wear - something else must be failing in those, most likely the normal expected allotment of mis-manufactured units.
I have a very old laptop (I think I bought it circa 2004 or so; it has a Turion CPU). The display hinges failed on it, as did the cooling, so I can't play games on it anymore (discrete GPU).
Hard drive is trucking on fine.
Some hard drives obviously last less. However, if you have a systemic problem with hard drives lasting less than two years, it's time to take a look at the factor that stays the same across all of them: the user.
The issue people point out is that "even if the controller is good enough to last you until wear-out, your SSD will fail much sooner than a hard drive".
The fact that controllers fail ridiculously often on budget drives doesn't improve SSD reliability. It is, however, somewhat understandable, as SSD controllers are significantly more complex than hard drive ones.
You are right, they usually die of ... wait for it .... flash memory wear (most likely firmware not being able to recognize damaged cell and insisting on using it).
-electronic failure (power supply, rarely controller chip itself)
-firmware bug triggered by
Which is why most SSD drives implement some kind of wear leveling. They will move the often written sectors around the physical storage space in an effort to keep the wear even.
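What happens when most of the drive holds data that never changes can be shown with a toy model. It assumes writes always go to the least-worn block in the pool. It also assumes that "static" wear leveling lets every block join the pool by migrating the unchanging data. It ignores the extra writes that migration itself costs, so this is a sketch of the idea, not of any real controller.

```python
import heapq

def max_wear(num_blocks: int, static_blocks: int, writes: int, static_leveling: bool) -> int:
    """Highest erase count after `writes` block writes under a toy wear-leveling model.

    Dynamic-only: writes rotate through the blocks not holding static data.
    Static leveling: static data is migrated too, so every block shares the load.
    """
    pool = num_blocks if static_leveling else num_blocks - static_blocks
    if pool <= 0:
        raise ValueError("no blocks available for writing")
    erase_counts = [0] * pool  # a list of equal values is already a valid min-heap
    for _ in range(writes):
        least_worn = heapq.heappop(erase_counts)     # pick the least-worn block
        heapq.heappush(erase_counts, least_worn + 1)
    return max(erase_counts)

# Hypothetical drive: 1,000 blocks, 90% holding data that never changes.
print(max_wear(1_000, 900, 10_000, static_leveling=False))  # → 100
print(max_wear(1_000, 900, 10_000, static_leveling=True))   # → 10
```

In this model, dynamic-only leveling wears the remaining 10% of the flash ten times faster, which is exactly the concern about a nearly full drive. Static leveling spreads the wear back out, at the price of extra migration writes.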
Rotating media drives do similar things and can physically move "bad" sectors too, but this usually means you lose data. Many drives actually come from the factory with remapped sectors. You don't notice because these sectors are already remapped onto the extra space the manufacturers build into the drive but don't let you see.
Reminds me of when I interviewed with Maxtor, years ago. They were telling me that the only difference between their current top-of-the-line storage (which was like 250G at the time) and their 40 Gig OEM drive was the controller firmware configuration and the stickers. Both drives came off the same assembly line, and only the final power-up configuration and test step was different, and then only in the values configured in the controller and the stickers that got put on the drive. If you had the correct software, you could easily convert the OEM drive to the bigger capacity by writing the correct contents to the right physical location on the drive. They did this because it was cheaper than stopping and retooling the production line every time an OEM wanted 10,000 cheap drives.
I'm sure drive builders still do that sort of thing today. Set up a 3TB drive line, then just downsize the drives that are to be sold as 1TB drives.
SSD has been rejected here because of repeated, ongoing failures. Now it only gets given to end users with a 'light' write workload, and that's the only place where consumer-level 25 nm and smaller flash, with its limited write cycles, can be used sanely (i.e., without having a plan for swap-out/replacement and higher costs).
I'm expecting a fairly severe level of failure on new equipment shipping today that uses SSD as cache.
I frankly love the speed. But the claims about how long an 'average' user would take to wear out these disks have been contradicted by abysmal failure rates where I work. Admittedly, our users are mid-to-heavy use cases, but the failure rates have been high, and the lifetimes shorter than anyone would contemplate.
Either the cost of the drives has to fall (which, to be fair, it has been doing), or the reliability and write limits need to improve substantially.
I no longer consider SSD for front line heavy use. And I'd need serious work to be convinced on contemplating it again with lower nm flash. And SLC level gear is simply beyond the cost level we can attain.
I have been told that laptop batteries will not develop a memory. I have yet to find a rechargeable battery that doesn't. With a laptop it is easy to check: charge the battery until it is full, then run the laptop on battery, and the low-power warning pops up within 5-10 minutes (often less than 5). That is why I usually set up a "drain battery" power plan with no automatic shutoff. I can usually run the laptop at 0% battery for 1-2 hours before it shuts off. Then I turn it on and repeat. Once the laptop will only POST and then shut off again, I charge the battery for the number of hours it takes to reach a full charge. This is a pain if you did not note that figure when you got the laptop. Mine is 12 hours; I have seen 8, 24, and 6 hours, so you need to know what your battery takes. You can overcharge it (I did on older batteries) if you leave it charging for too long. Once it is fully charged by the clock, I use the laptop until the power runs out again, then charge it and switch the power plan back to what I normally use. I get the full life out of the battery again. I usually drain the battery 1-2 times a year. I have an 8-year-old laptop still on its original battery, and I get 3 hours of use with no power saving and 6 hours with the power-saver setting. I do the same thing with my newer laptop. Until I see otherwise, I'll keep doing what I am doing.
I know that is not what the laptop companies tell you. My own experience with about 100+ laptops and few thousand other rechargeable batteries is they all get a memory at some point. Draining them, the timed recharge, then use until out of power resets the memory.
No, magnetic tape is too vulnerable to EMP. He boots from punch card.
Actually, better SSD controllers sense that a page has reached its rewrite limit. The end effect of this is that the size of the overprovisioned space gets reduced by one page. (The controller stops ever writing to the used-up page.) The write performance of the SSD degrades until it goes below a certain amount of overprovisioned space, at which point it refuses to write any more. The disk is still entirely readable, so it's a binary failure mechanism, but a pretty safe one.
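The retirement behavior described above can be sketched as a small state model. This is a toy under stated assumptions: `spares` (the over-provisioned page count) and `min_spares` (the floor below which writes are refused) are hypothetical parameters, and the sketch does not model the gradual loss of write performance.

```python
class SparePagePool:
    """Toy model of a controller that retires worn pages out of its spare pool.

    Hypothetical parameters: `spares` is the over-provisioned page count and
    `min_spares` is the floor below which the drive refuses further writes.
    """

    def __init__(self, spares: int, min_spares: int) -> None:
        self.spares = spares
        self.min_spares = min_spares

    @property
    def read_only(self) -> bool:
        return self.spares < self.min_spares

    def retire_page(self) -> None:
        """A page hit its rewrite limit: never write it again, losing one spare."""
        if self.spares > 0:
            self.spares -= 1

    def write(self) -> bool:
        """Accept a write only while enough spare pages remain."""
        return not self.read_only

    def read(self) -> bool:
        """Data stays readable even after the drive goes read-only."""
        return True

pool = SparePagePool(spares=3, min_spares=2)
pool.retire_page()
pool.retire_page()
print(pool.write(), pool.read())  # → False True
```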
Gradual failure over time means either you have a crap controller or that your electronics are failing in ways other than running out of write cycles.
I use them for the speed, but anyone claiming they are reliable are smoking some strong peyote.
Yep, just yesterday I had four embedded boxes on my desk that needed their SSDs pulled for replacement and reinstall. All four had Kingston SSDNow drives in them and were 1-2 years old. We had much better luck back in the days of IDE CompactFlash adapters, and those were less expensive parts than SSDs.
I'm under the impression now that it's because those were 90nm devices and the newer stuff is just crap. MLC SSDs have moved further along the hot/crazy scale in the past couple of years. I should say that I'm still happy with the SLC SSDs in my servers, but for low-cost gear we're going back to 2.5" hard drives for reliability.
Fire susceptible.
I've implemented a filesystem on top of OpenCV that uses a laser to read bits carved into granite slabs.
If the laser fails, various sun alignments will allow the passive CdS sensor to take over, at a performance penalty of several years (about one IOP per year).
My desktop Intel X25 died after 8 months due to running out of spare blocks and an ADATA drive I had in my occasional use laptop lasted about a year and a half. My two anecdotes cancel out your anecdotes.
> You can overcharge it (I did on older batteries) if you leave them charging for too long.
This is fully a problem with every laptop manufacturer skimping out on the charge controller design. It's apparently cheaper to let your customers burn out their batteries by leaving them plugged in "too much" rather than designing a power supply that cuts off the charging current when the battery is full.
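The cut-off described here is standard practice in lithium-ion chargers. The charger applies constant current until the cell reaches its voltage limit, then holds that voltage while the current tapers, and stops once the current falls below a termination threshold. Below is a simplified sketch of that state machine. The 4.2 V and 0.1 A thresholds are illustrative assumptions; real values come from the cell datasheet and the charger IC.

```python
from enum import Enum

class Phase(Enum):
    CONSTANT_CURRENT = "cc"
    CONSTANT_VOLTAGE = "cv"
    DONE = "done"

def next_phase(phase: Phase, cell_voltage: float, charge_current: float,
               v_max: float = 4.2, i_term: float = 0.1) -> Phase:
    """Advance a simplified CC/CV charger state machine.

    v_max (volts per cell) and i_term (amps) are assumed example thresholds.
    """
    if phase is Phase.CONSTANT_CURRENT and cell_voltage >= v_max:
        return Phase.CONSTANT_VOLTAGE  # voltage limit reached: hold it, let current taper
    if phase is Phase.CONSTANT_VOLTAGE and charge_current <= i_term:
        return Phase.DONE              # taper finished: cut off the charging current
    return phase                       # otherwise stay in the current phase
```

Once the state machine reaches `DONE`, it stays there, so leaving the laptop plugged in adds no further charge in this model.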
But they still sell the laptops as "desktop replacement" devices, which to me implies that they should be able to be plugged in all the time without damage. Also, they're in computers. They should be able to take care of their own charging profile without people making up their own deep cycle treatment.
No, if you buy quality brands, take a little care, and, probably most importantly, are lucky, that happens. I'm writing this on my 7-year-old MacBook 1.1 (the HD is 4 years old, replaced because the old one was too small, not because it failed). Before that I owned a Toshiba Satellite Pro for 6 years. And I use my laptop throughout the day almost every day, because I'm a student and programming hobbyist and I don't own a desktop. I may have prioritized wrongly by buying expensive stuff and using it long after it was outdated, but I certainly have used it.
> you'd still find yourself waiting several months or even years for that SSD to start dying on you
How comforting!
> I have never had a laptop hard drive last more than two years, and only had one last more than eighteen months.
Then I would have to wonder what the heck you are doing to the hard drives. I'm not sure I've ever had one last less than that long in a laptop. I've had laptop hard drives last for 7 years and were still going strong when I stopped using the machine. In fact I usually have some other component die long before the hard drive does. I have several hard drives that work just fine from laptops with burned out system boards, defective keyboards, borked video and other problems.
Some people are quite hard on their equipment, perhaps you are one of these? I've often been astonished how carelessly some people treat their equipment and then expect it to work.
When you don't use a computer, that happens. And the fact that you are happy with a 2007 Dell means you really don't use your computer.
Curious theory. The fact that I run a multi-million-dollar company heavily using a half dozen computers between 6 and 9 years old must really mess with your world view. We run ERP, product test, shipping, time card management, several databases, some very large spreadsheets, CAD and quite a bit more, but according to you we must not actually be using the computers for anything. Would a faster computer be nice? Sure, but the marginal improvement would be well into diminishing returns.
I wear the letters off of a keyboard in 12 months.
So stop buying crappy keyboards. I have keyboards that have been used for over 20 years without a fleck of paint missing.
Some of us actually use their computers as tools to make money, others look at them as toys for fun.
Some of us actually try to get a decent ROI on our machines and realize that lots of actual work doesn't require the latest and greatest. I run a manufacturing company, and if you think we don't use our computers, you don't really understand what that means in the real world.
So far I see a lot of complaints from people who don't appear to even know how to run SMART tools to get write cycle and wear statistics from their SSDs... you know, so real actual numbers can be posted.
So far none of my SSDs have failed, and I have almost 20 installed in various places. The one with the most wear is one of the first SSDs I purchased, an Intel 40G device:
da0: Fixed Direct Access SCSI-4 device
da0: Serial Number CVGB951600AC040GGN
da0: supports TRIM
Power on hours - 19127
Power cycle count - 48
Unsafe shutdown count - 32
Host writes x 32MiB - 375697
Workld media wear - 5120
Available reserved - 99/99/10
Media wearout - 91%
Basically 12TB worth of writes on this 40G drive over the last 2.18 years. No failures. Media wearout indicator 99 -> 91. Estimated durability based on the wear indicator is around 132TB, which roughly comes to ~3300 cycles/cell. This vintage of SSD uses MLC flash whose cells are roughly spec'd at ~10000 cycles.
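These figures can be reproduced from the SMART summary. The sketch below parses the "Name - value" lines as they appear in that output and extrapolates linearly. It assumes the wear indicator started at 100, that the "Host writes x 32MiB" counter is in 32 MiB units, and that the drive holds 40 GB (decimal). It lands in the same ballpark as the estimates in the comment; the exact numbers shift with TB vs. TiB and rounding.

```python
def parse_smart_lines(lines):
    """Parse 'Name - value' lines like the SMART summary above into a dict."""
    stats = {}
    for line in lines:
        name, sep, value = line.rpartition(" - ")
        if sep:  # skip lines without the ' - ' separator
            stats[name.strip()] = value.strip()
    return stats

dump = [
    "Power on hours - 19127",
    "Host writes x 32MiB - 375697",
    "Media wearout - 91%",
]
stats = parse_smart_lines(dump)

host_bytes = int(stats["Host writes x 32MiB"]) * 32 * 2**20           # counter is in 32 MiB units
years = int(stats["Power on hours"]) / (24 * 365.25)
wear_used = (100 - int(stats["Media wearout"].rstrip("%"))) / 100    # assumes it started at 100
est_endurance = host_bytes / wear_used                              # linear extrapolation
capacity = 40e9                                                     # assumed 40 GB (decimal)
cycles_per_cell = est_endurance / capacity
print(f"{host_bytes / 1e12:.1f} TB over {years:.2f} years; "
      f"~{est_endurance / 1e12:.0f} TB total, ~{cycles_per_cell:.0f} cycles/cell")
```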
While firmware issues are well documented for various SSD vendors over the last few years, and cell erase cycle life has gone down as the chips have gotten more dense, I would still expect the vast majority of failures to be due to wear-out.
Lots of things can cause premature wear-out, but probably the most common would be using the SSD for something really stupid: hosting a database doing a lot of random writes or a high frequency of fsync()s, using the SSD for swap on a system that is paging heavily 24x7, using the SSD for WWW log files on a busy web server, formatting an unaligned filesystem on the SSD or one that uses too small a block size, and any number of other things.
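The alignment and small-write cases come down to how many flash pages a write lands in. Here is a minimal sketch. It assumes 4 KiB flash pages (a hypothetical figure, since real NAND page sizes vary by generation) and a naive controller that rewrites every touched page in full; real controllers buffer and coalesce writes, so treat these numbers as a worst case.

```python
def pages_touched(offset: int, length: int, page_size: int = 4096) -> int:
    """Number of flash pages a write of `length` bytes at byte `offset` lands in."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1

def write_amplification(offset: int, length: int, page_size: int = 4096) -> float:
    """Bytes programmed per byte written, if each touched page is rewritten whole."""
    return pages_touched(offset, length, page_size) * page_size / length

print(write_amplification(0, 4096))    # aligned 4 KiB block → 1.0
print(write_amplification(512, 4096))  # misaligned by 512 bytes → 2.0
print(write_amplification(0, 512))     # 512-byte record flushed by fsync() → 8.0
```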
Venerable but still mostly correct:
http://leaf.dragonflybsd.org/cgi/web-man?command=swapcache
The only adjustment I would make is that as the Intel 40G continues running, the wear I'm getting on it is pointing closer to ~130TB of durability and not 400TB (400TB is the theoretical max at 10,000 cycles/cell). Still reasonable. Generally speaking, that's the older 34nm technology. The newer 24nm technology will get fewer cycles, but devices tend to have more storage, so, as I say in the manual, you could expect similar total wear out of a newer 120GB 310 series SSD whose flash cells have 1/3 the cycle life.
-Matt
Quite excellent testing by XtremeSystems. I'm not doing anything nearly so formal, but my numbers for the two brands I use (Intel and Crucial) are roughly on track with their results. And it does give me more confidence in those two brands.
I have an OCZ as well which still works, but after all the negative issues came up I pulled it out of production boxes. And I only have one... never bought another one, every time I researched them out they just weren't up to snuff.
It should also be noted that SSD firmware continues to undergo radical change, so for leading vendors such as Intel and Crucial, who seem to be more on top of the firmware running on the more generic chipsets underneath, we should expect further stabilization versus older products. I'm frankly a bit surprised that my old 40G Intel SSD hasn't hit one of its known firmware issues yet, but my environment is backed up by a UPS, so it might simply be bullet dodging.
(OCZ, on the other hand, seems to put out new firmware with inadequate testing; their newer products are not any more reliable than their older products.)
-Matt