Endurance Experiment Kills Six SSDs Over 18 Months, 2.4 Petabytes
crookedvulture writes: Slashdot has previously covered The Tech Report's SSD Endurance Experiment, and the final chapter in that series has now been published. The site spent the last 18 months writing data to six consumer-grade SSDs to see how much it would take to burn their flash. All the drives absorbed hundreds of terabytes without issue, far exceeding the needs of typical PC users. The first one failed after 700TB, while the last survived an astounding 2.4 petabytes. Performance was reasonably consistent throughout the experiment, but failure behavior wasn't. Four of the six provided warning messages before their eventual deaths, but two expired unexpectedly. A couple also suffered uncorrectable errors that could compromise data integrity. They all ended up in a bricked, lifeless state. While the sample size isn't large enough to draw definitive conclusions about specific makes or models, the results suggest the NAND in modern SSDs has more than enough endurance for consumers. They also demonstrate that very ordinary drives can be capable of writing mind-boggling amounts of data.
Mine died within a year, not from the memory but from a shit controller. From OCZ, iirc. Something about performance tanking once it got past half full, and nothing short of a full, drawn-out format fixed it.
Since then, no SSDs died. But a fair number of spinning disks.
I think all most people are waiting for is for the GB/$ gap to decrease markedly. Otherwise it stays an "SSD for your boot drive, spinning disk to archive your junk" market.
So does this mean I can use my SSD as a swap drive now? Seriously, that would be awesome. Lots of times I go over the 16 gigs of RAM I have while editing 3D models and the second it starts to swap to disk it's painful.
The fact that 2 of them died without warning is disappointing. I would rather have a shorter life time, but a clear indication that the drive is going to die.
Talk about your planned obsolescence - not a single sector reallocation registered, but the firmware counter says its write tolerance is reached so it kills itself. I suppose it's nice that it switches to read-only mode when it dies, except for the fact that it bricks itself entirely after a power cycle. I mean come on - if it's my OS and/or paging drive then switching to read-only mode is going to kill the OS almost immediately, and there goes my one chance at data recovery. Why not just leave it in permanent read-only mode instead? Sure it's useless for most applications, but at least I can recover my data at my leisure.
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Who thought this was a good idea? If the drive thinks future writes are unstable, good for it to go into read only mode. But to then commit suicide on the next reboot? What if I want to take one final backup, and I lose power?
The problem is that as SSD cells wear, they hold charge for less time. A new cell retains its charge for years, but that retention time declines as the cell wears.
Consequently, continuous write tests will continue to report "all good" with a drive that is useless in practice, because while the continuous write will re-write a particular cell once every few hours, it might only hold a charge for a few days - meaning if you turned it off for even a day or so, you'd suffer serious data loss.
SSDs are amazing but you definitely can't carry conventional wisdom from HDDs over.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
No Text.
In a time of universal deceit, telling the truth is a revolutionary act. George Orwell
the results suggest the NAND in modern SSDs has more than enough endurance for consumers
"Challenge accepted." - some guy trying to invent octo-level-cell flash
Wanna know how to kill a spinning disk? Put it in a DVR. My DVR (with the "pause and rewind live TV" ability) would re-write 100% of the time. It died in a few months. Replaced it with a larger one and turned off the live-TV buffer, and it's lasted years. But it's all anecdotal, so I expect tests like this to give us some level of comparison.
Learn to love Alaska
I'd like a mix of drives on my next box. A moderate "traditional" spinning oxide 1TB drive with a lot of cache for the primary boot, swap, and home directories, and an SSD mounted as my project workspace under my home directory. The work directory is where I do 99% of my writes, producing roughly 3GB for one particular project in about an hour's time.
My existing drive on my main box has survived a god-awful number of writes, despite its 1TB size. My work has been emphatically I/O bound for the past month or so, since I did some bug fixing and tuning, so switching that project directory over to an SSD would speed things up even for this aging 3.8 GHz P4 single core. But not enough to justify the investment in just an SSD without a boost in memory, CPU, and memory bandwidth.
I've got my eye on a Lenovo i7 unit for about $900, plus SSD, but it'll take about a year to save up the money. On the bright side, either the specs on the unit will improve or the price will come down with an intervening year before purchase. :D
I do not fail; I succeed at finding out what does not work.
Sorry, market rules...
“He’s not deformed, he’s just drunk!”
Why Intel, why? We can all discuss whether the device should prematurely fail by some arbitrary software limit, but why BRICK it, as it can cause complete data loss!?
Instead, just set the drive to always boot in read-only mode, with secure erase being the only other allowed command. Then someone can recover their data and wipe the drive for good.
Intel doesn't have confidence in the drive at that point, so the 335 Series is designed to shift into read-only mode and then to brick itself when the power is cycled. Despite suffering just one reallocated sector, our sample dutifully followed the script. Data was accessible until a reboot prompted the drive to swallow its virtual cyanide pill.
-=Lothsahn=-
You have to be one of the biggest MORONS I've ever seen giving hopeless advice on this site. Your 'argument' is actually a STATEMENT against the entire concept of CACHES in computer science.
Run out of L0 storage on your CPU design? Whatever you do, do NOT build a L1 cache, because L1 caches are FAR slower memory units, and this means the OBVIOUS solution is to simply build a bigger L0 memory sub-system. Your L1 cache is suffering 'misses'? It isn't big enough, you FOOL- make it BIGGER!!!111!!!!!.
That L2 cache is only 1MB? What are you thinking? God, don't you know how 'cheap' transistors are these days- make it BIGGER man, 16MB- 32MB- hell one day maybe you can get to 1GB.
And what's that about only having 16GB of RAM? MOAR RAM!!11!!!!111!!! MOAR RAM!!111!!!111!! Using an SSD drive as L5 cache? Are you STUPID? Who ever told you that COMPUTER SCIENCE proves the logic of ever larger, ever slower, ever cheaper caches as you move further out from the ALU of the CPU? MOAR RAM!!11!!!!!1!!! That's where the smart money is at.
Can the cretins of Slashdot fall any lower?
It's also a kick in the nuts to people like me, who stuck with Intel drives for the quality. They haven't had the fastest drive since the early days of SSDs, but I've continued to buy Intel SSDs anyway because (barring very few exceptions, like the 8MB bug) they generally have the reputation for being the most reliable. It's also why I've even avoided Intel drives with non-Intel controllers. Granted this particular drive (335) is a model with a SandForce controller, but it sounds to me like the issue here was not in the controller but in the firmware (which, even in SandForce controller drives, is written by Intel).
So, Intel....what a way for you to just kick me in the nuts and leave me lying on the floor. Good way to reward my trust in you.
If I trust the suicide, I suppose the upside would be that the drive can be safely tossed without worrying about the data on it.
This experiment only documents the survivability of the NAND Flash itself, really. I've had two consumer SSDs and at least one SD fail completely for other reasons; they became completely un-usable, not just un-writable. In the case of the SSDs at least, I was told it was due to internal controller failure, meaning the NAND itself was fine but the circuits to control and access it were trashed. I suppose a platter-drive analog to that would be having the platters in mint condition with all data intact but the servo coil melted, or something.
Since I've only owned three consumer SSDs and two of those died from a mode of failure that wasn't even addressed by this experiment, what am I to make of the real value of the results? They certainly have no meaning for me, but YMMV.
So read the SMART attributes of your drives and swap them out when they're getting close to 0 on the relative value for attribute 225, "Host Writes (Incremented by 32MB)". I bet you aren't even down to 95 (from 100) yet. Wear is a non-issue for almost all SSD users.
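For anyone who wants to turn that raw counter into something readable, here is a rough Python sketch. It assumes the raw value of attribute 225 counts ticks of 32 MiB each (reading the "32MB" above as 32 MiB, which may not hold for every vendor), and the function names and sample value are made up for illustration. Getting the raw value out of your drive is left to whatever SMART tool you already use.

```python
# Sketch: convert a raw "Host Writes" tick count into decimal terabytes.
# Assumption: one tick = 32 MiB, per the attribute description quoted above.
MIB = 1024 ** 2
UNIT_BYTES = 32 * MIB


def host_writes_tb(raw_ticks: int) -> float:
    """Total host writes in decimal terabytes (10**12 bytes)."""
    return raw_ticks * UNIT_BYTES / 1e12


def ticks_for_tb(terabytes: float) -> int:
    """Inverse helper: how many ticks correspond to a given number of TB."""
    return round(terabytes * 1e12 / UNIT_BYTES)


if __name__ == "__main__":
    raw = 500_000  # hypothetical raw value, not taken from a real drive
    print(f"{host_writes_tb(raw):.2f} TB written")  # prints "16.78 TB written"
```

Compare the result with the hundreds of terabytes the test drives absorbed and you get a feel for how far away wear-out probably is.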
But superlatives make that sound like an amazingly good thing.
Funny you should mention that:
http://arstechnica.com/gadgets/2013/11/once-great-ssd-manufacturer-ocz-filing-for-bankruptcy/
Get the Samsung 850 Pro series.
It uses the new vertical flash cells.
They have a 10 year warranty.
Can't get that from a spinning drive !
About the best rotating drives you get now are the 1TB, single platter from WD. (Go for the Black series)
Single platter means least moving parts, most reliable.
The 5-platter behemoths needed to reach 4 and 6 TB per drive take a lot of heads and a big motor to keep all those platters spinning.
Who thought this was a good idea?
Probably the same person that had killbots shut down when they reach their preset kill limit. :)
Suddenly a bunch of SSD drives of all types and manufacturers have shown up on ebay. Coincidence? I think not.
If I trust the suicide, I suppose the upside would be that the drive can be safely tossed without worrying about the data on it.
With the proper equipment, I'm sure the data can be recovered. Still best to thoroughly destroy the drive.
Don't try to out weird me, three-eyes. I get stranger things than you, free with my breakfast cereal. --Zaphod Beeblebr
I've had some SSDs last for almost three years, but I would not trust them for important data. They are fine as a cache for speeding up OS access, or for a music player, but a magnetic hard drive is better for professional use.
Some additional info from an earlier article:
According to Intel, this end-of-life behavior generally matches what's supposed to happen. The write errors suggest the 335 Series had entered read-only mode. When the power is cycled in this state, a sort of self-destruct mechanism is triggered, rendering the drive unresponsive. Intel really doesn't want its client SSDs to be used after the flash has exceeded its lifetime spec. The firm's enterprise drives are designed to remain in logical disable mode after the MWI bottoms out, regardless of whether the power is cycled. Those server-focused SSDs will still brick themselves if data integrity can't be verified, though.
SMART functionality is supposed to persist in logical disable mode, so it's unclear what happened to our test subject there. Intel says attempting writes in the read-only state could cause problems, so the fact that Anvil kept trying to push data onto the drive may have been a factor.
All things considered, the 335 Series died in a reasonably graceful, predictable manner. SMART warnings popped up long before write errors occurred, providing plenty of time—and additional write headroom—for users to prepare.
So, it sounds like this is the intended behavior for *enterprise* drives. It may not be the same for *consumer* drives, but that's a bit unclear.
While it may make you feel better if consumer SSDs went into a permanent read-only mode, it seems extremely unlikely that a typical consumer would ever reach this point in an SSD's life at all. So I'm not really losing sleep over my own Intel SSDs bricking themselves when, at a typical consumer write volume, this isn't going to happen anytime in the next century (seriously, look at the volume of data that was written). The drive will be dead from some electronic component failure long before I reach its natural end of write life. Moreover, I'd appear to have plenty of warning and could easily replace it long before that happened.
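To put a number on "the next century", here is a back-of-the-envelope sketch in Python. The 700 TB and 2.4 PB figures are the first and last failure points from the summary; the 10 GB/day write rate is purely my assumption for a typical consumer, and the function name is made up. These are results from six specific test drives, not rated endurance specs.

```python
# Sketch: years of writing needed to reach a given total write volume.
DAYS_PER_YEAR = 365.25


def years_to_reach(endurance_tb: float, gb_per_day: float) -> float:
    """Years until endurance_tb terabytes have been written at gb_per_day."""
    days = endurance_tb * 1000 / gb_per_day  # decimal units: 1 TB = 1000 GB
    return days / DAYS_PER_YEAR


if __name__ == "__main__":
    assumed_gb_per_day = 10  # assumption, not a measured figure
    for label, tb in [("first failure", 700), ("last survivor", 2400)]:
        years = years_to_reach(tb, assumed_gb_per_day)
        print(f"{label}: {tb} TB takes about {years:.0f} years")
```

At that assumed rate the first failure point is roughly 190 years out and the last is over 650, so even a user writing ten times as much would still be looking at decades.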
Irony: Agile development has too much inertia to be abandoned now.
Buy a Corsair Neutron SSD
Slashdot, fix the reply notifications... You won't get away with it...
Why is that parameter given such a cryptic name? I don't have an Intel SSD, but if I saw that parameter in SMART, I'd probably ignore it because at face value it doesn't mean anything.
Call it "% drive life remaining" or something more accurate.
Most NAND flash devices require the use of ECC all the time (they can even generate errors on reads). Even a brand-new IC can have read/write errors and require ECC use. So, while YOU think that the "fix" is to start using ECC, the simple fact is that your SSD has almost certainly been using it all along to produce the "good" behavior you have observed (sheltering you from seeing just how "messy" NAND flash tech actually is). By the time YOU see an error, the embedded controller is already at the end of its ability to keep you from seeing the problems that have been piling up all along. NAND flash-based storage begins re-mapping sectors almost immediately and, depending on the embedded code, may even choose to mislead you about how many sectors are bad and how much empty storage is actually still available.
I designed early NAND flash into embedded systems (which the product treated as a drive) and we needed to write our own custom tools for data validation and recovery (not PC based and not always human-tended/accessible), so we had to pay particular attention to the bad sector mapping, error detection and correction, etc. At that time, various manufacturers published great info on how to use the devices, including recommended ECC, read, write, and remapping algorithms and so on, and we followed all the recommendations very carefully. But because we wrote our own tools and were working that close to the bits, we could see what was actually happening and observe the decay of the devices. Everything people assume would be "extraordinary measures" on a disk drive actually needs to be standard practice with NAND flash just for basic reliability. This is a good technology when used the right way and in appropriate applications, but I believe it's a bad idea to push this onto typical PC users who are clueless about how often and how much they write to disk, how often they are overwriting data, and who have no plan for what to do when files suddenly are unreadable.
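To make the "ECC is always on" point concrete, here is a toy sketch in Python using a Hamming(7,4) code, which silently repairs any single flipped bit in a 7-bit word. This is only an illustration of the idea: real SSD controllers use much stronger codes such as BCH or LDPC over whole pages, and the function names here are my own.

```python
# Toy Hamming(7,4): 4 data bits + 3 parity bits, corrects any single-bit error.
# Codeword layout (1-based positions): p1 p2 d1 p4 d2 d3 d4


def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word: list[int]) -> tuple[list[int], int]:
    """Return (data bits, corrected position); position 0 means no error seen."""
    c = list(word)  # work on a copy so the caller's word is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # syndrome = 1-based index of the flipped bit
    if pos:
        c[pos - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate one cell reading back wrong
    data, fixed_at = decode(stored)
    print(data, "corrected position:", fixed_at)
```

The host only ever sees the corrected data, which is why a drive can look perfectly healthy while the raw error rate underneath keeps climbing.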
Doesn't really matter what it's called. If any one of the attributes drops to zero, the disk is toast, and if you haven't turned SMART warnings off, your computer will tell you that the disk is about to fail. But again, this is a non-issue. You won't see that attribute reach zero, ever, unless you try really hard to exhaust the disk's total write capacity.
This reminds me of the systems they tried to impose on early-'70s US cars: if the seat belts weren't latched, the cars would not start. No, not just an annoying bong for a brief time; the freaking engine would not even turn over. And this being completely analog, system failures happened quite a bit. But fear not! If there was a failure, there was an under-hood button you could push to bypass the system. Once. A one-time use, then you had to tow the car to a dealership for a new box. As most everyone bypassed this insanity, the system was quickly dropped.
Here's hoping Intel also comes to their senses about their one-time recovery window (hopefully with some well-deserved threats from the FTC).
I move the following off my Western Digital Velociraptor SATA II 10,000 rpm 16mb buffered harddisks that are driven off a Promise Ex-8350 128mb ECC ram caching raid sata 1/2 controller (which defers/delays writes via said cache, & also lessens physical head movement on disks & this is where I am going to make it even faster via lessening its workloads, read on & reduces fragmentation as well in the same stroke - "bonus") onto my 4gb DDR2 Gigabyte IRAM PCIExpress ramdisk card 2006-present (& before it, a CENATEK "RocketDrive" 4gb PC-133 SDRAM based one on PCI 2.2 bus circa 2002-2006):
---
A.) Pagefile.sys (partition #1 1gb size, rest is on 3gb partition next - this I didn't do on software ramdrives though)
B.) OS & App level logging (EventLogs + App Logging)
C.) WebBrowser caches, histories, sessions & browsers too
D.) Print Spooling
E.) %Temp% ops (OS & user level temp ops environmental variable values alterations)
F.) %Tmp% ops (OS & user level temp ops environmental variable values alterations)
G.) %Comspec% (command interpreter cmd.exe in this case, & in DOS/Win9x years before, command.com also)
H.) Lastly - I also place my custom hosts file onto it, via redirecting where it's referenced by the OS, here in the registry (for performance AND security):
HKLM\system\CurrentControlSet\services\Tcpip\Parameters
(Specifically altering the "DataBasePath" parameter there which also acts more-or-less, like a *NIX shadow password system also!)
---
* All of which lessen the amount of work my "main" OS & programs slower mechanical hard disks have to do, "speeding them up" by lessening their workload, fragmentation, and speeding up access/seek latency for the things in the list above too.
Since 1992 or so I've done this, 1st using separate HDDs (slower seek/access by FAR) & then using software ramdisks per the list below (on a MS-DDK based one I wrote in fact, on how I apply them) & FINALLY, using hardware based RamDrives (using PCI-133 SDRAM or DDR-2 RAM):
Then applying Software-Based Ramdrives to database work with EEC Systems/SuperSpeed.com on paid contract (which did me VERY WELL @ both Windows IT Pro magazine in reviews, & also MS TechEd 2000-2002 in its hardest category: SQLServer Performance Enhancement & SuperSpeed.com too - since I improved their wares efficacy by up to 40% via programmatic control & tuning programs for them) - which, only the past few years now it seems, OTHERS are finally "latching onto" for performance purposes in database work in industrial environs! The EEC/SuperSpeed.com unit had 1 great thing going for it - mirroring back to HDD to save state of data!)
APK
P.S.=> HDD's concentrate on program &/or data fetches that are still hdd bound (& not kernelmode diskcaching subsystem cached in 4gb of DDR3 system ram here either yet) done on a media that has no heads to move, & thus, more mechanical latency + slower seek/access as you get on hard disks + reduced filesystem fragmentations due to that all, also & it works!
... apk
I know this is slightly off-topic, but I found this surprising. About twenty years ago my main PC was a 66 MHz machine with 8 MB of RAM running Windows 95. I was learning to use the 3D graphics program "trueSpace" and I created a scene that was 11 MB when saved to the hard drive as a wireframe. When I tried to render the scene, the hard drive thrashed for ten hours straight, and the scene was still only halfway rendered. Later, I bought 16 MB of RAM (for ~$400, if I recall[!]), bringing my total up to 24 MB. That same scene rendered completely in twenty minutes. That was a fascinating lesson.