Flash Destroyer Tests Limit of Solid State Storage
An anonymous reader writes "We all know that flash and other types of solid state storage can only endure a limited number of write cycles. The open source Flash Destroyer prototype explores that limit by writing and verifying a solid state storage chip until it dies. The total write-verify cycle count is shown on a display — watch a live video feed and guess when the first chip will die. This project was inspired by the inevitable comments about flash longevity on every Slashdot SSD story. Design files and source are available at Google Code."
a live stream linked on slashdot.. ouch..
Wait, which flash are we talking about here?
I was expecting something cool, like storing a picture, displaying it, and then constantly XORing each pixel with some random number twice, repeatedly, and watching the image decay over time. Although it would appear that it'd need quite a lot of time.
The article says: "We used a Microchip 24AA01-I/P 128-byte I2C EEPROM (IC2), rated for 1 million write cycles."
Um, SSDs don't use anything like this part as their storage.
More importantly, the test pattern does not resemble normal SSD usage. Complete writes are very unusual for an SSD, and a cycle on an SSD takes far longer to complete than a cycle on this EEPROM (400 cycles per minute). When an SSD is written to in normal usage, a wear leveling algorithm distributes the data and avoids writing to the same physical blocks again and again. The German computer magazine C't has run continuous write tests with USB sticks and never managed to destroy even a single visible block on a stick that way. The first test (4 years ago) wrote the same block more than 16 million times before they gave up. The second test (2 years ago) wrote the full capacity over and over again. The 2GB stick did not show any signs of wear after more than 23TB had been written to it.
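For a sense of scale, here is a back-of-the-envelope sketch of how long the EEPROM takes to reach its rated endurance, using only the two figures quoted in this thread (1 million rated cycles, 400 cycles per minute). The rating is a datasheet figure, so the chip may well keep going past this point; the function name is just for illustration.

```python
# Figures quoted in the thread, not measured here.
RATED_CYCLES = 1_000_000     # rated write cycles quoted from the article
CYCLES_PER_MINUTE = 400      # write-verify rate quoted in the comment above


def minutes_to_reach(cycles: int, rate_per_minute: float) -> float:
    """Minutes needed to perform `cycles` writes at a constant rate."""
    return cycles / rate_per_minute


minutes = minutes_to_reach(RATED_CYCLES, CYCLES_PER_MINUTE)
hours = minutes / 60
print(f"{minutes:.0f} minutes, about {hours:.1f} hours")
```

At that rate the rated cycle count is reached in under two days of continuous writing.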
Yeah, the title seems misleading, since they're writing and verifying data on an EEPROM, which is not used in solid state drives last time I checked.
If there's anything more important than my ego around here, I want it caught and shot immediately.
Or connect the drive inside any computer running a Prescott P4 with 100% CPU utilization.
Tequila: It's not just for breakfast anymore!
If you have any important data on that drive, urine trouble...
They could add an extra digit to the front of the display showing how many times the other numbers have reached their maximum! Brilliant, 10x the capacity for only one digit more!
They're testing an EEPROM. While the underlying physics of storing data in an EEPROM and in Flash RAM is the same (floating gate transistors), EEPROMs use best-of-breed implementations, single-bit addressable floating gate, while the Flash RAM found in SSDs is the cheapest, least enduring MLC NAND. MLC NAND is the cheapest per bit and has a write cycle endurance two to three orders of magnitude lower than EEPROMs.
SSDs do not contain EEPROMs. They don't even contain SLC (NOR or NAND). In fact, SSDs don't even contain NOR MLCs. Only the cheapest will do, for SSDs.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Here is work from the academic community exploring error rates, latencies and some other factors. It compares 11 NAND flash chips (both SLC and MLC) from 5 manufacturers: http://nvsl.ucsd.edu/ftest.html
Actually, I believe *you* are incorrect. Different AC here, but I had to respond because your response doesn't match what I understand to be the case as an engineer working with vendors selecting NAND flash for use in consumer devices. I'll be interested to see if I'm incorrect or if this even gets read as an AC post.
Specifically, it doesn't matter to the flash device if the host has written a sector and never touched it again: that sector *will* be moved once it's been read enough times that the ECC indicates it's likely to become unreadable soon. This is called read disturbance, and it can happen surprisingly frequently with MLC cells in small process sizes (i.e. at sufficient density to make multi-GB modules). It also happens on SLC devices, but to a lesser extent, because they can cope with more voltage decay per bit and still read the bit correctly. Even the simplest block-access controllers do this, because otherwise you wouldn't be able to read your own data back more than a few hundred times. In fact, if you wish to get technical about it, it also depends heavily on the temperature of the module when the data was originally written, since this directly affects the number of electrons that can be stored.
In addition to moving data to counter read disturbance, most controllers (even the very simple ones in SD Cards & eMMC devices) will move sectors (actually not filesystem sectors, but individual blocks although the distinction isn't important here) around in order to optimise wear across the entire device even if the content hasn't changed. If you think about it, this has to happen at some level even without wear levelling since the sector is massively smaller than the superblock size for most of the densities we have available today - it's not unusual to see a device with an erase block size of 256KB, which is normally way larger than a sector.
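To put the size mismatch in numbers, here is a small sketch. The 256KB erase block comes from the comment above; the two sector sizes (512 bytes and 4096 bytes) are my assumptions, since the comment only says a sector is much smaller.

```python
ERASE_BLOCK_BYTES = 256 * 1024   # erase block size quoted in the comment


def sectors_per_erase_block(sector_bytes: int) -> int:
    """How many host sectors fit in one erase block."""
    return ERASE_BLOCK_BYTES // sector_bytes


# Assumed sector sizes; the thread does not state one.
for sector_bytes in (512, 4096):
    count = sectors_per_erase_block(sector_bytes)
    print(f"{sector_bytes}-byte sectors per erase block: {count}")
```

With 512-byte sectors, one erase block holds 512 of them, so erasing a block to update a single sector in place would disturb 511 neighbours. That is the reason data has to move around even before wear leveling enters the picture.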
I don't know much about SSD controllers, they're far too expensive for our devices, but they can't possibly work the way you think they do - not if they use the same raw NAND that is used for other block storage abstractions.
Cause wear leveling only picks another sector to write to from among the unused sectors. Simplified, if your drive is 80% full, you write to the same sectors five times as often.
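The "five times" figure follows from the simplified model in this comment, where every write lands only on the unused share of the drive. A minimal sketch of that arithmetic (the function name is made up for illustration, and this is the simplified model, not how any particular controller behaves):

```python
def wear_multiplier(percent_full: int) -> float:
    """Relative write load per unused sector when writes can only
    land on the unused share of the drive (simplified model)."""
    if not 0 <= percent_full < 100:
        raise ValueError("percent_full must be in the range 0..99")
    return 100 / (100 - percent_full)


for percent_full in (0, 50, 80, 90):
    print(f"{percent_full}% full -> {wear_multiplier(percent_full):.0f}x")
```

At 80% full only a fifth of the sectors take writes, so each of them sees five times the load it would on an empty drive.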
Especially because once blocks start failing, other blocks start failing too, at an accelerating rate, and they rapidly reach a state of being completely unusable.
That's a contradiction. If the wear-leveling algorithm was ineffective then you'd have a relatively constant rate of block failure. A good wear-leveling algorithm ensures you won't get a significant number of block failures until almost every block has been worn out. Then you get a bunch. So the behavior described is failing exactly as intended, and indicates the wear-leveling algorithm worked almost perfectly.
But you're right in that a wear algorithm that only uses free space would be terrible. That's one reason no device uses one like that. The primary reason, though, is that the SSD has no idea which blocks are in use and which are free, unless it is told via the TRIM command (later generation SSDs with newer OSes). The filesystem knows, but an SSD is filesystem agnostic. Moving data is the cause behind the performance drop-off when the drive runs out of unused/un-TRIM'd blocks.
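A toy simulation makes the difference visible. This is only a sketch under loud assumptions: ten blocks, eight of them holding data that never changes ("cold"), one piece of hot data rewritten 1,000 times, and a deliberately aggressive relocation rule that I made up for illustration. Real controllers use thresholds and are far more careful about the extra erases.

```python
def free_space_only(num_blocks, cold_blocks, writes):
    """Rotate writes over the blocks that hold no cold data."""
    erases = [0] * num_blocks
    pool = range(cold_blocks, num_blocks)
    for _ in range(writes):
        target = min(pool, key=lambda b: erases[b])
        erases[target] += 1
    return erases


def with_cold_data_relocation(num_blocks, cold_blocks, writes):
    """Always write to the least-worn block, moving cold data out of
    the way first (toy rule, not any real controller's algorithm)."""
    erases = [0] * num_blocks
    is_cold = [b < cold_blocks for b in range(num_blocks)]
    relocations = 0
    for _ in range(writes):
        target = min(range(num_blocks), key=lambda b: erases[b])
        if is_cold[target]:
            # Park the cold data on the most-worn block that holds none.
            dest = max((b for b in range(num_blocks) if not is_cold[b]),
                       key=lambda b: erases[b])
            erases[dest] += 1
            is_cold[dest], is_cold[target] = True, False
            relocations += 1
        erases[target] += 1
    return erases, relocations


simple = free_space_only(10, 8, 1000)
leveled, moves = with_cold_data_relocation(10, 8, 1000)
print("free space only: max", max(simple), "min", min(simple))
print("with relocation: max", max(leveled), "min", min(leveled),
      "relocations", moves)
```

In the free-space-only case two blocks absorb every write while the other eight are never erased. With relocation the erase counts stay within a couple of cycles of each other, at the price of extra erases spent moving cold data, which is the data movement blamed above for the performance drop-off.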
Personally, I have the cheapest, buggiest SSD in common knowledge (the one that can get bogged down to 4 IOPS), and it has worked beautifully for me. Just checking a diagnostic tool, in the past two years I've power cycled it 5,666 times (which probably explains why I kill HDDs so quickly), the average block has been erased 7,333 times, and no block has been erased more than 7,442 times. I've got zero ECC failures. Honestly, I'm a little surprised I've written 234 TB of data to my poor 32 GB drive, but my usage is a bit heavy (~10 complete Gentoo compiles with countless updating, ~5 DISM'd Windows 7 installs, ~5 DISM'd Vista installs, ~30 Haiku installs, ~20 SVNs of 10 GB projects, and a good amount of downloading).
But, in my experience, the wear leveling algorithm is only ~3% away from being "perfect".
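The figures in that comment can be cross-checked with a little arithmetic. This sketch assumes decimal units (1 TB = 1000 GB) and that the erase counters cover the whole 32 GB; how the commenter arrived at the ~3% figure is not stated, so the spread computed here is just one possible measure.

```python
# Figures quoted in the comment above.
TOTAL_WRITTEN_TB = 234
CAPACITY_GB = 32
AVG_ERASES = 7_333
MAX_ERASES = 7_442

# Assumes decimal units: 1 TB = 1000 GB.
full_drive_writes = TOTAL_WRITTEN_TB * 1000 / CAPACITY_GB

# How far the most-worn block sits above the average, in percent.
max_over_avg_pct = (MAX_ERASES - AVG_ERASES) / AVG_ERASES * 100

print(f"full-drive writes: {full_drive_writes}")
print(f"max erase count above average: {max_over_avg_pct:.2f}%")
```

The number of full-drive writes lands close to the reported average erase count, which is roughly what you'd expect if the writes were spread evenly over all blocks.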