Flash Destroyer Tests Limit of Solid State Storage
An anonymous reader writes "We all know that flash and other types of solid state storage can only endure a limited number of write cycles. The open source Flash Destroyer prototype explores that limit by writing and verifying a solid state storage chip until it dies. The total write-verify cycle count is shown on a display — watch a live video feed and guess when the first chip will die. This project was inspired by the inevitable comments about flash longevity on every Slashdot SSD story. Design files and source are available at Google Code."
It'll be nice to get some third-party data on exactly how long these things last on average.
a live stream linked on slashdot.. ouch..
Flash! Aa-aaahhh!!
Wait, which flash are we talking about here?
I was expecting something cool, like storing a picture, displaying it, and then constantly XORing each pixel with some random number twice, repeatedly, and watching the image decay over time. Although it would appear that it'd need quite a lot of time.
article says: We used a Microchip 24AA01-I/P 128byte I2C EEPROM (IC2), rated for 1million write cycles.
Um, SSDs don't use anything like this part as their storage.
The fact that you said this shows you spend way to much time on slashdot. The fact that I recognized it, and was one of the first posters in the thread you refer to says the same about me. I wonder if I can find a life for sale on craigslist?
Link to thread in question...
http://ask.slashdot.org/story/10/05/23/1547202/Scientific-RampD-At-Home
Now, to see how much explosives it takes to MAKE it fail!
This is my favorite part! :-)
Any technology distinguishable from magic is insufficiently advanced. - Geek's corollary to Clarke's law
Most modern flash memories have their controllers check which blocks are dying or dead and re-route write and read requests to good blocks. So while your flash may seem to be working perfectly well, various blocks inside it may be dying and its storage size may be progressively decreasing.
So I hope they are rewriting the entire flash in their test. Otherwise it is not representative.
More importantly, the test pattern does not resemble normal SSD usage. Complete writes are very unusual for SSD and a cycle is not completed nearly as quickly as a cycle on this EEPROM (400 cycles per minute). When an SSD is written to in normal usage, a wear leveling algorithm distributes the data and avoids writing to the same physical blocks again and again. The German computer magazine C't has run continuous write tests with USB sticks and never managed to destroy even a single visible block on a stick that way. The first test (4 years ago) wrote the same block more than 16 million times before they gave up. The second test (2 years ago) wrote the full capacity over and over again. The 2GB stick did not show any signs of wear after more than 23TB written to it.
Okay, I'll bite. Let me introduce you to this thing called "functional equivalence". You do realize that even though they are all "nonvolatile storage," there is a difference between EEPROM and Flash, and that there are many different kinds of low- and high-density Flash and they all have different proprietary silicon designs with different characteristics?
Microchip EEPROMs are specifically designed for low-density, high-reliability applications, and are totally different at the transistor level from high-density MLC Flash used in solid state disks.
But it will be *over nine thousand!!!*
They could add an extra digit to the front of the display showing how many times the other numbers have reached their maximum! Brilliant, 10x the capacity for only one digit more!
They're testing an EEPROM: while the underlining physics of storing data in an EEPROM and Flash RAM are the same - floating gate transistors - EEPROMs use best-of-breed implementations, single-bit addressable floating gate, while the Flash RAM found in SSDs is the cheapest, lest enduring MLC NAND. MLC NAND are the cheapest per bit, and have a write cycle endurance of two to three orders of magnitude lower than EEPROMs.
SSDs do not contain EEPROMs. They don't even contain SLC (NOR or NAND). In fact, SSDs don't even contain NOR MLCs. Only the cheapest will do, for SSDs.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
(Being a software guy rather than a Flash memory guy, I wouldn't want to guess whether over-erased cells would be at logic 1, logic 0 or a mix of the two.)
Well I'm not an expert on flash, but I know a little about how they work. In NOR flash the data line is pulled up to one, so that's the default state for any bit. There's a transistor connected to ground, and if the floating gate has a charge in it and the transistor is on, then it pulls the data line down to 0. "Erasing" a NOR flash sets all the bits to 1, and programming it sets certain bits to 0.
The most common failure mode as I understand it is that electrons get trapped in the floating gate even after erase cycles such that it's very close to or over Vt for the transistor, so that bit would be stuck in the "programmed" state of logic 0.
NAND memory is the opposite, the erased state is 0 and the programmed state is 1, so a permanently charged floating gate should result in a stuck-at-1 fault.
Which, relating to the OP's question, means either way the memory wouldn't be good for much of anything. Your NAND SSD is going to fail during an erase-program (aka "write") cycle, and except in the extremely unlikely case that the pattern you were writing did not involve changing any previously stored 1s to 0s on stuck bits, then the result is going to be wrong. You could read it, but you'd be reading the wrong data.
The enemies of Democracy are
Graceful as in data not related to your recent failed writes are still readable so they can be backed up and migrated to a new drive. Not sure why that concept is so difficult. I consider something dead as "completely unreadable, ALL your data has been destroyed - have a nice day."
No longer reliable but still semi recoverable isn't quite "dead."
Maybe I'm just using a stricter interpretation of the word dead than you are?
Let's use a marker on a white board analogy. If I was storing all my data on a suitably large white board using a marker and I completely exhausted my marker's supply of ink, I'd be pissed if this resulted in a blank whiteboard, wouldn't you? On that same note, if I wiped a small section of my whiteboard with the intent of writing something new in that area and only then realized that my marker was no longer suitably supplied with ink and my write failed, I would find the blank void in that section alone acceptable.
Does that clarify things?
...will the Flash Destroyer hold up under this load?
http://www.bynarystudio.com
This is what I got from a 2GB Kingston Flash Key. After that there were errors in almost all overwrites. However the real kicker is that while the key read back wrong data, there never ever was any error reported. Since doing that beginning of 2009, I do not rust USB Flash anymore.
Set-up: Linux, 1MB random data replicated to fill the chip, then read back to compare. Repeat with new random data. I had one isolated faulty read-back around 3500 cycles and then from arounf 3700 cycles 90% (and pretty soon 100%) faulty read-backs. Language was Python, no errors for the device on STDERR, or the systemlogs. And I looked carefully.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I am working on flash write/erase cycling right now in my day job and I can tell you that this is not a very good test. Temperature affects cycling endurance (and this is reflected in the spec), so if your SSD is 20-30C higher than room temp it's going to make a difference. Fowler-Nordheim tunneling (which NAND flash uses for program and erase) is hardest at cold temperatures, so the first operation after powerup might be the worst case in a PC. (Yes, I know they're not using an SSD here, but they are doing their cycling at room temp.)
Another thing to keep in mind is that continuous cycling is not realistic. The wear-out mechanism here is charge trap-up, where electrons get stuck in the floating gate oxide and repel other electrons, slowing down program and erase. Over time, thermal energy lets the electrons detrap. So irregular usage in a hot PC should actually be nicer environment for endurance.
A final factor is process variation, which can only be covered by using a large sample size (>100) and/or using units from separate lots with known characteristics, none of which an end user will likely have access to. Even that doesn't tell you anything about the defect rate.
There are really two types of tests that people are talking about here. The first is a spec compliance test, which uses the extreme conditions I mentioned above to guarantee that all units will have the spec endurance under all spec conditions. This should be done by the manufacturer. The second is a real world usage test, which will only give realistic results if done under actual use conditions. The number you get from the article's test probably won't tell you much.
[Disclaimer: I work on embedded NOR flash, not NAND, but the bits are the same and the article's talking about EEPROM so I figure I can butt in.]
Visit the