Flash Destroyer Tests Limit of Solid State Storage

Posted by timothy on Thursday May 27, 2010 @08:02AM from the step-right-up-place-your-bets dept.

An anonymous reader writes "We all know that flash and other types of solid state storage can only endure a limited number of write cycles. The open source Flash Destroyer prototype explores that limit by writing and verifying a solid state storage chip until it dies. The total write-verify cycle count is shown on a display — watch a live video feed and guess when the first chip will die. This project was inspired by the inevitable comments about flash longevity on every Slashdot SSD story. Design files and source are available at Google Code."
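
The write-and-verify loop described in the summary can be sketched in a few lines. This is only an illustration, not the project's firmware: `SimulatedEeprom`, its per-cell endurance numbers, and the `destroy` helper are all hypothetical, and the endurance is scaled far below the real chip's rating so the demo finishes quickly.

```python
import random


class SimulatedEeprom:
    """Hypothetical 128-byte EEPROM model: each cell stops accepting writes after a limit."""

    def __init__(self, size=128, mean_endurance=2_000, seed=0):
        rng = random.Random(seed)
        self.data = bytearray(size)
        # Assumed per-cell endurance, scaled down for a fast demo.
        self.limit = [int(rng.gauss(mean_endurance, mean_endurance * 0.1))
                      for _ in range(size)]
        self.writes = [0] * size

    def write(self, addr, value):
        self.writes[addr] += 1
        if self.writes[addr] <= self.limit[addr]:
            self.data[addr] = value
        # A worn-out cell silently keeps its old value.

    def read(self, addr):
        return self.data[addr]


def destroy(chip, size=128):
    """Write and verify the whole chip until a verify fails; return completed cycles."""
    cycles = 0
    while True:
        pattern = cycles & 0xFF  # changes every cycle, so a stuck cell is detected
        for addr in range(size):
            chip.write(addr, pattern)
        for addr in range(size):
            if chip.read(addr) != pattern:
                return cycles
        cycles += 1


chip = SimulatedEeprom()
cycle_count = destroy(chip)
```

In this model the loop stops at the endurance of the weakest cell, which is why a single chip says little about the population it came from.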

31 of 229 comments

live stream by Anonymous Coward · 2010-05-27 08:07 · Score: 5, Funny

a live stream linked on slashdot.. ouch..
1. Re:live stream by kipin · 2010-05-27 08:30 · Score: 4, Informative
  
  http://torrentstream.org/
  
  Works pretty well actually.
  
  --
  If I can not smoke in heaven, then I shall not go. -- Mark Twain
2. Re:live stream by TooMuchToDo · 2010-05-27 08:31 · Score: 4, Informative
  
  You've just described what multicast was designed to solve.
  https://www.cisco.com/en/US/products/ps6552/products_ios_technology_home.html
Subject here by Anonymous Coward · 2010-05-27 08:10 · Score: 4, Funny

Flash! Aa-aaahhh!!
1. Re:Subject here by Chris+Burke · 2010-05-27 08:29 · Score: 4, Funny
  
  Now do that a million more times and we'll see if you wear out. Don't forget to include the live video feed.
  
  --
  
  The enemies of Democracy are
Die Flash, DIE! by Anonymous Coward · 2010-05-27 08:11 · Score: 5, Funny

Wait, which flash are we talking about here?
dull by Threni · 2010-05-27 08:14 · Score: 5, Funny

I was expecting something cool, like storing a picture, displaying it, and then constantly XORing each pixel with some random number twice, repeatedly, and watching the image decay over time. Although it would appear that it'd need quite a lot of time.
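
For what it's worth, the idea can be sketched like this, assuming "twice" means XORing with the same random value both times. On healthy memory the two XORs cancel, so any visible decay would have to come from writes that failed to stick. The function name and the tiny four-pixel "image" are made up for illustration.

```python
import random


def xor_twice(pixels, rng):
    """XOR every pixel with a random byte, then XOR again with the same byte."""
    out = list(pixels)
    for i in range(len(out)):
        key = rng.randrange(256)
        out[i] ^= key  # first XOR scrambles the pixel
        out[i] ^= key  # second XOR with the same key restores it
    return out


image = [0, 64, 128, 255]
result = xor_twice(image, random.Random(1))
```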
SSD's? no. by hypethetica · 2010-05-27 08:16 · Score: 5, Informative

article says: We used a Microchip 24AA01-I/P 128byte I2C EEPROM (IC2), rated for 1million write cycles.
Um, SSDs don't use anything like this part as their storage.
Re:Interesting! by mantis2009 · 2010-05-27 08:16 · Score: 4, Informative

Just checked out the video feed. The chip already lasted longer than 1 million writes, which is the number of writes the chip is supposed to last over its lifetime. As of this writing, the chip has survived more than 1,600,000 write cycles and counting.

Still, since this test isn't on an actual, shipping solid state drive (SSD) product, the results will be discounted by a lot of critics.
Myth Busters by PSaltyDS · 2010-05-27 08:23 · Score: 4, Funny

Now, to see how much explosives it takes to MAKE it fail!
This is my favorite part! :-)

--
Any technology distinguishable from magic is insufficiently advanced. - Geek's corollary to Clarke's law
Re:Interesting! by Smallpond · 2010-05-27 08:30 · Score: 4, Insightful

Mechanical disks have lots of great failure modes. You can do seek tests until the arm breaks or voice coil fails, you can do write/read tests until you get enough bad sectors that they can't recover the data any more, or you can do start-stop of the drive motor until it dies. Another good one is to stop the motor for a while, then see if it starts up or has stiction (sic), but that test takes a long time. If the drive is not held rigidly enough, vibration will kill it, and if it isn't cooled properly, heat will kill it. Did I miss any?
Re:SSD's? no. by Anonymous Coward · 2010-05-27 08:34 · Score: 5, Informative

More importantly, the test pattern does not resemble normal SSD usage. Complete writes are very unusual for SSD and a cycle is not completed nearly as quickly as a cycle on this EEPROM (400 cycles per minute). When an SSD is written to in normal usage, a wear leveling algorithm distributes the data and avoids writing to the same physical blocks again and again. The German computer magazine C't has run continuous write tests with USB sticks and never managed to destroy even a single visible block on a stick that way. The first test (4 years ago) wrote the same block more than 16 million times before they gave up. The second test (2 years ago) wrote the full capacity over and over again. The 2GB stick did not show any signs of wear after more than 23TB written to it.
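
A quick back-of-the-envelope check of the figures in this comment, as a sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and takes the 1 million cycle rating from the part quoted earlier in the thread; the variable names are just for illustration.

```python
TB = 10**12  # assumed decimal terabyte
GB = 10**9   # assumed decimal gigabyte

# 23 TB written to a 2 GB stick, expressed as full-capacity overwrites.
full_overwrites = (23 * TB) / (2 * GB)

# Time for the EEPROM test to reach its rated 1 million cycles at 400 cycles per minute.
minutes_to_rated = 1_000_000 / 400
hours_to_rated = minutes_to_rated / 60
```

So the stick survived roughly 11,500 full overwrites, while the EEPROM rig passes its rated endurance in under two days.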
Re:Interesting! by jellomizer · 2010-05-27 08:36 · Score: 4, Interesting

I would like to see a comparison with a mechanical drive doing the same thing in parallel.
While solid state has a theoretically limited number of writes compared with the mechanical drive, it would be interesting to see what the real world has to offer.

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re:SSD's? no. by robot256 · 2010-05-27 08:39 · Score: 4, Informative

Okay, I'll bite. Let me introduce you to this thing called "functional equivalence". You do realize that even though they are all "nonvolatile storage," there is a difference between EEPROM and Flash, and that there are many different kinds of low- and high-density Flash and they all have different proprietary silicon designs with different characteristics?
Microchip EEPROMs are specifically designed for low-density, high-reliability applications, and are totally different at the transistor level from high-density MLC Flash used in solid state disks.
Re:Interesting! by Dancindan84 · 2010-05-27 08:40 · Score: 4, Insightful

And honestly it's a pretty valid argument. This is definitely going to be informative, but I'm just as interested in how a particular SSD handles the flash blocks failing as when they fail. A SSD with flash that averages 1,000,000 writes before blocks start to fail but does it gracefully with little/no data loss could be better than one that averages 2,000,000 but goes out in a blaze of glory as soon as the first block fails.

--
"Always forgive your enemies; nothing annoys them so much." - Oscar Wilde
Re:Interesting! by Kindgott · 2010-05-27 08:42 · Score: 5, Informative

Yeah, the title seems misleading, since they're writing and verifying data on an EEPROM, which is not used in solid state drives last time I checked.

--
If there's anything more important than my ego around here, I want it caught and shot immediately.
Re:Interesting! by Pharmboy · 2010-05-27 08:46 · Score: 5, Insightful

Or connect the drive inside any computer running a Prescott P4 with 100% CPU utilization.

--
Tequila: It's not just for breakfast anymore!
Re:Interesting! by D+Ninja · 2010-05-27 08:51 · Score: 5, Funny

If you have any important data on that drive, urine trouble...
Re:Interesting! by InsaneProcessor · 2010-05-27 08:52 · Score: 4, Informative

I find this "not very interesting" RTFA. This is not a flash destroyer. It is an EEPROM destroyer. NOT THE SAME THING AND NOT USEFUL!

--

Atheism is a religion like not collecting stamps is a hobby.
I know by billlava · 2010-05-27 08:53 · Score: 5, Funny

They could add an extra digit to the front of the display showing how many times the other numbers have reached their maximum! Brilliant, 10x the capacity for only one digit more!
Apples and hippos by blind+biker · 2010-05-27 08:53 · Score: 5, Informative

They're testing an EEPROM: while the underlying physics of storing data in an EEPROM and Flash RAM are the same - floating gate transistors - EEPROMs use best-of-breed implementations, single-bit addressable floating gate, while the Flash RAM found in SSDs is the cheapest, least enduring MLC NAND. MLC NAND are the cheapest per bit, and have a write cycle endurance two to three orders of magnitude lower than EEPROMs.
SSDs do not contain EEPROMs. They don't even contain SLC (NOR or NAND). In fact, SSDs don't even contain NOR MLCs. Only the cheapest will do, for SSDs.

--
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Re:Interesting! by TeknoHog · 2010-05-27 09:13 · Score: 4, Interesting

I'm just curious, why use sic in your own posts? Wouldn't you just correct whatever you are sic-ing?
IMHO, this kind of use of [sic] is perfectly valid. It means "this is not a typo, it's really how it is spelled" (literally "thus"). In this case it refers to an unusual word that may look like a misspelling of a more common word. However, it can also refer to a genuine misspelling, when you are referring to what somebody else wrote.

--
Escher was the first MC and Giger invented the HR department.
Re:Huh? by Denis+Lemire · 2010-05-27 09:17 · Score: 4, Insightful

Graceful as in data not related to your recent failed writes are still readable so they can be backed up and migrated to a new drive. Not sure why that concept is so difficult. I consider something dead as "completely unreadable, ALL your data has been destroyed - have a nice day."
No longer reliable but still semi recoverable isn't quite "dead."
Maybe I'm just using a stricter interpretation of the word dead than you are?
Let's use a marker on a white board analogy. If I was storing all my data on a suitably large white board using a marker and I completely exhausted my marker's supply of ink, I'd be pissed if this resulted in a blank whiteboard, wouldn't you? On that same note, if I wiped a small section of my whiteboard with the intent of writing something new in that area and only then realized that my marker was no longer suitably supplied with ink and my write failed, I would find the blank void in that section alone acceptable.
Does that clarify things?
Re:Interesting! by Chris+Burke · 2010-05-27 09:20 · Score: 4, Funny

A SSD with flash that averages 1,000,000 writes before blocks start to fail but does it gracefully with little/no data loss could be better than one that averages 2,000,000 but goes out in a blaze of glory as soon as the first block fails.
That depends on how you define "better", and for my personal definition, it depends on exactly how glorious a blaze it is. :)

--

The enemies of Democracy are
Re:Interesting! by lauragrupp · 2010-05-27 09:33 · Score: 5, Informative

Here is work from the academic community exploring error rates, latencies and some other factors. It compares 11 NAND flash chips (both SLC and MLC) from 5 manufacturers: http://nvsl.ucsd.edu/ftest.html
Re:Interesting! by networkBoy · 2010-05-27 11:07 · Score: 4, Informative

And in fact, the more advanced wear leveling algorithms do this already. There are spare blocks specifically such that the data can be moved, then the old block that was not used can be freed.

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:Interesting! by networkBoy · 2010-05-27 11:11 · Score: 4, Informative

In fact, they are read back. At the flash component level.
The flash cell is a charged gate. When a cell is programmed, the uC in the flash device compares the charge state with a reference voltage. Not enough? Add more charge. Still not enough? Cell is bad, mark it (block level, so you lose xx bits for one bad one) and move on.
This is fairly high level and not exactly how it works, but close enough.
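
The program-and-verify loop described above can be sketched roughly as follows. This is a toy model, not real controller firmware: the charge units, the `reference` threshold, the `max_pulses` limit, and both function names are assumptions made for illustration.

```python
def program_cell(charge_per_pulse, reference=1.0, max_pulses=4):
    """Apply charge pulses until the cell reads at or above the reference, or give up."""
    charge = 0.0
    for pulse in range(1, max_pulses + 1):
        charge += charge_per_pulse   # add more charge
        if charge >= reference:      # compare against the reference level
            return True, pulse
    return False, max_pulses         # still not enough: treat the cell as bad


def program_block(cells, bad_blocks, block_id):
    """Program every cell in a block; one bad cell retires the whole block."""
    for charge_per_pulse in cells:
        ok, _ = program_cell(charge_per_pulse)
        if not ok:
            bad_blocks.add(block_id)
            return False
    return True


bad = set()
healthy = program_block([0.5, 0.6], bad, block_id=0)
worn = program_block([0.5, 0.1], bad, block_id=1)
```

Here a worn cell is modeled as one that gains less charge per pulse, so it runs out of retries and takes its whole block with it.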

--
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Re:Interesting! by Anonymous Coward · 2010-05-27 11:39 · Score: 5, Informative

Actually, I believe *you* are incorrect. Different AC here, but I had to respond because your response doesn't match what I understand to be the case as an engineer working with vendors selecting NAND flash for use in consumer devices. I'll be interested to see if I'm incorrect or if this even gets read as an AC post.
Specifically, it doesn't matter to the flash device if the host has written a sector and never touched it again; that sector *will* be moved when it's been read enough times that the ECC indicates it's likely to become unreadable soon. This is called read disturbance and it can happen surprisingly frequently with MLC cells in small process sizes (i.e. at sufficient density to make multi-GB modules). It also happens on SLC devices but to a lesser extent because they can cope with more voltage decay per bit and still be able to read the bit correctly. This relocation is done as a function of even the simplest block-access controllers because otherwise you wouldn't be able to read your own data back more than a few hundred times. In fact, if you wish to get technical about it, it also has a massive dependency upon the temperature the module is at when the data was originally written since this directly impacts the number of electrons which can be stored.
In addition to moving data to counter read disturbance, most controllers (even the very simple ones in SD Cards & eMMC devices) will move sectors (actually not filesystem sectors, but individual blocks although the distinction isn't important here) around in order to optimise wear across the entire device even if the content hasn't changed. If you think about it, this has to happen at some level even without wear levelling since the sector is massively smaller than the superblock size for most of the densities we have available today - it's not unusual to see a device with an erase block size of 256KB, which is normally way larger than a sector.
I don't know much about SSD controllers, they're far too expensive for our devices, but they can't possibly work the way you think they do - not if they use the same raw NAND that is used for other block storage abstractions.
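
Two of the points above, sketched with made-up numbers. The 256KB erase block comes from the comment; the 512-byte sector, the ECC capacity, and the 50% margin are assumptions, and `needs_relocation` is a hypothetical helper rather than any real controller's policy.

```python
ERASE_BLOCK = 256 * 1024  # bytes, the erase block size mentioned above
SECTOR = 512              # bytes, assumed host sector size

# How many host sectors share one erase block.
sectors_per_erase_block = ERASE_BLOCK // SECTOR


def needs_relocation(corrected_bits, ecc_capacity, margin=0.5):
    """True once ECC is correcting more than `margin` of what it can handle."""
    return corrected_bits > ecc_capacity * margin


fresh = needs_relocation(corrected_bits=3, ecc_capacity=8)
disturbed = needs_relocation(corrected_bits=5, ecc_capacity=8)
```

With these numbers, rewriting one sector in place would mean erasing a block that holds 511 others, which is why data has to move around even without wear levelling.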
Re:Interesting! by Bing+Tsher+E · 2010-05-27 14:45 · Score: 4, Funny

That brings to mind an old favorite of mine: the Light Emitting EPROM. The power pins on EPROM chips are in opposite corners. Plug in the EPROM chip backwards and you've hooked the power up backwards. Result: A light emitting EPROM, though one with a very limited service life.
Re:Interesting! by izomiac · 2010-05-27 16:24 · Score: 5, Informative

Cause wear leveling only picks another sector to write to from among the unused sectors. Simplified, if your drive is 80% full, you write to the same sectors five times as often.

Especially because once blocks start failing, other blocks start failing too, at an accelerating rate, and they rapidly reach a state of being completely unusable.
That's a contradiction. If the wear-leveling algorithm was ineffective then you'd have a relatively constant rate of block failure. A good wear-leveling algorithm ensures you won't get a significant number of block failures until almost every block has been worn out. Then you get a bunch. So the behavior described is failing exactly as intended, and indicates the wear-leveling algorithm worked almost perfectly.

But you're right in that a wear algorithm that only uses free space would be terrible. That's one reason no device uses one like that. The primary reason though, is because the SSD has no idea which blocks are in use and which are free, unless it is told via the TRIM command (later generation SSDs with newer OSes). The filesystem knows, but an SSD is filesystem agnostic. Moving data is the cause behind the performance drop-off when the drive runs out of unused/un-TRIM'd blocks.
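
A crude sketch of the difference between the two schemes, under heavy simplification: writes are spread round-robin, the cost of moving cold data is ignored, and `max_erase_count` is a hypothetical helper. With a drive that is 80% full it reproduces the "five times as often" figure quoted at the top of this comment.

```python
def max_erase_count(total_blocks, used_blocks, writes, level_all_blocks):
    """Spread `writes` round-robin and return the erase count of the most-worn block."""
    erases = [0] * total_blocks
    if level_all_blocks:
        pool = list(range(total_blocks))               # rotate over every block
    else:
        pool = list(range(used_blocks, total_blocks))  # rotate over free blocks only
    for i in range(writes):
        erases[pool[i % len(pool)]] += 1
    return max(erases)


free_only = max_erase_count(10, 8, 1000, level_all_blocks=False)
all_blocks = max_erase_count(10, 8, 1000, level_all_blocks=True)
```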

Personally, I have the cheapest, buggiest SSD in common knowledge (the one that can get bogged down to 4 IOPS), and it has worked beautifully for me. Just checking a diagnostic tool, in the past two years I've power cycled it 5,666 times (which probably explains why I kill HDDs so quickly), the average block has been erased 7,333 times, and no block has been erased more than 7,442 times. I've got zero ECC failures. Honestly, I'm a little surprised I've written 234 TB of data to my poor 32 GB drive, but my usage is a bit heavy (~10 complete Gentoo compiles with countless updating, ~5 DISM'd Windows 7 installs, ~5 DISM'd Vista installs, ~30 Haiku installs, ~20 SVNs of 10 GB projects, and a good amount of downloading).

But, in my experience, the wear leveling algorithm is only ~3% away from being "perfect".
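
The numbers above can be cross-checked with a little arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes); how the ~3% figure was derived is not stated, so the spread computed here is only one possible way to measure evenness.

```python
TB, GB = 10**12, 10**9  # assumed decimal units

# 234 TB written to a 32 GB drive, expressed as full-drive writes.
full_drive_writes = 234 * TB / (32 * GB)

avg_erases, max_erases = 7333, 7442
# How far the most-worn block sits above the average, as a fraction.
spread = (max_erases - avg_erases) / avg_erases
```

That works out to about 7,312 full-drive writes against an average erase count of 7,333, with the most-worn block only about 1.5% above average.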
This is a bad test by AdamHaun · 2010-05-27 16:49 · Score: 4, Informative

I am working on flash write/erase cycling right now in my day job and I can tell you that this is not a very good test. Temperature affects cycling endurance (and this is reflected in the spec), so if your SSD is 20-30C higher than room temp it's going to make a difference. Fowler-Nordheim tunneling (which NAND flash uses for program and erase) is hardest at cold temperatures, so the first operation after powerup might be the worst case in a PC. (Yes, I know they're not using an SSD here, but they are doing their cycling at room temp.)
Another thing to keep in mind is that continuous cycling is not realistic. The wear-out mechanism here is charge trap-up, where electrons get stuck in the floating gate oxide and repel other electrons, slowing down program and erase. Over time, thermal energy lets the electrons detrap. So irregular usage in a hot PC should actually be a nicer environment for endurance.
A final factor is process variation, which can only be covered by using a large sample size (>100) and/or using units from separate lots with known characteristics, none of which an end user will likely have access to. Even that doesn't tell you anything about the defect rate.
There are really two types of tests that people are talking about here. The first is a spec compliance test, which uses the extreme conditions I mentioned above to guarantee that all units will have the spec endurance under all spec conditions. This should be done by the manufacturer. The second is a real world usage test, which will only give realistic results if done under actual use conditions. The number you get from the article's test probably won't tell you much.
[Disclaimer: I work on embedded NOR flash, not NAND, but the bits are the same and the article's talking about EEPROM so I figure I can butt in.]

--
Visit the