Slashdot Mirror


How Does Flash Media Fail?

bhodge writes "Aside from the obvious 'it stops working' answer, how does flash media (such as USB, SD, and CF) fail? Unlike with traditional hard drives, where anyone who's worked with computers for a while knows what a drive failure looks like, I don't know anyone who has experienced such a failure with flash. I haven't been able to find more than scant evidence of what such failures look like at the OS level. The one account I have found detailed using a small USB drive for /var/log storage; it failed very quickly, and then utterly (0 byte unformatted device), after five years of service in the role. This runs contrary to other anecdotal claims that you should still be able to read the media after you can no longer write to it. So my question is: what have you seen of the nature of flash media failure, if anything?"

88 of 357 comments

  1. In my case by ptomblin · · Score: 4, Funny

    It usually "fails" because it went through the washing machine in my pants too many times.

    --
    The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!
    1. Re:In my case by MindlessAutomata · · Score: 4, Funny

      In that case, what's truly "failing" is you.

    2. Re:In my case by Yvan256 · · Score: 5, Funny

      You have a washing machine in your pants?

    3. Re:In my case by scorp1us · · Score: 2, Interesting

      Why would a solid state device fail from multiple submergings? Especially if there is no current running through it during said submergings?

      --
      Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
    4. Re:In my case by Anonymous Coward · · Score: 2, Interesting

      Mine is a beast. I washed it at least twice, still worked, and then most recently ran over it twice. Once backing up, and again coming back up the driveway when i 'forgot' it inside. Not realizing i had dropped it. I found it when i got home and it was crushed. Removed the metal around the USB connector since it was a pancake, plugged it in while holding it and the dang thing still worked. However since i'm lazy i don't want to hold it in forever so it's been retired.

    5. Re:In my case by CRCulver · · Score: 5, Informative

      Because the water is not pure and there are corrosive elements in it.

    6. Re:In my case by AKAImBatman · · Score: 5, Insightful

      Washing machines are pretty harsh places. You get tidal forces that will apply various physical stresses to the components. Rapid heating and cooling can cause expansion problems. Water can wear down contacts. Soaps can contaminate contacts or have negative chemical effects. So on and so forth.

      If it makes it to the drier, your card could easily end up at temperatures outside the optimal storage temperature for the device. (Ever read those warnings, "Store between 70F and 100F?" Yeah, me neither.) These extreme temperatures combined with the rapidity at which they're introduced is a cornucopia of ways your device could be damaged.

      In short, water isn't the real problem. It's all the stuff above and beyond that.

    7. Re:In my case by Moryath · · Score: 4, Informative

      Corrosion.

      Being repowered while the internal circuit board is still damp with soap-contaminated water (shorting).

      Physical stress ("agitate" cycle, "spin" cycle, Tumble Dry...).

      Heat stress (which heat cycle did you use/did it go through the dryer too).

      Need I go on?

    8. Re:In my case by ptomblin · · Score: 3, Insightful

      Usually the case falls apart. I can still get the data off the drive, but I stop using it and just spend another $20 to get something with 8 times the capacity of the last time.

      --
      The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!
    9. Re:In my case by AKAImBatman · · Score: 3, Insightful

      However since i'm lazy i don't want to hold it in forever so it's been retired.

      There's a fix for that. :-P

    10. Re:In my case by Anonymous Coward · · Score: 3, Funny

      How do you sit?

    11. Re:In my case by Chyeld · · Score: 5, Funny

      That's just what all the guys call it...

    12. Re:In my case by qwertyatwork · · Score: 2, Funny

      ...Rapid heating and cooling can cause expansion problems.

      That's what she said!

    13. Re:In my case by uncledrax · · Score: 2, Funny

      Also, in my case, they usually make it to the dryer too..

      I've learned to be a little more careful.. but that doesn't mean they don't occasionally get a nice wash and dry :/

      --
      ----- The internet has given everyone the ability to have their voice heard equally as loud.. even if they shouldn't be
    14. Re:In my case by Bakkster · · Score: 5, Insightful

      Don't forget about the extreme static charges built up in a drier. Even though most USB devices have mechanisms to prevent static damage, a drier could overwhelm these protections. Regardless, an SSD failure should usually be due to failure of the support electronics, not the storage itself.

      --
      Write your representatives! Repeal the 2nd Law of Thermodynamics!
    15. Re:In my case by Anonymous Coward · · Score: 5, Funny

      That's just what all the guys call it...

      My wife didn't appreciate it when I said that about her!

    16. Re:In my case by klaun · · Score: 2, Insightful

      Washing machines are pretty harsh places. You get tidal forces that will apply various physical stresses to the components. Rapid heating and cooling can cause expansion

      I'm sorry, tidal forces in a washing machine? Tidal forces are caused by gravity. It's an effect of the inverse square distance portion of the gravity force equation. They certainly exist in a washing machine as they do anywhere else subject to the effects of gravity, but no more so than anywhere else.

      Within the rotating frame of a washing machine drum, there are dynamic forces, centrifugal and Coriolis. I imagine that only the former is really significant, but I would think contact with an agitator or sides of the drum would subject the flash memory to far higher forces.

    17. Re:In my case by SnarfQuest · · Score: 2, Funny

      Being in your pants, I'm not surprised there are corrosive elements in it.

      --
      Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
    18. Re:In my case by GungaDan · · Score: 5, Funny

      No, no - he means from the detergent, Tide. It's disrespectful to dirt and soiled state media.

      --
      Eloi are stupid, throw morlocks at them!
    19. Re:In my case by BitZtream · · Score: 5, Informative

      You are only partially correct.

      The company I work for sells USB flash drives and going through a washer is rather common and they survive more often than not.

      The question is: Did you use soap?

      Water is practically harmless if you allow the device to dry completely before using it. The problem is water in washing machines isn't just water, it's almost always water AND detergent, and probably some fabric softener as well.

      When the device dries, the detergent and fabric softener are left behind and are conductive, not like metal, but the resistance is low enough in the tiny spaces between the pins on surface mount chips to make all the difference in the world.

      The main reason devices fail however is simply abuse, or poor manufacturing depending on the device. Most of our returns are due to the USB connector pulling the solder pads off the circuit board because of the stresses during insertion/removal. Sometimes the pads don't come off at the USB connector but the board flexes enough to eventually break the connection at one of the flash chips or the controller. When that happens you go from working perfectly to 0 byte unformatted device in an instant as the controller can no longer talk to the actual flash.

      We have on occasion successfully retrieved customer data for them by removing the case from the device and flexing the board while it's plugged in to get it to work or, if that doesn't work, reflowing the solder where possible. Most of the time, that's all it takes.

      The heating and cooling is bad, but it's not that bad. The temps in a dryer aren't as bad as one might think. My personal device has been washed and dried at least a dozen times in the last couple of years. When I find it in the dryer I simply pull it out of the case, clean the PCB with some PCB cleaner, let it dry, reassemble and life goes on. If it's a good quality device, doing it once will probably be okay, but as has been stated, do it too many times and the heat expansion will certainly come into play and destroy solder joints or start making the board lamination fail.

      Now ... don't take that as a recommendation to wash your thumb drives, my stick is trying to get into the record books or something, I think it just refuses to die.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    20. Re:In my case by LearnToSpell · · Score: 4, Funny

      Sitting's fine. It's hard to stand up though.

    21. Re:In my case by rackserverdeals · · Score: 2, Funny

      The question is: Did you use soap?

      Ah.... an olde tyme geek.

      Times have changed. You don't need a long beard, poor hygiene and smelly clothes to be taken seriously these days :)

      --
      Dual Opteron < $600
    22. Re:In my case by FooAtWFU · · Score: 4, Funny

      I blame Macromedia, and maybe YouTube.

      Oh, wait, you meant the other "Flash media".

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    23. Re:In my case by Anonymous Coward · · Score: 2, Funny

      On the agitator!

    24. Re:In my case by dkh2 · · Score: 5, Funny

      Of course, now, everybody is thinking... "What were you doing in his pants?"

      --
      My office has been taken over by iPod people.
    25. Re:In my case by aztracker1 · · Score: 2, Informative

      Actually, I've done that a couple times, and haven't had one not work after (then waiting a couple days)

      --
      Michael J. Ryan - tracker1.info
    26. Re:In my case by roseblood · · Score: 3, Interesting

      Same here. I've washed SD and CF cards more often than I'd like to admit. Despite that I've never had one fail out of the wash.

      I've had one card fail though. My Palm Treo uses SD cards and once when removing the card my fingernail was in the perfect position to split the card open at the seam. When I removed the card from the phone I had 2 pieces: one plastic cover (the part w/o the label), while the remainder of the card stayed together. I re-attached the cover and attempted to read the card in several USB readers and the phone as well. It was dead. The devices never recognized that a card was inserted.

      --
      There are lies, damned lies, and statistics.
    27. Re:In my case by Anonymous Coward · · Score: 2, Funny

      My personal device has been washed and dried at least a dozen times in the last couple of years.

      I suppose that's okay, but I find that the ladies prefer my personal device to be subjected to soap and water a bit more often.

    28. Re:In my case by scorp1us · · Score: 2, Funny

      Washing machines are pretty harsh places. You get tidal forces that will apply various physical stresses to the components.

      I don't use Tide. I use Gain. Perhaps I have it up too high?

      --
      Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
  2. Comment removed by account_deleted · · Score: 3, Interesting

    Comment removed based on user account deletion

  3. Here's what it looked like for a friend. by Slartibartfast · · Score: 5, Interesting

    He'd taken it out of his camera, tried to put it back in, and nothin'. Slapped it into my Linux box. It "saw" that there was a device there, but wasn't real happy about it:
    [ 5555.618324] sd 4:0:0:0: [sdb] Add. Sense: No additional sense information
    [ 5558.777567] sd 4:0:0:0: [sdb] Sense Key : No Sense [current]

    "It's dead, Jim."

    I'm tempted to try the old hard-drive swaparoo: get the exact same SD card, unsolder the flash chips, and put the bad one's flash on the new one's circuitry. See if it's the circuitry that's bad, or the flash, itself. If anyone has any bright ideas on how to determine definitively which it is without me going through that exercise, I'm all ears.

    1. Re:Here's what it looked like for a friend. by grahamsz · · Score: 2, Interesting

      Firstly, "getting the exact same SD card" might be a challenge. I've bought various cards from the same manufacturers and they tend to have subtle variations.

      Secondly I believe there isn't really much on an SD card except for the flash chip. CF cards have more of a traditional controller on there. A lot of the early criticism of SD was that a poorly made reader could screw up your card.

  4. Burnt out by abigsmurf · · Score: 2, Informative

    From what I gather, the most common cause of failure is the flash getting fried. Dodgy card readers, pulling the card out while voltage is applied to it: the chips are very sensitive to spikes in current or voltage and burn out because of them.

    1. Re:Burnt out by Samschnooks · · Score: 5, Funny

      Dodgy card readers,...

      That's what you get for buying a Chrysler product or any Detroit product. Try getting a Honda or Toyota card reader. Or if you're a yuppy, a BMW card reader. Although, no one holds a candle to the Japanese.

    2. Re:Burnt out by dasunst3r · · Score: 5, Informative

      I am currently taking a class on solid state devices, and we just talked about how MOSFETs would fail. Basically, a high voltage to the gate would create these electrons that have so much kinetic energy that they create pairs of opposing charges (electron-hole pairs) in what was supposed to be the insulator. These pairs of charges would create an internal electric field inside the insulator. This process reduces the barrier for tunneling to occur, so more electrons are able to tunnel through the insulator and do the same thing, creating a runaway effect.

      For more information, look up "Time-Dependent Dielectric Breakdown" and refer to pages 293 and 294 of Streetman and Banerjee's "Solid State Electronic Devices" (6th ed).

    3. Re:Burnt out by BitZtream · · Score: 4, Interesting

      Seen this failure mode a lot too. Static build up on your body, then when you go to insert the device the charge jumps between you, through the device to the grounded casing around the USB connector.

      Can do anything from reboot your PC (if you're lucky) to destroying the stick or the USB controller on the PC (or hub if you're lucky).

      As you said, power is a major problem with USB. Cheap USB sticks need FULL power to work right. Often times we'll have a customer with a stick that works fine in one PC (at home or work for instance) and will either not be recognized or will give read/write errors in another. Most of the time this is solved by using an external powered USB hub, as the motherboard simply isn't supplying enough voltage or current to power the stick. I'm not really sure if in general the problem is the motherboard or the stick as I haven't bothered to pull out the multimeter and do any serious testing, but I'm inclined to think it's the stick as it seems to happen mostly with cheap/noname sticks that were probably rejected by the likes of Sandisk and co.

      As far as pulling them out while the card is powered, that is part of the specification for SD and USB; not sure about compact flash, but I would assume it's there as well. USB and SD have the connector configured in such a way to ensure power is applied and removed in the proper order, which is why their connectors have some contacts that are longer than the others.

      What you said is still true however, a cheap chip on either side may not handle that process well. I can say however that we have successfully run 3.3v SD cards at 7 to 9 volts for short periods of time due to misconfigured testing setups where we didn't check the voltage after switching modes. Of course, we've also lost more than a few SD cards for that very reason; even at 5 volts they won't last more than a few minutes. Mini and micro cards in an adapter to full SD generally fare better, as the minis and micros work at around 1.8v (I think, memory is fuzzy about that atm, might be 2.7) and have internal voltage dividers to cut down the 3.3v input from the system. They still fail eventually due to over voltage, they just seem to do better, although I have only anecdotal evidence to support that.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    4. Re:Burnt out by jazzkat · · Score: 3, Informative

      You'd have to have a hell of a lot of built-up voltage to jump through the plastic casing, through the air gap to the non-grounded metal on the PC board, and then from there across the air gap to the USB grounding shield. USB grounding is rugged as hell. At one point, the outlet behind my computer desk did not have a plate. One day when I was re-arranging cables, the metal shield of a USB plug brushed one of the screws for the 120v hot side in the outlet. The 120v had a clear path thru the USB cable, into the metal chassis, and out of the metal chassis via the power supply's ground pin. There's a nice big divot in that USB cable where the arcing occurred, but the cable and PC are still in use today.

    5. Re:Burnt out by PitViper401 · · Score: 2, Funny

      I'm still confused. Could you use a car analogy?

  5. Was there a point to this article? by Intron · · Score: 2, Insightful

    If a cell fails, you can't read or write that cell.

    If a gate fails in a page, you lose access to the page.

    If a gate fails in the overall control logic, you lose access to the whole device.

    Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

    --
    Intron: the portion of DNA which expresses nothing useful.
    1. Re:Was there a point to this article? by Sooner+Boomer · · Score: 4, Insightful

      If a cell fails, you can't read or write that cell.

      If a gate fails in a page, you lose access to the page.

      If a gate fails in the overall control logic, you lose access to the whole device.

      Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

      What about redundancy and self-healing? How do those work?

      --
      Chaos maximizes locally around me.
    2. Re:Was there a point to this article? by Anonymous Coward · · Score: 5, Informative

      Those work behind the scenes, if they are implemented. You wouldn't know they had been activated. If you lose a gate in the redundancy circuitry, that dies as well.

    3. Re:Was there a point to this article? by Vellmont · · Score: 4, Insightful

      Is there something I'm missing?

      Maybe the part where you assume everyone knows the above?

      Or how about the part where the submitter is asking about typical failure modes, not all possible failure modes?

      --
      AccountKiller
    4. Re:Was there a point to this article? by scatterbrained · · Score: 5, Informative

      If a cell fails, you can't read or write that cell.

      If a gate fails in a page, you lose access to the page.

      If a gate fails in the overall control logic, you lose access to the whole device.

      Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

      Conceptually at least, there are several parts to worry about:

      1 - the OS & storage driver
      2 - the USB driver
      3 - the flash controller
      4 - the flash memory

      At the flash memory cell level the usual failures are breakdown of the dielectric materials and trapping charges in the memory cell that prevent an erase from happening and yield 'stuck' cells. This is normal for /all/ flash chips and is why they all have an erase cycle rating. There are certainly more exceptional ways for the chips to fail (soldering, wire bond failure, static damage, etc).

      The flash controller is supposed to be doing wear leveling, error detection and correction on the flash, to get around those problems with the flash chips, and also talking USB. These chips usually have a microcontroller in them somewhere, and there's probably bugs in that code, no doubt more in the parts that get exercised the least, like error paths :-)

      The OS and drivers just have the garden variety bugs and features that we all know and love...

      --
      -- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
    5. Re:Was there a point to this article? by scatterbrained · · Score: 5, Informative

      There's no redundancy or self healing in the hardware of a common USB flash stick. The illusion that there is comes from a flash controller chip that does a mapping between disk sectors and flash sectors and shuffles things in and out so you don't notice the failures until it can't compensate for them anymore.
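
      The mapping described above can be sketched in a few lines. This is only a toy model under stated assumptions: the class name, the spare-block pool, and the "mark bad, then remap on the next write" policy are all hypothetical illustrations, not how any particular controller firmware works.

```python
class FlashTranslationLayer:
    """Toy logical-to-physical block map with a pool of spare blocks.

    Hypothetical illustration only; real controller firmware is far more involved.
    """

    def __init__(self, logical_blocks, spare_blocks):
        # Logical block i starts out mapped to physical block i.
        self.mapping = {i: i for i in range(logical_blocks)}
        # Spare physical blocks live past the end of the visible capacity.
        self.spares = list(range(logical_blocks, logical_blocks + spare_blocks))
        self.bad = set()   # physical blocks known to be worn out
        self.data = {}     # physical block -> stored payload

    def mark_bad(self, physical):
        """Record that a physical block can no longer be written."""
        self.bad.add(physical)

    def write(self, logical, payload):
        """Write a payload, silently remapping to a spare if the target is bad."""
        physical = self.mapping[logical]
        while physical in self.bad:
            if not self.spares:
                # Only now does the host see the failure.
                raise OSError("no spare blocks left; write fails")
            physical = self.spares.pop(0)
            self.mapping[logical] = physical
        self.data[physical] = payload

    def read(self, logical):
        """Read back whatever the currently mapped physical block holds."""
        return self.data[self.mapping[logical]]
```

      With one spare, the first worn-out block is hidden from the host entirely; the second one surfaces as a write error, which matches the "you don't notice until it can't compensate" behaviour.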

      --
      -- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
    6. Re:Was there a point to this article? by AvitarX · · Score: 3, Informative

      Having a broken SD card in my pocket, I will describe how it behaves (which I think is what the article is asking). It is a 1GB SVP.

      In Windows (XP and Vista), it asks me to format the drive, chkdsk fails because the partition type is raw. Using recovermyphotos on it I get between 10 and 200 photos found before the card reader decides it is not in there anymore, and I can't recover the ones found (perhaps if I paid I could recover as it scanned).

      On Linux cat /dev/sdb returns no media found (I assume this is card to card reader trouble again).

      Interestingly, on a different reader that gives IO errors with every other card I use I get the raw partition do you want to format it issue.

      The fact that I can't read the drive at all from Linux ended my exploration.

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    7. Re:Was there a point to this article? by dargaud · · Score: 4, Interesting

      There may be other manners of failure. I have a recent 2Gb USB thumb drive that started going ever more slowly after a few days of use. I last measured a "dd if=/dev/random of=/media/device/test" of no more than 0.5kB/s. If somebody wants to have some fun analyzing it, I can put it in an envelope free of charge.
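
      One caveat on that measurement: on older Linux kernels /dev/random could itself block while waiting for entropy, so part of the slowness may come from the source rather than the stick. A minimal sketch that takes the random source out of the timing is below; the function name, the default sizes, and the path you pass in are all assumptions for illustration.

```python
import os
import time


def measure_write_throughput(path, total_bytes=4 * 1024 * 1024, chunk_bytes=1024 * 1024):
    """Return sustained write speed in bytes per second for a file at `path`.

    The random data is generated before the timer starts, so only the
    writes (and the final flush to the device) are measured.
    """
    chunk = os.urandom(chunk_bytes)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as handle:
        while written < total_bytes:
            size = min(chunk_bytes, total_bytes - written)
            handle.write(chunk[:size])
            written += size
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data actually left the page cache
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed
```

      Pointing it at a file on the mounted stick (for example the /media/device/test path used above) gives a number that reflects the device rather than the entropy pool.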

      --
      Non-Linux Penguins ?
    8. Re:Was there a point to this article? by James+McP · · Score: 5, Informative

      If a cell fails, you can't read or write that cell.

      This is a silent failure, much like hard drives marking blocks as bad. Capacity is reduced without any obvious signs. Not sure if OS tools can recognize it unless the controller reports bad cells as bad blocks. This will eventually result in "disk full" messages when there appears to be space on the drive. Reformatting won't recover the cells but it will likely result in your OS being aware of the flash's reduced capacity.

      If a gate fails in a page, you lose access to the page.

      Very similar to above, but larger amounts of data. I want to say there's 64 cells to the page but don't take that as gospel.

      If a gate fails in the overall control logic, you lose access to the whole device.

      Hello failed/unreadable/size 0 disk error. The data storage mechanism is intact but there's no way to access them. As people stated above, a lot of the time it is not the failure of a transistor so much as a trace or solder point failing. If you know your device has been abused physically, you can try the low-tech approach of gently squeezing or bending the stick while it's in the USB port (use an extension cable so you don't damage your mobo!!) to try and get the contacts to reconnect long enough to retrieve data. If that fails you can pop the case apart and use a magnifying glass to look for breaks in the solder or traces; if you're handy with a soldering iron you can try to bridge the connection. Again, temporary fix.

      Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

      Actually most of them are several silicon chips; one controller plus a variable amount of memory chips. The increase in traces and board assembly is offset by the ability to reuse components and the overall design while memory chip prices fall. It also cuts down on the impact of failed chips, since you aren't losing controller+memory for one bad gate on the controller.

      --
      I've been on slashdot so long I'm starting to get out of touch with the cool stuff if it ain't on slashdot.
    9. Re:Was there a point to this article? by dgatwood · · Score: 4, Informative

      Yes and no. A page or cell failure will result in I/O errors if there are no more spares, and if it occurs during a read cycle, it -should- result in I/O errors for all subsequent reads from that cell or page until it gets rewritten to a new cell or page. If it doesn't work that way, then the device is fundamentally violating the contract between the device and the OS to report all nonrecoverable errors that result in data loss.

      Also, while a multi-chip design reduces the probability of a device failing outright, it dramatically increases the probability of a failure. First, using a separate controller significantly increases the probability of failure because instead of having interconnect traces on a slab of silicon that (electromigration notwithstanding) almost never change or fail if they work from the factory, you have solder joints exposed on a circuit board. Solder joints are the most common cause of circuit failure in my experience.

      Even ignoring the increased risk of having extra solder joints between the controller and flash parts, the odds of failure are still much worse for multi-chip devices. Remember your RAID MTBF theory. The MTBF of a collection of devices is equal to the MTBF of one device divided by the number of devices. If you have one part, the MTBF on that slab of silicon and associated solder joints might be a year. If you have five parts, the MTBF is now 73 days. That's an extreme example, but sadly, I've seen flash sticks with large numbers of failures in the first month, so that's not nearly as gross an exaggeration as you might think.... And whether one part fails or the whole thing fails, you still lose data.
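
      The arithmetic behind that rule of thumb, as a small sketch. It assumes independent parts with constant failure rates, and that any single part failing kills the device; the function names are just for illustration.

```python
def series_mtbf(part_mtbf_days, part_count):
    """MTBF of a device built from identical parts that must all work."""
    return part_mtbf_days / part_count


def series_mtbf_mixed(part_mtbfs_days):
    """Same idea for non-identical parts: failure rates (1/MTBF) add up."""
    return 1.0 / sum(1.0 / mtbf for mtbf in part_mtbfs_days)


# One part rated at a year versus five such parts on one board.
print(series_mtbf(365, 1))
print(series_mtbf(365, 5))
```

      The second call reproduces the 73-day figure; the mixed version shows why one flaky part drags the whole stick down with it.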

      Also, a controller failure is still likely to cause all flash parts to be inaccessible whether it is integrated into a flash chip or is driving eight discrete flash chips. It's not like you're going to use a separate flash controller per flash part. And I -think- that a device showing zero capacity is probably caused by the flash controller being unable to communicate with the flash parts. If so, then that is much more likely to be caused by a failed connection between the two than by a failed flash controller (unless there are problems with interconnects inside the flash controller chip package failing due to overzealous compliance with ROHS rules).

      The original poster also failed to mention the most common failure mode, bar none: poor solder joints or other physical interconnects getting broken by physical force. This is very common among cheap flash drives. I wouldn't expect the same with SSDs, of course---you don't normally carry a SSD in your pocket---but at least in my experience, this one cause of failure is easily an order of magnitude more frequent than any other single cause, and is in all likelihood greater than all the others put together. And that's not even counting actual abuse (washing machines, run over by cars, and so on).

      My Lexar JumpDrive Secure flash drive suddenly stopped working, and I talked to my mother, whose entire university class was using that same model of drive. Turns out that between us, we had experienced close to a 50% failure rate on those things within the first month or so, having seen somewhere around 14 or 15 failures. The failure was interesting. Mine failed suddenly, but worked if you tipped the connector at an angle... at least for a couple of seconds once or twice. This told me pretty conclusively that the failure was caused by poor hardware design. As best I can tell, when you carry the drive in your pocket, the cap puts pressure on the USB connector. Over time, this gradually causes solder joint or trace failure (I never cut one open to figure out which) at or near the USB connector.

      Since then, I only buy flash devices with mechanisms where the USB connector retracts into a solid housing. Sure, you have an elevated risk of gunk from your pocket getting into the connector because it isn't covered, but at least you don't have the flexing problem. Gunk can be cleaned with a flat toothpick and alcohol. Failed solder joints require disassembly and SMT soldering skills.... :-)

      --
      Check out my sci-fi/humor trilogy at PatriotsBooks.

    10. Re:Was there a point to this article? by James+McP · · Score: 2, Interesting

      Also, while a multi-chip design reduces the probability of a device failing outright, it dramatically increases the probability of a failure.

      I didn't make it clear that I was referring to the manufacturing side of things. I meant multiple chips reduced the chance of failures during manufacturing making the whole product unsellable. The more transistors to the package, the greater chance that some of them will be bad off the line. If the package can't tolerate any transistor failures and the cost per failed unit is high enough, you're better off building component chips individually then joining them on a PCB after validation.

      Plus, many flashdrive manufacturers are assemblers and not chip fabs. They buy flash and controllers from various fabs and install them on PCBs in the appropriate combination to get their sizes. An external controller makes it easier to switch between producing 8GB, 16GB, or 64GB flashdrives since they only have to change the size and/or quantity of the flash chips.

      Given the volatility in the flash market both to size and cost, I'm not sure it is financially viable to produce many memory+integrated controllers for anything but the largest bulk orders.

      I haven't bought a retractable flash drive. Does the whole PCB slide within the housing or is there a flexible ribbon connecting the PCB to the USB connector? The former seems like it would be cheaper and more durable but the latter seems lazier and lazy seems to rule the day.

      --
      I've been on slashdot so long I'm starting to get out of touch with the cool stuff if it ain't on slashdot.
    11. Re:Was there a point to this article? by Intron · · Score: 2, Interesting

      You might be the victim of one of the crooks who reprogram controllers in smaller flash to report as larger. It works fine until you wrap around and overwrite your file system. Beware of great deals on eBay.
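
      A rough simulation of that wrap-around, plus the usual way to catch it: tag every block with its own index, then read everything back. The device class and its modulo behaviour are an assumed model of the scam, not a description of any real controller, and running this kind of check on a real stick destroys its contents.

```python
class FakeCapacityDevice:
    """Toy device that reports more blocks than it really has (assumed model)."""

    def __init__(self, reported_blocks, real_blocks):
        self.reported_blocks = reported_blocks
        self._real = [b""] * real_blocks

    def write_block(self, index, payload):
        if not 0 <= index < self.reported_blocks:
            raise IndexError("block outside reported capacity")
        # Writes past the real capacity wrap around and clobber earlier blocks.
        self._real[index % len(self._real)] = payload

    def read_block(self, index):
        return self._real[index % len(self._real)]


def count_intact_blocks(device):
    """Write a unique tag to every reported block, then count how many survive."""
    for index in range(device.reported_blocks):
        device.write_block(index, index.to_bytes(8, "big"))
    intact = 0
    for index in range(device.reported_blocks):
        if device.read_block(index) == index.to_bytes(8, "big"):
            intact += 1
    return intact
```

      On an honest device every block reads back its own tag; on the wrap-around model only as many blocks survive as there is real storage, and the earliest blocks (where the file system usually lives) are the ones that get overwritten.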

      --
      Intron: the portion of DNA which expresses nothing useful.
  6. Failure to Write by Toad-san · · Score: 5, Informative

    Had two finally wear out. Both started giving "could not write to device" sort of errors. The system (Windows 2K or XP) would still recognize the drive, would show the files, etc. Indeed, I could still access (read) the files, so the data was there and copyable. But I'd get a file write error every time I read anything, because Windows was trying to update the flash drive's file directory with "last accessed" or some such, and that write would fail.

    No biggie; copied the data to a replacement, threw the old ones away, after hitting them several times with a hammer to "clear" the memory :-)

    1. Re:Failure to Write by bkaul · · Score: 3, Interesting

      I had a 2 GB Micro-SD card in my phone fail on me; it also failed to write, but there was also data corruption of some of the contents that were already on the card.

      The first symptom I encountered was that my backup program would report that it had failed to successfully back up the phone to the card. I popped the card out of the phone and into a PC, and noticed the data corruption in several places when trying to back up the contents - not just CRC read errors, but filenames actually turned to garbage, etc. in a couple of directories. After reformatting the card, the symptoms persisted - sometimes writes would fail, etc. Don't know what caused the failure, but that's what it looked like in my experience.

  7. Fail on write by fishybell · · Score: 4, Insightful
    The biggest difference I've encountered is when traditional hard drives fail, they fail on reading data back.

    Flash media fails when you write the data. In theory this means that you can always recover data as you can never write data to bad sectors. In practice the entire media device (CF, SD, etc.) fails at once.

    --
    ><));>
    1. Re:Fail on write by SatanicPuppy · · Score: 3, Informative

      It just seems like the traditional drives only fail on reads: they mostly do reads, so when they fail, it's more likely on a read.

      I've had many a drive fail during writes though, usually at the worst possible time (deadlines, when the machines are getting read/write hammered, and then bam, drive goes down and RAID performance goes to shit, and people start whinging.)

      I've had flash drives die all at once. It's not the norm, but there are things that can happen that will take them from "fine" to "dead" with no steps in between. Usually it's thumbdrives that that happens with; I haven't had a full flash harddrive fail at all yet, so I don't have any insight there.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  8. Flashmemory by Narpak · · Score: 3, Informative

    Maybe I am totally on the wrong track here, but doesn't the fact that they can't use lead in some of the alloys affect the lifespan of some computer parts?

    As I understand it, aluminium alloys created without lead and then used in computers degenerate several orders of magnitude quicker than alloys with lead. The process is apparently that the aluminium starts sprouting tiny, tiny "hairs", and when one of these connects to another one coming from somewhere else in the machine, it's thank you and good night for that part.

    Anyway, the reason I mentioned this is that apparently, with intensive use, 5-7 years is how long parts in your computer take to make a connection, and after that it is LED OFF (see what I did there?). Of course, if you have a computer constructed before the mid-nineties (I think that was the point), this isn't something that will affect it, since those used lead in their alloys (though a range of other issues will).

    1. Re:Flashmemory by EdZ · · Score: 4, Informative

      You're thinking of 'Tin whiskers', and I'm not sure they're an issue with Silicon chips (because, well, they're SILICON), and the amount of time it takes for whiskers to grow between SMT components shouldn't differ between SSDs and HDDs. Plus it's a very slow process anyway, especially in the atmosphere.

    2. Re:Flashmemory by scatterbrained · · Score: 2, Insightful

      google 'tin whiskers' and 'RoHS solder failures'

      --
      -- All that's left of me, is slight insanity, whats on the right, I don't know. -- Bob Mould
    3. Re:Flashmemory by ckthorp · · Score: 2, Funny

      And tasty, too!

  9. FAT by AKAImBatman · · Score: 5, Informative

    The one account I have found detailed using a small USB drive for /var/log storage; it failed very quickly, and then utterly (0 byte unformatted device), after five years of service in the role.

    Without knowing more about this specific situation, I'd say this failure sounds like it pre-dates wear leveling. Prior to wear leveling, the most used sectors were likely to fail the fastest. And what sector gets written to more than the file allocation table?

    If the file allocation table was lost, that would explain why the device became completely inaccessible. The card might not be a total loss if the card contains firmware or circuitry to remove bad blocks from usage. In that case it might be possible to reformat it. (Of course, if it lacks wear leveling I wouldn't count on it.)

    Wear leveling neatly solves this issue by shifting writes to different free blocks with every write. This assures that the maximum use of the card is obtained prior to failure. Should any given block fail the card will detect the checksum error, mark the block as bad, then attempt to rewrite to a different block. This is communicated back to the reader in a transparent way. As far as the reader knows, nothing happened.

    As you can imagine, wear leveling makes it incredibly rare to see Flash failures these days. It can still happen, but the results are likely to be unpredictable. The card will need to chew through all free blocks before it starts returning errors. In that case you may be able to continue reading the media. Or it may fail like the USB drive you mentioned. It all depends on the importance of the block on which the erasure was attempted. Since you only know about a failure *after* the block erasure, you're at the mercy of the quality of the card's electronics and algorithms to protect against a dangerous erasure.
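
    Here is a toy sketch of the idea, not how any particular card's firmware works. The `WearLeveler` class is hypothetical, and an erase-count limit stands in for the checksum error that a real controller would detect: each write goes to the least-worn free block, a block that fails is marked bad and the write is retried elsewhere, and the old copy stays mapped until the new write succeeds.

```python
class WearLeveler:
    """Toy wear leveler: each write goes to the least-worn free block."""

    def __init__(self, num_blocks, max_erases):
        self.max_erases = max_erases           # assumed per-block endurance
        self.erase_count = [0] * num_blocks
        self.bad = set()                       # blocks retired after a failure
        self.mapping = {}                      # logical sector -> physical block

    def write(self, logical):
        # Blocks holding live data (including this sector's old copy) are off limits.
        in_use = set(self.mapping.values())
        while True:
            free = [b for b in range(len(self.erase_count))
                    if b not in self.bad and b not in in_use]
            if not free:
                # Out of spares: the write fails, but the old mapping is untouched.
                raise IOError("no usable free blocks left")
            target = min(free, key=lambda b: self.erase_count[b])
            self.erase_count[target] += 1
            if self.erase_count[target] > self.max_erases:
                self.bad.add(target)           # erase failed: retire and retry
                continue
            self.mapping[logical] = target     # old block becomes free again
            return target
```

    With 4 blocks rated for 3 erases each, hammering a single logical sector gets 12 writes instead of 3 before the first error, and the last good copy is still mapped afterwards. That is the "you may be able to continue reading" case; whether a real card behaves that well depends on its firmware.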

    1. Re:FAT by daid303 · · Score: 5, Informative

      Even with wear leveling, devices can still fail easily. A single power failure during a write can ruin a perfectly good SD card. It took me a single try.

      Most devices that do hardware wear leveling are not power-fail safe. They can get corrupted beyond repair: random data corruption may follow, or the device may become unreadable.
      (I've done extensive testing with SD and CompactFlash devices in power-fail cases, because not all manufacturers deliver what they promise.)

    2. Re:FAT by AKAImBatman · · Score: 2, Insightful

      A single power failure during a write can ruin a perfectly good SD card. It took me a single try.

      You're right, I think that's the most common situation people see these days. Most of the other posters are describing sudden, total failures, which are consistent with frying the drive rather than failures of bad blocks. Not all that different from losing a head on a hard drive.

  10. like a CPU by Lord+Ender · · Score: 2, Informative

    I've been booting Linux servers off of flash for a few years. For some of them, the whole OS, even /var/log, is on the flash drive.

    I've had one drive fail, and it basically got hot and stopped being recognized as being connected by the computer. It was older generation technology, though. Newer flash technology designed for computers doesn't fail, as far as I have experienced. I'm talking about the flash SATA drives from name-brand manufacturers.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
  11. Flash mail server by ace123 · · Score: 4, Informative

    I had a 4GB FAT32 flash drive that I used as storage for a mail server attached to an OpenWRT router. It required renaming and deleting files all the time (every time it got an e-mail)--so I think it wore down pretty quickly.

    One day, the storage for the flash drive stopped working (from one hour to the next, without being touched, the computer acted like I had just yanked the drive out)--it would be recognized but report a "no media in drive" error when you tried to access it, like an empty CD drive. In fact I think Windows would say "Insert CD" or "No disc in drive F"

    1. Re:Flash mail server by ranulf · · Score: 2, Informative
      Similar experience for me. I was running a slug (basically NAS device with network and 2 USB ports) as a general server using a USB memory stick.

      After about 6 months of fairly heavy use (with only 32MB RAM I needed to swap to flash), one day the USB flash drive just stopped working, and it's no longer even detected when I plug it into any system now.

      I'd done all the obvious things, such as mounting with noatime and setting swappiness to 0, but ultimately discovered that flash really doesn't like being constantly written to.

      Fortunately, even large capacity USB sticks are pretty cheap, so they're still quite good for as long as they last.

  12. Anandtech 'splains it all by spyrochaete · · Score: 4, Interesting

    A few weeks ago /. linked to a really wonderfully written article by Anand Lal Shimpi about SSD drives. In the article he includes some simple and clear explanations of how flash memory works, its lifespan, and how it handles writes and deletes to maximize the life of every block of storage.

    http://www.anandtech.com/printarticle.aspx?i=3531

    The only thing missing from the article is a description of the behaviour of a failing drive.

  13. It depends on what and where by flyingrobots · · Score: 2, Insightful

    If the flash drive fails, yes you can continue to read from it, but you also have to consider what is meant by reading.

    You can always read the raw data from the device, that will never change. There is nothing that prevents the electrical signals from forming a proper read transaction on the IO pins of the flash IC chip.

    However, when you consider the software that is on top of the raw data (a file system for example), this is where you will have the trouble.

    With older CF cards, the concept of wear leveling was not implemented, I don't know about newer ones. This being the case, the directory structure for a file would more than likely reside in the same physical location on the flash. Opening, writing, closing a file with the same name would no doubt wear that space out as the directory entry gets hammered. Once that has "worn out", data is lost because the file system can no longer track it (even though the actual data may be viable).

    Also consider the device that does support wear leveling. At some point it will run out of places to wear. Some large files will remain static and won't move (they are only read), some files will be moved all over the device by the device's ASIC as the data in the file is updated or changed. At some point, the flash will run out of cells. This could happen as some critical directory entry is being updated, and the whole file system could be corrupted because there are no more viable flash cells to use.

    Your data might still be there in all its binary glory, but w/o a viable file system data structure to access it, well, you're toast. Unlike a hard drive that burped and lost a few bytes, a worn out flash drive has no recordable medium available to do any file system data structure repairs.

    Kevin

  14. gracefully... by bdewet · · Score: 2, Informative

    I've had flash fail on me 'gracefully'. The amount of available storage just keeps shrinking with use. It seems like the cells (if one can call them that) just die after repeated use. Formatting does not help either.

  15. CF by psergiu · · Score: 4, Informative

    Some years ago I used a 64MB CF to install a minimal Debian on an IBM PC110 with 8MB of RAM. As the install process wanted more memory, I created a 12MB swap partition.
    Big mistake.
    The install took a whole day. I happily ran some programs the next day and crash - kernel screams of I/O errors in the swap partition.
    Formatted the card as MS-DOS - it found a few bad sectors. Then I ran Norton Disk Doctor, and at every run it was finding more and more bad sectors. But each time I re-formatted the card using a camera, the bad sectors shifted around. Unusable.

    FYI: IBM PC110 is a 486 Palmtop with a CF slot to be used as hard-drive. The CF interface is IDE.

    --
    1% APY, No fees, Online Bank https://captl1.co/2uIErYq Don't let your $$$ sit in a no-interest acct.
    1. Re:CF by AKAImBatman · · Score: 2, Interesting

      10MB CF cards predated the common deployment of wear leveling. Those old cards could fail at the drop of a hat. Especially if anyone was foolish enough to use them in a high-volume write situation.

  16. The short answer... by earnest+murderer · · Score: 2, Informative

    Your flash memory is fine, the controller is hosed.

    This kind of (essentially unrecoverable) failure will continue to be an issue wherever the logic is integrated with the storage.

    If it's any consolation, except for those who are always forgetting to "eject" or turn off their device before removing the media this kind of failure should be quite rare*.

    Enjoy.

    *Mfr's producing shoddy products notwithstanding.

    --
    Platform advocacy is like choosing a favorite severely developmentally disabled child.
  17. Depends on the Filesystem I suppose by grahamsz · · Score: 4, Informative

    On a modern filesystem, your writes should essentially be atomic and in theory it shouldn't be possible to leave the drive in an inconsistent state when the write fails.

    Of course, most camera memory cards end up being formatted with FAT32, which can be a little less forgiving.

    1. Re:Depends on the Filesystem I suppose by aaron.axvig · · Score: 2, Insightful

      There is a conceivable edge case:

      You perform an atomic write to sectors A and B. Write A succeeds, but write B fails as that sector is worn out. Then you try to roll back sector A, only to discover that sector is also now worn out. Boom, inconsistent file state.

      This would probably be a rare occurrence.
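
      A small sketch of that edge case, assuming a naive scheme that rolls back by rewriting the old value in place (real journaling or copy-on-write file systems generally don't do it this way). `WornSector` and `atomic_update` are made-up names for illustration.

```python
class WornSector:
    """A sector that accepts a limited number of further writes."""

    def __init__(self, value, writes_left):
        self.value = value
        self.writes_left = writes_left

    def write(self, value):
        if self.writes_left == 0:
            raise IOError("sector worn out")
        self.writes_left -= 1
        self.value = value


def atomic_update(a, b, new_a, new_b):
    """Try to update both sectors; undo the first write if the second fails.

    Returns 'committed', 'rolled back', or 'inconsistent'.
    """
    old_a = a.value
    a.write(new_a)              # if this raises, nothing has changed yet
    try:
        b.write(new_b)
        return "committed"
    except IOError:
        try:
            a.write(old_a)      # rollback is itself a write, so it can fail too
            return "rolled back"
        except IOError:
            return "inconsistent"
```

      If sector A had exactly one write left and sector B had none, the update ends up "inconsistent": A holds the new value, B holds the old one, and neither can be changed.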

  18. flash failure by erbbysam · · Score: 5, Interesting

    About 5-6 years ago, I decided that it would be a good idea to build a small application on a flash drive, that is, code and compile it directly to the drive.
    After what must have been hitting compile a few hundred to a thousand times, the 128MB thumb drive started giving me drive write errors and then stopped responding altogether within about a minute after errors started appearing.
    I think the moral of this story is back up your data, even when it's on a flash-based drive, and don't code directly on a cheap thumb drive :)

    1. Re:flash failure by clone53421 · · Score: 3, Insightful

      Actually, the rule of thumb is:

      back up your data, ESPECIALLY when it's on a flash-based drive

      --
      Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
    2. Re:flash failure by Midnight+Thunder · · Score: 2, Insightful

      I think the moral of this story is back up your data, even when it's on a flash-based drive, and don't code directly on a cheap thumb drive :)

      Yup, this is important, but then again, for me the single biggest cause of data loss related to thumb drives is: loss of the drive.

      I would like to say that I am very careful with my drives, but the truth is the loop holding the drive to the key chain is usually very weak. The person in question also has something to do with it, but that is a little harder to change.

      --
      Jumpstart the tartan drive.
  19. Because people focus on the GB... by blind+biker · · Score: 5, Informative

    ...and quality and longevity take a back seat. So companies stopped offering SLC Flash RAM (100,000+ writes) and only offer MLC (5,000 writes), and are now pushing even eight-level MLC, which will be even less reliable than standard 4-level MLC Flash RAM. But who cares, the consumer will be slightly fucked after a while, but that will be much later, after they enjoyed the happiness of getting slightly more GB for their buck.

    The only manufacturer that I know of that is an exception is Kingston, which still offers SLC Flash products - namely their Elite Pro line of SD and CF cards, and the DataTraveler USB drives. But that's it; everyone else has now completely transitioned to MLC.

    --
    "The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
  20. this is a good freaking question. by nimbius · · Score: 3, Funny

    I've been able to roll over a flash drive with my car, and wash and bake a flash drive in the process of doing laundry, and it's never failed... however, I've had one on my desk for a month that failed like a whale for no good (read: user abusing it as normal) reason. Blaming gremlins, jeebus, and FSM until a solution abounds.

    --
    Good people go to bed earlier.
  21. FAT Failure by ArcherB · · Score: 2, Insightful

    When I was in the digital imaging kiosk business, we had to repair about three flash drives a week. A customer would put it in one of our systems and pull it out while it was being read, or it was a cheap drive or whatever. Either way, the customer would blame our systems for killing their drives (rightly or wrongly). Of course, it would contain pictures of their dead grandfather or ex-girlfriend naked or whatever was completely priceless and irreplaceable.

    The vast majority of the time, we would be able to run an application that would be able to recover whatever was on the drive. While I'm not certain of the original problem, the system acted as if the drive had no FAT (File Allocation Table... do I really need to say it?) on it or the FAT had become corrupted. This particular application would be able to go in and recover whatever was on the drive and most of the time repair the drive to its previous working state.

    I say it ACTED like the FAT was corrupt, but I don't know or care if a flash drive has a FAT on it. Could have been a hardware thingie in there that hiccuped. The repair utility acted much like a scan-disk that would repair an MBR or FAT and/or act like an undelete utility would, restoring the files on the drive.

    --
    There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
  22. Have done some extensive testing... by spock_iii · · Score: 5, Informative

    For a prior employer, I had set up a process to qualify flash media for use in embedded products. There's a couple of different failure modes you are likely to see.

    First off, when the actual flash media itself wears out, it takes longer and longer to erase individual sectors.

    A flash device such as a USB stick or a CF card is slightly more complicated because it has something known as an FTL (Flash Translation Layer). The FTL has the job of implementing the virtual media to flash sector translations, implementing wear leveling, and handling the awkward page erases. (Multiple sectors in a page, but you can only erase full pages.)

    The FTL obviously must store some mapping information in the media in addition to your data.

    If you start writing flash media, and time those writes, you see an initial rapid growth in the write timing that eventually levels off as the FTL tables swell to their constant operational size.

    The overall flash write speed will level off to some average value that follows slow growth over a very very long tail as the media wears.

    Early flash chips supported about 10,000 erases per page, and modern chips shipped by Samsung and others support a couple million erases per page. When you consider this is spread over say 4GB of media, you can understand that tail is very very long and flash media are probably comparable to hard drives in their MTBF these days.
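
    To put rough numbers on that tail, here is a back-of-the-envelope sketch. It assumes perfect wear leveling and no write amplification, so it is an upper bound, and the 10 GB/day write rate in the usage note is an illustrative assumption rather than a measured figure.

```python
def total_write_volume_tb(capacity_gb, erases_per_page):
    """Upper bound on data written before wear-out, in TB (1 TB = 1000 GB)."""
    return capacity_gb * erases_per_page / 1000


def years_until_worn(capacity_gb, erases_per_page, gb_written_per_day):
    """Years of service at a steady write rate, under the same assumptions."""
    total_gb = capacity_gb * erases_per_page
    return total_gb / gb_written_per_day / 365
```

    For a 4GB device at 10,000 erases per page, `total_write_volume_tb(4, 10_000)` gives 40 TB, which at an assumed 10 GB of writes per day is about 11 years. Real devices will fall short of this because the FTL rewrites more than the host asks for.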

    Secondly, when flash actually does begin to fail, the media itself tends to exhibit a small number of different symptoms.

    The flash may start to show occasional data corruption when read. You might also have instances where data persists in the media only so long as power is applied. And then of course you have the fact that erases take longer and longer to achieve. Eventually erases or programming start timing out occasionally.

    With the FTL between you and the flash, you don't directly observe these effects. Presumably the FTL is smart enough to try and re-map your data elsewhere. In most cases there's ECC to attempt correction of moderately corrupted data. The real killers are when the data fails to persist after power cycling, when ECC fails to recover critical FTL data tables, or when there are no more spare sectors to re-map data to.

    Those first two critical errors are likely to produce the lightbulb effect where your flash card or USB stick one day simply fails to come up when probed after device insertion. In more rare cases, the lack of spares may show up as some sort of reported write failure in your kernel logs assuming the flash device reports proper IDE/ATAPI/??? error data.

    One final note -- please don't leave your USB stick inserted in the PC as you power it off! USB ports supply power and use a FET device to control that power. When you turn off the PC, the gates float and significant leakage current goes to the USB device. Some of the cheaper USB drives lack a key resistor that bleeds this current away and protects the flash memory chips. This leads to data corruption. I have seen the FTL break in such sticks simply by doing POR on the PC.

    Oh...almost forgot. When you put your flash stick through the washer and dryer, always use fabric softener or Bounce strips to reduce the static. :-)

  23. 1GB USB drive failed on me by Scorchio · · Score: 3, Interesting

    I have a Philips DVD drive with a USB port, and was using a 1GB flash drive to play back video files copied from my PC. The drive failed relatively quickly - I'd had it for about a year, but hadn't used it all that often. I started to notice the video files were corrupt on playback, but initially suspected the file itself, or possibly a problem with the DVD player's decoder. I diagnosed the problem by copying a file onto the drive, then repeatedly checksumming it. The first couple of times, the checksum value would often be correct, then on subsequent checks it would change on me. I'd end up seeing several different checksum values, never seeing it return to a previous value. Whether this was due to a problem in the interface hardware when reading, or memory cells failing to retain their state, I don't know.
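
    For anyone wanting to repeat the test, a minimal sketch of the repeated-checksum check. The function names are mine, and SHA-256 is an arbitrary choice; any checksum would do.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def distinct_digests(path, passes=10):
    """Checksum the same file repeatedly and return the set of results.

    More than one distinct digest means reads of unchanged data are unstable.
    """
    return {file_digest(path) for _ in range(passes)}
```

    One caveat: the operating system may serve repeated reads from its cache rather than from the drive, so unplugging and re-inserting the drive (or remounting it) between passes gives a more honest result.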

    Even though it was a year old and I had no receipt, the manufacturer (Kingmax, I think?) was happy to send a free replacement. The new drive has seen much more use, but is still working fine.

  24. Quiet failure... by NotQuiteReal · · Score: 4, Informative

    I too had a flash drive fail, but in the "worst" way... quietly.

    Fortunately, the drive was mostly used for "sneaker net" use, and did not contain any irreplaceable data. This use exposed the issue quickly too (had it been a backup device, the backup would have been useless and I wouldn't know until I needed it.)

    A typical failure was to zip up a software installation on a dev machine, then take it to a clean target machine, where the zip would fail to unpack, or the installer exe, once unpacked, would fail to run with various errors.

    I finally got to the point where I simply copied several megabytes of plain text data to the memory key, then copied it back and diffed the files to see the corruption (large areas of nulls, as I recall.)
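
    A sketch of that copy-back-and-diff check, done on the raw bytes rather than with a text diff. `corrupted_runs` is a hypothetical helper that reports each stretch of differing bytes and whether the copy holds only nulls there; it compares only up to the shorter length, so a truncated copy needs a separate size check.

```python
def corrupted_runs(original, copy):
    """Return (offset, length, all_nulls) for each run of differing bytes."""
    runs = []
    start = None
    n = min(len(original), len(copy))
    for i in range(n):
        differs = original[i] != copy[i]
        if differs and start is None:
            start = i                               # a corrupted run begins
        elif not differs and start is not None:
            runs.append((start, i - start, not any(copy[start:i])))
            start = None                            # the run has ended
    if start is not None:                           # run extends to the end
        runs.append((start, n - start, not any(copy[start:n])))
    return runs
```

    Feed it the bytes of the file on the PC and the bytes read back from the memory key; large runs flagged as all nulls would match what I saw.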

    Never heard a peep from the OS.

    It was a 1 1/2 year old Patriot XT 2GB, and, after a couple of emails and a PDF of my NewEgg receipt, a new drive showed up in the mail under the lifetime warranty.

    I also had an expensive Lexar CF card for a digital SLR that failed. In that case pictures that I know I took simply weren't on the card... but could be "recovered" with the Lexar utility (along with EVERYTHING else on the card, so it was a PITA.) Since that was nearly $200 when it was new, I figured getting my lifetime warranty honored would be easy, since the cards were down to about $20. No dice. Just got the run-around and finally gave up. Lexar lost a customer.

    --
    This issue is a bit more complicated than you think.
    1. Re:Quiet failure... by houstonbofh · · Score: 2, Funny

      I also had an expensive Lexar CF card for a digital SLR that failed. In that case pictures that I know I took simply weren't on the card... but could be "recovered" with the Lexar utility (along with EVERYTHING else on the card, so it was a PITA.) Since that was nearly $200 when it was new, I figured getting my lifetime warranty honored would be easy, since the cards were down to about $20. No dice. Just got the run-around and finally gave up. Lexar lost a customer.

      They lost more than one... They are now in the same group as Maxtor, politicians, and strippers...

    2. Re:Quiet failure... by gmccloskey · · Score: 2, Insightful

      hey, what have strippers ever done to deserve being classed with politicians?

  25. My anecdotal experiences with Flash. by dannycim · · Score: 2, Insightful

    I've been running my home desktop/server (Linux 2.6) on a Sandisk Cruzer 8GB usb stick (root, swap, tmp, everything except large media files) for a year and four months without any glitches. I've napkin-calculated that at current usage and wear levelling, I should be able to use it for over 50 years without a failure. Funnily enough, the portable USB drive that I use to back it up failed last December. I keep multiple backups, I didn't flinch.

    Then again some flash devices fail miserably and silently. I've had a few 64MB and 128MB stick batches with stuck bits, and those were practically new. The operating systems they were used on didn't detect the errors, I did, by trying to open garbled files.

    My wish list: A SATA gizmo that has 4-5 USB connectors, each with its own bus, that presents itself to the SATA bus as a single drive, and does RAID-5 automatically. That'd be sweet.

  26. Another interesting MOSFET failure mode by smellsofbikes · · Score: 2, Interesting

    is packaging. There is stuff in the potting epoxy that holds enough electric charge to make the FET's gate start to conduct a little, playing havoc with everything. We've been having to redo parts with an extra layer of metal over the top of the IC to protect it from an intermittent contamination in our packaging material.
    I believe I remember reading that Intel had problems with their water being mildly radioactive downstream of an old uranium mine, and running into the same problems (only much worse, since they're doing much finer geometry.)
    So this is a case where the FET hasn't failed, precisely: it's just getting messed up by external interference.

    --
    Nostalgia's not what it used to be.
  27. Overview by meregistered · · Score: 2, Informative

    So, I will pass on what I have discussed with my brother-in-law who is an Electrical Engineer that writes software to test flash memory:

    1. Flash memory is built with additional fail over storage (so a 1GB SD card actually has a certain % more memory than 1GB).
    When a section of memory fails it is marked bad by the flash controller and some of the fail over memory comes into service (marked bad much like failures on standard hard drives... although I get the impression the flash controller may be the thing remembering it's bad... wasn't clear on this now I have something else to ask him)

    2. Flash memory will fail... it can only be written to so many times before it will no longer be able to be written to... and the number of times is definitely not as high as a standard hard drive
    So it's likely that you can extend the life of a flash device by writing to it less often.

    And, not from my brother-in-law discussions, I personally had a flash drive fail (I was using it as the master copy of documents as I moved data between my work machine and home machine while working toward an online degree). When it failed there was no warning previously. It simply stopped working... wouldn't be read and wouldn't write. I suspect my batch file that performed the backups to it must have written to it too many times (it was a smaller 128MB drive so, considering the above discussion about fail-over memory a smaller drive SHOULD fail faster...)

    Hope that helps

  28. Free the blue smoke! by bombastinator · · Score: 2, Interesting

    I wonder how well that hammer thing worked.

    It's beautifully effective for drives because they are high speed high precision mechanical devices, but even if you broke up the circuit board the chips were soldered to, a guy with a soldering iron and some know-how might still be able to get it back together again. Looking at that cell to gate progression posted earlier, it sounds like unless you are able to actually destroy a given gate you don't destroy access to a given chip. If you were able to access the internals of the chip that might not be a barrier.

    Too many electrons are easy to find though. Maybe get some rubber gloves, one of those hand held stun guns and zap the board parts a few times after (or before) you're finished hammering. It could be fun and sparkly. This also provides opportunity for some memorable conversations with management. " ...It's these SSDs boss. They're just really hard to erase when they fail. I'm afraid the department is going to need its own Van de Graaff generator..."

    The blue smoke wants to be free.

  29. Flash media by DougWare · · Score: 2, Funny

    Everyone knows that the mainstream flash media is so left wing that it's going to fail.