Fatal Weakness With High-Capacity MMC/SD Cards?
"I am working on an embedded project where I am using Secure Digital and/or MultiMediaCards to store data. For convenience in developing and updating, I have decided to use a Windows FAT-type file system. This way I can create them, debug them, and update them on Windows development machines using USB card readers.
Since I have to keep around 25,000 files on the card, and since I'd like to minimize disk fragmentation that would result from large cluster size, I would like to use FAT32 with 512-byte clusters. This is no big deal, and certainly supported on Windows - "format f: /fs:fat32 /a:512". Done and done.
The interesting thing was that I bought four 256MB SD cards (three from SanDisk, one from Lexar Media), and quickly killed 3 of the 4. The SanDisk cards report that track 0 is not readable when I try to format them. Snooping the SD bus shows the card inits OK, and allows writes, but returns an error whenever track 0 is read. The Lexar card's failure is a little more subtle: a format looks like it works, but subsequent chkdsks always fail. I'm afraid to repeat this on the 4th card.
SanDisk (after some weeks of running around) will replace my cards, but hasn't addressed the cause of the failure. I'm also still waiting for a reply from the Lexar 2-day-turnaround support, after 7 business days, including a reminder email.
My theory goes like this: on FAT32, in the first sector (sector 0), there's a field that gives the sector number of the File System Info Sector (FSInfoSec). Every indication I've seen puts this in sector 1, the second physical sector. This sector contains updated counters of used and free clusters on the device. The 256MB cards have about 499,000 512-byte clusters on them. These flash devices have a lifetime of 300,000 writes per block, so if I copy 25,000 files to fill the card, the FSInfoSec has been updated either 25,000 times or 499,000 times (depending on when the filesystem updates the counters). If it's the former, I've just eaten up 8% of the lifetime of the card. If the latter, I've killed it before even finishing my write, since a write anywhere also causes a write to sector 1. In the best case, once I update this card 12 times, I have to throw it away.
There is some Microsoft documentation that says the FSInfoSec pointer in sector 0 can be set to 0xffff to indicate it's not used. When I used dskprobe.exe from the Microsoft Windows Resource Kit to patch this pointer, Windows 2000 Professional (with a fresh Windows Update applied) blue screens so frequently when I do a dir or chkdsk on the card that I can't do anything useful before I need to cycle power on my PC.
To test my theory, I replaced the dead Lexar card, and repeated the experiment, this time formatting the card FAT16 (no FSInfoSec anymore), and the minimum supported cluster size of 2K. The bad news is of course that I lose about 26MB on the card to fragmentation since the clusters are so large. The good news is that I can write the disk full as many times as I can put up with, and it never fails.
So there are two conclusions: 1) There's a staggeringly high defect rate in the 256 MB cards (SanDisk denies this) and all my ideas about the large cards ever working well with FAT32 are groundless, or 2) even though FAT16 on a 256MB card is hugely wasteful, it's the only way to get the cards to work for very long at all."
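The numbers in the post above can be checked with a few lines of arithmetic. This is only a sketch using the poster's own figures (300,000 rated writes per block, 25,000 files, about 499,000 clusters). The function names are made up for illustration, and the slack estimate assumes each file's last cluster is on average half full, which the post does not state.

```python
# Figures quoted in the post; treat them as the poster's estimates.
RATED_WRITES_PER_BLOCK = 300_000
FILES_PER_FILL = 25_000
CLUSTERS_ON_CARD = 499_000


def fills_until_worn(writes_per_fill, rated=RATED_WRITES_PER_BLOCK):
    """How many complete card fills one sector survives if it is
    rewritten `writes_per_fill` times during each fill."""
    return rated / writes_per_fill


def expected_slack_bytes(file_count, cluster_size):
    """Rough space lost to partly filled last clusters.
    Assumption: the last cluster of each file is half empty on average."""
    return file_count * cluster_size // 2


# One FSInfo update per file versus one per cluster written.
print("fills if updated per file:   ", fills_until_worn(FILES_PER_FILL))
print("fills if updated per cluster:", fills_until_worn(CLUSTERS_ON_CARD))

# Slack with FAT16 2K clusters versus FAT32 512-byte clusters.
print("slack at 2048-byte clusters:", expected_slack_bytes(FILES_PER_FILL, 2048))
print("slack at 512-byte clusters: ", expected_slack_bytes(FILES_PER_FILL, 512))
```

Under these assumptions the per-file case gives the 12 fills mentioned in the post, the per-cluster case gives less than one fill, and the 2K-cluster slack comes out near 25.6 million bytes, in line with the "about 26MB" reported.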
... but could you use the loopback device to create an image file and then "dd" the image file to the card? This way the 499,000 writes would be made on the host computer and only the final version written to the card.
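As a rough illustration of the "build the image on the host, then copy it once" idea, here is a minimal Python stand-in for the dd step. The function name and both paths are hypothetical. It assumes the image was already built and populated on the host, and that the target is either a raw device node you have permission to write or an ordinary file.

```python
import os


def write_image(image_path, target_path, block_size=1024 * 1024):
    """Copy an image to the target in large sequential blocks.
    Returns the number of bytes written."""
    written = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
        # Push buffered data out before the card is removed.
        dst.flush()
        os.fsync(dst.fileno())
    return written
```

Whether this actually reduces wear depends on how the card's controller handles large sequential writes, which the thread does not establish. Double-check the target path before running anything like this, since it overwrites whatever is there.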
Flash disks tend to have filesystems specifically designed for them because they have very different characteristics from traditional drives: ones can change to zeros, but zeros can't change back to ones unless you erase the entire flash sector. Writing a flash sector doesn't matter for wear; it's the erases that count.
A good flash filesystem will ensure that sectors are only erased when absolutely necessary, and will spread the allocation table out across multiple sectors. FAT16 and FAT32 are horrible about this, and will lead to extremely early flash death. So, if you are going to use flash, please treat it like flash: even though it has an IDE interface, it is very different from a standard disk on the other end.
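The ones-to-zeros rule is easy to see in a toy model. The class below is purely illustrative (the name, the 16-byte sector size, and the write policy are all assumptions, not any real part's behavior): programming can only clear bits, and an erase is needed only when some bit has to go from 0 back to 1.

```python
class FlashSector:
    """Toy model of one flash sector: programming ANDs bits in,
    erasing resets every byte to 0xFF and costs one erase cycle."""

    def __init__(self, size=16):
        self.data = bytearray(b"\xff" * size)
        self.erase_count = 0

    def program(self, offset, payload):
        # Programming can only turn 1 bits into 0 bits.
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte

    def erase(self):
        self.data = bytearray(b"\xff" * len(self.data))
        self.erase_count += 1

    def write(self, offset, payload):
        """Store payload, erasing only if a 0 bit must become 1."""
        current = self.data[offset:offset + len(payload)]
        needs_erase = any((new & ~old) & 0xFF
                          for old, new in zip(current, payload))
        if needs_erase:
            snapshot = bytearray(self.data)
            snapshot[offset:offset + len(payload)] = payload
            self.erase()
            self.program(0, snapshot)
        else:
            self.program(offset, payload)
```

Writing 0xF0 and then 0x00 to the same byte costs no erase in this model, while going from 0x00 to 0x0F forces one, because bits that are already 0 would have to become 1 again.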
This sounds more like a bug in the controllers inside the Flash cards than the actual choice of filesystem. Most Flash card formats (CompactFlash, MemoryStick, MMC/SD) contain a microcontroller that does wear-leveling and ECC. So, logical block zero of the device does not remain physical block zero if that block gets worn out. There are lots of references on the web discussing the microcontrollers in various Flash cards, for example this article (linked via Google cache because the original is a PDF).
These microcontrollers are precisely the reason why it is not a good idea to use these formats in devices that can be powered off suddenly. Look here (search down for "asynchronous power fail") for a mention of these problems. Elsewhere on the site (and in the JFFS author's other online comments), more discussion of this problem is available, including the JFFS author's own experiments.
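To make the remapping idea concrete, here is a toy simulation of a controller that redirects each write of a logical block to the least-worn free physical block. It is a sketch of the general technique only. Real card controllers are proprietary, and nothing here (class name, policy, block counts) is taken from any vendor.

```python
class WearLeveledCard:
    """Toy controller: every write of a logical block lands on the
    least-erased physical block not holding another logical block."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks
        self.mapping = {}  # logical block -> physical block

    def write(self, logical):
        # Physical blocks holding other logical blocks are off limits.
        in_use = {p for l, p in self.mapping.items() if l != logical}
        candidates = [p for p in range(len(self.erase_counts))
                      if p not in in_use]
        if not candidates:
            raise RuntimeError("no free physical block available")
        target = min(candidates, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1
        self.mapping[logical] = target
        return target


card = WearLeveledCard(physical_blocks=8)
for _ in range(800):
    card.write(0)  # hammer logical block 0 only
print(card.erase_counts)
```

In this model, 800 writes to logical block 0 end up spread evenly over the eight physical blocks instead of landing on one. If a card really behaved like this, repeated updates to one logical sector alone would not wear out a single physical location, which is the point the comment above is making.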
JFFS works with MTD devices, which are flat Flash arrays with no microcontroller (and the JFFS author doesn't plan on supporting ATA-type Flash cards, although it appears others may be working on this). This gives JFFS complete control over journalling, wear-leveling, and error correction. It is able to do these things in a fashion that is robust in the face of asynchronous power failures. The microcontrollers in various Flash cards do not appear to be this sophisticated.
So, 1) it may not be the choice of filesystem that is the problem, 2) there are documented reasons for not using Flash cards in certain types of systems, and 3) JFFS (and JFFS2), even if they support non-MTD devices now, probably cannot safeguard against the problems in microcontroller-based Flash cards.