Sony Beefs up FAT for Consumer Devices
An anonymous reader points to a report at LinuxDevices which says that "Sony has created an enhanced version of the vFAT filesystem that it says works better in Linux-based consumer electronic devices with removable USB mass storage devices. Unlike vFAT, the xvFAT filesystem will not induce a kernel panic if a USB storage device is removed during a write operation, Sony says," and writes "For now, xvFAT is a patch to the Linux 2.4.20 source tree maintained by CELF, an industry group of consumer electronics giants working to improve Linux for CE devices. Sony intends to submit the filesystem for inclusion in the mainstream 2.6 Linux tree as well."
It's a mount option -- in the case of data corruption, it's usually safer to go down instantly rather than continue. However, I don't expect anyone but the worst lunatics to set it on a FAT filesystem, especially one on a USB device.
Having the kernel crash (as opposed to a panic) due to a bug in a filesystem driver is another story. I once discovered that a maliciously malformed filesystem can send everything into la-la land. This is where Hurd's separation of kernel structures would be useful.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Creepy Crawler:
Why can't you prevent panics when removing devices that use vFAT? Shouldn't Linux have come up with a way to gracefully determine 'dirtiness' and then dump the kmod gracefully?
Foolhardy:
What does the filesystem have to do with crashing, other than the quality of the driver? i.e. what do the on-disk file structures have to do with having a kernel panic?
Good questions. You've just stumbled into a significant flaw in *nix generally.
Linux, begotten of Unix, does not subscribe to the notion of transient filesystems. Behavior is undefined when filesystems vanish suddenly. It seems obvious enough; the kernel should block IO activity, flush buffers, unmount and return errors to users that are attempting IO to the now missing filesystem. Whatever "damage" occurs to the data (as opposed to filesystem metadata) is, rightly, the user's problem. Unfortunately, this is not what happens.
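To make the policy described above concrete, here is a toy userland model, not kernel code. The class TransientMount and its methods are hypothetical names invented for this sketch. It assumes that once the device is gone, buffers that were never flushed can only be discarded and reported, and that every later IO attempt gets a plain error (ENODEV here) rather than a hang or a panic.

```python
import errno


class TransientMount:
    """Toy model of a mount whose backing device may disappear (hypothetical)."""

    def __init__(self):
        self.present = True
        self.dirty = {}    # path -> bytes buffered but not yet on the device
        self.stored = {}   # path -> bytes that actually reached the device

    def _check(self):
        # Every IO entry point fails cleanly once the device is gone.
        if not self.present:
            raise OSError(errno.ENODEV, "filesystem is gone")

    def write(self, path, data):
        self._check()
        self.dirty[path] = self.dirty.get(path, b"") + data

    def sync(self):
        self._check()
        for path, data in self.dirty.items():
            self.stored[path] = self.stored.get(path, b"") + data
        self.dirty.clear()

    def device_removed(self):
        """Mark the mount dead and report which paths lost unflushed data."""
        self.present = False
        lost = list(self.dirty)
        self.dirty.clear()
        return lost
```

In this model the unflushed data is simply lost, which matches the point that damage to file data is the user's problem; what the model guarantees is only that callers see an error instead of undefined behavior.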
What does happen falls under the euphemism "implementation defined." A good example is evident with NFS; *nix admins have been independently discovering this for years. If an NFS mount vanishes, *nix processes often hang indefinitely with no means of recovery. Various "soft mount" hacks appeared to accommodate the real world where network problems exist. Again, the actual behavior is not consistent; "soft mounts" are not always honored and obscure things like NFS versions or various "modes" of IO factor into why or why not.
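One common userland workaround for the hang described above is to probe a path from a helper thread and give up waiting after a timeout. The sketch below is only an illustration; the function name path_responds and the two-second default are assumptions. It does not unblock the stuck call itself: on a dead hard mount the helper thread may stay blocked indefinitely, which is why it is marked as a daemon thread.

```python
import os
import threading


def path_responds(path, timeout=2.0):
    """Return True/False if stat() finished in time, None if it timed out."""
    result = {}

    def probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # Daemon thread: a probe stuck on a dead mount will not block interpreter exit.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    return result.get("ok")  # None means the probe never came back
```

A caller would treat None as "this mount is probably wedged" and avoid touching it from the main thread.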
I believe that in the early days the need to optimize IO led to designs that made no allowance for transient filesystems. This design propagated itself into POSIX, where behavior was left undefined. Even today you find crazy things like kernel panics when a FAT filesystem does something other than remain perpetually mounted. There is no "correct" thing to do and developers, hesitant to start inventing policy where none exists, go on being oblivious to the problem.
The fact is that a large percentage of "important" filesystems are transient. Remote storage, removable storage, etc. host valuable data, while permanently attached storage provides only basic machinery.
Sony, stuck trying to make devices hosting transient vFAT filesystems play nice with Linux, has stepped in and attempted to address the problem. *nix will be dragged kicking and screaming into the modern era of transient filesystems. Unfortunately, Sony's pragmatic, special case solution does nothing to address the larger problem, and whatever solutions emerge for all the other possible cases will probably be (or already are) inconsistent in both implementation and behavior.
Blame the *nix folks who, 30 years ago, failed to anticipate hot pluggable keychains with hundreds of megabytes of storage.
Lurking at the bottom of the gravity well, getting old
I'm gonna give it a shot.
Right now I'm going to start copying a large file to my thumb drive, and once it's got 30-40 MB done, I'm going to pull out.
Wait for it!
Good news, everyone! All I got was an error message from GNOME -- "I/O Error while copying file foo.avi. Would you like to continue? Skip/Cancel/Retry"
I'm gonna stick the drive back in and tell it to continue -- stay tuned!
Holy crap, it picked up all on its own.
Wait...
Yep, it just passed an fsck.
Sony, what are you smoking???
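The experiment above, where the copy fails with an I/O error and later continues, can be sketched at the application level. This is not how GNOME implements it; copy_resumable and FlakyFile are hypothetical names, and FlakyFile is just an in-memory stand-in for a drive that gets unplugged partway through. The idea is to remember the last offset that was written without error and restart from there.

```python
import errno
import io


class FlakyFile(io.BytesIO):
    """In-memory stand-in for a removable drive (hypothetical test double)."""

    def __init__(self, fail_after):
        super().__init__()
        self.fail_after = fail_after
        self.plugged_in = False  # False: writes past fail_after raise EIO

    def write(self, data):
        if not self.plugged_in and self.tell() + len(data) > self.fail_after:
            raise OSError(errno.EIO, "Input/output error")
        return super().write(data)


def copy_resumable(src, dst, start=0, chunk_size=64 * 1024):
    """Copy src to dst from byte offset start; return (bytes_done, error)."""
    src.seek(start)
    dst.seek(start)
    done = start
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return done, None
        try:
            dst.write(chunk)
            dst.flush()
        except OSError as exc:
            # Report how far we got so the caller can offer Skip/Cancel/Retry.
            return done, exc
        done += len(chunk)
```

On a retry the caller passes the returned offset back in as start. On real hardware a failed write may have landed partially, so restarting from the last acknowledged offset rewrites that region rather than trusting it.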
Everything you've said here is correct, and I agree with you, but you haven't mentioned the fact that in general, dealing with transient filesystems is an enormously hard problem on any real OS. There is no quick fix for this.
The problem is that you have to make sure that the filesystem on disk is consistent when the media is removed --- but by the time you know that the media is being removed, it's too late to do anything!
Unix deals with this problem by simply refusing to deal with it: it requires you to dismount all filesystems before disconnecting the media. Which is fine if you're dealing with hard disks, but less fine on USB devices and floppies. (There's a good reason why most serious Unix hardware has software floppy drive eject mechanisms.)
(Unix has the particularly unpleasant issue of the unified VM and I/O system; what do you do if you want to page in a block from a file system that's gone away? Seg fault? Block until it comes back again? Wave your arms in the air and run around in small circles? Different implementations do all three...)
Windows and DOS attempt to deal with the problem by using write-through caching on anything they think is transient. This kills performance. (Try switching off write-through caching on your floppy disk sometime.) But even Windows wants you to dismount USB devices before removing them.
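An application can approximate write-through behavior on its own by forcing each chunk out to the device before writing the next one. The sketch below is an illustration only; copy_write_through is a hypothetical name, and calling os.fsync after every chunk is one way to get the effect, not what Windows or DOS do internally. The per-chunk sync is also where the performance cost mentioned above comes from.

```python
import os


def copy_write_through(src_path, dst_path, chunk_size=64 * 1024):
    """Copy a file, syncing each chunk to the device before the next write."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()             # push Python's buffer down to the OS
            os.fsync(dst.fileno())  # ask the OS to push its cache to the device
            copied += len(chunk)
    return copied
```

If the media is pulled mid-copy, at most the chunk in flight is at risk, at the price of one sync per chunk.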
Even CP/M had a variation of this problem --- there were specific system calls to detect disk changes and discard the caches. You were supposed to call these every time your program stopped for user input. Not all programs did, which meant that if you changed disks at the wrong time, you could end up with a corrupted disk...