MIT's New File System Won't Lose Data During Crashes
jan_jes sends news that MIT researchers will soon present a file system they say is mathematically guaranteed not to lose data during a crash. While building it, they wrote and rewrote the file system over and over, finding that the majority of their development time was spent defining the system components and the relationships between them. "With all these logics and proofs, there are so many ways to write them down, and each one of them has subtle implications down the line that we didn’t really understand." The file system is slow compared to other modern examples, but the researchers say their formal verification can also work with faster designs. Associate professor Nickolai Zeldovich said, "Making sure that the file system can recover from a crash at any point is tricky because there are so many different places that you could crash. You literally have to consider every instruction or every disk operation and think, ‘Well, what if I crash now? What now? What now?’ And so empirically, people have found lots of bugs in file systems that have to do with crash recovery, and they keep finding them, even in very well tested file systems, because it’s just so hard to do.”
Hardware failure?
slow compared to other modern examples, but the researchers say their formal verification can also work with faster designs
If we can accept 'slow', it's not that difficult to build an always consistent filesystem. While they can formally verify a faster design should one exist, there remains the question of whether it's possible to be formally proven to be resilient to data loss while taking some of the utterly required performance shortcuts for acceptable performance. I suspect the answer is that some essential performance 'shortcuts' will fail that verification.
XML is like violence. If it doesn't solve the problem, use more.
... until they find a logical flaw in their proofs or the bugs in mechanical verifier(s) that helped them prove the driver correct.
That is all.
... that "crashing" here refers to the program using the file system, not the disk?
MIT's new "crash-proof file system" crashed today amid accusations of bugs in the formal proof verification software used to formally verify it.
MIT are now working on a formal verification of the formal verification system, in order to avoid similar formal-verification-related problems in the future.
I find some of the current file systems to be adequately reliable. Even their performance is acceptable. But, the Linux systems are lacking.
Is there a reliable Linux file system such as EXT4 that has an easy to use copy on write(CoW) feature to allow instant recovery of any file changed at any time?
rm ./test ./
restore --last test
dd if=/dev/random of=./test bs=1M count=10 ./
restore --datetime test
Novell Netware FS did all this and more in 1995 FFS! Fifteen years later and Linux doesn't seem to be able to do this. NTFS doesn't seem to be able to do this either. Yet Novell is dead?
Not a matter of proof. The distance between perfect design and buggy implementation. IMHO.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
i thought Journaled file systems already possessed this feature.
Write zero to a flag.
Write data to temporary area.
Calculate checksum and keep with temporary area.
When write is complete, signal application.
Copy data from temporary area when convenient.
Check checksum from temporary to permanent is the same.
Mark flag when finished.
If you crash before you write the zero, you don't have anything to write anyway.
If you crash mid-write, you've not signalled the application that you've done anything anyway. And you can checksum to see if you crashed JUST BEFORE the end, or half-way through.
If you crash mid-copy, your next restart should spot the temporary area being full with a zero-flag (meaning you haven't properly written it yet). Resume from the copy stage. Checksum will double-check this for you.
If you crash post-copy, pre-flagging, you end up doing the copy twice, big deal.
If you crash post-flagging, your filesystem is consistent.
I'm sure that things like error-handling are much more complex (what if you have space for the initial copy but not the full copy? What if the device goes read-only mid-way through?) but in terms of consistency is it really all that hard?
The problem is that somewhere, somehow, applications are waiting for you to confirm the write, and you can either delay (which slows everything down), or lie (which breaks consistency). Past that, it doesn't really matter. And if you get cut-off before you can confirm the write, data will be lost EVEN ON A PERFECT FILESYSTEM. You might be filesystem-consistent, but it won't reflect everything that was written.
Journalling doesn't need to be mathematically-proven, just logically thought through. But fast journalling filesystems are damn hard, as these guys have found out.
No specifics. This has been done a million times with journaling filesystem (and block layers) - no idea why this is better or different. But what about disk failure? But what about data loss? But what about (undetected) data corruption (at the disk)? What about unexpected power hits that could drop a disk or tear a write? Not even going to get into snapshotting, disaster recovery, etc. There's a lot more to this than "surviving a crash".
I don't believe this one bit.
So when something crashes and corrupts a bit in the block of memory containing where on the disk to write the data to just at the wrong moment so this *correct* code then faithfully does what it is asked and overwrites the file index with a picture of a cat, nothing will be broken?
I've seen RAID groups fail sort of violently (granted in some tough environments) where one disk crashed and so did the others next two it. Three out of five disks in a RAID 5 gone. Only option was backup. How would any filesystem survive that?
Chance favors the prepared mind.
Perfect is the enemy of good.
Because proofs don't have bugs. Riiiight.
I think any file system could be imagined as a simple case of a database system. You "commit" a file change and you must be sure that the change is written to disk before proceeding.
So, any database system has a well know logical limitation named "Two General's Problem"
https://en.wikipedia.org/wiki/Two_Generals%27_Problem
The implication of this is that in a database system you cannot guarantee a fully automated recovery; always there is a remaining possibility that some changes should be rolled back or commited by hand.
So what the database manager does is to minimize this lost work; what do you want to redo by hand: 1 week, 1 day, 1 minute, 1 second of work?
It's now August, the conference where they'll be presenting their work is in October, and the article is a tad short on specifics. They've done a formally verified formal verification of a filesystem. if it works, that's excellent news of course, but I'd wait until we have seen the thing work and with actual code to examine before making any comments or bets on how useful this is going to be. And this being an open source-oriented site, we should be asking whether the code will indeed be available under any kind of usable open source license.
-- That grumpy BSD guy - http://bsdly.blogspot.com/
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
If the hard drive firmware is not proven, this FS won't be any better than ZFS and others.
Writing safe file systems is the easy part (even trivial using synchronous writes, when you consider their design is "slow").
The impossible part is dealing with firmwares that are known to lie (including pretending than a write is synchronous): how could you not lose data if the drive didn't ever write the data to the platters in the first place ?
I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
what does this phrase even mean? why does speed or slowness matter?
Speed means the ability to finish reading and writing all data associated with a job before the job's soft real-time deadline has expired.
formal proof using Coq
How do you prove Coq is correct, that it doesn't have a Ken Thompson bug (see "Reflections on Trusting Trust") causing a false positive on proof of Coq's own correctness? Is there an independently written implementation of Coq's proof language suitable for David A. Wheeler's "diverse double-compiling" method? And how do you prove that the hardware on which Coq is run doesn't have a flaw that affects Coq's correctness?
I guess that in the MIT village, the villagers have not yet discovered that something like that exists and is being used in production for the past ten years, and is known as "ZFS".
Good job guys... Glad to know why you are counted among the elite computer science institutions... *FACEPALM*
The feature I miss the most is allowing traversal through a directory hierarchy a user has no explicit permissions for to get to a folder they do have permissions for. I find the workarounds for this in other filesystems to be extremely ugly.
In POSIX, that's represented in a directory's mode bits with octal digit 1 (4: list files, 2: create or delete files, 1: traverse). What do you find ugly about mode 751 (owner create or delete, owner and group list, world traverse)?
I also kind of wish TCP/IP had used the network:mac numbering scheme that IPX used.
It does now. An IPv6 address is divided into a 48- or 56-bit network, a 16- or 8-bit subnet, and a 64-bit machine identifier commonly derived from the MAC.
Stupidly written programs will still have problems, even if there is no corruption at the filesystem level -- a partially written save state, written in two parts with file closure (or to two separate files, more likely), will be corrupt from the program's point of view, if the computer resets between the two writes.
This also suggests a need for a transaction-like system at the OS level -- you open and write a ton of data, and then the crash occurs when almost done. The OS must have been writing to a temp file rather than the "real" one. Unless that is too high level in this context -- the article has no technical details at all.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
The idea of evaluating each step of the program for "what happens if this fails" is standard software engineering technique for mission-critical software. (That't not to say it's always actually done, just that it is the standard.) This method is hardly revolutionary (or even evolutionary.)
Any journalling FS should provide a consistent state, checksumming FS like BTRFS or ZFS even provable.
So they won't lose data, which is written and in a consistent state, and unwritten data cannot be saved by definition.
It reminds me of a story from the late 80s (?) at a tech conference. The makers of a real-time OS with real-time snapshots would periodically pull the plug on their systems, plug it back in and it would resume exactly what it was doing, to the delight and amazement of all the techies in the assistance. In the much larger and much more expensive booth in front of them was a richer vendor. The techies started coaxing them to do the same. After much hand wringing, they did, and after a very long rebuild time the system came back as a mess. Conclusion: the 1st vendor went out of business, the 2nd one is still very big.
Non-Linux Penguins ?
Why am I hearing Jeremy Clarkson asking "how hard can it be?" just before utterly screwing something up?
Perhaps there is just a tad more difficulty to it than you are considering?
Hi, MIT guys, formal proofs of filesystems are useless because you cannot incorporate physical systems into formal proofs. Real filesystems exist on real hardware.
I guarantee that your file system will fail if I start ripping cables out. A suitably strong EMP will take it out. In fact, I bet I could nuke your filesystem if I used my ham radio transceiver too close to the device. Other things that would destroy your filesystem include floods, earthquakes, and a lightning strike.
I began writing this by stating that formal proofs of software are useless, but I don't really believe that to be true. I strongly believe that we should strive for software correctness. Any techniques that can we use to make software better are worth pursuing.
But it has to be remembered that software cannot be isolated. When we do develop a true AI, it will escape and destroy us, probably within milliseconds of an unexpected hardware event. No matter how rigorously the beast is programmed!
"Well, what if I crash now? What now? What now" ... Lookup the answer on Wikipedia?
https://en.wikipedia.org/wiki/...
Can it write to a full LUN and never lose data?
Datalight Releases Open Source Version of Reliance Edge, a Tiny, Power Fail-safe File System for FreeRTOS
http://www.datalight.com/news/datalight-releases-open-source-version-of-reliance-edge,-a-tiny,-power-fail-safe-file-system-for-freertos
Reliance Edge Fail-Safe File System-A Transactional Embedded File System for FreeRTOS
http://www.freertos.org/FreeRTOS-Plus/Fail_Safe_File_System/Reliance_Edge_Fail_Safe_File_System.shtml
I've just spent the past few minutes trying to (re)find this story. Is it the KeyKOS recovery speed story from 1990?
Nice try though presenting this as "new".
That's why we need to make sure that dozens of AIs come online at the same time: They'll be forced to become good citizens with each other, which has a chance of them including us in their concept of civilization.
I suspect that an architecture can be formally proved. But a specific implementation? No.
It is axiomatic that any non-trivial software system has bugs. Of what use is an architectural proof in the face of implementation bugs? This is the difference between design and implementation. They may have all the best intentions in their design but the implementation can sink the whole ship.
I would take slow filesystem any day, if it can guarantee my data never gets corrupt by bit rot, current spikes, etc and guarantee recovery of crashed disks, etc. A filesystem that guarantees data integrity is very very hard to make. Researchers have found flaws in all filesystems such as XFS, ext4, NTFS, etc - except for ZFS. That is the only filesystem that has passed the data integrity research by at least two research papers. No flaws in ZFS. No other filesystem has done that.
If the filesystem is fast, it is unsafe. For instance, you can do a chkdsk of a large XFS raid array, maybe 10 TB in just a couple of minutes. The ZFS scrub would take many hours on the same size array. The difference? XFS is fast because fsck only checks the metadata, the data is never checked. The data might be corrupt after a successful fsck. ZFS scrub takes many hours because it traverses and checks everything; data and metadata and what not. When ZFS scrub is successful, you know your data is safe. Now choose, fsck in a few minutes, or scrub in many hours.
It reminds me of a Linux driver for NTFS which was very fast, much faster than NTFS. It cut corners so it did not do safety checks.
I can never tell if Clarkson is a smart person pretending to be stupid, or a stupid person pretending to be smart.
I've heard from numerous movie directors that people who play dumb characters actually have to be quite smart. Clarkson isn't stupid (generally) and remember that Top Gear is/was a scripted show and people watch it primarily because it is funny. I think I remember Conan OBrian saying that a lot of comedy is throwing out your dignity and hoping you'll get it back.
As others have pointed out, the formal verification does make the software provably reliable, but does nothing to protect against hardware issues. Just as a datapoint, the Stratus VOS operating system has been checksumming at the block driver level since the OS was written in 1980. It has detected failures on every generation of hardware it has been used with since. Some of the failures we have seen: Undetected transient media errors (the error correction/checking isn't perfect); Flaky I/O busses; bugs in RAID firmware.
These real world failures will swamp the potential failures caused by the software not being verified.
An engineer who ran for Congress. http://herbrobinson.us