One Developer's Experience With Real Life Bitrot Under HFS+
New submitter jackjeff (955699) writes with an excerpt from developer Aymeric Barthe about data loss suffered under Apple's venerable HFS+ filesystem. HFS+ lost a total of 28 files over the course of 6 years. Most of the corrupted files are completely unreadable. The JPEGs typically decode partially, up to the point of failure. The raw .CR2 files usually turn out to be totally unreadable: either completely black or having a large color overlay on significant portions of the photo. Most of these shots are not so important, but a handful of them are. One of the CR2 files in particular, is a very good picture of my son when he was a baby. I printed and framed that photo, so I am glad that I did not lose the original.
(Barthe acknowledges that data loss and corruption certainly aren't limited to HFS+; "bitrot is actually a problem shared by most popular filesystems. Including NTFS and ext4." I wish I'd lost only 28 files over the years.)
Bitrot isn't the fault of the filesystem unless something is badly buggy. It's the fault of the underlying storage-device itself. Attacking HFS+ for something like that is just silly. Now, with that said there are filesystems out there that can guard against bitrot, most notably Btrfs and ZFS. Both Btrfs and ZFS can be used just like a regular filesystem where no parity-information or duplicate copies are saved and in such a case there is no safety against bitrot, but once you enable parity they can silently heal any affected files without issues. The downside? Saving parity consumes a lot more HDD-space, and that's why it's not done by default by most filesystems.
At least you would know that the file was corrupted, so that you could restore it from a good backup.
Yes absolutely great idea! Rather than having technical decisions being made at tech conferences and among developers, system administrators and analysts we should move that authority over to legislature. Because we all know we are going to see a far better weighing of the costs and benefits of various technology choices by the legislature than by technology marketplace.
Apple used HFS+ because it worked to successfully migrate people from Mac OS9, it supported a unix / MacOS hybrid. They continue to use it because it has been good enough and many of the more robust filesystems were pretty heavyweight. I'd like something like BTFS too. But I don't think the people who disagree with me should be jailed.
Depends on the backup methodology. If your backup works the way Apple's backups do, e.g. only modified files get pushed into a giant tree of hard links, then there's a good chance the corrupted data won't ever make it into a backup, because the modification wasn't explicit. Of course, the downside is that if the file never gets modified, you only have one copy of it, so if the backup gets corrupted, you have no backup.
So yes, in an ideal world, the right answer is proper block checksumming. It's a shame that neither of the two main consumer operating systems currently supports automatic checksumming in the default filesystem.
Check out my sci-fi/humor trilogy at PatriotsBooks.
A database is something special
I basically make a "full backup" of my Oracle DBs once a week, and a "incremental backup" in the form of DB change logs every five minutes. (that is, the change logs are pushed "off site" every five minutes, of course they are being written locally continuously with every change.
The thing with backups, though, is not only to make them often but to also *check* them often. With my DBs there is a handy tool where I can check the backup files for "flipped bits" because there are also checksums in the DB files.
For my "private backups to DVD/BR" I only fill them up to ~70%, and fill the rest of the disk with checksum data with dvdisaster., for other "online backups" I create PAR2 files that I also store. With those parity files I can check "are all bits still OK?" now and then, and repair the damage when/if bits start to rot in the backup. In the 10 years I do this, with ~150 DVDs and ~20BRs so far I had 2 DVDs that became "glitchy", but because of the checksum data I was able to repair the ISO and re-burn them.
Basically, IF you go through the trouble of setting up an automated backup system either with software or with your own scripts, It doesn't add much work to also add verification/checksum data to the backup. And that goes a long way into preventing data loss due to bit rot.