One Developer's Experience With Real Life Bitrot Under HFS+

Posted by timothy on Saturday June 14, 2014 @01:25AM from the so-really-it's-both-plus-and-minus dept.

New submitter jackjeff (955699) writes with an excerpt from developer Aymeric Barthe about data loss suffered under Apple's venerable HFS+ filesystem: "HFS+ lost a total of 28 files over the course of 6 years. Most of the corrupted files are completely unreadable. The JPEGs typically decode partially, up to the point of failure. The raw .CR2 files usually turn out to be totally unreadable: either completely black or having a large color overlay on significant portions of the photo. Most of these shots are not so important, but a handful of them are. One of the CR2 files in particular, is a very good picture of my son when he was a baby. I printed and framed that photo, so I am glad that I did not lose the original." (Barthe acknowledges that data loss and corruption certainly aren't limited to HFS+; "bitrot is actually a problem shared by most popular filesystems. Including NTFS and ext4." I wish I'd lost only 28 files over the years.)

I've also had this happen with HFS+ by carlhaagen · 2014-06-14 01:30 · Score: 4, Informative

I had an old partition of some 20,000 files, most of them 10 years old or older, in which I found 7 or 8 files - coincidentally jpg images as well - that were corrupted. It struck me as nothing other than filesystem corruption, as the drive was and still is working just fine.
1. Re:I've also had this happen with HFS+ by istartedi · 2014-06-14 01:57 · Score: 4, Insightful
  
  coincidentally jpg images as well
  Well, JPGs are usually lossy and thus compressed. Flipping one bit in a compressed image file is likely to have severe consequences. OTOH, you could coXrupt a fewYentire byteZ in an uncompressed text file and it would still be readable. I suspect your drives also had a few "typos" that you didn't notice because of that.
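
The point about compressed data can be tried out with a rough sketch. It uses Python's `zlib` as a stand-in for a compressed format, which is an assumption made for illustration: zlib is lossless and carries its own Adler-32 check, so a damaged stream tends to fail outright, whereas a JPEG decoder would more likely produce a partially garbled picture. The helper names `flip_bit`, `damaged_bytes` and `compressed_survives` are hypothetical.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def damaged_bytes(original: bytes, damaged: bytes) -> int:
    """Count the positions where two equal-length byte strings differ."""
    return sum(a != b for a, b in zip(original, damaged))


def compressed_survives(packed: bytes, original: bytes, bit_index: int) -> bool:
    """True only if the stream still decompresses to the original after one flip."""
    try:
        return zlib.decompress(flip_bit(packed, bit_index)) == original
    except zlib.error:
        return False


# Varied sample text so the compressor has real work to do.
text = " ".join(str(i * i) for i in range(500)).encode("ascii")
packed = zlib.compress(text)

# One flipped bit in the plain text damages exactly one byte.
plain_damage = damaged_bytes(text, flip_bit(text, 8 * 100))
print("plain text bytes damaged:", plain_damage)  # → 1

# The same kind of flip in the middle of the compressed stream.
middle_bit = 8 * (len(packed) // 2)
print("compressed copy still readable:", compressed_survives(packed, text, middle_bit))
```

In the uncompressed case the damage stays local to one character. In the compressed case every later byte depends on the bits before it, so a single flip usually makes the whole stream undecodable.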
  
  --
  For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Bitrot not the fault of filesystem by Gaygirlie · 2014-06-14 01:40 · Score: 5, Insightful

Bitrot isn't the fault of the filesystem unless something is badly buggy. It's the fault of the underlying storage-device itself. Attacking HFS+ for something like that is just silly. Now, with that said there are filesystems out there that can guard against bitrot, most notably Btrfs and ZFS. Both Btrfs and ZFS can be used just like a regular filesystem where no parity-information or duplicate copies are saved and in such a case there is no safety against bitrot, but once you enable parity they can silently heal any affected files without issues. The downside? Saving parity consumes a lot more HDD-space, and that's why it's not done by default by most filesystems.
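
As a minimal sketch of why parity allows repair, here is the XOR arithmetic used by RAID-5-style schemes. It is a generic illustration, not the actual on-disk logic of Btrfs or ZFS, and the block contents and the `xor_blocks` helper are made up for the example. It also assumes something else (such as a checksum) has already identified which block is bad.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical 8-byte data blocks plus one parity block.
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_blocks(data)

# Suppose block 1 is known to be bad (a checksum would tell us which one).
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]

# XOR of the surviving blocks and the parity block reproduces the lost block.
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[lost_index])  # → True
```

The space cost is visible here too: one parity block for every three data blocks is a third more storage, and full duplicate copies cost more still.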
1. Re: Bitrot not the fault of filesystem by jmitchel!jmitchel.co · 2014-06-14 01:52 · Score: 4, Insightful
  
  Even with just checksums, knowing that there is corruption means knowing to restore from backups. And in the consumer space most people have plenty of space to keep parity if it comes to that.
Re:Backup? by kthreadd · 2014-06-14 01:42 · Score: 4, Insightful

The problem with bit rot is that backups don't help. The corrupted file goes into the backup and eventually replaces the good copy, depending on the retention policy. You need a file system that checksums every data block, so that it can detect a corrupted block when reading it and flag the file as corrupted, letting you restore it from a good backup.
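
A rough sketch of the block-checksum idea, done in memory rather than inside a real filesystem. The 4096-byte block size, the choice of CRC32 and the function names are all assumptions made for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """CRC32 of every fixed-size block in data."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def corrupted_blocks(data: bytes, expected: list[int], block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks whose current CRC32 differs from the stored one."""
    current = block_checksums(data, block_size)
    return [i for i, (now, then) in enumerate(zip(current, expected)) if now != then]


# Checksums recorded when the data was written.
original = bytes(range(256)) * 64  # 16 KiB of sample data, i.e. 4 blocks
stored = block_checksums(original)

# Later, one bit in the third block flips on disk.
rotted = bytearray(original)
rotted[2 * BLOCK_SIZE + 10] ^= 0x01

print(corrupted_blocks(bytes(rotted), stored))  # → [2]
```

Detection alone does not repair anything, but it tells you which file to pull from a backup before the bad copy displaces the good one.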
Re:Legacy file systems should be illegal by kthreadd · 2014-06-14 01:43 · Score: 5, Insightful

At least you would know that the file was corrupted, so that you could restore it from a good backup.
Re:Legacy file systems should be illegal by jbolden · 2014-06-14 01:47 · Score: 5, Insightful

Yes, absolutely great idea! Rather than having technical decisions made at tech conferences and among developers, system administrators and analysts, we should move that authority over to the legislature. Because we all know we are going to see a far better weighing of the costs and benefits of various technology choices by the legislature than by the technology marketplace.
Apple used HFS+ because it worked to successfully migrate people from Mac OS 9: it supported a Unix / Mac OS hybrid. They continue to use it because it has been good enough, and many of the more robust filesystems were pretty heavyweight. I'd like something like Btrfs too. But I don't think the people who disagree with me should be jailed.
Re:Legacy file systems should be illegal by mgmartin · 2014-06-14 01:54 · Score: 4, Informative

As does zfs: man zfs
copies=1 | 2 | 3
Controls the number of copies of data stored for this dataset. These copies are in addition to any redundancy provided by the pool, for example, mirroring or RAID-Z. The copies are stored on different disks, if possible. The space used by multiple copies is charged to the associated file and dataset, changing the used property and counting against quotas and reservations. Changing this property only affects newly-written data. Therefore, set this property at file system creation time by using the -o copies=N option.
Re:Legacy file systems should be illegal by peragrin · 2014-06-14 02:04 · Score: 4, Interesting

This is something everyone forgets.
It takes decades to build long term reliable file systems.
ZFS and Btrfs are less than a decade old.
Windows runs on NTFS Version something. NTFS was started in what year?
HFS, and then HFS+, were built in what year?
How long has Microsoft been promising WinFS?
File systems change but only slowly. This is good. You need a good long track record to convince people they won't lose files every ten years due to random malfunctions.

--
i thought once I was found, but it was only a dream.
So far what I lost... by cpct0 · 2014-06-14 02:08 · Score: 4, Interesting

Bitrot is not usually the issue for most files. Sometimes, but it's rare. What I lost, I lost to a mayhem of hardware, software and human failure. Thanks for backup, life :)
On Bitrot:
- MP3s and M4As I had that suddenly started to stutter and jump around. You play the music and it starts to skip. Luckily I have backups (read on for why I have multiple backups of everything :) ) so when I find them, I just revert to the backup.
- Images having bad sectors like everyone else. Once or twice here or there.
- A few CDs due to CD degradation. That includes one that I really wish I'd still have, as it was a backup of something I lost. However, the CD takes hours to read, and then eventually either balks up or not for the directory. I won't tell you about actually trying to copy the files, especially with normal timeouts in modern OSes or the hardware pieces or whatnot.
Not Bitrot:
- Two RAID mirror hard drives. As they were both from the same company and purchased at the same time (same batch), in the same condition, they both balked at approximately the same time, not leaving me time to transfer data back.
- An internal hard drive, as I was making backups to CDs (at that time). For some reason I still cannot explain, the software thought my hard drive was both the source and the destination! The computer froze completely after a minute or two, then I tried rebooting to no avail, and my partition block now contained a 700 MB CD image, a quarter full with my stuff. I still don't know how that's possible, but hey, it happened. Since I was actually making my first CD at the time and it was my first backup in a year, I lost countless good files, many of which I gave up on (especially my favorite '90s music video sources, ripped from the original Betacam tapes in 4:2:2 by myself).
- A full bulk of HDs on Mac when I tried putting the journal on another internal SSD drive. I have dozens of HDDs, and I thought it'd go faster to use that nifty "journal on another drive" option. It did work well, although it was hell to initialize, as I had to create a partition for each HDD, then convert them to journaled partitions. Worked awesomely, very quick, very efficient. One day, after weeks of usage, I had to hard shut down the computer and its HDDs. When they remounted, they all remounted in the wrong order, somehow using the wrong partition order. So imagine you have perfectly healthy HDDs, each thinking it has to use another HDD's journal. Mayhem! Most drives thought they were other ones, so my music HDD became my photos HDD RAID, my system HDD thought it was the backup HDD, but only for what was in the journal. It took me weeks with DiskWarrior and Data Rescue to get 99% of my files back (I'm looking at you, DiskWarrior, as a 32-bit app not supporting my 9TB photo drive), using a combination of the original drive files and the backup drive files. Took months to rebuild the Aperture database from that.
- All my pictures from when I met my wife to our first travels. I had them in a computer, I made a copy for sure. But I cannot find any of that anywhere. Nowhere to be found, no matter where I look. Since that time, many computers happened, so I don't know where it could've been sent. But I'm really sad to have lost these.
- Did a paid photoshoot for a unique event. Took four 32GB cards' worth of priceless pictures. Once done with a card, I was sifting through the pictures with my camera and noticed it had issues reading the card. I removed it immediately. Once home, I put the card in my computer; it had all the trouble in the world reading it (but was able to do so), and I was (barely) able to import its contents to Aperture (4-5 pictures didn't make the cut, a few dozen had glitches). It would then (dramatically, as if it had somehow drawn its last breath after relinquishing its precious data) not read or mount anywhere, not even being recognized as a card by the readers. Kids, use new cards regularly for your gigs :)
- A RAID array b
Re:Backup? by dgatwood · 2014-06-14 02:09 · Score: 5, Insightful

Depends on the backup methodology. If your backup works the way Apple's backups do, e.g. only modified files get pushed into a giant tree of hard links, then there's a good chance the corrupted data won't ever make it into a backup, because the modification wasn't explicit. Of course, the downside is that if the file never gets modified, you only have one copy of it, so if the backup gets corrupted, you have no backup.
So yes, in an ideal world, the right answer is proper block checksumming. It's a shame that neither of the two main consumer operating systems currently supports automatic checksumming in the default filesystem.
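
To make the hard-link argument concrete, here is a simplified sketch of a snapshot-style backup. It is not how Apple's backup tool is actually implemented. It assumes change detection by file size and modification time, and the names `snapshot` and `unchanged` are hypothetical. It also needs a filesystem that supports hard links.

```python
import os
import shutil
from typing import Optional


def unchanged(src: str, old: str) -> bool:
    """Treat a file as unmodified if size and modification time both match."""
    a, b = os.stat(src), os.stat(old)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source: str, previous: Optional[str], target: str) -> None:
    """Copy changed files into target; hard-link unchanged ones to previous."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.exists(old) and unchanged(src, old):
                os.link(old, dst)       # same inode, no new copy of the data
            else:
                shutil.copy2(src, dst)  # copy contents and timestamps
```

A file that rots on the source disk keeps its size and timestamp, so the next snapshot links to the older, good backup copy instead of re-reading the damaged one. The flip side is the one noted above: every snapshot shares that single backup copy, so damage to it affects all of them.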

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
article is suspect, summary is worse by sribe · 2014-06-14 02:15 · Score: 4, Informative

In a footnote he admits that the corruption was caused by hardware issues, not HFS+ bugs, and of course the summary ignores that completely.
So, for that, let me counter his anecdote with my own anecdote: I have an HFS+ volume with a collection of over 3,000,000 files on it. This collection started in 2004, approximately 50 people access thousands of files on it per day, and occasionally after upgrades or problems it gets a full byte-to-byte comparison to one of three warm standbys. No corruption found, ever.
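
For readers who want to run a similar byte-to-byte check, a minimal sketch using Python's standard `filecmp` module follows. The function name `differing_files` and the two directory arguments are hypothetical, and this is not the tooling the commenter describes using.

```python
import filecmp
import os


def differing_files(primary: str, standby: str) -> list[str]:
    """Relative paths under primary whose bytes differ from, or are missing in, standby."""
    # filecmp caches results keyed on size and mtime, which silent corruption
    # leaves unchanged, so drop any results cached earlier in this process.
    filecmp.clear_cache()
    bad = []
    for root, _dirs, files in os.walk(primary):
        for name in files:
            a = os.path.join(root, name)
            rel = os.path.relpath(a, primary)
            b = os.path.join(standby, rel)
            # shallow=False forces a comparison of file contents, not just os.stat() data.
            if not os.path.isfile(b) or not filecmp.cmp(a, b, shallow=False):
                bad.append(rel)
    return sorted(bad)
```

An empty list means every file under the primary tree has a byte-identical twin in the standby. It does not say which side is correct when they differ.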
Clueless article by alexhs · 2014-06-14 02:27 · Score: 4, Informative

People talking about "bit rot" usually have no clue, and this guy is no exception.
It's extremely unlikely that a file would become silently corrupted on disk. Block devices include per-block checksums, and you either have a read error (maybe he has) or the data read is the same as the data previously written. As far as I know, ZFS doesn't help to recover data from read errors. You would need RAID and / or backups.
Main memory is the weakest link. That's why my next computer will have ECC memory. So, when you copy the file (or otherwise defragment or modify the file, etc.), you read a good copy, some bit flips in RAM, and you write back corrupted data. Your disk receives the corrupted data, happily computes a checksum, therefore ensuring you can read back your corrupted data faithfully. That's where ZFS helps. Using checksumming scripts is a good idea, and I do it myself. But I don't have auto-defrag on Linux, so I'm safer: when I detect a corrupted copy, I still have the original.
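
One possible shape for such a checksumming script, as a sketch only: it is not the commenter's own script, and the names `file_sha256`, `build_manifest` and `suspect_files` are made up for the example. It assumes that a file whose hash changed while its modification time stayed the same is a corruption suspect.

```python
import hashlib
import os


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> dict[str, tuple[str, int]]:
    """Map each relative path to (sha256, modification time in whole seconds)."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            rel = os.path.relpath(path, root)
            manifest[rel] = (file_sha256(path), int(os.stat(path).st_mtime))
    return manifest


def suspect_files(old: dict[str, tuple[str, int]], new: dict[str, tuple[str, int]]) -> list[str]:
    """Files whose content hash changed although the modification time did not."""
    return sorted(
        rel for rel, (digest, mtime) in new.items()
        if rel in old and old[rel][1] == mtime and old[rel][0] != digest
    )
```

In practice the old manifest would be saved to disk (for example as JSON) and compared against a fresh one on each run. That part is left out here.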
ext2 was introduced in 1993, and so was NTFS. ext4 is just ext2 updated (ext was a different beast). If anything, HFS+ is more modern, not that it makes a difference. All of them are updated. By the way, I noticed recently that Mac OS X resource forks sometimes contain a CRC32. I noticed it in a file coming from Mavericks.

--
I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
Re: Legacy file systems should be illegal by aix+tom · 2014-06-14 03:01 · Score: 5, Interesting

A database is something special.
I basically make a "full backup" of my Oracle DBs once a week, and an "incremental backup" in the form of DB change logs every five minutes. (That is, the change logs are pushed "off site" every five minutes; of course they are being written locally continuously with every change.)
The thing with backups, though, is not only to make them often but to also *check* them often. With my DBs there is a handy tool where I can check the backup files for "flipped bits" because there are also checksums in the DB files.
For my "private backups to DVD/BR" I only fill the discs up to ~70%, and fill the rest of each disc with checksum data using dvdisaster. For other "online backups" I create PAR2 files that I also store. With those parity files I can check "are all bits still OK?" now and then, and repair the damage when/if bits start to rot in the backup. In the 10 years I have been doing this, with ~150 DVDs and ~20 BRs so far, I had 2 DVDs that became "glitchy", but because of the checksum data I was able to repair the ISO and re-burn them.
Basically, IF you go through the trouble of setting up an automated backup system, either with software or with your own scripts, it doesn't add much work to also add verification/checksum data to the backup. And that goes a long way toward preventing data loss due to bit rot.