One Developer's Experience With Real Life Bitrot Under HFS+

I've also had this happen with HFS+ by carlhaagen · 2014-06-14 01:30 · Score: 4, Informative

An old partition of some 20000 files, most of them 10 years or older, in where I found 7 or 8 files - coincidentally jpg images as well - that were corrupted. It struck me as nothing other than filesystem corruption as the drive was and still is working just fine.

Re:I've also had this happen with HFS+ by istartedi · 2014-06-14 01:57 · Score: 4, Insightful

coincidentally jpg images as well
Well, JPGs are usually lossy and thus compressed. Flipping one bit in a compressed image file is likely to have severe consequences. OTOH, you could coXrupt a fewYentire byteZ in an uncompressed text file and it would still be readable. I suspect your drives also had a few "typos" that you didn't notice because of that.
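A rough way to see the difference for yourself is the sketch below. It uses zlib as a stand-in for "a compressed format" (it is not JPEG, and JPEG decoders may well behave differently), and the sample text is made up. One bit is flipped in the middle of a plain-text buffer and one in the middle of its compressed form.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))


def try_decompress(blob: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


# Made-up sample text, varied enough that the compressed stream is not trivial.
text = "".join(f"record {i}: value {i * i}\n" for i in range(500)).encode("ascii")
compressed = zlib.compress(text)

# Damage the same relative position (the middle) in both representations.
damaged_text = flip_bit(text, len(text) * 4)
damaged_compressed = flip_bit(compressed, len(compressed) * 4)

print("plain text bytes changed:", count_differing_bytes(text, damaged_text))
print("compressed copy still decodes:", try_decompress(damaged_compressed) is not None)
```

The plain text ends up with a single wrong character. The compressed copy will typically fail to decode at all, because zlib verifies a checksum over the whole stream.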

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Re:I've also had this happen with HFS+ by Jane+Q.+Public · 2014-06-14 11:14 · Score: 3, Insightful

I agree with istartedi.

Ultimately, it isn't a "failure" of HFS+ when your files get corrupted. It was (definitely) a hardware failure. It's just that HFS+ didn't catch the error when it happened.

Granted, HFS+ is due for an update. That's something I've said myself many times. But blaming it when something goes wrong is like blaming your Honda Civic for smashing your head in when you roll it. It wasn't designed with a roll cage. You knew that but you bought it anyway, and decided to hotdog.

Checksums also have performance and storage costs. So there are several different ways to look at it. One thing I strongly suggest is keeping records of your drive's S.M.A.R.T. status, and comparing them from time to time. And encourage Apple to update their FS, rather than blaming it for something it didn't cause, or for not doing something it wasn't designed to do.
Re:I've also had this happen with HFS+ by thegarbz · 2014-06-14 20:07 · Score: 1

coincidentally jpg images as well
Well, JPGs are usually lossy and thus compressed. Flipping one bit in a compressed image file is likely to have severe consequences.
That depends entirely on how the compression technique works. With JPEGs flipping a bit here or there shouldn't actually have too much effect on the overall image but may affect an 8x8 block of pixels.
The problem with unreadable JPEGs usually comes from how decoders handle the errors, or rather can't handle the errors. Windows Picture Viewer is the worst, often simply flat out failing to render anything at all. Most other viewers will render down to the flipped bit, then render the rest as either garbage or grey. There are, however, tools to fix the resulting data.
Re:I've also had this happen with HFS+ by Anonymous Coward · 2014-06-15 04:23 · Score: 1

...blaming it when something goes wrong is like blaming your Honda Civic for smashing your head in when you roll it. It wasn't designed with a roll cage. You knew that but you bought it anyway, and decided to hotdog. ...I strongly suggest is keeping records of your drive's S.M.A.R.T. status
In this case (and every case, really) "hotdogging" means "reading files." Reading files should not be an extreme sport for a filesystem. The drive's S.M.A.R.T. status was "everything is just dandy." This is just thermal turbulence and cosmic rays hitting the drive platter and HFS+ failing to notice or recover. Everything about this is utterly mundane, but it's stupid and shouldn't happen in 2014. We have half a dozen modern filesystems that know better, and whatever cost there is to error recovery they've made up for it in many ways.
You're right that you can't blame HFS+ for doing exactly what it's designed to do. I don't read anyone's writings so far as blame for HFS+ itself causing harm. But, bitrot is real, and a filesystem that was initially designed for computing and storage constraints 25 years ago is a bad tool for today. Apple has had pressure to change the filesystem for a long time. Articles like this do exactly what you suggest—renew pressure on Apple to fix it.
Re:I've also had this happen with HFS+ by Desty · 2014-06-15 04:39 · Score: 1

JPEG is quite robust to corruption, and even PNG's lossless compression seems to be tolerant of a few stray bytes. However, encrypted files would probably be badly damaged by this sort of corruption.

This is the kind of situation where some form of transparent, redundant error-recovery system is extremely important. I'm sure that in the medium term future (after everyone is using SSDs and the cost/capacity ratio falls much further) some kind of RAID setup will be the norm and these kinds of problems will become vanishingly unlikely.
Re:I've also had this happen with HFS+ by doccus · 2014-06-15 10:50 · Score: 1

The problem with unreadable JPEGs usually comes from how decoders handle the errors, or rather can't handle the errors. Windows Picture Viewer is the worst, often simply flat out failing to render anything at all. Most other viewers will render down to the flipped bit, then render the rest as either garbage or grey. There are, however, tools to fix the resulting data.
I suspect Windows Picture Viewer isn't so terribly much an issue on HFS+
Re:I've also had this happen with HFS+ by thegarbz · 2014-06-15 19:23 · Score: 1

I suspect that NTFS is no better than HFS+ at dealing with Bitrot.

Legacy file systems should be illegal by Anonymous Coward · 2014-06-14 01:35 · Score: 1

We know how to build good file systems. We have done it for years with ZFS and now Btrfs. Sticking to legacy file systems which are prone to corruption is simply not acceptable. It is about time that legislative authorities make it illegal for Apple and other negligent vendors to ship file systems that are essentially faulty by design. A noticeable fine per corrupted file would be appropriate, with possibility of prison time upon recurring incidents.

Re:Legacy file systems should be illegal by Anonymous Coward · 2014-06-14 01:39 · Score: 2, Informative

The problem is, neither ZFS nor Btrfs would have stopped an arbitrary bit inside an arbitrary file from becoming corrupt if the disk failed to write it or read it correctly. Only multiple disks and redundancy would have solved that.
Re:Legacy file systems should be illegal by kthreadd · 2014-06-14 01:43 · Score: 5, Insightful

At least you would know that the file was corrupted, so that you could restore it from a good backup.
Re:Legacy file systems should be illegal by Chandon+Seldon · 2014-06-14 01:46 · Score: 3, Interesting

Btrfs (at least) can store multiple copies on one disk and use a checksum to identify the good copy to read. Obviously more disks is better, but...
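The idea of keeping two copies plus a checksum can be sketched in a few lines. This is only an illustration of the principle, not how Btrfs is implemented: SHA-256 stands in for whatever checksum the filesystem actually uses, and the function names are made up.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum stored alongside the data (SHA-256 is an assumption here)."""
    return hashlib.sha256(block).hexdigest()


def read_with_copies(copies, expected: str) -> bytes:
    """Return the first copy whose checksum matches the stored one."""
    for block in copies:
        if checksum(block) == expected:
            return block
    raise IOError("all copies failed checksum verification")


original = b"family photo, block 17"
stored_sum = checksum(original)

# Two copies on the same disk; the first one has rotted.
rotted = bytearray(original)
rotted[3] ^= 0x10
copies = [bytes(rotted), original]

recovered = read_with_copies(copies, stored_sum)
print(recovered == original)
```

Without the checksum there would be no way to tell which of the two differing copies is the good one.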

--
-- The act of censorship is always worse than whatever is being censored. Always.
Re:Legacy file systems should be illegal by jbolden · 2014-06-14 01:47 · Score: 5, Insightful

Yes absolutely great idea! Rather than having technical decisions being made at tech conferences and among developers, system administrators and analysts we should move that authority over to legislature. Because we all know we are going to see a far better weighing of the costs and benefits of various technology choices by the legislature than by technology marketplace.
Apple used HFS+ because it worked to successfully migrate people from Mac OS9; it supported a unix / MacOS hybrid. They continue to use it because it has been good enough and many of the more robust filesystems were pretty heavyweight. I'd like something like Btrfs too. But I don't think the people who disagree with me should be jailed.
Re:Legacy file systems should be illegal by mgmartin · 2014-06-14 01:54 · Score: 4, Informative

As does zfs: man zfs
copies=1 | 2 | 3 Controls the number of copies of data stored for this dataset. These copies are in addition to any redundancy provided by the pool, for example, mirroring or RAID-Z. The copies are stored on different disks, if possible. The space used by multiple copies is charged to the associated file and dataset, changing the used property and counting against quotas and reservations. Changing this property only affects newly-written data. Therefore, set this property at file system creation time by using the -o copies=N option.
Re:Legacy file systems should be illegal by peragrin · 2014-06-14 02:04 · Score: 4, Interesting

This is something everyone forgets.
It takes decades to build long term reliable file systems.
ZFS and Btrfs are less than a decade old.
Windows runs on NTFS Version something. NTFS was started in what year?
HFS, and then HFS + was built in what year?
How long has Microsoft been promising WinFS?
File systems change but only slowly. This is good. You need a good long track record to convince people they won't lose files every ten years due to random malfunctions.

--
i thought once I was found, but it was only a dream.
Re:Legacy file systems should be illegal by Anonymous Coward · 2014-06-14 02:12 · Score: 1

The op's position is obviously absurd, but seriously how well is this tech conference decision thing going? We have like what, just a small number of even FOSS operating systems have modern file systems. I wouldn't mind Apple being one of the larger operating systems vendors being kicked in the butt on this. Even Microsoft deserves its share of shame on this.
Re: Legacy file systems should be illegal by the_B0fh · 2014-06-14 02:58 · Score: 3, Informative

Not if your OS is tied intimately to your filesystem. Linux might not, because a large number of things are abstracted out, but FreeBSD depends on its file system, Solaris took a very long time/effort before it could boot off ZFS. Forget about moving Windows off NTFS. Apple actually did some work on putting it onto ZFS, maybe they will continue.
Re:Legacy file systems should be illegal by nine-times · 2014-06-14 03:00 · Score: 1

How long has Microsoft been promising WinFS?
I thought Microsoft gave up on WinFS. Are they still promising it?
Re:Legacy file systems should be illegal by the_B0fh · 2014-06-14 03:00 · Score: 1

Bullshit. I was running up through 10.4 or 10.5 on a PowerMac G4 450mhz. It was more responsive (I didn't say faster) and usable than the dual 1.4Ghz Pentium 3 I had.
Re: Legacy file systems should be illegal by aix+tom · 2014-06-14 03:01 · Score: 5, Interesting

A database is something special
I basically make a "full backup" of my Oracle DBs once a week, and an "incremental backup" in the form of DB change logs every five minutes. (That is, the change logs are pushed "off site" every five minutes; of course they are being written locally continuously with every change.)
The thing with backups, though, is not only to make them often but to also *check* them often. With my DBs there is a handy tool where I can check the backup files for "flipped bits" because there are also checksums in the DB files.
For my "private backups to DVD/BR" I only fill them up to ~70%, and fill the rest of the disk with checksum data with dvdisaster. For other "online backups" I create PAR2 files that I also store. With those parity files I can check "are all bits still OK?" now and then, and repair the damage when/if bits start to rot in the backup. In the 10 years I've done this, with ~150 DVDs and ~20 BRs so far, I had 2 DVDs that became "glitchy", but because of the checksum data I was able to repair the ISO and re-burn them.
Basically, IF you go through the trouble of setting up an automated backup system either with software or with your own scripts, it doesn't add much work to also add verification/checksum data to the backup. And that goes a long way toward preventing data loss due to bit rot.
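For a home-grown backup script, the verification half can be as small as the sketch below. It records a SHA-256 digest per file and re-checks them later. Unlike PAR2 or dvdisaster it can only detect damage, not repair it, and the function names and the sample files are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or no longer match."""
    return [
        rel
        for rel, digest in manifest.items()
        if not (root / rel).is_file() or file_digest(root / rel) != digest
    ]


# Demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "photo1.jpg").write_bytes(b"\xff\xd8 pretend image data")
    (root / "notes.txt").write_text("backup notes\n")
    manifest = build_manifest(root)
    clean_report = verify_manifest(root, manifest)

    # Simulate bit rot: flip one bit without touching anything else.
    data = bytearray((root / "photo1.jpg").read_bytes())
    data[5] ^= 0x01
    (root / "photo1.jpg").write_bytes(bytes(data))
    rot_report = verify_manifest(root, manifest)

print(clean_report, rot_report)
```

In practice the manifest would be saved next to the backup and the verify step run on a schedule.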
Re:Legacy file systems should be illegal by Anonymous Coward · 2014-06-14 03:15 · Score: 1

I had an iBook G4 1.2 GHz and 10.5 was the most sluggish thing I've ever seen. I later downgraded back to 10.4 until I retired the hardware.
Re: Legacy file systems should be illegal by wagnerrp · 2014-06-14 03:38 · Score: 1

Actually, no, they won't. The chances of bitrot occurring in the same location on both your primary store and your backup, such that neither had viable data to recover, is astronomically low.
Re: Legacy file systems should be illegal by Anonymous Coward · 2014-06-14 03:59 · Score: 1

FreeBSD 10 has an "install to ZFS root" option now. It's been possible for a while to do it manually.
Re:Legacy file systems should be illegal by gweihir · 2014-06-14 04:20 · Score: 2

Bullshit. Anybody doing competent archiving will either use professional archive-grade tape or spinning disks in RAID that gets checked frequently and with a second copy on a geographically independent location. I do that and my loss statistics for the last 14 years is exactly zero. I do have to replace a disk about once every 2 years in the 3-way RAID1 I use as primary archive though. This RAID runs with full SMART self-test every 7 days and RAID consistency check (full data compare) every 14 days. Expecting your data to not rot if you do not maintain it is just plain incompetent.
There used to be one consumer-affordable medium that was archival-grade as well: MOD. I used them for about 10 years and never lost a single bit. Then it became too hard to get a replacement drive, and I moved to the spinning disk solution with additional off-site copies. It seems the consumer does not actually care about long-term archival or at the very least is far too stingy to pay anything for it. Otherwise MOD would have lived on. And do not even get me started about trash like "archival grade" writable DVD/CD. They are not.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Legacy file systems should be illegal by gweihir · 2014-06-14 04:22 · Score: 2

Indeed. And only regular checking can detect it (long SMART self-test and RAID consistency check every 7-14 days). The OP simply is naive and did not bother to find out how to properly archive data.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Legacy file systems should be illegal by kthreadd · 2014-06-14 04:37 · Score: 1

HFS+ was just an extension to HFS, which goes back to the System 2 days. HFS suffered from a number of limitations which made it unsuitable on volumes larger than 2 GB.
Re:Legacy file systems should be illegal by NJRoadfan · 2014-06-14 04:57 · Score: 2

Think of HFS+ as the equivalent of FAT32 for Macs. It's basically the old file system with support for larger drives and files. Apple later tacked on journaling in OS X 10.3. I'm surprised Apple didn't push for a replacement file system after the switchover to Intel CPUs.
Re:Legacy file systems should be illegal by steelfood · 2014-06-14 04:58 · Score: 1

This is good. You need a good long track record to convince people they won't lose files every ten years due to random malfunctions.
Or every few months due to unhandled edge cases.
Writing reliable code is hard. Writing reliable code for generic uses is even harder. Filesystems are at the farthest end of both spectrums: it needs to be most generic and most reliable at the same time.

--
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
Re:Legacy file systems should be illegal by NJRoadfan · 2014-06-14 05:00 · Score: 1

NTFS has received a few non-backward compatible revisions over the years. The last big update was with Windows 2000. What was evil about it was Win2k would silently upgrade the file system on mounted drives, rendering them unreadable on machines running older versions of Windows NT (NT4 SP4 was the first to add NTFS5 read/write support).
Re:Legacy file systems should be illegal by mlts · 2014-06-14 05:08 · Score: 1

Microsoft has two technologies in Windows Server 2012: Storage Spaces (which is LVM level), and ReFS. Both when used together can detect bit rot, but IIRC, only when the Storage Space volume is set to mirroring, not parity.
This is similar with ZFS. RAID-Z will detect bit rot, but won't fix it. RAID-1, RAID-Z2, and RAID-Z3 will detect and fix bit rot on a scrub. One can also use copies or ditto blocks.
Linux, there isn't much either way. I have no clue if LVM2 + btrfs will do anything about bit rot, assuming it has the ability to repair it from a mirror or a RAID 6 volume. This seems to be one of those "ask four people, get five answers" type of items.
If I were setting up a file server or backend RAID, I'd probably go with Linux and ZFS (from the zfsonlinux project). The / and /boot filesystems wouldn't be able to be placed on ZFS, but almost everything else can. With a RAID-Z2 pool, this will go far in detecting and handling bit rot.
Re:Legacy file systems should be illegal by CaptnZilog · 2014-06-14 05:19 · Score: 1

I'll wait for MansonFS, it has a longer history and doesn't pretend to be innocent.
Re: Legacy file systems should be illegal by O('_')O_Bush · 2014-06-14 05:30 · Score: 3, Informative

Vs the chances you back up already corrupted files and don't notice until you've aged off the good versions.

--
while(1) attack(People.Sandy);
Re: Legacy file systems should be illegal by Guspaz · 2014-06-14 06:28 · Score: 1

So? If ZFS or Btrfs has redundancy, an unreadable sector gets corrected just the same as a garbage sector.
Re:Legacy file systems should be illegal by jbolden · 2014-06-14 07:26 · Score: 1

but seriously how well is this tech conference decision thing going?

I think pretty well. The technical community has managed to advance and keep up a good rate of advancement, disrupting itself over and over and over again to produce superior products.
Re:Legacy file systems should be illegal by Em+Adespoton · 2014-06-14 07:34 · Score: 1

Think of HFS+ as the equivalent of FAT32 for Macs. It's basically the old file system with support for larger drives and files. Apple later tacked on journaling in OS X 10.3. I'm surprised Apple didn't push for a replacement file system after the switchover to Intel CPUs.
https://en.wikipedia.org/wiki/...
It's a bit more complicated than that. First off, HFS+ is actually more like a combination of NTFS and exFAT; they got the basic design down at an early stage so that it would be extensible. This means that they didn't fall into the FAT32 trap. Interestingly, although people refer to it as HFS+, the internals refer to the current incarnation as HFSX.
Journaling was added to HFS+ with OS X 10.2.2. This moved the internal name from HFS+ to HFSJ.
HFSX was added with 10.3, and introduced optional case sensitivity and made the volume wrapper optional.
With 10.4, full ACL support was added to HFSX.
10.5 added hardlinking.
10.6 added optional compression -- which could be where some of the issues being discussed in TFA are from. I used to use STACKER back in the day, until some bad bit flips caused massive data corruption -- I've avoided compressed dynamic storage ever since, until 10.6.
There have been rumours for years of Apple adopting ZFS, and at one point the DP releases of the OS even had it available -- but it has never rolled out into OS X itself.
Instead of tackling "bitrot" head-on, Apple seems to have taken the "make backups easy" approach. This works to some degree, but since the backups use hardlinking, you really only have two copies of the data -- the one on your main drive, and the one on your backup drive. This makes cycling your backup drives even more important than it already was.
Re: Legacy file systems should be illegal by jbolden · 2014-06-14 07:39 · Score: 2

The developer involved left Apple, went off to found his own company. Completed the work and then got acquired by Oracle. http://getgreenbytes.com/solut...
So it is in some sense done. The question is whether Apple is going to buy an Oracle product or Oracle will sell or ...
Re:Legacy file systems should be illegal by jbolden · 2014-06-14 07:41 · Score: 1

No, they aren't. You are correct: in 2006 they announced that the WinFS team was being moved under the SQL Server group, and in 2008 they completed a less ambitious product to allow SQL Server to store and access arbitrary files efficiently.
Re:Legacy file systems should be illegal by greenfruitsalad · 2014-06-14 07:53 · Score: 1

they've got ReFS instead - https://en.wikipedia.org/wiki/...
Re: Legacy file systems should be illegal by maccodemonkey · 2014-06-14 07:57 · Score: 3, Informative

They did try and replace the file system around the time of the Intel switch. Got killed by licensing problems.
http://appleinsider.com/articl...
Re:Legacy file systems should be illegal by Guy+Harris · 2014-06-14 08:37 · Score: 2

10.5 added hardlinking.
Are you certain? The ln command, when run without -s, would return an error if you used it under Tiger or earlier?
Or are you referring to hardlinking to directories, which was something UNIX traditionally supported, but which required root permissions (as it was used by the mkdir command to create the . and .. directories), and which was removed at one point (4.2BSD, as that added the mkdir() system call, making the ability of link() to link to a directory unnecessary?), and added back in 10.5 with the introduction of Time Machine, so that it could be used in backup trees as a very hacky form of de-duplication (each backup tree is a complete copy of the file system being backed up, but if there's an older copy of an unchanged file or a directory everything under which is unchanged, the "copy" is done by making a hard link rather than by copying the file to the backup disk).

Instead of tackling "bitrot" head-on, Apple seems to have taken the "make backups easy" approach. This works to some degree, but since the backups use hardlinking, you really only have two copies of the data -- the one on your main drive, and the one on your backup drive. This makes cycling your backup drives even more important than it already was.
That's what happens with any backup scheme that does incremental backups - if a file hasn't changed, a copy isn't made.
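To make the hard-link scheme concrete, here is a small sketch of a snapshot-style backup. It is a simplification under stated assumptions: it links individual files only (not directories, as described above for Time Machine), it decides "unchanged" from size and modification time alone, and the function names are invented for the example.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def unchanged(a: Path, b: Path) -> bool:
    """Treat files as identical if size and whole-second mtime agree."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source: Path, previous: Optional[Path], dest: Path) -> None:
    """Build a full tree in dest, hard-linking files unchanged since previous."""
    dest.mkdir(parents=True)
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and unchanged(src, old):
            os.link(old, target)        # no new data written
        else:
            shutil.copy2(src, target)   # real copy, mtime preserved


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source = base / "source"
    source.mkdir()
    (source / "unchanged.txt").write_text("same every day\n")
    (source / "edited.txt").write_text("version 1\n")

    snapshot(source, None, base / "snap1")
    (source / "edited.txt").write_text("version 2, now longer\n")
    snapshot(source, base / "snap1", base / "snap2")

    shared = os.path.samefile(base / "snap1" / "unchanged.txt",
                              base / "snap2" / "unchanged.txt")
    separate = not os.path.samefile(base / "snap1" / "edited.txt",
                                    base / "snap2" / "edited.txt")

print("unchanged file shared between snapshots:", shared)
print("edited file stored separately:", separate)
```

Both snapshots look like complete trees, but the unchanged file exists only once on the backup disk, which is why a single bad block there affects every snapshot that links to it.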
Re: Legacy file systems should be illegal by Guy+Harris · 2014-06-14 08:42 · Score: 1

Forget about moving Windows off NTFS.
Microsoft haven't. I guess they realized that software actually used alternate data streams, so they had to add them back to ReFS, although only "up to 128K for both Windows 8.1 and Windows Server 2012 R2", so they're more like "big extended attributes" than full alternate data streams.
Re:Legacy file systems should be illegal by Drinking+Bleach · 2014-06-14 08:48 · Score: 1

I have both / and /boot on ZFS on Linux.
Re:Legacy file systems should be illegal by Jane+Q.+Public · 2014-06-14 11:29 · Score: 1

It is about time that legislative authorities make it illegal for Apple and other negligent vendors to ship file systems that are essentially faulty by design.
This is exactly like saying "It should be illegal to sell a Toyota Prius without 4-point harnesses, ABS, and a roll cage."

You knew what it was when you bought it. It is inappropriate to try to make the entire world "safe" via legislation. It doesn't work that way and you'd probably hurt more people than you help.

You aren't a little kid who needs to be forced to wear a bicycle helmet. Or for that matter if it's YOUR kid, you shouldn't need a law that forces you to make her wear a helmet. That's your job.
Re:Legacy file systems should be illegal by KevReedUK · 2014-06-14 13:19 · Score: 1

On reading up on ReFS, I am of the opinion that it is a step in the right direction, but has been released before it's ready. The version included with Server 2012 (and subsequent versions so far) doesn't include a whole raft of technologies that are (and have been for a looong time) present in NTFS.

Don't get me wrong, I'm no NTFS fanboy, but the majority of the features that they failed to include are ones that are practically indispensable in a range of settings. Whilst I could concur that, with the advent of (compared to a few years ago) cheap storage, NTFS compression has essentially had its day, other features are not so easy to do without. If you're a company that uses (out of necessity) legacy software with a need for 8.3 filenames, you can't use ReFS to host your data. If you need to use EFS, you're SOL. And to design a file-system with such a limited feature-set and release it, then say that it is ideal for using in a file server situation when it doesn't even support QUOTAS? Yes, I know, you can get third party COTS packages to handle that, but why bother if it's already in NTFS?

Frankly, one of the most laughable things to me is that they release this new file system designed to heal itself, but leave out so many features that THEIR OWN OS can't even be installed on it (no hard-links, for a start).

I do, however, go back to my original sentence.... ReFS is a step in the right direction, but is essentially useless in many (most) scenarios until it gets the features that we have largely come to rely on. IMHO, despite MS's claims to the contrary, it's not even file-share ready. The only environment I would even consider this to have a place in its current incarnation is in a tiny-business or home server environment in those frightening (but thankfully rare) cases where hardware RAID is out of the question. Even then, however, it could only be used to store the file shares. No chance of putting your Exchange/Sharepoint/SQL on it. But why bother, when you can just as easily use NTFS?

--
Just my $0.03 (At current exchange rates, my £0.02 is worth more than your $0.02)
Re:Legacy file systems should be illegal by KevReedUK · 2014-06-14 13:20 · Score: 1

Microsoft has two technologies in Windows Server 2012: Storage Spaces (which is LVM level), and ReFS. Both when used together can detect bit rot, but IIRC, only when the Storage Space volume is set to mirroring, not parity.
Used to be true. In 2012R2, you can use integrity streams on parity spaces too, and get the corruption prevention there.

--
Just my $0.03 (At current exchange rates, my £0.02 is worth more than your $0.02)
Re: Legacy file systems should be illegal by Guspaz · 2014-06-14 17:30 · Score: 1

Where did I claim otherwise? You must have failed to read the part where I said "If ZFS or Btrfs has redundancy".
Re:Legacy file systems should be illegal by Electricity+Likes+Me · 2014-06-15 09:40 · Score: 1

RAID-Z will absolutely detect and fix bit-rot.
RAID-Z is single disk parity redundancy aka RAID-5 (but using CoW to plug the write-hole).
A regular, single-disk ZFS device will detect but not be able to fix bit-rot, but that's not any type of RAID.
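The reason checksums plus single parity are enough to repair (and not just detect) can be shown with a toy model. This is not how ZFS lays out data on disk; it is a sketch assuming one XOR parity block per stripe and a SHA-256 digest per data block, with made-up names.

```python
import hashlib
from functools import reduce


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def xor_blocks(blocks) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def read_stripe(data_blocks, parity: bytes, checksums) -> list:
    """Return verified data blocks, rebuilding at most one bad block."""
    bad = [i for i, b in enumerate(data_blocks) if digest(b) != checksums[i]]
    if not bad:
        return list(data_blocks)
    if len(bad) > 1:
        raise IOError("more damage than single parity can repair")
    i = bad[0]
    others = [b for j, b in enumerate(data_blocks) if j != i]
    rebuilt = xor_blocks(others + [parity])
    if digest(rebuilt) != checksums[i]:
        raise IOError("reconstruction failed verification")
    repaired = list(data_blocks)
    repaired[i] = rebuilt
    return repaired


# Three "disks" of data plus one parity block.
blocks = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_blocks(blocks)
sums = [digest(b) for b in blocks]

# Silent corruption on the second disk: the drive reports no error.
on_disk = [blocks[0], b"BBBxBBBB", blocks[2]]
healed = read_stripe(on_disk, parity, sums)
print(healed == blocks)
```

Plain RAID-5 without checksums can see that parity no longer matches, but it cannot tell which block is the wrong one; the checksum is what makes the repair possible.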
Re:Legacy file systems should be illegal by Electricity+Likes+Me · 2014-06-15 09:41 · Score: 1

10 years is an absurd timeframe.
You can't leave any type of storage media inactivated for that long and expect it to survive. The only "safe" storage is the type you constantly re-verify.
Re:Legacy file systems should be illegal by qubezz · 2014-06-15 12:13 · Score: 1

The problem is, neither ZFS nor Btrfs would have stopped an arbitrary bit inside an arbitrary file from becoming corrupt....
I think you should have a look at this 10 year old blog post: https://blogs.oracle.com/elowe...
ZFS can use single and double-parity (like RAID5 with two parity drives, but no failure if power is pulled during writing). In addition, it has bit scrubbing where all data is verified regularly.
Re:Legacy file systems should be illegal by HuguesT · 2014-06-15 14:13 · Score: 1

I'm sorry, citation needed. It does have compression and encryption, and you *can* make a RAID 0 or 1 with it, so it has some (weak) redundancy support, but I have never heard about it having deduplication. ZFS is the only filesystem I know that has deduplication.
Re:Legacy file systems should be illegal by cmurf · 2014-06-15 14:52 · Score: 1

On both 10.8 and 10.9 computers, Disk Utility's default format option is "Mac OS Extended (Journaled)" and this translates into a signature on disk of:
00000400 48 2b 00 04 80 00 20 00 48 46 53 4a 00 00 00 75 |H+.... .HFSJ...u|

From opensource.apple.com hfs_format.h says this is version 4.

When choosing either of the case sensitive options, I get:
00000400 48 58 00 05 80 00 20 00 48 46 53 4a 00 00 00 75 |HX.... .HFSJ...u|

If journaling is disabled they become (respectively):
00000400 48 2b 00 04 80 00 00 00 31 30 2e 30 00 00 00 75 |H+......10.0...u|
00000400 48 58 00 05 80 00 00 00 31 30 2e 30 00 00 00 75 |HX......10.0...u|
So internally it's H+ or HX, and both can be HFSJ. For whatever reason, by default we still get version 4 (H+).
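For anyone who wants to decode those dumps without reading hex by eye, here is a small sketch. The field layout (2-byte signature, 2-byte version, 4-byte attributes, 4-byte last-mounted-version, all big-endian) and the journaled flag being bit 13 are my reading of hfs_format.h, so treat them as assumptions; the flag at least matches the dumps above, where 0x2000 is the only attribute bit that changes when journaling is turned off.

```python
import struct

JOURNALED_BIT = 1 << 13  # assumed to be kHFSVolumeJournaledBit


def parse_header_prefix(raw: bytes) -> dict:
    """Decode the first 12 bytes of an HFS+ volume header."""
    signature, version, attributes, last_mounted = struct.unpack(">2sHI4s", raw[:12])
    return {
        "signature": signature.decode("ascii"),
        "version": version,
        "journaled": bool(attributes & JOURNALED_BIT),
        "last_mounted": last_mounted.decode("ascii"),
    }


# The four dumps from above (bytes at offset 0x400 of the volume).
samples = {
    "journaled": "48 2b 00 04 80 00 20 00 48 46 53 4a 00 00 00 75",
    "journaled, case-sensitive": "48 58 00 05 80 00 20 00 48 46 53 4a 00 00 00 75",
    "not journaled": "48 2b 00 04 80 00 00 00 31 30 2e 30 00 00 00 75",
    "not journaled, case-sensitive": "48 58 00 05 80 00 00 00 31 30 2e 30 00 00 00 75",
}

parsed = {name: parse_header_prefix(bytes.fromhex(dump)) for name, dump in samples.items()}
for name, fields in parsed.items():
    print(name, fields)
```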
Re:Legacy file systems should be illegal by cmurf · 2014-06-15 15:07 · Score: 1

HFS+ doesn't support deduplication or redundancy. A copy of a file is not what's meant by redundancy, nor is a hard link what's meant by deduplication.
Re:Legacy file systems should be illegal by CauseBy · 2014-06-16 02:59 · Score: 2

I don't know if any popular filesystems do so, but there are ways to write data to disk such that flipped bits can be detected and corrected. I remember studying this in comp sci class way back in the Clinton/Bush transition era. So multiple disks and redundancy is one good solution but another is a filesystem that can recover lost bits.
If you need to store 64 bits of data you can imagine laying them out in an 8x8 square. Now, widen the square to 9x9 and write a checksum/evenness bit in the extra column to the right, plus the extra row on the bottom. So, if 8 bits sum to checksum 5, then your evenness bit is a 1 making 5+1 an even number. If the checksum is 4, your evenness bit is a 0.
Now, if one of the bits in the 8x8 square is flipped accidentally, then you can detect it and fix it by looking at your extra row-and-column of evenness data. If one bit in the 8x8 grid gets flipped, then the evenness bits in one row and one column will be wrong, reliably indicating the bad bit.
As a bonus, you have one extra bit in the bottom right corner which can act as a final checksum on the evenness bits. That allows you to detect if your checksums are valid. If only one bit in the whole 9x9 grid is flipped, you should be able to detect it and correct it no matter where it is.
I don't know much about real-world filesystems so I don't know if that is a common procedure or not.
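The grid scheme described above is easy to try out. The sketch below follows that description directly: 64 data bits in an 8x8 square, one parity bit per row and per column, and a corner bit, so that every row and every column of the 9x9 grid has an even number of ones. The function names and the sample bit pattern are made up.

```python
def encode(bits):
    """Extend an 8x8 grid of 0/1 values to 9x9 with even row/column parity."""
    grid = [row + [sum(row) % 2] for row in bits]
    grid.append([sum(grid[r][c] for r in range(8)) % 2 for c in range(9)])
    return grid


def correct(grid):
    """Fix at most one flipped bit in place; return its (row, col) or None."""
    bad_rows = [r for r in range(9) if sum(grid[r]) % 2]
    bad_cols = [c for c in range(9) if sum(grid[r][c] for r in range(9)) % 2]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        grid[r][c] ^= 1
        return (r, c)
    raise ValueError("more than one bit flipped; cannot correct")


# Arbitrary sample data.
data = [[(r * 3 + c * 5 + r * c) % 2 for c in range(8)] for r in range(8)]
clean = encode(data)

# Flip one data bit and let the parity bits locate it.
damaged = [row[:] for row in clean]
damaged[2][5] ^= 1
found = correct(damaged)

# A flipped parity bit (here the corner) is located the same way.
damaged_corner = [row[:] for row in clean]
damaged_corner[8][8] ^= 1
found_corner = correct(damaged_corner)

print(found, damaged == clean)
print(found_corner, damaged_corner == clean)
```

The cost is 17 extra bits for every 64 bits of data, and two flipped bits in the same grid can be detected but not repaired.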

Backup? by graphius · 2014-06-14 01:36 · Score: 3, Insightful

shouldn't you have backups?

Re:Backup? by kthreadd · 2014-06-14 01:42 · Score: 4, Insightful

The problem with bit rot is that backups don't help. The corrupted file goes into the backup and eventually replaces the good copy, depending on retention policy. You need a file system which uses checksums on all data blocks so that it can detect a corrupted block after reading it and flag the file as corrupted, so that you can restore it from a good backup.
Re:Backup? by ZosX · 2014-06-14 01:49 · Score: 2

This is a good idea, but not a solution. Often you have no idea that the file is bad until after the fact, in this case years later. I've had mp3 collections get glitches here and there after a few copies from various drives. If you have no idea the data is bad in the first place, your backup of the data isn't going to be any better. I would say that all of my photography I've collected over the years has stayed readable somehow. I do check in lightroom every once in a while, but I wouldn't be shocked to find a random unreadable file. Not good really, but there's probably not much I can do other than make sure that my files are verifiable.

--
zosxavius photography
Re:Backup? by ColdWetDog · 2014-06-14 02:01 · Score: 1

I have close to 4 terabytes of photography and video stored (not that kind of photography and video). I, too, have seen occasional unreadable files, typically in JPEGs but also an occasional TIFF file. Any compressed container (like a JPEG) is going to be more susceptible to this issue, thus JPEGs aren't a great storage format. Video files are harder to figure - a corrupted bit could easily get overlooked.
I've never actually lost a picture that I was interested in - I always have more than one copy of the image on the disk - a TIFF and a RAW file typically. Yes, it would be nice if the file system didn't do that. No, I don't think I would believe anybody's claim that that would indeed happen. Further, it's always a risk-benefit calculation. You can spend a lot more money getting near perfect replication but I don't think many people are willing to have a system with ECC memory throughout the chain.

--
Faster! Faster! Faster would be better!
Re:Backup? by dgatwood · 2014-06-14 02:09 · Score: 5, Insightful

Depends on the backup methodology. If your backup works the way Apple's backups do, e.g. only modified files get pushed into a giant tree of hard links, then there's a good chance the corrupted data won't ever make it into a backup, because the modification wasn't explicit. Of course, the downside is that if the file never gets modified, you only have one copy of it, so if the backup gets corrupted, you have no backup.
So yes, in an ideal world, the right answer is proper block checksumming. It's a shame that neither of the two main consumer operating systems currently supports automatic checksumming in the default filesystem.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Backup? by rnturn · 2014-06-14 02:27 · Score: 2

Even if you did have backups how could you even begin to know which saveset to restore from? You could have been backing up a corrupted file for a lo-o-ong time.
Friends wonder why I still purchase physical books and CDs. This is why. I'll have to come up with a simple 2-3 sentence explanation of the problem the OP was describing for when they ask next time. I've had MP3 files made from my CD collection mysteriously become corrupted over time. No problem, I can just re-rip/convert/etc. but losing the original digital version of your newborn would be heartbreaking. Make several copies to reduce the odds of losing it. Make a good print using archival paper and inks and keep it away from light in a safe deposit box so it could be rescanned should the digital file become corrupted. Of course, one can go overboard as not every photo is worth that kind of effort but it appears we might be starting to see, first-hand, the problems described in Bergeron's "Dark Ages II". Even worse what if this were to happen? (So don't even bring up the "cloud", OK?)

--
CUR ALLOC 20195.....5804M
Re:Backup? by Antique+Geekmeister · 2014-06-14 02:41 · Score: 1, Informative

The bitrot will change the checksums and cause the files to show up as modified.
Moreover, what will you do about a reported bitrotted file unless you have genuine archival backups somewhere else?
Re:Backup? by nine-times · 2014-06-14 03:03 · Score: 2

If I remember correctly, that's not how Apple's current backup system works. Every time a file gets written to, there's a log someplace that records that the file was modified. Next time Time Machine runs, it backs up the files in that log. If the OS didn't actually modify the file, it won't get backed up.
I may be wrong, but that's how I understood it.
Re:Backup? by wagnerrp · 2014-06-14 03:47 · Score: 1

Video files are harder to figure - a corrupted bit could easily get overlooked.
Again, it depends on whether it is compressed or not. A corrupted bit in video with only intraframe compression will look just like a damaged JPEG. You may have an unreadable frame, or may have a corrupted macroblock or two in that frame. A corrupted bit in video with interframe compression will smear that corrupted frame or macroblock for potentially several seconds until you hit the next I-frame to flush the image.

You can spend a lot more money getting near perfect replication but I don't think many people are willing to have a system with ECC memory throughout the chain.
The common solution to this issue is software, not hardware. You have your filesystem compute and store checksums at the block level, and you give your filesystem access to redundancy, either through redundant copies on disk, or multiple parity disks. When your filesystem reads the data, it checks it against the checksum, and if needed, recomputes the data from the redundant storage. That said, you do still need ECC memory on the CPU doing those calculations for it to be reliable.
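A toy model of that read path, for illustration only. The class name `MirroredStore`, the two-copy layout and the use of SHA-256 are assumptions made for this sketch; real checksumming filesystems differ in block layout, hash choice and how redundancy is organized.

```python
import hashlib

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

class MirroredStore:
    """Toy two-copy block store with per-block checksums (hypothetical, not a real filesystem)."""

    def __init__(self):
        self.copies = [{}, {}]   # two "disks": block number -> bytes
        self.sums = {}           # block number -> checksum recorded at write time

    def write(self, n, data):
        for disk in self.copies:
            disk[n] = data
        self.sums[n] = checksum(data)

    def read(self, n):
        for disk in self.copies:
            data = disk[n]
            if checksum(data) == self.sums[n]:
                # Found a copy that verifies: heal any copy that disagrees with it.
                for other in self.copies:
                    if other[n] != data:
                        other[n] = data
                return data
        raise IOError(f"block {n}: every copy fails its checksum")
```

The checksum only tells the store which copy to distrust; the repair itself comes from the redundant copy. With a single copy the same check can still report corruption, but it has nothing to rebuild from.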
Re:Backup? by Anonymous Coward · 2014-06-14 04:21 · Score: 2, Informative

Mac OS X Time Machine works by listening to filesystem events, except for the first backup where everything is copied over as is. Bit rot doesn't get transferred until you overwrite the file, by which time it should have been obvious something was fishy or the bitrot was negligible and you didn't notice yourself. There are also situations where Time Machine itself says "this backup is fishy, regenerate from scratch?". Happened last week, but only after a failed drive had to be replaced which caused a 150GB backup.
Re:Backup? by gweihir · 2014-06-14 04:42 · Score: 2

You should have:
1. backups
2. redundancy
3. regular integrity checks of your data
Or alternatively, you should have been using an archival grade medium, like archival tape or (historically now unfortunately) MOD.
What the OP did is just plain incompetent and stupid and if he had spent 15 minutes to find out how to properly archive data, he would now not be in this fix. Instead he made assumptions without understanding or verification against the real world and now blames others for his failure. Pathetic. Dunning-Kruger effect at work.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Backup? by NJRoadfan · 2014-06-14 05:05 · Score: 1

It's a shame that none of the major SOHO NAS vendors (Synology, Drobo, etc) use checksumming file systems. They seem to be sticking with things like ext4 instead of ZFS.
Re:Backup? by Bing+Tsher+E · 2014-06-14 05:15 · Score: 1

only modified files get pushed into a giant tree of hard links,
Wouldn't that mean that files modified by a corrupt filesystem would be the first candidates for backing up?
Re:Backup? by Anonymous Coward · 2014-06-14 05:48 · Score: 1

Are you stupid or just plain retarded.
ZFS is the most robust, least susceptible to bitrot disk/volume/filesystem manager out there.
It can detect problems due to simple power supply issues, bad sectors, etc and fix them before the data is committed.
It not only checksums the data as written, but does it at every layer where the data transitions through.
Disk surface to disk cache, disk cache to system memory, and back - every place there's a transition, it does a checksum, correcting any errors along the way.
Aside from having total media failure, you will not lose a single bit of data with ZFS.
Re:Backup? by jopsen · 2014-06-14 06:00 · Score: 1

You need a file system which uses checksums on all data blocks so that it can detect a corrupted block after reading it and flag the file as corrupted, so that you can restore it from a good backup.
When I decide to archive a lot of files I put them in a tarball and generate par2 files... That way a single bitflip or two will be okay :)
Re:Backup? by Marillion · 2014-06-14 06:45 · Score: 1

I'm a fan of computing par2 repair blocks at 15%. Every so often, run a par2verify.

--
This is a boring sig
Re:Backup? by ColdWetDog · 2014-06-14 08:09 · Score: 1

True, but what I'm getting at is that a couple of dropouts in multi-terabyte data sets doesn't bother me (YMMV, of course). The current system seems 'good enough' for what I'm doing and I imagine I am in the majority as far as home / SOHO class PCs are concerned. If you're running an enterprise data shop, you have other priorities and options, but I can see why Apple, in particular, hasn't moved off of HFS+ so far.

--
Faster! Faster! Faster would be better!
Re:Backup? by dargaud · 2014-06-14 08:52 · Score: 1

I've never understood why, when you save a file, a checksum isn't computed at the same time and stored among the metadata. Then you can have a command that operates on a file, a directory or the entire filesystem (in that case when there's low disk activity) to verify that checksum. It would be easy and useful, no ?
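Something close to this can be approximated in user space today. The sketch below is only an illustration and makes one notable substitution: it keeps the checksums in a separate JSON manifest rather than in filesystem metadata. The function names `record` and `verify` and the manifest format are invented for the example.

```python
import hashlib
import json
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file, read in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record(root, manifest):
    """Store a digest for every file under root in a JSON manifest (hypothetical format)."""
    root = Path(root)
    sums = {str(p.relative_to(root)): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
    Path(manifest).write_text(json.dumps(sums, indent=2))
    return sums

def verify(root, manifest):
    """Return the relative paths that are missing or no longer match the manifest."""
    root = Path(root)
    sums = json.loads(Path(manifest).read_text())
    return [name for name, digest in sums.items()
            if not (root / name).is_file() or file_digest(root / name) != digest]
```

Running `record` once after archiving and `verify` during idle time gives detection only. It cannot tell an intentional edit from rot, so it suits files that are not supposed to change, and the manifest should live outside the directory it describes.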

--
Non-Linux Penguins ?
Re:Backup? by dgatwood · 2014-06-14 09:44 · Score: 1

Only files modified by a vnode write operation.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Backup? by m00sh · 2014-06-14 09:48 · Score: 1

The bitrot will change the checksums and cause the files to show up as modified.
Moreover, what will you do about a reported bitrotted file unless you have genuine archival backups somewhere else?
Bitrot happens when the error correction can't catch the error: there is, say, a 1e-10 chance of a bunch of bits flipping and the error correction not catching it. Yes, add a checksum and whatnot over it and you push the error chance to, say, 1e-20, but there is still a chance of bitrot. There is still a pattern of bit flips that will bypass the checksum as well.
You can always make the error correction stronger to make the error probability as small as possible but over years and gigabytes later, there will be one bit that will be flipped. However, you will lose performance by going with such a strong error correction/checksum system.
Re:Backup? by wagnerrp · 2014-06-14 12:24 · Score: 1

Then you store three copies of the checksum, and compare.
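A minimal sketch of that comparison, assuming the stored copies are simply passed in as a list; the helper name `trusted_checksum` is made up for this example.

```python
from collections import Counter

def trusted_checksum(copies):
    """Return the value held by a strict majority of the stored copies, else None."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count * 2 > len(copies) else None
```

With three copies, one damaged copy is outvoted by the other two; if all three disagree, the function returns None and the checksum itself has to be treated as lost.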
Re:Backup? by stinerman · 2014-06-14 12:34 · Score: 1

Totally read "4 terabytes of pornography". I was debating whether to make a toast to you or send you a drum of Jergens.
Re:Backup? by radarskiy · 2014-06-14 12:38 · Score: 1

How do you distinguish between intentional and unintentional changes? How much storage overhead do you need to keep all changes so that you can roll back any unintentional change?
Re:Backup? by Smurf · 2014-06-14 14:39 · Score: 1

The parent post is assuming that the user is using Time Machine for the backups. In that case, the checksums are usually not verified (as nine-times said in his reply).
Nevertheless, in some cases Time Machine will perform a "deep" scan, for example if you have not backed up for a long time or if you upgrade your computer's drive. In that case, the corrupted file would be identified as a "change" and would be backed up again, just as you said.
Nevertheless, take into account that the corrupted file is not replacing the original in the backup. Both copies are left there so once you discover the corruption you can use Time Machine to navigate to a backup that is old enough and allow you to recover the file.
Re:Backup? by Jeremi · 2014-06-14 15:56 · Score: 1

I've never understood why, when you save a file, a checksum isn't computed at the same time and stored among the metadata. [...] It would be easy and useful, no ?
It would be, and applications that just write out a file to disk can and do implement exactly that (although I think many of them save the checksum in the file's data-stream itself, rather than as filesystem-dependent metadata).
Implementing checksumming at the filesystem level is a good deal trickier, because the filesystem has to support more than just one-and-done writing out of new files. It has to allow programs to do things like mmap() files into a region of memory and keep the file-on-disk synchronized with the mapped region of RAM whenever the CPU writes data to that RAM; and it has to allow programs to seek around and overwrite arbitrary subsections of an existing file from multiple threads simultaneously, etc.
I think it is possible for a filesystem to maintain/update checksums even in the face of all of that, and it may even be possible to do it efficiently -- but I think it is also sufficiently difficult that most filesystem implementers don't bother to try (especially since even minor errors in the checksumming mechanism will present themselves to the user as messages that his files have been corrupted -- and whether the files actually are corrupted or not, that will make the user very unhappy).

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Backup? by dgatwood · 2014-06-15 02:14 · Score: 1

Even during a deep traversal, AFAIK, it is just using modification information in the filesystem, not reading the entire file. Otherwise, a deep traversal would take as long as a full backup (days) instead of a tiny fraction of that time.
Now if the length of the file changes because of filesystem corruption, that's another matter.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Backup? by Smurf · 2014-06-15 03:38 · Score: 1

Very good point!
Still the gist of my comment remains: the old, uncorrupted copy of the corrupted file is kept in Time Machine even if the corrupted file ever gets into the backup. Having access to all older versions of your files is what Time Machine is all about!
Re:Backup? by gweihir · 2014-06-15 10:01 · Score: 1

You are talking about a different problem. This discussion is about flat files that do not get changed. You are talking about database-stores and the like.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Backup? by radarskiy · 2014-06-15 12:17 · Score: 1

"You are talking about a different problem"
Confining the discussion to one specific subset of one specific person's storage needs would make it a pretty useless public discussion.
"You are talking about database-stores and the like."
No I am not. Stop lying.
Re:Backup? by gweihir · 2014-06-15 13:55 · Score: 1

What kind of fucked-up jerk are you? Have you even read the OP's article? It is about long-term flat-file storage in a file-system. It is not about journaled, reversible changes to data and that is a completely different problem with completely different requirements and approaches.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Backup? by radarskiy · 2014-06-15 17:59 · Score: 1

"It is not about journaled, reversible changes to data and that is a completely different problem with completely different requirements and approaches."
That is not what I am talking about. Stop lying.
Re:Backup? by gweihir · 2014-06-16 05:51 · Score: 1

Well, obviously you are not talking about anything relevant here. Go away.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Backup? by sh00z · 2014-06-20 06:59 · Score: 1

How do you distinguish between intentional and unintentional changes? How much storage overhead do you need to keep all changes so that you can roll back any unintentional change?
I'm only a rocket scientist, not a CS person, but it seems intuitive to me. If it's an intentional change, then the new version will have a later last-modified date than the back-up. If the hash of the back-up copy made at the time it was written matches a hash calculated at the time of the second back-up, then the integrity of the back-up is confirmed. If the active copy has not been intentionally changed in the interval, and its hash no longer matches the other two, then the active should be discarded in favor of the back-up. I'm sure you can do the mental figuring for the equivalent to detect if the back-up rather than the active has bitrotted.
The challenge arrives if you have *both* intentional and unintentional changes to the same file between successive back-ups. To exclude unintentional changes, you would have to do hashes every time you save/compile/whatever, as well as keep a keystroke log of the edits. Then, you would have to execute the exact same change process on the back-up copy, repeating the process described above. It would be incredibly resource-intensive (essentially having a 'bot duplicate the work you have performed between back-ups), but it would sure be thorough.
Lord, I HOPE this is not an original idea. If I just invented it, and somebody tries to patent it later, you're in for a world of hurt.
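The simple case in the comment above (no intentional edit between back-ups) can be written down as a small decision function. This is a sketch under that comment's assumptions: a hash was recorded when the back-up was made, fresh hashes of both copies are available, and bit rot does not touch the modification time. The function name and the result labels are invented for the example.

```python
def diagnose(recorded_hash, backup_hash, active_hash, active_mtime, backup_mtime):
    """Classify a file from the hash recorded at back-up time and fresh hashes of both copies."""
    if active_mtime > backup_mtime:
        return "modified"          # intentional edit: back up the new version
    if backup_hash == recorded_hash and active_hash == recorded_hash:
        return "ok"
    if backup_hash == recorded_hash:
        return "restore-active"    # active copy rotted; back-up still verifies
    if active_hash == recorded_hash:
        return "refresh-backup"    # back-up rotted; active copy still verifies
    return "both-damaged"
```

It does not attempt the harder case of intentional and unintentional changes landing in the same interval; a file with a newer modification time is always taken at face value.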
Re:Backup? by wagnerrp · 2014-06-20 10:08 · Score: 1

No, just duplication. Most filesystems have some degree of metadata duplication anyway, for redundancy and performance.

Bitrot not the fault of filesystem by Gaygirlie · 2014-06-14 01:40 · Score: 5, Insightful

Bitrot isn't the fault of the filesystem unless something is badly buggy. It's the fault of the underlying storage-device itself. Attacking HFS+ for something like that is just silly. Now, with that said there are filesystems out there that can guard against bitrot, most notably Btrfs and ZFS. Both Btrfs and ZFS can be used just like a regular filesystem where no parity-information or duplicate copies are saved and in such a case there is no safety against bitrot, but once you enable parity they can silently heal any affected files without issues. The downside? Saving parity consumes a lot more HDD-space, and that's why it's not done by default by most filesystems.

Re:Bitrot not the fault of filesystem by jbolden · 2014-06-14 01:52 · Score: 1

There is a 3rd possibility. As the size of the dataset increases you can construct a more complex error correcting code on that dataset with loss of spacing being 1/n. Note that's essentially saving information about the decoding and then the coded information, sort of like how compression works. Which for most files would be essentially free. And of course you could combine with this compression by default which might very well result in a net savings. But then you pick up computational complexity. With extra CPUs though having a CPU (or hardware in the drive) dedicated to handling that isn't unreasonable.
Re: Bitrot not the fault of filesystem by jmitchel!jmitchel.co · 2014-06-14 01:52 · Score: 4, Insightful

Even with just checksums, knowing that there is corruption means knowing to restore from backups. And in the consumer space most people have plenty of space to keep parity if it comes to that.
Re:Bitrot not the fault of filesystem by Gaygirlie · 2014-06-14 03:07 · Score: 1

More likely it is bad RAM, and not using ECC RAM. Or flaky power supply. Maybe in the process of copying the file or saving it the data buffer got corrupted.
The hard disk (or CD or DVD, OP does not mention at all what type of physical storage he is using) has built-in error correction so the data shouldn't be easily corrupted.
Mmmmno. If he had bad RAM he would be having a lot more issues with the system than just 28 broken files over 6 years. And no, HDDs do not have built-in error correction, they have checksums -- those things are not the same thing.
Re:Bitrot not the fault of filesystem by fnj · 2014-06-14 03:38 · Score: 1

And no, HDDs do not have built-in error correction, they have checksums -- those things are not the same thing.
Sorry to inform you that your knowledge on this subject is not perfectly correct and inclusive. Hard drives use per-sector ECC. ECC stands for Error Correction Code. The very term tells you its function is to do precisely what you say is not done. Here is one tutorial. This stuff is pretty basic and widely known.
"When a sector is written to the hard disk, the appropriate ECC codes are generated and stored in the bits reserved for them. When the sector is read back, the user data read, combined with the ECC bits, can tell the controller if any errors occurred during the read. Errors that can be corrected using the redundant information are corrected before passing the data to the rest of the system. The system can also tell when there is too much damage to the data to correct, and will issue an error notification in that event. The sophisticated firmware present in all modern drives uses ECC as part of its overall error management protocols. This is all done "on the fly" with no intervention from the user required, and no slowdown in performance even when errors are encountered and must be corrected."
Re:Bitrot not the fault of filesystem by wagnerrp · 2014-06-14 03:50 · Score: 1

Isn't that RAID6?
Re:Bitrot not the fault of filesystem by Electricity+Likes+Me · 2014-06-14 03:56 · Score: 1

Hard disks use ECC to allow the disk to reach the capacities it does. It is not designed for anything other than making the hard disk perform well. It doesn't protect you against hard disks which write incorrect data to start with, or have faulty cables etc.
Re:Bitrot not the fault of filesystem by gweihir · 2014-06-14 04:52 · Score: 1

Actually, bit-rot is the fault of the user that a) selected a non-archival grade storage system for his archiving and b) failed to verify the data was actually written correctly to the archival system.
This is user stupidity, plain and simple. If this person had spent 15 minutes finding out how to properly archive data, he would not have lost anything.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Bitrot not the fault of filesystem by steelfood · 2014-06-14 05:09 · Score: 1

It's amusing you're putting the blame for imperfection on the known imperfect nature of physical systems.
I'd prefer to blame management of the software product for not pushing for more reliable software, which is by nature supposed to be perfect. Note that in some cases, management and lead developers are basically the same people, but in Apple's case, I'm certain they can afford to hire real managers to make these kinds of important decisions.

--
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
Re:Bitrot not the fault of filesystem by jbolden · 2014-06-14 07:32 · Score: 1

No, you are talking out of your ass there.

The fact you don't know about something doesn't mean the other person is talking out of their ass.

First of all, if the system worked like you explain it then having the decode - block itself get corrupted would render every single file relying on it invalid, so you'd still end up having to maintain at least a second copy of the decode - block and checksums for them both, but you'd still have two points of breakage that, if ever corrupted, would still render everything corrupted. That's really shitty design.

Well, first off, the decode matrix is derivable. It wouldn't necessarily be a block. Moreover, a 6000x6000 bit matrix is about 4.5 MB; you could keep two copies of it and it wouldn't have any noticeable impact on storage.

. . Then do the same while you enabled recovery mode, ie. the compression system writes a second decode - block in the file, and the file will certainly get even bigger. Go on, try it, you'll see.

Of course it gets bigger with error correction. But using a complex code, not by very much. As the check matrix gets larger (i.e. more computationally complex) the cost goes to zero. The cost in terms of storage is slightly worse than 1 -> 1+1/n where n is the number of bits in the encoding.
Re:Bitrot not the fault of filesystem by jbolden · 2014-06-14 07:32 · Score: 1

BTW this anon is absolutely correct. Worth modding up.
Re:Bitrot not the fault of filesystem by laird · 2014-06-14 12:10 · Score: 1

To correct slightly - ECC isn't really about disk capacity. It's there because magnetic media isn't perfect, so even on a well written block there's a percentage chance of a read having a bit mis-read, and even on a failed read there's some percentage of good data read. The ECC lets the drive controller correct for those errors, so the vast majority of errors are corrected by the controller. A really smart controller or driver (GCC Technologies had this 20 years ago, perhaps they all do now) pays attention to ECC errors and automatically re-writes blocks that have errors, so that any marginal writes are rewritten and the user's data is protected before the block degrades to an unrecoverable failure. And if a read is so bad that ECC fails, you can re-try the read until you get a good enough read for ECC to recover the data, which almost always works, then rewrite it. If you do that, it's very, very hard to lose data on magnetic media, because it's nearly impossible for a block to go completely bad with no warning.
That being said, with the huge volumes of data that people use now, even a very rare percentage multiplied by that huge pile of data means they'll lose data. If you really care about that, you need to store your data on two physically separate devices, so a physical failure of one can't affect the other. This is expensive-ish, but that's the cost of protecting data. So an offsite backup is really the only solid option. Anything in your house can be affected by a fire, power spike, etc., so if you really care about the data, get CrashPlan or something like that.

--
Enable 3D printed prosthetics!
Re:Bitrot not the fault of filesystem by fnj · 2014-06-14 14:40 · Score: 1

Hard disks use ECC to allow the disk to reach the capacities it does. It is not designed for anything other than making the hard disk perform well. It doesn't protect you against hard disks which write incorrect data to start with, or have faulty cables etc.
So you don't think it's worth providing recovery from some error cases just because you can't protect against every single case?
If you don't mind my asking, why would you claim something that is patently untrue? ECC detects and corrects ALL instances of a hard drive writing single bad bits per sector. So it's clearly "protecting you" against a great many instances of "incorrect data" being written by the drive.
Incidentally, the SATA protocol incorporates 32-bit CRCs in all packets flowing in either direction. This will pick up an extremely high percentage of errors arising from faulty cables and bad receiver and transmitter circuits at the interface. For reads, the host does not use data flagged with CRC errors. For writes, the drive does not write data flagged with CRC errors. This feature has saved my ass more than once by preventing the writing of corrupt data when I had problems with my cables.
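The idea of rejecting data whose CRC does not match can be illustrated generically. The sketch below uses Python's `zlib.crc32` and a made-up framing (payload followed by a 4-byte CRC); it is not the actual SATA frame format, only the same accept-or-discard principle.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 32-bit CRC to the payload (illustrative framing, not SATA's)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unframe(packet: bytes) -> bytes:
    """Return the payload, or raise if the CRC does not match."""
    payload, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: discard and retransmit")
    return payload
```

A CRC like this only detects damage in transit. It cannot repair the data, so the usual response is to drop the packet and ask for it again.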
Re:Bitrot not the fault of filesystem by Electricity+Likes+Me · 2014-06-14 20:10 · Score: 1

What part of my comment sounds like I'm saying it's not worth doing error recovery?
Re:Bitrot not the fault of filesystem by fnj · 2014-06-14 21:45 · Score: 1

The part where you claim "It [ECC] doesn't protect you against hard disks which write incorrect data to start with, or have faulty cables etc."
Re:Bitrot not the fault of filesystem by Electricity+Likes+Me · 2014-06-15 09:37 · Score: 1

And again, which part of that seems like I'm saying it's not worth doing error recovery at all, given that I'm saying the exact opposite.
Re:Bitrot not the fault of filesystem by bwwatr · 2014-06-16 04:23 · Score: 1

Not really. Hardware RAID5 uses a parity disk to allow sectors to be read when an unrecoverable read error (URE) occurs on one of the member disks. RAID6 will allow unrecoverable errors to happen on two member disks. But in cases where the member disk doesn't encounter a read error, but instead happily reads back a block of data with a flipped bit, RAID isn't going to help you. ZFS/Btrfs would have helped you though.
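The difference between the two failure modes shows up even in a toy XOR-parity stripe. This is a simplified sketch with three tiny data blocks and one parity block; the helper `xor_blocks` and the block contents are made up, and real RAID implementations stripe and rotate parity in more elaborate ways.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: disk 1 reports a read error, so we know which block to rebuild
# from the surviving blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Case 2: disk 1 silently returns a block with a flipped bit and no error.
silent = [data[0], b"BBBC", data[2]]
stripe_is_consistent = xor_blocks(silent) == parity
```

In the first case `rebuilt` equals the lost block. In the second case the parity comparison can at best show that the stripe is inconsistent; nothing in the parity says which member is the wrong one, which is the gap that per-block checksums fill.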

Re:It's the contents of the files... by kthreadd · 2014-06-14 01:46 · Score: 1

The point is that there are good file systems that can detect when the storage unit fails, give you an alert and allow you to restore the file from a good backup. Without this feature the corrupted file will just get backed up like any other file and eventually replace the good backup.

Good backups aren't enough by jmitchel!jmitchel.co · 2014-06-14 01:47 · Score: 1

Good backups aren't enough. If the filesystem isn't flagging corruption as it happens, the backup software will happily back up your corrupted data over and over until the last backup which has the valid file in it has expired or become unrecoverable itself.

Re:Good backups aren't enough by drew_92123 · 2014-06-14 02:00 · Score: 1

I copy all of my non-changing files to a special directory and have the backup app I use compare them to another copy and alert me of any changes. At any one time I have 3-4 copies of my important files on separate disks PLUS my backups. Cuz fuck losing files! ;-)
Re:Good backups aren't enough by gweihir · 2014-06-14 04:54 · Score: 1

That is complete BS! The disks certainly flag any data that has gone bad as bad. Undetected read errors do not happen unless the disk electronics are dying. And for that you have redundancy.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

ZFS, Apple! by grub · 2014-06-14 01:48 · Score: 2

This is why Apple should resurrect its ZFS project. Overnight they would be the largest ZFS vendor to match with being the largest UNIX vendor.

--
Trolling is a art,

Re:ZFS, Apple! by ColdWetDog · 2014-06-14 02:03 · Score: 1

I'm curious why it's been ignored or deprecated or whatever Apple did to it. They have the resources to throw at a project like that. Presumably there was some calculation somewhere along the line that didn't make sense. Not that Apple is much for telling us things like that, but it would be fun to know.

--
Faster! Faster! Faster would be better!
Re:ZFS, Apple! by grahamsaa · 2014-06-14 02:12 · Score: 2

I'm not sure this is true. Other vendors like iXsystems already sell products that ship with ZFS. As I understand it, ZFS is BSD licensed. While Oracle distributes its own version of ZFS that may (or may not) include proprietary features, the open sourced version is freely distributable. The only reason it's packaged as a userland utility for Linux is that the BSD license isn't compatible with the kernel's GPL license. Apple's kernel is definitely not GPL, so this isn't a problem for them.

One problem might be that using ZFS without ECC memory can result in data loss, and ECC memory is more expensive (and not compatible with most consumer oriented processors that Intel makes). This would increase the cost of Apple hardware and could (possibly) be a hurdle, as Intel doesn't want to support ECC memory on their consumer oriented processors (as this could hurt sales of more expensive server-oriented processors). But Apple is a large enough vendor that they could probably negotiate something with Intel that could be workable.

That said, I don't know many Apple users that know what ZFS is, and it doesn't seem like there are many people clamoring for it. It would be a great addition to OSX though.

--
Facts have a liberal bias.
Re:ZFS, Apple! by kthreadd · 2014-06-14 02:14 · Score: 1

So how is FreeBSD able to license ZFS by simply importing it into the source tree and Apple is not?
Re:ZFS, Apple! by kthreadd · 2014-06-14 02:20 · Score: 1

ZFS does not require ECC memory more than any other file system. I have no idea where you got that from.
Re:ZFS, Apple! by Calibax · 2014-06-14 02:23 · Score: 2

No they would not be sued by anyone.
Sun open sourced ZFS under a permissive license. Oracle close sourced it again. However, a number of companies are supporting derivatives of the open source version.
ZFS is available for a number of operating systems today. A non-exhaustive list:
FreeBSD from iXsystems
Linux from Lawrence Livermore National Laboratory and also Pogo Linux
SmartOS from Joyent
OmniOS from Omniti
Osv from CloudOS
In addition a number of companies are using ZFS in their products:
CloudScaling
DDRdrive
datto
Delphix
GE Healthcare
Great Lakes SAN
Losytec
High-Availability
HybridCluster
Nexenta Systems
OSNEXUS
RackTop
Spectra Logic
Storiant
Syneto
WHEEL Systems
Zetavault
ZFS can detect and correct silent corruption when configured to do so. I have a NAS that has 24 TB of raw storage, 16 TB of useable storage, running under OmniOS. I have well over 10 million files on the NAS (it is used as a backup for 8 systems) - I haven't lost a file in 4 years and I don't expect to lose any.
Re:ZFS, Apple! by grahamsaa · 2014-06-14 02:39 · Score: 1

Of course it doesn't, and I never said that. But your chances of data corruption if you use ZFS without ECC are somewhat greater, and potentially much more catastrophic. A web search for 'ZFS without ECC' will point you to a number of horror stories. Basically, ZFS always trusts what's in memory, so if what's in memory differs from what's on disk, the contents on disk get overwritten. If this discrepancy is due to bit rot, that's great -- you've just saved your data. But if it's due to a memory error, your system proactively corrupts your data. Considering that most non-ECC DIMMs have a couple of errors a year, you will very likely lose data if you run ZFS on a system without ECC.

Of course, ECC doesn't fix everything, but it should halt your system if your RAM has an uncorrectable error, which is better than corrupting your files on disk.

--
Facts have a liberal bias.
Re:ZFS, Apple! by Anonymous Coward · 2014-06-14 02:40 · Score: 1

NetApp sued Sun over ZFS saying ZFS infringed their patents. Sun (later Oracle) countersued. Both suits were settled without any money flowing either way.
ZFS is considered safe to use without threat of legal action.
Do you seriously think that dozens of companies would use it in their businesses if there was a risk of being sued out of existence by Oracle?
Re:ZFS, Apple! by sribe · 2014-06-14 02:40 · Score: 1

Sun open sourced ZFS under a permissive license.
And NetApp claims that Sun & Oracle violated their patents.
Re:ZFS, Apple! by Calibax · 2014-06-14 02:46 · Score: 1

I would hesitate to call GE Healthcare a small company. I doubt that Lawrence Livermore National Labs would be considered small as it's part of the government. Joyent is the company that supports node.js.
Anyone can sue anybody about anything, but winning is a different matter. ZFS is considered safe from a legal point of view.
Re:ZFS, Apple! by kthreadd · 2014-06-14 03:07 · Score: 1

I see what you mean now, but I must say that I really don't agree with these non-ECC horror stories. You have much bigger problems if you have memory corruption.
Re:ZFS, Apple! by fnj · 2014-06-14 04:01 · Score: 1

They would also be sued pretty quickly by Oracle. Clearly not an option.
Your conclusion is a bit hasty and unwarranted. I am not going to tell you that Oracle CANNOT sue anyone for any trumped-up reason, but ZFS is licensed under the Common Development and Distribution License (CDDL) and is open source. For Linux, there is an issue with how CDDL plays with GPL, so no distro has yet bundled ZFS with Linux. Linux users, however, can pick up "ZFS on Linux" and install it themselves without violating either the CDDL or GPL.
But OSX is not GPL. Other systems that are not GPL bundle ZFS, and are not sued. For example, FreeBSD comes with ZFS, and there are a number of other systems, such as FreeNAS, PC-BSD, illumos and Nexenta.
See OpenZFS.
Re:ZFS, Apple! by nabsltd · 2014-06-14 04:13 · Score: 1

You have much bigger problems if you have memory corruption.
If you don't use ECC memory, you will have memory corruption. Even if you do use ECC memory, you might have corruption, and it might even go unnoticed, but it is far less likely.
"Corruption" in this sense doesn't mean that whole DIMMs are broken...it just means that one bit has changed in a way that the user/OS/CPU didn't want it to. In many cases, this can be completely harmless (e.g., graphical data used in-memory only has one bit wrong...you might not even notice a color shift if it's the LSB), a little annoying (e.g., unexpected program termination), or very annoying (e.g., BSOD). But, if this happens in memory used for disk write buffers, then you get the issue that the GP had you Google for.
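To put numbers on the "one bit wrong" case, here is a minimal sketch. The 8-bit colour value of 200 is made up for illustration; the point is only that the damage depends on which bit flips.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (bit 0 is the least significant)."""
    return value ^ (1 << bit)


red = 200  # hypothetical 8-bit colour channel value

lsb_flipped = flip_bit(red, 0)  # off by one: a colour shift nobody will see
msb_flipped = flip_bit(red, 7)  # off by 128: a clearly wrong pixel

print(red, lsb_flipped, msb_flipped)
```

The same single flip in a pointer, a length field, or a disk write buffer has no such "harmless" reading, which is the case the GP was talking about.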
Re:ZFS, Apple! by kthreadd · 2014-06-14 04:43 · Score: 1

Then it's much simpler. This ECC issue has absolutely nothing to do with ZFS. You should use ECC RAM if you are doing any form of disk IO no matter which file system you're using, or you are under the risk of data loss.
Re:ZFS, Apple! by Just+Some+Guy · 2014-06-14 05:21 · Score: 1

As I understand it, ZFS is BSD licensed.
Nope. It's under the CDDL, which isn't GPL-compatible and prevents ZFS from being distributed as part of the Linux kernel. If it could, it probably would've been adopted by the masses years ago.

--
Dewey, what part of this looks like authorities should be involved?
Re:ZFS, Apple! by Kaenneth · 2014-06-14 07:49 · Score: 2

Back when I did tech support for a lightweight Mac database product, they didn't use Parity (much less ECC) RAM.
I had a customer call in because students were continually getting corrupted databases on their assignments.
Over the course of several phone calls, we narrowed it down to only happening in 1 of 3 labs.
After excluding anything high-energy (like a physics lab) in the building, I got the customer to reveal that they were constructing a new building next door, and the construction power tools were running off the same circuits as the computer lab...
They got the construction workers to use a different source of power, and the corruption problems disappeared.
Re:ZFS, Apple! by jbolden · 2014-06-14 08:11 · Score: 2

Apple did announce why the project failed. ZFS on consumer-grade hardware with consumer interactions was too dangerous. Things like pulling out an external drive mid-write could corrupt an entire ZFS volume. Apple simply couldn't get ZFS to work under the conditions their systems need it to. They had to back out completely and come up with a plan B. The developer who worked on this left Apple and now produces a better ZFS for OSX. That company got bought by Oracle so Oracle owns it now.
Re:ZFS, Apple! by sjames · 2014-06-14 08:58 · Score: 1

All file systems trust what's in memory (they have to, anything that would test it lives in memory too!). So the point about ECC RAM and ZFS is that since it is doing its part to prevent bitrot (unlike most file systems), adding ECC RAM to the picture makes bitrot practically non-existent.
If a block has just been fetched from disk and the checksum fails, it will not write that back to disk, since it knows the data is bad. If the buffer for a write is corrupt, no filesystem can know that.
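A toy illustration of those two cases, not ZFS code: `ToyStore` is a made-up in-memory block store, and SHA-256 simply stands in for whatever checksum a real filesystem uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyStore:
    """Hypothetical block store that keeps a checksum beside each block."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, buffer: bytes):
        # The checksum is computed from whatever is in the buffer right now.
        self.blocks[key] = (bytearray(buffer), checksum(buffer))

    def read(self, key) -> bytes:
        data, stored = self.blocks[key]
        if checksum(bytes(data)) != stored:
            raise IOError("checksum mismatch on block %r" % (key,))
        return bytes(data)


store = ToyStore()
original = b"holiday photo bytes"

# Case 1: the block rots on "disk" after a good write. The read catches it.
store.write("a", original)
store.blocks["a"][0][0] ^= 0x01
try:
    store.read("a")
    disk_rot_detected = False
except IOError:
    disk_rot_detected = True

# Case 2: the buffer is already wrong in RAM before the write.
# The checksum is taken over the bad bytes, so the read is "clean".
bad_buffer = bytearray(original)
bad_buffer[0] ^= 0x01
store.write("b", bytes(bad_buffer))
silently_wrong = store.read("b") != original

print(disk_rot_detected, silently_wrong)
```

In case 1 a real system with redundancy would go on to fetch a good copy; in case 2 there is nothing to compare against, which is where ECC comes in.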
Re:ZFS, Apple! by laird · 2014-06-14 12:13 · Score: 1

ZFS was a Sun project, and they've effectively killed it. Apple might have been looking at ZFS, but they never made it a part of their OS.
Shame, as it's a really nice filesystem. My previous file server was ZFS, and it was a delight. But it's kinda picky about what hardware it'll run on, and the old file server (dual Xeon) was just too power hungry to keep running at home...

--
Enable 3D printed prosthetics!
Re:ZFS, Apple! by Jane+Q.+Public · 2014-06-14 13:32 · Score: 1

Mod this one up. If memory is corrupted, disk can become "corrupted", but only because it's a copy of the actual contents of memory.

Fault-tolerant memory and fault-tolerant file systems may have similarities but they are separate issues. If either one becomes corrupted it can "corrupt" the other, but only because one is a copy of the other.
Re:ZFS, Apple! by nabsltd · 2014-06-17 00:44 · Score: 1

You should use ECC RAM if you are doing any form of disk IO no matter which file system you're using, or you are under the risk of data loss.
I agree. Unfortunately, no Intel desktop CPU/chipset supports ECC, and many AMD desktop chipsets are castrated by the board manufacturer to not allow ECC.
When RAM was slow and 4GB was huge and expensive, this wasn't as big a deal, but now that 8GB is the reasonable starting point and 32GB is quite affordable, Intel especially needs to step up and add ECC support to their desktop CPUs/chipsets.

28 files in 6 years is a hardware defect by Anonymous Coward · 2014-06-14 01:55 · Score: 1

Sure, a modern filesystem should be designed to catch and possibly work around bit errors, but in the end, hardware which causes that many bit errors is defective and needs to be fixed or replaced. RAM would be my first suspect if there aren't any error messages in SMART or disk related entries in system logs. If the RAM is defective, can you really blame the filesystem? What if the files got corrupted in RAM while you were working on them?

Re:28 files in 6 years is a hardware defect by kthreadd · 2014-06-14 01:57 · Score: 1

How could the RAM be responsible for damaging a file between the time it was written to disk and when it was read from disk?
Re:28 files in 6 years is a hardware defect by Qzukk · 2014-06-14 02:14 · Score: 1

The RAM is responsible for damaging the file while it sits in a buffer waiting to be written to the disk in the first place.

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re:28 files in 6 years is a hardware defect by washu_k · 2014-06-14 02:14 · Score: 1

Bad RAM could have corrupted the file as it was being written to disk. The file has been corrupted all along, but that is not the disk's or the filesystem's fault.

Or the file could have been corrupted in RAM on read, and would actually be fine if read on a working machine.

Or the disk has been replaced in those 6 years and the file was corrupted during the copy because of bad RAM.

There are lots of possibilities for the file to get corrupted that don't involve the disk or filesystem.
Re:28 files in 6 years is a hardware defect by v1 · 2014-06-14 02:15 · Score: 1

I see bad RAM cause two problems. First, when you are copying a file or editing it, and it gets saved, if the data was corrupted while it was in memory, it can become damaged when writing it. It doesn't have to affect the part of the file you were working with. If you were adding to the end of a long text document, page 2 could get damaged when you hit Save.
Second problem, more common in my experience, is directory corruption due to bad ram. When a machine would come in with a trashed directory, we used to just fix it and return it. But sometimes they'd come back again in a similar state. I'd run a memory test and find/replace a bad stick before repairing it again. Later I just got in the habit of running a short ram test anytime there was unusual directory damage. I found it in about 1 in 10 of the cases I checked. Those checks were only run in cases of severe or unusual damage though. Directory damage takes out files wholesale, and can affect data that never entered the computer, and not due to any hardware failure in the storage.
For the record, I manage over 20 TB of data here, and to date I've lost two files. One was a blonde moment with rm on a file that wasn't backed up. (I had NO idea that rm followed symlinks!) The other was a failed slice in a mirror that cost me a single document. That's over a span of over 20 years. If you've lost over 20 files in the last 10 years, you're doing something (or more probably several somethings) wrong.

--
I work for the Department of Redundancy Department.
Re:28 files in 6 years is a hardware defect by Antique+Geekmeister · 2014-06-14 03:17 · Score: 1

"rm" doesn't follow symlinks. However, if you have a symlink that is a directory, and hit "tab" to complete the link's name, it will put a dangling "/" on the link name. _That_ is referencing the directory from effectively "inside" the actual target directory.
I've had several conversations with colleagues over why just hitting 'tab for completion' can be hazardous. This is one of the particular cases.
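A small sketch of why that trailing slash matters, assuming a POSIX system (Linux or OS X). The file and directory names are invented, and it only inspects the paths rather than running rm on anything.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target_dir")
    link = os.path.join(root, "link")
    os.mkdir(target)
    open(os.path.join(target, "precious.txt"), "w").close()
    os.symlink(target, link)

    # Without the slash, the path names the symlink itself.
    bare_is_link = os.path.islink(link)

    # With the slash, the path is resolved through the link to the directory.
    slashed_is_link = os.path.islink(link + os.sep)
    slashed_listing = os.listdir(link + os.sep)

    # Removing the bare name deletes only the link; the target is untouched.
    os.remove(link)
    target_survived = os.path.exists(os.path.join(target, "precious.txt"))

print(bare_is_link, slashed_is_link, slashed_listing, target_survived)
```

So a recursive delete handed "link/" is pointed at the real directory, while one handed "link" only sees the link.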
Re:28 files in 6 years is a hardware defect by kthreadd · 2014-06-14 03:23 · Score: 1

That depends on your shell. Bash works that way, but zsh does not; at least not by default as far as I know.
Re:28 files in 6 years is a hardware defect by gweihir · 2014-06-14 04:58 · Score: 1

This is not a filesystem-layer issue at all. And I agree, if the disk did not detect defective sectors, then this was not bit-rot on them. While the rate for uncorrectable sectors stated by disk manufacturers is something like 1 in 10^15, the rate for undetected bad sectors is so low it will not happen, unless the disk electronics that calculates the checksums is dying.
And people that do not verify the files they put into an archive have not understood the first thing about archiving data or making backups. Yes, stupidity will cause you to lose data. It is not a technology problem, it is a problem of people making assumptions without verifying them.
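Verifying an archive can be as simple as recording a digest per file when it goes in and re-checking later. A minimal sketch; the function names, the file names, and the choice of SHA-256 are mine, not any particular tool's.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify(root, manifest):
    """Return the sorted relative paths that are missing or have changed."""
    bad = []
    for rel, digest in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != digest:
            bad.append(rel)
    return sorted(bad)


with tempfile.TemporaryDirectory() as archive:
    for name, body in [("a.jpg", b"\xff\xd8 first"), ("b.jpg", b"\xff\xd8 second")]:
        with open(os.path.join(archive, name), "wb") as f:
            f.write(body)

    manifest = build_manifest(archive)
    clean_result = verify(archive, manifest)

    # Simulate bitrot: flip one bit in the first byte of b.jpg.
    with open(os.path.join(archive, "b.jpg"), "r+b") as f:
        first = f.read(1)
        f.seek(0)
        f.write(bytes([first[0] ^ 0x01]))

    damaged_result = verify(archive, manifest)

print(clean_result, damaged_result)
```

This only tells you that a file changed, not how to get it back, so it is a complement to backups rather than a replacement. The manifest itself should be stored somewhere other than the archive it describes.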

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Detachment by PopeRatzo · 2014-06-14 02:04 · Score: 1

The solution is to not become too attached to data. It's all ephemeral anyway, in the grand scheme of things.

--
You are welcome on my lawn.

Re:Detachment by grasshoppa · 2014-06-14 02:10 · Score: 1

Well, in the "grand scheme of things", so are we.
Me? I get rather attached to the source file I've been working on for the past 6 months.

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Re:Detachment by NormalVisual · 2014-06-14 02:11 · Score: 2

Yeah, tell that to the IRS when you go to pull your records during an audit... ;-)

--
Please stand clear of the doors, por favor mantenganse alejado de las puertas
Re:Detachment by Anonymous Coward · 2014-06-14 02:30 · Score: 1

Tell them you stored all of your records in e-mail messages. They'll understand the loss.
Re:Detachment by 93+Escort+Wagon · 2014-06-14 06:44 · Score: 1

Tell them you stored all of your records in e-mail messages. They'll understand the loss.
What loss? They can ask the NSA for their copies.

--
#DeleteChrome
Re:Detachment by PopeRatzo · 2014-06-14 14:58 · Score: 1

You'll never find enlightenment as long as you remain invested in earthly things such as "source files".
Learn to let go, grasshoppa. This is the way to enlightenment.
Now go practice your kung fu and let me finish this polish sausage.

--
You are welcome on my lawn.

Isn't Samsung the largest UNIX vendor? *grin* by sirwired · 2014-06-14 02:06 · Score: 1, Informative

Due to their commanding smartphone marketshare, along with millions of devices with embedded Linux shipped every year, wouldn't Samsung be the largest UNIX vendor?

Oh? What's that? You weren't counting embedded Linux and I'm a pedantic #$(*#$&@!!!. Can't argue with that!

Re:Isn't Samsung the largest UNIX vendor? *grin* by jo_ham · 2014-06-14 02:10 · Score: 1

Now there's a can of worms. I think the question "Is Linux really Unix?" is a guaranteed heat-generator.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Antique+Geekmeister · 2014-06-14 03:14 · Score: 1

If you follow the specifications, there's no need for heat. No Linux variant has been certified according to the POSIX standards for UNIX, and most variants have subtle ways in which they diverge from the POSIX standards, at least subtly. Wikipedia has a good note on this at http://en.wikipedia.org/wiki/S...
Personally, I've found that each UNIX has some rather strange differences from the other UNIXes, and I use the GNU software base and Linux-based software packages to ensure compatibility among the different UNIX variants.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Bing+Tsher+E · 2014-06-14 05:23 · Score: 1

But is Mach UNIX? I don't mean 'POSIX compliant' because Windows NT 4.0 is POSIX compliant.
I have several UNIX license plates. They are officially licensed and sold (or were sold) by The Open Group.
Saying that Apple pimps off UNIX to produce their closed-source candyland binaries isn't really 'The UNIX Way' no matter how much it's one of Apple's new 'Altivec Unit' bullshit bullet points.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Bing+Tsher+E · 2014-06-14 05:25 · Score: 1

UNIX is a brand name, and POSIX is a standard to meet. Just like an appliance might be UL Certified, an OS might be POSIX compliant.
The trademark of the UNIX brand name is owned by a whole separate group.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Bing+Tsher+E · 2014-06-14 05:28 · Score: 1

Linux stands for 'we didn't allow Linus to call it Freax.' Nothing more.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Karlt1 · 2014-06-14 05:46 · Score: 1

Due to their commanding smartphone marketshare, along with millions of devices with embedded Linux shipped every year, wouldn't Samsung be the largest UNIX vendor?
Oh? What's that? You weren't counting embedded Linux and I'm a pedantic #$(*#$&@!!!. Can't argue with that!
Mac OS X is Unix -- it's been certified as Unix by the group that holds the copyright to the term. Every version of OS X from 10.5 - 10.9 except for 10.7 has been certified unix.
http://www.opengroup.org/openb...
Linux is Unix like.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-14 08:13 · Score: 1

Given Linux's intellectual and usage dominance I'd say that the old Open Systems approach clearly no longer works. A standard that excludes Linux is not a standard. So I'm coming down that POSIX / Open Group should not be the definition of UNIX.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-14 08:14 · Score: 1

No it doesn't. LINUX stands for "Linus' Minix".
Re:Isn't Samsung the largest UNIX vendor? *grin* by sjames · 2014-06-14 08:34 · Score: 1

The heat comes from the reason it isn't certified. Due to the costs involved, nobody has tried it. Same deal for *BSD.
When most people ask if it is Unix, they mean would a person familiar with Unix have any problems with tricks and traps or not. That is a much more subtle question, but also a much more important one.
I have used and administered several and find that in spite of certification, each has its own quirks. Linux and BSD don't seem to have any more than the others. So while calling Linux or *BSD Unix might violate a trademark, they are, for practical purposes, Unix systems in all but name.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jo_ham · 2014-06-14 08:44 · Score: 1

It is - they bought a UNIX licence back in the NeXT days I believe.
OS X is POSIX compliant and is ostensibly built on the core pieces of NeXT.
Re:Isn't Samsung the largest UNIX vendor? *grin* by sl149q · 2014-06-14 09:04 · Score: 1

Is it more important that Linux be considered to be POSIX or for POSIX to figure out how to accommodate Linux?
Re:Isn't Samsung the largest UNIX vendor? *grin* by grub · 2014-06-14 09:24 · Score: 1

Linux isn't UNIX. Apple is the largest UNIX vendor in the world.

--
Trolling is a art,
Re:Isn't Samsung the largest UNIX vendor? *grin* by Guy+Harris · 2014-06-14 09:28 · Score: 1

But is Mach UNIX? I don't mean 'POSIX compliant' because Windows NT 4.0 is POSIX compliant.
If Mach is "the Mach kernel", I don't think it offers UNIX APIs, but at least two OSes based on Mach have passed the Single UNIX Specification test suite (which NT 4.0 hasn't, and which even Interix^Wthe Subsystem for Unix-based Applications hasn't).
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-14 12:30 · Score: 1

"UNIX" has a specific meaning, both in terms of branding and adhering to a defined standard.

I disagree. I don't think UNIX is a brand. I think it is a cultural movement that led to a variety of products of which the Open Software movement of the 1990s was a part.

since any standard that excludes Apple and Microsoft is not a standard.

I'd agree with that providing you mean "personal computing standard" or "desktop standard" or whatever. Yes, absolutely. That was precisely my position with Internet Explorer: any standard Microsoft didn't buy into isn't a standard.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-14 12:55 · Score: 1

UNIX is a registered trademark. UNIX as an entity pre-existed that trademark. UNIX is used as a word in ways that aren't associated with the Open Group. They've been fighting real hard to assert that UNIX isn't a generic term because if it is a generic term they lose their trademark. The fact that they've had to fight its use as a generic term is something even the Open Group agrees to. It is you who is ignoring verifiable facts.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 13:04 · Score: 1

No Linux variant has been certified according to the POSIX standards for UNIX, and most variants have subtle ways in which they diverge from the POSIX standards, at least subtly.
Haha. Now you've opened a completely different can of worms. For just one example: why should POSIX matter much these days?

BSD, for example, can essentially (though perhaps not completely technically) be called "Linux with extensions". (They deny it but their own description pretty much gives no technical differences except to say that Linux binaries won't run... and without further explanation that could simply be a compiler dead-man. The only specific difference they point out is licensing.) And the only real reason BSD isn't POSIX-compliant is because they have no interest in paying the fees.

Take OS X for example... it's built on BSD yet it IS POSIX-compliant. Because they wanted to be and paid the cert fees. Big deal.

If OS X and even Windows can be made POSIX compliant (they can), then just about anything could be made POSIX compliant if the owners wanted to bother. They just don't want to.

So now that the waters are thoroughly muddied, I'll muddy them further by saying: today, if you're not an Enterprise shop... you should ask yourself whether you really have any reason to give a shit.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 13:09 · Score: 1

Given Linux's intellectual and usage dominance I'd say that the old Open Systems approach clearly no longer works. A standard that excludes Linux is not a standard. So I'm coming down that POSIX / Open Group should not be the definition of UNIX.
Yes, but you should clarify that "Open Group" is the name of a group, and doesn't come close to representing all of Open Source or Open Software. You know that, I know that, but not everybody knows that.

Having said that, I think we are basically agreed. The existing POSIX should be re-labeled just POS, and we should all just move on. I am all for open technical standards, but if they aren't changing with the times, then the times will move along without them.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 13:15 · Score: 1

When most people ask if it is Unix, they mean would a person familiar with Unix have any problems with tricks and traps or not. That is a much more subtle question, but also a much more important one.
I think a better question would be: can I take my code for X, natively compile it on Y, and expect it to run?

Although it is built on BSD, the majority of C code written for Linux will compile and run in OS X. At least, just about everything I've tried has. I haven't tried to do FPS games or anything. But "work" code, sure. Despite differences here and there, it's a *nix system and works like one. Even X11 stuff runs fine (like Gimp for example), even though OS X has its own proprietary GUI.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 13:22 · Score: 1

No it doesn't. LINUX stands for "Linus' Minix".
Unfortunately the etymology of the word is lost in obscurity. Even Linus' own word on the matter is no longer trustworthy.

Around the same time, self-referencing acronyms ("GNU's Not Unix", for example) were very popular, so it is likely you won't convince many people even if you're right, unless you have an unimpeachable source.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 13:27 · Score: 1

Mac OS X is Unix -- it's been certified as Unix by the group that holds the copyright to the term. Every version of OS X from 10.5 - 10.9 except for 10.7 has been certified unix.
No. OS X meets the "Open Group" standard called POSIX, which means it is "sufficiently" Unix-like... to meet that standard.

That is all it means. It doesn't mean "OS X is Unix". If anything, it is more like Linux than Unix, but it isn't quite either one.

There are versions of Linux that could also be POSIX-compliant if they wanted to make a minor tweak or two, and pay the certification fees. They don't want to bother. It's that simple.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Karlt1 · 2014-06-14 14:09 · Score: 1

No. OS X meets the "Open Group" standard called POSIX, which means it is "sufficiently" Unix-like... to meet that standard.
That is all it means. It doesn't mean "OS X is Unix". If anything, it is more like Linux than Unix, but it isn't quite either one.
No. Did you read the link? The Open Group certified OS X as meeting all of the requirements to be certified as " Unix".
POSIX compliance is different.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 14:28 · Score: 1

No. Did you read the link? The Open Group certified OS X as meeting all of the requirements to be certified as " Unix".

POSIX compliance is different.
NO, it isn't! POSIX *IS* the "Single Unix Specification"! They are the same things!

POSIX certification does NOT mean the OS "is Unix"!!!
Re:Isn't Samsung the largest UNIX vendor? *grin* by jones_supa · 2014-06-14 15:50 · Score: 1

Lost in the obscurity? Just to get the facts straight, back in the day Linux was initially called "Freax". Ari Lemmke provided some FTP space for Linus and he tongue-in-cheek created a directory called "linux". There is nothing ambiguous about the background of the name.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Jane+Q.+Public · 2014-06-14 16:08 · Score: 1

I don't say this often, because this is /., not Wikipedia. But for the same reason I mentioned before: [citation needed].

Everything you say may be true. But it contradicts many things other people have been saying, for a very long time.

So an assertion that "it is called that because" needs some evidence. I didn't make a claim, GP did. I didn't say anybody was lying, I just said other people (for a very long time now) have said other things.

So get off MY butt and produce evidence, or shut up.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-14 16:47 · Score: 1

Glad you agree on the main point.
I don't know that the Open Group even claims to represent Open Source at all. They represented Open Standards which was an earlier movement about interoperability between commercial vendors. They claim to represent interests of "customers" not "users" as per Open Source. They claim to work with various "suppliers" not "developers", etc...
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-14 16:49 · Score: 1

Small correction. Ari created a directory called "LINIX", the more unixy "LINUX" was one more stage.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jones_supa · 2014-06-14 16:51 · Score: 1

Ah, interesting. Didn't know that part. :)
Re:Isn't Samsung the largest UNIX vendor? *grin* by BasilBrush · 2014-06-15 08:43 · Score: 1

More Unixy? LINIX is closer to UNIX than LINUX.
Back in the 70s there used to be a UK porn magazine called "FLICK". It was called that because if you pick the font and kerning right, that L and I begin to look like a U.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-15 13:09 · Score: 1

Yep. It makes more sense for Linus' Minix = LINIX. I've often wondered if the pronunciation issue "Lin-ix" vs. "Lin-ux" comes from LINIX / LINUX name that even though LINIX didn't last long it lasted long enough to get into the oral culture.
Re:Isn't Samsung the largest UNIX vendor? *grin* by jbolden · 2014-06-15 13:10 · Score: 1

LINIX = Linus' Minix
Linux = a Unix variant (the letters for Unix are in there)
Re:Isn't Samsung the largest UNIX vendor? *grin* by Bill_the_Engineer · 2014-06-16 00:23 · Score: 1

I think the actual question being "Is Android on Linux really Unix?" should cause very little heat since most people wouldn't care or think "No" is pretty obvious.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Re:Isn't Samsung the largest UNIX vendor? *grin* by jo_ham · 2014-06-16 06:17 · Score: 1

I think the actual question being "Is Android on Linux really Unix?" should cause very little heat since most people wouldn't care or think "No" is pretty obvious.
Tell that to the guy in a parallel thread to this one who thinks that because "any standard that doesn't include Linux is not a standard" and that because he doesn't think that the Open Group deserves to control what is and isn't Unix that he is taking the term for himself and declaring that it means whatever he wants it to mean.
Re:Isn't Samsung the largest UNIX vendor? *grin* by Bill_the_Engineer · 2014-06-16 06:34 · Score: 1

Well some folks are fanatics.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...

So far what I lost... by cpct0 · 2014-06-14 02:08 · Score: 4, Interesting

Bitrot is not usually the issue for most files. Sometimes, but it's rare. What I lost is a mayhem repository of hardware and software and human failure. Thanks for backup, life :)

On Bitrot:

- MP3s and M4As I had that suddenly started to stutter and jump around. You play the music and it starts to skip. Luckily I have backups (read on for why I have multiple backups of everything :) ) so when I find them, I just revert to the backup.
- Images having bad sectors like everyone else. Once or twice here or there.

- A few CDs due to CD degradation. That includes one that I really wish I'd still have, as it was a backup of something I lost. However, the CD takes hours to read, and then eventually either balks up or not for the directory. I won't tell you about actually trying to copy the files, especially with normal timeouts in modern OSes or the hardware pieces or whatnot.

Not Bitrot:

- Two RAID Mirror hard drives, as they were both the same company, and purchased at the same time (same batch), in the same condition, they both balked at approximately the same time, not leaving me time to transfer data back.

- An internal hard drive, as I was making backups to CDs (at that time). For some reason I still cannot explain, the software thought my hard drive was both the source and the destination !!!! Computer froze completely after a minute or two, then I tried rebooting to no avail, and my partition block was now containing a 700 MB CD image, quarter full with my stuff. I still don't know how that's possible, but hey, it did. Since I was actually making my first CD at the time and it was my first backup in a year, I lost countless good files, many of which I gave up on (especially my 90's favorite music video sources ripped from the original betacam tapes in 4:2:2 by myself).

- A full bulk of HDs on Mac when I tried putting the journal to another internal SSD drive. I have dozens of HDDs, and I thought it'd go faster to use that nifty "journal on another drive" option. It did work well, although it was hell to initialize, as I had to create a partition for each HDD, then convert them to journaled partitions. Worked awesomely, very quick, very efficient. One day after weeks of usage, I had to hard close the computer and its HDD. When they remounted, they all remounted in the wrong order, somehow using the bad partition order. So imagine you have perfectly healthy HDDs but thinking they have to use another HDD's journal. Mayhem! Most drives thought they were other ones, so my music HDD became my photos HDD RAID, my system HDD thought it was the backup HDD, but just what was in the journal. It took me weeks sporting DiskWarrior and Data Rescue in order to get 99% of my files back (I'm looking at you, DiskWarrior as a 32-bit app not supporting my 9TB photo drive) with a combination of the original drive files and the backup drive files. Took months to rebuild the Aperture database from that.

- All my pictures from when I met my wife to our first travels. I had them in a computer, I made a copy for sure. But I cannot find any of that anywhere. Nowhere to be found, no matter where I look. Since that time, many computers happened, so I don't know where it could've been sent. But I'm really sad to have lost these.

- Did a paid photoshoot for a unique event. Took 4 32GB cards worth of priceless pictures. Once done with a card, I was sifting through the pictures with my camera and noticed it had issues reading the card. I removed it immediately. When at home, I put the card in my computer, it had all the troubles in the world reading it (but was able to do so), I was (barely) able to import its contents to Aperture (4-5 pictures didn't make the cut, a few dozen had glitches). It would then (dramatically, as if it had drawn its last breath after relinquishing its precious data) not read or mount anywhere, not even being recognized as a card by the readers. Kids, use new cards regularly for your gigs :)

- A RAID array b

Re:So far what I lost... by Electricity+Likes+Me · 2014-06-14 04:03 · Score: 1

Scratched CDs can be recovered by polishing them with Brasso.
The trick is it polishes the scratches out flat so they don't mess with reflections of the reader.
Re: So far what I lost... by cpct0 · 2014-06-14 04:38 · Score: 1

Yeah, with scratches it's a valid assumption. However, in my case it was cheap CDs whose inks degraded, so the reflectivity of the data layer itself was degraded. The drive was ultimately unable to retrieve data from most sectors, or it managed only after dozens of reads over the same block of data, until the data got its green flag from the recovery algorithm embedded in the Data-CD format specs.
A scratch is localized. CD dye degradation is global. But thanks for the idea.
Re:So far what I lost... by steelfood · 2014-06-14 05:17 · Score: 1

I've had pressed CDs that began developing holes in the reflective layer starting from the outside edge. Those defects are irrecoverable.
Scratched CDs and DVDs are not as big of a deal these days as they used to be. Really good drives can handle them without losing or corrupting data, though the drive would slow down over the scratched areas, so if the disc was bad, the read speeds would drop to almost nothing. There used to be a really good site that did incredibly detailed reviews of optical drives, including taking black sharpies to media. Not sure if it's around anymore.
But honestly, who uses optical media anymore? Especially after the HD-DVD/Blu-ray debacle, I think everyone's turned off by optical media and prefers streaming now.

--
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."

And the story is? by Immerman · 2014-06-14 02:08 · Score: 3, Insightful

Bitrot. It's a thing. It's been a thing since at least the very first tape drive; hell, it was a thing with punch cards (when it might well have involved actual rot). While the mechanism changes, every single consumer-level data-storage system in the history of computing has suffered from it. It's a physical phenomenon independent of the file system, and impossible to defend against in software unless the software transparently invokes the one and only defense: redundant data storage. Preferably in the form of multiple redundant backups.

So what is the point of this article?

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.

Re:And the story is? by Immerman · 2014-06-15 02:40 · Score: 1

That's why you need multiple redundant backups - bitrot *will* hit all of them, but it's extremely unlikely to hit them all in the same spot, so redundant backups stored in an error-detecting format will allow you to reconstruct a single good copy.
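
A minimal sketch of that idea, assuming the "error-detecting format" is simply a list of per-block SHA-256 digests recorded when the backup was made. The names `block_digests`, `reconstruct`, and the 4096-byte `BLOCK_SIZE` are made up for illustration and do not come from any particular backup tool.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def reconstruct(copies: list[bytes], digests: list[str],
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild one good copy, taking each block from any copy that still matches."""
    good_blocks = []
    for index, expected in enumerate(digests):
        start = index * block_size
        for copy in copies:
            candidate = copy[start:start + block_size]
            if hashlib.sha256(candidate).hexdigest() == expected:
                good_blocks.append(candidate)
                break
        else:
            # Every copy is damaged in this block, so it cannot be recovered.
            raise ValueError(f"block {index} is corrupt in all copies")
    return b"".join(good_blocks)


if __name__ == "__main__":
    original = bytes(range(256)) * 64      # 16 KiB of sample data, 4 blocks
    digests = block_digests(original)      # recorded when the backup was made

    # Two backups, each with a single flipped bit in a different block.
    backup_a = bytearray(original)
    backup_a[10] ^= 0x01
    backup_b = bytearray(original)
    backup_b[9000] ^= 0x80

    restored = reconstruct([bytes(backup_a), bytes(backup_b)], digests)
    print(restored == original)
```

The sketch only fails when every copy is damaged in the same block, which is exactly the "same spot" case described above. It also assumes the digest list itself is stored somewhere safe.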

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.

reading checksum + n blocks is SLOW by raymorris · 2014-06-14 02:13 · Score: 1

It's not a matter of CPU load. Suppose you have one checksum block for every eight data blocks. In order to verify the checksum on read, you have to read the checksum block and all eight data blocks. So you have to read a total of nine blocks instead of one. Reading from the disk is one of the slowest operations in a computer, so doing it nine times instead of once slows things down considerably.

Re:reading checksum + n blocks is SLOW by jbolden · 2014-06-14 02:57 · Score: 2

You don't have checksum blocks in the space efficient method. Rather in the computational way I'm talking about it is a transformation. You might have something like every 6354 bits becomes 6311 bits after the complex transformation. It doesn't slow down the read but you have to do math.

Clickbait generating shit by Torp · 2014-06-14 02:14 · Score: 1

The real article would be titled "file systems with no data redundancy and no checksums are vulnerable to bitrot".
That covers just about any file system, with the lone exception of ZFS when run on a RAID, maybe btrfs? And I guess some mainframe stuff.

--
I apologize for the lack of a signature.

Re:Clickbait generating shit by gweihir · 2014-06-14 05:01 · Score: 1

Not even that. As the case does not seem to involve unreadable sectors, the corruption likely did not happen on disk. So the title should rather be "people that are too stupid to verify their backups are readable and correct may lose data". He may also have copied his data around without making sure the copy matches the original. Stupid.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

article is suspect, summary is worse by sribe · 2014-06-14 02:15 · Score: 4, Informative

In a footnote he admits that the corruption was caused by hardware issues, not HFS+ bugs, and of course the summary ignores that completely.

So, for that, let me counter his anecdote with my own anecdote: I have an HFS+ volume with a collection of over 3,000,000 files on it. This collection started in 2004, approximately 50 people access thousands of files on it per day, and occasionally after upgrades or problems it gets a full byte-to-byte comparison to one of three warm standbys. No corruption found, ever.

Re:article is suspect, summary is worse by pla · 2014-06-14 03:38 · Score: 1

In a footnote he admits that the corruption was caused by hardware issues, not HFS+ bugs, and of course the summary ignores that completely.

The summary doesn't claim HFS caused the bitrot; you read that into it. The summary merely points out that HFS doesn't reliably detect and correct flaws in the underlying storage media (and neither does NTFS, nor almost any other widely used filesystem).

More importantly, while merely detecting this issue may not incur too much overhead, correcting it requires some fairly large degree of redundancy. So although plenty of people have mentioned assorted alternative FSs that may (or may not) actually address the problem, doing so still requires wasting some not-insignificant percent of your disk space (a mere 10% of a 4TB drive would hold a whopping 90 full single-layer DVDs).

Joe Sixpack doesn't even get why 4TB doesn't equal 4 TiB; you expect him to understand the concept of parity striping to deal with cosmic rays randomly flipping bits on his platters? Try explaining that one to the public, and next time you visit Grandma, you can expect to find her PC dead because she wrapped it in tinfoil, including the ventilation fans.
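
For anyone who wants to check the numbers, here is a quick back-of-the-envelope sketch. It assumes decimal terabytes and the nominal 4.7 GB single-layer DVD capacity; the variable names are just for illustration. Under those assumptions the DVD count comes out somewhat below the 90 quoted above, but in the same ballpark.

```python
TB = 10**12                      # decimal terabyte, as printed on the drive box
TIB = 2**40                      # binary tebibyte
DVD_SINGLE_LAYER = 4.7 * 10**9   # nominal single-layer DVD capacity in bytes

drive_bytes = 4 * TB
redundancy_fraction = 0.10       # the "mere 10%" from the comment above

reserved = drive_bytes * redundancy_fraction
print(f"10% of a 4 TB drive: {reserved / 10**9:.0f} GB")
print(f"Single-layer DVDs that would fit: {reserved / DVD_SINGLE_LAYER:.0f}")
print(f"4 TB expressed in TiB: {drive_bytes / TIB:.2f}")
```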
Re:article is suspect, summary is worse by sribe · 2014-06-14 04:28 · Score: 1

The summary doesn't claim HFS caused the bitrot, you read that into it.
The summary's first sentence ends: "about data loss suffered under Apple's venerable HFS+ filesystem" and shortly thereafter it continues with: "HFS+ lost a total of 28 files over the course of 6 years." So the chosen wording most certainly does imply that HFS is at fault. One has to click the link to the article, then read all the way through the frickin' footnotes before one encounters anything to explicitly disavow that implication.
Re:article is suspect, summary is worse by pla · 2014-06-14 04:51 · Score: 1

Yes, it does - But those all hold true. The corruption did occur under HFS+, and HFS+ did "lose" some portion of those files. It didn't, however, cause the corruption or the loss of those files. You have read attribution into a scenario where none exists.

In fairness, I will agree with you that TFS (and to a lesser degree, TFA) doesn't clearly discriminate between "HFS-induced damage" and "cosmic-ray-induced damage". But they both knew the root cause, and it wouldn't have made sense to blame it on HFS+ unless they did so as an outright deception. Do you claim they did so intentionally, or just that their wording leaves room for interpretation? If the latter, I will agree with you. If the former, I don't know what else to say except that I don't agree.
Re:article is suspect, summary is worse by laird · 2014-06-14 12:19 · Score: 1

The issue is that the headline and summary have HFS all over the place, and even say that "HFS corrupted files", when HFS wasn't relevant to the corruption - no standard filesystem protects you completely from disk drives' blocks going bad.
That being said, some of the high end SAN/NAS systems do have controls like forcing all blocks on the device to be read and (if needed) rewritten periodically, which would refresh the data and prevent "bit rot". But that's not done by the filesystem, either - it's a layer between the filesystem and the disk drives.

--
Enable 3D printed prosthetics!
Re:article is suspect, summary is worse by jones_supa · 2014-06-14 16:17 · Score: 1

Agree. Apple is in the market of creating premium products and thus its creations should also be scrutinized rigorously.

Clueless article by alexhs · 2014-06-14 02:27 · Score: 4, Informative

People talking about "bit rot" usually have no clue, and this guy is no exception.

It's extremely unlikely that a file would become silently corrupted on disk. Block devices include per-block checksums, and you either have a read error (maybe he has) or the data read is the same as the data previously written. As far as I know, ZFS doesn't help to recover data from read errors. You would need RAID and / or backups.

Main memory is the weakest link. That's why my next computer will have ECC memory. So, when you copy the file (or otherwise defragment or modify the file, etc), you read a good copy, some bit flips in RAM, and you write back corrupted data. Your disk receives the corrupted data, happily computes a checksum, therefore ensuring you can read back your corrupted data faithfully. That's where ZFS helps. Using checksumming scripts is a good idea, and I do it myself. But I don't have auto-defrag on Linux, so I'm safer : when I detect a corrupted copy, I still have the original.
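
A checksumming script of the kind mentioned above could look roughly like this. It is a sketch, not the poster's actual script: SHA-256 and the function names `file_digest`, `build_manifest`, and `verify_manifest` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file() or file_digest(target) != expected:
            problems.append(relative)
    return problems
```

The manifest would be saved somewhere outside the tree it describes (for example as JSON) and checked again later. A script like this cannot tell a deliberate edit from corruption, so it fits archives of files that are not supposed to change, such as photos.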

ext2 was introduced in 1993, and so was NTFS. ext4 is just ext2 updated (ext was a different beast). If anything, HFS+ is more modern, not that it makes a difference. All of them are updated. By the way, I noticed recently that Mac OS X resource forks sometimes contain a CRC32. I noticed it in a file coming from Mavericks.

--
I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.

Re:Clueless article by gweihir · 2014-06-14 05:06 · Score: 1

I agree. Silent corruption on disk data can basically only happen with a defective bit in the disk's RAM. Even then, it is exceedingly unlikely and the defective bit will also make sectors that are fine show up as defective.
As to main RAM corruption, you do not need to use ECC. You just need to verify what you put on disk, flushing OS disk caches before.
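
A rough sketch of that verify-after-write approach, with the caveat spelled out in the comments: `os.fsync` pushes the data toward the device, but the read-back may still be answered from the OS page cache, and dropping that cache is OS-specific, so it is left out here. The helper names `sha256_of` and `copy_and_verify` are made up for illustration.

```python
import hashlib
import os
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: str, destination: str) -> str:
    """Copy source to destination, force it to disk, and compare digests."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)

    # Ask the OS to push the destination's data to the storage device.
    with open(destination, "rb+") as handle:
        os.fsync(handle.fileno())

    # NOTE: this read may still be served from the OS page cache rather than
    # from the disk itself. Flushing that cache is OS-specific and is not
    # attempted in this sketch.
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"verification failed for {destination}")
    return actual
```

Returning the digest makes it easy to record it next to the copy, so the same file can be checked again months later.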

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Clueless article by rabtech · 2014-06-14 05:56 · Score: 2

People talking about "bit rot" usually have no clue, and this guy is no exception.
It's extremely unlikely that a file would become silently corrupted on disk. Block devices include per-block checksums, and you either have a read error (maybe he has) or the data read is the same as the data previously written. As far as I know, ZFS doesn't help to recover data from read errors. You would need RAID and / or backups.
I'm afraid it is you who is clueless. Up until ZFS started gaining traction, we all had the luxury of assuming the storage chain was reliable (RAM, SATA controller, cables, drive firmware, read/write heads, oxide layers, etc). Or at least we would know something went wrong.
But it was found that in the actual real world, these systems all silently corrupt data from time to time. The problem is much worse as the volume of data grows because the error rates are basically unchanged, meaning what was once expected to be a random bit flip that would strike one user out of a million once per year is now something that strikes every single user multiple times per year.
I'm not talking theory or what *should* happen. I'm talking about actual real world experience with checksumming filesystems that demonstrates, beyond any doubt, that bit rot happens and happens far more frequently than most people believe. Actual experience with ZFS proves that disks can and **will** silently read back different bits than what was written, with no block read errors.
Further, you're incredibly ignorant of how ZFS or BTRFS deal with redundancy. You can set them up to mirror blocks, in some cases on a per-file or per-directory basis, providing protection against corruption. A background scrubber scans the disk when idle cycles are available and detects and repairs corruption from the available good blocks, or logs an error if there are no good mirrors or parity blocks available.
With our new knowledge and experience it is no longer sufficient to cross our fingers and hope for the best. We cannot trust filesystems or the underlying hardware, we must verify.

--
Natural != (nontoxic || beneficial)
Re:Clueless article by Anonymous Coward · 2014-06-14 08:21 · Score: 1

As far as I know, ZFS doesn't help to recover data from read errors. You would need RAID and / or backups.
Data point: ZFS can recover data from read errors, as long as you're using some form of data redundancy as part of a vdev or pool, i.e. a mirror or raidzX.
For example, say your pool consists of 2 mirrored devices (disks): if the read from one of those fails (effectively returns EIO), ZFS can and will read from the other device in attempt to get the data it wants. If it's successful, it logs a read error for that device (see "zpool status") and informs you, in layman's terms, "hey, I got a read error from this device, but I was able to read it from the other device so valid data was given to the userland app that did a read(), but you should probably do something about that device". The userland application doing the read() never sees any of this go on -- it's happening at the ZFS and kernel layer.
The same methodology applies for raidzX (through parity and other methodologies), ditto with if you have multiple vdevs consisting of redundancy methods (think RAID-10).
If you're using ZFS with a single device (i.e. no redundancy) then ZFS will inform you of the read error but cannot "auto-correct" it -- meaning the underlying userland application syscall gets EIO.
You already covered checksumming so I won't go into that, but checksumming does not guarantee you can recover data -- only that you can detect things like silent corruption or "bit rot".
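
To make the mirror-versus-single-device difference concrete, here is a toy model. It is emphatically not how ZFS is implemented; `ToyMirror` and everything in it are invented for illustration. It only shows the principle: a checksum detects a bad copy, and redundancy is what allows the repair.

```python
import hashlib


class ToyMirror:
    """Toy model of a checksummed mirror; not how ZFS actually works."""

    def __init__(self, device_count: int):
        # Each "device" is a dict of block number -> bytes.
        self.devices = [dict() for _ in range(device_count)]
        # Checksums are kept apart from the data they describe.
        self.checksums = {}

    def write(self, block: int, data: bytes) -> None:
        self.checksums[block] = hashlib.sha256(data).digest()
        for device in self.devices:
            device[block] = data

    def read(self, block: int) -> bytes:
        expected = self.checksums[block]
        bad = []
        for device in self.devices:
            data = device.get(block)
            if data is not None and hashlib.sha256(data).digest() == expected:
                # Found a good copy: repair the replicas that failed before it.
                for damaged in bad:
                    damaged[block] = data
                return data
            bad.append(device)
        # No replica matches: corruption is detected but cannot be repaired.
        raise OSError(f"block {block}: checksum mismatch on every device")


if __name__ == "__main__":
    mirror = ToyMirror(device_count=2)
    mirror.write(0, b"family photo")
    mirror.devices[0][0] = b"family phot0"   # simulate silent corruption
    print(mirror.read(0))                    # served from the second device
    print(mirror.devices[0][0])              # first device has been repaired
```

With `device_count=1` the same corruption makes `read` raise, which mirrors the single-device case above: detection without recovery.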
P.S. I have never seen "bit rot" where the magnetic media on a hard disk has quietly gone bad on any system I've used ZFS on (this would show up as a random checksum error in "zpool status"). I'm more inclined to believe what others have reported as "bit rot" is more likely filesystem corruption through software means (bugs), or unexpected power loss on the system (this does wonders to a filesystem; journalling doesn't recover your data, it just ensures your filesystem is usable after-the-fact). I do data recovery (software-based) as a hobby and spend a lot of time reading about and tinkering with actual ATA protocol.
Re:Clueless article by dargaud · 2014-06-14 09:07 · Score: 1

I had a home server / workstation with ECC, but had to convert to a laptop: no ECC anywhere except maybe a few milspec models that cost the price of a car and have the specs of a watch...

--
Non-Linux Penguins ?
Re:Clueless article by eWarz · 2014-06-14 18:06 · Score: 1

As someone who has 20 MB hard drives from decades ago... I find this whole bit rot thing a bit hard to swallow. While I'm not saying it's not possible... it's not as likely as people claim. What has more likely happened is that a number of improper power ons/offs/resets damaged the data as it was being written to disk to begin with.
Re:Clueless article by swilver · 2014-06-15 05:56 · Score: 1

Just use ECC RAM. The price difference is tiny.
Of course you can verify, but very few people do that, and it certainly is not something most filesystems do for performance reasons.
Re:Clueless article by swilver · 2014-06-15 05:59 · Score: 1

...and I suppose this silent corruption was verified by reading it into main memory?
My own simple tests (copy 1 TB of data from one place to another) on ECC and non-ECC systems showed quite clearly where the culprit was. Bit error rates of 1 bit/100 GB with the non-ECC system showed the problem clearly.
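
For scale, here is what that rate implies, as a small sketch. It assumes flips are independent and evenly spread (a Poisson model), which is a simplification; the function names are made up for illustration.

```python
import math

BYTES_PER_FLIP = 100 * 10**9   # one flipped bit per 100 GB, from the test above


def expected_flips(bytes_copied: float, bytes_per_flip: float = BYTES_PER_FLIP) -> float:
    """Average number of flipped bits for a copy of the given size."""
    return bytes_copied / bytes_per_flip


def chance_of_any_flip(bytes_copied: float, bytes_per_flip: float = BYTES_PER_FLIP) -> float:
    """Probability of at least one flip, assuming independent (Poisson) errors."""
    return 1 - math.exp(-expected_flips(bytes_copied, bytes_per_flip))


if __name__ == "__main__":
    one_tb = 10**12
    print(f"expected flipped bits in 1 TB: {expected_flips(one_tb):.0f}")
    print(f"chance of at least one flip:   {chance_of_any_flip(one_tb):.5f}")
```

At that rate a 1 TB copy is all but guaranteed to pick up at least one bad bit, while a 10 GB copy would come through clean roughly nine times out of ten.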
Re:Clueless article by gweihir · 2014-06-15 10:00 · Score: 1

ECC RAM does not help you against most sources of corruption. Defective controllers, bus drivers, etc. all are completely untouched by ECC RAM. On the other hand, verifying data helps against all of them. If you do not verify, then you will have undetected bad data on disk sooner or later.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Clueless article by toddestan · 2014-06-15 12:46 · Score: 1

I have wondered about those i3-taking, ECC-supporting server boards if the error checking still works with the consumer processors.

Since the memory controller is part of the CPU you can't just drop in a regular consumer processor and get ECC this way. You're stuck with whatever models that Intel decides to turn on the ECC bit for, which is pretty much the Xeons and a few oddball embedded versions.

Btrfs by Flammon · 2014-06-14 02:53 · Score: 1

I've slowly been moving all my systems to Btrfs from least important to most important and have had no problems so far.

--
ayottesoftware.com

Re:Btrfs by wisnoskij · 2014-06-14 04:14 · Score: 1

Btrfs "pronounced "Butterface"" - Wikipedia
Lol.
Strangely that acronym could also stand for BiT Rot Free System which is pretty ironic, I guess.

--
Troll is not a replacement for I disagree.
Re:Btrfs by gweihir · 2014-06-14 05:07 · Score: 1

You are moving data you care about to a new and not well-aged filesystem? Then you will get all the data-loss you deserve.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:Btrfs by Flammon · 2014-06-14 05:33 · Score: 1

There are varying degrees of stability, and I felt that after 7 years of development, official inclusion into the Linux kernel, Facebook deployment and becoming the default fs on OpenSUSE, it's good enough for my laptop, workstation and a few other systems. Having said that, I've not migrated my backup drives yet; they're still on XFS. It may be a while until I migrate those. http://www.phoronix.com/scan.p...

--
ayottesoftware.com
Re:Btrfs by gweihir · 2014-06-14 08:10 · Score: 1

Well, if your data is worthless, then by all means, go ahead. Here is a hint though: How long an FS has been in development is completely immaterial. What matters is how long it has been stable.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:HFS reliability by sribe · 2014-06-14 03:05 · Score: 3, Insightful

Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.

Oh yes. I remember those days well. Journaled HFS+ fixed that, and for about the last decade the only times I have encountered a corrupted file system on a Mac, that discovery was followed shortly by total failure of the hard disk.

So, what was your fucking point?

Re:HFS reliability by Smurf · 2014-06-14 03:07 · Score: 1

Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.

No, not "anyone who owned a Mac since the 80s...". My first Mac was a Mac Plus bought in 1987 (IIRC), and I have never used those tools nor experienced the problems you mention.

So answer me this... by trparky · 2014-06-14 03:08 · Score: 1

Some people are talking about the fact that bitrot could happen as a result of bad RAM. Are you talking about bad system RAM or the RAM onboard the HDD's controller board?

If it was indeed bad system RAM, wouldn't bad system RAM cause a random BSOD (Windows) or Kernel Panic (Linux)? With how much RAM we use these days it's very likely we're going to be using all of the storage capacity of each of the DIMMs that we have in our systems.

Myself I have 16 GBs of RAM in my Windows machine and at any moment in time I'm using at the very least 40% of the RAM in the system with spikes up to at least 60% depending upon what I'm doing at the time. So with that said, the possibility of kernel memory structures being corrupted at some point while using memory (in even less used DIMMs in your system) I figure is going to happen. I'm not sure how the memory in the DIMMs are being used though. Is it being used sequentially? (DIMM 0, chip 1... 2... 3... 4, DIMM 1, chip 1... 2... 3...4, etc.) Or is the data thrown about randomly on the DIMMs?

Myself, if I had a random BSOD just happen I'd be running MemTest86+ in a hot second to test my system RAM and be asking Corsair (the company that made my DIMMs) for an RMA.

So if it does indeed turn out to be bad system RAM that causes this, I guess that it's a good idea not to be buying cheap RAM to begin with. Myself, I've never had a problem with Corsair Vengeance RAM modules so I will continue to buy that line of Corsair memory.

Re:So answer me this... by fnj · 2014-06-14 04:20 · Score: 1

If it was indeed bad system RAM, wouldn't bad system RAM cause a random BSOD (Windows) or Kernel Panic (Linux)?
Likely so, but if we are talking about errors that only show up in 28 file-reads out of millions of file-reads, there is no reason to believe that you would be bound to see such a panic during the period in question.
BTW, bad RAM anywhere in the chain from disk drive to CPU - main system RAM, CPU cache RAM, hard drive cache RAM, controller RAM, etc - could cause such a panic, since most data travels all the way through such a chain. I am rather awestruck at how reliable the millions to billions of transistors in that chain actually are.
Re:So answer me this... by trparky · 2014-06-14 04:33 · Score: 1

If I were in your shoes, if that module failed a MemTest (even just one pass) then that module will be getting replaced with an RMA from the RAM manufacturer. I don't care if the system is stable, if that module failed... it's getting replaced.
Re:So answer me this... by silas_moeckel · 2014-06-14 08:29 · Score: 1

It need not be bad RAM. If you're not running ECC from top to bottom, a stray bit of radiation etc. will flip a bit every now and then. ECC lets this be detected and corrected. This can be an issue along the whole chain, and it is a tradeoff: calculating ECC adds latency and requires buffers, which in themselves give more places for a bit to flip.

--
No sir I dont like it.
Re:So answer me this... by jones_supa · 2014-06-14 16:28 · Score: 1

Unlike bit flips from radiation, RAM defects aren't randomly spread over the entire address space. Often the defect is only in a few bits or even in just one bit, and then it isn't necessarily something simple, like a stuck bit (always 0 or 1). I once owned a DIMM with just one defective bit which failed just one of Memtest's patterns, and then only about 50% of the time. That DIMM caused file corruption similar to that described in the story. The machine was rock solid otherwise. Apparently the OS never used that part of the physical address space for vital OS structures.
As a nifty little trick, if you know the exact memory address of that bit, you can use the Linux kernel "badram" boot parameter to exclude that location. :)

how is this a file system problem? by stenvar · 2014-06-14 03:39 · Score: 1

This sounds like actual disk errors. File systems can't do much about them, you really need something like a RAID.

Re:how is this a file system problem? by gweihir · 2014-06-14 04:35 · Score: 1

The OP did it wrong due to stupidity or laziness and now he is blaming others like an immature, petulant child would do.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:how is this a file system problem? by kthreadd · 2014-06-14 04:47 · Score: 2

The file system can do quite a bit if it actually does consistency checks on the data when reading it. ZFS does this and will alert you if the contents of a file has changed after it was last written, allowing you to restore a good copy from backup and verify that it is still valid.

Re:It's the contents of the files... by wagnerrp · 2014-06-14 03:49 · Score: 1

Sufficiently advanced RAID implementations will carry checksums of those blocks for exactly that purpose.

It's all about ERROR rates by bussdriver · 2014-06-14 03:53 · Score: 2

RAM may have a low error rate, much better than HDDs or SDs. That does not mean that you won't have errors, even if you have a good brand and treat it well. Bit-level errors can and do happen all the time without us knowing; other times one happens in the wrong place and we notice (but think it is something else). It isn't until it gets really bad that we notice.

For example, say your RAM has a 1% bit loss rate (ignore that this is insanely high). If 90% of what it holds is not touchy code but data, the odds are that you won't notice 1 bit getting flipped that often. Then there is the fact that RAM could maintain that same error rate over decades of smaller, faster RAM, but now you are storing MORE data and cycling it MORE than was possible on the older computers. So, if you had 1 bit error for every gigabyte of throughput on a slow 1MHz computer with 1MB of RAM, it would take a long time for that bit flip to happen (and if you noticed, you'd still not likely blame the RAM). But today, pumping through in seconds what that old machine would take a year to process, the error would occur quite often. It's the SAME problem with storage, but with an additional problem in that drives still have the same lifespan requirements, while RAM can be refreshed and checked.

Something else to be considered: the error correction schemes being used today are being pushed by the demand for higher density storage. Your HD isn't doing Hamming codes or any of those old simple bit-recovery schemes; they moved beyond that long ago to the next generation of what your 56k modem was doing to fight phone line noise. They could make it better... but you would be giving up significant storage space. Perhaps somebody with a good marketing scheme and enough upset consumers could get you to pay MORE for less storage space... I know I would buy into it.

Essentially, we are at a point where HDDs expect you to scrub them for errors every year to avoid bit rot... which is what I now do... and I haven't detected an error in years. However, the block-level checksums the HDD uses have a false-positive rate (just like CRC16 does), and the odds of a false positive may be poor. Again, we are working in the trillions now, up near their limitations. (I'm assuming whatever they use now has scaled... but it may not have, which is why more people are talking about these issues. We know it's unlikely industry has adapted to the trends evenly over the decades... it has likely become a minor problem before they are forced to change devices to a newer proprietary checksum and error correction scheme.)

Do serious work? Use ECC RAM. I'm still waiting for some low power AM1 motherboard that supports ECC so I can build a ZFS server... the AM1 chip supports ECC but no motherboards do.

--
Democracy Now! - uncensored, anti-establishment news

Re:It's all about ERROR rates by trparky · 2014-06-14 04:14 · Score: 1

I have noticed that a lot of OEMs (Dell, HP, Apple, etc.) use a no-name brand of RAM in many of their systems that they build. If you look at them, especially the CAS latency stats, you'll notice that many of the RAM chips found in most pre-made computers are absolutely pitiful (to say the least).

So with that being said, who knows if this no-name RAM that is installed in many pre-made computers that many people buy is of any real quality. I'm guessing... no. So perhaps the odds of bitrot happening on pre-made machines are going to be higher than on systems that have better-quality system RAM installed in them.
Re:It's all about ERROR rates by gweihir · 2014-06-14 04:34 · Score: 1

ECC is not what you need for reliable data archiving. What you need is independent checksums, and you need to actually compare them to the data on disk. If you store an MD5 or SHA1 hash with all files, corruption from RAM, buses and the like will not go undetected. The way things go today though, most people do not even verify a backup. No surprise they lose data: incompetence and laziness come at a price. Of course, you should make sure your RAM runs stable, but I have not had a single ECC-corrected bit in running with ECC for several years on several machines including two servers, so I decided to drop it.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Re:It's all about ERROR rates by laird · 2014-06-14 12:23 · Score: 1

I used to work in supercomputing, and with terabytes of RAM and petabytes of data I/O, you *bet* everything had ECC and parity bits every step of the way (yeah, extra parity bits in the RAM). And cosmic rays really do flip bits in RAM from time to time, and when you're running a $10M machine on a 2 month computation, you really do care about being able to detect the error, restore the machine's state to the previous snapshot, and keep running.
It's amusing to me that consumers now have enough data that this stuff starts affecting them. It'll be interesting to see if consumers start paying extra for the reliability.

--
Enable 3D printed prosthetics!
Re:It's all about ERROR rates by bussdriver · 2014-06-15 04:29 · Score: 1

Note that I did mention vendors expect us to scrub our storage data to correct errors to catch and repair losses... I don't remember being told... but then I've not read any paperwork that came with a HDD in a long time.
I talked about ECC RAM because it is a similar problem. We are raising our demands on the tech so the reliability level (includes associated techniques) has to increase to meet our higher demands. The fact we are noticing this more to me indicates either that we have more discussion of the problem or that the reliability has not been scaling at the pace of our increased demand on the technology.
I do remember experiencing LESS bit rot in my storage in the past; I had less data in the past... but I also accessed that smaller data set more.
When I had 5.25" floppy disks I had errors happen and I noticed them... likely all of them. The impact was huge when data was low and code was high. Plus, I didn't have much storage to keep track of so I was able to spot it. Today, I have more data storage than I have time to review it.
The natural entropy of magnetic storage does far more harm to modern data densities than the old floppies. One would expect more data is required to correct errors at the same integrity levels because of the densities involved today and the nature of the physics involved is different (more chaotic) than in the past... but we expect to use a % of the newly discovered storage so integrity would go down.

--
Democracy Now! - uncensored, anti-establishment news
Re:It's all about ERROR rates by gweihir · 2014-06-15 10:03 · Score: 1

And these machines have ECC on all buses, controllers, etc. Then it makes sense. Consumer-grade ECC is only on the memory itself. That is not enough and you need to do data-verification.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:It's the contents of the files... by the_B0fh · 2014-06-14 04:28 · Score: 1

ZFS raid does that.

Those that are incompetent will lose their data by gweihir · 2014-06-14 04:30 · Score: 1

There are only two options for reliable data archiving: 1. Spinning disks with redundancy and regular checks 2. Archival grade tape. There used to be MOD as well, but as nobody cared enough to buy it, development stalled and then died. The OP simply was naive and stupid and did not bother to find out how to archive data properly. It is well-known how to do it and has been for a long time. I have not lost a single bit that I care about. Of course, I have a 3-way RAID1 with regular SMART and RAID consistency checks. I have off-site backups that are made with full or at least crypto-hash comparison to the original. I have lost plenty of bits that were not on RAID and I have to replace a disk in that RAID1 about every 1-2 years because of read errors, but none of that is surprising.

In short: The OP is lamenting his own stupidity and he is not even aware of it. Dunning-Kruger effect at work.

And BTW, before I forget: SSDs have worse properties for archiving than spinning disks. As people are generally stupid, I expect the "problem" of bit-rot will get worse. At least as long as people are too lazy to find out how to do things properly or are unwilling to spend the money that doing things right takes.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
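
The "crypto-hash comparison to the original" mentioned in the comment above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual tooling: the function names `sha256_of` and `compare_trees` are made up for the example, SHA-256 is an assumed choice of hash, and the two directory arguments stand in for wherever the original and the backup happen to live.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original: Path, backup: Path) -> list[str]:
    """List relative paths that are missing from, or differ in, the backup."""
    problems = []
    for source in sorted(p for p in original.rglob("*") if p.is_file()):
        relative = source.relative_to(original)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(source) != sha256_of(copy):
            problems.append(f"differs: {relative}")
    return problems
```

Note that a comparison like this only says the two copies disagree. Deciding which side rotted still needs a hash recorded when the file was known to be good.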

Re:Those that are incompetent will lose their data by WheezyJoe · 2014-06-14 05:39 · Score: 1

There are only two options for reliable data archiving: 1. Spinning disks with redundancy and regular checks 2. Archival grade tape. There used to be MOD as well, but as nobody cared enough to buy it, development stalled and then died.
Any experience with M-discs as archival media? Newer cd and dvd burners are compatible with them, but do they deliver?

--
Take it easy, Charlie, I've got an Angle...
Re:Those that are incompetent will lose their data by gweihir · 2014-06-14 08:04 · Score: 1

No. Consumer-grade trash. The absence of a cartridge is already enough to see that. Also, anybody claiming "1000 years data lifetime" must be lying, as the best accelerated aging models can give you 60-80 years predictability but not more.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:What I hear from this guy... by gweihir · 2014-06-14 04:38 · Score: 1

Yes, he did it to himself by making assumptions he liked and zero verification whether they hold up in the real world. Now he blames others for his stupidity.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:It's the contents of the files... by gweihir · 2014-06-14 04:44 · Score: 1

No, there are not. There are data-archival systems that can do this though. This is not a filesystem-layer problem at all.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:It's the contents of the files... by gweihir · 2014-06-14 04:48 · Score: 1

That does not happen unless you fail to verify the data when placing it on that raid. For bit-rot detection on the disks, the disk-internal error detection is more than enough. While the manufacturers state "1 uncorrectable sector in 10^15" read, it is more like "1 in 10^30" undetected faulty sector. And of course, any sane RAID setup includes a full disk data consistency check every 14 days or so. If you place defective data on the RAID, the RAID can do nothing for you.

I would also really recommend to read up on RAID and disk technology, you do not seem to understand how things work.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:It's the contents of the files... by gweihir · 2014-06-14 04:50 · Score: 1

Bullshit. That is not a RAID-layer task. And the disks do that themselves just fine. Historically, there were actually RAID implementations that did what you describe, but they were scrapped due to various problems. Doing this in RAID is the wrong approach.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Not if you're the IRS... by Nova+Express · 2014-06-14 05:18 · Score: 1

...and you want to prevent those nosy congressmen pawing through your emails looking for felonies...

--
Lawrence Person (lawrencepersonh@gmailh.com (remove all "h"s to mail)

http://www.lawrenceperson.com/

Re: Incompetent -- Learning Archival Strategies by BoRegardless · 2014-06-14 05:58 · Score: 1

What is the best overview doc/book out there for covering backup-archiving options?

I want to be more conversant with the subject before starting work with a FileMaker Pro DB consultant.

I will be doing a mission critical but small database, so data storage size won't be an issue as far as existing 1-4TB HDDs go & RAID arrays. Losing a day's or even an hour's data entry is not an option.

Re:It's the contents of the files... by Guspaz · 2014-06-14 06:35 · Score: 1

Yes, there are. There are filesystems that do per-block checksums. If data corruption occurs, it knows about it as soon as it tries to read the block. If it has no redundancy, ZFS will tell you which file is corrupt and suggest restoring it from backup.

Re:It's the contents of the files... by Guspaz · 2014-06-14 06:36 · Score: 1

Nope, wagnerrp is correct. raidz does exactly what he describes, and your claim that raidz was "scrapped due to various problems" is incorrect.

Re:HFS reliability by Etcetera · 2014-06-14 07:07 · Score: 1

Anyone who owned a Mac since the 80s remembers having to use Norton Disk Doctor and later DiskWarrior at least once per month to repair the filesystem. Entire folders could go randomly missing each time you booted up your Mac, and if you accidentally lost power to your hard drive, the use of one of those was mandatory.

I think you're confusing generic Disk Repair with rebuilding the Desktop File...

Unless your drives were seriously damaged (floppies thrown in a backpack were always a bad idea no matter where you were), missing icons and whatnot were at the disk catalog level (used by Finder), not the HFS level. Command-Option on disk insert would fix it for me.

In the event of a power outage or something similar, it was always advisable to run Disk First Aid (later versions, System 7.5+ or Mac OS 8.1 maybe, would run it automatically for you after an unsafe shutdown), but that's just morally equivalent to running an fsck.

--
Hire a Linux system administrator, systems engineer,

Re: Incompetent -- Learning Archival Strategies by FaxeTheCat · 2014-06-14 07:37 · Score: 2

Losing a day's or even an hour's data entry is not an option.

If you have that kind of requirement (less than an hour of lost data), then you are not looking for just backup/archive. You are looking for a fully redundant storage system.
In addition to the backup system, of course.

For reading, check up on backupentral.com, Symantec.com (Backup Exec/Netbackup), emc.com (Avamar, networker).
I once managed a Filemaker database server (v5), and it has a built-in feature to copy the database files for backup. Real simple. Cannot remember if the database had to be taken offline, as we had users only during normal working hours, but these days that should NOT be a requirement.

Re: Incompetent -- Learning Archival Strategies by gweihir · 2014-06-14 08:06 · Score: 1

Simple: It is a "Datasheet" covering an "archival grade medium". If you do not know that, you have absolutely no business working on any kind of "mission critical" storage, as you are simply incompetent with regard to that subject.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:It's the contents of the files... by gweihir · 2014-06-14 08:07 · Score: 1

You have no clue what you are talking about. Simple per-block checksums on the HDDs are already doing that. This is not a filesystem issue. This is also not a subject topic where clueless idiots like you can contribute anything.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re: Incompetent -- Learning Archival Strategies by WheezyJoe · 2014-06-14 08:39 · Score: 2

Simple: It is a "Datasheet" covering an "archival grade medium". If you do not know that, you have absolutely no business working on any kind of "mission critical" storage, as you are simply incompetent with regard to that subject.

Easy, there, big fella. Posting a link to a datasheet would have sufficed. Ain't right to call a man incompetent for asking a question. Truly, an incompetent is one who don't never ask the question assuming he already knows. Credit is due for seeking to learn something.

--
Take it easy, Charlie, I've got an Angle...

No, it doesn't. by sirwired · 2014-06-14 11:57 · Score: 1

GNU stands for GNU's Not UNIX.

Linux is just "Linus with an 'x' at the end to make the name look UNIX-y"

Re:HFS reliability by laird · 2014-06-14 12:26 · Score: 1

+1 this. The only time I've ever had Mac filesystem problems was when there was an unexpected power loss. When you lose power while writing to the drive, bad things can happen. But I've not seen even that since Mac OS 7 or so. :-)

--
Enable 3D printed prosthetics!

POSIX and LINUX by jbolden · 2014-06-14 12:33 · Score: 1

Well, IMHO I'd say it's more important for The Open Group (POSIX) to figure out what role they should play in a world where we don't have a variety of mostly coequal Unixes. Rather, we have a highly fragmented family of Unixes in Linux (including Android), a very popular desktop Unix in OSX that violates most of the Unix norms in spirit, and the only remaining big-box Unix (AIX), which is more aimed at bringing over cool features from the mainframe. None of them really care about running each other's software. So really the question is what role should The Open Group play in such a world?

Single Disk Parity by randallman · 2014-06-14 14:04 · Score: 1

Do any file systems support single-disk parity?

Set a parity ratio depending on risk vs. space loss tolerance. Say it is 1000. You can lose any of 1000 bytes in a parity group and recover while only giving up .1% of your disk space to parity.
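
As a rough sketch of the idea above, assuming plain XOR parity with one parity byte per group of 1000 data bytes (the function names and sample data are invented for the illustration):

```python
from functools import reduce
from operator import xor


def parity_of(group: bytes) -> int:
    """XOR all bytes in the group into a single parity byte."""
    return reduce(xor, group, 0)


def recover(group: bytearray, lost_index: int, parity: int) -> int:
    """Rebuild the byte at lost_index from the surviving bytes and the parity."""
    survivors = (b for i, b in enumerate(group) if i != lost_index)
    return reduce(xor, survivors, parity)


GROUP_SIZE = 1000  # the ratio from the comment: one parity byte per 1000 data bytes
data = bytearray(i % 251 for i in range(GROUP_SIZE))
parity = parity_of(data)

original = data[417]
data[417] = 0  # simulate losing one byte at a KNOWN position
assert recover(data, 417, parity) == original
```

The catch is that this only works when you already know which byte was lost. If a byte silently flips, a single parity byte can tell you the group is inconsistent but not which of the 1000 positions is wrong, so something else (a drive read error, or a checksum per smaller unit) has to locate the damage first.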

Re: Incompetent -- Learning Archival Strategies by BoRegardless · 2014-06-14 15:38 · Score: 1

Just for clarity, I'm not going to run the backup/archival system as an FMPro consultant will do that.

I need to get some more background, so I have knowledge of where the tradeoffs are. I know this is done all the time, but I'm sure there are still choices to be made.

Re:HFS reliability by jones_supa · 2014-06-14 16:21 · Score: 1

So, what was your fucking point?

He was just thinking back to the ole times.

Re:It's the contents of the files... by Marsell · 2014-06-14 16:51 · Score: 1

The irony of your calling someone else clueless...

Drives do indeed have checksums on their blocks. That does not prevent them from sometimes feeding you back garbage anyway -- see misdirected and phantom reads and writes. Since ZFS uses a self-validating merkle tree, whereas disk checksums live in the same block as the data, ZFS is largely immune to this problem.

If you've worked with disks any length of time, as in actually trying to write a robust filesystem, you'd know that disks sometimes lie. They usually work but every now and then they do the most ridiculous things, due to mechanical, electrical or firmware problems. That's why filesystems like ZFS were created (what, you thought Sun spent man-decades of expert time on it for giggles?). kthreadd is correct.

Please just stay away from storage. The topic is much more complicated than you make it out to be.
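
The difference between a checksum stored next to the data and one stored in the parent can be shown with a toy model. This is only a sketch under loud assumptions: the "disk" is a Python dict, CRC32 stands in for whatever checksum a real drive or ZFS uses, and the `parent` dict is a crude stand-in for a block pointer, not ZFS's actual on-disk layout.

```python
import zlib


def crc(data: bytes) -> int:
    """Stand-in checksum for the example (real systems use stronger ones)."""
    return zlib.crc32(data)


# Each "sector" stores its payload together with its own checksum,
# the way a drive keeps its check data alongside the block it protects.
disk = {n: (payload, crc(payload)) for n, payload in
        {3: b"family photo", 7: b"old tax return"}.items()}

# The parent separately records the checksum it expects for each child block.
parent = {n: crc(payload) for n, (payload, _) in disk.items()}

# Misdirected write: an update meant for block 7 lands on block 3 instead.
update = b"new tax return"
disk[3] = (update, crc(update))   # wrong location, but internally consistent
parent[7] = crc(update)           # the filesystem believes block 7 was updated


def sector_ok(n: int) -> bool:
    """Check a block against the checksum stored in the same block."""
    payload, stored = disk[n]
    return crc(payload) == stored


def parent_ok(n: int) -> bool:
    """Check a block against the checksum the parent expects for it."""
    payload, _ = disk[n]
    return crc(payload) == parent[n]


# Both blocks look fine to the in-block check...
assert sector_ok(3) and sector_ok(7)
# ...but the parent notices that 3 was clobbered and 7 is stale.
assert not parent_ok(3) and not parent_ok(7)
```

In this model the in-block checksum cannot fail, because it was written together with the misplaced data. Only a checksum held somewhere else can notice that the right bytes are in the wrong place.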

Re:It's the contents of the files... by Guspaz · 2014-06-14 17:31 · Score: 1

Uh huh. As somebody who uses raidz, and has a decent high-level idea of how it works, I'm going to say you're full of shit.

Re:It's the contents of the files... by gweihir · 2014-06-14 19:08 · Score: 1

Still pathetic. And no, you have absolutely no clue. Do you even know what an ECC is and how low the probability of it not detecting an error is for HDDs? And while you are looking that up, look up the Dunning-Kruger effect as well.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:It's the contents of the files... by Guspaz · 2014-06-14 20:53 · Score: 1

I don't know why all your posts are so focused on the drive's internal checksums not detecting errors. As you say, that's very rare. A far more common occurrence, and one that I've seen many times, is a drive detecting corrupt data and being unable to correct it. At that point, it's up to the filesystem to use whatever redundancy you've provided (be it duplication or parity) to recover the lost data. The error correction on a drive can't do squat if a block is sufficiently corrupt.

You act as if the drive missing corruption is the problem. It's not.
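
A minimal sketch of the recovery path described above, assuming a two-way mirror and a checksum the filesystem keeps separately from the data. The names are hypothetical, SHA-256 is an assumed choice, and a drive returning a read error is modeled here simply as a copy that fails its checksum:

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum the filesystem stores apart from the block itself."""
    return hashlib.sha256(block).digest()


def read_with_repair(copies: list[bytearray], expected: bytes) -> bytes:
    """Return the first copy matching the checksum and rewrite bad copies from it."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all copies are corrupt; restore from backup")
    for c in copies:
        if bytes(c) != good:
            c[:] = good  # heal the damaged copy in place
    return good


block = b"JPEG scanline data"
expected = checksum(block)
mirror = [bytearray(block), bytearray(block)]
mirror[0][3] ^= 0x01  # one flipped bit on the first disk

assert read_with_repair(mirror, expected) == block
assert bytes(mirror[0]) == block  # the bad copy was rewritten from the good one
```

With no second copy the same check still detects the damage, but all it can do is report which block (and therefore which file) is bad.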

Re:HFS reliability by Smurf · 2014-06-15 03:34 · Score: 1

All the Macs I've owned have always been my main personal computer, and the first couple were my only computer at the time. I did everything on them: schoolwork, gaming, stuff for my dad's office and for others, etc. Looking back, I believe I spent way more time with them than I should have.

Did I experience system crashes with the dreaded bomb box? Yes, plenty of them. Did I experience sad Macs? Yes, occasionally. (I believe it was supposed to appear on hardware failure, but after restarting the computers continued to hum along for years). I never owned (nor pirated) a copy of Norton Disk Doctor, although I did see it running on other people's computers.

It's not my fault that my experience differs from yours.

ok you asked for it by bussdriver · 2014-06-15 04:04 · Score: 1

I dug up the study.
"End-to-end Data Integrity for File Systems: A ZFS Case Study"
Zhang, Rajimwale, Arpaci-Dusseau

Cosmic rays do happen; odds go up as elevation increases. I would guess location also matters.
Other looking turned up this gem:
Google reports that more than 8% of all DIMMs get an error each year. Google found that the error rates were several orders of magnitude larger than small scale studies showed.

--
Democracy Now! - uncensored, anti-establishment news

Re:ok you asked for it by gweihir · 2014-06-15 10:03 · Score: 1

I am not disputing that. But RAM is not the only significant source of bit-errors.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Clueless article by Anonymous Coward · 2014-06-15 09:36 · Score: 1

Just some minor corrections in your text.

ZFS always detects bit rot. ZFS can always repair bit rot if it is configured to do so, that is, if some redundancy is used. For instance, raid or mirror. BUT! You can also configure ZFS to double (triple) every piece of data on a single disk, so ZFS can repair data using only a single disk. BTRFS can also do this, and it does so by partitioning the disk into two partitions, building a mirror on a single disk. This is extremely cumbersome; ZFS does not do that. You don't need to repartition or anything with ZFS, just specify "copies=2". Done.

BTW, there is no research on whether BTRFS is safe; on the other hand, there are several research projects showing that ZFS is safe. Read the research papers cited in the ZFS Wikipedia article.

I read some guy speculating about a storage solution that should repair the corrupted data block from a given checksum, by trying different valid data blocks fulfilling the checksum. He googled this and it turned out that someone already tried that. Guess which solution? Yep, ZFS. But that solution was omitted because it took too much time. Pretty cool anyway.

Re:It's the contents of the files... by gweihir · 2014-06-15 09:52 · Score: 1

My claim is exactly the other way round. The claim by others was that extra error detection on RAID layer was needed. It is not.

"Sufficiently advanced RAID implementations will carry checksums of those blocks for exactly that purpose." is wrong. That is all I am saying. RAID does not carry block checksums because they are not needed. RAID may carry redundancy in several different forms, but redundancy (even ECC) is not "checksums".

What people here seem to completely miss is that filesystem-level data checksums are not there to detect corruption on the disk. The disk does that just fine. They are there to detect data corruption due to corruption in the path from main memory to the disk and back from it.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re: Incompetent -- Learning Archival Strategies by gweihir · 2014-06-15 09:55 · Score: 1

"Incompetent" is a state, not an insult. And no, I cannot post any datasheets as I do not know what kind of equipment will be used.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re: Incompetent -- Learning Archival Strategies by gweihir · 2014-06-15 09:56 · Score: 1

You need to find out the details of the backup/archival system being used. There is no way around that. It cannot be modeled as an opaque component, you need to understand the whole stack.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re: Incompetent -- Learning Archival Strategies by gweihir · 2014-06-15 09:57 · Score: 1

And it is AC's job to spread lies. So take all he says with a grain of salt.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re: How about the "next gen file system" ? by Anonymous Coward · 2014-06-15 10:08 · Score: 1

ZFS definitely can detect bit rot! It's designed to do so.

Re:It's the contents of the files... by cmurf · 2014-06-15 15:20 · Score: 1

OK so you're saying that manufacturers are wrong by f'n 15 orders of magnitude when they say "less than 1 uncorrectable error for every 10^14 bits read". That's such an immense amount of error that you're basically accusing them of being incompetent, possibly even of fraud. Next you're also proposing that consumer hard drives have a bit error rate 13 orders of magnitude less than that claimed by LTO tape manufacturers. You're like the drunk guy running into walls, tripping over himself, shouting and pissing himself, while bitching about everyone else having craptastic balance and smelling like alcohol and urine and talking way too loudly.
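
For a sense of scale, the spec-sheet figure can be turned into an expected error count for one full read of a drive. This is back-of-the-envelope arithmetic only: the 4 TB drive size is an assumption picked for the example, the function name is made up, and since the quoted spec is an upper bound ("less than 1 per 10^14 bits"), so is the result.

```python
BITS_PER_BYTE = 8


def expected_unrecoverable_errors(bytes_read: float, bits_per_error: float) -> float:
    """Expected error count if one error occurs per `bits_per_error` bits read."""
    return bytes_read * BITS_PER_BYTE / bits_per_error


full_read_4tb = 4e12  # assumed: reading a 4 TB drive end to end once

# One error per 10^14 bits, the figure quoted in the comment above
print(expected_unrecoverable_errors(full_read_4tb, 1e14))
# One error per 10^15 bits, the figure quoted earlier in the thread
print(expected_unrecoverable_errors(full_read_4tb, 1e15))
```

Put differently, 10^14 bits is 12.5 TB, so at that rate a single end-to-end read of a 4 TB drive carries an expected error count of roughly a third.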

I thought bitrot.. by doccus · 2014-06-15 16:39 · Score: 1

was what happened to Microsoft coder's socks.. I have that on good authority from someone at Apple..

Re:HFS reliability by doccus · 2014-06-15 16:42 · Score: 1

Stuff disappears on me all the time ;-)

Differential backup by phorm · 2014-06-16 09:34 · Score: 1

That should be OK for differential backups (depending on how often you make a "full"). If a file changes by 1 bit, that'll affect a byte. A differential will be written, but it hardly takes any space and you can still roll back.

Makes a good reason to *check* backups every now and then though.

Slashdot Mirror